linux-ha

Oct 17 2007

Virtual Machine Replication

I don't know which planet I have been on for the past couple of years, days or hours, but since when do VMware’s VMotion, XenSource’s XenMotion or Virtual Iron’s Virtual Iron support replication?

Live migration, yes. But replication? No.

I have discussed this kind of technology with Mark and Vincent, Moshe and others a zillion times already. Continuously mirroring a virtual machine, or replicating it in real time, is really difficult to do, and I haven't heard of a working, scalable solution yet. (Shared-memory issues such as those we had with openMosix are still among the problems to be tackled.)

Live replication would mean mirroring the full state of your virtual machine in real time to another running virtual machine. Every piece of disk, memory and screen you are using has to be replicated to the other side of the wire as it changes. Yes, you can take snapshots of filesystems and checkpoints of virtual machines. But continuous checkpointing over the network? I'd love to see that (outside of a lab).

So with a promise like that, our good friends the CIOs will be dreaming, and the vendors will be blamed for not delivering what the CIOs were promised.

As for using just live migration features as an alternative to a real High Availability solution: I know different vendors are singing this song, but it's a bad one.

Using live migration in your infrastructure gives you the opportunity to move your applications away from a misbehaving machine when you notice it starting to misbehave, hence giving you better overall uptime. If, however, you don't notice the machine is failing, or it just suddenly stops working, or your application crashes, you are out of luck.
Live migration won't work anymore because you are too late: you can't migrate a machine that's dead. The only thing you can do is quickly redeploy your virtual machine on another node, which for me doesn't really qualify as a clustered or HA solution.
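
To make that distinction concrete, here is a minimal sketch in Python. It is a toy model, not any vendor's API: `Host`, `VirtualMachine`, `live_migrate` and `redeploy` are hypothetical names I made up for illustration. The only point it shows is that migration needs a living source to copy state from, while a redeploy starts over from scratch.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical physical node."""
    name: str
    alive: bool = True  # False once the host has crashed or lost power


@dataclass
class VirtualMachine:
    """Hypothetical guest, tracked only by the host it runs on."""
    name: str
    host: Host


def live_migrate(vm: VirtualMachine, target: Host) -> None:
    """Move a running VM; the source must still be able to send its state."""
    if not vm.host.alive:
        raise RuntimeError(
            f"cannot migrate {vm.name}: source host {vm.host.name} is dead"
        )
    vm.host = target  # memory and device state travel with the VM


def redeploy(vm: VirtualMachine, target: Host) -> str:
    """Cold restart on another node; in-memory state is lost."""
    vm.host = target
    return "rebooted"
```

In this model a planned move off a flaky but still running host succeeds, while a call to `live_migrate` after the host has died raises an error, and `redeploy` is the only option left.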

Real HA looks at all aspects of an application: the state of the application, the state of the server it is running on, and the state of the network it is connected to. It has an alternative ready if any of these fail. Session data is replicated, data is stored redundantly, and your network has multiple paths. If your monitoring decides something went wrong, the alternative should take over with no visible interruption for the end user. You don't have to wait until your application is restarted on the other side of the network, or until your virtual machine is rebooted, your filesystems are rechecked and your database has recovered. No, it happens right away.
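
The monitoring decision described above can be sketched roughly as follows. This is an illustration under assumptions, not how Linux-HA or any other product implements it: the check names, `failed_aspects` and `choose_active` are hypothetical, and the standby node is assumed to be running already, with session data replicated to it.

```python
from typing import Callable, Dict, List

# A check returns True when the aspect it watches is healthy.
Check = Callable[[], bool]


def failed_aspects(checks: Dict[str, Check]) -> List[str]:
    """Return the names of all aspects whose check currently fails."""
    return [name for name, check in checks.items() if not check()]


def choose_active(active: str, standby: str, checks: Dict[str, Check]) -> str:
    """Decide which node serves traffic after one monitoring pass.

    Any failing aspect (application, server or network) hands the
    service to the standby, which is assumed to be running already.
    """
    return standby if failed_aspects(checks) else active


# Example: the application check fails, so the standby takes over.
checks = {
    "application": lambda: False,  # e.g. a health probe that timed out
    "server": lambda: True,
    "network": lambda: True,
}
serving = choose_active("node-a", "node-b", checks)
```

The design point is that the takeover costs nothing more than a decision, because the standby is already up. A reboot or a filesystem check never enters the picture.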

But Virtual Machine Replication as an alternative to HA? Today I'd call that wishful thinking and vapourware.