
Friday, September 27 2013

When CEPH comes to the rescue


Here at @Easter-eggs[1], like others, we have started playing with the awesome CEPH[2] distributed object store. Our current use of it is hosting virtual machine disks.

Our first cluster was installed just this week, on Tuesday. Some non-production virtual machines were installed on it, and the whole cluster was added to our monitoring systems (see the sketch below).
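
Hooking CEPH into the monitoring is straightforward, since "ceph health" prints HEALTH_OK, HEALTH_WARN or HEALTH_ERR; a minimal Nagios-style check can be a small wrapper like this (a sketch, not the exact plugin we deploy):

 #!/bin/sh
 # Map "ceph health" output to Nagios exit codes
 STATUS=$(ceph health)
 case "$STATUS" in
     HEALTH_OK*)   echo "OK - $STATUS";       exit 0 ;;
     HEALTH_WARN*) echo "WARNING - $STATUS";  exit 1 ;;
     *)            echo "CRITICAL - $STATUS"; exit 2 ;;
 esac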


On Thursday evening, one of the cluster nodes went down due to CPU overheating (to be investigated; it looks like a fan problem).

Our monitoring systems sent us alerts as usual, and we discovered that CEPH had simply done its job :) :

  • the lost server was detected by the other nodes
  • CEPH started to replicate PGs (placement groups) across the remaining nodes to maintain our replication level (this introduced a bit of load on the virtual machines during the sync; the commands below show how to watch it)
  • the virtual machines that were running on the dead node were no longer alive, but we just had to start them manually on another node (Pacemaker is going to be set up on this cluster to manage this automagically)
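
For the curious, the whole recovery can be followed live from any cluster node with the standard status commands:

 # Overall cluster state, including degraded PGs and recovery progress
 ceph -s
 # Details about what exactly is unhealthy
 ceph health detail
 # Stream cluster events while the recovery runs
 ceph -w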

On Friday morning, we repaired the dead server and booted it again:

  • the server automatically joined the CEPH cluster again
  • the OSDs on this server were automatically added back into the cluster
  • replication started again to bring the cluster back to an optimal state (see the checks below)
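
A quick way to confirm the node is really back:

 # The rebooted host's OSDs should be "up" and "in" again
 ceph osd tree
 # ... and the cluster should converge back to HEALTH_OK
 ceph health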

Incident closed!

What else is there to say?

  • thanks to CEPH and the principle of server redundancy for letting us sleep at home instead of spending the night working in the datacenter
  • thanks to CEPH for being so magical
  • let's start on the next step: configuring Pacemaker for automatic virtual machine failover across the cluster nodes (a first sketch below)
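
For the record, the Pacemaker setup we have in mind boils down to one VirtualDomain resource per virtual machine; a first sketch in crm shell syntax (untested on this cluster yet; the resource name and config path are placeholders):

 # One resource per VM; Pacemaker restarts it elsewhere if a node dies
 primitive vmfoo ocf:heartbeat:VirtualDomain \
     params config="/etc/libvirt/qemu/vmfoo.xml" \
            hypervisor="qemu:///system" \
            migration_transport="ssh" \
     meta allow-migrate="true" \
     op monitor interval="30s" timeout="30s"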

Notes

[1] http://www.easter-eggs.com/

[2] http://ceph.com/

Tuesday, September 24 2013

[Libvirt] Migrating from on-disk raw images to RBD storage


As we have just configured our first CEPH[1] cluster, we needed to move our existing virtual machines (raw images stored on a standard filesystem) over to the RBD block devices provided by CEPH.

We use Libvirt[2] and KVM[3] to manage our virtual machines.


Migration with virtual machine downtime

This step can be done offline:

  • stop the virtual machine
 virsh shutdown vmfoo
  • convert the image to rbd
 qemu-img convert -O rbd /var/lib/libvirt/images/vmfoo.img rbd:libvirt-pool/vmfoo
  • update the VM configuration file (the cephx secret referenced below is sketched after this list)
 virsh edit vmfoo
 <disk type='file' device='disk'>
   <driver name='qemu' type='raw' cache='none'/>
   <source file='/var/lib/libvirt/images/vmfoo.img'/>
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </disk>

becomes:

 <disk type='network' device='disk'>
   <driver name='qemu'/>
   <auth username='libvirt'>
     <secret type='ceph' uuid='sec-ret-uu-id'/>
   </auth>
   <source protocol='rbd' name='libvirt-pool/vmfoo'>
     <host name='10.0.0.1' port='6789'/>
     <host name='10.0.0.2' port='6789'/>
     ...
   </source>
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </disk>
  • restart the virtual machine
virsh start vmfoo
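
Note that the <auth> element above assumes a libvirt secret holding the cephx key of the libvirt client already exists on the hypervisor. A minimal sketch of creating one (the UUID here is the same placeholder as in the XML above; use a real UUID in practice). First the secret definition, e.g. ceph-secret.xml:

 <secret ephemeral='no' private='no'>
   <uuid>sec-ret-uu-id</uuid>
   <usage type='ceph'>
     <name>client.libvirt secret</name>
   </usage>
 </secret>

then load it and attach the key:

 virsh secret-define ceph-secret.xml
 virsh secret-set-value sec-ret-uu-id "$(ceph auth get-key client.libvirt)"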

Migration without downtime

The trick here is to use the live migration support in libvirt/kvm, and its ability to take a different XML definition for the target virtual machine:

  • get the current VM disk information
 qemu-img info /var/lib/libvirt/images/vmfoo.img
  • create an empty rbd of the same size (a size-matching one-liner is sketched after this list)
 qemu-img create -f rbd rbd:libvirt-pool/vmfoo XXG
  • get the current vm configuration
 virsh dumpxml vmfoo > vmfoo.xml
  • edit this configuration to replace the on-disk image with the RBD one
 <disk type='file' device='disk'>
   <driver name='qemu' type='raw' cache='none'/>
   <source file='/var/lib/libvirt/images/vmfoo.img'/>
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </disk>

becomes:

 <disk type='network' device='disk'>
   <driver name='qemu'/>
   <auth username='libvirt'>
     <secret type='ceph' uuid='sec-ret-uu-id'/>
   </auth>
   <source protocol='rbd' name='libvirt-pool/vmfoo'>
     <host name='10.0.0.1' port='6789'/>
     <host name='10.0.0.2' port='6789'/>
     ...
   </source>
   <target dev='vda' bus='virtio'/>
   <alias name='virtio-disk0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
 </disk>
  • start the migration process
virsh migrate --live --persistent --copy-storage-all --verbose --xml vmfoo.xml vmfoo qemu+ssh://target_node/system
  • wait until the process finishes. How long it takes depends on your cluster performance and your VM size, but there is no interruption of the virtual machine!
  • you're done: your virtual machine is now running on RBD, and once you have checked it you can safely archive or destroy your old disk image.
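
By the way, the size matching between the raw image and the new RBD image is easy to automate; a small sketch, assuming the usual "virtual size: 20G (21474836480 bytes)" output format of qemu-img info (for unaligned sizes you may prefer the exact byte count):

 IMG=/var/lib/libvirt/images/vmfoo.img
 # e.g. prints "20G" from the "virtual size:" line
 SIZE=$(qemu-img info "$IMG" | awk '/virtual size/ {print $3}')
 qemu-img create -f rbd rbd:libvirt-pool/vmfoo "$SIZE"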

Notes:

  • of course, you need libvirt/kvm with RBD support on the target node
  • you need a recent version of KVM: we had memory exhaustion problems on the hypervisor during the migration process with the Debian Wheezy version

Notes

[1] http://ceph.com/

[2] http://libvirt.org/

[3] http://www.linux-kvm.org/