SWY's technical notes

Relevant mostly to OS X admins

Alternate way to deal with XenServer storage issues

I’ve administered a small pool of 2 physical XenServers with shared storage for my employer’s Windows and Linux virtual servers for a few years.  One issue that has come up under both XenServer 5.6 and 6.x is a failure for XS to properly remove the disk files when a snapshot is deleted.  This can be an issue when using a VM backup tool such as PHDVirtual, where the backup server is another VM, which on a schedule has the hypervisor

  1. Take a snapshot of another running VM
  2. Attach that snap to the backup VM
  3. Read that drive, back up to the configured storage
  4. detach and delete the snap

Result is a buildup of undeleted snapshot files, taking up storage space on your SR.  One way to confirm if this is happening is to execute the following line on the XenServer console:

xe vdi-list is-a-snapshot=true | grep name-label | sort

If you don’t expect to see any snapshots, seeing them listed here is an issue.

02847365c4067b1afd46f2cfe684292d

These shouldn’t be here.

Citrix offers a command line tool to address this, outlined in Knowledgebase document CTX123400. This is called an Offline Coalesce, and is formatted as

xe host-call-plugin host-uuid=<UUID of the pool master Host> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<uuid of the VM you want to coalesce>

Earlier this week, I was using the above in hopes to address the large discrepancies in my SR used vs allocated values from the not deleted, but not seen in XenCenter snapshots seen above:

146d77cf1b7262bd8b13cbcc46ea21a8

Problem is, it didn’t work.  Tailing /var/log/SMlog, the standout clue was “no space to coalesce”.  So great… a bug in snapshots takes up all your space, and when you try to use the tool to fix it, it can’t because there’s no space.

A day later, this became quite concerning.  I don’t know what Xen will do when an SR hits 100% capacity, but I doubt I’ll enjoy it.

8629fac8974e8652861acfd9f61b35cb

I decided to take a gamble and see what happens if I move a disk to a different SR.  Lacking Storage XenMotion, I shut down a non-critical VM- the Windev2 machine listed above, detached the C: drive, and asked Xen to move it to a different SR.  It took much longer than expected (about 30 minutes for 48 gigs), but in the end, I regained much more than 48 gigs of storage on my SR:

a7c2fe1bf586438e568b3f90b57fd721

557.6- 281.9= 275 gigs of wasted snapshot space due to XenServer bugs, restored by moving the disk to a different SR.  My nightly PHDVirtual backups were now able to take a snapshot last night and perform a proper backup.

Lessons learned:

  • When leaf-coalesece fails, moving a disk to different storage can clean up the wasted storage space
  • If I had to do it all over, I think I’d go VMWare. XenServer has had problems with snapshot management for a long time, and they still haven’t figured it out.

5 responses to “Alternate way to deal with XenServer storage issues

  1. EJ June 20, 2013 at 10:45 am

    I had this same problem in my pool of 5 xenservers. Have not yet tested the leaf-coalesece method, but I’m glad you shared a different solution in case that doesn’t work for me.

  2. syuroff June 20, 2013 at 11:29 am

    My advice is to start with the leaf-colaesce first- and be patient. I’m now experiencing the same SR_BACKEND_FAILURE_44 because a different VM has a 17 long snapshot chain. I started the leaf-coalesce over 2 hours ago, and it’s still chugging along…. use
    tail -f /var/log/SMlog
    to monitor. I’m keeping my fingers crossed that entries including “Num combined blocks” and “SUCCESS” are in my interest, and that if I just sit tight, it will get the job done. I have successfully leaf-coalesced before, so it can work, but it seems to fail if there isn’t much space left on the SR.

  3. syuroff June 25, 2013 at 8:29 am

    A little more… the last leaf-coalesce I did ran for around 20 hours, and didn’t work. I have a ticket open with Citrix, who says their official advice is to export the VM, then import it again, but that my method of detach, move, attach *should* work, but sometimes won’t.

  4. Mark Stoopman March 16, 2014 at 3:33 pm

    IF you have to perform an export and import, please note that this wil take much more time than moving virtual disks. Xensever cannot use more than 10% of the networkspeed of your management nic. If you have a 1 Gbit NIC, exporting AND importing will be processed at the stunning speed of only 10MBit/s!! Nice work Citrix!

    • syuroff March 17, 2014 at 5:55 am

      I’ve not heard of the 10% limitation, so I’m not going to confirm or deny that. But I’ll presume it accurate, and then say that if there’s any way you can move between Storage Repositories vs Export/Import, that’s what you’ll want to do.

Leave a reply to syuroff Cancel reply