Fixing random disk errors with Ubuntu 12.04 Precise on XenServer
Ubuntu 12.04 LTS Precise is hot off the press right now. Over the past few days, I have been working on building new base images for PaaS.io, but kept running into random issues where the root partition would encounter an error and freeze up. It happened about 50% of the time, normally just after the system finished booting.
A few times it happened after I had already logged in, so I was able to do basic read-only operations. In dmesg, I was able to see the following:
[ 6.748868] blkfront: barrier: empty write xvda op failed
[ 6.748876] blkfront: xvda: barrier or flush: disabled
[ 6.748890] end_request: I/O error, dev xvda, sector 6584768
[ 6.748908] end_request: I/O error, dev xvda, sector 6584768
[ 6.748943] Aborting journal on device xvda6-8.
[ 6.767022] EXT4-fs error (device xvda6): ext4_journal_start_sb:327: Detected aborted journal
[ 6.767046] EXT4-fs (xvda6): Remounting filesystem read-only
Or if you weren’t able to log in first, you might see this in the XenCenter console:
After some Googling, I was able to track down a similar error reported by someone else. The fix was to update the mount options for the root partition. Mine are now:
The key is barrier=0. From some documentation, barriers are an option to help increase the integrity of writes by ensuring everything is flushed to disk before committing to the journal. However, in a virtualized environment that is sometimes difficult to guarantee. In my case, I have disk->RAID->dom0->LVM->domU.
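If you would rather script the change than edit /etc/fstab by hand, here is a minimal Python sketch of the idea. It only transforms fstab text that you pass in; the function name add_barrier_option and the sample line are made up for illustration and are not my actual fstab entry. It assumes the usual whitespace-separated fstab layout with the options in the fourth field.

```python
def add_barrier_option(fstab_text, mount_point="/", option="barrier=0"):
    """Return fstab_text with `option` set on the entry for `mount_point`.

    Hypothetical helper for illustration: it rewrites text only and never
    touches /etc/fstab itself.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave blank lines, comments, and entries without an options
        # field exactly as they are.
        if not fields or fields[0].startswith("#") or len(fields) < 4:
            out.append(line)
            continue
        if fields[1] == mount_point:
            # Drop any existing barrier setting so the new one wins.
            opts = [
                o for o in fields[3].split(",")
                if o not in ("barrier", "nobarrier")
                and not o.startswith("barrier=")
            ]
            opts.append(option)
            fields[3] = ",".join(opts)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample entry, NOT the real one from my system.
sample = "/dev/xvda6 / ext4 errors=remount-ro 0 1\n"
print(add_barrier_option(sample), end="")
```

Write the result to a scratch file and compare it with the original before replacing anything, since a broken fstab can leave the guest unbootable. The new option applies the next time the filesystem is mounted, for example after a reboot.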
I figure many other people will be diving into Precise this weekend and potentially running into this issue like me.
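To check quickly whether a guest is hitting the same problem, something like the following rough sketch could scan dmesg output for the messages shown earlier. The patterns are copied from the log lines in this post, so other kernel versions may word them differently; the names SIGNATURES and find_barrier_symptoms are just ones I picked for the example.

```python
import re

# Patterns based on the dmesg excerpt in this post. Each one captures the
# device name that the message refers to.
SIGNATURES = {
    "barrier_failed": re.compile(r"blkfront: barrier: empty write (\w+) op failed"),
    "io_error": re.compile(r"end_request: I/O error, dev (\w+), sector \d+"),
    "journal_aborted": re.compile(r"Aborting journal on device ([\w-]+)"),
    "remounted_ro": re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only"),
}


def find_barrier_symptoms(dmesg_text):
    """Map each symptom name to the set of devices it was seen on."""
    hits = {}
    for line in dmesg_text.splitlines():
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits.setdefault(name, set()).add(match.group(1))
    return hits
```

Feed it the text from dmesg (for example, saved to a file from the console). An empty dict means none of these messages were found, which only tells you the problem has not shown up in that log yet.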
Soon, I will post some additional details about how to easily get a nice Precise template in XenServer 6. I’ve been getting my setup nicely tuned using a kickstart script for the base system and leveraging xenstore data to dynamically set up the IP and hostname on boot.