gfs

Jul 01 2009

DRBD, OCFS2, Unexplained crashes

I was trying to set up a dual-primary DRBD environment with a shared disk running either OCFS2 or GFS. The environment is CentOS 5.3 with DRBD82 (but I also tried DRBD83 from testing).
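The key bit for dual-primary is the allow-two-primaries option; here is a minimal sketch of what the relevant part of the drbd.conf looks like (resource name, hostnames, disks and addresses are made-up examples, not my actual config):

  resource r0 {
    protocol C;                          # synchronous replication
    net {
      allow-two-primaries;               # both nodes may be Primary at once
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
    }
    on node-a {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   172.16.32.11:7788;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   172.16.32.12:7788;
      meta-disk internal;
    }
  }

After the initial sync, running drbdadm primary r0 on both nodes puts the resource in Primary/Primary.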

Setting up a single-primary disk and running bonnie++ on it worked. Setting up a dual-primary disk, only mounting it on one node (ext3) and running bonnie++, also worked.
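(The test I keep referring to is nothing fancy, just a plain bonnie++ run along these lines, with the mountpoint being an example:)

  bonnie++ -u root -d /mnt/drbdtest    # -u is needed when running as root, -d points at the filesystem under test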

When setting up OCFS2 on the /dev/drbd0 disk and mounting it on both nodes, basic functionality seemed in place, but usually less than 5-10 minutes after I started bonnie++ as a test on one of the nodes, both nodes power cycled with no errors in the logfiles, just a crash.
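For completeness, the OCFS2 side of it was nothing more exotic than this (label, slot count and mountpoint are examples, and an /etc/ocfs2/cluster.conf listing both nodes is assumed to be in place):

  mkfs.ocfs2 -N 2 -L drbdtest /dev/drbd0      # on one node only
  service o2cb start                          # on both nodes
  mount -t ocfs2 /dev/drbd0 /mnt/drbdtest     # on both nodes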

When I was at the console at the time of the crash, it looked like a disk I/O block (you can type, but nothing happens), then a reboot: no panics, no oops, nothing (sysctl panic values were set to timeouts etc.).
Setting up a dual-primary disk with OCFS2, but only mounting it on one node and starting bonnie++, causes only that node to crash.

On the DRBD level I got the following error when that node disappeared:

  1. drbd0: PingAck did not arrive in time.
  2. drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure )
  3. pdsk(UpToDate -> DUnknown )
  4. drbd0: asender terminated
  5. drbd0: Terminating asender thread

That, however, is an expected error because of the reboot.

At first I assumed OCFS2 to be the root of this problem, so I moved forward and set up an iSCSI target on a 3rd node, and used that device with the same OCFS2 setup. There, no crashes occurred and bonnie++ completed its test run flawlessly.
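Roughly what that looked like, with scsi-target-utils on the third box and open-iscsi on the OCFS2 nodes (the IQN, backing device and IP are made-up examples):

  # on the 3rd node, with tgtd running
  tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2009-07.be.example:ocfs2test
  tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdc1
  tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

  # on both OCFS2 nodes
  iscsiadm -m discovery -t sendtargets -p 172.16.32.3
  iscsiadm -m node -T iqn.2009-07.be.example:ocfs2test -p 172.16.32.3 --login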

So my attention went back to the combination of DRBD and OCFS2.
I tried both DRBD 8.2 (drbd82-8.2.6-1.el5.centos, kmod-drbd82-8.2.6-2) and the 8.3 variant from CentOS Testing.

At first I was trying with the ocfs2 1.4.1-1.el5.i386.rpm version, but upgrading to 1.4.2-1.el5.i386.rpm didn't change the behaviour.

Both the DRBD and the OCFS2 mailing lists were fairly supportive, pointing out that it was probably OCFS2 fencing both hosts after missing the heartbeat, and suggesting that I increase the heartbeat timeout values.

I however wanted to confirm that. As I got no entries in syslog, I attached a Cyclades, err, Avocent terminal server to the device in the hope that I'd capture the last kernel messages there ... no such luck either.

On the OCFS2 mailing list people pointed out that I could use netconsole to catch the logs on a remote node.
I set up netconsole using:

  1. modprobe netconsole netconsole="@/,@172.16.32.1/"
  2. sysctl -w kernel.printk="7 4 1 7"
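For the record, the netconsole parameter format (straight from the kernel's netconsole documentation) is:

  netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]

so the empty fields above fall back to their defaults: the local address and interface are autodetected and the logs get sent to UDP port 6666 on 172.16.32.1. The kernel.printk setting raises the console loglevel to 7 so even debug messages go out.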

After which I indeed caught the error on my remote host:

  1. [base-root@CCMT-A ~]# nc -l -u -p 6666
  2. (8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device
  3. drbd0 after 478000 milliseconds
  4. (8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active
  5. regions.
  6. ocfs2 is very sorry to be fencing this system by restarting

One would think that it outputs over the serial console before it logs over the network :) It doesn't.

Next step is that I'll start fiddling some more with the timeout values :) (note the ":)")
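If I read the o2cb documentation correctly those values live in /etc/sysconfig/o2cb and are set through service o2cb configure; the numbers below are the upstream defaults, not what these boxes are running:

  O2CB_ENABLED=true
  O2CB_BOOTCLUSTER=ocfs2
  # a node fences itself roughly (threshold - 1) * 2 seconds after its last successful heartbeat write
  O2CB_HEARTBEAT_THRESHOLD=31
  O2CB_IDLE_TIMEOUT_MS=30000
  O2CB_KEEPALIVE_DELAY_MS=2000
  O2CB_RECONNECT_DELAY_MS=2000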

Sep 14 2007

On the Future of Lustre


So Sun bought ClusterFS. I'm wondering what their focus will be now. What will be the prime platform on which Lustre will be developed, Solaris or Linux? Will other efforts in the open source cluster filesystem area react to this? Will Lustre development speed up? Will management become less complex?
Time will tell ... I'm keeping an eye on it.

Sep 04 2007

LinuxConference Europe 2007 2/X

Sunday evening was the conference dinner; someone thought it would be really funny to have us all walk about 3 km more than we needed to. The instructions on the back of our entrance tickets gave us a full tour of the Cambridge suburbs. I should have followed my gut, not the people trying to read the instructions; that would have saved us half an hour at least.
Luckily we took the short way back. Dinner was typically English .. nuff said :)

So Monday started out with a whole bunch of sessions related to filesystems and storage.
Bryn M. Reeves gave a really good intro to LVM, then Jan Blunck took over and started talking about how to scale the Device Mapper snapshot solution. I tried to see Dag's talk on dstat, but I'll have to try again at T-Dose as I missed the largest part of the talk due to some phone calls :(
Next up was Olaf "thank god I'm not doing nfs anymore" Kirch (who also listens when you just shout Lars in the streets of Cambridge), who introduced us to iSNS.

So after lunch the filesystem track continued with Steven Whitehouse talking about VFS and cluster filesystems, Jorn talking to us about the future of flash disks and their appropriate filesystems, and Chris Mason from Oracle finishing off with a talk on Btrfs .. pronounced "ButterFS".

There were 2 different buses to the Duxford air museum, which was a bit of a pity since the two groups didn't really meet each other, so it wasn't really a social event where you could chat and mingle with everybody at the conference.