linux-ha

Feb 10 2011

Ensure Running

Has anyone noticed that pretty much every puppet module one finds on the internet by default enables the service they try to configure in the module

When looking at it from a single machine point of view it makes sense to include the module , have it configure your service and directly enable it by default.

So I started wondering .. isn't there anybody out there who is building clusters ? Where services have to configured on multiple nodes but should NOT be running acitvely on all nodes by default because there is an external tool which manages that for you (Pacemaker framework eg.)

Agreed it's a small patch to get the functionality you want , but it brings an extra overhead when one upgrades the modules etc.

So if it doesn't bother you please split your puppet module in 2 parts.. one you call to configure the service, another which you call to enable the service , if you want to.

thnx !

Jan 24 2011

Upcoming 2011 Speaking Engagements

Lenz gave the good example so I`ll follow :)

Next weekend saturday I`ll be giving a talk about devops at StartUp Weekend Brussels, from what I've read so far it promises to be an audience that needs the talk,

The week after I`ll be speaking at the DrupalDevDays, again about devops , however this time with a touch of Drupal , giving a devops talk at Devoxx last year to a Java audience learned me that the devops evangelist need to go outside of their usual conference audiences and als talk to the people that are usually in the other silos.

Next march I`ll be speaking at the UKUUG spring conference in Leeds this time about my experiences on High Availability with Pacemaker

And who knows I might squeeze in a talk at Load this year also ..

If you are around at one of these confs and you want to talk Devops, Clustering, sipx or just have a beer .. don't hesitate ! There's already plenty of people promising me beers , and some even sushi :)

Nov 04 2010

High Availability MySQL Cookbook , the review

When I read on the internetz that Alex Davies was about the publish a Packt book on MySQL HA I pinged my contacts at Packt and suggested that I'd review the book .

I've ran into Alex at some UKUUG conferences before and he's got a solid background on MySQL Cluster and other HA alternatives so I was looking forward to reading the book.

Alex starts of with a couple of indepth chapters on MySQL Cluster, he does mention that it's not a fit for all problems, but I'd hoped he did it a bit more prominently ... an upfront chapter outlining the different approaches and when which approach is a match could have been better. The avid reader now might be 80 pages into MySQL cluster before he realizes it's not going to be a match for his problem.

I really loved the part where Alex correcly mentions that you should probably be using Puppet or so to manage the config files of your environment, rather than scp them around your different boxes ..

Alex then goes on to describe setting up MySQL replication and Multi Master replication with the different approaches one can take here, he gives some nice tips on using LVM to reduce the downtime of your MySQL when having to transfer the dataset of an already existing MySQL setup, good stuff.

He then goes on to describe MySQL with shared storage ... if you only mount your redundant sandisk once on your MySQL nodes my preference would probably be a Pacemaker stack rather than a RedHat Cluster based setup, but his setup seems to work too. Alex quickly touches on using GFS to have your data disk mounted simultaneously on both nodes (keep in mind with only 1 active MySQLd) and then goes on to describe a full DRBD based MySQL HA setup

The last chapter titled Performance tuning gives some very nice tips on both tuning your regular storage, as your
GFS setup but also the tuning parameters for MySQL Cluster

I was also really happy to see the Appendixes on the basic installation where he advocates the use of Cobbler , Kickstart and LVM ..

One of the better books I read the past couple of years .. certainly the best book from Packt so far , I hope there is more quality stuff coming from that direction !

Apr 07 2010

UKUUG Spring Conference 2010

Last week I was in Manchester for the 2010 UKUUG Spring Conference, right .. make that 2 weeks ago , :)

The UKUUG usually hosts the more interesting conferences around ... , it's not just the schedule that attrackts me , yes there's the strong focus towards Larger Scale Unix (and mostly Linux) deployments and how to manage them, but there's also the opportunity to chat in real life with the Devops from across the chunnel.

Spending time with R.I.Pienaar, Julian Simpson, Simon Wilkinson , Alex Davies , Simon Riggs , Josette, and many others is always fun .

As I was in town early I went to the preconference beer meetup and met with a lot of people and chatted about config management, virtualization and lots of other stuff ... after the pub the plan was to go for curries nearby .. and while walking to the , ahem Bus stop, I managed to recognise Ben Martin from meeting him back ages ago in Hamburg for LinuxKongress , always fun ..

Apart from having to jump on a bus and our group being split at the curry place , rather than being able to tell the latecomers where to walk to and being seeted upstairs with the whole group , the curries were interesting and fun.

As I had been pushing Simon Wardley on Twitter to submit a talk for the conference it was really great to finally see him present .. His talk was the perfect soft introduction to the conference ...

Simon's talk was followed by a talk on Security for the virtual datacenters, after I questionned the speaker if anyone actualy uses TPM outside an academic lab the talk suddenly changed into a commercial presentation for a Quack, nuff said.

The Ever energetic Matt S Trout talked about 21st century perl before Simon "Life is to short for SELinux" Wilkinson talked about his experiences in getting the openAFS crowd on Git.

Bummer Thierry Carrez didn't show us the real juice of UEC and just the installations of a Cloud Controller and a Node Controller , but he managed to do so in approx 30 minutes as promised .

A talk titled Coherent and Integrated Configuration of Virtual Infrastructures always cathces my eye.. however when that talk turns out to be a Coherent and Integrated configuration only within the Univerity of Edinborough (aka lcfg2) talk I`m dissapointed, specially since it pretty much didn't introduce any new concepts from the ones I introduced back in my Durham UKUUG presentation

Luckily Andrew Stribblehill gave a very interesting talk on MySQL scalability, in which I promised him some answers to his questions for the next day :)

The Conference dinner was without a doubt the best UKUUG dinner so far , no typical english "food", no weird location (Old Trafford, an abandoned warship) , but just a big chinese place and plenty of food !

I started thurday morning in the wrong track, I assumed to be in the Virtualization track, but I ended up in the Sun thinclient and Abusing Linux to serve weird desktops under the Green computing umbrella track, not my favourites ..

When Patrick and Julian started their Hudson hit my Puppet with a Cucumber talk (which featured some aweseom #devops content) I was a afraid that we'd had to look for a replacment PostgreSQL talk as Simon hadn't arrived yet .. Luckily he arrived in time for his presentation and he explained us about the new replication features that are slowly making it into PostgreSQL, one way ... log shipping ... not really up to par with other alternatives yet :(

So with no further ado .. here's the presentation I gave

PS. If at a Ukuug event and not sure about a person's name ... try Simon.. pretty good chance you're correct :)

Nov 18 2009

Got Interviewed

by @botchagalupe
on Virtualization, Open Source tools and DNS Problems

Oct 19 2009

Nines , Damn Nines and More Nines

Funny how different experiences lead to different evaluations of tools. The MySQL HA solutions the MySQL Performanceblog list, are almost listed in the complete opposited order of what my impressions are.

Ok agreed, I should probably not put my MySQL NDB experiences from 2-3 years ago with multiple Query of deaths and more problems than you into account anymore , but back then went in the list Less stable than a single node. I've had NDB POC setups going down for much more than 05:16 minutes
Ndb comes with a lot of restrictions, there are

As for MySQL on DRBD, I've said this before , I love DRBD, but having to wait for a long InnoDB recovery after a failover just kills your uptime ,
I remember being called by a customer during Fred last holiday who was waiting over 20 minutes for recovery , twice, so putting the DRBD/San setup second would not be my preference. But agreed .. it's only listed at 99.9% meaning almost 9 hours of downtime per year are allowed.

On the other hand we've seen database uptime of MySQL MultiMaster setups with Heartbeat reaching better figures than 99.99% Heck I've seen single nodes achieve better than 99.99% :)

So what does this teach us ... there is no golden rule for HA, lots of situations are different, it's the preferences of the customer, the size of the database, the kind of application , and much
more .. you always need to think and evaluate the environment ...

Feb 02 2009

Everything is a fine whitespace problem ...

Couple of days ago I was working on a Linux Heartbeat v2 setup.
Upon inserting an XML snippet into the cib cib-adm started eating memory fast until the oom killer kicked in.

The environment was running a fairly old heartbeat-2.0.8 version so I upgraded to heartbeat-2.1.4-2.1 and there I got a nice warning that my XML sintax wasn't correct.

There was a whitespace in the XML syntax.

  1. <expression attribute="#replicationvalue" id="is_lagged" operation ="gt" ... ><

Removing the whitespace solves the problem, also on the older version. So the problem is already fixed upstream.. but you might run into it anyhow.

Oct 25 2008

Wholesale High Availability

Alan just coined WholeSale HA. The idea of rebooting a whole virtual machine rather than just failing over 1 service.

He wants to have the best of both worlds in 1 framework, he however doesn't specify what parts he likes from the WholeSale HA setup

Yes you want to use it coupled with hardware predictive failure analysis tools. In order to achieve Higher Availabilty, but I don't think the WholeSale HA part is real HA.

WholeSale HA isn't going to be fast enough for most of the business critical environments.
You simply cannot afford to reboot or even boot a full machine and the related downtime that brings for your service.

So yes a best effort combination, but with a strong focus on the application state would be preferred. WholeSale is a good start .. but it's definitely not where you want to stop.

Sep 24 2008

Bug in ifconfig ?

So earlier this week I ran into the weirdest problem with Linux-HA. Heartbeat was happily adding an IP address as an active resource so one of my nodes when needed, but upon removal it failed to remove the IP from the stack. Further debugging learned that the Heartbeat scripts claimed the ip wasn't on the actual stack.

It was.. but it the output from ifconfig was different from what it expected it to be.

Heartbeat checks the output of ifconfig and expects to find the IP address it added itselve to be on a :0 or similar interface. Now ifconfig only seems output 8 characters for the interface name Which means that when you have an interface called eth0:0 the output perfectly lists it and heartbea
t is smart enough to remove the ip again when the node goes to standby. If however you have a vlan with 3 digits on a bond interface Heartbeat will add :0 to bond0.129 , the Heartbeat resource will add the ip address perfectly but opon checking all the :0 interfaces the bond0.129:0 interface won't be parsed as ifconfig outputs it as bond0.129 , hence resulting in a potential painfull situation where 2 nodes still share an IP address.

So where's the actual problem ifconfig, or heartbeat, I'd say both, but the easiest fix will be in Heartbeat, afterall there are other preferred ways of adding an ip addres to an interface. ip addr add comes to mind :)

So I filed a bug report :)

Dec 17 2007

Integrating HA and Virtualization

Alan Robertson is discussing how Managed Virtualization (including HA) conflicts with System Management

He has some interesting points regarding managing infrastructure , in his vision there are just too much layers that don't talk to eachother .
He also points out some of the issues with CIM and SNMP .

Alan thinks the ideal way to go is to have your HA solution manage your Virtualization also.

I`m wondering if this doesn't add too much complexity.

If you are already making sure the services in your virtual machines are highly available, then why would you want to add another layer of complexity ? Surely the idea of being able to migrate virtual machines around sounds tempting but do we really need that extra layer of complexity ?

I've explained that migrating a virtual machine to another server won't help you when your apps crash or when your physical server fails.

But keeping an overview of which services are running where from 1 place seems like an interesting idea.

I've been tinkering about using the resource concept of Linux-HA however to serve another purpose than pure high availability. You might want to use its constraints to define how many virtual machines should run on and how much resources they can use on a certain physical machine. Hence create a loadbalancing infrastructure with it.

(I`m really really hoping someone now replies to this with a url which gives me a HAResource that does Live Migration :))