Mar 29 2011

Vagrant & Rubylibs

I was testing some MySQL puppet modules on my Vagrant box earlier this week and one of them required augeas.
I kept running into "Could not find a default provider for augeas", even though all the appropriate augeas, augeas-lib and ruby-augeas packages were installed. I inspected the different ruby directories and the files were sitting in /usr/lib/ruby/site_ruby/1.8, right where I expected them.

With all the files seemingly in the right place, my next option was to strace a small ruby script that required augeas. Guess what that showed ..

  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/i686-linux/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/i686-linux/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/i686-linux/augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("./augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("./augeas.so", 0xbfd2af1c) = -1 ENOENT (No such file or directory)

Indeed ... Vagrant puts its default ruby in /opt/ruby .. and obviously there were no ruby-augeas files in there.
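
For what it's worth, you can see the same thing without strace by asking the interpreter itself where it looks. A minimal sketch, written for a current Ruby: find_library is just a name I made up for illustration, and it only checks the .rb and .so extensions that show up in the strace output above.

```ruby
# Report which Ruby is running and whether a library is findable on its load path.
require 'rbconfig'

# Return every file on the given load path that `require name` could resolve to.
# Only .rb and .so are checked (an assumption, based on the strace output).
def find_library(name, load_path = $LOAD_PATH)
  candidates = load_path.map do |dir|
    ['.rb', '.so'].map { |ext| File.join(dir, name + ext) }
  end
  candidates.flatten.select { |path| File.file?(path) }
end

if __FILE__ == $PROGRAM_NAME
  config = RbConfig::CONFIG
  puts "Interpreter: #{File.join(config['bindir'], config['ruby_install_name'])}"
  hits = find_library('augeas')
  if hits.empty?
    puts 'augeas not found in any of:'
    puts $LOAD_PATH
  else
    puts hits
  end
end
```

Run it with the same ruby binary that puppet uses. If it lists the /opt/ruby directories and finds nothing, the packages landed in a different Ruby's site_ruby.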

Mar 10 2011

Watching the Guards

A couple of weeks ago I noticed a weird drop in web usage stats on the site you are browsing now. Kinda weird as the drop was right around Fosdem when usually there is a spike in traffic.

So before you start .. no, I don't practice what I preach on my own blog. It's a blog, dammit, so I do the occasional upgrade on the actual platform with backups available, do some sanity tests and move on. Yes, I break the theme pretty often, but y'all are reading this through RSS anyhow.

My backups showed me that drush had made a copy of the Piwik module somewhere in early February, exactly when this drop started showing. I verified the module, I verified my Piwik - oh, Piwik, you say .. yes, Piwik; if you want a free alternative to Google Analytics, Piwik rocks .. - and I even checked other sites using the same Piwik setup: they were all still functional, happily humming and being analyzed .... everything fine ... but traffic stayed low ..

This taught me I actually had to upgrade my Piwik too ...

So that brings me to the point I actually want to make ...
As @patrickdebois asks in his chapter on Monitoring, "Quis custodiet ipsos custodes?": who's monitoring the monitoring tools, and who's monitoring the analytics tools?

So not only should you monitor the availability of your monitoring tools, you should also monitor whether their API has changed in some way or another.
Just like when you are monitoring a web app, you shouldn't just see if you can connect to the appropriate http port; you should also check that you get sensible results back from it, not gibberish.
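
To make that a bit more concrete, here is a minimal sketch of such a check in Ruby. The names sensible_response? and check_url are hypothetical, and so are the URL and marker text in the usage note; pick something your tool only returns when it is really working.

```ruby
require 'net/http'
require 'uri'

# True only if the status is 200 AND the body contains the text we expect.
# An open port that answers with an error page or gibberish fails this check.
def sensible_response?(status_code, body, expected_text)
  status_code.to_i == 200 && !body.nil? && body.include?(expected_text)
end

# Fetch the URL and apply the check above.
# Network-level errors (refused connection, timeout, ...) count as a failed check.
def check_url(url, expected_text)
  response = Net::HTTP.get_response(URI(url))
  sensible_response?(response.code, response.body, expected_text)
rescue StandardError
  false
end
```

Something like check_url('http://example.org/piwik/', 'Piwik') would then go into whatever does your monitoring; both arguments here are placeholders, not my actual setup.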

But then again ... there's no revenue in my blog or its statistics :)

Mar 04 2011

24 hours of Puppet Drama

Over the past couple of days I've been fighting with a weird puppet problem. We eventually cracked it, and I promised a bunch of you to fully explain it here ;)

So we were deploying 2 Blade chassis at a pretty remote location with a mix of physical and virtual machines, some 48 instances in total. This is a pretty standard rollout; we've got a bunch of similar platforms in our lab, so we knew about a couple of glitches, what to expect etc.

I was just keeping an eye on the deployment, looking at the logs to see if things were running fine, when suddenly a couple of puppet runs didn't come through. We had seen such behaviour before; usually it's a matter of running them again a couple of times and they will come through. (Upgrading ruby and putting passenger in front of puppet actually solved those issues.
We even had a loop built into the platform that runs puppet a couple of times till it returns with the correct exit code, just to make sure.)
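
Such a loop could look roughly like the sketch below. It is not the loop from our platform, just an illustration in Ruby: run_until_ok and its parameters are made-up names, and which exit codes count as "correct" depends on how you invoke puppet, so they are passed in. Note the cap on the number of attempts: without one, a retry loop keeps hammering the puppetmaster forever when the run can never succeed.

```ruby
# Run a command up to max_attempts times and stop at the first acceptable exit status.
# Returns the number of the attempt that succeeded, or nil if none did.
def run_until_ok(command, max_attempts: 3, ok_statuses: [0], pause: 0)
  max_attempts.times do |attempt|
    system(*command)
    # exitstatus is nil when the process was killed by a signal; that never matches.
    return attempt + 1 if ok_statuses.include?($?.exitstatus)

    # Wait between attempts, but not after the last one.
    sleep(pause) if attempt + 1 < max_attempts
  end
  nil
end
```

A call could then be something like run_until_ok(['puppet', 'agent', '--test'], max_attempts: 5, pause: 30); the command line and the numbers are assumptions, adapt them to your own setup.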

We were first scratching the A chain of our setup, so that in the event of failure we could still bring up the B chain of the platform and be up and running again. Machines were actually coming up .. slowly .. some of them took a bit longer. One machine's clock was seriously off and SSL was barfing on it, so we set the BIOS clock and restarted. It was the machine with 6 VMs, so it took a while, but everything was back on schedule .. then suddenly things went downhill fast: more and more puppet runs started failing, and at some point none of our puppet runs were working anymore.
I'd see the puppetmaster perfectly compile its catalog

  notice: Compiled catalog for ctl-0-a

Then the client .. not wanting to get it ..

  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Not using expired catalog for ctl-0-a from cache; expired at Tue Mar 01 09:50:06 +0000 2011
  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Using cached catalog
  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Could not retrieve catalog; skipping run

We had gone from about 60% of our freshly deployed boxen working fine to not a single one.
So what do you do .. indeed .. you turn on debugging.
You put both your puppetmaster and client in debug. Nothing, no errors, no nothing ..

I asked some colleagues, asked on irc .. plenty of ideas, but none of them actually cracked the problem. So I did what I knew had solved similar problems before:

I switched our serialization format from yaml back to pson , and back, no luck.
I upgraded ruby to a version from the glei.ch repository. No luck.
I upgraded our Puppet version 2.6 to a version from the TMZ Epel repo. We cleaned out the ssl certificates on all sides multiple times. We cleaned out /var/lib/puppet. We uninstalled puppet and reinstalled it.
It wasn't a DNS Problem

I had started stripping my manifests down to empty runs; those worked, so I started uncommenting the actual manifests again ... Then in the middle of the debugging our VPN connection to the remote location broke down, and we'd only be getting it back in the morning .. about 12 hours later. Not fun. Murphy, obviously ..

So the next morning we dived right back in ... making those manifests bigger again, removing all the stages: 1 or 2 successful runs, then with the same config .. back to failure. On and off .. successful and unsuccessful ... so it wasn't in the manifests ..

So we decided to roll the puppetmaster back to its previous version. That one was known to be stable, and there obviously was something really fishy going on, so that was the safest bet.

Wrong. The machine came up, but it took longer than expected, and when trying to connect new clients to it .. nothing worked anymore .. same problem as before .. the puppetmaster compiled the catalog, the clients didn't get anything. We started to suspect faulty hardware .. but how could that be .. the puppet client looked like the only malfunctioning thing around.

Then Dim0 suggested I look at the one logfile I hadn't looked at yet, /var/log/puppet/masterhttp.log, and then we saw it: it was being flooded with ssl errors, ssl errors from clients that shouldn't even be connecting to the puppetmaster at all.

  [2011-03-02 13:32:00] ERROR OpenSSL::SSL::SSLError: tlsv1 alert decrypt error
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:44:in `accept'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:44:in `listen'
  /usr/lib/ruby/1.8/webrick/server.rb:173:in `call'
  /usr/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
  /usr/lib/ruby/1.8/webrick/server.rb:162:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
  /usr/lib/ruby/1.8/webrick/server.rb:95:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:92:in `each'
  /usr/lib/ruby/1.8/webrick/server.rb:92:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:42:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `initialize'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `new'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `listen'
  /usr/lib/ruby/1.8/thread.rb:135:in `synchronize'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:38:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/server.rb:127:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/server.rb:142:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/daemon.rb:124:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/application/master.rb:114:in `main'
  /usr/lib/ruby/site_ruby/1.8/puppet/application/master.rb:46:in `run_command'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:393:in `exit_on_fail'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
  /usr/sbin/puppetmasterd:4
  [2011-03-02 13:32:00] ERROR OpenSSL::SSL::SSLError: tlsv1 alert decrypt error

What happened was that 'we' had decided to bring one of the backup machines back online. After all, once the slow-starting server came through, it would be the passive node in the cluster, no worries there, right? Wrong.
This physical machine had 6 virtual machines with old ssl certificates, and they got stuck in a loop that was put there to make sure their puppet run came through correctly at boot time.

Those 7 rogue clients, which generated little to no relevant traffic on the network, were saturating the default webrick. Killing them solved the problem, and we were back to regular deployment in no time.
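
In hindsight, a quick count of those errors per minute would have shown the flood right away. A minimal sketch, assuming the log lines look like the excerpt above; ssl_errors_per_minute is a name of my own, and since the excerpt shows no client address, this only tells you that something is hammering the master, not who.

```ruby
# Matches the first line of an SSL error entry and captures the timestamp up to the minute.
SSL_ERROR = /\A\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\] ERROR OpenSSL::SSL::SSLError/

# Count SSL error entries per minute; stack trace lines and other entries are ignored.
def ssl_errors_per_minute(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = SSL_ERROR.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end
```

Feeding it the log would be something like ssl_errors_per_minute(File.foreach('/var/log/puppet/masterhttp.log')), which returns a hash from minute to number of errors.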

The sad part is that our upcoming release already has passenger , a fresher version of ruby etc .. and that most of the above mentioned errors won't occur anymore there.
But in short .. don't use the default webrick .. it will kill you :)

And no , not everything is a freaking dns problem, ssl is a big pain in the B too .. :)

Feb 23 2011

dhcpd on Shared Networks

And as I always forget (getting old, remember) how to configure dhcpd to serve multiple networks on one interface (e.g. with aliases, as in test setups), I'll write it down here, so next time google can point right back to me.

  shared-network thisismessy {
    subnet 10.8.0.0 netmask 255.255.0.0 {
      option routers 10.8.0.1;
    }
    subnet 10.12.0.0 netmask 255.255.0.0 {
      option routers 10.12.0.1;
    }
    subnet 10.16.0.0 netmask 255.255.0.0 {
      option routers 10.16.0.1;
    }
  }

Feb 23 2011

Converting KVM to VirtualBox

I have had most of my test environment, aka puppetmasters, test mysql setups etc., running in KVM for the past couple of years .. (yes, I'm still using a lot of Xen in production environments, but we've also been using KVM for a while already .. it's a good mix). VirtualBox has always been the less loved virtualization platform. However, while playing more and more with Vagrant Up I realized I needed to convert some boxen (e.g. my PuppetMaster) to VirtualBox, and google was really no great help (most people seem to go the other way, or want to use some proprietary tools).

So I remembered VBoxManage, and apparently I had blogged about it myself already ..
I just hate it when I search for stuff and google points right back to me.

So I converted my puppetmaster's disks

  VBoxManage convertdd Emtpy-clone.img PuppetMasterroot.vdi
  VBoxManage convertdd puppet-var.img PuppetMastervar.vdi

Now when booting the VM in VirtualBox, obviously the kernel panicked .. as my KVM disks are recognised as /dev/hda, VirtualBox defaults to /dev/sda, and LVM doesn't really like disks showing up under another name.
No commandline fu here to help me; I just used the VirtualBox gui to move the disks to the IDE controller rather than the SATA controller.

Now all I need to do is wait for some smart guy who comments that you probably could use VBoxManage storagectl to achieve the same goal :)

And wait till Vagrant Up starts supporting KVM, so I can move back :)

Feb 10 2011

Ensure Running

Has anyone noticed that pretty much every puppet module one finds on the internet enables, by default, the service it tries to configure?

When looking at it from a single machine point of view it makes sense to include the module , have it configure your service and directly enable it by default.

So I started wondering .. isn't there anybody out there who is building clusters? Where services have to be configured on multiple nodes but should NOT be actively running on all nodes by default, because there is an external tool which manages that for you (e.g. the Pacemaker framework).

Agreed it's a small patch to get the functionality you want , but it brings an extra overhead when one upgrades the modules etc.

So if it doesn't bother you, please split your puppet module in 2 parts: one you call to configure the service, and another you call to enable the service, if you want to.

thnx !

Feb 06 2011

At Fosdem

  • on Friday evening , apparently having a confirmed reservation in a resto is not enough to actually be welcome at that restaurant.
  • at DrupalDevdays, only 2 laptops were open during our presentation
  • at DrupalDevdays, almost nobody in the room was already using CI
  • at Fosdem , the parking lot is full before 11:30 on a Saturday
  • at Fosdem , far fewer Macs than in past years .
  • at Fosdem , way too many rooms are already at full capacity so you need to have 2-3 backup alternatives ..
  • at Fosdem , people expect me to be in certain rooms, at the same time
  • at Fosdem , even with too many rooms already full one still misses a bunch of interesting talks
  • at Fosdem , one doesn't even realize friends are speaking there too ..
  • at Fosdem , Android is the standard ...
  • at Fosdem , you are confronted with the fact you probably forgot more names of people than you remember ;(
  • at Fosdem , you are surrounded by famous open source people, that aren't even on the schedule
  • at the MySQL Meetup Dinner, Monty brings Salmiakki
  • at Fosdem , you wonder how many other people have survived their 11th edition
  • at Fosdem , you can't get into any devroom on Sunday morning
  • at Fosdem , begging on Twitter to get into a devroom from the other side of the door works (at least for me :))
  • at Fosdem , netbooks are much less popular than 2-3 years ago ..
  • after fosdem ... you crash ..
Feb 05 2011

Reservation Confirmed

Apparently I don't understand French :(

Copied from my inbox ...

  From: Info Falstaff <info@lefalstaff.be>
  To: 'Kris Buytaert' <Kris.Buytaert@youalreadyknowmyaddy.be>
  Subject: RE: Reservation Next Friday 4/2/2011
  Date: Fri, 28 Jan 2011 16:55:45 +0100

  je confirme votre reservation pour le vendredi 04 02 2011

  vous trouverez ci-joint nos differentes formules

  bien à vous

  guy Lommen chef de cuisine

Feb 01 2011

MySQL & Friends Meetup at Fosdem 2011

I admit .. I'm lazy ... unlike last year I did not organize the MySQL and Friends meetup at Fosdem.

I outsourced it to Kenny

More info and registration here

Jan 24 2011

Upcoming 2011 Speaking Engagements

Lenz set the good example, so I'll follow :)

Next weekend, on Saturday, I'll be giving a talk about devops at StartUp Weekend Brussels. From what I've read so far, it promises to be an audience that needs the talk.

The week after, I'll be speaking at the DrupalDevDays, again about devops, however this time with a touch of Drupal. Giving a devops talk at Devoxx last year to a Java audience taught me that the devops evangelists need to go outside of their usual conference audiences and also talk to the people that are usually in the other silos.

Next March I'll be speaking at the UKUUG spring conference in Leeds, this time about my experiences with High Availability using Pacemaker.

And who knows, I might squeeze in a talk at Load this year also ..

If you are around at one of these confs and you want to talk Devops, Clustering, sipx or just have a beer .. don't hesitate! There's already plenty of people promising me beers, and some even sushi :)