Everything is a Freaking DNS problem

Nov 13 22:13

The return of the Dull Stack Engineer.

By: Kris Buytaert

Every day we get emails from recruiters who are looking for a devops engineer to help their customer.

When we ask them if they are looking for a developer or an operations person they have no clue what to answer. Because a devops engineer doesn't exist.

DevOps Engineer is not a title

The DevOps community has a long time ago established that DevOps is not a job title, it's not a Java developer running production, and it's not an Ops girl patching java code. Neither is it the person in charge of tooling. And it certainly isn't an engineer working in a DevOps team.

But for a lot of recruiters and some IT professionals that are still working in a a traditional top/down environment and who want to become more agile, a #devops engineer simply sounds better than a senior Linux engineer, which is what they typically are looking for.

The Myth of the Full Stack Engineer
And when that mythical all in one DevOps engineer that could both write code and design, run and manage platforms couldn't be found, they narrowed their search margin.

They started looking for a Full Stack Engineer, who pretty much needed to be capable of doing the same. Or not... Because most so called full stack engineers are actually MEAN stack engineers. Engineers that typically barely understand the stack he is really working with aka the Mean Stack... Which is the stack composed of MongoDB, Express.js AngularJS and Node.js.

What a full stack engineer really means is an engineer who has experience in all components of the stack that its application touches. From the Linux kernel, to the networking, the middleware and the database components, all the way up to the web server and the frontend application.

But how many people do you know that both have their names in the Linux Kernel, can performance tune a SQL stack, debug a Java Stack-trace and even build customer facing web applications ?

Indeed, we don't know that many people with those skills or experiences And the people that mark all the checkboxes have their experience stretched over multiple decades... They didn't do all of those things together.

So both the concept of the Full Stack Engineer and the #devops engineer have come with a common problem : those people really don't exist...

The Knowledge is in the team

Building cross-functional application teams, it is about getting people with all of those skills to collaborate with each other and understanding each other. It's not about that one human being doing all the things.

Hype Driven Development

The second pattern we've seen is the concept of hype driven, conference driven, or CV driven development. Three variants of the same problem.

A developer forgets about the operational requirements for the applications being built, forgets about the non-functional requirements and jumps on the latest and newest fancy technology he/she wants to learn. Either because he/she just heard about it at a conference, because the tool he/she wants to use is really really being hyped. Or just because they want to build experience with that tool so they can add it to their CV to get that next job they want to have.

Often tools need a couple of releases to mature before they are ready to use in production, and before they are stable enough to trust with your data. It's not uncommon that frequently used tools lack crucial functionality so it can be used in a production environment. Things like support for good metrics and monitoring, or consistent data snapshots for backups.

But they jump on the hype... We take the fancy new tool and act like the cool kids. The fact that this new tool doesn't even come close to solving the business need they have, the fact that it actually often widens the gap between developers and operations people isn't important to them.

We fall into the trap and forget the most important #DevOps principle: Collaboration to bring business value.

Boring is Powerful, Boring is stable

The container movement of the past 4 years is often sadly a good example of this tendency to be driven by hype. Many organizations have jumped on Docker and other container tools, thinking that a tool adoption will help them become more "DevOps" and use microservices to gain speed and innovate faster.

Bridget Kromhout has been quoted to say "Putting your monolith in a container does not make it a micro service" - yet it's the default adoption pattern for a lot of Enterprises. They believe that containers are going to save them and solve all their development problems.

This isn't what #devops is about. DevOps was never about fancy new devops tools. Have we all forgotten one of the key principles of agile is "People over processes and tools" ?
DevOps was from the start about bringing value to the end user, the business, the customer.

And yes new technology is fun, but it doesn't mean it's stable or performant. Like Jon Topper mentioned at DevopsDays London:

"Boring is Powerfull, boring is stable"

A platform that is boring often means that there aren't any bleeding edges around it. It often means that it is more stable. Think about an engineer who is on call. He wants to be bored. He does not want to being paged every other hour.

And here is the Dull Stack Engineer

So imagine a number of people at a conference discussing this problem, joking about those Full Stack engineers with their fancy new tools who fail to collaborate. And by accident saying "Dull stack Engineer"

This is when I realized that this is actually what we want. People who care about the business and collaborating with their peers, rather than jumping on the latest and shiniest new tooling. A Dull Stack engineer.

The concept being born on a swing at a conference in Bucharest, quickly started to live it's own life. More engineers started taking pictures of themselves on a swing and endorsing others for Dull Stack engineering on LinkedIn.

Building stable platforms that brings value to your business can happen with boring old tools, there's nothing wrong with your team if they don't go for fancy containers but stick with trusted automation in VM's or even bare metal.

Because DevOps is about collaborating with your peers, it's not about tools, and once we realize that... The on-call engineers might be bored again :)

Kris Buytaert's blog

Aug 22 21:50

11.5 Factor apps

By: Kris Buytaert

Each time someone talks about the 12 Factor Application a weird feeling overcomes me .. Because I like the concept .. but it feels awkward .. it feels as if someone with 0 operational experience wrote it. And this devil is in a really small detail.

And that is Part III. Config ... For me (and a lot of other folks I've talked to about this topic) , using environment variables (as in real environment variables) are just one of the worst ideas ever. Environment variables are typically set manually , or from a script that is being executed and there's little or trace to see fast how a specific config is set.

Imagine I launch an app with an env variable X=foo , then your collegue stops that app, and launches it with X=bar. The systems integrity has not changed, no config or binaries have been changed, but the application behaviour could have completely changed.

Sure I can go in to /proc/pid/environ to find the params it was launched with ... But I want to know what state the system is supposed to be in .. and have tooling around that verified that the system indeed is in that state.

https://12factor.net/config claims that using config files that are not stored
in revision control are a huge step forward from hardcoded config in code, but they claim that config files now start ending up all over the place. This obviously feels written by someone who never experienced the power of modern configuration management. Templates which are dynamically populated , or configs that are even calculated on the spot depending on the environment an application lives in are the norm in modern infrastructure as code .. yet people seem to think that managing an environment variable would be less painfull.

I feel the idea of what they wanted to achieve was good, but the way they suggest the implementation was foobar. I don't ever want critical config of an application (like wether is is talking to the PROD or DEV database) to be set from an environment variable that can be modified. This has to come from an automated system.

This kind of config should be idempotent, one should be able to trace back where it came from and who modified it (version control), and every redeploy of the application should end up with the same result. It can even be dynamic (service discovery), but placing it in an Environment variable is the last place where a config deserves to live.

So please let's stop calling it the 12 Factor application .. and call it the 11.5 Factor application ..

Kris Buytaert's blog

Sep 01 2017

Decomissioning my Drupal blog

By: Kris Buytaert Tags:

If you are looking at this blog post right now... my live Drupal site has finally been decommissioned.. or not .. well these pages are served statically but the content is still generated by an ancient aging Drupal 6, which is hiding somewhere in a container that I only start when I need it.

Given my current low blog volume .. and the lack of time to actually migrate all the content to something like Jekyll or Webby I took the middle road and pulled the internet facing Drupal offline. My main concern was that I want to keep a number of articles that people frequently point to in the exact same location as before. So that was my main requirement, but with no more public facing drupal I have no more worrying about the fact that it really needed updating, no more worrying about potential issues on wednesday evenings etc

My first couple of experiments were with wget / curl but I bumped into. Sending a Drupal site into retirement which pointed me to httrack which was a new tool for me ..

As documented there
httrack http://127.0.0.1:8080/blog -O . -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0
creates a usuable tree but root page ends up in blog/blog which is not really handy.
So the quick hack for that is to go into the blog/blog subdir and regexp the hell out of all those files generated there direction one level below :)
for file in `ls`; do cat $file | sed -e "s/\.\.\//\/blog\//g" > ../$file ; done

httrack however has one annoying default in which it puts metatdata in the footer of a page it mirrors, where it comes from and when it was generated thats very usefull for some use cases, but not for mine as it means that every time I regenerate the site it actually generates slightly different content rather than identical pages. Luckily I found the -%F "" param to keep that footerstring empty

And that is what you are looking at right now ...

There are still a bunch of articles I have in draft .. so maybe now that I don't have to worry about the Drupal part of things I might blog more frequent again, or not..

Kris Buytaert's blog

Jun 15 2016

Will containers take over ?

By: Kris Buytaert Tags:

and if so why haven't they done so yet ?

Unlike many people think, containers are not new, they have been around for more than a decade, they however just became popular for a larger part of our ecosystem. Some people think containers will eventually take over.

Imvho It is all about application workloads, when 8 years ago I wrote about a decade of open source virtualization, we looked at containers as the solution for running a large number of isolated instances of something on a machine. And with large we meant hundreds or more instances of apache, this was one of the example use cases for an ISP that wanted to give a secure but isolated platform to his users. One container per user.

The majority of enterprise usecases however were full VM's Partly because we were still consolidating existing services to VM's and weren't planning on changing the deployment patterns yet. But mainly because most organisations didn't have the need to run 100 similar or identical instances of an application or a service, they were going from 4 bare metal servers to 40 something VM's but they had not yet come to the need to run 100's of them. The software architecture had just moved from FatClient applications that talked directly to bloated relational databases containing business logic, to web enabled multi-tier
applications. In those days when you suggested to run 1 Tomcat instance per VM because VM's were cheap and it would make management easier, (Oh oops I shut down the wrong tomcat instance) , people gave you very weird looks

Slowly software architectures are changing , today the new breed of applications is small, single function, dedicated, and it interacts frequently with it's peers, together combined they provide similar functionality as a big fat application 10 years ago, But when you look at the market that new breed is a minority. So a modern application might consist of 30-50 really small ones, all with different deployment speeds. And unlike 10 years ago where we needed to fight hard to be able to build both dev, acceptance and production platforms, people now consider that practice normal. So today we do get environments that quickly go to 100+ instances , but requiring similar CPU power as before, so the use case for containers like we proposed it in the early days is now slowly becoming a more common use case.

So yes containers might take over ... but before that happens .. a lot of software architectures will need to change, a lot of elephants will need to be sliced, and that is usually what blocks cloud, container, agile and devops adoption.

Jun 13 2016

Jenkins DSL and Heisenbugs

By: Kris Buytaert Tags:

I`m working on getting even more moving parts automated, those who use Jenkins frequently probably also have Love - Hate relationship with it.

The love coming from the flexibility , stability and the power you get from it, the hate from it's UI. If you've ever had to create a new Jenkins job or even pipeline based on one that already existed you've gone trough the horror of click and paste errors , and you know where the hate breeds.

We've been trying to automate this with different levels of success, we've puppetized the XML jobs, we've used the Buildflow Plugin (reusing the same job for different pipelines is a bad idea..) We played with JJB running into issues with some plugins (Promoted Build) and most recently we have put our hope in the Job DSL.

While toying with the DSL I ran into a couple of interresting behaviours. Imagine you have an entry like this which is supposed to replace the $foldername with the content of the variable and actually take the correct upstream

cloneWorkspace('${foldername}/dashing-dashboard-test', 'Successful')

You generate the job, look inside the Jenkins UI to verify what the build result was .. save the job and run it .. success ..
Then a couple of times later that same job gives an error ... It can't find the upstream job to copy the workspace from. You once again open up the job in the UI, look at it .. save it , run it again and then it works.. a typical case of Heisenbug ..

When you start looking closer to the XML of the job you notice ..

<parentJobName>${foldername}/dashing-dashboard-test</parentJobName>

obviously wrong .. I should have used double quotes ..

But why doesn't it look wrong in the UI ? That's because the UI autoselects the first option from it's autogenerated pull down list .. Which actually contains the right upstream workplace I wanted to trigger (that will teach me to use 00 as a prefix for the foldername for all my tests..)

So when working with the DSL .. review the generated XML .. not just if the job works ..

Kris Buytaert's blog

Jun 01 2016

Linux Troubleshooting 101 , 2016 Edition

By: Kris Buytaert Tags:

Back in 2006 I wrote a blog post about linux troubleshoooting. Bert Van Vreckem pointed out that it might be time for an update ..

There's not that much that has changed .. however :)

Everything is a DNS Problem

Everything is a Fscking DNS Problem
No really, Everything is a Fscking DNS Problem
If it's not a fucking DNS Problem ..
It's a Full Filesystem Problem
If your filesystem isn't full
It is a SELinux problem
If you have SELinux disabled
It might be an ntp problem
If it's not an ntp problem
It's an arp problem
If it's not an arp problem...
It is a Java Garbage Collection problem
If you ain't running Java
It's a natting problem
If you are already on IPv6
It's a Spanning Tree problem
If it's not a spanning Tree problem...
It's a USB problem
If it's not a USB Problem
It's a sharing IRQ Problem
If it's not a sharing IRQ Problem
But most often .. its a Freaking Dns Problem !

May 28 2016

Docker and volumes hell

By: Kris Buytaert Tags:

We're increasingly using Docker to build packages, a fresh chroot in which we prepare a number of packages, builds typically for ruby (rvm) , or python (virtualenv) or node stuf where the language ecosystem fails on us ... and fpm the whole tree as a working artifact.

An example of such a build is my work on packaging Dashing. https://github.com/KrisBuytaert/build-dashing

Now part of that build is running the actual build script in docker with a local volume mounted inside the container This is your typical -v=/home/src/dashing-docker/package-scripts:/scripts param.

Earlier this week however I was stuck on a box where that combo did not want to work as expected. Docker clearly mounted the local volume, as it could execute the script in the directory, but for some reason it didn't want to write in the mounted volume.

docker run -v=/home/src/dashing-docker/package-scripts:/scripts dashing/rvm /scripts/packagervm
Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
corefines: Your Ruby doesn't support refinements, so I'll fake them using plain monkey-patching (not scoped!).
/usr/local/share/gems/gems/corefines-1.9.0/lib/corefines/support/fake_refinements.rb:26: warning: Refinements are experimental, and the behavior may change in future versions of Ruby!
/usr/share/ruby/fileutils.rb:1381:in `initialize': Permission denied - rvm-1.27.0-1.x86_64.rpm (Errno::EACCES)

So what was I doing wrong, did the Docker params change, did I invert the order of the params, did I mistype them ? I added debugging to the script, (ls , chmod, etc..) and I couldn't seem to read or modify the directory. So I asked a coworker to be my wobbling duck.

He did more .. he wondered if this wasn't selinux.

And he was right..

Apr 23 21:47:00 mine23.inuits.eu audit[9570]: AVC avc: denied { write } for pid=9570 comm="fpm" name="package-scripts" dev="dm-2" ino=368957 scontext=system_u:system_r:svirt_lxc_net_t:s0:c47,c929 tcontext=unconfined_u:object_r:unlabeled_t:s0 tclass=dir permissive=0
Apr 23 21:47:02 mine23.inuits.eu python3[9597]: SELinux is preventing fpm from write access on the directory /home/src/dashing-docker/package-scripts.

So while I was looking for errors in docker, it was just my selinux set to enforce acting up and me not noticing it.

The quick way to verify obvisously was to setenforce 0 and trigger the build again that however is not a long term fix so I changed the

semanage fcontext -a -t cgroup_t '/home/src/dashing-docker/package-scripts'
restorecon -v '/home/src/dashing-docker/package-scripts'

That solves the problem

Kris Buytaert's blog

Mar 23 2016

It's just rubber , with some air in it.

By: Kris Buytaert Tags:

- A child balloon
you inflate it, play with it for a while, then it explodes, you throw it away you inflate another one, maybe even a different color. the kid plays with it .. till it breaks, then you throw it away...

- An inflatable castle.
you inflate it, play with it for a while, deflate it, move it around, inflate it again, if it has a hole in it, you patch the hole.

- The tyres on your kids bike,.
You inflate it , kid rides on it, if it starts losing air, you fix it asap so you can continue using it.

All of the 3 are really valuable use cases for rubber with air in it,
in a way they are all equally valuable, but serve different purposes, different use cases.

Now think about this next time you spin up a container, that's running a database, application server and where your users ssh to.

It's not just another VirtualMachine

Kris Buytaert's blog

Mar 23 2016

Lies, Damn Lies and Statistics, 2016 Edition

By: Kris Buytaert Tags:

When people sign up for Configuration Management Camp, we ask them what community room they are mostly interested in.
We ask this question because we have rooms in different sizes and we don't want to put communities with 20 people showing interest in a 120 seat room and we don't want to put a community with 200 people in a 60 seat room.

But it also gives us to opportunity to build some very interesting graph over the potential evolution of the communities.

So looking at the figures ... the overall community is obviously growing,From 350 to 420, to just short of 600 people registered now.

The Puppet Community is not the biggest anymore, that spot went to the Ansible Community room. And all but the CFengine communities are growing.

One more thing , The organisation team discussed several times if we should rebrand the event. We opted not to .. Infracoders.eu could have been an alternative name .. but we decided to stick with the name that already is known,
the content will evolve.. but Config Management Camp will stay the place where people that care about Infrastructure as Code and Infrastructure automation meet.

Kris Buytaert's blog

Mar 23 2016

Recent and Upcoming Talks

By: Kris Buytaert Tags:

I gave a couple of new and updated talks the last couple of months.

At Fosdem, I opened up the Distributions Devroom with a talk on how we could improve the collaboration between the developers of a distro, and their users, the ops folks. I seemed to not be the only person with similar ideas as the smart folks over at the CentOS project were already actively talking about their efforts into making some my ideas become reality the day before at the CentOS Dojo.

Closing the gap between Distros(devs) and their Users(ops) from Kris Buytaert

Another talk of early this year was an update on the why Security is requires a devops aproach, why you want to embed security as a standard practice in your development process and why continuous delivery actually is a security requirement.

Dev secops opsec, devsec, devops ? from Kris Buytaert

At last weeks FLOSS UK Conference in London , I gave an updated version of My MonitoringLove talk, giving an opinionated overview of the current state of Open Source Monitoring tools.

From MonitoringSucks to Monitoring Love , 2016 Edition from Kris Buytaert

I was scheduled to give the Opening Keynote today (23/3/2016) at the Making Open Source Software Conference , but sadly I had to cancel that due to yesterdays (22/3/2016) events at Brussels Airport. My flight to Bucharest was obviously cancelled.

I`m scheduled to open up the 2nd day of the the upcoming Devopsdays London Edition

And Bernd Erk tricked me into giving a follow up talk on my popular 7 tools for your devops stack talk, aptly titled Another 7 tools for your devops stack

Kris Buytaert's blog