devops

Jun 15 2016

Will containers take over?

And if so, why haven't they done so yet?

Contrary to what many people think, containers are not new; they have been around for more than a decade. They have, however, only recently become popular in a larger part of our ecosystem. Some people think containers will eventually take over.

Imvho it is all about application workloads. When I wrote about a decade of open source virtualization eight years ago, we looked at containers as the solution for running a large number of isolated instances of something on a machine. And by large we meant hundreds or more instances of Apache; one of the example use cases was an ISP that wanted to give a secure but isolated platform to its users. One container per user.

The majority of enterprise use cases, however, were full VMs. Partly because we were still consolidating existing services to VMs and weren't planning on changing the deployment patterns yet, but mainly because most organisations didn't have the need to run 100 similar or identical instances of an application or a service. They were going from 4 bare metal servers to 40-something VMs, but they had not yet reached the point where they needed to run hundreds of them. The software architecture had just moved from fat-client applications that talked directly to bloated relational databases containing business logic, to web-enabled multi-tier
applications. In those days, when you suggested running one Tomcat instance per VM because VMs were cheap and it would make management easier ("Oops, I shut down the wrong Tomcat instance"), people gave you very weird looks.

Slowly, software architectures are changing. Today the new breed of applications is small, single-function and dedicated, and it interacts frequently with its peers; combined, they provide functionality similar to a big fat application of 10 years ago. But when you look at the market, that new breed is still a minority. A modern application might consist of 30-50 really small ones, all with different deployment speeds. And unlike 10 years ago, when we had to fight hard to be able to build separate dev, acceptance and production platforms, people now consider that practice normal. So today we do get environments that quickly grow to 100+ instances while requiring similar CPU power as before, and the use case for containers as we proposed it in the early days is slowly becoming a more common one.

So yes, containers might take over ... but before that happens .. a lot of software architectures will need to change, a lot of elephants will need to be sliced, and that is usually what blocks cloud, container, agile and devops adoption.

Jun 13 2016

Jenkins DSL and Heisenbugs

I'm working on getting even more moving parts automated. Those who use Jenkins frequently probably also have a love-hate relationship with it.

The love comes from the flexibility, stability and power you get from it, the hate from its UI. If you've ever had to create a new Jenkins job or even a pipeline based on one that already existed, you've gone through the horror of click-and-paste errors, and you know where the hate breeds.

We've been trying to automate this with different levels of success: we've puppetized the XML jobs, we've used the Build Flow plugin (reusing the same job for different pipelines is a bad idea..), we've played with JJB, running into issues with some plugins (Promoted Builds), and most recently we have put our hope in the Job DSL.

While toying with the DSL I ran into a couple of interesting behaviours. Imagine you have an entry like this, which is supposed to replace ${foldername} with the content of the variable and point at the correct upstream job:

  cloneWorkspace('${foldername}/dashing-dashboard-test', 'Successful')

You generate the job, look inside the Jenkins UI to verify what the build result was .. save the job and run it .. success ..
Then a couple of runs later that same job gives an error ... it can't find the upstream job to copy the workspace from. You once again open up the job in the UI, look at it .. save it, run it again, and then it works .. a typical case of a Heisenbug ..

When you start looking closer at the XML of the job you notice ..

  <parentJobName>${foldername}/dashing-dashboard-test</parentJobName>

obviously wrong .. I should have used double quotes ..
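
The fix is to let Groovy interpolate the variable by using double quotes. Here is a minimal sketch of the corrected entry, assuming a foldername variable defined earlier in the DSL script (the surrounding job definition and the downstream job name are just an example):

  job("${foldername}/dashing-dashboard") {
      scm {
          // double quotes: Groovy interpolates ${foldername} when the DSL generates the job
          cloneWorkspace("${foldername}/dashing-dashboard-test", 'Successful')
      }
  }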

But why doesn't it look wrong in the UI? That's because the UI autoselects the first option from its autogenerated pull-down list .. which actually contains the right upstream workspace I wanted to trigger (that will teach me to use 00 as a prefix for the foldername for all my tests..)

So when working with the DSL .. review the generated XML .. not just whether the job works ..

Jul 28 2015

The power of packaging software, package all the things

Software delivery is hard; plenty of people all over this planet are struggling with delivering software in their own controlled environment. They have invented great patterns that will build an artifact, then do some magic, and the application is up and running.

When talking about continuous delivery, people invariably discuss their delivery pipeline and the different components that need to be in that pipeline.
Often, the focus on getting the application deployed or upgraded from that pipeline is so strong that teams
forget how to deploy their environment from scratch.

After running a number of tests on the code, and compiling it where needed, people want to move forward quickly and deploy their release artifact on an actual platform.
This deployment typically happens via a file upload or a checkout from a source-control tool onto the dedicated computer on which the application resides.
Sometimes, dedicated tools are integrated to simulate what a developer would do manually on a computer to get the application running. Copy three files left, one right, and make sure you restart the service. Although this is obviously already a large improvement over people manually pasting commands from a 42-page run book, it doesn't solve all problems.

Like the guy who quickly makes a change on the production server and never commits it (say goodbye to git pull for your upgrade process).
If you package your software, there are a couple of things you get for free from your packaging system.
Questions like "has this file been modified since I deployed it?", "where did this file come from?", "when was it deployed?" and "what version of software X do I have running on all my servers?" are easily answered by the same
tools we already use for every other package on the system. Not only can you use existing tools, you are also using tools that are well known by your ops team and that they
already use for every other piece of software on your system.
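
As a hedged illustration of that point, on an RPM-based host those questions map onto plain rpm queries; here is a throwaway Groovy sketch that shells out to rpm, with the package name and file path made up for the example:

  // sketch: asking the packaging system instead of guessing -- 'myapp' and the path are hypothetical
  println 'rpm -qf /usr/share/myapp/index.php'.execute().text   // which package owns this file?
  println 'rpm -q myapp'.execute().text                         // which version is installed?
  println 'rpm -V myapp'.execute().text                         // has anything been modified since install?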

If your build process creates a package and uploads it to a package repository that is available to the hosts in the environment you want to deploy to, there is no more need for
a script that copies the artifact from a 3rd-party location, and even less for that 42-page text document which never gets updated and still tells you to download yaja.3.1.9.war from a location where you can only find
3.2 and 3.1.8, while the developer who knows whether you can use 3.2, or why 3.1.9 got removed, has just left for the long weekend.

Another, and maybe even more important, thing is the sadly growing practice of having yet another tool in place that translates that 42-page text document into a bunch of shell scripts created from a drag-and-drop interface; typically that "deploy tool" is even triggered from within the pipeline. Apart from the fact that it usually encourages a pattern of non-reusable code, distributes even more ssh keys, or adds yet another agent on all systems, it doesn't take into account that you want to think of your servers as cattle and be able to deploy new instances of your application fast.
Do you really want to deploy your five new nodes on AWS with a full Apache stack ready for production, then reconfigure your load balancers, only to figure out that someone needs to go click in your continuous integration or deployment tool to deploy the application to the new hosts? That one manual action someone forgets?
Imvho deployment tools are a phase in the maturity process of a product team. Yes, it's a step up from manually deploying software, but it creates more and different problems; once your team grows in maturity, refactoring that tool out again is trivial.

The obvious and trivial approach to this problem, and one that comes with even more benefits, is packaging. When you package your artifacts as operating system packages (e.g., .deb or .rpm),
you can include that package in the list of packages to be deployed at installation time (via Kickstart or debootstrap). Similarly, when your configuration management tool
(e.g., Puppet or Chef) provisions the computer, you can specify which version of the application you want deployed by default.

So, when you’re designing how you want to deploy your application, think about both deploying new instances and deploying to existing setups (or rather, upgrading your application).
Doing so will make life so much easier when you want to deploy a new batch of servers.

May 03 2015

What done REALLY looks like in devops

Steve Ropa blogged about What done looks like in devops. I must say I respectfully, but fully, disagree with Steve here.

For those of you who remember, I gave an Ignite talk about my views on the use of the Definition of Done back at #devopsdays 2013 in Amsterdam.

In the early days we talked about the #devops movement partly being a reaction against the late Friday night deployments, where the ops people got a tarball with some minimalistic notes and were supposed to put it in production. The work of the development team was Done, but the operations team's work had just started.

Things have improved .. like Steve mentions, for a lot of teams done now means that their software is deployable, that we have metrics from it, and that we can monitor the application.

But let's face it .. even if all of that is in place, there is still going to be maintenance, security fixes, major stack upgrades, and minor application changes, and we all still need to keep the delivery pipelines running.

A security patch on an application stack means that both the ops people and the developers need to figure out the required changes together.

Building and delivering value to your end users is something that never ends; we are never actually done.

So let me repeat:

"Done is when your last enduser is in his grave"
In other words, when the application is decomissioned.

And that is the shared responsibility mindset devops really brings: everybody caring about the value they are bringing to their customers, both developers and operations people. Thinking about keeping the application running. And not assuming that because a list of requirements has been validated at the end of a sprint we are done. Because we never are...

BTW, here are my original slides for that #devopsdays Amsterdam talk.


Jun 04 2014

Jenkins, Puppet, Graphite, Logstash and YOU

This is a repost of an article I wrote for the Acquia Blog some time ago.

As mentioned before, devops can be summarized by talking about culture, automation, measurement and sharing. Although devops is not about tooling, there are a number of open source tools out there that will be able to help you achieve your goals. Some of those tools will also enable better communication between your development and operations teams.

When we talk about Continuous Integration and Continuous Deployment we need a number of tools to help us there. We need to be able to build reproducible artifacts which we can test. And we need a reproducible infrastructure which we can manage in a fast and sane way. To do that we need a Continuous Integration framework like Jenkins.

Formerly known as Hudson, Jenkins has been around for a while. The open source project was initially very popular in the Java community but has now gained popularity in different environments. Jenkins allows you to create reproducible build and test scenarios and perform reporting on those. It will provide you with a uniform and managed way to build, test, release and trigger the deployment of new artifacts, for both traditional software and infrastructure-as-code-based projects. Jenkins has a vibrant community that builds new plugins for the tool in different kinds of languages. People use it to build their deployment pipelines, automatically check out new versions of the source code, syntax test it and style test it. If needed, users can compile the software, trigger unit tests, and upload a tested artifact into a repository so it is ready to be deployed on a new platform.
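
As a rough illustration of what such a checkout-and-test job can look like when it is defined as code rather than clicked together in the UI, here is a minimal Jenkins Job DSL sketch; the job name, repository URL and build commands are made up for the example:

  job('myapp-ci') {
      scm {
          git('https://git.example.com/myapp.git', 'master')
      }
      triggers {
          scm('H/5 * * * *')        // poll for new commits every few minutes
      }
      steps {
          shell('make lint')        // syntax and style checks
          shell('make test')        // unit tests
      }
  }

Because the definition is code, it can be versioned, reviewed and regenerated instead of being copied and pasted through the UI.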

Jenkins then can trigger an automated way to deploy the tested software on its new target platform. Whether that be development, testing, user acceptance or production is just a parameter. Deployment should not be something we try first in production, it should be done the same on all platforms. The deltas between these platforms should be managed using a configuration management tool such as Puppet, Chef or friends.

In a way this means that Infrastructure as code is a testing dependency, as you also want to be able to deploy a platform to exactly the same state as it was before you ran your tests, so that you can compare the test results of your test runs and make sure they are correct. This means you need to be able to control the starting point of your test and tools like Puppet and Chef can help you here. Which tool you use is the least important part of the discussion, as the important part is that you adopt one of the tools and start treating your infrastructure the same way as you treat your code base: as a tested, stable, reproducible piece of software that you can deploy over and over in a predictable fashion.

Configuration management tools such as Puppet, Chef and CFEngine are just a part of the ecosystem, and integration with orchestration and monitoring tools is needed, as you want feedback on how your platform is behaving after the changes have been introduced. Lots of people measure the impact of a new deploy, and with that we obviously move to the M part of CAMS.

Here, Graphite is one of the most popular tools to store metrics. Plenty of other tools in the same area have tried to go where Graphite is going, but in terms of flexibility, scalability and ease of use, not many tools allow developers and operations people to build dashboards for any metric they can think of in a matter of seconds.

Just sending a metric name, a value and a timestamp to the Graphite platform gives you a large choice of actions that can be performed on that metric. You can graph it, transform it, or even set an alert on it. Graphite takes away the complexity of similar tools and offers an easy-to-use API for developers, so they can integrate their own self-service metrics into dashboards to be used by everyone.
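
To give an idea of how low that barrier is, here is a minimal Groovy sketch that pushes a single datapoint over Graphite's plaintext protocol; the hostname and metric name are made up, and the default plaintext port 2003 is assumed:

  // minimal sketch -- the Graphite host and metric name are hypothetical
  def socket = new Socket('graphite.example.com', 2003)
  def timestamp = System.currentTimeMillis().intdiv(1000)
  // plaintext protocol: "<metric.path> <value> <timestamp>\n"
  socket.outputStream << "myapp.frontend.logins 42 ${timestamp}\n"
  socket.close()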

One last tool that deserves our attention is Logstash. Initially just a tool to aggregate, index and search the log files of our platform (often a hugely missed source of relevant information about how our applications behave), Logstash and its Kibana + Elasticsearch ecosystem are now quickly evolving into a real-time analytics platform, implementing the Collect, Ship+Transform, Store and Display pattern we see emerging a lot in the #monitoringlove community. Logstash now allows us to turn boring old logfiles, which people only searched upon failure, into valuable information that product owners and business managers use to learn about the behavior of their users.

Together with the Graphite-based dashboards we mentioned above, these tools help people start sharing their information and communicating better. When thinking about these tools, think about what you are doing, what goals you are trying to reach and where you need to improve. Because after all, devops is not about solving a technical problem; it's about solving a business problem and bringing better value to the end user at a more sustainable pace. And in that way the biggest tool we need to use is YOU, as the person who enables communication.

Jun 04 2014

Why Does DevOps Matter?

This is a repost of an article I wrote for the Acquia Blog some time ago.

People often ask, why does DevOps matter?

The honest answer to that question is...because having the development and operations team work together is the only way IT is successful.

Over the past few decades I've worked in different environments, including small web start-ups, big pharmaceutical companies, hardware engineering shops, large software companies and banks. All were trying different approaches to deliver quality software to their end users and customers, but most of them were failing badly.

Operations people were being pulled in at the last minute. A marketing campaign needed to go live at 5 p.m. because that's when the first radio commercial was scheduled to be broadcast. At 11 a.m., the operations people still didn't know the campaign existed.

It was always the other person’s fault. Waterfall projects and large PID documents were the solution to all the problems. But people learned; they figured out that we can't expect humans to predict how long it would take to implement something they have never done before. Unfortunately, even today, only a small set of people understand the value of being agile and that we cannot break a project down to its granular details without factoring in the “unpredictable.” The key element here is the “uncertainty” of the many project pieces.

So on came the agile movement and software development became much smoother.
People agreed on time-boxing a reasonable set of work that would result in delivering useful functionality in frequent batches. Yet, on the day of deployment, all hell would break loose because someone forgot to loop in the Ops team.

This is where my personal experience differs from that of a lot of others, because I was part of a development team building a product where the developers were sitting right next to the system administration team. Within sprints, our DevOps team was building both system features and application features; making the application highly available was a story on the board right next to an actual end-user feature.

In the old days, a new feature that was scheduled for Friday couldn't be brought online for a couple of days because it couldn't be deployed to production. In the new setup, deploying to production was a no brainer as we had already tested the automated deployment to the acceptance platform.

This brings us to the first benefit: actually being able to go live.

The next problem came on a Wednesday evening. A major security issue had popped up in Drupal and an upgrade needed to be performed; however, nobody dared to perform the upgrade, as they were afraid of breaking the site. Some people had made changes and hadn't put their config back in the code base, and thus the site didn't get updated. This is the typical state of the majority of websites of any type, where people build something, deploy it and never look back. This is the case until disaster strikes and it hits the evening news.

Teams then learn that not only do they need to implement features and put their config changes in code, but also do continuous integration testing on their sites.

From doing continuous integration, they go to continuous delivery and continuous deployment, where an upgrade isn't a risk anymore but a normal event which happens automatically when all the tests are green. By implementing infrastructure as code and tests, they achieve two goals: they build the confidence that the code is working, and they make sure that the number of defects in the code base goes down, so the number of times people need to dig back into old code to fix issues also comes down.

Delivering better software in a much more regular way enables security issues to be fixed faster, but also brings new features to market faster. By faster, we often mean a change from releasing software on a bi-yearly basis to a release each sprint, to a release whenever a commit has passed a number of test criteria.

Because they started to involve other stakeholders, the value of their application grew, as they had faster feedback and better usage statistics. The faster feedback meant that they weren't spending as much time on features nobody used, but were focusing their efforts on things that mattered.

Having other stakeholders like the systems and security teams involved with early metrics, and taking the non-functional requirements into the backlog planning, meant that the stability of the platform was growing. Rather than people spending hours and nights fixing production problems, potential issues are now being tackled upfront because of the
communication between devs and ops. Also, scale and high availability have been built into the application upfront, rather than afterwards, when it is too late.

So, in the end it comes down to the most important part, which is that devops creates more happiness. It creates more happy customers, developers, operations teams, managers and investors, and for a lot of people it improves not only application quality, but also their quality of life.

Jun 04 2014

The Rise of the DevOps movement

This is a repost of an article I wrote for the Acquia Blog some time ago.

DevOps, DevOps, DevOps … the whole world is talking about DevOps, but what is DevOps?

Since Munich in 2012, DrupalCon has had a dedicated devops track. After talking to
a lot of people in Prague last month, I realized that the concept of DevOps is still very unclear to a lot of developers. To a large part of the development community, DevOps still means folks working on 'the infrastructure part' of the development life cycle, and for some it simply means deploying Drupal and being concerned purely with keeping the site alive, etc.

Obviously that's not what DevOps is about, so let's take a step back and find out how it all started.

Like all good things, Drupal included, DevOps is a Belgian thing!

Back in 2009, DevopsDays Europe was created because a group of people met over and over again at different conferences throughout the world and didn't have a common devops conference to go to. These individuals would talk about software delivery, deployment, build, scale, clustering, management, failure, monitoring and all the important things one needs to think about when running a modern web operation. These folks included Patrick Debois, Julian Simpson, Gildas Le Nadan, Jez Humble, Chris Read, Matt Rechenburg, John Willis, Lindsay Holmswood and me, Kris Buytaert.

O’Reilly created a conference called “Velocity”, which sounded interesting to a bunch of us Europeans, but on our side of the ocean we had to resort to the existing open source, Unix and agile conferences. We didn't really have a common meeting ground yet. At CloudCamp Antwerp, in the Antwerp Zoo, I started talking to Patrick Debois about ways to fill this gap.

Many different events and activities, like John Allspaw and Paul Hammond’s talk at “Velocity” and multiple Twitter discussions, influenced Patrick to create a DevOps-specific event in Gent, which became the very first DevopsDays. DevopsDays Gent was not your traditional conference; it was a mix of a couple of formal presentations in the morning and open spaces in the afternoon. And those open spaces were where people got the most value: the opportunity to talk to people with the same complex problems, with actual experience in solving them, with stories about both success and failure, etc. How do you deal with that oldskool system admin who doesn’t understand what configuration management can bring him? How do you do Kanban for operations while the developers are working in 2-week sprints? What tools do you use to monitor a highly volatile and expanding infrastructure?

From that very first DevopsDays in Gent, several people spread out to organize other events: John Willis and Damon Edwards started organizing DevopsDays Mountain View, and the European edition started touring Europe. It wasn’t until this year that different local communities started organizing their own local DevopsDays, e.g. in Atlanta, Portland, Austin, Berlin, Paris, Amsterdam, London, Barcelona and many more.

From this group of events a community has grown of people who care about bridging the gap between development and operations, a community that cares about delivering holistic business value to their organization.

As a community, we have realized that there needs to be more communication between the different stakeholders in an IT project lifecycle: business owners, developers, operations, network engineers, security engineers. Everybody needs to be involved as soon as possible in the project in order to help each other and talk about solving potential pitfalls ages before the application goes live. And when it goes live, the communication needs to stay alive too.. We need to talk about maintaining the application, scaling it, and keeping it secure. Just think about how many Drupal sites are out there vulnerable to attackers because the required security updates have never been implemented. Why does this happen? It could be because many developers don't dare to touch the site anymore.. because they are afraid of breaking it.

And this is where automation will help.. if we can do automatic deployments and upgrades of a site because it is automatically tested when developers push their code, upgrading won't be that difficult a task. Typically, when people only update once every 6 months, it's a painful and difficult process, but when it's automated and done regularly, it makes life so much easier.

This ultimately comes down to the idea that the involvement of developers doesn’t end at their last commit. Collaboration is key, which allows every developer to play a key role in keeping the site up and running, for happier users. After all, software with no users has no value. The involvement of the developers in the ongoing operations of their software shouldn't end before the last end user stops using their applications.

In order to keep users happy, we need to get feedback and metrics, starting from the very first phases of development all the way up to production. That means we need to monitor both our application and our infrastructure and get metrics on all possible aspects; with that feedback we can learn about potential problems, but also about successes.

Finally, all of this can be summarized in an acronym coined by John Willis and Damon Edwards: CAMS. CAMS says DevOps is about Culture, Automation, Measurement and Sharing.
Getting the discussion going on how to do all of that, more specifically in a Drupal environment, is the sharing part.

Dec 22 2013

FOSDEM 2014 is coming

and with it almost a full week of side events.
For those who don't know FOSDEM (where have you been hiding for the past 13 years?), FOSDEM is the annual Free and Open Source Developers' European Meeting. If you are into open source, you just can't miss this event, where thousands of like-minded people will meet.

And if 2 days of FOSDEM madness aren't enough, people organise events around it.

Last year I organised PuppetCamp in Gent in the days before FOSDEM, and a MonitoringLove hackfest in our office the 2 days after FOSDEM. This year another marathon is planned.

On Friday (31/1/2014) the CentOS community is hosting a Dojo in Brussels at the IBM Forum (free, but registration required by the venue).

After the success of PuppetCamp in Gent last year, we decided to open up the discussion and get more Infrastructure as Code people involved in CfgMgmtCamp.eu.

The keynotes for CfgMgmtCamp will include the leaders of the 3 most popular tools around: Mark Burgess, Luke Kanies and Adam Jacob will all present at the event, which will take place in Gent right after FOSDEM. We expect people from all the major communities including, but not limited to, Ansible, Salt, Chef, Puppet, CFEngine, Rudder, Foreman and Juju (free, but registration required for catering).

And because 3 events in one week aren't enough, the Red Hat community is hosting their Infrastructure.next conference after CfgMgmtCamp at the same venue (free, but registration required for catering).

cya in Belgium next year..

Nov 27 2013

Docker vs Reality, 0 - 1

(aka the opinionated summary of the #devopsdays London November open space on Containers and the new flood of Image Sprawl)

There's a bunch of people out there who think I don't like Docker. They are wrong.

I just never understood the hype about it, since I didn't (and still don't) see it being used at large, and people seem to interpret that as me being against it.

So let me put a couple of things straight:

There's absolutely nothing wrong with using a container-based approach when deploying your infrastructure. If you remember my talks about the rise of open source virtualization some years ago, you'll have noticed that I always mentioned OpenVZ and friends as good alternatives if you wanted to have a lot of isolated platforms on one machine. LXC and friends have grown .. they are even more usable these days. Years ago people bought bare metal and ran hypervisors on it to isolate resources. These days people rent VMs and want the same functionality, so combining virtualization and container-based technologies is a very good match there.

There's also nothing wrong with using Infrastructure as Code tools to build the reproducible image you are going to deploy. It gives you a disposable image which allows you to quickly launch a reproducible and versioned platform for your application, provided that application is supposed to be short-lived. The tooling around today is not yet there to keep these images long-lived, as you still need to manage the config inside the containers: your application will evolve, it will change, your environment will change (think even about switching to a different loghost..). But when you don't have to keep state, you can dispose of the image and redeploy a new reproducible one.

In the embedded world, this kind of approach with multiple banks has been around for a while: one image running, a second bank as a fallback, and when you upgrade the passive bank you can swap the roles and still be able to roll back.

There is also nothing wrong with combining these two approaches and using tools such as Docker and Packer.

But there is a lot wrong with building images that then start living a life of their own. Tools like Veewee saw the light to provide an easy way to make sure the JeOS (Just Enough Operating System) image we created was reproducible, not to ship around virtual appliances.

But let's be realistic: the number of applications that are suitable for this kind of environment is small. Most applications these days are still very stateful, and when your application contains state you need to manage
that state; you can't just dispose of an image which has state. Especially in an enterprise environment, stateless, immutable applications are really the exception rather than the rule.

When your application maps onto stateless and short-lived, or, as some people like to call it, immutable, please go that route.. but if it doesn't, please remember that we started using configuration management tools like CFEngine, Puppet and Chef to prevent image sprawl and config drift.
There are proprietary businesses out there building tools to detect config drift and extorting organisations to solve problems that shouldn't have existed in the first place.

Luckily, the majority of smart people I've spoken to over the past couple of weeks pretty much confirmed this ...
Like one of the larger devops-minded application hosting outsourcers in EMEA: when I asked them what percentage of their customer base they could call "immutable", the answer was exactly 0%.

Image-based container solutions are definitely not a one-size-fits-all solution, and we have a long way to go before we get there, if we ever do ..

Till then, I prefer not to spread my attention across too many different ways of deploying platforms, just to avoid making stuff more complex than it already is... as complexity is the enemy of reliability.