About a year ago one of my clients, a software development company called Avisi, asked me to help them design and build a new infrastructure for their company. They had a number of requirements regarding security, scalability, performance, flexibility, and so on. But a few requirements stood out:
From the start, it was clear that automation was going to be a key part of our infrastructure. As we had prior experience with Puppet, we decided to 'Puppet all the things'. That was easier said than done, though.
We had to deal with several development teams that had very different requirements: they wanted support for different Linux distributions, different versions of Java, Oracle, PHP, MySQL, and so on. Then there were all the infrastructure services (DNS, LDAP, backup, syslog, monitoring), development services (SCM, build, test, deploy, repository, QA tooling), collaboration services (issue tracking, wiki), some websites, and the usual 'one-offs', all of which needed to be 'Puppetized'.
One year down the road we seem to have accomplished everything we set out to do. We have eliminated root permissions for developers and manual changes, and deploying new servers takes just minutes, including fully automated configuration of monitoring and backup. Most importantly, the developers are actively using Puppet to deploy their applications on the development infrastructure.
This has resulted in a fairly large codebase. Some numbers:
While the above may seem pretty successful, our fairly complex Puppet setup did introduce a few new challenges:
There is no single solution to this problem, but there are a few guidelines that can help you steer clear of the common pitfalls of complex Puppet setups.
Separating data from code has a few advantages. First, it allows for easy re-use of code. Second, it forces you to think ahead while writing code, to make your modules highly configurable, and to decide on sane defaults. Third, it allows you to expose the 'data part', or node classification, separately, so the actual configuration of your nodes doesn't necessarily require any programming skills.
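As a sketch of what this looks like in practice, a module can expose all of its site-specific data as class parameters with sane defaults, so the code itself contains no environment-specific values (the module name, parameters, and file paths below are illustrative, not Avisi's actual code):

```puppet
# Illustrative module: all tunables are class parameters with sane
# defaults, so the class works out of the box but can be overridden
# from data (Hiera) or an ENC without touching the code.
class mysql::server (
  $port            = 3306,
  $bind_address    = '127.0.0.1',
  $max_connections = 151,
) {
  package { 'mysql-server':
    ensure => installed,
  }

  # The template interpolates the parameters above.
  file { '/etc/mysql/my.cnf':
    content => template('mysql/my.cnf.erb'),
    require => Package['mysql-server'],
    notify  => Service['mysql'],
  }

  service { 'mysql':
    ensure => running,
    enable => true,
  }
}
```

With this shape, whoever classifies a node only decides *which* values to override; they never need to read or edit the module itself.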
When separating data from code, you obviously need a place to store your data. The most obvious choice currently is Hiera, which is built into Puppet. Hiera is a key/value lookup tool that uses a configurable hierarchy and supports multiple backends.
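A minimal Hiera configuration might look like the following (paths and hierarchy levels are illustrative; the exact syntax shown is the Hiera 1.x style that shipped with Puppet at the time):

```yaml
# /etc/puppet/hiera.yaml
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "nodes/%{::fqdn}"           # per-node overrides
  - "environment/%{::environment}"
  - common                      # sane defaults for everyone
```

Hiera walks the hierarchy top to bottom and returns the first value it finds, so node-specific data wins over environment data, which in turn wins over the common defaults.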
Other options are ENCs (External Node Classifiers) like the Puppet Enterprise Dashboard or The Foreman.
Depending on the node classifier you choose, you can configure nodes using YAML, JSON or web interfaces.
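For instance, with a YAML backend, classifying a node boils down to editing a data file rather than writing Puppet code. The file name and keys below are illustrative; the class parameters are looked up automatically by Puppet 3's data bindings:

```yaml
# /etc/puppet/hieradata/nodes/db01.example.com.yaml
classes:
  - mysql::server
mysql::server::port: 3306
mysql::server::bind_address: '0.0.0.0'
```

Combined with a one-line `hiera_include('classes')` in `site.pp`, this means anyone who can edit YAML can configure a node, without knowing the Puppet DSL.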
Please continue reading at Benny's blog for more guidelines that can help you steer clear of the common pitfalls of complex Puppet setups!