Recap of the situation
In the first part of this blog series, we introduced you to our evolutionary grown IT landscape. We had a room full of snow- flaked servers and no overall concept how to use them. We wanted our services to be self-contained and separated. So we chose the approach of virtualization to host one VM per service on a uniform platform. We chose VirtualBox, Vagrant and Ansible to help us along the way.
This blog entry tells you about the way and our experiences and insights.
In order to migrate every service you use to its own virtual machine (VM), you’ll need a list or map of your services first. We gathered our list, compared it to reality, adjusted it, reiterated everything, added the forgotten services, drew the map, compared again, drew again and even then missed some services that are painfully obvious in hindsight, like DNS or SMTP. We identified more than 15 distinct services and estimated their resource profile. Then we planned the VM layouts and estimated the required computation power to host all of them. Then we bought the servers.
We started with three powerful hosting servers but soon saw that there is a group of “alpha VMs” with elevated requirements on availability and bought a fourth hosting server with emphasis on redundancy. If some seldom used backoffice service goes down, that’s one thing. The most important services of our company should not go down because of a harddisk failure or such.
Four nearly identical hosting servers to run 15+ VMs on required a repeatable process to set things up. This is where the first tip comes into play:
- Document everything. Document all the details. Have your Wiki ready and write a step-by-step tutorial for every task you perform. It’s really tedious and probably a bit cumbersome at first, but it will pay of sooner and better than you’d imagine.
We started the migration process with the least important services to get a feeling for the required steps. It turned out later that these services were also the most time-consuming ones. The most essential and seemingly complex services took the least time. We essentially experienced the pareto effect but in reverse: We started with the lowest benefit for the highest cost. But we can give two tips from this experience:
- Go the extra mile. Just forget about the pareto effect and migrate all services. It’s so much more fun to have a clean IT landscape map than one where most things are tidy but there’s an area marked “here be dragons”.
- Migration effort and service importance aren’t linked. Our most important service was migrated in about half an hour. Our least important service needed nearly three days. It’s all about the system architecture of the service and if it values self-containment.
The migration took place over the course of a few months with frequent address changes of our tools and an awful lot of communication for cutoff dates. If you need to migrate a service, be very open about the process and make sure that the old service address won’t work after the switch. I cannot count the amount of e-mails I wrote with the subject prefix “IMPORTANT!”. But the transition went smooth and without problems, so we probably added some extra caution that might not have been necessary.
After the migration
When we had migrated our last service in its own VM, there were a lot of old servers without any purpose anymore. We switched them off and got rid of them. Now we had nearly two dozen new servers to care for. One insight we had right after the start of our journey is that virtualized servers require the same amount of administration as physical ones. Just using our old approaches for the new IT landscape wouldn’t cut it. So we invested heavily in automation and scripted everything. Want to set up a new CI build slave? Just add its address into Ansible’s inventory and run the script (“playbook”). All servers need security updates? Just one command and a little wait.
Learning to automate the administrative tasks in the right way had a steep curve, but it’s the only feasible way. We benefit heavily from the simple fact that we forced ourselves to do it by making it impossible to handle the tasks manually. It’s a “burned bridges” approach, but upon reaching the goal, it really pays off. So another tip:
- Automate everything. Even if you think you’ll perform this task just a few times – that’s exactly the scenario to automate it to never have to bother with the details again. Automation is key if you want to scale your IT landscape to reasonable sizes.
Reaping the profit
We’ve done the migration and have a fully virtualized setup now. This would not be very beneficial in itself, but opens the door for another level of capabilities we simply couldn’t leverage before. Let me just describe two of them:
- Rethink your backup strategy. With virtual machines, you can now backup your services on an appliance level. If you wanted to perform this with a real server, you would need to buy the exact same hardware, make exact copies of the harddisks and store this “clone machine” somewhere safe. Creating an appliance level backup means to stop the VM, export it and restart it. You’ll have some downtime, but everything else is just a (big) file.
- Rethink your service maintainance strategy. We often performed test upgrades to newer versions of our important services on test machines. If the upgrade went well, we would perform it again on the live server and hope for the best. With virtual machines and appliance backups, you can try the upgrade on an exact copy of the live server over and over again. And if you are happy with the result, you just swap your copy with the live server and everything’s fine. No need for duplicated procedures, you always work with the real deal – well, an indistinguishable copy of it.
We’ve migrated our IT landscape from evoluationary to a planned virtualized state in just about a year. We’ve invested weeks of work in it, just to have the same services available as before. From a naive viewpoint, nothing much has changed. So – was it worth it?
The answer is short and clear: Absolutely yes. Even in the short time after the migration, the whole setup performs smoother and more in a planned way than just by chance. The layout can be communicated clearer and on different levels. And every virtual machine has its own use case, to the point and dedicated. We now have an IT landscape that obeys our rules and responds to our needs, whereas before we often needed to make hard compromises.
The positive effects of documentation and automation alone are worth the journey, even if they are mere side effects of the main goal. +1, would migrate again.