How our SRE Studio efficiently manages cloud infrastructures
Author: Juan Andrés Zeni
Juan Andrés is the SRE Studio Lead at Moove It, where he works on the design and implementation of fault-tolerant, cost-efficient, and resilient cloud architectures for agile software projects.
Many businesses today are focused on managing their cloud environments more effectively, while also working to reduce errors and ensure the performance of their business-critical applications. Our Site Reliability Engineering (SRE) Studio is at the forefront of these efforts to improve the management of our clients’ infrastructure.
Here we want to share some of what the team has been working on: both the tools we’ve developed internally and the way we use open source and commercially available tools to best help our clients with their technology environments.
SRE Bootstrapper: Moove It’s tool to rapidly set up new cloud environments and more easily provision network instances
With our SRE Bootstrapper tool we automate the cloud infrastructure setup when we start working with a new client. Setting up the infrastructure and provisioning the network instances by hand typically takes about a day and a half, and of course carries the risk of human error. With the tool, we do it in minutes. Having an automated process means we save a significant amount of time right at the beginning of projects with clients. It also enables us to provision almost everything: instances, networks, secrets, logs, certificates, domains, scaling, and much more.
We decided to build the tool using Terraform as it enables us to use infrastructure-as-code. In plain English, that means we can simply write the code, “hit play”, and everything is automatically deployed.
We’re already working on future iterations of Bootstrapper. In particular, we want to ensure our teams have significant flexibility to adapt to specific or unique situations. For example, the next iteration will make it easier for us to create multiple environments (production, development, QA, staging, etc.) from the same infrastructure-as-code configuration.
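The idea of one shared configuration producing several environments can be sketched in a few lines. The example below is written in Python purely for illustration; it is not the Bootstrapper's Terraform code, and every name and value in it (BASE_CONFIG, OVERRIDES, the instance types and counts) is a hypothetical assumption rather than something taken from the tool.

```python
from copy import deepcopy

# Hypothetical settings shared by every environment (illustrative values only).
BASE_CONFIG = {
    "instance_type": "t3.small",
    "instance_count": 1,
    "enable_alarms": True,
    "tags": {"managed_by": "bootstrapper"},
}

# Hypothetical per-environment overrides; anything not listed is inherited.
OVERRIDES = {
    "development": {},
    "qa": {},
    "staging": {"instance_count": 2},
    "production": {"instance_type": "m5.large", "instance_count": 3},
}


def render_environment(name):
    """Return the full settings for one environment: base plus its overrides."""
    config = deepcopy(BASE_CONFIG)      # never mutate the shared base
    config.update(OVERRIDES[name])      # apply environment-specific values
    config["tags"] = {**config["tags"], "environment": name}
    return config


def render_all():
    """Render every known environment from the same base configuration."""
    return {name: render_environment(name) for name in OVERRIDES}
```

Calling render_all() yields one settings dictionary per environment. The design point is that differences between environments live in a small overrides table, while everything else is defined once.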
Tools for developers
Here at Moove It, we focus on building quality software. As part of this, many of our teams take advantage of Sentry, which provides open-source error tracking and performance monitoring.
We have also created a Terraform module for error monitoring, which sets up CloudWatch alarms for any AWS-hosted system. These alarms provide notifications about aspects such as application downtime, 5XX HTTP errors, and resource utilization. Importantly, the module automates the creation of alarms on the infrastructure, which helps businesses act before their customers ever experience the system being down. Notifications are sent via email, Slack, or whichever channel the team prefers, so that the team can take preventive action.
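To make the alarm logic concrete, here is a minimal Python sketch of threshold-based alarms. It only illustrates the concept: it does not call CloudWatch or reproduce the Terraform module, and the rule names, metric names, and thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRule:
    """A hypothetical alarm: fire when a metric crosses a threshold."""
    name: str
    metric: str
    threshold: float
    direction: str = "above"  # "above": value > threshold, "below": value < threshold

    def is_breached(self, value):
        if self.direction == "above":
            return value > self.threshold
        return value < self.threshold


# Illustrative rules for the three cases mentioned in the text (assumed thresholds).
RULES = [
    AlarmRule("application-down", "healthy_hosts", 1, direction="below"),
    AlarmRule("http-5xx-errors", "http_5xx_count", 10),
    AlarmRule("high-cpu", "cpu_utilization_percent", 80),
]


def evaluate(metrics):
    """Return one notification message per breached rule; skip missing metrics."""
    messages = []
    for rule in RULES:
        value = metrics.get(rule.metric)
        if value is not None and rule.is_breached(value):
            messages.append(
                f"ALARM {rule.name}: {rule.metric}={value} (threshold {rule.threshold})"
            )
    return messages
```

In a real setup the messages returned by evaluate() would be forwarded to email or Slack. Here they are plain strings so the sketch stays self-contained.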
We also use GitHub Actions to implement continuous integration/continuous delivery (CI/CD) pipelines, which automate tasks such as running unit tests and linters on every merge, building and compiling images, deploying automatically, and much more.
How it works in practice: Bringing it all together for a client
One of Moove It’s clients came to us with three main repositories. The first functioned as the base service. This was used by the second, a microservices application with nine components. The third held various deployment tools. Enabling the images in the application involved hundreds of lines of sequential code, which clearly caused various problems for their development team.
We identified four main problems:
- Multiple manual processes being executed locally.
- Missing access control and roles for tasks and processes.
- No centralized logs and no execution history.
- No notifications for process results.
After starting work, we achieved the following key objectives:
- Moved tasks to their corresponding repository, removing unnecessary dependencies between them.
- Implemented build and push image automation.
- Reduced the build time via parallel execution and cache.
- Set up notifications for build and release process results, which also solved the issue of the missing execution history.
The tools that our SRE Studio has created, along with those it has evaluated, enable all teams at Moove It to provide a best-in-class service to our clients and manage their cloud environments effectively.
Interested in learning more about Moove It’s SRE Studio and the work it does? Then check out https://moove-it.com/studios/sre-studio