Platform engineering vs DevOps & SRE



Is platform engineering DevOps or SRE? Are they all the same thing? What is the difference?

You have probably heard all of these terms mentioned a lot recently and wondered what they mean and how they differ.

To break this down as simply as possible for someone who is new to these terms, let's start at the beginning. In the past we had on-premises infrastructure: servers, networking, storage and other equipment housed in a local or colocation datacenter. This was the core infrastructure that provided access to business applications used internally and externally. The servers were generally physical machines running the operating system and applications locally, and the applications were generally monolithic and isolated. We then moved to virtual machines running on hypervisors, which let many machines share one physical box. Virtualization was born. The operations team was responsible for managing this infrastructure, while the development team focused on writing the applications hosted on it.

As the cloud became the next generation, we started migrating workloads to cloud providers and sharing the operation of the infrastructure with cloud companies. Applications started being hosted within cloud datacenters and consumed as IaaS, PaaS or SaaS, internally or externally. Next we started using containers and container orchestration tools like Kubernetes to abstract and manage applications and infrastructure more effectively. Monolithic applications were broken up into microservices, which made deployment and management smoother. Google originally developed Kubernetes and open sourced it, and it is now the de facto standard for container orchestration.

SRE

The term SRE (Site Reliability Engineering) was created at Google to help maintain the reliability of these kinds of large-scale, container and cloud-based workloads. The SRE discipline was invented to manage, automate, reduce toil, monitor and ultimately increase reliability across infrastructure and apps, using SLIs, SLOs, error budgets and postmortems to improve the end user experience. SRE was Google's internal approach to DevOps, which is itself a methodology for bringing Development and Operations teams together to work towards a common goal. Previously, operations focused on the actual infrastructure and development focused on coding. SRE bridged this gap and started to be adopted as a new culture within organizations.
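To make SLIs, SLOs and error budgets a little more concrete, here is a minimal sketch of the arithmetic behind them. It assumes a simple request-based availability SLI (good requests divided by total requests). The function name `error_budget` and the numbers in the example are hypothetical and chosen only for illustration; real SRE tooling typically measures over rolling time windows and is more involved.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize an availability SLO for one measurement window.

    slo_target      -- target success ratio, e.g. 0.999 for "three nines"
    total_requests  -- number of requests served in the window
    failed_requests -- number of those requests that failed
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
    report = error_budget(0.999, 1_000_000, 400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this hypothetical window the SLO tolerates roughly 1,000 failed requests, so 400 failures leave about 60% of the budget unspent. A team could use that remaining budget as a signal for how much release risk it can still take on before prioritizing reliability work.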

DevOps

The term DevOps was, and still is, interpreted very differently by different organizations and individuals. Some understand it to be focused on development and coding work, others on infrastructure management. Some organizations treat it as a set of job titles and tooling, using various automation or software delivery tools to improve the flow and testing of software from development to production. It is probably best viewed as a cultural philosophy in which operations and development teams manage software delivery together. Tooling is very important, but it is interchangeable and depends on the specific use case and the requirements of the project at hand. DevOps relates to most modern software delivery practices and shares various aspects with SRE, such as reducing toil, automation, postmortems, and the monitoring and observability of systems. DevOps can also help to reduce finger pointing and blame between ops and dev, in line with the SRE principles.

Platform engineering

Platform engineering can probably be best explained as the next evolution of DevOps. It is focused on improving the velocity of development and making it easier for development teams to get their code working as quickly and reliably as possible. This can be achieved, for example, by automating infrastructure deployment through developer-centric self-service portals or infrastructure as code. Platform engineering focuses on the reliable, uniform deployment of resources as quickly and efficiently as possible. Monitoring, alerting and observability are also very important for making sure that applications are running correctly. This can make a big difference in a more agile approach to development, which introduces constant iterations. By using CI/CD and other methods we can automate the deployment of updates and new features reliably and at velocity.
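The idea behind infrastructure as code can be sketched in a few lines: the desired resources are declared as data, compared with what currently exists, and turned into a plan of changes. The example below is a simplified illustration only and is not the API of any real tool; the `plan` function, the resource names and their settings are all hypothetical.

```python
def plan(desired, actual):
    """Return the (action, resource_name) steps that turn `actual` into `desired`.

    Both arguments map a resource name to a dict of its settings.
    """
    actions = []

    # Create anything that is declared but missing, and update anything
    # whose current settings differ from the declaration.
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))

    # Delete anything that exists but is no longer declared.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))

    return actions


if __name__ == "__main__":
    # Hypothetical declared state, e.g. as stored in version control.
    desired = {
        "web-service": {"replicas": 3},
        "orders-db": {"size": "small"},
    }
    # Hypothetical state currently running in the environment.
    actual = {
        "web-service": {"replicas": 2},
        "old-cache": {"size": "small"},
    }
    for action, name in plan(desired, actual):
        print(action, name)
```

Because the declaration is plain data, it can be reviewed and versioned like any other code, and the same comparison can be run automatically from a CI/CD pipeline or behind a self-service portal so that every team gets the same uniform result.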

Platform engineering future

At its essence, platform engineering takes DevOps to a new level where cloud computing, development, infrastructure deployment and operations work hand in hand and focus on the automation and reliability of services. It can also start to introduce DevSecOps to include security monitoring and compliance. Once again, platform engineering will probably be applied differently by various organizations based on their interpretation of the term, their culture and their methodologies. In the end, it is a great new method to adopt gradually, focusing on continual improvement, learning from incidents, and improving the reliability of systems as a whole.

Here are some great articles with further in-depth information on platform engineering, DevOps and SRE:

https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering

https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering

https://cloud.google.com/blog/products/application-development/how-to-become-a-platform-engineer

https://dora.dev/devops-capabilities/

https://sre.google

https://sre.google/sre-book

https://sre.google/sre-book/eliminating-toil/

https://cloud.google.com/blog/products/devops-sre/systems-engineering-learning-resources-to-become-an-sre

https://cloud.google.com/blog/products/devops-sre/sre-fundamentals-slis-slas-and-slos
