DevOps Pulse is an annual analysis of the DevOps industry, conducted by Logz.io. Findings from the 2019 version have just been published, and we take the opportunity to share and comment on them with Tomer Levy, Logz.io CEO. Levy co-founded Logz.io in 2014 and has managed to raise about $100 million and build a team of 250 people to date, on the premise of offering an open-source-based solution for log management.
The 2019 survey focuses on observability, bringing DevOps engineers’ insights into what it means to make systems observable, how to achieve that outcome, and how observability contributes to maximizing product performance. It may well give a glimpse into the future of software development in the 2020s. First software development merged with operations, giving us DevOps. Then DevOps became data-driven. What’s next?
Who does DevOps, and what are the issues?
The Logz.io press release highlights the growing popularity of observability, the strategic role it plays in DevOps processes, and the challenges preventing teams from achieving full observability into their systems. With about 1,000 people taking the survey, the majority of whom were not Logz.io customers, it should be reasonably close to being representative of the industry.
What also piqued our interest was the diversity of (self-identified) roles among the people who took the survey, ranging from managers and directors to developers, DevOps engineers, and system administrators. Last time we checked, the boundaries and the split of responsibility between some of those roles did not seem exactly clear. It looks like this is still the case.
As DevOps has become mainstream, R&D teams are sharing the responsibility for observability across multiple roles. DevOps teams are still largely responsible for ensuring observability, but developers and operations are not far behind. Or at least, this is how the Logz.io press release puts it. Our experience is a bit different.
We are not aware of that many organizations in which there are actually different, nonoverlapping (in terms of staff) DevOps, developer, and operations teams. More often than not, when people say DevOps, they actually mean the practice, not the team, because in many organizations there is no such thing as a dedicated DevOps team. Levy clarified that they let respondents characterize their role in any way they want:
“Many respondents call their role ‘DevOps’ even though we know that for many companies, DevOps means a methodology. But in addition to that, there is often a team assigned to ‘own’ it. Yes, from our experience, DevOps/SREs/Platform teams are usually the main owners of the observability systems because they administrate them and set up the guidelines.
The various engineering teams who develop features and capabilities usually use them to achieve visibility into their micro-services/apps but do not own the overall availability KPIs.
Operations teams are traditionally more ‘system admin’ oriented and are usually part of IT (not always). So as architectures become more programmatic with tools like Terraform and practices like SREs, there may be less of a need for manual operations. This also raises the conflict between engineering and IT, if Operations is more often than not part of IT and SREs/DevOps are part of engineering. This increases the centricity of developers who practice DevOps.”
But regardless of who does it, what is actually happening in DevOps today? As production environments become more distributed and ephemeral, it’s increasingly difficult for engineering teams to understand systems’ availability and performance. Despite the proliferation of monitoring tools, obtaining real-time visibility into production systems has never been more challenging, noted Logz.io VP of Product, Asaf Yigal. Here are some highlights from the report in support of this view:
- Tool sprawl is a significant and widespread issue for software engineers. Sixty-three percent of DevOps Pulse respondents report using more than one observability tool, while close to 14% use five or more.
- Logging is critical for observability. Over 73% reported using log management and analysis tools to gain observability. Infrastructure monitoring and alerting took second place, both at about 40%.
- Distributed Tracing has yet to be fully adopted. Sixty-six percent of DevOps Pulse respondents do not use distributed tracing tools, but Jaeger is the most popular tool among those who have adopted this technology.
DevOps in the cloud and machine learning era
We would say the above findings are not surprising. They confirm what we intuitively expected, and what we see in the field. The same applies to another key finding — the dominance of open source solutions. In observability, too, just like in databases, open source is winning big time. Open-source observability stacks are largely preferred over their proprietary counterparts. ELK is the most popular logging tool, Grafana is the most popular metrics tool, and Jaeger is the most popular distributed tracing tool.
This seems like a pretty accurate depiction of today’s landscape. What about tomorrow? Machine learning and the cloud are two of the most significant trends for the 2020s, and yes, you guessed it, they leave their mark on observability, too.
Machine learning is gaining momentum as an observability solution. Almost 40% of DevOps Pulse participants use or are considering machine learning solutions to improve observability. The idea is not new; we’ve seen it being used in production by independent vendors for at least the last two to three years.
The Googles, Microsofts, and Amazons of the world have been doing this for much longer. But 40% of the mainstream audience is a sure sign: It’s not just cloud behemoths and early adopters anymore — machine learning is taking over DevOps, too.
We did say cloud. So what does the new reality of applications moving en masse to the cloud mean for DevOps and observability? It means trouble. Let’s take serverless architecture, which is the new fancy toy for development teams everywhere, and one that cloud vendors are actively promoting. Serverless does not really remove the need for servers, but it obfuscates it. And that is exactly the problem, for observability and beyond.
Yes, servers are nasty beasts, troublesome to configure, spin up, manage and monitor. Most developers would be happy if they never had to deal with them again; if they could just write their code and see it being incarnated and executed out of thin air, for all they care. Serverless does exactly that.
But by taking away the notion of a server, and turning code into a loose collection of free-floating functions, serverless also takes away structure and observability. Survey respondents rank serverless as the biggest technical obstacle to observability: despite more than 40% of them adopting serverless, 47% say serverless technology presents the most challenges for obtaining observability. Levy concurs:
“Yes, we agree serverless makes obtaining visibility challenging, though we were surprised by the magnitude of it. We estimate that the main reason serverless is hard is due to the fact that it is so highly distributed. If you think about microservices, you may have 20, 50, or even 200. With serverless, you may have hundreds of different functions which makes connecting the dots even harder.”
It’s an AWS world, we just live in it
So what are DevOps people to do? Maybe they should simply not use technologies that are not observability-enabled? Once again, it’s open source to the rescue. Open source is the ideal solution for tackling these problems, says Levy:
“When relying on open-source connectors, parsers, and dashboards, the developer community can quickly customize the observability for every new AWS service out there. For example, if AWS just released ~200 new services and features in the last few months, no single vendor can adapt to this so quickly. So if I am a developer and I want to use a new AWS service in production, it is ideal to simply create Grafana and Kibana dashboards in order to obtain visibility into these services.
At Logz.io, we have built many of these open-source connectors and have integrated many user-built parsers to make it easier to observe new services. In addition, our ELK apps enable our users to contribute and download dashboards from a broad community of users.”
Once again, we can’t help but notice the relationship between cloud vendors and open source. AWS promotes serverless, causing issues in observability. Then the community steps in to remedy those issues. Of course, if the community did not do that, there’s a good chance AWS would charge its users for those connectors and parsers.
There’s also a good chance, given AWS’s product development philosophy and track record of shipping loosely connected products, that those issues would not be resolved for a while. So what happens now is that instead of waiting for AWS to do this, the community does it for free, and everyone else, including AWS, benefits.
We don’t really know if the community is happier that way. What we do know is that it’s a very convenient business model for AWS. No surprise AWS is calling the shots in DevOps, too. It’s an AWS world, we just live in it.