Because a microservice is by nature small, the biggest complexity lies not within the service itself but in how it interacts with others. This makes the [[Testing honeycomb]] a more appropriate metaphor than the traditional [[Testing pyramid]] for deciding what tests to write for a microservice-based system. The same approach applies to [[Serverless testing MOC]]; in particular, you should [[Use an integration test-first approach to testing serverless applications]].

---

## References

- [Testing of Microservices (Spotify Engineering)](https://engineering.atspotify.com/2018/01/11/testing-of-microservices/)
- [Testing Microservices, the sane way](https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16) by Cindy Sridharan
    - Highlights
        - trying to spin up the full stack on developer laptops is fundamentally the wrong mindset to begin with, be it at startups or at bigger companies.
        - To this end they wind up investing excessive engineering resources to build out complex CI pipelines and intricate local development environments to meet these goals. Soon enough, sustaining such an elaborate setup ends up requiring a team in its own right to build, maintain, troubleshoot and evolve the infrastructure. Bigger companies can afford this level of sophistication, but for the rest of us treating ==testing as what it really is — a best effort verification of the system — and making smart bets and tradeoffs given our needs appears to be the best way forward.==
        - Development teams are now responsible for testing as well as operating the services they author. This new model is something I find incredibly powerful since it truly allows development teams to think about the scope, goal, tradeoffs and payoffs of the entire spectrum of testing in a manner that’s realistic as well as sustainable. In order to be able to craft a holistic strategy for understanding how our services function and gain confidence in their correctness, it becomes salient to be able to pick and choose the right subset of testing techniques given the availability, reliability and correctness requirements of the service.
        - With sufficiently advanced monitoring & enough scale, it’s a realistic strategy to write code, push it to prod, & watch the error rates. If something in another part of the app breaks, it’ll be apparent very quickly in your error rates. You can either fix or roll back. You’re basically letting your monitoring system play the role that a regression suite & continuous integration play on other teams.
        - If there’s anything I’ve learned in the last few years of witnessing how services fail, it’s that pre-production testing is a best effort verification of a small subset of the guarantees of a system and often can prove to be grossly insufficient for long running systems with protean traffic patterns.
        - Pre-production testing is something ingrained in software engineers from the very beginning of their careers whereas the idea of experimenting with live traffic is either seen as the preserve of Operations engineers or is something that’s met with alarm and/or FUD.
        - Pushing regression testing to post-production monitoring requires not just a change in mindset and a certain appetite for risk, but more importantly an overhaul in system design along with a solid investment in good release engineering practices and tooling. In other words, it involves not just architecting for failure, but in essence, coding for failure when the heretofore default was coding for success. And that’s a notion, I’d wager, a substantial number of developers aren’t too comfortable with.
        - Distributed systems are pathologically unpredictable and it’s impossible to envisage the combinatorial number of quagmires various parts of the system might end up in. The sooner we come to terms with the fact that it’s a fool’s errand to try to predict every possible way in which a service might be exercised and write a regression test case for it, the sooner we’re likely to embrace a less dysfunctional approach to testing.
        - In the past I’ve argued that “monitoring everything” is an anti-pattern. I feel the same philosophy can be extended to testing as well. One simply cannot — and as such should not attempt to — test everything.
        - Extreme reliability comes at a cost: maximizing stability limits how fast new features can be developed and how quickly products can be delivered to users, and dramatically increases their cost, which in turn reduces the number of features a team can afford to offer.
        - Our goal is to explicitly align the risk taken by a given service with the risk the business is willing to bear. We strive to make a service reliable enough, but no more reliable than it needs to be.
        - ==What does it mean for a developer to “code accordingly”? In my opinion, this boils down to three things:==
            - ==understanding the operational semantics of the application==
            - ==understanding the operational characteristics of the dependencies==
            - ==writing code that’s debuggable==
        - ==four axes — the goal, scope, tradeoffs and payoffs — can prove to be a good proxy for being able to assess how effective any form of testing might be.==
        - the best approach for making peace when working with abstractions — even the leakiest ones — is to (grudgingly) repose trust in the promised contract. While hardly ideal, this tradeoff makes most sense to me in respect of getting anything shipped at all.
        - the prevalent best-practice of testing such systems is by treating the underlying I/O not as an integral part of the unit under test but as a nuisance that needs to be warded away with mocks, to the point where all unit-testing these days has become synonymous with heavy mock usage.
        - Unit testing such service-critical I/O with mocks inherently embodies a sellout since it not just sacrifices accuracy at the altar of speed, but also ends up shaping our mental model in a way that’s almost entirely dissonant with the actual characteristics of the system we’re building. In fact, ==I’d go so far as to say that unit testing with mocks (we might as well call this mock testing), for the most part, is tantamount to validating our incomplete (and also very possibly flawed) mental model of the most business critical components of the systems we’re authoring and serves as one of the most insidious forms of confirmation bias.== (see the mock-testing sketch after this list)
        - The biggest weakness of mocks when used as a testing tool is that both when simulating success as well as failure, mocks are a programmer’s mirage of a sliver of a future they cannot fathomably appreciate or even approximate in their minds during feature development time.
        - All things considered, test doubles have their place in the testing spectrum. But they aren’t the only means of performing unit tests and in my opinion work best when used sparingly.
        - The main thrust of my argument wasn’t that unit testing is completely obviated by end-to-end tests, but that being able to correctly identify the “unit” under test might mean accepting that the unit test might resemble what’s traditionally perceived as “integration testing” involving communication over a network. (see the over-the-wire sketch after this list)
        - Given how broad a spectrum testing is, there’s really no One True Way of doing it right. Any approach is going to involve making compromises and tradeoffs.
        - Ultimately, every individual team is the expert given the specific context and needs.
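
To make the mock-testing critique concrete, here is a minimal sketch of the pattern being criticised. The `PaymentClient` and `checkout` names are hypothetical, not from the article: the test stubs out the service-critical I/O, so it passes as long as the response matches the shape we *believe* the downstream service returns.

```python
from unittest import mock


class PaymentClient:
    """Thin wrapper around a remote payments service (hypothetical)."""

    def charge(self, user_id: str, cents: int) -> dict:
        raise NotImplementedError("in production this is an HTTP call")


def checkout(client: PaymentClient, user_id: str, cents: int) -> str:
    response = client.charge(user_id, cents)
    return "ok" if response.get("status") == "succeeded" else "failed"


def test_checkout_with_mock():
    # The mock returns whatever response shape we believe the payments
    # service produces; the real wire contract, serialization, timeouts and
    # failure modes are never exercised, which is the confirmation-bias trap
    # described in the highlights above.
    client = mock.create_autospec(PaymentClient, instance=True)
    client.charge.return_value = {"status": "succeeded"}
    assert checkout(client, "user-1", 500) == "ok"
```

If the real service renames the `status` field or starts failing under load, this test keeps passing, which is exactly the sense in which it only validates our mental model.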
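
And a sketch of the “unit test that resembles integration testing” idea: the same checkout path is exercised over real HTTP against a stub server started in-process, so connection setup, serialization and status handling are part of the unit under test. The endpoint path and response shape are assumptions for illustration only.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubPaymentsHandler(BaseHTTPRequestHandler):
    """In-process stand-in for the payments service, speaking real HTTP."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        body = json.dumps({"status": "succeeded"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def charge_over_http(base_url: str, user_id: str, cents: int) -> dict:
    request = urllib.request.Request(
        f"{base_url}/charges",
        data=json.dumps({"user_id": user_id, "cents": cents}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def test_checkout_over_the_wire():
    server = HTTPServer(("127.0.0.1", 0), StubPaymentsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        # Serialization, HTTP semantics and connection handling are exercised
        # for real instead of being stubbed out of existence.
        assert charge_over_http(base_url, "user-1", 500)["status"] == "succeeded"
    finally:
        server.shutdown()
```

The same shape of test extends to a containerized dependency or a shared test environment; the point is that the network boundary is treated as an integral part of the unit rather than mocked away.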