7 Reliability Questions Engineering Managers Need to Ask Their Teams

289

Modern software teams face no shortage of edge cases and variations across the service categories and tiers of their ever-evolving architectures. In the midst of leading a team through the day-to-day firefighting, it can be difficult to see the forest for the trees. But as managers, we know our teams face similar trials: defects and regressions, capacity problems, operational debt and dangerous workloads affect all of us.

And then there is the complexity of scale, something we know about first hand. The New Relic platform includes more than 300 unique services and petabytes of SSD storage that handle at least 40 million HTTP requests, write 1.5 billion new data points, and process trillions of events … every minute. The platform is maintained by more than 50 agile teams performing multiple production releases a week. To cope with serious scale like this, engineering teams must be nimble and fast moving. Their managers must also ensure that their teams adhere to reliability processes that support this kind of complexity and scale.

Read more at The New Stack