Tracking and Controlling Microservice Dependencies

Dependency cycles will be familiar to you if you have ever locked your keys inside your house or car. You can’t open the lock without the key, but you can’t get the key without opening the lock. Some cycles are obvious, but more complex dependency cycles can be challenging to find before they lead to outages. Strategies for tracking and controlling dependencies are necessary for maintaining reliable systems.

Reasons to Manage Dependencies

A lockout, as in the story of the cyclic coffee shop, is just one way that dependency management has critical implications for reliability. You can’t reason about the behavior of any system, or guarantee its performance characteristics, without knowing what other systems it depends on. Without knowing how services are interlinked, you can’t understand the effects of extra latency in one part of the system, or how outages will propagate. How else does dependency management affect reliability?

SLO

No service can be more reliable than its critical dependencies [8]. If dependencies are not managed, a service with a strict SLO (service-level objective) [1] might depend on a back end that is considered best-effort. …
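To make the SLO point concrete, here is a minimal sketch in Python. The service names, availability targets, and dependency edges are all hypothetical. It also assumes that failures are independent and that every critical dependency is needed to serve each request, which real systems may not satisfy.

```python
from math import prod

# Hypothetical availability targets; None marks a best-effort service with no SLO.
SLO_TARGETS = {"checkout": 0.999, "payments": 0.9995, "recommendations": None}

# Hypothetical critical-dependency edges: service -> services it cannot work without.
CRITICAL_DEPS = {
    "checkout": ["payments", "recommendations"],
    "payments": [],
    "recommendations": [],
}


def availability_upper_bound(own, dep_availabilities):
    """Best-case availability of a service given its critical dependencies.

    Assumes independent failures and that every dependency is required
    for each request, so the availabilities multiply.
    """
    return own * prod(dep_availabilities)


def slo_mismatches(targets, deps):
    """List (service, dependency) pairs where the dependency promises less
    than the service that relies on it, or promises nothing at all."""
    mismatches = []
    for service, needed in deps.items():
        target = targets.get(service)
        if target is None:
            continue  # a best-effort service makes no promise to break
        for dep in needed:
            dep_target = targets.get(dep)
            if dep_target is None or dep_target < target:
                mismatches.append((service, dep))
    return mismatches


if __name__ == "__main__":
    print(availability_upper_bound(0.999, [0.99]))
    print(slo_mismatches(SLO_TARGETS, CRITICAL_DEPS))
```

In this made-up data, the check flags checkout's reliance on the best-effort recommendations service. A real audit would read targets and edges from whatever service catalog the organization maintains.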

After a disaster, it may be necessary to start up all of a company’s infrastructure without having anything already running. Cyclic dependencies can make this impossible: a front-end service may depend on a back end, but the back-end service could have been modified over time to depend on the front end. As systems grow more complex over time, the risk of this happening increases. Isolated bootstrap environments can also provide a robust QA environment.
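One way to catch the front-end/back-end cycle described above before it blocks a cold start is a depth-first search over a dependency graph. The sketch below is illustrative only. It assumes the graph is already available as a plain dictionary, and the service names are hypothetical.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of services, or None.

    graph maps each service to the services it depends on. The returned
    list starts and ends with the same service, e.g. ["a", "b", "a"].
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # services on the current depth-first search path

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in graph.get(node, ()):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical graph: the back end was later modified to call the front end.
    print(find_cycle({"frontend": ["backend"], "backend": ["frontend"]}))
```

A check like this could run whenever a new dependency edge is declared. It only sees edges that are actually recorded, so undeclared runtime dependencies would still slip through.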

Security

In networks with a perimeter-security model, access to one system may imply unfettered access to others [9]. If an attacker compromises one system, the other systems that depend on it may also be at risk. Understanding how systems are interconnected is crucial for detecting and limiting the scope of damage. You may also think about dependencies when deploying DoS (denial of service) protection: one system that is resilient to extra load may send requests downstream to others that are less prepared.
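As a rough illustration of scoping the damage, the following Python sketch walks a dependency graph in reverse to list every system that directly or indirectly depends on a compromised one. The graph and service names are hypothetical, and the sketch assumes the recorded edges are complete.

```python
from collections import defaultdict, deque


def transitive_dependents(graph, compromised):
    """Return the set of services that depend, directly or indirectly,
    on the compromised service.

    graph maps each service to the services it depends on.
    """
    # Invert the edges: for each service, who depends on it?
    reverse = defaultdict(set)
    for service, deps in graph.items():
        for dep in deps:
            reverse[dep].add(service)

    # Breadth-first search over the reversed edges.
    affected = set()
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent != compromised and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # Hypothetical dependency graph.
    graph = {
        "web": ["auth", "catalog"],
        "auth": ["db"],
        "catalog": ["db"],
        "db": [],
        "batch": [],
    }
    print(sorted(transitive_dependents(graph, "db")))
```

Running the same traversal over the forward edges instead would show which downstream systems receive load that a DoS-resilient front end passes along.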

Read more at ACM Queue