Distributed Systems & team size

In 2011, a million requests per minute against a central inventory system would have been considered a lot, and it would have been a remarkable system to have built in Spring. I was working as a senior consultant, brought in as a replacement Java architect to cover for a departing one. My job was to comprehend the departing architect’s design, evolve it as we discovered implementation problems, and head off new challenges by altering the prescribed design as the situation evolved. At 12 members with half a dozen stakeholders, it would be the largest team I’d managed in my career.

Team Sizes and Coordination Costs

A team of that size means that, in practice, I have to start breaking it into sub-teams. That is fine if I know my delegates well and trust they’ll follow through on our shared vision. The issue becomes the coordination cost between group members, which is effectively the same problem distributed systems have.
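One rough way to put a number on that coordination cost is to count pairwise communication channels, n(n-1)/2 for n people. This is my own simplification rather than a measurement from the project. The sketch below assumes one lead per sub-team, that only leads talk across sub-teams, and an even three-way split of the 12 people; the function names are hypothetical.

```python
def channels(n: int) -> int:
    """Pairwise communication channels among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def split_cost(team_sizes: list[int]) -> int:
    """Channels when members talk inside their sub-team and only leads talk across.

    Assumes one lead per sub-team, already counted as a member of that sub-team.
    """
    internal = sum(channels(size) for size in team_sizes)
    between_leads = channels(len(team_sizes))
    return internal + between_leads


flat = channels(12)            # one flat team of 12
split = split_cost([4, 4, 4])  # three sub-teams of four, leads coordinate

print(flat, split)
```

Under those assumptions the flat team has 66 channels and the split team has 21, which is the same trade a distributed system makes when it routes cross-node traffic through a few coordinators.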

CAP Theorem and Human Teams

In a 2011 workplace, a common way to avoid the CAP problem was to co-locate the teams in a single large open space. This avoided partitioning by physically putting everyone together. Putting everyone on the same mandatory work schedule avoided availability problems. But what addresses consistency, in all the different ways a team of developers has to remain consistent? In simple human terms: technical leadership and management.

And then COVID-19

Consensus Algorithms in human terms

In these classic techniques, distributed consensus is handled the way I resolved my large developer team problems: by appointing (officially or organically) local leaders who have higher authority over sub-domains of the unified problem domain. In human terms this means a database team, a model team, a service team, a UI/UX team, and so on. When an issue cuts across teams, you have to up-level the discussion and appeal either to a group consensus arrived at by good-faith argument and voting, or to a higher authority.

I’ve been calling the last-ditch “appeal to higher authority” gambit appealing to a jurisdiction. That’s because when we look at law as a technology, it has the same basic problems any human-centric technology has. Human court systems need appellate courts, as well as the ability to bend or break the rules when a situation demands it. So too will a computerized system have out-of-bounds situations that a central original designer simply won’t be able to anticipate.
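The escalation path described in this section can be sketched in a few lines. This is only an illustration under assumptions I am adding: one leader per sub-domain, a strict majority vote among the affected leaders, and a tie treated as the trigger for an appeal. The names `LEADERS`, `Issue`, and `resolve` are hypothetical, and this is not a real consensus protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    title: str
    domains: frozenset[str]  # sub-domains the issue touches


# Hypothetical local leaders, one per sub-domain.
LEADERS = {"database": "dana", "model": "moe", "service": "sam", "ui": "uma"}


def resolve(issue: Issue, votes: dict[str, bool], higher_authority: bool) -> tuple[str, bool]:
    """Return (who decided, decision) for an issue.

    votes maps a leader's name to approve (True) or reject (False).
    higher_authority is the ruling used only when the leaders deadlock.
    """
    affected = sorted(LEADERS[d] for d in issue.domains)

    # Inside a single jurisdiction the local leader simply decides.
    if len(affected) == 1:
        leader = affected[0]
        return leader, votes[leader]

    # Cross-cutting issue: look for a strict majority among affected leaders.
    approvals = sum(votes[name] for name in affected)
    rejections = len(affected) - approvals
    if approvals != rejections:
        return "vote", approvals > rejections

    # Deadlock: appeal to the higher authority.
    return "appeal", higher_authority


votes = {"dana": True, "moe": False, "sam": True, "uma": False}
local = resolve(Issue("add index", frozenset({"database"})), votes, False)
tied = resolve(Issue("rename field", frozenset({"database", "model"})), votes, True)
print(local, tied)
```

The point of the sketch is the ordering: local authority first, peer consensus second, and the appeal only when the first two cannot settle the question.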

CI/CD as a distributed consensus system

Automated Technical Leadership

In this spirit, I’ve tried creating pressure on teams to treat passing CI/CD as a kind of automated specification review. The idea is to codify which people need to be gate-keepers on what changes, and to put guidelines like test coverage, documentation standards, formatting, and specifications into the pre-build and test systems for a development team.
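As a minimal sketch of what codifying gate-keepers and guidelines might look like, the check below maps path patterns to required approvers and enforces a coverage floor. Everything here is an assumption for illustration: the `GATEKEEPERS` table, the 80% threshold, and the function names are hypothetical, and a real pipeline would more likely use the hosting platform's own code-owner and status-check features.

```python
from fnmatch import fnmatch

# Hypothetical jurisdictions: path pattern -> people who must approve changes there.
GATEKEEPERS = {
    "db/*": {"dana"},
    "services/*": {"sam"},
    "ui/*": {"uma"},
}

MIN_COVERAGE = 80.0  # assumed threshold, in percent


def required_approvers(changed_files):
    """Collect every gate-keeper whose pattern matches a changed file."""
    required = set()
    for path in changed_files:
        for pattern, owners in GATEKEEPERS.items():
            if fnmatch(path, pattern):
                required |= owners
    return required


def review_gate(changed_files, approvals, coverage):
    """Return the reasons a change is blocked; an empty list means it may merge."""
    problems = []
    missing = required_approvers(changed_files) - set(approvals)
    if missing:
        problems.append("missing approval from: " + ", ".join(sorted(missing)))
    if coverage < MIN_COVERAGE:
        problems.append(f"coverage {coverage:.1f}% is below {MIN_COVERAGE:.1f}%")
    return problems


print(review_gate(["db/migrations/001.sql", "ui/app.js"], ["dana"], 85.0))
```

Here the change touches both the database and UI jurisdictions but only the database gate-keeper has approved, so the gate reports the missing UI approval without anyone having to read the diff first.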

A lot of managers hear this and think: “that’s QE/QA work.” While that would not be completely wrong, it also misses the point. This early “quality” test can’t possibly replace real Quality Engineering work. For one, the pre-build quality we’re testing here sits at a completely different level of abstraction. This kind of work attempts to surface inconsistencies in developer practice and understanding so that they can be found without having to scrutinize every individual contribution.

A full Quality Engineering (QE) solution would not necessarily cover issues internal to the semantics of the developer’s practice. Instead, this separate and powerful practice is involved in isolating software system behaviors in increasingly accurate simulations of the production environment. QE is interested in identifying mismatches between the code produced and the implicit requirements actually imposed on it by the real-world operating environment. This is subtly related to, but independent from, the vision, understanding, and execution of how the software achieves its goals.

Developer Consensus in Distributed Development

The formula for succeeding at this distributed development endeavor is, unsurprisingly, to provide a clear vision and clean boundaries, with jurisdictions and appeals. It’s not as clean as central control, but it is more human. And the human factors are more important in the long run.

Cloud Software and Security R&D