At the beginning of March 2017, Amazon Web Services had an outage. This time it was a failure in their S3 file storage service in a particular geographical region. On a micro level, this is just what happens from time to time if you’re an IT provider. Mistakes are made or technical malfunctions happen. It’s just a part of supplying IT systems. Most of the time the customer doesn’t even notice, but sometimes they do. An issue comes up, the organisation fixes it, says sorry to its customers, changes things so it won’t happen again, and hums along until the next malfunction occurs (which it will).
The bigger problem is that half of the internet breaks when a big IT supplier has a malfunction. Only a small portion of startups and consumer services have (automatic) fail-over systems; most simply depend on a single cloud vendor to keep their business running.
This holds for any system at scale: as it grows, it becomes more complex to maintain, slower to change, and harder to keep running. On top of that, the consumers of such a system come to depend on it too heavily, so when it breaks, everything breaks.
The luxury of high-level cloud services like AWS S3 is that you can start a business, product, or service without knowing much about running the underlying infrastructure. Over the past decade, this has given rise to numerous awesome startups and products that wouldn’t have existed without cloud services.
But we want to prevent the market from depending on only a few suppliers, for technical and availability reasons, but also for legal, data protection, and monopoly reasons. We don’t want half of the internet to fail when one vendor messes up, or gets bought by an incompetent larger party.
To prevent more and more people from being dependent on a single cloud supplier, I think we must create tools and teach developers to run their own infrastructure. Or at least, they need to set up infrastructure that can easily be moved to another vendor rather than being tied to one. With easy-to-use tools and better training, we can make the people who build products on top of these services much more independent of single-point-of-failure outages at large cloud providers.
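To make the idea of vendor-independent infrastructure a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `BlobStore`, `InMemoryStore`, and `FailoverStore` are hypothetical names invented for this example, and the in-memory class merely stands in for a real vendor's storage service. The point is that the application talks to a small interface of its own, so a vendor can be swapped or doubled up without touching application code.

```python
from __future__ import annotations

from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one vendor's storage (assumption for this sketch).

    A real adapter would wrap that vendor's SDK behind the same two methods.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True  # set to False to simulate an outage
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._blobs[key]


class FailoverStore:
    """Writes to every backend and reads from the first one that answers."""

    def __init__(self, backends: list[BlobStore]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = backends

    def put(self, key: str, data: bytes) -> None:
        failures = 0
        for backend in self._backends:
            try:
                backend.put(key, data)
            except ConnectionError:
                failures += 1
        # Only give up when no backend at all accepted the write.
        if failures == len(self._backends):
            raise ConnectionError("no backend accepted the write")

    def get(self, key: str) -> bytes:
        for backend in self._backends:
            try:
                return backend.get(key)
            except (ConnectionError, KeyError):
                continue  # try the next vendor
        raise ConnectionError("no backend could serve the read")


if __name__ == "__main__":
    vendor_a = InMemoryStore("vendor-a")
    vendor_b = InMemoryStore("vendor-b")
    store = FailoverStore([vendor_a, vendor_b])

    store.put("logo.png", b"image bytes")
    vendor_a.available = False  # simulate an outage at the first vendor
    print(store.get("logo.png"))  # still served, now by vendor-b
```

This sketch deliberately ignores the hard parts, such as reconciling writes that one vendor missed during its outage, and it doubles storage cost. It only shows that keeping vendor-specific code behind a thin interface is what makes fail-over, or a full move to another vendor, a contained change.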