Diversity · Society

The Microsoft/Crowdstrike Disaster and the Fragility of Modern Society

One of the biggest news stories last week was the Crowdstrike software update disaster that led to many millions of Windows computers crashing. The solution was a manual intervention, meaning that millions of working hours were wasted. This affected critical infrastructure, such as airports, as these are the kind of machines companies prefer to have extended protection. This is precisely where the aforementioned company Crowdstrike comes in. On a completely unrelated note, I first heard of Crowdstrike in the context of an investment by the greatest investor of our time, Nancy Pelosi, a woman so good at investing that she managed to accumulated over $100 million dollars.

Upon reading about the Crowdstrike disaster, I could not help but think of my time at Pajeet Soft. While it is the case that key leaders within Engineering at Crowdstrike are, er, of the brown persuasion, I surely do not want to make the utterly outlandish claim that the underlying problem is diversity hiring. Most certainly, big companies like Microsoft and John Deere (thanks for the links, SMH!) laying off their DIE teams and seemingly washing their hands off this ideology is a grave error that these companies will bitterly reject in times to come.

If you read up on the source of the error, it is obvious that it could have happened to absolutely anyone. While there are well-established best-practices for testing and deploying software, these surely only hold back engineering geniuses hailing from the third world. Crowdstrike deployed their update to all machines at once, instead of doing a so-called staged rollout where you start with a few machines, ideally uncritical ones. Normally, there are explicit “staging environments” for that. Maybe incompetent white engineers need such guardrails. In the case of Crowdstrike, it was simply an unfortunate accident that says absolutely nothing about the quality of the work this company produces.

Modern society is incredibly complex. This is most obvious if you work in a technical field. I do not even want to put software engineering at the top of the list, not by a wide margin. When I look at the complexity of a nuclear power plant, or the national energy grid, then shuffling data around via the Internet is quite trivial. The actual physical structure of the Internet is not, not by any means. In software engineering, however, there is the interesting property that it is quite easy to hide problems. If you had blackouts every day, you would notice. However, you would not necessarily notice if your favorite app or website has been kept alive by duct tape.

Let me give you an example: imagine your server is only able to handle 1,000 concurrent users. Normally, this is not a problem for your app but sometimes there are peak hours where you have to deal with higher loads. One way of dealing with this problem, instead of improving your software, is to shed users. If someone has been inactive for a certain amount of time, just cut them off. Put a pop-up window on their screen telling them that due to inactivity, the connection has been closed. Perhaps you find other workarounds, such as faking that your app is more responsive than it really is. Microsoft famously does this in Windows. You can log in before the system is fully usable as the operating system keeps loading key applications in the background.

No matter how many workarounds you find, you can easily end up in a situation in which your app will simply no longer be usable. Perhaps usability is fine but your engineering team no longer gets anything done. For instance, once code is too messy, it becomes unmaintainable and then you have a real problem. You may also have HR and a bunch of DIE grifters sitting around to make sure you waste your time on b.s., for instance by telling you that you have to rename your “master” branch to “main”. You can kick the can down the road for a quarter, a year, perhaps even multiple years, but eventually your users will be fed up. This happened to MySpace. Facebook is also not doing so well. Pinterest engaged in too much navel-gazing. There were other sites like Digg barely anybody even remembers anymore.

Modern society, in particularly modern technology, is a bit like Jenga tower. There is some resiliency built in but the more blocks people remove, no matter if they understand what they are doing or not, the higher the likelihood of this entire complex structure breaking down. History is full of such examples. The communist in the Soviet union killed the kulaks, i.e. the farm owners. The result was mass starvation. Something quite similar has been happening in South Africa. In big tech, we have been relatively lucky so far, but it is only a matter of time until Google really wipes out a huge pension fund because they delete all their data, or Crowdstrike bricks another million PCs. I find it entirely plausible that within a few years we will have reached a point at which our infrastructure is no longer maintainable. In fact, I have seen a glimpse of that as I have come across software applications that had to be rewritten from scratch because nobody understood the code anymore and the suits thought that it was more time-efficient to simply start over, completely ignoring that there may be countless corner cases described in the old code that could be unearthed if you put someone capable on it and asked them to produce a spec based on the current implementation that could serve as the foundation of a new implementation. I do not even want to know how many ticking time-bombs there are in tech.

5 thoughts on “The Microsoft/Crowdstrike Disaster and the Fragility of Modern Society

    1. This is what it looks like in a lot of companies with any history. I even know of smaller tech companies that service particular niches but do not grow a lot, yet make their owners a lot of money, where there is not a lot of turnover and you find people in their 50s or even 60s. Let us not worry about that because Shaqueesha got straight A’s in heir degree in ESG. She can do anything she puts her mind to!

  1. If a massive solar storm hits the earth and destroys our electronics, we will be set back 100 years.

    1. This will be worse. I am convinced that we will not be able to rebuild a lot of modern technology. That ship has sailed a few decades ago. On the plus side, we nowadays get to enjoy ethnic food on every corner, so I am sure it will have been worth it.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.