Dec 28, 2023

On Premises vs Cloud in 2023

Usually I don’t react to the articles that I read or the podcasts that I listen to, but this time I will make an exception, just because I am so disappointed. I am talking about the recent interview that Kelsey Hightower did with David Heinemeier Hansson (DHH) on the topic of on-premises infrastructure. Just to be clear, I am not disappointed with the topic itself. Quite the opposite: I love controversial topics and questioning everything. However, I am disappointed with the way DHH presented his arguments. Basically their point, Kelsey included, is that whoever is using the cloud is incompetent. Seriously? We will get to that part, and I promise I will stay polite.

I like how the interview started. They mentioned Heroku and the way it influenced many developers and organizations. I can only agree with this; it was a great experience, and it helped many people adopt the cloud at some point. This is the only part where I agree with what was said. From here, it starts to get really bad.

I wrote a very long rant about this interview on one of the Slack channels, but here I will address only the important parts. Let’s get started.

Argument: the cloud was supposed to be Rails for infrastructure, and that promise was not fulfilled.

I am not 100% sure what he meant by this, but what about IaC tools? Pick any, including raw CloudFormation on AWS. You can manage your infra and you can version it. On AWS there is AppConfig and many other tools that help keep configuration consistent. You can’t tell me that this is more complicated than building images and managing OpenStack, or whichever tool you prefer. I chose to let my cloud provider deal with that, and I am happy to pay for what I use. Let’s never forget the number of people you need to do operations. I will mention this every time we talk about on-premise.

Argument: A sense of desperation and powerlessness when a cloud outage happens.

DHH said that they once had an outage that took 9 hours to recover from, and that he felt desperate because they depended on the provider and couldn’t do anything, so they eventually moved to AWS. This is a problem, of course, but how often does that happen? I remember (we probably all do) the famous 2017 S3 outage. Does that mean that we all have to go back on-premise? Absolutely not.

Everything will fail at some point. Your on-premise infrastructure will stop working, your app will crash because of a bug, there will be a power outage because of a storm, a fuse will blow, or human error will happen. How bad it’s going to be, we can only guess. The only thing we can do about it is put measures in place to mitigate it when it happens. This is where we come to the power of the cloud.

The huge difference between my on-premises infrastructure and the cloud is that hundreds, if not thousands, of people at these major providers will work on fixing the issue. They will fix it, they will do a post-mortem, and the same failure is unlikely to happen again. Something else will fail, but we will not wake up the whole team in the middle of the night, pay thousands of dollars to fix the issue, and most probably cause a few heart attacks in the process.

Argument: the cloud is not cheaper.

This is my favorite part. Let’s talk about what it takes to build a world-class data center.

CapEx is an abbreviated term for capital expenditures, major purchases that are usually capitalized on a company’s balance sheet instead of being expensed. - Investopedia

An operating expense (Opex) is an ongoing cost for running a product, business, or system. - Wikipedia

To make it simple, Opex is everything you spend on salaries, spare parts, services, maintenance, and similar during your day-to-day business, while Capex is your initial investment into the infrastructure before operations even start.
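To make the two terms concrete, here is a minimal sketch of how Capex and Opex combine into a total cost over time, and when an on-premise build would break even against a cloud bill. All figures are made-up placeholders for illustration, not numbers from any real project, and the function names are my own.

```python
import math


def total_cost(capex, monthly_opex, months):
    """Up-front investment plus the operating cost accumulated over `months`."""
    return capex + monthly_opex * months


def break_even_month(capex, onprem_monthly_opex, cloud_monthly_cost):
    """First month in which cumulative on-prem cost is no higher than cloud cost.

    Returns None when on-prem Opex alone is not cheaper than the cloud bill,
    because then the Capex is never recovered.
    """
    monthly_saving = cloud_monthly_cost - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Hypothetical placeholder figures, chosen only to show the arithmetic.
CAPEX = 1_000_000          # building and equipping the room
ONPREM_OPEX = 40_000       # staff, power, maintenance per month
CLOUD_BILL = 60_000        # comparable cloud spend per month

months = break_even_month(CAPEX, ONPREM_OPEX, CLOUD_BILL)
print(months, total_cost(CAPEX, ONPREM_OPEX, months), CLOUD_BILL * months)
```

The point of the sketch is only that the comparison depends on both terms: a low monthly Opex means little if the Capex takes years to recover, and if on-premise Opex is not lower than the cloud bill, there is no break-even at all.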

I am not going to talk about the theory, I am going to give you a real-world example.

The ISP company that I used to work for built its own data center. It was mainly built for the local government and some local companies, because of the local laws and because not a single cloud provider has a region in my country.

Local laws could be one of the valid reasons to build your own data center, but still, you can easily outsource that to an ISP/telecom company so they can worry about all the things that I am about to mention. It is probably a little bit more expensive than AWS or its competitors, but still cheaper than maintaining your own infra. Why? I will give you an example where I was directly involved.

I was part of the team that worked on constructing a server room for one of the largest banks in my country. I was programming the PLCs for the HVAC system. For those who don’t know what that is, just google HVAC and SCADA.

The cost of the equipment and manpower to build and maintain the server room was 7 figures. And that is just Capex. We will get to Opex eventually.

You need physical equipment for your room before you even start adding servers. You need surveillance, then you need physical security, then you need a procedure for safe disposal of failed or replaced hard drives, then you need cables for the internet connection, you need redundant power supplies, generators, UPSs, then you need a redundant internet connection… Of course, for the HVAC part, you need world-class chillers, and they have to be redundant to keep the room temperature between 19 and 21 degrees Celsius at all times, and the humidity level around 50%. That has to be reliable. If either value drifts outside its range, you need on-site staff to address the issue immediately.

Speaking of physical staff, what about the procedures to enter, exit, or touch the equipment in the room? What about alarms, fire alarms, and automatic fire extinguishers? Hint: you can’t use water. There are other ways, which can be much more expensive, but you have to use them because you want to reduce damage to your equipment as much as you can in case of an accident. You need to maintain your data center after you build it. Notice that we still haven’t mentioned the actual servers where your applications will run.
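As a rough illustration of the kind of check the HVAC monitoring has to perform, here is a small sketch. It is plain Python, not actual PLC or SCADA code, and it is not how the real system was written. The temperature range comes from the text above; the ±5% humidity tolerance and all names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

TEMP_RANGE_C = (19.0, 21.0)      # range stated in the text
HUMIDITY_TARGET_PCT = 50.0       # "around 50%" from the text
HUMIDITY_TOLERANCE_PCT = 5.0     # assumption: the text gives no tolerance


@dataclass
class Reading:
    temperature_c: float
    humidity_pct: float


def check_reading(reading: Reading) -> List[str]:
    """Return a list of alarm messages; an empty list means the room is in range."""
    alarms = []
    low, high = TEMP_RANGE_C
    if not (low <= reading.temperature_c <= high):
        alarms.append(
            f"temperature {reading.temperature_c:.1f} C outside {low}-{high} C"
        )
    if abs(reading.humidity_pct - HUMIDITY_TARGET_PCT) > HUMIDITY_TOLERANCE_PCT:
        alarms.append(f"humidity {reading.humidity_pct:.1f}% outside tolerance")
    return alarms


# Hypothetical readings: one healthy, one where a chiller has failed.
print(check_reading(Reading(20.0, 50.0)))
print(check_reading(Reading(23.5, 58.0)))
```

The check itself is trivial. The expensive part is everything around it: redundant sensors and chillers, and a person who can physically respond when the list is not empty.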

I know that I am stating the obvious, things that people figured out and said 20 years ago, but with their narrative, it looks like it needs to be repeated. They are smart and experienced enough to make their own decisions, and I respect that, but selling this story to people who might not understand all the implications could cause serious damage.

Argument: Cloud economics and beta services.

Another ridiculous argument relates to cloud economics. The example given is that you start using a beta service, see that it’s easy and takes no effort, leave it running, and then realize that you are spending 15k per month on that small thing. First of all, yes, you can leave an EC2 instance, or a bunch of them, running for no reason. You can do many things wrong. However, you need to monitor your infra and you need to be aware of the whole system. It doesn’t matter if you are on-premise or in the cloud, you need to know what is happening. These “beta service” mistakes can happen, but they should not go unnoticed. First rule: you tear down your infra after testing, you never do click ops, and you never test in the production env. This is like saying, what if I enter the on-premise data center and burn everything to the ground? Sure, I can do that. Is that really an argument?

Mistakes will happen, but if you monitor your infrastructure, and review your cloud bills, those mistakes will not be that costly. They said that they were overspending on the services that they were not even using. I don’t see this as a cloud issue. You can buy servers and hardware that you don’t need or use for your on-premise data center. You can buy stuff that you don’t need in your home. Again, is this an argument?
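Reviewing the bill does not have to be sophisticated. Here is a minimal sketch, assuming you already have last month’s cost per service as a plain dictionary (for example from a billing export). The service names, budgets, and the function name are hypothetical; only the 15k figure echoes the example from the interview.

```python
from typing import Dict, List


def flag_over_budget(costs: Dict[str, float],
                     budgets: Dict[str, float],
                     default_budget: float = 0.0) -> List[str]:
    """Return services whose monthly cost exceeds their budget, worst first.

    A service with no budget entry gets `default_budget` (0 by default), so
    anything nobody planned for, like a forgotten beta service, is flagged.
    """
    def overspend(name: str) -> float:
        return costs[name] - budgets.get(name, default_budget)

    flagged = [name for name in costs if overspend(name) > 0]
    return sorted(flagged, key=overspend, reverse=True)


# Hypothetical monthly figures in USD.
monthly_costs = {"ec2": 4200.0, "s3": 310.0, "beta-service": 15000.0}
monthly_budgets = {"ec2": 5000.0, "s3": 500.0}

print(flag_over_budget(monthly_costs, monthly_budgets))
```

Run something like this once a month, or wire it to an alert, and the forgotten beta service shows up after the first bill instead of after a year.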

Argument: Incompetency?!

I certainly don’t agree with Kelsey that the reason to move to the cloud was that enterprises lost confidence in their IT teams. He is claiming that IT teams weren’t able to keep pace with the technology. That’s not true. The real problem is maintaining your own infrastructure. If you need more capacity, you will first scale vertically (and that has limits), then you will try to scale horizontally, and then you will realize that you need more space, and by space I mean adequate space (see above for what a data center or server room needs). For that, you need a budget. If you want to replace servers, that is going to cost you a lot of money. So companies exchanged the hardware budget for a cloud budget. That part is correct. I don’t see a competency issue here.

His point is that incompetent IT teams found the cloud easier to use, so they didn’t need to learn the skills, and then we ended up with misconfigured and hacked systems… OK, how is this a cloud problem? If you have that kind of team, or that kind of guy on the team, how do you think your on-premise system will be configured? Oh, you will hire a competent guy. Sure, so why can’t you hire a competent cloud guy?

To summarize, we are now in territory where the main reason for choosing the cloud is incompetency.

Sure. I will do self-censorship here.

Conclusion

Unfortunately, I have seen this toxic behavior before, when people pushed their tools and frameworks by bashing other tools and calling other builders incompetent. That’s not the way to go. If this had been a talk about how things worked better for them, and how it might work better for you, I would buy it. It was not that. It was more like: you have to move out of the cloud because we did it. You are incompetent and we are not. There is a difference between being confident and being arrogant.

To conclude this too-long comment/reaction/rant, call it as you like, I will just say that I am sorry that I lost 2 hours of my life listening to this nonsense. I was hoping that I would learn something new. I didn’t.