… a cloud in a Datacenter most likely is going to ruin your day (especially if it’s a smoke-cloud).

On March 10th, at about 00:47 local time, some of our servers stopped responding.

One of our providers, OVH, faced the worst of all nightmares last night: one of their Datacenters in Strasbourg (FR) burned to a crisp.

There are four Datacenters at the Strasbourg facility, SBG1, SBG2, SBG3 and SBG4. As far as we know today, SBG2 has been completely destroyed and SBG1 to a certain extent.

Luckily, nobody was hurt or killed.

And yes, we have lost a bunch of servers, too (and no, we won’t announce public figures about the amount of money we lost – this is of no importance here).

In our case we were lucky (this time): The servers concerned were mainly reverse proxies we use to distribute the load and cache static content that might overload our bandwidth.

Some servers served special purposes, some were for development / DevOps.

But… what happened?

Actually, we don’t know yet. There have been a lot of discussions on forums bashing OVH for being unprofessional. Some people point to photographs of the burnt-down site and argue that the Datacenter was not built safely because wood was used, because the construction was made out of containers, etc. In addition, many voices claim that the fire extinguishers were not sized correctly, did not work, were not installed correctly, and much more.

Let me tell you one thing: Those arguments are mainly rubbish.

Datacenter State of the art today?

I am in and out of Datacenters on an almost daily basis. If I look at the history of how they were built, the evolution is pretty clear:

In the beginning, most Datacenters were built inside old fabrication plants. Yes, those were massively built. But they did have some drawbacks, like cooling / air circulation being a constant struggle.

Modern Datacenters are built differently. They are built more efficiently. They scale better. The cooling is better. The efficiency is higher.

And: They are safer.

If there were a fire in one of the old Datacenters I still work in from time to time, there would almost certainly be casualties (in some, I doubt they could be avoided even today).

Those old buildings were not built for the heat dissipation modern Datacenters have. They were not built for the power consumption – and the supply of sufficient electric energy is a constant and utter struggle.

In addition, if the cooling fails your servers will die (and eventually start burning as well).

There is (exceptions prove the rule) no way old Datacenters can get rid of the heat in case of even a partial cooling failure. Ask me how I know…

Modern Datacenters are built to be much more fault tolerant: if the cooling fails, you can use ambient air to keep the temperature at a level that is at least bearable.

And don’t believe Datacenters are light on energy consumption. Back in the day (~18 years ago), one of the largest Datacenters here in the Zurich region had three Diesel Generators, each rated at 2 MW (yes, that’s Megawatt!). Today the same building needs ten to twelve Generators, each rated at 4 MW. That’s an increase by a factor north of six! In the same Building, for the same space.

Clearly, Power Consumption has gone up quite a lot!
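As a quick sanity check on the numbers above, here is a back-of-the-envelope calculation in Python. It uses only the figures quoted in this post (three generators at 2 MW back then, ten to twelve at 4 MW today); the function name `total_capacity_mw` is made up for illustration.

```python
def total_capacity_mw(generators: int, rating_mw: float) -> float:
    """Total installed generator capacity in megawatts."""
    return generators * rating_mw

# Figures from the post: 3 x 2 MW then, 10 to 12 x 4 MW today.
then = total_capacity_mw(3, 2)
today_low = total_capacity_mw(10, 4)
today_high = total_capacity_mw(12, 4)

# Growth relative to the original installation.
factor_low = today_low / then
factor_high = today_high / then

print(f"Then: {then} MW, today: {today_low} to {today_high} MW")
print(f"Growth factor: {factor_low:.1f}x to {factor_high:.1f}x")
```

So the installed generator capacity went from 6 MW to somewhere between 40 and 48 MW, which is roughly a factor of 6.7 to 8.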

Regarding fire suppression, things have changed, too:

In former times, all Datacenters used Halon to extinguish fires. This normally works quite well for electrical fires but has one big downside: Everyone in the room will die. Since you’re in a Datacenter, getting out of it isn’t always straightforward (or fast).

Luckily, Halon Systems are the exception today (there still are some around, but their days are clearly numbered, and you can’t build a new Halon System).

There are others that use CO2 or Argon (which are less dangerous than Halon but lack some effectiveness, too).

In theory, you could use Water (if you completely cut off electricity before letting the nozzles go off it might have some effect) but water and electricity generally don’t mix too well…

In the end: The best way to deal with a fire is to prevent it from happening, which isn’t that easy.

Remember the growth in Generators mentioned above? They all have a dependency: Batteries!

When you start up the Generators, it takes some minutes until they are ready to deliver power. This time has to be bridged using Batteries.

In former times, lead-acid batteries were used. Since those aren’t efficient enough anymore, the same thing happened that happened to the batteries in your toy car from the 80s: They have been replaced by Li-Ion or Li-Fe-Po Batteries.

If a Li-Ion or Li-Fe-Po Battery starts burning it’s going to be one hell of a fire!

And even worse: Even IF you could have Halon-based extinguishers, they would not work, since those batteries contain various components that (like Magnesium) bring enough oxygen with them to simply keep burning (self-sustaining combustion). While this type of battery can be controlled in smaller devices (well… most of the time), the amount of batteries in a Datacenter (we’re talking about several hundred tons) is not really controllable once they go off!

Honestly? It was a question of “when”, not “if”, this would happen. And it most probably doesn’t have anything to do with OVH being unprofessional. This could have happened to anyone, and it will happen to other providers/hosters, too!

Verdict?

I don’t want to join those who voice strong opposition against clouds. It’s not a problem of “the cloud”. It’s how you deal with it.

If:

  • this was your only Datacenter
  • you didn’t do backups (preferably: to YOUR location)
  • your services relied on it

well, I’m afraid the fault’s on you.

You can’t just put anything into the cloud and rely on other people doing YOUR job.

  • Backup is important. Do it.
  • Redundancy is important. Make it happen.
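The two checklists above can be written down as a tiny self-check. This is only a sketch: the `Service` fields, the `single_points_of_failure` function and the example locations are hypothetical names invented for this illustration, not part of any real tool.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    datacenters: list[str]       # where the service runs (hypothetical field)
    backup_locations: list[str]  # where its backups are stored (hypothetical field)

def single_points_of_failure(service: Service) -> list[str]:
    """Return the checklist items from the post that this service violates."""
    problems = []
    if len(set(service.datacenters)) < 2:
        problems.append("runs in only one Datacenter")
    if not service.backup_locations:
        problems.append("has no backups")
    elif set(service.backup_locations) <= set(service.datacenters):
        # Every backup sits in a Datacenter the service itself runs in.
        problems.append("backups live in the same Datacenter as the service")
    return problems

# Example: everything in one place, including the backups.
shop = Service("shop", ["SBG2"], ["SBG2"])
for problem in single_points_of_failure(shop):
    print(f"{shop.name}: {problem}")
```

An empty result does not mean you are safe, it only means you have not made the most obvious mistakes.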

Everyone blaming OVH (or any other provider if data has been lost): It’s not the provider’s fault. It was you, not doing your job.

Yes, we have lost money, too. Yes, rebuilding the servers is a necessity (and somewhat annoying).

Nevertheless: I’d rather have it this way than have my servers running while one (or several) people died.