Anybody who has gone through a slide deck on data has heard of three things.
- First, data is the new oil.
- Second, the value of data is in the insights, and insights are generated by combining multiple sources of historical data.
- And third, the amount of raw data generated in the world doubles every two years.
This exponentially growing raw data needs to be stored for insights (and for the accidental crash). Usually the most recent data is stored on disks. Disk storage is expensive, so older data is taken offline and stored on magnetic tapes. Tape storage is cheap, but the data has to be copied back to disk before it can be accessed, so retrieval takes time.
Data tapes at Google
Every company that I have worked with had a library of tapes that contained backed-up and archival data. At Google, in 2019, the tape storage system held around 30 exbibytes, or about 34.5K petabytes. It required 21 million magnetic tape cartridges, holding enough tape to stretch roughly 23 times the distance to the moon and back. The process of finding the right tape, transporting it to a datacenter and making it available would take several weeks. The tapes held data from Gmail, YouTube, Photos and many other services that Google had launched over the years. Given that this data would keep growing exponentially, storing it on tapes was not a viable option.
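For readers who want to check the unit conversion, here is a minimal sketch in Python. It assumes the standard definitions of one exbibyte as 2^60 bytes and one petabyte as 10^15 bytes; the function name is made up for illustration.

```python
# Assumed standard unit definitions (binary exbibyte, decimal petabyte).
EXBIBYTE = 2**60   # bytes in one exbibyte
PETABYTE = 10**15  # bytes in one petabyte


def exbibytes_to_petabytes(eib: float) -> float:
    """Convert a size in exbibytes to petabytes (illustrative helper)."""
    return eib * EXBIBYTE / PETABYTE


# The 30 exbibytes quoted above, expressed in petabytes.
print(f"{exbibytes_to_petabytes(30):,.0f} PB")  # prints "34,588 PB"
```

The result lands slightly above the rounded figure of about 34.5K petabytes quoted above.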
The Google technical infrastructure team developed an entirely new hard-drive-based storage system. It is affordable, secure and immutable. Data from any Google service (for example, Gmail) can be restored within seconds. Google Cloud customers can also restore archival data within seconds rather than days or weeks.
Avoiding reinventing the wheel
Building a massive tape storage system and then innovating beyond it is a journey that every company will need to take on its way to digitalisation. Fortunately, one does not need to reinvent the wheel every time.
The cloud lets you piggyback on others’ innovation.
