So far this week we’ve looked at a programming languages paper and a systems paper, so for today I thought it would be fun to look at an algorithm-based paper.
HDFS enables horizontally scalable low-cost storage for the masses, so it becomes feasible to collect and store much more data. Enter the Jevons paradox and soon you’ve got so much data that the storage costs become a real issue again. For redundancy and read efficiency, HDFS and the Google File System store three copies of all data by default.
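To put a rough number on what three copies cost, here is a minimal sketch of the storage arithmetic. The function names are hypothetical, and the (10 data, 4 parity) layout is an illustrative assumption rather than a figure taken from this post; it is there only to show how a coded layout would be compared against replication.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when every block is kept `copies` times."""
    return float(copies)


def coded_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte when `data_blocks` blocks of data
    are stored together with `parity_blocks` extra parity blocks."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_bytes(logical_bytes: int, overhead: float) -> float:
    """Total bytes on disk for a given amount of logical data."""
    return logical_bytes * overhead


if __name__ == "__main__":
    one_petabyte = 10**15  # logical data, in bytes

    three_way = replication_overhead(3)  # the HDFS / GFS default
    coded = coded_overhead(10, 4)        # assumed parameters, for illustration only

    print(f"3x replication: {raw_bytes(one_petabyte, three_way) / 1e15:.1f} PB on disk")
    print(f"(10, 4) coding: {raw_bytes(one_petabyte, coded) / 1e15:.1f} PB on disk")
```

Under these assumptions, one petabyte of logical data occupies three petabytes on disk with replication, which is why the multiplier matters far more at data-center scale than it does for small data sets.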
Although disk storage seems inexpensive for small data sizes, this assumption is no longer true when storing multiple copies at the massive scales of operation of today’s data centers. As a result, several large-scale distributed storage systems now deploy erasure codes, that provide higher…