Google set to debut a new custom file system
Google has the largest search
engine on the globe and consequently not only has the most users, but
also has one of the largest databases of any IT company. That means
that the search giant is always on the lookout for new ways to
increase the performance and storage capacity of its data
centers.
Google launched its Caffeine
update earlier this month and the company is now testing
a new storage system on the backend. EWeek
reports that Google is set to debut its second custom-designed file
system in the last 10 years.
When Google first hit the scene
in 1999, it was running on a server and array system that Google
engineers were pretty much building on their own. Google lead
software storage engineer Sean Quinlan said, "In 1999, at the
peak of the dot-com boom when everybody was buying nice Sun machines,
we were buying bare motherboards, putting them on corkboard, and
laying hard drives on top of it. This was not a reliable computing
platform. But this is what Google was built on top of."
The
original Google system had stability problems and Quinlan said,
"Sometimes, 500 to 1,000 servers would disappear from the system
and take hours to come back. And those were just the problems we
expected. Then there are always those you didn't expect."
In
the early days, Google engineers were able to design and implement
their own file system called Google File System (GFS). Quinlan says
that it had a familiar interface, but wasn't specifically POSIX.
Google was taking the machines in the data center and spreading
applications across all the servers without caring where the data
actually was in the machine.
The biggest issue with the GFS
was that it lacked automatic failover if the master went down. That
fact meant that in the early days it could take hours for the search
engine to come back up. After the IPO in 2004, Google grew even
faster and extended GFS with a system called BigTable, which put a
distributed, database-like file system on top of GFS.
Quinlan
told eWeek
that this is the part of the system that runs the user application.
The systems are known as cells and each cell scales into petabytes of
data. Much of the base storage, eWeek reports,
consists of Rackable Eco-Logical storage server clusters running
Linux, with as much as 273TB per cabinet.
Details on the new file
system are scant as Google is playing things close to the vest.
Quinlan said, "By far the biggest challenge is dealing with the
reliability of the system. We're building on top of this really flaky
hardware—people have high expectations when they store data at
Google and with internal applications. We are operating in a mode
where failure is commonplace. The system has to be automated in terms
of how to deal with that. We do checksumming up the wazoo to detect
errors, and using replication to allow recovery."
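The combination Quinlan describes, checksums to detect errors and replication to recover from them, can be illustrated with a small sketch. This is not Google's implementation; the class name ReplicatedStore, the in-memory replicas, the replica count of three, and the choice of CRC32 are all assumptions made for illustration.

```python
import zlib


class ReplicatedStore:
    """Toy in-memory store (hypothetical): every block is written to
    several replicas together with a CRC32 checksum of its contents."""

    def __init__(self, replica_count=3):
        # Each replica is a dict mapping block id -> (checksum, data).
        self.replicas = [dict() for _ in range(replica_count)]

    def write(self, block_id, data):
        checksum = zlib.crc32(data)
        for replica in self.replicas:
            replica[block_id] = (checksum, data)

    def read(self, block_id):
        # Detection: look for the first copy whose checksum still matches.
        valid = None
        for replica in self.replicas:
            entry = replica.get(block_id)
            if entry is not None and zlib.crc32(entry[1]) == entry[0]:
                valid = entry
                break
        if valid is None:
            raise IOError(f"no intact replica for block {block_id!r}")
        # Recovery: overwrite missing or corrupt copies with the intact one.
        for replica in self.replicas:
            if replica.get(block_id) != valid:
                replica[block_id] = valid
        return valid[1]


store = ReplicatedStore()
store.write("block-1", b"search index shard")
# Simulate flaky hardware silently corrupting one copy.
checksum, _ = store.replicas[0]["block-1"]
store.replicas[0]["block-1"] = (checksum, b"search index sh@rd")
data = store.read("block-1")  # served from an intact replica, bad copy repaired
```

A real system would place replicas on separate machines and re-replicate in the background; the sketch only shows why a checksum mismatch on one copy need not be visible to the reader.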
Google
stores most of its data in two forms -- RecordIO and SSTables.
Quinlan says, "SSTables are immutable, key/value pair, sorted
tables with indexes on them. Those two data structures are fairly
simple; there's no update in place. All the records are either
sequential through the RecordIO or streaming through the SSTable.
This helps us a lot when building these [new] reliable
systems."
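As a rough illustration of the structure Quinlan describes, an immutable, sorted key/value table with an index and no update in place, here is a minimal in-memory sketch. The article does not describe Google's actual on-disk format, so the SSTable class below, its method names, and the merge helper are hypothetical.

```python
import bisect


class SSTable:
    """Hypothetical in-memory stand-in for an immutable sorted table."""

    def __init__(self, items):
        # Sort once at build time; the table is never modified afterwards.
        pairs = sorted(dict(items).items())
        self._keys = tuple(k for k, _ in pairs)    # acts as the index
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start=None):
        # Stream records in key order, optionally from a starting key.
        i = 0 if start is None else bisect.bisect_left(self._keys, start)
        for j in range(i, len(self._keys)):
            yield self._keys[j], self._values[j]


def merge(older, newer):
    """Since there is no update in place, changes are expressed by
    building a new table; entries in `newer` win on duplicate keys."""
    combined = dict(older.scan())
    combined.update(newer.scan())
    return SSTable(combined)


base = SSTable({"b.com": 2, "a.com": 1})
updated = merge(base, SSTable({"b.com": 20, "c.com": 3}))
```

Because a table never changes after it is written, a replica can be copied or checksummed as a whole without worrying about concurrent writers, which is one plausible reading of why Quinlan says the design helps with reliability.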
Quinlan and Google are working on how to build
this new type of data center on a more global basis and the hope is
that the new file system will allow the search giant to spread data
globally.