Yahoo resolve to build world's biggest database

Posted by Ryan Davies on Jun 4, 2008

Yahoo has discovered that outsourcing data storage and management has become too expensive and lacks the capacity to meet their demand, so they are developing their own multi-petabyte SQL database. Like Google's BigTable, the system will store data in distributed columns rather than in the typical row-oriented table layout. Unlike BigTable, however, Yahoo's database is designed to be queried through a SQL interface.

Google's BigTable approach to distributed columns uses multiple storage servers, where a storage server may include a database engine that partitions database tables into column chunks.

A distributed column chunk data store may be provided by multiple storage servers operably coupled to a network. A storage server may include a database engine for partitioning a data table into the column chunks for distributing across multiple storage servers.

Any data table may be flexibly partitioned into column chunks using one or more columns as a key with various partitioning methods.
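
The partitioning idea in the text above can be illustrated with a toy Python sketch. It is not the patent's or Google's actual method: it assumes the table is a list of Python dicts that all share the same columns, uses hash partitioning on a single key column (one of the "various partitioning methods"; range partitioning would be another) to pick a server, and keeps each column of each partition as its own chunk. The function name partition_into_column_chunks is hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def partition_into_column_chunks(rows, key, num_servers):
    """Split a table (a list of dicts with the same columns) into column chunks.

    Each row is assigned to a server by hashing its key column; within a
    server, every column is kept as its own list of values (a column chunk).
    Returns {server_id: {column_name: [values, ...]}}.
    """
    chunks = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # crc32 gives a stable hash, so a key always maps to the same server
        # (Python's built-in hash() of strings changes between runs).
        server = crc32(str(row[key]).encode("utf-8")) % num_servers
        for column, value in row.items():
            chunks[server][column].append(value)
    # Convert the nested defaultdicts to plain dicts.
    return {server: dict(cols) for server, cols in chunks.items()}


# Example: a small click log spread over two hypothetical storage servers.
table = [
    {"user_id": 1, "country": "UK", "clicks": 12},
    {"user_id": 2, "country": "US", "clicks": 7},
    {"user_id": 3, "country": "UK", "clicks": 3},
    {"user_id": 4, "country": "DE", "clicks": 9},
]
for server, columns in sorted(partition_into_column_chunks(table, "user_id", 2).items()):
    print(server, columns)
```

Because a row keeps the same position in every chunk of its partition, a server can still reassemble whole rows when needed, while a query that touches one column only has to scan that column's chunk.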

Figure: BigTable process diagram, taken from a Google patent application

Yahoo chose the distributed column approach because a query only has to read the columns that are relevant to it, which greatly reduces the amount of work each query involves. Another major advantage is that writing software to load and query the database through SQL should be far cheaper than programming against BigTable, which requires C++ or Java.
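
Why reading only the relevant columns saves work can be shown with a deliberately simplified Python model. It counts "values read" as a stand-in for disk I/O and ignores compression and other real-world effects; the table, its columns and the two scan functions are made up for illustration and do not describe Yahoo's system.

```python
def row_store_scan(rows, column):
    """Sum one column of a row-oriented table; also count the values read."""
    values_read = 0
    total = 0
    for row in rows:
        # A row store fetches the whole row, including fields we don't need.
        values_read += len(row)
        total += row[column]
    return total, values_read


def column_store_scan(columns, column):
    """Sum one column of a column-oriented table; also count the values read."""
    chunk = columns[column]  # only this column's values are read
    return sum(chunk), len(chunk)


rows = [
    {"page_id": 1, "views": 120, "country": "UK", "referrer": "search"},
    {"page_id": 2, "views": 45, "country": "US", "referrer": "direct"},
    {"page_id": 3, "views": 300, "country": "UK", "referrer": "search"},
]
# The same data, stored column by column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

print(row_store_scan(rows, "views"))        # (465, 12)
print(column_store_scan(columns, "views"))  # (465, 3)
```

Both layouts return the same answer, but the row layout reads every field of every row while the column layout reads only the "views" values; with hundreds of columns and billions of rows, that gap is where the savings come from.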

I wonder if this database technology will help their indexing system. A Google patent on anchor text processing, covered by Bill Slawski at SEO by the Sea, suggests that indexed web pages are associated with the anchor text of inbound links from external pages through a database-driven cataloging system, which is then used in ranking the indexed pages for individual search queries. Yahoo may use their database applications in a similar way. A much faster database querying method could improve their indexing and ranking, and considerably speed up other functions such as their concept dictionary.
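
The patent's actual cataloging system isn't described here, so the following is only a hypothetical illustration of the general idea, written against SQL to match the theme of Yahoo's database. It uses Python's built-in sqlite3 module: a table records each inbound link's target page and anchor text, and pages are ranked by how many inbound anchors contain the query term. The table, columns, URLs and the rank_by_anchor_text function are all invented, and real search ranking combines far more signals than this.

```python
import sqlite3

# An in-memory database standing in for a (much larger) link catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (source_url TEXT, target_url TEXT, anchor_text TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [
        ("http://a.example/", "http://shop.example/", "cheap flights"),
        ("http://b.example/", "http://shop.example/", "flights to Spain"),
        ("http://c.example/", "http://news.example/", "flights news"),
        ("http://d.example/", "http://news.example/", "daily headlines"),
    ],
)


def rank_by_anchor_text(conn, term):
    """Return (target_url, matching_links) pairs, most matches first."""
    return conn.execute(
        """
        SELECT target_url, COUNT(*) AS matches
        FROM links
        WHERE anchor_text LIKE '%' || ? || '%'
        GROUP BY target_url
        ORDER BY matches DESC, target_url
        """,
        (term,),
    ).fetchall()


print(rank_by_anchor_text(conn, "flights"))
# [('http://shop.example/', 2), ('http://news.example/', 1)]
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so "Flights" and "flights" match alike; the query parameter is bound with ? rather than pasted into the SQL string.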

A petabyte is a very large amount of data: roughly a thousand terabytes, or one million gigabytes. This capacious storage reserve will do nicely for the immediate future, giving Yahoo flexibility when developing new products and applications that require that kind of space and efficiency. It will also give Yahoo employees and their families and friends somewhere to store an endless supply of MP3s, JPEGs and MPEGs 🙂
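
The "roughly" reflects the two unit conventions in use. In decimal (SI) units a petabyte is exactly 10^15 bytes, which is 1,000 terabytes or 1,000,000 gigabytes; with binary (IEC) prefixes a pebibyte is 2^50 bytes, which is 1,024 tebibytes or 1,048,576 gibibytes. A quick Python check of the arithmetic:

```python
# Decimal (SI) units: each step up is a factor of 1,000.
PB = 10**15
TB = 10**12
GB = 10**9
print(PB // TB, PB // GB)  # 1000 1000000

# Binary (IEC) units: each step up is a factor of 1,024.
PiB = 2**50
TiB = 2**40
GiB = 2**30
print(PiB // TiB, PiB // GiB)  # 1024 1048576
```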


2 responses to “Yahoo resolve to build world's biggest database”

  1. Prevyn Jeftha says:

    Well, it’s about time. Google has been by far the market leader, and this effort from Yahoo! is a giant stride in the right direction. They are finally taking responsibility for their lack of market share.

  2. Melissa says:

    Good to hear. Maybe the big G will have some competition in the near future?
