
Yahoo brings on Hadoop

In case you have not heard of Hadoop, it is time to take a look at the Hadoop Distributed File System (HDFS), an open source alternative to the Google File System. With Yahoo working on the project, HDFS might actually stand a chance of growing bigger and scaling better over the long term.
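To make the HDFS idea a little more concrete, here is a minimal sketch of two of its core ideas: a big file is chopped into fixed-size blocks, and each block is copied to several machines so that losing one box does not lose data. The 64 MB block size and replication factor of 3 match early Hadoop defaults; the function names and the round-robin placement are our own hypothetical simplification (real HDFS uses a rack-aware placement policy), so treat this as an illustration and not as Hadoop code.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default HDFS block size in early releases
REPLICATION = 3                # default HDFS replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin placement for illustration only;
    real HDFS spreads replicas using a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)  # rotate the starting node per block
        placement[block_id] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement
```

For example, a 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and with five nodes every block ends up on three different machines.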

We first learned of Hadoop three years ago, when we were looking at Nutch with the idea of building out a vertical search engine. While that project was eventually abandoned for various reasons (one of them being a very early version of Hadoop), Yahoo's adoption of Hadoop makes us want to go back and start it again.

When you are dealing with massive data sets that span hundreds of machines, the processing and management overhead is huge. Google handles this with MapReduce; Hadoop implements an analogous version of MapReduce, and Yahoo has gotten it running on a grid of 1,000 machines at its research lab. If you have a large chunk of data and need to scan or index it efficiently, Hadoop might just be for you.
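Here is a toy, single-process sketch of the MapReduce model using the classic word-count example. The function names are hypothetical and this is not Hadoop's Java API; in a real Hadoop job the map and reduce tasks run on many machines, and the shuffle step moves intermediate data across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in the document.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key (done by the framework in Hadoop).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for doc in docs for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `map_reduce(["the cat sat", "the dog sat"])` returns a dictionary in which "the" and "sat" each map to 2 and "cat" and "dog" each map to 1. The point of the model is that map and reduce never share state, which is what lets a framework like Hadoop spread them across a grid.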

"By supporting and contributing to an open source grid computing project, we hope to be part of providing a solid, efficient, and scalable system that anyone can use to attack the types of problems and data sets that are becoming more common on the web. And since it’s open source, everyone benefits from the expertise of developers and users around the world. We’ve already seen similar benefits from our use and support of Apache, PHP, and MySQL (just to name a few)." (Source: Yahoo Developer)

Overall, with Hadoop getting Yahoo's support, we might just pick up our vertical search engine project again and see if we can get Hadoop to install and run well enough to handle more than 1 TB of data (Nutch had problems with its indexes when that much data was involved). This also means that grid computing and scalable computing might be within reach of the better-than-average computer geek at home. Suddenly anyone can have a grid computer, which means massive data sets can be scanned for information with relative ease. And building out a vertical search engine based on MapReduce might become a DIY weekend project.
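As a rough idea of what a weekend vertical search engine might start from, here is a minimal sketch that builds an inverted index in MapReduce style: the map step emits (term, document id) pairs, and the reduce step collects each term's sorted posting list. Everything here (the names, the simple tokenizer, the AND-only query) is a hypothetical illustration that runs in one process, not how Nutch or Hadoop actually build their indexes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")  # hypothetical tokenizer: lowercase alphanumeric runs


def map_doc(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    # Map: emit (term, doc_id) once per distinct term in the document.
    for term in set(TOKEN.findall(text.lower())):
        yield term, doc_id


def reduce_term(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    # Reduce: the posting list is the sorted, de-duplicated set of doc ids.
    return term, sorted(set(doc_ids))


def build_index(corpus: Dict[str, str]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)  # stands in for the shuffle
    for doc_id, text in corpus.items():
        for term, d in map_doc(doc_id, text):
            grouped[term].append(d)
    return dict(reduce_term(t, ids) for t, ids in grouped.items())


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    # AND query: return documents that contain every query term.
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

With a corpus such as `{"d1": "Hadoop runs MapReduce jobs", "d2": "Nutch uses Hadoop"}`, searching for "hadoop" returns both documents, while "hadoop nutch" returns only `d2`. On a real grid, the index-building half is the part that would be spread across machines.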
