feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Thomas Koch
Hi, I'd like to implement a feed loader with Hadoop and most likely HBase. I've got around 1 million feeds that should be loaded and checked for new entries. However, the feeds have different priorities based on their average update frequency in the past and their relevance. The feeds (url,

Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Mahadev Konar
Hi Thomas, There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it: 1. There are 1 million feeds that will be stored in HBase. 2. A MapReduce job will be run on these feeds to

Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Thomas Koch
Mahadev Konar: Hi Thomas, There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it: 1. There are 1 million feeds that will be stored in HBase. 2. A MapReduce job will be

Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Patrick Hunt
See this environment http://bit.ly/4ekN8G. Subsequently I used the 3-server setup, each server configured with 8 GB of heap in the JVM and 4 CPUs per JVM (I think I used 10-second session timeouts for this), for some additional testing that I've not written up yet. I was able to run ~500 clients (same test