Re: realtime hadoop

2008-06-24 Thread Ian Holsman (Lists)
Matt Kent wrote: We use Hadoop in a similar manner, to process batches of data in real-time every few minutes. However, we do substantial amounts of processing on that data, so we use Hadoop to distribute our computation. Unless you have a significant amount of work to be done, I wouldn't

Re: realtime hadoop

2008-06-24 Thread Ian Holsman (Lists)
Fernando Padilla wrote: One use case I have a question about is using Hadoop to power a web search or other query. So the full job should be done in under a second, from start to finish. I don't think you should be using Hadoop to answer a user's search query. You should be

Re: realtime hadoop

2008-06-24 Thread Vadim Zaliva
Matt, How do you manage your tasks? Do you launch them periodically or keep them somehow running and feed them data? Vadim On Mon, Jun 23, 2008 at 21:54, Matt Kent [EMAIL PROTECTED] wrote: We use Hadoop in a similar manner, to process batches of data in real-time every few minutes. However,

Re: realtime hadoop

2008-06-24 Thread Chris K Wensel
On Jun 23, 2008, at 9:54 PM, Matt Kent wrote: Unless you have a significant amount of work to be done, I wouldn't recommend using Hadoop because it's not worth the overhead of launching the jobs and moving the data around. I think part of the tradeoff is having a system that is resilient

Re: realtime hadoop

2008-06-24 Thread Matt Kent
We wrote some custom tools that poll for new data and launch jobs periodically. Matt On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote: Matt, How do you manage your tasks? Do you launch them periodically or keep them somehow running and feed them data? Vadim On Mon, Jun 23, 2008
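
A minimal sketch of the "poll for new data and launch jobs periodically" pattern described above. This is not Matt's tool, which the thread does not show. The directory layout, the function names, and the launch_job callback are all hypothetical; in practice the callback might wrap whatever command or API submits a Hadoop job.

```python
import os
import time
from typing import Callable, List, Set


def find_new_files(input_dir: str, seen: Set[str]) -> List[str]:
    """Return the sorted paths in input_dir that have not been processed yet."""
    current = {os.path.join(input_dir, name) for name in os.listdir(input_dir)}
    return sorted(current - seen)


def poll_once(input_dir: str, seen: Set[str],
              launch_job: Callable[[List[str]], None]) -> int:
    """Launch one job covering all unseen files; return the batch size."""
    batch = find_new_files(input_dir, seen)
    if batch:
        # Hypothetical hook: the caller decides how a job is actually submitted.
        launch_job(batch)
        # Mark files as seen only after the launch call has returned.
        seen.update(batch)
    return len(batch)


def poll_forever(input_dir: str,
                 launch_job: Callable[[List[str]], None],
                 interval_seconds: float = 300.0) -> None:
    """Poll every interval_seconds (assumed: five minutes) and batch new files."""
    seen: Set[str] = set()
    while True:
        poll_once(input_dir, seen, launch_job)
        time.sleep(interval_seconds)
```

Batching every unseen file into a single job per poll keeps the per-job launch overhead, which the thread identifies as the main cost, to one launch per interval rather than one per file.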

Re: how to write data to one file on HDFS by some clients Synchronized?

2008-06-24 Thread Billy Pearson
https://issues.apache.org/jira/browse/HADOOP-1700 过佳 [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Does HDFS support it? I need it to be synchronized, e.g. I have many clients writing a lot of IntWritable values to one file. Best. Jarvis.

How Mappers function and solution for my input file problem?

2008-06-24 Thread Xuan Dzung Doan
Hi, I'm a Hadoop newbie. My question is as follows: The level of parallelism of a job, with respect to mappers, is largely the number of map tasks spawned, which is equal to the number of InputSplits. But within each InputSplit, there may be many records (many input key-value pairs), each is

Re: How Mappers function and solution for my input file problem?

2008-06-24 Thread Amar Kamat
Xuan Dzung Doan wrote: Hi, I'm a Hadoop newbie. My question is as follows: The level of parallelism of a job, with respect to mappers, is largely the number of map tasks spawned, which is equal to the number of InputSplits. But within each InputSplit, there may be many records (many input
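
The relationship between splits, map tasks, and records stated in the question can be modelled in plain Python, without Hadoop. This is only an illustrative sketch: make_splits, run_map_task, and word_lengths are hypothetical names, and real Hadoop input formats typically compute splits by byte ranges rather than by record count. It shows one map task per split, with the map function called once per record, in order, inside each task.

```python
from typing import Callable, List, Tuple

Record = Tuple[int, str]   # (key, value), e.g. (line offset, line text)
Pair = Tuple[str, int]     # an output (key, value) pair


def make_splits(records: List[Record], records_per_split: int) -> List[List[Record]]:
    """Cut the input into splits; each split would feed exactly one map task."""
    return [records[i:i + records_per_split]
            for i in range(0, len(records), records_per_split)]


def run_map_task(split: List[Record],
                 map_fn: Callable[[int, str], List[Pair]]) -> List[Pair]:
    """Simulate one map task: call map_fn once per record, sequentially."""
    output: List[Pair] = []
    for key, value in split:
        output.extend(map_fn(key, value))
    return output


def word_lengths(key: int, value: str) -> List[Pair]:
    """Example map function: emit (word, length) for each word in the value."""
    return [(word, len(word)) for word in value.split()]
```

In this model the parallelism comes only from running several run_map_task calls at once, one per split; the records inside a single split are handled one after another.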