Matt Kent wrote:
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However, we do substantial amounts of
processing on that data, so we use Hadoop to distribute our computation.
Unless you have a significant amount of work to be done, I wouldn't
Fernando Padilla wrote:
One use case I have a question about is using Hadoop to power a web
search or other query. So the full job should be done in under a second,
from start to finish.
I don't think you should be using Hadoop to answer a user's search
query.
you should be
Matt,
How do you manage your tasks? Do you launch them periodically or keep
them somehow running and feed them data?
Vadim
On Mon, Jun 23, 2008 at 21:54, Matt Kent [EMAIL PROTECTED] wrote:
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However,
On Jun 23, 2008, at 9:54 PM, Matt Kent wrote:
Unless you have a significant amount of work to be done, I wouldn't
recommend using Hadoop because it's not worth the overhead of launching
the jobs and moving the data around.
I think part of the tradeoff is having a system that is resilient
We wrote some custom tools that poll for new data and launch jobs
periodically.
Matt
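
For readers wondering what such a polling tool might look like, here is a minimal sketch. It is not Matt's actual tooling, which is not shown in this thread. It assumes new data arrives as files in a local directory, and every name in it (find_new_files, poll_once, build_hadoop_command, run_forever, the five-minute interval) is hypothetical. The only real command it relies on is `hadoop jar`; the layout of the arguments after the jar path is an assumption about how the job's main class reads its inputs.

```python
import os
import time


def find_new_files(incoming_dir, seen):
    """Return names in incoming_dir that have not been processed yet, sorted."""
    return sorted(name for name in os.listdir(incoming_dir) if name not in seen)


def poll_once(incoming_dir, seen, launch_job):
    """Launch one job for the current batch of new files, if there is one."""
    batch = find_new_files(incoming_dir, seen)
    if batch:
        # One job per batch, so the launch overhead is paid once per poll.
        launch_job([os.path.join(incoming_dir, name) for name in batch])
        seen.update(batch)
    return batch


def build_hadoop_command(jar_path, input_paths, output_dir):
    """Build a `hadoop jar` command line (argument layout is assumed)."""
    return ["hadoop", "jar", jar_path] + list(input_paths) + [output_dir]


def run_forever(incoming_dir, launch_job, interval_seconds=300):
    """Poll every interval_seconds; `seen` lives in memory only."""
    seen = set()
    while True:
        poll_once(incoming_dir, seen, launch_job)
        time.sleep(interval_seconds)
```

A launcher passed as launch_job could hand the result of build_hadoop_command to subprocess.run. Since the set of seen files is kept in memory, a restart would reprocess everything still in the directory; a real tool would probably persist that state or move files out once they are processed.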
On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote:
Matt,
How do you manage your tasks? Do you launch them periodically or keep
them somehow running and feed them data?
Vadim
On Mon, Jun 23, 2008
https://issues.apache.org/jira/browse/HADOOP-1700
过佳 [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
Does HDFS support it? I need it to be synchronized, e.g. I have many
clients writing lots of IntWritable values to one file.
Best.
Jarvis.
Hi,
I'm a Hadoop newbie. My question is as follows:
The level of parallelism of a job, with respect to mappers, is largely the
number of map tasks spawned, which is equal to the number of InputSplits. But
within each InputSplit, there may be many records (many input key-value pairs),
each is
Xuan Dzung Doan wrote:
Hi,
I'm a Hadoop newbie. My question is as follows:
The level of parallelism of a job, with respect to mappers, is largely the
number of map tasks spawned, which is equal to the number of InputSplits. But
within each InputSplit, there may be many records (many input