2008/6/24 Konstantin Shvachko [EMAIL PROTECTED]:
Also HDFS might be critical since to access your data you need to close the file
Not anymore. Since 0.16 files are readable while being written to.
Does this mean I can open some file as map input and the reduce output? So I can update the
Matt Kent wrote:
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However, we do substantial amounts of
processing on that data, so we use Hadoop to distribute our computation.
Unless you have a significant amount of work to be done, I wouldn't
Fernando Padilla wrote:
One use case I have a question about is using Hadoop to power a web
search or other query, where the full job should be done in under a second,
from start to finish.
I don't think you should be using Hadoop to answer a user's search query.
you should be
Matt,
How do you manage your tasks? Do you launch them periodically or keep
them somehow running and feed them data?
Vadim
On Mon, Jun 23, 2008 at 21:54, Matt Kent [EMAIL PROTECTED] wrote:
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However,
On Jun 23, 2008, at 9:54 PM, Matt Kent wrote:
Unless you have a significant amount of work to be done, I wouldn't
recommend using Hadoop because it's not worth the overhead of
launching
the jobs and moving the data around.
I think part of the tradeoff is having a system that is resilient
We wrote some custom tools that poll for new data and launch jobs
periodically.
Matt
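The polling approach Matt describes can be sketched roughly as below. This is a hypothetical illustration, not Matt's actual tool: a local directory stands in for the HDFS input path, and `launch_job` is a placeholder for whatever submits the Hadoop job (e.g. shelling out to `hadoop jar ...`).

```python
import os
import time

def poll_once(data_dir, seen, launch_job):
    """One polling pass: collect files not seen before and, if any
    arrived, launch one job on the whole batch. Returns the batch."""
    current = set(os.listdir(data_dir))
    batch = sorted(current - seen)
    seen.update(batch)
    if batch:
        launch_job(batch)
    return batch

def run_poller(data_dir, launch_job, interval_sec=300):
    """Poll indefinitely, e.g. every 5 minutes, launching a job per batch."""
    seen = set()
    while True:
        poll_once(data_dir, seen, launch_job)
        time.sleep(interval_sec)
```

Launching one job per batch, rather than one per file, amortizes the job-startup overhead Matt warns about over more data.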
On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote:
Matt,
How do you manage your tasks? Do you launch them periodically or keep
them somehow running and feed them data?
Vadim
On Mon, Jun 23, 2008
Hi!
I am considering using Hadoop for (almost) real-time data processing. I
have data coming in every second and I would like to use a Hadoop cluster
to process it as fast as possible. I need to be able to maintain some
guaranteed max processing time, for example under 3 minutes.
Does anybody have
Vadim,
Depending on the nature of your data, CouchDB (http://couchdb.org)
might be worth looking into. It speaks JSON natively, and has
real-time map/reduce support. The 0.8.0 release is imminent (don't
bother with 0.7.2), and the community is active. We're using it for
something similar to what
Also HDFS might be critical since to access your data you need to close the file
Not anymore. Since 0.16 files are readable while being written to.
it as fast as possible. I need to be able to maintain some guaranteed
max. processing time, for example under 3 minutes.
It looks like you do
Interesting.
We are planning on using Hadoop to provide 'near' real-time log
analysis. We plan on having files close every 5 minutes (1 per log
machine, so 80 files every 5 minutes) and then have an M/R job to merge
them into a single file that will get processed by other jobs later on.
do you
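The merge step described above (combining the ~80 per-machine files from each 5-minute window into one file) is essentially a k-way merge, which is what a single-reducer M/R job keyed on timestamp would produce. A minimal stand-alone sketch, hypothetical and not the poster's actual job, assuming records are (timestamp, line) pairs already sorted within each file:

```python
import heapq

def merge_log_batches(batches):
    """Merge per-machine log batches, each already sorted by timestamp,
    into one globally time-ordered list of (timestamp, line) records."""
    return list(heapq.merge(*batches))
```

Because each input is already sorted, the merge is linear in the total number of records, unlike a full re-sort of the concatenated files.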
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However, we do substantial amounts of
processing on that data, so we use Hadoop to distribute our computation.
Unless you have a significant amount of work to be done, I wouldn't
recommend using Hadoop
One use case I have a question about, is using Hadoop to power a web
search or other query. So the full job should be done in under a
second, from start to finish.
You know, you have a huge datastore, and you have to run a query against
it, implemented as an MR job. Is there a way to