> So as long as the correctness of the computation doesn't
> rely on a transformation performed in the combiner, it should be OK. In
Right, I had the same thought.
>
> However, this restriction limits the scalability of your solution. It might
> be necessary to work around R's limitations by br
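A classic illustration of this rule (a hypothetical sketch, not code from this thread): summing is safe to do in a combiner because it is associative, so it gives the same answer whether the combiner runs zero, one, or many times, while averaging is not.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerDemo {
    // Sum is associative: combining partial sums per mapper gives the same
    // result the reducer would get from the raw values.
    static long sum(List<Long> values) {
        return values.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<Long> mapper1 = Arrays.asList(1L, 2L, 3L);
        List<Long> mapper2 = Arrays.asList(4L, 5L);

        // Without a combiner: the reducer sees all five values.
        long direct = sum(Arrays.asList(1L, 2L, 3L, 4L, 5L));

        // With a combiner: the reducer sees one partial sum per mapper.
        long combined = sum(Arrays.asList(sum(mapper1), sum(mapper2)));
        System.out.println(direct == combined);  // true: sum is combiner-safe

        // Averaging in the combiner would be wrong:
        double avgOfAvgs = (2.0 + 4.5) / 2;  // avg(avg(1,2,3), avg(4,5)) = 3.25
        double trueAvg = 15.0 / 5;           // avg(1..5) = 3.0
        System.out.println(avgOfAvgs == trueAvg);  // false
    }
}
```

The usual fix for the average case is to have the combiner emit (sum, count) pairs and let the reducer do the final division.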
Brok,
I've had good luck storing time-series data with HBase. Its latency for
looking up records is orders of magnitude lower than Hadoop's MapReduce
(which is geared toward batch processing), yet it still resides on HDFS and
has mechanisms to let you run MapReduce over your HBase data.
You may have a diffic
Hi list,
I am researching Hadoop as a possible solution for my company's data
warehousing needs. My question is whether Hadoop, possibly in combination
with Hive or Pig, is a good fit for time-series data? We basically have
a ton of web analytics to store that we display both internally and
hi Konstantin,
I think I got it; I had missed one thing in your last post.
time = time(0) + ... + time(N-1).
So it must be the throughput per client, and I'm happy now that Hadoop
scales very well on my cluster.
Thank you so much, and I wish you all the best in the new year 2009!
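The arithmetic behind this can be sketched as follows (a hypothetical example; the file size of 512 MB comes from the thread, but the number of files and the per-task time are made-up values chosen to land near the ~120 MB/sec mentioned below):

```java
public class TestDfsioThroughput {
    // TestDFSIO sums the task times: time = time(0) + ... + time(N-1), so
    // dividing total MB by total seconds collapses to the rate of ONE client.
    static double perClientRate(int nrFiles, double fileSizeMB, double perTaskSeconds) {
        double totalMB = nrFiles * fileSizeMB;
        double totalSeconds = nrFiles * perTaskSeconds;
        return totalMB / totalSeconds;  // == fileSizeMB / perTaskSeconds
    }

    public static void main(String[] args) {
        int nrFiles = 10;          // hypothetical: one client (map task) per file
        double fileSizeMB = 512.0; // per-file size, as in the thread
        double perTaskSeconds = 42.7;  // hypothetical measured time per task

        double perClient = perClientRate(nrFiles, fileSizeMB, perTaskSeconds);
        // If all clients run concurrently, the cluster-wide rate is roughly:
        double aggregate = perClient * nrFiles;
        System.out.printf("per client: %.1f MB/sec, aggregate: %.1f MB/sec%n",
                          perClient, aggregate);
    }
}
```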
hi Konstantin,
sorry for my mistake, it was not 5012, it was 512.
Of course, it is great that the throughput is in MB/sec per client like you
said. In this case we have circa 120 MB/sec :clap:
But I'm not sure that is really the case. Please follow my example and
calculation of throughput
> hadoop-0.
From Java documentation
http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#force(boolean)
"Passing false for this parameter indicates that only updates to the file's content need be written to storage; passing true indicates that updates to both the file's content and metada
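The distinction can be demonstrated with plain `java.nio` (a minimal sketch, not Hadoop code; the file name and record string are made up):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceDemo {
    // Writes a record and forces it to the storage device.
    static void writeDurably(Path path, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8)));
            // force(false): only the file's CONTENT must reach storage;
            // metadata such as the last-modified time may still be lost on a crash.
            ch.force(false);
            // force(true): content AND metadata are flushed -- roughly fsync()
            // versus fdatasync() on POSIX systems.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("force-demo", ".log");
        writeDurably(path, "edit-log-record");
        System.out.println(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }
}
```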
tienduc_dinh wrote:
Hi Konstantin,
thanks so much for your help. I was a little bit confused about why, with my
setting of mapred.map.tasks = 10 in hadoop-site.xml, hadoop didn't map
anything. So your answer with
In case of TestDFSIO it will be overridden by "-nrFiles".
is the key.
I need now your confirmation to know
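A hypothetical invocation illustrating the point (the jar name and exact flag spellings may differ between releases): whatever mapred.map.tasks says in hadoop-site.xml, TestDFSIO derives its map count from -nrFiles.

```shell
# mapred.map.tasks in hadoop-site.xml is only a hint here; TestDFSIO
# launches one map per file, so this run uses 10 maps regardless:
hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 512
```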
I was able to process 100 PDFs in 4 directories. Now I have moved up to
500 PDFs (I started with 700 and I'm working backwards) in 6 directories,
and I am getting this error in the console:
09/01/07 14:04:41 INFO mapred.JobClient: Task Id :
attempt_200812311556_0034_m_00_0, Status : FAILED
java
Did you look at FSEditLog.EditLogFileOutputStream.flushAndSync()?
This code was reorganized some time back, but the guarantees it provides
should be exactly the same as before. Please let us know otherwise.
Raghu.
Jason Venner wrote:
I have always assumed (which is clearly my error) that edit lo
I have always assumed (which is clearly my error) that edit log writes
were flushed to storage to ensure that the edit log was consistent
during machine crash recovery.
I have been working through FSEditLog.java and I don't see any calls of
force(true) on the file channel or sync on the file d
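The pattern being looked for would amount to something like the following (a hypothetical stand-in to show the flush-then-force sequence, not Hadoop's actual FSEditLog code; the class and method names are made up):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class EditLogSync {
    // Flush the user-space buffer, then force the channel, so the record
    // survives a machine crash once this method returns.
    static void writeAndSync(FileOutputStream out, byte[] record) throws IOException {
        out.write(record);
        out.flush();                   // user-space buffer -> kernel page cache
        out.getChannel().force(true);  // kernel page cache -> storage device
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("editlog", ".bin");
        try (FileOutputStream out = new FileOutputStream(f)) {
            writeAndSync(out, "OP_ADD".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(f.length());  // 6 bytes on disk
    }
}
```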
The notion of a client/task ID, independent of IP or username seems
useful for log analysis. DFS's client ID is probably currently your
best bet, but we might improve its implementation, and make the notion
more generic.
It is currently implemented as:
String taskId = conf.get("mapred.ta
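A hypothetical sketch of such an identifier, independent of IP or username (DFSClient's current ID is generated along similar lines, by appending a random integer to a fixed prefix; the class and method names below are made up):

```java
import java.util.Random;

public class ClientId {
    // Generates an identifier like "DFSClient_1804289383" that can be
    // carried through log lines and grouped on during log analysis.
    static String newClientId(String prefix) {
        return prefix + "_" + new Random().nextInt(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(newClientId("DFSClient"));
    }
}
```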
Hey,
One of our charges is to do auditing and accounting with our file
systems (we use the simplifying assumption that the users are
non-malicious).
Auditing can be done by going through the namenode logs and utilizing
the UGI information to track opens/reads/writes back to the users.
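Tracking operations back to users can be sketched as a small key=value parser (a hypothetical example: the exact audit line layout varies by release, so the field names and sample line below are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class AuditLineParser {
    // Parses one whitespace-separated key=value audit line of the assumed form
    //   ugi=alice ip=/10.0.0.5 cmd=open src=/data/x dst=null perm=null
    // so opens/reads/writes can be attributed to users. (Paths containing
    // whitespace would need a smarter tokenizer.)
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f =
            parse("ugi=alice ip=/10.0.0.5 cmd=open src=/data/x dst=null perm=null");
        System.out.println(f.get("ugi") + " -> " + f.get("cmd") + " " + f.get("src"));
    }
}
```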
Dear all,
We have finally opened Neptune, yet another BigTable-clone project.
Neptune has the following features.
- Basic Data Service
. Single-row operations : Get, Put
. Multi-row operations : Like, Between, Scanner
. Data Uploader : DirectUploader
. MapReduce : TableInputFormat