Re: Combiner run specification and questions

2009-01-07 Thread Saptarshi Guha
> So as long as the correctness of the computation doesn't rely on a
> transformation performed in the combiner, it should be OK.

Right, I had the same thought.

> However, this restriction limits the scalability of your solution. It might
> be necessary to work around R's limitations by br
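
A minimal sketch of the pattern under discussion, assuming the standard word-count example with the old org.apache.hadoop.mapred API (class names here are illustrative, not from the thread): because the combiner performs the same associative, commutative sum as the reducer, the final result is identical whether the combiner runs zero, one, or several times.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sum reducer; safe to reuse as a combiner because summing partial
    // counts and then summing those partial sums yields the same total.
    public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

    // Driver fragment: here the combiner is purely an optimization.
    //   conf.setReducerClass(SumReducer.class);
    //   conf.setCombinerClass(SumReducer.class);  // may run 0..N times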

Re: Storing/retrieving time series with hadoop

2009-01-07 Thread Mark Chadwick
Brock, I've had good luck storing time-series data with HBase. Its latency for looking up records is orders of magnitude lower than Hadoop's MapReduce (which is more for batch processing), yet it still resides on HDFS, and it has mechanisms to let you MapReduce on your HBase data. You may have a diffic
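
One common row-key layout for time-series data in HBase, sketched here with hypothetical names (not code from this thread): prefix the key with a series identifier and append a reverse timestamp, so rows for one series are stored contiguously and a scan returns the newest points first.

    import java.nio.charset.Charset;

    // Hypothetical helper: builds a row key of the form
    //   <seriesId>#<Long.MAX_VALUE - epochMillis>
    // so that keys for one series sort together, newest first.
    public class TimeSeriesKeys {
      private static final Charset UTF8 = Charset.forName("UTF-8");

      public static byte[] rowKey(String seriesId, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        // Zero-pad so lexicographic byte order matches numeric order.
        return (seriesId + "#" + String.format("%019d", reversed)).getBytes(UTF8);
      }
    }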

Storing/retrieving time series with hadoop

2009-01-07 Thread Brock Judkins
Hi list, I am researching hadoop as a possible solution for my company's data warehousing needs. My question is whether hadoop, possibly in combination with Hive or Pig, is a good solution for time-series data? We basically have a ton of web analytics to store that we display both internally and

Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

2009-01-07 Thread tienduc_dinh
Hi Konstantin, I think I got it; I forgot one thing from your last post: time = time(0) + ... + time(N-1). So it must be the throughput per client, and I'm happy now that hadoop scales very well on my cluster. Thank you so much and wish you all the best in the new year 2009 :
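
Spelling out the arithmetic that seems to be agreed on here (my reading of the exchange, with N clients each writing one file):

    reported throughput = (size(0) + ... + size(N-1)) / (time(0) + ... + time(N-1))

This is a per-client figure; the aggregate cluster throughput is then roughly the reported value multiplied by the number of clients, assuming they all ran concurrently.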

Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

2009-01-07 Thread tienduc_dinh
Hi Konstantin, sorry for my mistake, it was not 5012, it was 512. Of course, it is great that the throughput is MB/sec per client, like you said. In this case we have circa 120 MB/sec :clap: But I'm not sure if that is really the case. Please follow my example and calculation of throughput > hadoop-0.

Re: Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Konstantin Shvachko
From Java documentation http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#force(boolean) "Passing false for this parameter indicates that only updates to the file's content need be written to storage; passing true indicates that updates to both the file's content and metada
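
A small, self-contained illustration of the API being quoted (not code from Hadoop itself): force(false) asks the OS to push the file's contents to the storage device, while force(true) also pushes file metadata such as the last-modified time.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class ForceDemo {
      public static void main(String[] args) throws IOException {
        RandomAccessFile raf = new RandomAccessFile("edits.tmp", "rw");
        FileChannel fc = raf.getChannel();
        fc.write(ByteBuffer.wrap("a log record".getBytes("UTF-8")));
        // Flush file contents only: enough for the record to survive a crash.
        fc.force(false);
        // Flush contents plus metadata (e.g. last-modified time); usually costlier.
        fc.force(true);
        raf.close();
      }
    }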

Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

2009-01-07 Thread Konstantin Shvachko
tienduc_dinh wrote: Hi Konstantin, thanks so much for your help. I was a little bit confused about why, with mapred.map.tasks = 10 set in hadoop-site.xml, hadoop didn't map anything. So your answer, "In case of TestDFSIO it will be overridden by -nrFiles", is the key. I need now
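
For readers trying to reproduce this, a representative invocation (the exact jar name depends on the Hadoop version installed; not the literal command from the thread): the -nrFiles argument decides how many files, and therefore how many map tasks, TestDFSIO uses, so a mapred.map.tasks value in hadoop-site.xml is effectively ignored.

    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 512
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 512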

RE: Concatenating PDF files

2009-01-07 Thread Zak, Richard [USA]
I was able to process 100 pdfs in 4 directories. Now I have moved up to 500 pdfs (started with 700 and I'm working backwards) in 6 directories, and I am getting this error in the console: 09/01/07 14:04:41 INFO mapred.JobClient: Task Id : attempt_200812311556_0034_m_00_0, Status : FAILED java

Re: Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Raghu Angadi
Did you look at FSEditLog.EditLogFileOutputStream.flushAndSync()? This code was reorganized some time back, but the guarantees it provides should be exactly the same as before. Please let us know otherwise. Raghu. Jason Venner wrote: I have always assumed (which is clearly my error) that edit lo

Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Jason Venner
I have always assumed (which is clearly my error) that edit log writes were flushed to storage to ensure that the edit log was consistent during machine crash recovery. I have been working through FSEditLog.java and I don't see any calls to force(true) on the file channel or sync on the file d

Re: Auditing and accounting with Hadoop

2009-01-07 Thread Doug Cutting
The notion of a client/task ID, independent of IP or username, seems useful for log analysis. DFS's client ID is probably your best bet at the moment, but we might improve its implementation and make the notion more generic. It is currently implemented as: String taskId = conf.get("mapred.ta
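
A hedged sketch of how client code could pick up such an ID for auditing (the property name and the DFSClient-style fallback are assumptions based on this message, not verified against a particular release):

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;

    public class ClientIdExample {
      // Returns the MapReduce task attempt id when running inside a task,
      // otherwise a random "DFSClient_..." style id for standalone clients.
      public static String clientId(Configuration conf) {
        String taskId = conf.get("mapred.task.id");  // assumed property name
        if (taskId != null) {
          return taskId;
        }
        return "DFSClient_" + new Random().nextInt();
      }
    }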

Auditing and accounting with Hadoop

2009-01-07 Thread Brian Bockelman
Hey, One of our charges is to do auditing and accounting with our file systems (we use the simplifying assumption that the users are non-malicious). Auditing can be done by going through the namenode logs and utilizing the UGI information to track opens/reads/writes back to the users.

Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

2009-01-07 Thread tienduc_dinh
Hi Konstantin, thanks so much for your help. I was a little bit confused about why, with mapred.map.tasks = 10 set in hadoop-site.xml, hadoop didn't map anything. So your answer, "In case of TestDFSIO it will be overridden by -nrFiles", is the key. I now need your confirmation to know,

We have finally opened Neptune, yet another BigTable-clone project.

2009-01-07 Thread neptune
Dear all, We have finally opened Neptune, yet another BigTable-clone project. Neptune has the following features.
- Basic Data Service
  . Single-row operations: Get, Put
  . Multi-row operations: Like, Between, Scanner
  . Data Uploader: DirectUploader
  . MapReduce: TableInputFormat