RE: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Tu, Tiankai
- From: Matei Zaharia [mailto:ma...@cloudera.com] Sent: Friday, April 03, 2009 11:21 AM To: core-user@hadoop.apache.org Subject: Re: Hadoop/HDFS for scientific simulation output data analysis
Hi Tiankai, The one strange thing I see in your configuration as described is IO buffer size and IO bytes
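The buffer-size setting discussed in this thread is presumably Hadoop's io.file.buffer.size property; a minimal configuration sketch, assuming that is the knob in question (the 64 KB value is an illustrative choice, not taken from the thread):

```
<!-- core-site.xml sketch: raise the I/O buffer from its 4 KB default,
     which often helps sequential reads of large simulation output files -->
<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>
```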

Re: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Matei Zaharia
, 6400 for the 256MB file dataset, and so forth. Tiankai
-Original Message- From: Matei Zaharia [mailto:ma...@cloudera.com] Sent: Friday, April 03, 2009 11:21 AM To: core-user@hadoop.apache.org Subject: Re: Hadoop/HDFS for scientific simulation output data analysis
Hi Tiankai

RE: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Tu, Tiankai
. -Original Message- From: Matei Zaharia [mailto:ma...@cloudera.com] Sent: Friday, April 03, 2009 1:18 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop/HDFS for scientific simulation output data analysis
Hadoop does checksums for each small chunk of the file (512 bytes by default) and stores
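The 512-byte chunk size mentioned here is HDFS's io.bytes.per.checksum default, with each chunk carrying a 4-byte CRC-32. A quick sketch of the fixed checksum overhead this implies per file (the function name is illustrative, not Hadoop API):

```python
import math

def checksum_overhead(file_size, bytes_per_checksum=512, crc_size=4):
    """Return (checksum_bytes, overhead_fraction) for one file,
    assuming one 4-byte CRC-32 per checksum chunk as in HDFS."""
    chunks = math.ceil(file_size / bytes_per_checksum)
    checksum_bytes = chunks * crc_size
    return checksum_bytes, checksum_bytes / file_size

# A 256 MB file: 524288 chunks -> 2 MB of checksum data (~0.78% overhead).
cksum_bytes, frac = checksum_overhead(256 * 1024 * 1024)
```

At the defaults the overhead is a constant 4/512, under one percent, so the checksumming cost discussed in the thread is about extra I/O and CPU rather than storage.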

Re: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Owen O'Malley
On Apr 3, 2009, at 1:41 PM, Tu, Tiankai wrote: By the way, what is the largest size---in terms of total bytes, number of files, and number of nodes---in your applications? Thanks.
The largest Hadoop application that has been documented is the Yahoo Webmap. 10,000 cores 500 TB shuffle 300