RE: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Tu, Tiankai
...mulation output data analysis
On Apr 3, 2009, at 1:41 PM, Tu, Tiankai wrote:
> By the way, what is the largest size---in terms of total bytes, number
> of files, and number of nodes---in your applications? Thanks.
The largest Hadoop application that has been documented is the Yahoo Webmap. 1...

RE: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Tu, Tiankai
...ore streaming it through your mapper. If your algorithm does require random access throughout the file on the other hand, you do need to read it all in. I think the WholeFileRecordReader in the FAQ is aimed at smaller files than 256 MB / 1 GB.
On Fri, Apr 3, 2009 at 9:37 AM, Tu, Tiankai wrote:

RE: Hadoop/HDFS for scientific simulation output data analysis

2009-04-03 Thread Tu, Tiankai
...ven to your job? Does bin/hadoop dfs -dus come out as 1.6 TB?
Matei
On Sat, Mar 28, 2009 at 4:10 PM, Tu, Tiankai wrote:
> Hi,
>
> I have been exploring the feasibility of using Hadoop/HDFS to analyze
> terabyte-scale scientific simulation output datasets. After a set of
> initial e...

Hadoop/HDFS for scientific simulation output data analysis

2009-03-28 Thread Tu, Tiankai
Hi, I have been exploring the feasibility of using Hadoop/HDFS to analyze terabyte-scale scientific simulation output datasets. After a set of initial experiments, I have a number of questions regarding (1) the configuration settings and (2) the I/O read performance. --