simulation output data analysis
On Apr 3, 2009, at 1:41 PM, Tu, Tiankai wrote:
> By the way, what is the largest size---in terms of total bytes, number
> of files, and number of nodes---in your applications? Thanks.
The largest Hadoop application that has been documented is the Yahoo
Webmap.
ore streaming it through your mapper. If, on the other hand, your
algorithm does require random access throughout the file, you do need to
read it all in. I think the WholeFileRecordReader in the FAQ is aimed at
files smaller than 256 MB / 1 GB.
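To illustrate the trade-off, here is a hypothetical sketch in the style of a
Hadoop Streaming mapper (the function names are made up for illustration, not
part of any Hadoop API): the streaming version touches one record at a time, so
memory stays bounded no matter how large the file is, while the whole-file
version must buffer the entire split in RAM first.

```python
def streaming_mapper(lines):
    """Process one record at a time; memory use stays O(1) in file size."""
    for line in lines:
        # Hadoop Streaming convention: key and value separated by a tab.
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, len(value)  # stand-in for real per-record work

def whole_file_mapper(lines):
    """Random-access style: buffers the whole input before doing any work."""
    data = list(lines)  # O(file size) memory -- only sensible for small files
    return [(ln.rstrip("\n").partition("\t")[0], i) for i, ln in enumerate(data)]
```

The point is not the specific per-record computation but the shape of the loop:
if the algorithm only needs each record once, the generator form never holds
more than one line in memory.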
On Fri, Apr 3, 2009 at 9:37 AM, Tu, Tiankai
wrote:
ven to your job? Does
bin/hadoop dfs -dus come out as 1.6 TB?
Matei
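For reference, `bin/hadoop dfs -dus <path>` prints the aggregate size of
everything under the given path, which is the quickest way to check the real
on-HDFS footprint of an input set. A local-filesystem analogue (hypothetical
helper, not a Hadoop API) would look like:

```python
import os

def total_bytes(root):
    """Sum file sizes under root, similar in spirit to `hadoop dfs -dus`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Comparing that number against what the job reports as input bytes is a cheap
sanity check before digging into read-performance questions.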
On Sat, Mar 28, 2009 at 4:10 PM, Tu, Tiankai
wrote:
> Hi,
>
> I have been exploring the feasibility of using Hadoop/HDFS to analyze
> terabyte-scale scientific simulation output datasets. After a set of
> initial experiments, [...]
Hi,
I have been exploring the feasibility of using Hadoop/HDFS to analyze
terabyte-scale scientific simulation output datasets. After a set of
initial experiments, I have a number of questions regarding (1) the
configuration settings and (2) the I/O read performance.
--