Hi Charles,
Can you describe your MR workflow?
Do you use MR for reconstruction, analysis, or simulation jobs?
What's the layout of the input and output files: ROOT? NTuple?
How do you split the input and merge the results?
Thanks!
Donal
2011/11/11 Charles Earl
> Hi,
> Please also feel free to contact me.
>> can go for WholeFileInputFormat.
>>
>> Please revert if you are still confused. Also if you have some specific
>> scenario, please put that across so we may be able to help you understand
>> better on the map reduce processing of the same.
>>
>> Hope it clarifies...
>> Regards
>> Bejoy K S
Thanks Bejoy, that helps a lot!
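For reference, WholeFileInputFormat is not a class that ships with Hadoop; it is the usual name for a small custom InputFormat that turns off splitting so a mapper receives one entire file as a single record. Below is a minimal sketch against the new org.apache.hadoop.mapreduce API; the class and reader names are illustrative only.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Not a stock Hadoop class: the usual "whole file as one record" pattern.
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;                       // never split: one map task per file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new RecordReader<NullWritable, BytesWritable>() {
      private FileSplit fileSplit;
      private Configuration conf;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false;

      @Override
      public void initialize(InputSplit s, TaskAttemptContext ctx) {
        fileSplit = (FileSplit) s;
        conf = ctx.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (processed) {
          return false;
        }
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);  // slurp whole file
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;                    // exactly one record per file
      }

      @Override
      public NullWritable getCurrentKey() { return NullWritable.get(); }

      @Override
      public BytesWritable getCurrentValue() { return value; }

      @Override
      public float getProgress() { return processed ? 1.0f : 0.0f; }

      @Override
      public void close() { }
    };
  }
}

With job.setInputFormatClass(WholeFileInputFormat.class) each map task then sees one complete file, at the cost of pulling the file's non-local blocks over the network.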
2011/11/11, Bejoy KS :
Hi Donal
I don't have much exposure to the domain you are pointing to, but in
plain MapReduce developer terms this would be my way of looking at
processing such a data format with map reduce:
- If the data is kind of flowing in continuously, then I'd use Flume to
collect t
Regards
Bejoy K S
Hi,
Please also feel free to contact me. I'm working with the STAR project at
Brookhaven Lab, and we are trying to build an MR workflow for analysis of
particle data. I've done some preliminary experiments running ROOT and other
nuclear physics analysis software in MR and have been looking at various
Hi Donal-
On Fri, Nov 11, 2011 at 10:12:44PM +0800, 臧冬松 wrote:
> My scenario is that I have lots of files from a High Energy Physics experiment.
> These files are in binary format, about 2 GB each, but basically they are
> composed of lots of "Events"; each Event is independent of the others. The
> phy
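Not from this thread, but as a sketch of one direction for files like these: keep each file unsplittable (as in the WholeFileInputFormat above) and let the record reader hand the mapper one Event at a time instead of the whole ~2 GB blob. The framing below assumes a hypothetical 4-byte length header before each Event; real HEP/ROOT files will not be laid out this simply.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical reader: assumes each Event is prefixed by a 4-byte length.
public class EventRecordReader extends RecordReader<LongWritable, BytesWritable> {
  private FSDataInputStream in;
  private long fileLength;
  private long eventIndex = -1;
  private final LongWritable key = new LongWritable();
  private final BytesWritable value = new BytesWritable();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
    FileSplit fileSplit = (FileSplit) split;     // whole file: isSplitable() is false
    Path file = fileSplit.getPath();
    fileLength = fileSplit.getLength();
    in = file.getFileSystem(context.getConfiguration()).open(file);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (in.getPos() >= fileLength) {
      return false;                              // no more Events in this file
    }
    int eventLength = in.readInt();              // assumed 4-byte length header
    byte[] event = new byte[eventLength];
    in.readFully(event);                         // the Event payload itself
    key.set(++eventIndex);                       // key = Event ordinal in the file
    value.set(event, 0, eventLength);            // value = raw Event bytes
    return true;
  }

  @Override
  public LongWritable getCurrentKey() { return key; }

  @Override
  public BytesWritable getCurrentValue() { return value; }

  @Override
  public float getProgress() throws IOException {
    return fileLength == 0 ? 1.0f : (float) in.getPos() / fileLength;
  }

  @Override
  public void close() throws IOException {
    if (in != null) {
      in.close();
    }
  }
}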
Thanks Bejoy!
It's better to process the data blocks locally and separately.
I just want to know how to deal with a structure (e.g. a word or a line) that
is split into two blocks.
Cheers,
Donal
On 11 November 2011 at 19:01, Bejoy KS wrote:
Hi Donal
You can configure your map tasks the way you like to process your
input. If you have a file of size 100 MB, it would be divided into two input
blocks and stored in HDFS (if your dfs.block.size is the default 64 MB). It is
your choice how you process the same using map reduce
- With th
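As a driver-side illustration of the point above (a sketch, with placeholder paths and an identity mapper): with the default 64 MB block size a 100 MB file yields two splits and two map tasks, and raising the minimum split size above the file size forces the whole file into a single map task instead.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "split-size-demo");
    job.setJarByClass(SplitSizeDemo.class);
    job.setMapperClass(Mapper.class);            // identity mapper, just for illustration
    job.setNumReduceTasks(0);                    // map-only job

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Default: split size == block size (64 MB), so a 100 MB file gets ~2 maps.
    // Raising the minimum split size above the file size (equivalent to setting
    // mapred.min.split.size) makes the whole 100 MB file one split / one map:
    FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}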
Thanks Denny!
So that means each map task will have to read from another DataNode in order
to read the end line of the previous block?
Cheers,
Donal
2011/11/11 Denny Ye
hi
Structured data can always end up split across different blocks, such as a
word or a line.
A MapReduce task reads HDFS data in units of a *line*: the reader follows the
line from the end of the previous block into the start of the subsequent one
to obtain that part of the line record. So you do not need to worry about the
integrity of a line that crosses a block boundary.
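A simplified sketch of the boundary rule described above (illustrative code, not Hadoop's actual LineRecordReader): a split that does not start at byte 0 discards its first partial line, because the reader of the previous split keeps reading past its own end, into the next block, until it has finished the line it started.

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative only: mimics how line records are assigned to splits.
public class SplitLineSketch {

  // Prints the lines that "belong" to the byte range [start, end) of a file.
  public static void readLinesOfSplit(FSDataInputStream in, long start, long end)
      throws IOException {
    in.seek(start);
    long pos = start;
    if (start != 0) {
      pos += skipToNewline(in);          // partial first line belongs to the previous split
    }
    while (pos < end) {                  // a line is ours if it *starts* before 'end'
      ByteArrayOutputStream line = new ByteArrayOutputStream();
      int b;
      while ((b = in.read()) != -1 && b != '\n') {
        line.write(b);                   // may read beyond 'end', i.e. into the next block
        pos++;
      }
      pos++;                             // count the newline (or EOF)
      System.out.println(line.toString("UTF-8"));
      if (b == -1) {
        break;                           // end of file
      }
    }
  }

  private static long skipToNewline(FSDataInputStream in) throws IOException {
    long skipped = 0;
    int b;
    while ((b = in.read()) != -1) {
      skipped++;
      if (b == '\n') {
        break;
      }
    }
    return skipped;
  }
}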
Usually a large file in HDFS is split into blocks and stored on different
DataNodes.
A map task is assigned to deal with one block; I wonder what happens if
structured data (e.g. a word) is split across two blocks?
How do MapReduce and HDFS deal with this?
Thanks!
Donal
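To see that layout from a client, the FileSystem API can list where each block of a file lives; a small sketch, with a placeholder path:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    Path file = new Path(args.length > 0 ? args[0] : "/data/run42.bin");  // placeholder path
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per HDFS block; each lists the DataNodes holding a replica.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(), Arrays.toString(block.getHosts()));
    }
  }
}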