Re: structured data split

2011-11-14 Thread 臧冬松
Hi Charles, Can you describe your MR workflow? Do you use MR for reconstruction, analysis, or simulation jobs? What's the layout of the input and output files: ROOT? NTuple? How do you split the input and merge the results? Thanks! Donal 2011/11/11 Charles Earl > Hi, > Please also feel free to co…

Re: structured data split

2011-11-11 Thread Bejoy KS
you > >> can go for WholeFileInputFormat. > >> > >> Please revert if you are still confused. Also if you have some specific > >> scenario, please put that across so we may be able to help you > understand > >> better on the map reduce processing of

Re: structured data split

2011-11-11 Thread Harsh J
d out you >> can go for WholeFileInputFormat. >> >> Please revert if you are still confused. Also if you have some specific >> scenario, please put that across so we may be able to help you understand >> better on the map reduce processing of the same. >> >>
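A note for readers of the archive: WholeFileInputFormat is not shipped in Hadoop core; it is a commonly demonstrated custom InputFormat whose key move is overriding `isSplitable()` to return false, so the framework hands each file to a single mapper as one split, however many HDFS blocks the file spans. The split-computation idea can be simulated without Hadoop; class and method names below are mine, not Hadoop's:

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFileDemo {
    /** Split ranges {offset, length} for one file:
     *  block-sized splits if splitable, else one whole-file split. */
    static List<long[]> computeSplits(long fileSize, long blockSize, boolean splitable) {
        List<long[]> splits = new ArrayList<>();
        if (!splitable) {
            splits.add(new long[]{0, fileSize});  // whole file -> one mapper
            return splits;
        }
        for (long off = 0; off < fileSize; off += blockSize)
            splits.add(new long[]{off, Math.min(blockSize, fileSize - off)});
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Default behavior: a 100 MB file with 64 MB blocks -> 2 splits.
        System.out.println(computeSplits(100 * mb, 64 * mb, true).size());  // 2
        // isSplitable() == false: the same file becomes a single split.
        System.out.println(computeSplits(100 * mb, 64 * mb, false).size()); // 1
    }
}
```

The trade-off, as the thread implies, is lost parallelism and possible remote reads: the single mapper must fetch the file's non-local blocks over the network.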

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Bejoy, that helps a lot! 2011/11/11, Bejoy KS : > Hi Donal > I don't have much exposure to the domain you are > pointing to, but from a plain map reduce developer's perspective this would be > my way of looking at processing such a data format with map reduce: > - If the data i…

Re: structured data split

2011-11-11 Thread Bejoy KS
Hi Donal I don't have much exposure to the domain you are pointing to, but from a plain map reduce developer's perspective this would be my way of looking at processing such a data format with map reduce: - If the data is kind of flowing in continuously, then I'd use Flume to collect t…

Re: structured data split

2011-11-11 Thread Bejoy KS
Regards > Bejoy K S > -- > *From: * 臧冬松 > *Date: *Fri, 11 Nov 2011 20:46:54 +0800 > *To: * > *ReplyTo: * hdfs-user@hadoop.apache.org > *Subject: *Re: structured data split > > Thanks Bejoy! > It's better to process the data blocks local

Re: structured data split

2011-11-11 Thread Charles Earl
Hi, Please also feel free to contact me. I'm working with the STAR project at Brookhaven Lab, and we are trying to build an MR workflow for analysis of particle data. I've done some preliminary experiments running ROOT and other nuclear physics analysis software in MR and have been looking at various…

Re: structured data split

2011-11-11 Thread Will Maier
Hi Donal- On Fri, Nov 11, 2011 at 10:12:44PM +0800, 臧冬松 wrote: > My scenario is that I have lots of files from a High Energy Physics experiment. > These files are in binary format, about 2 GB each, but basically they are > composed of lots of "Events"; each Event is independent of the others. The > phy…
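One way to make such binary files splittable without breaking Events, sketched under the simplifying assumption of fixed-size Event records (real HEP event records are often variable-length, in which case a sync marker or an event index is needed instead; all names here are hypothetical, not STAR's or Hadoop's):

```java
public class EventSplitDemo {
    /** Start offset of the first whole event at or after rawOffset,
     *  assuming fixed-size events packed back to back from byte 0. */
    static long alignToEvent(long rawOffset, long eventSize) {
        return ((rawOffset + eventSize - 1) / eventSize) * eventSize;  // round up
    }

    public static void main(String[] args) {
        long eventSize = 1000;               // assumed fixed event size in bytes
        long blockEdge = 64L * 1024 * 1024;  // a 64 MB HDFS block boundary
        // The mapper for the second block starts at the first whole event,
        // while the first block's mapper reads past the edge to finish the
        // straddling event (same idea as the line reader in this thread):
        System.out.println(alignToEvent(blockEdge, eventSize));      // 67109000
        // An already-aligned offset is left where it is:
        System.out.println(alignToEvent(2 * eventSize, eventSize));  // 2000
    }
}
```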

Re: structured data split

2011-11-11 Thread 臧冬松
> Hope it clarifies... > Regards > Bejoy K S > -- > *From: * 臧冬松 > *Date: *Fri, 11 Nov 2011 20:46:54 +0800 > *To: * > *ReplyTo: * hdfs-user@hadoop.apache.org > *Subject: *Re: structured data split > > Thanks Bejoy! > It's bet…

Re: structured data split

2011-11-11 Thread Harsh J
io, please put that across so we may be able to help you understand > better on the map reduce processing of the same. > > Hope it clarifies... > Regards > Bejoy K S > From: 臧冬松 > Date: Fri, 11 Nov 2011 20:46:54 +0800 > To: > ReplyTo: hdfs-user@hadoop.apache.org > S

Re: structured data split

2011-11-11 Thread bejoy . hadoop
Reply-To: hdfs-user@hadoop.apache.org Subject: Re: structured data split Thanks Bejoy! It's better to process the data blocks locally and separately. I just want to know how to deal with a structure (i.e. a word, a line) that is split into two blocks. Cheers, Donal On 11 November 2011 at 7:01 PM…

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Bejoy! It's better to process the data blocks locally and separately. I just want to know how to deal with a structure (i.e. a word, a line) that is split into two blocks. Cheers, Donal On 11 November 2011 at 7:01 PM, Bejoy KS wrote: > Hi Donal > You can configure your map tasks the way you like t…

Re: structured data split

2011-11-11 Thread Bejoy KS
Hi Donal You can configure your map tasks the way you like to process your input. If you have a file of size 100 MB, it would be divided into two input blocks and stored in HDFS (if your dfs.block.size is the default 64 MB). It is your choice how you process the same using map reduce - With th…
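Bejoy's 100 MB example works out as below; a tiny sketch of the arithmetic (the default FileInputFormat produces roughly one input split per HDFS block):

```java
public class BlockCountDemo {
    /** Number of HDFS blocks a file occupies: ceiling division. */
    static long numBlocks(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 100 MB file, 64 MB dfs.block.size -> ceil(100/64) = 2 blocks.
        System.out.println(numBlocks(100 * mb, 64 * mb));  // 2
        // The second block holds only the remaining 36 MB:
        System.out.println((100 * mb) % (64 * mb) / mb);   // 36
    }
}
```

(64 MB was the default block size in Hadoop of that era; later releases default dfs.blocksize to 128 MB, which would make this file still span 2 blocks: 128 MB ceiling-divides 100 MB to 1 block, so check the cluster's actual setting.)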

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Denny! So that means each map task will have to read from another DataNode in order to read the end line of the previous block? Cheers, Donal 2011/11/11 Denny Ye > hi > Structured data is always being split across different blocks, such as a > word or line. > MapReduce tasks read HDFS da…

Re: structured data split

2011-11-11 Thread Denny Ye
hi Structured data is always being split across different blocks, such as a word or line. MapReduce tasks read HDFS data in units of *lines*: a task will read the whole line, continuing from the end of the previous block into the start of the subsequent one, to obtain the rest of the line record. So you do not need to worry about the i…
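Denny's rule is essentially what Hadoop's TextInputFormat / LineRecordReader does: a split whose start offset is not byte 0 skips its first (possibly partial) line, because the previous split's reader finishes that line by reading past its own end. A self-contained simulation of that rule (not Hadoop's actual code; class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLineDemo {
    /** Returns the lines whose *start* falls in [start, end),
     *  reading each one to completion even past 'end'. */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {  // skip the partial first line; the previous
                           // split's reader is responsible for it
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            // Scan to the newline, which may lie beyond 'end' (a remote
            // read from the next block's DataNode, as Donal suspected).
            while (pos < data.length && data[pos] != '\n') pos++;
            lines.add(new String(data, lineStart, pos - lineStart));
            pos++;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes();
        // Pretend the block boundary falls at byte 14, mid-"charlie".
        System.out.println(readSplit(data, 0, 14));            // [alpha, bravo, charlie]
        System.out.println(readSplit(data, 14, data.length));  // [delta]
    }
}
```

Every line is consumed by exactly one split, which is why record integrity survives arbitrary block boundaries; the cost is occasionally reading a few bytes from a non-local block.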

structured data split

2011-11-10 Thread 臧冬松
Usually a large file in HDFS is split into blocks and stored on different DataNodes, and a map task is assigned to deal with one such block. I wonder: what if structured data (i.e. a word) is split across two blocks? How do MapReduce and HDFS deal with this? Thanks! Donal