Re: splitting of big files?

2008-05-29 Thread Doug Cutting
Erik Paulson wrote: When reading from HDFS, how big are the network read requests, and what controls that? Or, more concretely, if I store files using 64Meg blocks in HDFS and run the simple word count example, and I get the default of one FileSplit/Map task per 64 meg block, how many bytes into
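To make the setup in the question concrete, here is a small sketch of the "one FileSplit per 64 MB block" assumption. It only computes byte ranges. The function name `fixed_splits` is hypothetical, and the sketch assumes the split size equals the block size, as the question does. Hadoop's real input format may compute split sizes from other settings.

```python
MB = 1024 * 1024


def fixed_splits(file_size: int, split_size: int = 64 * MB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs, one per split-sized chunk of the file.

    Hypothetical helper: assumes split size == block size (64 MB by default).
    """
    return [
        (offset, min(split_size, file_size - offset))
        for offset in range(0, file_size, split_size)
    ]
```

For example, `fixed_splits(200 * MB)` yields four splits: three full 64 MB ranges and a final 8 MB range starting at offset 192 MB, so the word count job would get four map tasks under this assumption.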

Re: splitting of big files?

2008-05-28 Thread Erik Paulson
On Tue, May 27, 2008 at 10:49:38AM -0700, Ted Dunning wrote:
>
> There is a good tutorial on the wiki about this.
>
> Your problem here is that you have conflated two concepts. The first is the splitting of files into blocks for storage purposes. This has nothing to do with what data a prog

Re: splitting of big files?

2008-05-27 Thread Ted Dunning
The input format chosen determines the semantics of the input file.

On 5/27/08 9:46 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> How does the application know that the file is 'text' though (i.e. when is new line a special character)? Or are all files assumed to be text?
>
> And even
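A toy illustration of the point that the input format, not the file, decides what a record is: the same bytes read by two different readers give different records. Both reader functions are hypothetical stand-ins for illustration, not Hadoop's InputFormat API. Only the line-oriented reader treats the newline byte as special.

```python
def read_as_lines(data: bytes) -> list[bytes]:
    """Text-style reader: each b"\\n" ends a record; the terminator is dropped."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start an extra, empty record
    return lines


def read_as_fixed_records(data: bytes, record_len: int) -> list[bytes]:
    """Fixed-length reader: newline bytes are ordinary data, not separators."""
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]
```

For `data = b"ab\ncd\n"`, the line reader returns `[b"ab", b"cd"]`, while a 4-byte fixed-length reader returns `[b"ab\nc", b"d\n"]`.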

Re: splitting of big files?

2008-05-27 Thread Ted Dunning
There is a good tutorial on the wiki about this. Your problem here is that you have conflated two concepts. The first is the splitting of files into blocks for storage purposes. This has nothing to do with what data a program can read any more than splitting a file into blocks on a disk in a co
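The storage side of the distinction above can be sketched as follows: blocks are cut at fixed byte offsets without looking at the content, so a line can easily straddle a block boundary. The helper names and the tiny 5-byte block size are assumptions for illustration only (the thread talks about 64 MB blocks), and the sketch assumes `\n` line endings.

```python
def storage_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Storage-layer view: fixed-size chunks, cut without regard to content."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def straddling_lines(data: bytes, block_size: int) -> list[bytes]:
    """Return the lines whose bytes fall into more than one block.

    Assumes newline-terminated lines; the newline counts as part of its line.
    """
    result = []
    pos = 0  # byte offset of the current line's first byte
    for line in data.splitlines(keepends=True):
        first_block = pos // block_size
        last_block = (pos + len(line) - 1) // block_size
        if first_block != last_block:
            result.append(line.rstrip(b"\n"))
        pos += len(line)
    return result
```

With `data = b"aaa\nbbbbbb\ncc\n"` and 5-byte blocks, the blocks are `b"aaa\nb"`, `b"bbbbb"` and `b"\ncc\n"`, and the line `b"bbbbbb"` is spread over three of them. Storage does not care; it is up to whatever reads the data to reassemble such records.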

RE: Re: splitting of big files?

2008-05-27 Thread Andreas Kostyrka
It's text lines for streaming, which is just another Map/Reduce app. How the input is interpreted by your app is up to your input class.

Andreas

On Tuesday, 27.05.2008, at 16:46 +, [EMAIL PROTECTED] wrote:
>
> >-
> >From: Doug Cutting
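Since streaming hands each input line to the mapper as text, a streaming word count mapper only has to deal with lines. A minimal sketch, assuming the usual streaming convention that each output line is `key<TAB>value`; the function names are hypothetical.

```python
from typing import Iterable, Iterator, TextIO


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Word count map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def run_mapper(stdin: TextIO, stdout: TextIO) -> None:
    """Read input records as text lines and write key/value lines."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

A mapper script would call `run_mapper(sys.stdin, sys.stdout)` at top level. The line-by-line view is a property of streaming's text input, not of how the file is stored.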

RE: Re: splitting of big files?

2008-05-27 Thread koara
>-
>From: Doug Cutting
>
>Each split (except the first) contains the first line starting after
>its start position through the first line ending after its end
>position. So if you have a file with:

Aha, very nice, in my browsing around t
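The split rule quoted above can be sketched as a small reader over an in-memory byte string. The name `split_records` is hypothetical, not Hadoop's record reader API, and the boundary convention is an assumption: a line belongs to the split whose byte range `[start, end)` contains the line's first byte. A split other than the first skips the partial line it starts in (backing up one byte so that a line beginning exactly at `start` is kept), and every split keeps reading past `end` to finish the last line it started.

```python
def split_records(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the newline-terminated lines owned by the byte range [start, end)."""
    pos = start
    if start != 0:
        # Skip the tail of a line that began in an earlier split. Searching from
        # start - 1 keeps a line that begins exactly at `start`.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []  # the rest of the data belongs to a line started earlier
        pos = newline + 1
    records = []
    # Keep taking lines while they *start* inside the range; the last one may
    # end well past `end`, i.e. in the bytes of the next block.
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records
```

With `data = b"aaa\nbbbbbb\ncc\n"` cut into 5-byte splits, the first split returns `[b"aaa", b"bbbbbb"]` (reading past its end to finish the second line), the middle split returns nothing, and the last returns `[b"cc"]`. Every line is read exactly once, whatever the split size.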