Erik Paulson wrote:
When reading from HDFS, how big are the network read requests, and what
controls that? Or, more concretely, if I store files using 64Meg blocks
in HDFS and run the simple word count example, and I get the default of
one FileSplit/Map task per 64 meg block, how many bytes into
On Tue, May 27, 2008 at 10:49:38AM -0700, Ted Dunning wrote:
>
> There is a good tutorial on the wiki about this.
>
> Your problem here is that you have conflated two concepts. The first is the
> splitting of files into blocks for storage purposes. This has nothing to do
> with what data a prog
The input format chosen determines the semantics of the input file.
On 5/27/08 9:46 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> How does the application know that the file is 'text' though (i.e. when is new
> line a special character)? Or are all files assumed to be text?
>
> And even
There is a good tutorial on the wiki about this.
Your problem here is that you have conflated two concepts. The first is the
splitting of files into blocks for storage purposes. This has nothing to do
with what data a program can read any more than splitting a file into blocks
on a disk in a co
It's text lines for streaming, which is just another Map/Reduce app.
How your app interprets it is up to your input class.
Andreas
On Tuesday, 27.05.2008, at 16:46 +, [EMAIL PROTECTED] wrote:
>
> >-
> >From: Doug Cutting
>-
>From: Doug Cutting
>
>Each split (except the first) contains the first line starting after
>its start position through the first line ending after its end
>position. So if you have a file with:
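To illustrate the rule Doug Cutting describes above, here is a minimal Python sketch of how a line-oriented reader might assign lines to byte-range splits. It is not Hadoop's actual code: the names read_split and splits and the sample data are hypothetical, and it assumes a line belongs to the split that contains its first byte. The exact tie-break at a split boundary may differ between Hadoop versions.

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Return the lines owned by the byte range [start, end) of data.

    Assumption: a line is owned by the split containing its first byte.
    """
    pos = start
    if start > 0:
        # Back up one byte and skip through the next newline, so that a
        # line beginning exactly at `start` is kept, while a line that
        # began in an earlier split is left to that split's reader.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []  # no line begins in this split
        pos = newline + 1

    lines = []
    # Keep reading while the next line *starts* inside the split; the
    # last such line may run past `end` into the following split(s).
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline + 1
        lines.append(data[pos:stop])
        pos = stop
    return lines


def splits(size: int, split_size: int) -> list:
    """Cut [0, size) into consecutive (start, end) byte ranges."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]


data = b"alpha\nbravo charlie\ndelta\necho\n"
for start, end in splits(len(data), 8):
    print((start, end), read_split(data, start, end))
```

With 8-byte splits, the second split (8, 16) falls entirely inside "bravo charlie" and therefore yields no lines, while the first split reads that whole line even though it ends past byte 8. Every line is still read exactly once, which is the point of the rule: split boundaries are byte offsets, and the reader, not the storage layer, decides where records begin and end.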
Aha, very nice, in my browsing around t