Re: Reduce output is strange

2012-04-03 Thread Owen O'Malley
On Tue, Apr 3, 2012 at 8:25 AM, Pedro Costa wrote:
> What I want to ask is:
>
> - how do I read the values from sequence files that are block, or record
> compressed, or uncompressed?

You use the SequenceFile.Reader class.

> - how do I know if the sequence file is block compressed, record
> comp

Re: Reduce output is strange

2012-04-03 Thread Owen O'Malley
On Tue, Apr 3, 2012 at 8:01 AM, Pedro Costa wrote:
> If I want to compare 2 sequence files to see if they are the same, how do I
> compare?

From the command line, you can "textify" the files with:

hadoop fs -text myfile.seq

Of course, if you are using the API you can iterate through the two Sequen

Re: Reduce output is strange

2012-04-03 Thread Pedro Costa
What I want to ask is:

- how do I read the values from sequence files that are block, or record compressed, or uncompressed?
- how do I know if the sequence file is block compressed, record compressed, or uncompressed?
- how do I know if it's a sequence file or a Textfile?

On 3 April 2012 16:
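The last two questions can, as far as I understand the format, be answered from the first few bytes of the file. Below is a minimal sketch that does not use the Hadoop API at all. It assumes the header layout I believe current files use: the bytes "SEQ", a version byte of 6, the key and value class names as length-prefixed strings, then one boolean for "compressed" and one for "block compressed". The class name SeqHeaderProbe and its methods are made up for this illustration, and it only looks at a local copy of the file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: inspects the start of a local file.
class SeqHeaderProbe {

    // Decodes Hadoop's variable-length int, which prefixes each class-name string.
    static int readVInt(ByteBuffer in) {
        byte first = in.get();
        if (first >= -112) {
            return first; // value fits in the single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -120 - first : -112 - first;
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.get() & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    // Reads a length-prefixed UTF-8 string (used here for the class names).
    static String readString(ByteBuffer in) {
        int length = readVInt(in);
        if (length < 0) {
            throw new IllegalArgumentException("negative string length");
        }
        byte[] bytes = new byte[length];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Classifies a file from its leading bytes.
    // Assumption: header layout as described in the lead-in, version 6 only.
    static String describe(byte[] head) {
        if (head.length < 4 || head[0] != 'S' || head[1] != 'E' || head[2] != 'Q') {
            return "not a SequenceFile";
        }
        if (head[3] != 6) {
            return "SequenceFile, version not handled by this sketch";
        }
        ByteBuffer in = ByteBuffer.wrap(head, 4, head.length - 4);
        readString(in); // key class name, skipped
        readString(in); // value class name, skipped
        boolean compressed = in.get() != 0;
        boolean blockCompressed = in.get() != 0;
        if (!compressed) {
            return "uncompressed";
        }
        return blockCompressed ? "block compressed" : "record compressed";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeaderProbe <local-file>");
            return;
        }
        byte[] head = new byte[1024];
        int n;
        try (InputStream in = new FileInputStream(args[0])) {
            n = in.readNBytes(head, 0, head.length);
        }
        System.out.println(describe(Arrays.copyOf(head, n)));
    }
}
```

A plain text file simply fails the "SEQ" check. In a real job you would rather let SequenceFile.Reader tell you this; the sketch is only meant to show where the information lives.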

Re: Reduce output is strange

2012-04-03 Thread Pedro Costa
If I want to compare 2 sequence files to see if they are the same, how do I compare?

On 19 December 2011 14:43, Robert Evans wrote:
> Oh I forgot to say that part of the Random Characters are actually random
> characters. Sequence files store a set of random characters as synch
> points withi

Re: Reduce output is strange

2011-12-19 Thread Robert Evans
Oh I forgot to say that part of the Random Characters are actually random characters. Sequence files store a set of random characters as synch points within the file. This allows for splitting the file easily without a high risk that the random sequence appears inside the data itself just by c
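To illustrate the point about synch points, here is a rough sketch of how a reader dropped at an arbitrary offset could find the next record boundary by scanning for the marker. It is not the Hadoop implementation; the class SyncScan, the method nextSync and the 16-byte marker length are assumptions made for this example. With 16 effectively random bytes, the chance of the data matching the marker by accident at any given position is about 1 in 2^128.

```java
// Hypothetical illustration, not Hadoop code.
class SyncScan {

    // Assumed marker length for this sketch.
    static final int SYNC_SIZE = 16;

    // Returns the index of the first occurrence of sync in data at or after
    // position from, or -1 if the marker does not occur again.
    static int nextSync(byte[] data, byte[] sync, int from) {
        for (int i = Math.max(from, 0); i + sync.length <= data.length; i++) {
            int j = 0;
            while (j < sync.length && data[i + j] == sync[j]) {
                j++;
            }
            if (j == sync.length) {
                return i;
            }
        }
        return -1;
    }
}
```

A task given a split that starts in the middle of a record would skip forward to the offset returned by nextSync and begin reading records from just after the marker.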

Re: Reduce output is strange

2011-12-19 Thread Robert Evans
It looks mostly correct to me. I am not an expert on sequence files, and I have not checked the text against the spec nor have I checked the binary numbers in it to be sure they add up to the correct lengths etc, but it looks good from a first glance. I can see the SEQ tag at the beginning to

Reduce output is strange

2011-12-19 Thread Pedro Costa
Hi,

In Hadoop MapReduce, I've executed the webdatascan example, and the reduce output is in a SequenceFile. The result is shown here (http://paste.lisp.org/display/126572). What's the trash (random characters), like "u 265 100 330 320 252 " \n # ; 374 5 211 V ' 340 376" in the output? Is t