Hi,
2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com:
GZIP is not splittable.
Correct, gzip is a stream compression system, which effectively means
you can only start decompressing at the beginning of the data.
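A minimal sketch of the point above, using only the JDK's java.util.zip classes (not Hadoop). The class name GzipSplitDemo and its helper methods are made up for illustration. It compresses some records, then tries to decompress starting from the first byte and from an arbitrary offset in the middle of the stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the names are illustrative, not part of any library.
public class GzipSplitDemo {

    // Compress a byte array into a single gzip stream.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        }
        return buf.toByteArray();
    }

    // Decompress a complete gzip stream.
    public static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        return buf.toByteArray();
    }

    // Try to decompress starting at an arbitrary byte offset, the way a
    // reader of a later input split would have to.
    public static boolean canDecodeFrom(byte[] compressed, int offset) {
        byte[] tail = Arrays.copyOfRange(compressed, offset, compressed.length);
        try {
            gunzip(tail);
            return true;
        } catch (IOException e) {
            // Missing gzip header or corrupt deflate data.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        byte[] gz = gzip(sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("from start:  " + canDecodeFrom(gz, 0));
        System.out.println("from middle: " + canDecodeFrom(gz, gz.length / 2));
    }
}
```

Starting at offset 0 works; starting anywhere else normally fails because the reader finds neither the gzip header nor the decompressor state built up from the earlier bytes. That is why a second mapper cannot simply be pointed at the middle of a plain .gz file.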
Does that mean a GZIP block compressed sequencefile can't take advantage of
On Mon, Jan 31, 2011 at 1:56 PM, Sean Bigdatafun
sean.bigdata...@gmail.com wrote:
How to control the size of block to be compressed in SequenceFile?
Specified when creating a SequenceFile.Writer object. See the various
SequenceFile.createWriter()
--
Harsh J
www.harshj.com
Hi!
I was running a Hadoop cluster on Amazon EC2 instances, and after 2 days of
work one of the worker nodes simply died (I cannot connect to the
instance either). That node also appears on the dfshealth page as a dead node.
Up to this point everything is as expected.
Unfortunately the job it was running
On Mon, Jan 31, 2011 at 12:36 AM, Niels Basjes ni...@basjes.nl wrote:
Hi,
2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com:
GZIP is not splittable.
Correct, gzip is a stream compression system, which effectively means
you can only start decompressing at the beginning of the data.
I still need to figure out whether a queue can be associated with a TT,
i.e. a TT ACL for a queue, so that tasks submitted to that queue are only
relayed to the TTs in the queue's ACL list.
On Mon, Jan 31, 2011 at 10:51 PM, rishi pathak mailmaverick...@gmail.com wrote:
Hi Koji,
Hello,
On Mon, Jan 31, 2011 at 10:41 PM, Sean Bigdatafun
sean.bigdata...@gmail.com wrote:
On Mon, Jan 31, 2011 at 12:36 AM, Niels Basjes ni...@basjes.nl wrote:
Hi,
2011/1/31 Sean Bigdatafun sean.bigdata...@gmail.com:
GZIP is not splittable.
Correct, gzip is a stream compression system
Hi,
On the reduce side, after the RT had passed the merge phase (before
the reduce phase starts), I've got the path of the map_0.out file. I'm
opening this file with
[code]
FSDataInputStream in = fs.open(file);
[/code]
But, I only got the path. Is it possible to obtain the file status of
this
I said file status, but what I would like to know is the size of the file.
On Mon, Jan 31, 2011 at 5:56 PM, Pedro Costa psdc1...@gmail.com wrote:
Hi,
On the reduce side, after the RT had passed the merge phase (before
the reduce phase starts), I've got the path of the map_0.out file. I'm
FileSystem.getFileStatus(Path path) should return you the goodies,
using an appropriate FileSystem implementation (Hint: URI).
On Mon, Jan 31, 2011 at 11:30 PM, Pedro Costa psdc1...@gmail.com wrote:
I said file status, but what I would like to know is the size of the file.
On Mon, Jan 31, 2011
Hi,
When the reducer fetches a 1 GB map output from the mappers and does
the merge, is it possible that part of the map output is saved on disk
and the other part in memory?
Or must a map output be saved either entirely on disk or entirely in memory?
Thanks,
--
Pedro
On Jan 31, 2011, at 10:51 AM, Pedro Costa wrote:
Hi,
When the reducer fetches a 1 GB map output from the mappers and does
the merge, is it possible that part of the map output is saved on disk
and the other part in memory?
Yes, the reduce tries to keep as much in memory as possible.
Please don't cross-post. CDH questions should go to their user lists.
On Jan 31, 2011, at 6:15 AM, Kiss Tibor wrote:
Hi!
I was running a Hadoop cluster on Amazon EC2 instances, then after 2
days of work, one of the worker nodes just simply died (I cannot
connect to the instance either).