Re: copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-19 Thread Joep Rottinghuis
You can't set up a proxy? You probably want to avoid writing to the local file system because, aside from being slow, it limits the size of your file to the free space on your local disk. If you do need to go commando and go through a single client machine that can see both clusters you
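A minimal sketch of the single-client route, assuming the client can reach both clusters: stream the bytes from the source cluster straight into the destination cluster so nothing is staged on local disk. From a shell this is typically `hadoop fs -cat <src> | hadoop fs -put - <dst>`, since `-put` reads from stdin when the source is `-`. The helpers `stream_copy` and `copy_between_clusters` and the 4 MiB chunk size below are hypothetical; only one chunk is held in memory at a time.

```python
import subprocess
from typing import BinaryIO

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB buffer; the only data held in memory


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst in fixed-size chunks without touching local disk.

    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


def copy_between_clusters(src_uri: str, dst_uri: str) -> int:
    """Pipe `hadoop fs -cat src_uri` into `hadoop fs -put - dst_uri`.

    Assumes the `hadoop` CLI is on PATH and this client can see both clusters.
    """
    reader = subprocess.Popen(["hadoop", "fs", "-cat", src_uri], stdout=subprocess.PIPE)
    writer = subprocess.Popen(["hadoop", "fs", "-put", "-", dst_uri], stdin=subprocess.PIPE)
    copied = stream_copy(reader.stdout, writer.stdin)
    writer.stdin.close()  # signal end of input to -put
    reader.stdout.close()
    if reader.wait() != 0 or writer.wait() != 0:
        raise RuntimeError("hadoop fs -cat / -put failed")
    return copied
```

Throughput is then bounded by that one client's network links, which is the trade-off for not needing a proxy or distcp.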

Re: Question about log files

2015-04-06 Thread Joep Rottinghuis
This depends on your OS. When you delete a file on Linux, you merely unlink the entry from the directory. The file does not actually get deleted until the last reference (open handle) goes away. Note that this could lead to an interesting way to fill up a disk. You should be able to see
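The unlink behaviour described above can be observed from Python on Linux (or another POSIX system): after `os.remove` the path no longer exists and the inode's link count drops to zero, yet the still-open handle can read the data, and the space is only released once that handle is closed. The function name is hypothetical; this is a demonstration, not a diagnostic tool.

```python
import os
import tempfile
from typing import Tuple


def read_after_unlink(payload: bytes) -> Tuple[bool, int, bytes]:
    """Write payload to a temp file, delete its path while a handle is open,
    then read back through the still-open handle.

    Returns (path_still_exists, link_count_after_unlink, data_read).
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.remove(path)  # removes the directory entry only
        exists = os.path.exists(path)
        links = os.fstat(handle.fileno()).st_nlink  # 0 names left, inode still alive
        handle.seek(0)
        data = handle.read()
    # Closing the last open handle (end of the with-block) is what frees the blocks.
    return exists, links, data
```

A long-running daemon that keeps writing to a log file after it has been deleted behaves the same way: the disk keeps filling even though the file no longer shows up in the directory.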

Re: Fair Scheduler of Hadoop

2013-01-21 Thread Joep Rottinghuis
, Jan 21, 2013 at 8:24 AM, Joep Rottinghuis jrottingh...@gmail.com wrote: Lin, The article you are reading is old. Fair scheduler does have preemption. Tasks get killed and rerun later, potentially on a different node. You can set a minimum / guaranteed capacity. The sum of those across pools

Re: Fair Scheduler of Hadoop

2013-01-20 Thread Joep Rottinghuis
Lin, The article you are reading is old. The Fair Scheduler does have preemption: tasks get killed and rerun later, potentially on a different node. You can set a minimum / guaranteed capacity. The sum of those across pools would typically be equal to or less than the total capacity of your cluster. Then you
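A simplified sketch of the capacity rule above, assuming per-pool minimum shares expressed in task slots: check that the guaranteed minimums fit within the cluster, and list the pools running below their minimum, which are the ones that could cause tasks elsewhere to be preempted. Pool names, slot counts, and both helper functions are hypothetical; the real Fair Scheduler reads minimums from its allocation file and applies its own timeouts and task-selection logic.

```python
from typing import Dict, List


def min_share_headroom(min_shares: Dict[str, int], cluster_slots: int) -> int:
    """Slots left after every pool's guaranteed minimum is reserved.

    A negative result means the guarantees cannot all be honoured at once.
    """
    return cluster_slots - sum(min_shares.values())


def starved_pools(min_shares: Dict[str, int], running: Dict[str, int],
                  demand: Dict[str, int]) -> List[str]:
    """Pools running fewer tasks than min(their minimum, their demand).

    These are the candidates that could trigger preemption of tasks in other
    pools (tasks killed and rerun later, possibly on another node).
    """
    starved = []
    for pool, minimum in min_shares.items():
        have = running.get(pool, 0)
        wants = demand.get(pool, 0)
        if have < min(minimum, wants):  # a pool with no demand is never starved
            starved.append(pool)
    return sorted(starved)
```

For example, minimums of 40, 20 and 30 slots on a 100-slot cluster leave 10 slots of headroom; on an 80-slot cluster the headroom is negative, meaning the guarantees overcommit the cluster.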

Re: NN Memory Jumps every 1 1/2 hours

2012-12-23 Thread Joep Rottinghuis
Do you have audit logs from before and after to compare? Are there some surprising access patterns you can discern? Joep Sent from my iPhone On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Tried this.. NameNode is still Ruining my Xmas on its slow death march to
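One way to look for surprising access patterns is to tally audit-log operations for a window before and a window after a memory jump, then compare. The sketch below assumes the usual HDFS audit-log layout of tab-separated `key=value` fields such as `cmd=open` and `src=/path`; adjust the parsing if your log format differs. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_commands(lines: Iterable[str]) -> Counter:
    """Count audit-log entries per `cmd=` value (e.g. open, create, listStatus)."""
    counts: Counter = Counter()
    for line in lines:
        for field in line.split():
            if field.startswith("cmd="):
                counts[field[len("cmd="):]] += 1
                break  # one command per audit entry
    return counts


def biggest_changes(before: Counter, after: Counter, top: int = 5) -> List[Tuple[str, int]]:
    """Commands whose counts grew the most from the first window to the second."""
    deltas = {cmd: after[cmd] - before[cmd] for cmd in set(before) | set(after)}
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Grouping by the `ugi=` or `src=` field instead of `cmd=` would help pin a spike on a particular user or directory tree.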

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Joep Rottinghuis
Do your OOMs correlate with the secondary checkpointing? Joep Sent from my iPhone On Dec 22, 2012, at 7:42 AM, Michael Segel michael_se...@hotmail.com wrote: Hey Silly question... How long have you had 27 million files? I mean can you correlate the number of files to the spate of OOMs?
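To test the correlation asked about above, one could extract timestamps for secondary NameNode checkpoints and for the OOMs from the logs, then count how many OOMs land shortly after a checkpoint starts. The 10-minute window and the function name are hypothetical, and pulling the timestamps out of your logs is not shown.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List


def ooms_near_checkpoints(checkpoints: List[datetime], ooms: List[datetime],
                          window: timedelta = timedelta(minutes=10)) -> int:
    """Count OOM events occurring within `window` after some checkpoint start."""
    starts = sorted(checkpoints)
    hits = 0
    for oom in ooms:
        i = bisect_right(starts, oom)  # number of checkpoints at or before this OOM
        if i and oom - starts[i - 1] <= window:
            hits += 1
    return hits
```

If most OOMs fall inside the window, checkpointing is worth investigating further; if they are spread evenly, the cause is more likely elsewhere.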

Re: hbase puts in map tasks don't seem to run in parallel

2012-06-03 Thread Joep Rottinghuis
How large is your table? If it is newly created and still almost empty then it will probably consist of only one region, which will be hosted on one region server. Even as the table grows and gets split into multiple regions, you will have to split your mappers in such a way that each writes to
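If all puts land in one region because the table is new, one common remedy is to pre-split the table so rows are spread over several region servers from the start. A minimal sketch, assuming row keys begin with a fixed-width, uniformly distributed lowercase hexadecimal prefix (for example a hash or salt): compute evenly spaced split keys, and find which region a given key falls into. The helper names are hypothetical; HBase itself accepts split keys at table-creation time and ships split algorithms such as `RegionSplitter`'s `HexStringSplit`.

```python
from bisect import bisect_right
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Evenly spaced split keys over the hex key space of the given width.

    Returns num_regions - 1 boundaries, so the table starts with num_regions regions.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 16 ** width
    return [format(space * i // num_regions, f"0{width}x") for i in range(1, num_regions)]


def region_index(row_key: str, splits: List[str]) -> int:
    """Index of the region holding row_key; region i covers [splits[i-1], splits[i]).

    String comparison matches HBase's byte ordering only while keys use
    consistent lowercase ASCII hex.
    """
    return bisect_right(splits, row_key)
```

Pre-splitting only helps if each mapper's keys actually span the key range; mappers whose output shares a common prefix will still hammer a single region.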

Re: hbase puts in map tasks don't seem to run in parallel

2012-06-03 Thread Joep Rottinghuis
, 2012 at 12:02 PM, Joep Rottinghuis jrottingh...@gmail.com wrote: How large is your table? If it is newly created and still almost empty then it will probably consist of only one region, which will be hosted on one region server. Even as the table grows and gets split into multiple regions

Re: how is userlogs supposed to be cleaned up?

2012-03-06 Thread Joep Rottinghuis
Aside from cleanup, it seems like you are running into the maximum number of subdirectories per directory on ext3. Joep Sent from my iPhone On Mar 6, 2012, at 10:22 AM, Chris Curtin curtin.ch...@gmail.com wrote: Hi, We had a fun morning trying to figure out why our cluster was failing jobs,
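For context, ext3 caps a directory's hard-link count at 32,000, and every subdirectory adds one link (through its `..` entry), so a single directory can hold at most 31,998 subdirectories. A quick check, with a hypothetical function name and warning threshold, that counts the immediate subdirectories of a path such as a `userlogs` directory:

```python
import os

# 32,000-link limit minus the directory's own two links (its name in the parent and ".")
EXT3_MAX_SUBDIRS = 31_998


def subdir_count(path: str) -> int:
    """Number of immediate subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def near_subdir_limit(path: str, threshold: float = 0.9) -> bool:
    """True if path already uses at least `threshold` of the ext3 subdirectory limit."""
    return subdir_count(path) >= threshold * EXT3_MAX_SUBDIRS
```

Running such a check from cron gives some warning before new subdirectory creation starts failing.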

Re: 1gig or 10gig network for cluster?

2011-12-23 Thread Joep Rottinghuis
One or two 1gig NICs on a 10gig backbone sound reasonable with only four 1T drives. Nodes with 12*2T disks are getting more common, and they do not all have 10gig network cards, even on 600+ node clusters. Cheers, Joep Sent from my iPhone On Dec 23, 2011, at 11:15 AM, Mads Toftum m...@toftum.dk wrote:
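One rough yardstick for the NIC sizing above is how long it would take to move a node's full disk capacity through its network links, for example when draining a node. The sketch uses raw line rate only (1 Gbit/s = 125 MB/s) and ignores protocol overhead and disk throughput, so real transfers will be slower; the function name is hypothetical.

```python
def hours_to_transfer(capacity_tb: float, link_gbit: float, nics: int = 1) -> float:
    """Hours to push capacity_tb terabytes through `nics` links of link_gbit each.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbit = 10**9 bits) and raw line
    rate, so the result is a lower bound on the real transfer time.
    """
    bytes_total = capacity_tb * 10**12
    bytes_per_sec = nics * link_gbit * 10**9 / 8
    return bytes_total / bytes_per_sec / 3600
```

For a 4 × 1T node this gives about 8.9 hours on one 1gig NIC (about 4.4 hours on two); a 12 × 2T node holds 24 TB, which works out to about 53 hours at 1gig versus about 5.3 hours at 10gig.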