custom partitioner

2012-03-25 Thread Harun Raşit ER
My custom partitioner is: public class PopulationPartitioner extends Partitioner implements Configurable { @Override public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) { int partition = key.get();

Re: question about processing large zip

2012-03-25 Thread Joshua Smith
As I understand it, zip isn't a splittable format. You might consider using bzip2 or another splittable compression format. Alternatively, you could have one job that does the decompression chained to another that does the processing to get the parallelization. On Mar 19, 2012 8:26 PM, "Andrew McNai

Re: custom partitioner

2012-03-25 Thread Harsh J
Harun, Do your map task stdout logs show varying values for "partition"? It seems to me that all your keys are somehow outside [0, numOfPartitions), and hence go to the last partition, per your logic. 2012/3/25 Harun Raşit ER : > public int getPartition(IntWritable key, Chromosome value, int >
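
The diagnosis above can be reproduced outside Hadoop. The following is a minimal sketch, not the poster's actual class: it copies the clamping rule from the quoted getPartition body into a plain static method and counts how many keys land in each partition. The class name ClampSkewSketch, the helper names, and the sample keys are hypothetical, and the Hadoop types (IntWritable, Chromosome) are replaced by plain ints as an assumption for illustration.

```java
import java.util.Arrays;

final class ClampSkewSketch {
    // Same rule as the quoted code: any key outside [0, numOfPartitions)
    // is sent to the last partition.
    static int clampPartition(int key, int numOfPartitions) {
        int partition = key;
        if (partition < 0 || partition >= numOfPartitions) {
            partition = numOfPartitions - 1;
        }
        return partition;
    }

    // Counts how many of the given keys are assigned to each partition.
    static int[] histogram(int[] keys, int numOfPartitions) {
        int[] counts = new int[numOfPartitions];
        for (int key : keys) {
            counts[clampPartition(key, numOfPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical keys, all larger than the number of partitions.
        int[] keys = {100, 101, 102, 103, 104, 105};
        System.out.println(Arrays.toString(histogram(keys, 4)));
    }
}
```

With four partitions and keys that all exceed 3, every key is counted in the last slot, which matches the symptom of a single reducer receiving all the data.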

Re: Getting statuses of jobs

2012-03-25 Thread Harsh J
If your real problem is a bad client you do not want running jobs (or do not wish them to be granted all resources when they do), why not tackle just that instead of "working-around"? Hadoop allows authorization of users, and MR schedulers also allow restricting submissions to defined queues/pools

Re: custom partitioner

2012-03-25 Thread Stan Rosenberg
Typically, numPartitions is used as a modulus in order to derive a value that is at least zero and strictly less than numPartitions. That is, for non-negative keys, key.get() % numPartitions would yield such a value. stan On Mar 25, 2012 11:25 AM, "Harun Raşit ER" wrote: > public int getPartition(IntWritable key, Chromos
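
The modulus approach described above might look like the sketch below. It is only an illustration under stated assumptions: the class name ModuloPartitionSketch and the method partitionFor are hypothetical, plain ints stand in for the Hadoop key type, and Math.floorMod is used instead of the % operator suggested in the post so that negative keys also map into [0, numPartitions).

```java
final class ModuloPartitionSketch {
    // Maps any int key to a partition index in [0, numPartitions).
    // Math.floorMod is a substitution for plain %, which would return
    // a negative result for negative keys in Java.
    static int partitionFor(int key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(key, numPartitions);
    }

    public static void main(String[] args) {
        // Hypothetical keys, including one negative and one above the range.
        int[] keys = {0, 3, 4, 7, -1, 105};
        for (int key : keys) {
            System.out.println(key + " -> " + partitionFor(key, 4));
        }
    }
}
```

Inside a real getPartition method, the same expression would be applied to key.get(), so that keys spread across all reducers rather than collapsing onto one.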

Re: cannot start secure cluster without privileged resources

2012-03-25 Thread Eric Schwartz
Yes, there's nothing filtering packets between systems in the cluster. I've verified with tcpdump what network traffic is transpiring: just the kerberos login, and nothing else. Thanks, Eric Schwartz On Fri, 23 Mar 2012, Mapred Learn wrote: Do you have these ports open amongst the datanodes

custom partitioner

2012-03-25 Thread Harun Raşit ER
public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) { int partition = key.get(); if (partition < 0 || partition >= numOfPartitions) { partition = numOfPartitions-1; }

Getting statuses of jobs

2012-03-25 Thread shlomi java
Hi all, I want to find out, from a client of a Hadoop cluster, the statuses of the jobs that are currently running on the cluster. I need this in order to prevent the client from submitting certain jobs to the cluster when certain other jobs are already running on it. I know to recognize my job