My custom partitioner is:

public class PopulationPartitioner extends Partitioner<IntWritable, Chromosome> implements Configurable
{
    @Override
    public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) {
        int partition = key.get();
As I understand it, zip isn't a splittable format. You might consider using
bzip2 or another splittable compression format.
Alternatively, you could chain one job that does the decompression to
another that does the processing, to get the parallelization.
On Mar 19, 2012 8:26 PM, "Andrew McNai
Harun,
Do your map task stdout logs show varying values for "partition"?
Seems to me like all your keys are somehow outside of [0,
numOfPartitions), and hence go to the last partition, per your logic.
2012/3/25 Harun Raşit ER:
> public int getPartition(IntWritable key, Chromosome value, int
>
If your real problem is a bad client whose jobs you do not want running (or
do not wish to be granted all resources when they do run), why not
tackle that directly instead of working around it?
Hadoop allows authorization of users, and MR schedulers also allow
restricting submissions to defined queues/pools.
Typically, numPartitions is used as a modulus in order to derive a value that
is at least zero and strictly less than numPartitions. That is, key.get() %
numPartitions would yield such a value (for non-negative keys).
stan
On Mar 25, 2012 11:25 AM, "Harun Raşit ER" wrote:
> public int getPartition(IntWritable key, Chromos
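The modulus suggestion above can be sketched in plain Java; the class and method names here are illustrative, not part of the original partitioner. Math.floorMod is used because Java's % operator can return negative values for negative keys:

```java
public class ModPartition {
    // Map any int key into the range [0, numPartitions).
    // Math.floorMod keeps the result non-negative even when key is negative,
    // whereas (key % numPartitions) could be negative in Java.
    static int getPartition(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(getPartition(7, 4));   // 3
        System.out.println(getPartition(-1, 4));  // 3, not -1
    }
}
```

With this scheme every key lands in a valid partition, so no partition silently collects all the out-of-range keys.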
Yes, there's nothing filtering packets between systems in the cluster.
I've verified with tcpdump what network traffic is transpiring: just the
kerberos login, and nothing else.
Thanks,
Eric Schwartz
On Fri, 23 Mar 2012, Mapred Learn wrote:
Do you have these ports open amongst the datanodes?
public int getPartition(IntWritable key, Chromosome value, int numOfPartitions)
{
    int partition = key.get();
    if (partition < 0 || partition >= numOfPartitions)
    {
        partition = numOfPartitions - 1;
    }
    return partition;
}
hi all,
I want to figure out, from a client of the Hadoop cluster, the statuses of
the jobs currently running on the cluster.
I need this in order to prevent the client from submitting certain jobs
while certain other jobs are already running on the cluster.
I know to recognize my job
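A minimal sketch of the guard logic, assuming the names of the currently running jobs have already been fetched from the cluster (e.g. via JobClient in the classic mapred API); SubmitGate, maySubmit, and the job names are hypothetical:

```java
import java.util.List;

public class SubmitGate {
    // Hypothetical guard: refuse submission while any conflicting job is
    // running. In practice runningJobNames would be populated from the
    // cluster rather than hard-coded.
    static boolean maySubmit(List<String> runningJobNames,
                             List<String> conflictingNames) {
        for (String running : runningJobNames) {
            if (conflictingNames.contains(running)) {
                return false; // a conflicting job is still running
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> running = List.of("genetic-eval");
        List<String> conflicts = List.of("genetic-eval");
        System.out.println(maySubmit(running, conflicts)); // false
    }
}
```

The client would call such a check before submitting, and either queue the job or report back to the user when it returns false.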