override mapreduce compression?

2012-03-06 Thread Luke Forehand
Hello, Is there a way to run the mahout kmeans program from the command line with a parameter that will override (and disable) the reducer task compression? I have tried several different ways of specifying the -D parameter, but I can't seem to get any options to pass through to the hadoop mapredu

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
Mapper compression? -Dmapreduce.map.output.compress=false. I think the key was mapred.compress.map.output in Hadoop 0.20.0. I am not sure if there is reducer compression built in, but I could have missed it. On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand wrote: > Hello, > > Is there a way to run the

Re: override mapreduce compression?

2012-03-06 Thread Luke Forehand
I tried the following and it does not work:
mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 1 -cd 0.01 -x 100 \
  -Dmapreduce.map.output.compress=false
mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c /mahout/initial-cluster

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
-D arguments are passed to the JVM, so they need to be set in HADOOP_OPTS (as I recall). Or you can configure this in your Hadoop config files. They have no meaning to the driver script. Why do you want to disable compression after the mapper? On Wed, Mar 7, 2012 at 12:11 AM, Luke Forehand wrote: > I tried the foll
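
Separately from HADOOP_OPTS, Hadoop's documented command form is `command [genericOptions] [commandOptions]`: programs launched through ToolRunner read generic options such as -D only when they come before the program's own options. The sketch below builds a command line in that order. It assumes that the kmeans driver goes through ToolRunner and that the cluster uses the Hadoop 0.20-era job output key mapred.output.compress; neither is confirmed in this thread. The helper name build_mahout_command is hypothetical.

```python
import shlex


def build_mahout_command(program, properties, program_args):
    """Return an argv list with -D generic options placed before program options.

    Hypothetical helper: it only assembles the command line, it does not run it.
    """
    argv = ["mahout", program]
    for name, value in properties.items():
        # Generic options go directly after the program name.
        argv.append(f"-D{name}={value}")
    argv.extend(program_args)
    return argv


# Assumption: mapred.output.compress is the job output key on this cluster.
command = build_mahout_command(
    "kmeans",
    {"mapred.output.compress": "false"},
    ["-i", "/mahout/sparse/test1/tfidf-vectors",
     "-c", "/mahout/initial-clusters/test1",
     "-o", "/mahout/kmeans/test1",
     "-k", "1", "-cd", "0.01", "-x", "100"],
)
print(shlex.join(command))
```

If the cluster configuration marks the property as final, this ordering will make no difference.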

Re: override mapreduce compression?

2012-03-06 Thread Luke Forehand
I want the results of the kmeans clustering to be uncompressed or compressed in a way that my users can natively decompress on their machines. All our other hadoop jobs use Snappy compression when writing output, but our users don't have Snappy and don't particularly want to install it (especially

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
Ok, but you're talking about reducer output, not mapper output. It should not be compressed in the first place. On Mar 7, 2012 12:29 AM, "Luke Forehand" < luke.foreh...@networkedinsights.com> wrote: > I want the results of the kmeans clustering to be uncompressed or > compressed in a way that my users can

Re: override mapreduce compression?

2012-03-06 Thread Luke Forehand
Why should it not be compressed in the first place? Here is the header of one of the reducer parts that was written into /mahout/kmeans/clusters-5-final SEQorg.apache.hadoop.io.Text+org.apache.mahout.clustering.kmeans.Cluster )org.apache.hadoop.io.compress.SnappyCodec On 3/6/12 6:33 PM, "Se

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
Eh, hmm, does this job compress by default? I don't have the code here. That is not generally how Hadoop works but you could make it do this. I don't know if there's an override. On Mar 7, 2012 12:40 AM, "Luke Forehand" < luke.foreh...@networkedinsights.com> wrote: > Why should it not be compresse

Re: override mapreduce compression?

2012-03-06 Thread Luke Forehand
Our operations guy handles our Hadoop configuration, and I think he has set up our Hadoop conf to compress everything. I'm trying to subvert him :-) I think the HADOOP_OPTS trick will work for me; that makes sense. Thanks! -Luke On 3/6/12 6:46 PM, "Sean Owen" wrote: >Eh, hmm, does thi

Re: override mapreduce compression?

2012-03-07 Thread Dmitriy Lyubimov
Don't the Hadoop site.xml settings on the driver's client usually override whatever is set on the cluster? Or do you not have the privileges to change that either? On Tue, Mar 6, 2012 at 4:54 PM, Luke Forehand wrote: > Our operations guy handles our hadoop configuration, and I think he has > setup our ha

Re: override mapreduce compression?

2012-03-07 Thread Sean Owen
The client can override cluster defaults unless the cluster marks them "final". On Wed, Mar 7, 2012 at 9:02 PM, Dmitriy Lyubimov wrote: > Aren't hadoop site.xml settings on the driver's client usually > overshadow whatever it is on the cluster? Or you don't have the privs > to change that either?
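
The override rule described here can be modelled in a few lines. This is a toy model and not Hadoop's Configuration class: resources are merged in load order, a later value replaces an earlier one, and a property that an earlier resource marked final can no longer be changed. The function name merge_resources and the dict layout are invented for illustration, and mapred.output.compress is assumed to be the relevant key.

```python
def merge_resources(resources):
    """Merge config resources in load order, honouring 'final' markers.

    Each resource maps a property name to a (value, is_final) tuple.
    """
    merged = {}
    finals = set()
    for resource in resources:
        for name, (value, is_final) in resource.items():
            if name in finals:
                # An earlier resource marked this property final; skip the override.
                continue
            merged[name] = value
            if is_final:
                finals.add(name)
    return merged


client = {"mapred.output.compress": ("false", False)}

# Cluster default not marked final: the client value wins.
cluster_open = {"mapred.output.compress": ("true", False)}
print(merge_resources([cluster_open, client]))

# Cluster default marked final: the client value is ignored.
cluster_locked = {"mapred.output.compress": ("true", True)}
print(merge_resources([cluster_locked, client]))
```

So whether a client-side site.xml helps depends on whether the cluster configuration marks the compression settings final.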