Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Bejoy KS
Hi, it is definitely possible. In your map, make the dept name the output key and the salary the value. In the reducer, for every key you can initialize a counter and a sum; add each value to the sum and increment the counter by one for each value. Output the dept key and the new aggregate…
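
A minimal sketch of that approach, assuming a comma-separated "dept,salary" input line format (the class names and parsing are illustrative, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class DeptAggregates {
      // Emit (dept, salary) for every input record of the form "dept,salary".
      public static class DeptMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split(",");
          ctx.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[1])));
        }
      }

      // Keep a sum and a count per dept, then emit both aggregates at once.
      public static class DeptReducer extends Reducer<Text, LongWritable, Text, Text> {
        @Override
        protected void reduce(Text dept, Iterable<LongWritable> salaries, Context ctx)
            throws IOException, InterruptedException {
          long sum = 0;
          long count = 0;
          for (LongWritable salary : salaries) {
            sum += salary.get();
            count++;
          }
          ctx.write(dept, new Text("sum=" + sum + "\tcount=" + count));
        }
      }
    }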

hadoop memory settings

2012-10-05 Thread Visioner Sadak
Is there a relation between the HADOOP_HEAPSIZE, mapred.child.java.opts and mapred.child.ulimit settings in hadoop-env.sh and mapred-site.xml? I have a single machine with 2 GB RAM running Hadoop in pseudo-distributed mode, and my HADOOP_HEAPSIZE is set to 256. What should I set mapred.child.java.opts and mapred.chi…

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Bertrand Dechoux
> It takes time for big data. I heard map reduce Java code will be faster. Is it true? Or should I go for Pig programming? I guess one important question is what you mean by 'it takes time', and what goal you want to reach. It may be that your current implementation is naive and can be…

Re: copyFromLocal

2012-10-05 Thread Visioner Sadak
Hey, thanks Bejoy and Andy. Actually my user is just a desktop web user (like us browsing the web), so I guess first I have to upload the file to my Linux box (where Hadoop is installed) using a web app, and then from there use moveFromLocal to put that file into Hadoop. I thought Hadoop could be directly used…

Re: hadoop memory settings

2012-10-05 Thread Visioner Sadak
Because I am getting "Error occurred during initialization of VM" and "java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run" when running a job. :) On Fri, Oct 5, 2012 at 1:39 PM, Visioner Sadak wrote: > Is there a relation between HADOOP_HEAPSIZE, mapred.child.java.opts and > ma…

Re: hadoop memory settings

2012-10-05 Thread Bejoy KS
Hi Sadak, AFAIK HADOOP_HEAPSIZE determines the JVM size of the daemons like the NN, JT, TT, DN etc. mapred.child.java.opts and mapred.child.ulimit are used to set the JVM heap and resource limits for the child JVMs launched for each map/reduce task. Regards, Bejoy KS. Sent from handheld, please excuse typos. -Or…
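
A hedged illustration of where each knob lives; the values shown are placeholders, not tuning advice for a 2 GB box:

    # hadoop-env.sh: heap (in MB) for the daemon JVMs (NN, JT, TT, DN)
    export HADOOP_HEAPSIZE=256

    <!-- mapred-site.xml: per-task child JVM settings -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m</value>   <!-- heap for each map/reduce child JVM -->
    </property>
    <property>
      <name>mapred.child.ulimit</name>
      <value>1048576</value>    <!-- virtual memory limit in KB; keep it above the child heap -->
    </property>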

Re: Impersonating HDFS user

2012-10-05 Thread Bertrand Dechoux
Hi, you might be looking for something like UserGroupInformation.createRemoteUser(user).doAs(...); see http://hadoop.apache.org/docs/r1.0.3/api/org/apache/hadoop/security/UserGroupInformation.html. It is a JAAS wrapper for Hadoop. Regards, Bertrand. On Fri, Oct 5, 2012 at 3:19 PM, Oleg Zhurakousk…
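
A minimal sketch of that pattern; the user name and namenode URI are placeholders:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class DoAsExample {
      public static void main(String[] args) throws Exception {
        // Run the enclosed HDFS calls as "hdfsuser" (placeholder name).
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hdfsuser");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
          public Void run() throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:8020"); // point at the remote cluster
            FileSystem fs = FileSystem.get(conf);
            fs.mkdirs(new Path("/user/hdfsuser/demo"));          // executed as hdfsuser
            return null;
          }
        });
      }
    }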

sqoop jobs

2012-10-05 Thread Kartashov, Andy
Guys, has anyone successfully executed commands like sqoop job --list, sqoop job --create, etc.? Do I need to set up my sqoop-core.xml beforehand? Example: sqoop job --list 12/10/05 09:44:29 WARN hsqldb.HsqldbJobStorage: Could not interpret as a number: null 12/10/05 09:44:29 ERROR hsqldb.Hsq…

RE: Cumulative value using mapreduce

2012-10-05 Thread java8964 java8964
Are you allowed to change the order of the data in the output? If you want to calculate the CR/DR-indicator cumulative sum, it will be easy if the business allows you to change the order of your data, grouped by the CR/DR indicator, in the output. For example, you can do it very easily with the wa…
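
A minimal sketch of the running-sum reducer being described, assuming each key's values reach the reducer already in the desired order (the types are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Emits one cumulative total per incoming value for each group.
    public class CumulativeSumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text group, Iterable<LongWritable> amounts, Context ctx)
          throws IOException, InterruptedException {
        long runningTotal = 0;
        for (LongWritable amount : amounts) {
          runningTotal += amount.get();                     // add this record
          ctx.write(group, new LongWritable(runningTotal)); // cumulative value so far
        }
      }
    }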

Re: Impersonating HDFS user

2012-10-05 Thread Bertrand Dechoux
Indeed, you are connecting to localhost, and you said it was a remote connection, so I guess there is nothing there that is relevant for you. The main idea is that you need to provide the configuration files. They are read by default from the classpath. Any place where you have a Configuration/JobCo…

Re: Impersonating HDFS user

2012-10-05 Thread Oleg Zhurakousky
Yes, I understand that, and I guess I am trying to find that 'right property'. I did find one reference to it in hdfs-default.xml (dfs.datanode.address = 0.0.0.0:50010), so I changed that in my hdfs-site.xml to dfs.datanode.address = 192.168.15.20:50010. But… On Fri, Oct 5, 2012 at 10:33 AM, Bertrand D…

Re: Impersonating HDFS user

2012-10-05 Thread Oleg Zhurakousky
Sorry, clicked send too soon. Basically, changing that did not produce any result; I am still seeing the same message. So I guess my question is: what is the property that is responsible for that? Thanks, Oleg. On Fri, Oct 5, 2012 at 10:40 AM, Oleg Zhurakousky < oleg.zhurakou...@gmail.com> wrote: > Yes…

Re: Cumulative value using mapreduce

2012-10-05 Thread Steve Loughran
On 5 October 2012 06:50, Ted Dunning wrote: > negative numbers are a relatively new concept in accounting since 2008, if I'm not mistaken

Re: sqoop jobs

2012-10-05 Thread Marcos Ortiz
Which version of Sqoop are you using? Which version of Hadoop? On 10/05/2012 09:45 AM, Kartashov, Andy wrote: Guys, has anyone successfully executed commands like sqoop job --list, sqoop job --create, etc.? Do I need to set up my sqoop-core.xml beforehand? Example: sqoop job --list…

RE: sqoop jobs

2012-10-05 Thread Kartashov, Andy
sqoop version: Sqoop 1.4.1-cdh4.0.1. hadoop version: Hadoop 2.0.0-cdh4.0.1, Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.0.1/src/hadoop-common-project/hadoop-common. HSQLDB version 2.2.9. Andy Kartashov, MPAC Architecture R&D, Co-op, 1340 Pickering…

Re: Cumulative value using mapreduce

2012-10-05 Thread Jane Wayne
There are probably a million ways to do it, but it seems like it can be done, per your question. Off the top of my head, you'd probably want to do the cumulative sum in the reducer. If you're savvy, maybe even make the reducer reusable as a combiner (looks like this problem might have an associative a…

Re: Cumulative value using mapreduce

2012-10-05 Thread Jane Wayne
I'm reading the other posts. I had assumed you had more than one reducer. If you just have one reducer, then no matter what, every key-value pair goes there. So, in that case, I agree with java8964: you emit all records with one key to that one reducer. Make sure you apply secondary sorting (that means you wil…
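
A hedged sketch of the secondary-sort piece: with a hypothetical composite "account|date" Text key, sort on the whole key but group on the account prefix, so each reduce() call sees one account's records in date order:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Groups composite keys by their account prefix only.
    public class AccountGroupingComparator extends WritableComparator {
      public AccountGroupingComparator() {
        super(Text.class, true); // create Text instances for comparison
      }
      @Override
      @SuppressWarnings("rawtypes")
      public int compare(WritableComparable a, WritableComparable b) {
        String accountA = a.toString().split("\\|")[0]; // account portion of "account|date"
        String accountB = b.toString().split("\\|")[0];
        return accountA.compareTo(accountB);
      }
    }
    // Driver wiring: job.setGroupingComparatorClass(AccountGroupingComparator.class);
    // With a single reducer, no custom partitioner is needed.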

Re: Counters that track the max value

2012-10-05 Thread Jeremy Lewi
Hi Harsh, thank you very much, that will work. How come we can't simply create a modification of a regular mapreduce counter which does this behind the scenes? It seems like we should just be able to replace "+" with "max" and everything else should work? J. On Wed, Oct 3, 2012 at 9:52 AM, Harsh…
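
For context, one common counter-free way to get a global max (not necessarily what Harsh suggested; his earlier reply is not quoted here) is a max-of-maxes pass, sketched with illustrative types:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each task tracks its own max and emits it once, under a constant key,
    // so a single reducer can take the max of the per-task maxima.
    public class MaxMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      private long localMax = Long.MIN_VALUE;
      @Override
      protected void map(LongWritable offset, Text line, Context ctx) {
        localMax = Math.max(localMax, Long.parseLong(line.toString().trim()));
      }
      @Override
      protected void cleanup(Context ctx) throws IOException, InterruptedException {
        ctx.write(new Text("max"), new LongWritable(localMax)); // one record per task
      }
    }
    // The single reducer then just keeps Math.max over its incoming values.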

Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
Hi, I have a complex Hadoop job that iterates over large graph data multiple times until some convergence condition is met. I know that the map output goes to the local disk of each particular mapper first, and is then fetched by the reducers before the reduce tasks start. I can see that this is an…
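
A minimal driver-loop sketch of this kind of iterative job, inside a hypothetical driver's run method; the class names, paths, and counter-based convergence check are assumptions, not from the thread:

    // Run the same MR job repeatedly, feeding each iteration's output back
    // in as the next iteration's input, until converged or out of rounds.
    Configuration conf = new Configuration();
    Path input = new Path("/graph/iter0");
    for (int i = 1; i <= MAX_ITERATIONS; i++) {
      Job job = new Job(conf, "graph-iteration-" + i);
      job.setJarByClass(GraphDriver.class);       // hypothetical driver class
      job.setMapperClass(GraphMapper.class);      // hypothetical mapper
      job.setReducerClass(GraphReducer.class);    // hypothetical reducer
      Path output = new Path("/graph/iter" + i);
      FileInputFormat.setInputPaths(job, input);
      FileOutputFormat.setOutputPath(job, output);
      if (!job.waitForCompletion(true)) break;
      // Convergence signal: a counter the reducers bump when anything changes.
      long changed = job.getCounters().findCounter("graph", "CHANGED").getValue();
      if (changed == 0) break;
      input = output;                             // chain into the next round
    }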

Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache

2012-10-05 Thread Bertrand Dechoux
Hi, I am launching my job from the command line, and I observed that when the provided input path does not match any files, the jar in the staging directory is not removed. It is removed on job termination (success or failure), but here the job isn't even really started, so it may be an edge case. Ha…

Re: Counters that track the max value

2012-10-05 Thread Harsh J
Jeremy, I suppose that's doable; please file a MAPREDUCE JIRA so you can discuss this with others on the development side as well. I am guessing that the MAX operations of most of the user-oriented data-flow front-ends such as Hive and Pig already do this efficiently, so perhaps there hasn't been a ve…

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Harsh J
Hey Jim, Are you looking to re-sort or re-partition your data by a different key or key combo after each output from reduce? On Fri, Oct 5, 2012 at 10:01 PM, Jim Twensky wrote: > Hi, > > I have a complex Hadoop job that iterates over large graph data > multiple times until some convergence cond

Re: Counters that track the max value

2012-10-05 Thread Jeremy Lewi
Done. https://issues.apache.org/jira/browse/MAPREDUCE-4709 Thanks J On Fri, Oct 5, 2012 at 10:13 AM, Harsh J wrote: > Jeremy, > > I suppose thats doable, please file a MAPREDUCE JIRA so you can > discuss this with others on the development side as well. > > I am guessing that MAX operations of

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
Hi Harsh, yes, there is actually a "hidden" map stage that generates new pairs based on the last reduce output, but I can create those records during the reduce step instead and get rid of the intermediate map computation completely. The idea is to apply the map function to each output of the red…
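
A hedged sketch of folding that hidden map into the reducer; the inline transformation is purely illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // The reducer emits what the next iteration's map would have produced,
    // so the intermediate map-only pass can be dropped.
    public class FoldedReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
          throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : vals) sum += v.get();   // the real reduce logic
        // The "hidden" map applied inline (stand-in transformation):
        ctx.write(new Text(key.toString() + ":next"), new LongWritable(sum));
      }
    }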

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Harsh J
Would it then be right to assume that the keys produced by the reduce partition at one stage would be isolated to that partition alone and not occur in any of the other partitions' outputs? I'm guessing not, based on the nature of your data? I'm trying to understand why the shuffling is good to avoid…

Re: When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?

2012-10-05 Thread Harsh J
On 0.20.x- or 1.x-based releases, do not use a file:/// prefix for hadoop.tmp.dir. That won't work. Remove it and things should work, I guess. And yes, for production, either tweak specific configs (like dfs.name.dir, dfs.data.dir, mapred.local.dir, mapred.system.dir (DFS), mapreduce.jobtracker.sta…
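
A hedged example of the fix, with a placeholder local path:

    <!-- core-site.xml: a plain local path, no file:/// scheme -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/lib/hadoop/tmp</value>  <!-- placeholder path -->
    </property>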

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
Hi Harsh, the hidden map operation applied to the reduce partition at one stage can generate keys that are outside the range covered by that particular reducer, so I still need the many-to-many communication from reduce step k to reduce step k+1. Otherwise, I think the ChainReduc…

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Khang Pham
Hi, ideally you want to "scan" through the data once and get the (sum, count). One simple solution is to write your own map-reduce with key = department and value = new VectorWritable(vector), where vector is an array with array[0] = salary and array[1] = 1. In the reduce phase, all you need to do is the aggr…
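
A hedged variant of that (sum, count) pairing using a plain tab-separated Text value instead of Mahout's VectorWritable; because the pair stays associative, the same class can double as a combiner. The mapper would emit new Text(salary + "\t1") per record:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums (salary, count) pairs per department; output is still a valid pair,
    // which is what makes this safe to reuse as a combiner.
    public class SumCountReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text dept, Iterable<Text> pairs, Context ctx)
          throws IOException, InterruptedException {
        long sum = 0, count = 0;
        for (Text p : pairs) {                        // each value is "salary\tcount"
          String[] f = p.toString().split("\t");
          sum += Long.parseLong(f[0]);
          count += Long.parseLong(f[1]);
        }
        ctx.write(dept, new Text(sum + "\t" + count));
      }
    }
    // Driver: job.setCombinerClass(SumCountReducer.class);
    //         job.setReducerClass(SumCountReducer.class);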

where to download hadoop-1.0.0

2012-10-05 Thread alxsss
Hello, I am trying to use hbase-0.92.1, which is compatible with hadoop-1.0.0. However, I do not see this version of Hadoop on the download page, for example http://apache.mirrors.pair.com/hadoop/common/ . I wondered why it was excluded from the list? Thanks. Alex.

Re: where to download hadoop-1.0.0

2012-10-05 Thread J. Rottinghuis
Any release in the 1.0.x line should be equally compatible, so is there any reason not to use the latest in that line? Cheers, Joep On Fri, Oct 5, 2012 at 12:06 PM, wrote: > Hello, > > I try to use hbase-0.92.1 which is compatible with hadoop-1.0.0. However, > I do not see this version of hado

Re: Impersonating HDFS user

2012-10-05 Thread Chris Nauroth
BTW, additional details on impersonation are here, including information about a piece of configuration required to allow use of doAs. http://hadoop.apache.org/docs/r1.0.3/Secure_Impersonation.html Thank you, --Chris On Fri, Oct 5, 2012 at 7:42 AM, Oleg Zhurakousky wrote: > sorry clicked send
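
The configuration piece being referred to is the proxyuser whitelist in the cluster's core-site.xml, per the linked Secure Impersonation doc; a hedged example where "super" is a placeholder for the account performing the doAs calls:

    <!-- core-site.xml on the cluster -->
    <property>
      <name>hadoop.proxyuser.super.hosts</name>
      <value>host1,host2</value>   <!-- hosts allowed to impersonate -->
    </property>
    <property>
      <name>hadoop.proxyuser.super.groups</name>
      <value>group1,group2</value> <!-- users in these groups may be impersonated -->
    </property>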

Re: When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?

2012-10-05 Thread jeremy p
Thank you, that worked! On Fri, Oct 5, 2012 at 10:58 AM, Harsh J wrote: > On 0.20.x or 1.x based releases, do not use a file:/// prefix for > hadoop.tmp.dir. That won't work. Remove it and things should work, I > guess. > > And yes, for production, either tweak specific configs (like > dfs.name.

Re: Impersonating HDFS user

2012-10-05 Thread Oleg Zhurakousky
Thank you guys, I got everything I needed working. On Fri, Oct 5, 2012 at 3:29 PM, Chris Nauroth wrote: > BTW, additional details on impersonation are here, including information > about a piece of configuration required to allow use of doAs. > > http://hadoop.apache.org/docs/r1.0.3/Secure_Impersona…