Re: Cumulative value using mapreduce

2012-10-04 Thread Bertrand Dechoux
Hi, The provided example records are perfect. With that, I doubt there will be any confusion about what kind of data is available and how it should be manipulated. However, "the output is not coming as desired" is vague. It's hard to say why you are not getting your expected result without a bit more information.

Re: Cumulative value using mapreduce

2012-10-04 Thread Ted Dunning
The answer is really the same. Your problem is just using a goofy representation for negative numbers (after all, negative numbers are a relatively new concept in accounting). You still need to use the account number as the key and the date as a sort key. Many financial institutions also process
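
For illustration, a minimal sketch of the composite key Ted is describing (account number plus date) using the org.apache.hadoop.mapreduce API. The class and field names are assumptions, not from the thread:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Composite key: account number groups records, date orders them within a group.
    public class AccountDateKey implements WritableComparable<AccountDateKey> {
        private String account;
        private long date; // e.g. yyyyMMdd encoded as a long

        public AccountDateKey() {}

        public AccountDateKey(String account, long date) {
            this.account = account;
            this.date = date;
        }

        public String getAccount() { return account; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(account);
            out.writeLong(date);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            account = in.readUTF();
            date = in.readLong();
        }

        @Override
        public int compareTo(AccountDateKey other) {
            // Sort by account first, then by date: this is what orders each
            // account's transactions chronologically for the reducer.
            int cmp = account.compareTo(other.account);
            return cmp != 0 ? cmp : Long.compare(date, other.date);
        }
    }

hashCode/equals are omitted for brevity; partitioning by account is handled by a custom partitioner, sketched further down in this thread.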

Re: Cumulative value using mapreduce

2012-10-04 Thread Sarath
Thanks for all your responses. As suggested, I will go through the documentation once again. But just to clarify, this is not my first map-reduce program. I've already written a map-reduce job for our product which does filtering and transformation of financial data. This is a new requirement we'

Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file

2012-10-04 Thread Hemanth Yamijala
Hi, Roughly, this information will be available under the 'Hadoop map task list' page in the MapReduce web UI (in Hadoop-1.0, which I am assuming is what you are using). You can reach this page by selecting the running tasks link from the job information page. The page has a table that lists all the tasks

Re: Submitting a job to a remote cluster

2012-10-04 Thread Hemanth Yamijala
Hi, Could you please share your setup details - i.e. how many slaves, and how many datanodes and tasktrackers? Also, the configuration - in particular hdfs-site.xml? To answer your question: the datanode address is picked up from hdfs-site.xml, or hdfs-default.xml, from the property dfs.datanode.address
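
For reference, a minimal hdfs-site.xml fragment setting that property; 0.0.0.0:50010 is the usual Hadoop 1.x default and is shown here only as an illustration:

    <configuration>
      <property>
        <name>dfs.datanode.address</name>
        <!-- Address the datanode's data transfer server binds to -->
        <value>0.0.0.0:50010</value>
      </property>
    </configuration>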

Re: hadoop issue on distributed cluster

2012-10-04 Thread Hemanth Yamijala
Hi, I didn't check everything, but I found this in mapred-site.xml: mapred.job.tracker is set to hdfs://10.99.42.9:8021/ with final=true. The value shouldn't be an HDFS URL. Can you please fix this and try? On Thu, Oct 4, 2012 at 12:32 PM, Ajit Kumar Shreevastava <ajit.shreevast...@hcl.com> wrote: > Hi All,
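
That is, mapred.job.tracker takes a bare host:port rather than a URL; the corrected fragment for the configuration quoted above would look something like:

    <property>
      <name>mapred.job.tracker</name>
      <!-- host:port of the JobTracker RPC address, not an hdfs:// URL -->
      <value>10.99.42.9:8021</value>
      <final>true</final>
    </property>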

Re: In Window (Cygwin) Java File not found

2012-10-04 Thread Chris Nauroth
Can you try deploying the JDK to a different directory that doesn't contain spaces and setting JAVA_HOME to that? There is a known problem when the JDK is deployed in a directory containing spaces (like C:\Program Files\...). Thank you, --Chris On Thu, Oct 4, 2012 at 9:38 AM, Sujit Dhamale wrote

Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file

2012-10-04 Thread Huanchen Zhang
Hello, I have a question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file. Currently, three of the mappers take about five times longer to complete than the others. So, how can I detect which specific files those three mappers are processing?

Re: Cumulative value using mapreduce

2012-10-04 Thread Bertrand Dechoux
I indeed didn't catch the cumulative sum part. Then I guess it begs for what-is-often-called-a-secondary-sort, if you want to compute different cumulative sums during the same job. It can be more or less easy to implement depending on which API/library/tool you are using. Ted comments on performance
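
To wire up that secondary sort with the plain Java API, a composite key like the one sketched earlier in this digest needs a custom partitioner and grouping comparator, so that all records for one account reach the same reduce() call, ordered by date. A hedged sketch; the class names are illustrative:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Route by account only, so every record of an account lands on the same reducer.
    public class AccountPartitioner extends Partitioner<AccountDateKey, Text> {
        @Override
        public int getPartition(AccountDateKey key, Text value, int numPartitions) {
            return (key.getAccount().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Group by account only: one reduce() call per account, with values arriving
    // date-ordered because the full (account, date) key controls the sort.
    public class AccountGroupingComparator extends WritableComparator {
        public AccountGroupingComparator() {
            super(AccountDateKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((AccountDateKey) a).getAccount()
                    .compareTo(((AccountDateKey) b).getAccount());
        }
    }

These are registered on the job with job.setPartitionerClass(AccountPartitioner.class) and job.setGroupingComparatorClass(AccountGroupingComparator.class).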

RE: Cumulative value using mapreduce

2012-10-04 Thread java8964 java8964
I did the cumulative sum in a HIVE UDF, as one of the projects for my employer. 1) You need to decide the grouping elements for your cumulative sum. For example, an account, a department etc. In the mapper, combine this information as your emitted key. 2) If you don't have any grouping requirement, you

Re: Cumulative value using mapreduce

2012-10-04 Thread Ted Dunning
Bertrand is almost right. The only difference is that the original poster asked about a cumulative sum. This can be done in the reducer exactly as Bertrand described, except for two points that make it different from word count: a) you can't use a combiner b) the output of the program is as large as the input
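
A sketch of the reduce side Ted describes: running credit and debit totals emitted once per record, with no combiner configured, since a partial sum over a prefix of records can't be merged the way word counts can. The value layout (date, amount, C/D indicator, tab-separated) is an assumption:

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // One reduce() call per account (see the grouping comparator sketch above);
    // values arrive date-ordered thanks to the secondary sort.
    public class CumulativeSumReducer
            extends Reducer<AccountDateKey, Text, NullWritable, Text> {
        @Override
        protected void reduce(AccountDateKey key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double cumCredit = 0.0;
            double cumDebit = 0.0;
            for (Text v : values) {
                // Assumed value layout: date \t amount \t indicator (C or D)
                String[] f = v.toString().split("\t");
                double amount = Double.parseDouble(f[1]);
                if ("C".equals(f[2])) {
                    cumCredit += amount;
                } else {
                    cumDebit += amount;
                }
                // One output line per input record, with running totals appended:
                // this is why the output is as large as the input.
                context.write(NullWritable.get(), new Text(
                    key.getAccount() + "\t" + v + "\t" + cumCredit + "\t" + cumDebit));
            }
        }
    }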

Re: copyFromLocal

2012-10-04 Thread Bejoy KS
Hi Sadak, If you are issuing copyFromLocal from a client/edge node, you can copy the files available in the client's local file system to HDFS in the cluster. The client/edge node could be a box that has all the hadoop jars and config files exactly the same as those of the cluster, and the cluster nodes should be accessible
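
The programmatic equivalent of copyFromLocal from such a client node, sketched with the standard FileSystem API (the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToCluster {
        public static void main(String[] args) throws Exception {
            // Picks up the namenode address from the client's core-site.xml,
            // which must match the cluster's configuration.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Same effect as: hadoop fs -copyFromLocal /local/data/file.txt /user/sadak/
            fs.copyFromLocalFile(new Path("/local/data/file.txt"),
                                 new Path("/user/sadak/file.txt"));
            fs.close();
        }
    }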

RE: copyFromLocal

2012-10-04 Thread Kartashov, Andy
I use -put / -get commands to bring files in/out of HDFS from/to my home directory on EC2. Then I use WinSCP to download files to my laptop. Andy Kartashov MPAC Architecture R&D, Co-op 1340 Pickering Parkway, Pickering, L1V 0C4 * Phone: (905) 837 6269 * Mobile: (416) 722 1787 andy.kartas...@mpac.ca

Re: Cumulative value using mapreduce

2012-10-04 Thread Bertrand Dechoux
Hi, It sounds like 1) group information by account 2) compute the sum per account. If that's not the case, you should be a bit more precise about your context. This computation looks like a small variant of wordcount. If you do not know how to do it, you should read books about Hadoop MapReduce and/or online

Re: Hadoop Archives under 0.23

2012-10-04 Thread Alexander Hristov
Thanks, but hadoop dfs is deprecated; hdfs is the recommended way. In any case, the result is exactly the same. > use bin/hadoop dfs -lsr har:///sample/test.har not dfs -ls -R har:///sample/test.har > On Tue, Oct 2, 2012 at 11:42 AM, Alexander Hristov <al...@planetalia.com> wrote:

copyFromLocal

2012-10-04 Thread Visioner Sadak
Guys, I have hadoop installed on a remote box... does the copyFromLocal method copy data from that local box only? What if I have to copy data from a user's desktop PC (for example, the E drive) through my web application? Will I have to first copy the data to that remote box using some java code and then use copyFromLocal

Re: Hadoop Archives under 0.23

2012-10-04 Thread Visioner Sadak
Use bin/hadoop dfs -lsr har:///sample/test.har, not dfs -ls -R har:///sample/test.har. On Tue, Oct 2, 2012 at 11:42 AM, Alexander Hristov wrote: > Hello > > I'm trying to test the Hadoop archive functionality under 0.23 and I can't > get it working. > > I have in HDFS a /test folder with several

Cumulative value using mapreduce

2012-10-04 Thread Sarath
Hi, I have a file which has some financial transaction data. Each transaction will have an amount and a credit/debit indicator. I want to write a mapreduce program which computes cumulative credit & debit amounts at each record and appends these values to the record before dumping it into the output
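
One possible mapper for this, assuming tab-separated records of account, date, amount and indicator (the actual record layout is cut off in this preview); it feeds the composite key, partitioner and reducer sketched in the replies above:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emit (account, date) as the composite key so the secondary sort can order
    // each account's transactions by date before the reducer accumulates them.
    public class TransactionMapper
            extends Mapper<LongWritable, Text, AccountDateKey, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed layout: account \t date(yyyyMMdd) \t amount \t indicator
            String[] f = line.toString().split("\t");
            AccountDateKey key = new AccountDateKey(f[0], Long.parseLong(f[1]));
            // Pass date, amount and indicator through as the value.
            context.write(key, new Text(f[1] + "\t" + f[2] + "\t" + f[3]));
        }
    }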

Re: GenericOptionsParser

2012-10-04 Thread Steve Loughran
On 3 October 2012 18:38, Koert Kuipers wrote: > Why does GenericOptionsParser also remove -Dprop=value options (without > setting the system properties)? Those are not hadoop options but java > options. For the same reason Ant, Maven and other tools do: they have a notion that you are setting parameters
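
A small sketch of the behaviour under discussion: GenericOptionsParser moves -Dprop=value pairs into the Hadoop Configuration, not into JVM system properties (the property name is an illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class OptsDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // e.g. args = { "-D", "my.prop=42", "input", "output" }
            String[] remaining = new GenericOptionsParser(conf, args).getRemainingArgs();

            // The -D value lands in the Hadoop conf, not in System.getProperty().
            System.out.println("conf:   " + conf.get("my.prop"));            // 42
            System.out.println("system: " + System.getProperty("my.prop"));  // null
            System.out.println("remaining args: " + remaining.length);       // 2
        }
    }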

Re: Lib conflicts

2012-10-04 Thread Steve Loughran
On 3 October 2012 12:21, Ben Rycroft wrote: > Hi all, > > I have a jar that uses the Hadoop API to launch various remote mapreduce > jobs (i.e., I'm not using the command line to initiate the job). The service > jar that executes the various jobs is built with Maven's > "jar-with-dependencies".