Thanks for all your responses. As suggested, I will go through the
documentation once again.
But just to clarify, this is not my first map-reduce program. I've
already written a map-reduce job for our product which filters and
transforms the financial data. This is a new requirement we've
got. I have also written the logic for calculating the cumulative
sums, but the output is not coming out as desired, and I feel I'm not
doing it the right way and am missing something. So I thought of
asking the mailing list for some quick help.
As an example, say we have the records below:

Txn ID   Txn Date    Cr/Dr Indicator   Amount
1001     9/22/2012   CR                1000
1002     9/25/2012   DR                500
1003     10/1/2012   DR                1500
1004     10/4/2012   CR                2000

When this file is passed in, the logic should append the following 2
columns to the output for each record above:

CR Cumulative Amount   DR Cumulative Amount
1000                   0
1000                   500
1000                   2000
3000                   2000

Hope the problem is clear now. Please provide your suggestions on the
approach to the solution.
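
To make the expected output concrete, here is a minimal single-process
sketch in Python (plain Python, not MapReduce) of the running totals
for the four sample records. The function name `append_cumulative` and
the tuple layout are assumptions made for illustration only, and the
sketch assumes the records already arrive in transaction order.

```python
from typing import Iterable, List, Tuple

# (txn_id, txn_date, cr_dr_indicator, amount): layout assumed for illustration
Record = Tuple[str, str, str, int]


def append_cumulative(records: Iterable[Record]) -> List[tuple]:
    """Append running CR and DR totals to each record, in input order."""
    cr_total = 0
    dr_total = 0
    out = []
    for txn_id, txn_date, indicator, amount in records:
        if indicator == "CR":
            cr_total += amount
        elif indicator == "DR":
            dr_total += amount
        else:
            raise ValueError(f"unexpected indicator: {indicator!r}")
        # Every input record produces exactly one output record.
        out.append((txn_id, txn_date, indicator, amount, cr_total, dr_total))
    return out


SAMPLE = [
    ("1001", "9/22/2012", "CR", 1000),
    ("1002", "9/25/2012", "DR", 500),
    ("1003", "10/1/2012", "DR", 1500),
    ("1004", "10/4/2012", "CR", 2000),
]

if __name__ == "__main__":
    for row in append_cumulative(SAMPLE):
        print("\t".join(str(field) for field in row))
```

The last two fields of each printed row are the CR and DR cumulative
amounts listed in the second table above.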
Regards,
Sarath.
On Friday 05 October 2012 02:51 AM, Bertrand Dechoux wrote:
I indeed didn't catch the cumulative sum part. Then I guess it calls
for what is often called a secondary sort, if you want to compute
different cumulative sums during the same job. It can be more or less
easy to implement depending on which API/library/tool you are using.
Ted's comments on performance are spot on.
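
As a rough illustration of the secondary sort idea, the sketch below
simulates its three pieces in plain Python rather than with the Hadoop
API: partition on the natural key only, sort on the full composite key,
and group on the natural key again. Using the CR/DR indicator as the
natural key, the `NUM_REDUCERS` value, and all function names are
assumptions made for this example.

```python
from datetime import datetime
from itertools import groupby
from zlib import crc32

NUM_REDUCERS = 2  # assumed for illustration


def partition(natural_key: str) -> int:
    """Partition on the natural key only, so one reducer sees a whole group."""
    return crc32(natural_key.encode("utf-8")) % NUM_REDUCERS


def sort_key(pair):
    """Sort on the full composite key: natural key, then the parsed date."""
    (natural_key, txn_date), _value = pair
    return (natural_key, datetime.strptime(txn_date, "%m/%d/%Y"))


def secondary_sort(pairs):
    """Return {reducer: [(natural_key, [values in date order]), ...]}."""
    buckets = {r: [] for r in range(NUM_REDUCERS)}
    for pair in pairs:
        buckets[partition(pair[0][0])].append(pair)
    result = {}
    for reducer, bucket in buckets.items():
        bucket.sort(key=sort_key)
        # Group on the natural key only, ignoring the date part of the key.
        result[reducer] = [
            (key, [value for _k, value in group])
            for key, group in groupby(bucket, key=lambda p: p[0][0])
        ]
    return result


# ((natural key, txn date), (txn id, amount)) pairs in arbitrary arrival order
PAIRS = [
    (("DR", "10/1/2012"), ("1003", 1500)),
    (("CR", "9/22/2012"), ("1001", 1000)),
    (("CR", "10/4/2012"), ("1004", 2000)),
    (("DR", "9/25/2012"), ("1002", 500)),
]

if __name__ == "__main__":
    for reducer, groups in secondary_sort(PAIRS).items():
        print(reducer, groups)
```

Note that the dates are parsed before comparison; sorting the raw
strings would place "10/1/2012" ahead of "9/25/2012".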
Regards
Bertrand
On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964
<java8...@hotmail.com> wrote:
I did the cumulative sum in a Hive UDF, as one of the projects
for my employer.
1) You need to decide the grouping elements for your cumulative sum,
for example an account, a department, etc. In the mapper, combine
this information into the key you emit.
2) If you don't have any grouping requirement and just want a
cumulative sum over all your data, then send all the data to one
common key, so it will all go to the same reducer.
3) When you calculate the cumulative sum, does the output need to
have a sort order? If so, you need to do a secondary sort, so that
the data arrives in the reducer in the order you want.
4) In the reducer, just do the sum, and emit a value for every
original record (not one per key).
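
The four steps above can be sketched as a small in-memory simulation
in Python. This is not Hadoop code: the comma-separated input format,
the `COMMON_KEY` name, and the `mapper`/`reducer`/`run_job` functions
are all assumptions made for illustration, and the in-reducer `sorted`
call merely stands in for a real secondary sort.

```python
from collections import defaultdict
from datetime import datetime

COMMON_KEY = "ALL"  # step 2: no grouping requirement, so one shared key


def mapper(line):
    """Steps 1-2: emit (grouping key, parsed record) for one input line."""
    txn_id, txn_date, indicator, amount = line.split(",")
    parsed = datetime.strptime(txn_date, "%m/%d/%Y")
    yield COMMON_KEY, (parsed, txn_id, txn_date, indicator, int(amount))


def reducer(key, values):
    """Step 4: emit one output line per input record, with running totals."""
    totals = {"CR": 0, "DR": 0}
    # Step 3: sorted() stands in for the secondary sort on transaction date.
    for _parsed, txn_id, txn_date, indicator, amount in sorted(values):
        totals[indicator] += amount
        yield ",".join([txn_id, txn_date, indicator, str(amount),
                        str(totals["CR"]), str(totals["DR"])])


def run_job(lines):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


if __name__ == "__main__":
    lines = [
        "1003,10/1/2012,DR,1500",
        "1001,9/22/2012,CR,1000",
        "1004,10/4/2012,CR,2000",
        "1002,9/25/2012,DR,500",
    ]
    for row in run_job(lines):
        print(row)
```

With a single common key, every record goes through one reducer, so
that part of the job runs serially.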
I would suggest doing this in a Hive UDF, as it is much easier,
if you can build a Hive schema on top of your data.
Yong
------------------------------------------------------------------------
From: tdunn...@maprtech.com
Date: Thu, 4 Oct 2012 18:52:09 +0100
Subject: Re: Cumulative value using mapreduce
To: user@hadoop.apache.org
Bertrand is almost right.
The only difference is that the original poster asked about
cumulative sum.
This can be done in reducer exactly as Bertrand described except
for two points that make it different from word count:
a) you can't use a combiner
b) the output of the program is as large as the input so it will
have different performance characteristics than aggregation
programs like wordcount.
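
A small Python sketch of why points a) and b) hold. It assumes a
word-count style summing combiner and a reducer that emits one running
total per value; the function names and the sample amounts are
hypothetical.

```python
from itertools import accumulate

amounts = [1000, 2000, 500]  # hypothetical values for one key, already in order


def summing_combiner(values):
    """Word-count style combiner: collapses all values into one partial sum."""
    return [sum(values)]


def cumulative_reducer(values):
    """Emit one running total per value received."""
    return list(accumulate(values))


without_combiner = cumulative_reducer(amounts)
with_combiner = cumulative_reducer(summing_combiner(amounts))

if __name__ == "__main__":
    print(without_combiner)  # [1000, 3000, 3500]
    print(with_combiner)     # [3500]
```

The combiner throws away the individual records the reducer needs, and
without it the reducer emits exactly as many values as it receives.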
Bertrand's key recommendation to go read a book is the most
important advice.
On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux
<decho...@gmail.com> wrote:
Hi,
It sounds like you need to:
1) group information by account
2) compute the sum per account
If that is not the case, you should describe your context a bit more
precisely.
This computation looks like a small variant of wordcount. If you
do not know how to do it, you should read books about Hadoop
MapReduce and/or an online tutorial. Yahoo's is old but still a
nice read to begin with:
http://developer.yahoo.com/hadoop/tutorial/
Regards,
Bertrand
On Thu, Oct 4, 2012 at 3:58 PM, Sarath
<sarathchandra.jos...@algofusiontech.com> wrote:
Hi,
I have a file which has some financial transaction data.
Each transaction has an amount and a credit/debit
indicator.
I want to write a mapreduce program which computes the
cumulative credit and debit amounts at each record
and appends these values to the record before writing it to
the output file.
Is this possible? How can I achieve this? Where should I
put the logic for computing the cumulative values?
Regards,
Sarath.
--
Bertrand Dechoux