Hi,
The provided example records are perfect. With those, I doubt there will be
any confusion about what kind of data is available and how it should be
manipulated. However, "the output is not coming as desired" is vague. It's
hard to say why you are not getting your expected result without a bit more
information.
The answer is really the same. Your problem is just using a goofy
representation for negative numbers (after all, negative numbers are a
relatively new concept in accounting).
You still need to use the account number as the key and the date as a sort
key. Many financial institutions also process
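A rough sketch of that key layout, assuming comma-separated records with the
account number in the first field and the date in the second (the field
positions and class name are my assumptions, not something from this thread):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Assumed record layout: account,date,amount,indicator
    public class TransactionMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            String account = fields[0];
            String date = fields[1];
            // Account first so records can be partitioned/grouped per account,
            // date second so a sort comparator can order each account's records by date.
            context.write(new Text(account + "\t" + date), line);
        }
    }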
Thanks for all your responses. As suggested, I will go through the
documentation once again.
But just to clarify, this is not my first map-reduce program. I've
already written a map-reduce job for our product which does filtering and
transformation of the financial data. This is a new requirement we'
Hi,
Roughly, this information will be available under the 'Hadoop map task
list' page in the MapReduce web UI (in Hadoop 1.0, which I am assuming is
what you are using). You can reach this page by selecting the running tasks
link from the job information page. The page has a table that lists all the tasks
Hi,
Could you please share your setup details - i.e. how many slaves, how many
datanodes and tasktrackers? Also, the configuration - in particular
hdfs-site.xml?
To answer your question: the datanode address is picked up from
hdfs-site.xml, or hdfs-default.xml, from the property dfs.datanode.address.
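For reference, a minimal hdfs-site.xml entry for that property would look like
this (the address shown is just the Hadoop 1.x default, used here for illustration):

    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50010</value>
    </property>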
Hi,
Didn't check everything, but found this in mapred-site.xml:

    <property>
      <name>mapred.job.tracker</name>
      <value>hdfs://10.99.42.9:8021/</value>
      <final>true</final>
    </property>

The value shouldn't be an HDFS URL. Can you please fix this and try?
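For comparison, a corrected entry would use a plain host:port value rather than
an hdfs:// URL, e.g. (reusing the host and port quoted above):

    <property>
      <name>mapred.job.tracker</name>
      <value>10.99.42.9:8021</value>
      <final>true</final>
    </property>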
On Thu, Oct 4, 2012 at 12:32 PM, Ajit Kumar Shreevastava
<ajit.shreevast...@hcl.com> wrote:
> Hi All,
Can you try deploying the JDK to a different directory that doesn't contain
spaces and setting JAVA_HOME to that? There is a known problem when the
JDK is deployed in a directory containing spaces (like C:\Program
Files\...).
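Once the JDK sits in a space-free location, JAVA_HOME can be pointed at it in
conf/hadoop-env.sh (assuming a Cygwin-based Windows setup; the path below is
only a placeholder):

    # Placeholder path - point this at wherever the JDK was reinstalled
    export JAVA_HOME=/cygdrive/c/java/jdk1.6.0_35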
Thank you,
--Chris
On Thu, Oct 4, 2012 at 9:38 AM, Sujit Dhamale wrote:
Hello,
I have a question about how to find which file takes the longest time to
process and how to assign more mappers to process that particular file.
Currently, about three mappers take about five times longer than the others to
complete. So, how can I detect which specific files those three mappers are
I indeed didn't catch the cumulative sum part. Then I guess it begs for
what-is-often-called-a-secondary-sort, if you want to compute different
cumulative sums during the same job. It can be more or less easy to
implement depending on which API/library/tool you are using. Ted comments
on performance.
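To make the secondary sort concrete, here is a rough sketch of the partitioner
half, assuming the composite account + tab + date key sketched earlier in this
thread (the key layout and class name are assumptions). A matching grouping
comparator that compares only the account portion would also be needed so that
one reduce() call sees all records for an account:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Route records by the account part of the "account\tdate" key only,
    // so every date for a given account lands on the same reducer.
    public class AccountPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            String account = key.toString().split("\t", 2)[0];
            return (account.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }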
I did the cumulative sum in a Hive UDF, as one of the projects for my employer.
1) You need to decide the grouping elements for your cumulative sum: for example,
an account, a department, etc. In the mapper, combine this information into your
output key.
2) If you don't have any grouping requirement, you
Bertrand is almost right.
The only difference is that the original poster asked about cumulative sum.
This can be done in the reducer exactly as Bertrand described, except for two
points that make it different from word count:
a) you can't use a combiner
b) the output of the program is as large as the input
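A rough sketch of such a reducer, assuming each account's records arrive
date-ordered thanks to the secondary sort discussed above; the field positions
and the C/D indicator values are assumptions about the record layout:

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CumulativeSumReducer extends Reducer<Text, Text, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> records, Context context)
                throws IOException, InterruptedException {
            double credit = 0.0;
            double debit = 0.0;
            for (Text record : records) {
                // Assumed layout: account,date,amount,indicator (C or D)
                String[] fields = record.toString().split(",");
                double amount = Double.parseDouble(fields[2]);
                if ("C".equals(fields[3])) {
                    credit += amount;
                } else {
                    debit += amount;
                }
                // One output line per input record, which is why a combiner
                // cannot be used and the output is roughly as large as the input.
                context.write(new Text(record + "," + credit + "," + debit),
                        NullWritable.get());
            }
        }
    }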
Hi Sadak,
If you are issuing copyFromLocal from a client/edge node, you can copy files
from the client's local file system to HDFS in the cluster. The client/edge node
could be a box that has all the Hadoop jars and config files exactly the same as
those of the cluster, and the cluster nodes should be accessible from it.
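For example (the paths are placeholders), from the edge node:

    # copy a local file from the edge node's file system into HDFS
    hadoop fs -copyFromLocal /home/sadak/data/transactions.csv /user/sadak/input/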
I use -put/-get commands to bring files in/out of HDFS from/to my home
directory on EC2. Then I use WinSCP to download files to my laptop.
Andy Kartashov
MPAC
Architecture R&D, Co-op
1340 Pickering Parkway, Pickering, L1V 0C4
* Phone : (905) 837 6269
* Mobile: (416) 722 1787
andy.kartas...@mpac.ca
Hi,
It sounds like:
1) group information by account
2) compute sum per account
If that's not the case, you should be a bit more precise about your context.
This computation looks like a small variant of word count. If you do not know
how to do it, you should read books about Hadoop MapReduce and/or online tutorials.
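For the simple grouped-sum reading, the reducer really is word count with a
double total instead of an int count; a rough sketch (class name and writable
types are assumptions):

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sum the transaction amounts emitted by the mapper for each account.
    public class AccountSumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text account, Iterable<DoubleWritable> amounts, Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable amount : amounts) {
                total += amount.get();
            }
            context.write(account, new DoubleWritable(total));
        }
    }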
Thanks, but hadoop dfs is deprecated; hdfs dfs is the recommended way. In
any case, the result is exactly the same.
use bin/hadoop dfs -lsr har:///sample/test.har
not dfs -ls -R har:///sample/test.har
On Tue, Oct 2, 2012 at 11:42 AM, Alexander Hristov
<al...@planetalia.com> wrote:
Guys, I have Hadoop installed on a remote box... Does the copyFromLocal method
copy data from that local box only? What if I have to copy data from a user's
desktop PC (for example, the E drive) through my web application? Will I have to
first copy the data to that remote box using some Java code and then use
copyFromLocal?
use bin/hadoop dfs -lsr har:///sample/test.har
not dfs -ls -R har:///sample/test.har
On Tue, Oct 2, 2012 at 11:42 AM, Alexander Hristov wrote:
> Hello
>
> I'm trying to test the Hadoop archive functionality under 0.23 and I can't
> get it working.
>
> I have in HDFS a /test folder with seve
Hi,
I have a file which has some financial transaction data. Each
transaction will have an amount and a credit/debit indicator.
I want to write a MapReduce program which computes cumulative credit and
debit amounts at each record
and appends these values to the record before dumping it into the output.
On 3 October 2012 18:38, Koert Kuipers wrote:
> Why does GenericOptionsParser also remove -Dprop=value options (without
> setting the system properties)? Those are not Hadoop options but Java
> options.
For the same reason Ant, Maven and other tools do - they have a notion that
you are setting parameters
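For context, a driver run through ToolRunner has those -D pairs applied to its
Configuration (not to the JVM system properties) before run() is called; a
minimal sketch, with placeholder class and job details:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // Generic options such as -D mapred.reduce.tasks=4 have already been
            // parsed by GenericOptionsParser and applied to this Configuration;
            // they are not forwarded as JVM system properties.
            Configuration conf = getConf();
            // ... build and submit the job using conf ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }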
On 3 October 2012 12:21, Ben Rycroft wrote:
> Hi all,
>
> I have a jar that uses the Hadoop API to launch various remote MapReduce
> jobs (i.e., I'm not using the command line to initiate the job). The service
> jar that executes the various jobs is built with Maven's
> "jar-with-dependencies".
>
>