Hello, I'm trying to use a MapFile (stored on HDFS) in my reduce task,
which processes some text data.
When I try to initialize MapFile.Reader in the reducer's configure() method,
the app throws a NullPointerException, while the same approach used for each
reduce() method call with the same parameters,
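For reference, a minimal sketch of opening the reader once per task in configure() (old 0.20.x API assumed; the configuration key and class name are illustrative, and a common source of the NullPointerException is an unset configuration value, so it is guarded here):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class LookupReducer extends MapReduceBase {
  private MapFile.Reader reader;

  @Override
  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      // Hypothetical key; if it was never set, job.get() returns null and
      // MapFile.Reader will fail, so fail fast with a clear message instead.
      String dir = job.get("lookup.mapfile.dir");
      if (dir == null) {
        throw new IllegalStateException("lookup.mapfile.dir not set in JobConf");
      }
      reader = new MapFile.Reader(fs, dir, job);
    } catch (IOException e) {
      throw new RuntimeException("cannot open MapFile in configure()", e);
    }
  }
}
```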
Hello,
sorry my mistake. Problem solved.
On 06/19/2012 03:40 PM, Devaraj k wrote:
Can you share the exception stack trace and piece of code where you are trying
to create?
Thanks
Devaraj
From: Ondřej Klimpera [klimp...@fit.cvut.cz]
Sent: Tuesday
is the problem and Hadoop
doesn't care about its splitting in a proper way.
Thanks any ideas.
On 06/16/2012 11:27 AM, Bejoy KS wrote:
Hi Ondrej
You can use NLineInputFormat with n set to 10.
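In driver code this could look roughly like the following (a sketch assuming the old 0.20.x API; "MyJob" is an illustrative class name):

```java
// Ask for one map task per 10 input lines, so a 100-line file yields 10 mappers.
JobConf conf = new JobConf(MyJob.class);
conf.setInputFormat(NLineInputFormat.class);
conf.setInt("mapred.line.input.format.linespermap", 10);
```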
--Original Message--
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
ReplyTo: common-user
Hello,
I have a very small input (kB), but processing it to produce the output
takes several minutes. Is there a way to say: the file has 100 lines, I
need 10 mappers, and each mapper node has to process 10 lines of the input
file?
Thanks for advice.
Ondrej Klimpera
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31
Hello,
we're testing the application on 8 nodes, where each node has 20GB of local
storage available. What we are trying to achieve is to process more than
20GB of data on this cluster.
Is there a way to distribute the data across the cluster?
There is also one shared NFS storage disk with
Thanks for your reply. It would be great to mention this in the
tutorial on your web site. Is the name of
HADOOP_PREFIX/HOME/INSTALL crucial to Hadoop, or is setting this variable
just for the user's benefit?
Thanks for reply.
On 06/14/2012 07:46 AM, Harsh J wrote:
Hi Ondřej,
Due to a new
Hello,
you're right, that's exactly what I meant. And your answer is exactly
what I thought. I was just wondering whether Hadoop can distribute the data
to other nodes' local storage if its own local space is full.
Thanks
On 06/14/2012 03:38 PM, Harsh J wrote:
Ondřej,
If by processing you mean
Thanks, I'll try.
One more question: I've got a few more nodes which can be added to the
cluster. But how do I do that?
If I understand it (according to Hadoop's wiki pages):
1. On master node - edit slaves file and add IP addresses of new nodes
(everything clear)
2. log in to each newly
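The truncated steps above might continue roughly as follows; a sketch assuming a Hadoop 1.x layout, with the host name "new-node-01" and paths purely illustrative:

```shell
# 1. On the master, list the new node in conf/slaves:
#      echo "new-node-01" >> "$HADOOP_PREFIX/conf/slaves"
# 2. On the new node (same Hadoop version and conf as the rest), start:
#      "$HADOOP_PREFIX/bin/hadoop-daemon.sh" start datanode
#      "$HADOOP_PREFIX/bin/hadoop-daemon.sh" start tasktracker
# 3. Optionally spread existing HDFS blocks onto the new node:
#      "$HADOOP_PREFIX/bin/start-balancer.sh"
```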
Hello,
I'd like to ask how Hadoop splits text input if its size is
smaller than the HDFS block size.
I'm testing an application which creates large outputs from small inputs.
When using the NInputSplits input format and setting the number of splits in
mapred-conf.xml, some results are lost during
Hello, why is the HADOOP_HOME shell variable always reported as deprecated
when running Hadoop? How do I set the installation directory on cluster
nodes, and which variable is correct?
Thanks
Ondrej Klimpera
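One way this is commonly handled (a sketch, assuming Hadoop 1.0.x where the warning appears; the install path /usr/local/hadoop is illustrative):

```shell
# Use HADOOP_PREFIX instead of the deprecated HADOOP_HOME.
unset HADOOP_HOME
export HADOOP_PREFIX=/usr/local/hadoop
export PATH="$HADOOP_PREFIX/bin:$PATH"
# Alternatively, keep HADOOP_HOME and just silence the warning:
# export HADOOP_HOME_WARN_SUPPRESS=1
```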
support MultipleOutputs?
Thanks again.
On 04/30/2012 12:32 AM, Bill Graham wrote:
Take a look at the JobClient API. You can use that to get the current
progress of a running job.
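A minimal sketch of that suggestion (old 0.20.x API assumed; the job ID string is illustrative):

```java
// Poll a running job's progress via JobClient from the submitting application.
JobClient client = new JobClient(new JobConf(conf));
RunningJob running = client.getJob(JobID.forName("job_201204290000_0001"));
System.out.printf("map %.0f%% reduce %.0f%%%n",
    running.mapProgress() * 100, running.reduceProgress() * 100);
if (running.isComplete()) {
  System.out.println(running.isSuccessful() ? "succeeded" : "failed");
}
```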
On Sunday, April 29, 2012, Ondřej Klimpera wrote:
Hello I'd like to ask you what is the preferred way of getting
Hello I'd like to ask you what is the preferred way of getting running
jobs progress from Java application, that has executed them.
I'm using Hadoop 0.20.203 and tried the job.end.notification.url property,
which works well, but as the property name says, it sends only job-end
notifications.
What I
Hello, I'd like to ask if there is a possibility of setting a
timeout for processing one line of text input in the mapper function.
The idea is that if processing one line takes too long, Hadoop will cut
this process off and continue with the next input line.
Thank you for your answer.
Thanks, I'll try to implement it and let you know if it worked.
On 04/18/2012 04:07 PM, Harsh J wrote:
Since you're looking for per-line (and not per-task/file) monitoring,
this is best done by your own application code (a monitoring thread,
etc.).
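The monitoring-thread idea could be sketched in plain Java like this (names are illustrative; inside a mapper, "expensiveProcessing" would be the per-line work):

```java
import java.util.concurrent.*;

public class LineTimeout {
    // Process one line with a time budget; returns null when the line took
    // too long, so the caller can skip it and move on to the next line.
    static String processWithTimeout(ExecutorService pool, String line, long millis) {
        Future<String> f = pool.submit(() -> expensiveProcessing(line));
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);   // interrupt the worker and give up on this line
            return null;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String expensiveProcessing(String line) throws InterruptedException {
        if (line.startsWith("slow")) Thread.sleep(5_000); // simulate a stuck line
        return line.toUpperCase();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(processWithTimeout(pool, "fast line", 200)); // FAST LINE
        System.out.println(processWithTimeout(pool, "slow line", 200)); // null
        pool.shutdownNow();
    }
}
```

In a real mapper the task would also need to report progress while skipping lines, so the framework's own task timeout does not kill it.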
On Wed, Apr 18, 2012 at 6:09 PM, Ondřej
Thanks for your advice, File.createTempFile() works great, at least in
pseudo-distributed mode; I hope a cluster setup will work the same way. You
saved me hours of trying...
On 04/07/2012 11:29 PM, Harsh J wrote:
MapReduce sets mapred.child.tmp for all tasks to be the Task
Attempt's
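The approach being discussed is just the standard library call; a minimal sketch (the prefix and suffix strings are illustrative):

```java
import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    public static void main(String[] args) throws IOException {
        // Inside a task, MapReduce points java.io.tmpdir at the task attempt's
        // working directory (via mapred.child.tmp), so a plain createTempFile
        // lands in task-local scratch space and is cleaned up with the attempt.
        File tmp = File.createTempFile("scratch-", ".dat");
        tmp.deleteOnExit();
        System.out.println(tmp.exists());
    }
}
```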
I will, but deploying the application on a cluster is still far away. I'm just
finishing the raw implementation; cluster tuning is planned for the end of
this month.
Thanks.
On 04/08/2012 09:06 PM, Harsh J wrote:
It will work. Pseudo-distributed mode shouldn't be all that different
from a fully distributed
Klimpera wrote:
And one more question: is it even possible to add a MapFile (as it
consists of an index and a data file) to the distributed cache?
Thanks
Should be no problem, they are just two files.
On 03/30/2012 01:15 PM, Ondřej Klimpera wrote:
Hello,
I'm not sure what you mean by using map reduce
Hello, I've got one more question: how is the seek() (or get()) method
implemented in MapFile.Reader? Does it use hashCode(), compareTo(), or
another mechanism to find a match in the MapFile's index?
Thanks for your reply.
Ondrej Klimpera
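To my understanding (worth verifying against the MapFile javadoc), get() does not use hashCode at all: the reader keeps an in-memory index of every N-th key, binary-searches it using the key's comparator (i.e. compareTo ordering), seeks the data file to the closest preceding indexed entry, and scans forward. Usage looks like (paths and keys illustrative):

```java
// Old-API sketch: keyed lookup in a sorted MapFile.
MapFile.Reader reader = new MapFile.Reader(fs, "/maps/lookup", conf);
Text value = new Text();
Writable hit = reader.get(new Text("some-key"), value); // null when absent
reader.close();
```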
On 03/29/2012 08:26 PM, Ondřej Klimpera wrote:
Thanks for your
:
Hello Ondrej,
On 29.03.2012 18:05, Ondřej Klimpera wrote:
Hello,
I have a MapFile as a product of MapReduce job, and what I need to do
is:
1. If MapReduce produced more splits as output, merge them into a single
file.
2. Copy this merged MapFile to another HDFS location and use
And one more question: is it even possible to add a MapFile (as it
consists of an index and a data file) to the distributed cache?
Thanks
On 03/30/2012 01:15 PM, Ondřej Klimpera wrote:
Hello,
I'm not sure what you mean by using map reduce setup()?
If the file is that small you could load it all
Hello,
I have a MapFile as a product of a MapReduce job, and what I need to do is:
1. If MapReduce produced more splits as output, merge them into a single file.
2. Copy this merged MapFile to another HDFS location and use it as a
distributed cache file for another MapReduce job.
I'm wondering if
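Step 2 could be sketched as follows (old-API sketch; all paths are illustrative, and the MapFile directory's two files, "data" and "index", are registered individually):

```java
// Copy the MapFile directory within HDFS, then register both of its files
// in the distributed cache for the follow-up job.
FileSystem fs = FileSystem.get(conf);
Path src = new Path("/user/ondrej/job1/part-00000"); // MapFile dir: data + index
Path dst = new Path("/cache/lookup.map");
FileUtil.copy(fs, src, fs, dst, false, conf);
DistributedCache.addCacheFile(new Path(dst, "data").toUri(), conf);
DistributedCache.addCacheFile(new Path(dst, "index").toUri(), conf);
```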
());
}
}
I think you can also copy these files to a different location in dfs and then
put into distributed cache.
Deniz
On Mar 29, 2012, at 8:05 AM, Ondřej Klimpera wrote:
Hello,
I have a MapFile as a product of MapReduce job, and what I need to do is:
1. If MapReduce produced more
Hello,
I'm trying to develop an application where the Reducer has to produce
multiple outputs.
In detail, I need the Reducer to produce two types of files, each with a
different output.
I found in Hadoop: The Definitive Guide that the new API uses only
MultipleOutputs, but working with
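A minimal sketch of a new-API reducer with two named outputs (the names "typeA"/"typeB" are illustrative and would have to be registered in the driver with MultipleOutputs.addNamedOutput(...)):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class TwoFileReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> out;

  @Override
  protected void setup(Context ctx) {
    out = new MultipleOutputs<Text, Text>(ctx);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> vals, Context ctx)
      throws IOException, InterruptedException {
    for (Text v : vals) {
      out.write("typeA", key, v);                   // first file type
    }
    out.write("typeB", key, new Text("summary"));   // second file type
  }

  @Override
  protected void cleanup(Context ctx) throws IOException, InterruptedException {
    out.close(); // flush the named outputs
  }
}
```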
I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta version.
So do you recommend using 0.20.203.X and sticking to the Hadoop Definitive
Guide approaches?
Thanks for your reply
On 01/25/2012 01:41 PM, Harsh J wrote:
Oh and btw, do not fear the @deprecated 'Old' API. We have
undeprecated
deprecated things in my projects.
Thanks.
On 01/25/2012 01:46 PM, Ondřej Klimpera wrote:
I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta
version. So do you recommend using 0.20.203.X and sticking to the Hadoop
Definitive Guide approaches?
Thanks for your reply
On 01/25/2012 01:41