Creating MapFile.Reader instance in reducer setup

2012-06-19 Thread Ondřej Klimpera
Hello, I'm trying to use a MapFile (stored on HDFS) in my reduce task, which processes some text data. When I try to initialize the MapFile.Reader in the reducer's configure() method, the app throws a NullPointerException, whereas the same approach used for each reduce() method call with the same parameters,
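A minimal sketch of opening the reader once per task in the old-API configure() method, assuming the MapFile directory path is passed in the JobConf under an illustrative property name ("lookup.mapfile.path"):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class LookupReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private MapFile.Reader reader;

      @Override
      public void configure(JobConf conf) {
        try {
          // illustrative property, set by the job driver; a null value here is a common NPE source
          String dir = conf.get("lookup.mapfile.path");
          FileSystem fs = FileSystem.get(conf);
          reader = new MapFile.Reader(fs, dir, conf);   // opens <dir>/index and <dir>/data
        } catch (IOException e) {
          throw new RuntimeException("Cannot open MapFile", e);
        }
      }

      @Override
      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Text looked = new Text();
        if (reader.get(key, looked) != null) {          // one reader reused for every group
          output.collect(key, looked);
        }
      }

      @Override
      public void close() throws IOException {
        reader.close();
      }
    }

A frequent cause of an NPE in configure() is that the property being read was never set on the JobConf the task actually receives, so the value comes back null; checking that before constructing the reader usually narrows the problem down.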

Re: Creating MapFile.Reader instance in reducer setup

2012-06-19 Thread Ondřej Klimpera
Hello, sorry, my mistake. Problem solved. On 06/19/2012 03:40 PM, Devaraj k wrote: Can you share the exception stack trace and the piece of code where you are trying to create it? Thanks, Devaraj From: Ondřej Klimpera [klimp...@fit.cvut.cz] Sent: Tuesday

Re: Setting number of mappers according to number of TextInput lines

2012-06-17 Thread Ondřej Klimpera
is the problem and Hadoop doesn't care about its splitting in a proper way. Thanks for any ideas. On 06/16/2012 11:27 AM, Bejoy KS wrote: Hi Ondrej, you can use NLineInputFormat with n set to 10. --Original Message-- From: Ondřej Klimpera To: common-user@hadoop.apache.org ReplyTo: common-user

Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera
Hello, I have a very small input size (kB), but processing it to produce some output takes several minutes. Is there a way to say: the file has 100 lines, I need 10 mappers, where each mapper node has to process 10 lines of the input file? Thanks for the advice. Ondrej Klimpera

Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera
wrote: Hi Ondrej You can use NLineInputFormat with n set to 10. --Original Message-- From: Ondřej Klimpera To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Setting number of mappers according to number of TextInput lines Sent: Jun 16, 2012 14:31 Hello, I
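A minimal sketch of the NLineInputFormat suggestion, using the old (mapred) API on 0.20/1.0 so a 100-line file is handed out as ten map tasks of ten lines each (class and path names are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class TenLinesPerMapper {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TenLinesPerMapper.class);
        conf.setJobName("ten-lines-per-mapper");

        conf.setInputFormat(NLineInputFormat.class);
        // each input split, and therefore each mapper, receives 10 lines
        conf.setInt("mapred.line.input.format.linespermap", 10);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // mapper and reducer classes omitted for brevity

        JobClient.runJob(conf);
      }
    }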

Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Hello, we're testing an application on 8 nodes, where each node has 20GB of local storage available. What we are trying to achieve is to get more than 20GB processed on this cluster. Is there a way to distribute the data on the cluster? There is also one shared NFS storage disk with

Re: HADOOP_HOME deprecated

2012-06-14 Thread Ondřej Klimpera
Thanks for your reply. It would be great to mention this in the tutorial on your website. Is the name of the HADOOP_PREFIX/HOME/INSTALL variable crucial to Hadoop, or is setting it just a convenience for the user? Thanks for the reply. On 06/14/2012 07:46 AM, Harsh J wrote: Hi Ondřej, Due to a new

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Hello, you're right. That's exactly what I meant. And your answer is exactly what I thought. I was just wondering if Hadoop can distribute the data to other nodes' local storage if a node's own local space is full. Thanks. On 06/14/2012 03:38 PM, Harsh J wrote: Ondřej, If by processing you mean

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Thanks, I'll try. One more question: I've got a few more nodes which can be added to the cluster. But how do I do that? If I understand it (according to Hadoop's wiki pages): 1. On the master node, edit the slaves file and add the IP addresses of the new nodes (everything clear). 2. Log in to each newly

How Hadoop splits TextInput?

2012-06-13 Thread Ondřej Klimpera
Hello, I'd like to ask how Hadoop splits text input if its size is smaller than the HDFS block size. I'm testing an application which creates large outputs from small inputs. When using the NInputSplits input format and setting the number of splits in mapred-conf.xml, some results are lost during
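For reference, with the plain old-API TextInputFormat a file smaller than one HDFS block normally becomes a single split, but the requested number of map tasks is used as a hint when split sizes are computed, so even a small file can be subdivided. A minimal sketch, class name illustrative:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class SmallInputJob {
      public static JobConf configure() {
        JobConf conf = new JobConf(SmallInputJob.class);
        conf.setInputFormat(TextInputFormat.class);
        // a hint only: FileInputFormat aims for roughly totalSize / 10 bytes per split,
        // bounded below by mapred.min.split.size and above by the block size
        conf.setNumMapTasks(10);
        return conf;
      }
    }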

HADOOP_HOME deprecated

2012-06-13 Thread Ondřej Klimpera
Hello, why, when running Hadoop, is the HADOOP_HOME shell variable always reported as deprecated? How do I set the installation directory on cluster nodes, and which variable is correct? Thanks, Ondrej Klimpera

Re: Getting job progress in java application

2012-04-30 Thread Ondřej Klimpera
support MultipleOutputs? Thanks again. On 04/30/2012 12:32 AM, Bill Graham wrote: Take a look at the JobClient API. You can use that to get the current progress of a running job. On Sunday, April 29, 2012, Ondřej Klimpera wrote: Hello I'd like to ask you what is the preferred way of getting
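A minimal sketch of polling progress through the JobClient API mentioned above; the job ID comes from the submitted job (the command-line argument here is illustrative):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class ProgressPoller {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());

        // e.g. "job_201204291200_0007"
        RunningJob running = client.getJob(JobID.forName(args[0]));

        while (running != null && !running.isComplete()) {
          System.out.printf("map %.0f%%  reduce %.0f%%%n",
              running.mapProgress() * 100, running.reduceProgress() * 100);
          Thread.sleep(5000);
        }
        if (running != null) {
          System.out.println("succeeded: " + running.isSuccessful());
        }
      }
    }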

Getting job progress in java application

2012-04-29 Thread Ondřej Klimpera
Hello, I'd like to ask what the preferred way is of getting running jobs' progress from the Java application that has executed them. I'm using Hadoop 0.20.203 and tried the job.end.notification.url property, which works well, but as the property name says, it sends only job-end notifications. What I

Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Hello, I'd like to ask if there is a possibility of setting a timeout for processing one line of text input in the mapper function. The idea is that if processing one line takes too long, Hadoop will cut this process short and continue processing the next input line. Thank you for your answer.

Re: Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Thanks, I'll try to implement it and let you know if it worked. On 04/18/2012 04:07 PM, Harsh J wrote: Since you're looking for per-line (and not per-task/file) monitoring, this is best done by your own application code (a monitoring thread, etc.). On Wed, Apr 18, 2012 at 6:09 PM, Ondřej
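One way to implement the per-line monitoring suggested here is to run the expensive work on a single worker thread inside the mapper and abandon it after a limit; the 30-second limit and processLine() below are illustrative:

    import java.io.IOException;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TimeBoundedMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private static final long LINE_TIMEOUT_MS = 30000L;   // illustrative limit
      private ExecutorService worker;

      @Override
      public void configure(JobConf job) {
        worker = Executors.newSingleThreadExecutor();
      }

      @Override
      public void map(LongWritable offset, Text line,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        final String input = line.toString();
        Future<String> result = worker.submit(new Callable<String>() {
          public String call() throws Exception {
            return processLine(input);                     // the real per-line work
          }
        });
        try {
          out.collect(new Text(input),
              new Text(result.get(LINE_TIMEOUT_MS, TimeUnit.MILLISECONDS)));
        } catch (TimeoutException e) {
          result.cancel(true);                             // give up on this line, keep going
          reporter.incrCounter("map", "timed.out.lines", 1);
        } catch (Exception e) {
          throw new IOException(e);
        }
      }

      private String processLine(String line) {
        return line;                                       // placeholder for the slow processing
      }

      @Override
      public void close() {
        worker.shutdownNow();
      }
    }

Hadoop's own mapred.task.timeout only kills a whole task attempt once it stops reporting progress, which is why a per-line limit has to live in the application code as suggested.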

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
Thanks for your advice, File.createTempFile() works great, at least in pseudo-distributed mode; I hope the cluster setup will work the same way. You saved me hours of trying... On 04/07/2012 11:29 PM, Harsh J wrote: MapReduce sets mapred.child.tmp for all tasks to be the Task Attempt's
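For reference, the reason File.createTempFile() ends up in a safe place is that the framework points each task attempt's java.io.tmpdir at its own working ./tmp directory (mapred.child.tmp), so a sketch like the following keeps scratch data task-local in pseudo-distributed and fully distributed mode alike:

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    public class ScratchFile {
      public static File writeScratch(String contents) throws IOException {
        // resolves under the task attempt's ./tmp when run inside a MapReduce task
        File scratch = File.createTempFile("scratch-", ".tmp");
        scratch.deleteOnExit();                  // the attempt directory is cleaned up anyway
        BufferedWriter out = new BufferedWriter(new FileWriter(scratch));
        try {
          out.write(contents);
        } finally {
          out.close();
        }
        return scratch;
      }
    }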

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
I will, but deploying the application on a cluster is still far away; I'm just finishing the raw implementation. Cluster tuning is planned for the end of this month. Thanks. On 04/08/2012 09:06 PM, Harsh J wrote: It will work. Pseudo-distributed mode shouldn't be all that different from a fully distributed

Re: Working with MapFiles

2012-04-02 Thread Ondřej Klimpera
Klimpera wrote: And one more question, is it even possible to add a MapFile (as it consists of an index and a data file) to the distributed cache? Thanks. Should be no problem, they are just two files. On 03/30/2012 01:15 PM, Ondřej Klimpera wrote: Hello, I'm not sure what you mean by using map reduce
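A minimal sketch of the "just two files" approach: a MapFile directory holds exactly a data and an index file, and both are added to the distributed cache (the HDFS path is illustrative):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheMapFile {
      public static void addToCache(JobConf conf) {
        Path mapFileDir = new Path("/shared/lookup.map");   // illustrative location
        // MapFile.DATA_FILE_NAME = "data", MapFile.INDEX_FILE_NAME = "index"
        DistributedCache.addCacheFile(new Path(mapFileDir, MapFile.DATA_FILE_NAME).toUri(), conf);
        DistributedCache.addCacheFile(new Path(mapFileDir, MapFile.INDEX_FILE_NAME).toUri(), conf);
      }
    }

Whether the two localized copies sit next to each other again on the task side depends on how they are localized, so it is worth checking that MapFile.Reader can still see them as one directory, or falling back to symlinks or a small local copy step.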

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
Hello, I've got one more question: how is the seek() (or get()) method implemented in MapFile.Reader? Does it use hashCode, compareTo(), or another mechanism to find a match in the MapFile's index? Thanks for your reply. Ondrej Klimpera On 03/29/2012 08:26 PM, Ondřej Klimpera wrote: Thanks for your
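For what it's worth, MapFile.Reader.get() is not hash-based: the reader loads the sparse index file into memory, binary-searches it with the key class's WritableComparator (compareTo() ordering), seeks to the nearest preceding position in the data file and scans forward until it reaches or passes the key. A minimal usage sketch, path illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class MapFileLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        MapFile.Reader reader = new MapFile.Reader(fs, "/shared/lookup.map", conf);
        try {
          Text value = new Text();
          // returns null when the key is absent, otherwise fills in "value"
          Writable hit = reader.get(new Text("some-key"), value);
          System.out.println(hit == null ? "not found" : value.toString());
        } finally {
          reader.close();
        }
      }
    }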

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
Hello Ondrej, On 29.03.2012 18:05, Ondřej Klimpera wrote: Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced more splits as output, merge them into a single file. 2. Copy this merged MapFile to another HDFS location and use

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
And one more question, is it even possible to add a MapFile (as it consists of an index and a data file) to the distributed cache? Thanks. On 03/30/2012 01:15 PM, Ondřej Klimpera wrote: Hello, I'm not sure what you mean by using map reduce setup()? If the file is that small you could load it all

Working with MapFiles

2012-03-29 Thread Ondřej Klimpera
Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced more splits as output, merge them into a single file. 2. Copy this merged MapFile to another HDFS location and use it as a distributed cache file for another MapReduce job. I'm wondering if
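If a physical merge turns out to be awkward, one alternative is to leave the reducer outputs in place and open all the part MapFiles together with the old-API MapFileOutputFormat helpers, letting the job's partitioner pick the right part for each key. A minimal sketch, assuming the first job used the default HashPartitioner and Text keys/values (paths illustrative); running the first job with a single reducer is the other simple way to end up with exactly one MapFile:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.MapFileOutputFormat;
    import org.apache.hadoop.mapred.lib.HashPartitioner;

    public class PartLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // output directory of the first job, containing part-00000, part-00001, ...
        Path outDir = new Path("/jobs/first/output");
        MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fs, outDir, conf);

        // must match the partitioner the first job used when writing the parts
        HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
        Text value = new Text();
        Writable hit = MapFileOutputFormat.getEntry(readers, partitioner, new Text("some-key"), value);
        System.out.println(hit == null ? "not found" : value.toString());

        for (MapFile.Reader r : readers) {
          r.close();
        }
      }
    }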

Re: Working with MapFiles

2012-03-29 Thread Ondřej Klimpera
()); } } I think you can also copy these files to a different location in dfs and then put them into the distributed cache. Deniz On Mar 29, 2012, at 8:05 AM, Ondřej Klimpera wrote: Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced more

Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
Hello, I'm trying to develop an application where the Reducer has to produce multiple outputs. In detail, I need the Reducer to produce two types of files, each with different output. I found in Hadoop: The Definitive Guide that the new API uses only MultipleOutputs, but working with

Re: Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta version. So do you recommend using 0.20.203.X and sticking to the Hadoop: The Definitive Guide approaches? Thanks for your reply. On 01/25/2012 01:41 PM, Harsh J wrote: Oh and btw, do not fear the @deprecated 'Old' API. We have undeprecated
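Given the advice to stay with the undeprecated old API on 1.0.x, a minimal sketch of a reducer writing two kinds of files with org.apache.hadoop.mapred.lib.MultipleOutputs; the named outputs "typeA" and "typeB" and the routing rule are illustrative:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class TwoFileReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      // In the driver, before submitting the job:
      //   MultipleOutputs.addNamedOutput(conf, "typeA", TextOutputFormat.class, Text.class, Text.class);
      //   MultipleOutputs.addNamedOutput(conf, "typeB", TextOutputFormat.class, Text.class, Text.class);

      private MultipleOutputs mos;

      @Override
      public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
      }

      @Override
      @SuppressWarnings("unchecked")
      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        while (values.hasNext()) {
          Text v = values.next();
          // route each record to one of the two named output files
          if (v.toString().startsWith("A")) {
            mos.getCollector("typeA", reporter).collect(key, v);
          } else {
            mos.getCollector("typeB", reporter).collect(key, v);
          }
        }
      }

      @Override
      public void close() throws IOException {
        mos.close();   // flushes the extra output files
      }
    }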

Re: Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
deprecated things in my projects. Thanks. On 01/25/2012 01:46 PM, Ondřej Klimpera wrote: I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta version. So do you recommend using 0.20.203.X and sticking to the Hadoop: The Definitive Guide approaches? Thanks for your reply. On 01/25/2012 01:41