Re: NullPointerException (Text.java:388)

2010-10-15 Thread Vitaliy Semochkin
opening a JIRA for this? Thanks, Cos On Thu, Oct 14, 2010 at 02:42PM, Vitaliy Semochkin wrote: Hi, during the map phase I received the following exception: java.lang.NullPointerException at org.apache.hadoop.io.Text.encode(Text.java:388) at org.apache.hadoop.io.Text.encode(Text.java:369
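
A common cause of a trace like this is handing a null String to Text, which ends up in Text.encode(...). A minimal defensive sketch inside a map method (the parsing helper and field names are hypothetical, not from the thread):

    // Guard against null before writing to a Text object:
    // Text.encode(...) throws NullPointerException on a null String.
    String field = parseField(value.toString()); // hypothetical parse that may return null
    if (field != null) {
        context.write(new Text(field), value);   // only write non-null keys
    }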

Re: how to set different VM parameters for mappers and reducers?

2010-10-08 Thread Vitaliy Semochkin
-site.xml BTW, the combiner can be run on both the map side and the reduce side On Tue, Oct 5, 2010 at 8:59 PM, Vitaliy Semochkin vitaliy...@gmail.com wrote: Hello, I have mappers that do not need much RAM, but combiners and reducers need a lot. Is it possible to set different VM parameters

Re: how to set different VM parameters for mappers and reducers?

2010-10-07 Thread Vitaliy Semochkin
Hi, I tried using mapred.map.child.java.opts and mapred.reduce.child.java.opts, but it looks like hadoop-0.20.2 ignores them. On which version have you seen them working? Regards, Vitaliy S On Tue, Oct 5, 2010 at 5:14 PM, Alejandro Abdelnur t...@cloudera.com wrote: The following 2 properties
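
For reference, the two properties discussed here were added after 0.20.2 (MAPREDUCE-478; they appear in 0.21 and in CDH3), which would explain why stock hadoop-0.20.2 ignores them. A sketch of the mapred-site.xml entries, assuming a version that supports them (the heap sizes are illustrative):

    <!-- Requires a Hadoop version with MAPREDUCE-478 (0.21+ / CDH3);
         stock 0.20.2 only honors the single mapred.child.java.opts. -->
    <property>
      <name>mapred.map.child.java.opts</name>
      <value>-Xmx256m</value>
    </property>
    <property>
      <name>mapred.reduce.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>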

how to set different VM parameters for mappers and reducers?

2010-10-05 Thread Vitaliy Semochkin
Hello, I have mappers that do not need much RAM, but combiners and reducers need a lot. Is it possible to set different VM parameters for mappers and reducers? PS I often face an interesting problem: on the same set of data I receive java.lang.OutOfMemoryError: Java heap space in the combiner but

Re: how to set different VM parameters for mappers and reducers?

2010-10-05 Thread Vitaliy Semochkin
I'm using Apache hadoop-0.20.2 - the most recent version I found in the Maven central repo. Regards, Vitaliy S On Tue, Oct 5, 2010 at 5:02 PM, Michael Segel michael_se...@hotmail.com wrote: Hi, You don't say which version of Hadoop you are using. Going from memory, I believe in the CDH3 release from

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-09-27 Thread Vitaliy Semochkin
Hi, [..]if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap
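
The quoted behavior is HotSpot's GC overhead limit. Two common workarounds for task JVMs, sketched below with illustrative values not taken from the thread: give the child JVMs more heap, or disable the limit check with -XX:-UseGCOverheadLimit:

    <!-- Illustrative only: raise the task heap and/or disable the
         GC overhead limit check for the child JVMs. -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
    </property>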

Re: Ivy

2010-09-04 Thread Vitaliy Semochkin
When the release engineer publishes the jars to the public repo, would he be so kind as to publish the sources and javadocs as well? With Maven it can be done with a single command. If there are any difficulties with it, I can assist. Thanks in advance, Vitaliy S On Fri, Sep 3, 2010 at 9:46 PM, Tom White
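
A sketch of the kind of Maven invocation being alluded to (goal names from the standard source and javadoc plugins; the actual release process may differ):

    # Attach source and javadoc jars so they are deployed alongside the binaries.
    mvn source:jar javadoc:jar deploy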

Re: Do I need to write a RawComparator if my custom writable is not used as a Key?

2010-09-03 Thread Vitaliy Semochkin
Just as I expected, thank you very much ;) On Thu, Sep 2, 2010 at 6:08 PM, Owen O'Malley owen.omal...@gmail.com wrote: No, RawComparator is only needed for Keys. -- Owen On Sep 2, 2010, at 3:35, Vitaliy Semochkin vitaliy...@gmail.com wrote: Hello, Do I need to write a RawComparator
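
To illustrate what a RawComparator buys you when a custom writable is used as a key: it compares serialized bytes directly and skips deserialization during the sort. A hypothetical sketch, where IdKey is an invented WritableComparable whose first serialized field is a 4-byte int:

    import org.apache.hadoop.io.WritableComparator;

    // Hypothetical raw comparator for an invented key type 'IdKey'.
    public class IdKeyComparator extends WritableComparator {
        protected IdKeyComparator() {
            super(IdKey.class, true);
        }
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Compare the leading int field straight from the serialized bytes.
            int id1 = readInt(b1, s1);
            int id2 = readInt(b2, s2);
            return id1 < id2 ? -1 : (id1 == id2 ? 0 : 1);
        }
    }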

Do I need to write a RawComparator if my custom writable is not used as a Key?

2010-09-02 Thread Vitaliy Semochkin
Hello, Do I need to write a RawComparator, to improve performance, if my custom writable is not used as a Key? Regards, Vitaliy S

Re: Error with Heap Space.

2010-08-16 Thread Vitaliy Semochkin
Hello Rahul, Is the heap error due to the number of threads or more intermediate data getting generated? Threads themselves do not consume a lot of memory. I think your problem is in the intermediate data. However, in this case I would have expected an OutOfMemory exception in the M/R task rather than hadoop

Re: Error with Heap Space.

2010-08-16 Thread Vitaliy Semochkin
, Aug 16, 2010 at 2:56 PM, Vitaliy Semochkin vitaliy...@gmail.com wrote: Hello Rahul, Is the heap error due to the number of threads or more intermediate data getting generated? Threads themselves do not consume a lot of memory. I think your problem is in the intermediate data. However, in this case I

Re: what affects number of reducers launched by hadoop?

2010-07-29 Thread Vitaliy Semochkin
then configure this between 25 and 36 On Wed, Jul 28, 2010 at 3:24 PM, Vitaliy Semochkin vitaliy...@gmail.com wrote: Hi, in my cluster mapred.tasktracker.reduce.tasks.maximum = 4; however, while monitoring the job in the job tracker I see only 1 reducer working; first it is reduce

what affects number of reducers launched by hadoop?

2010-07-28 Thread Vitaliy Semochkin
Hi, in my cluster mapred.tasktracker.reduce.tasks.maximum = 4; however, while monitoring the job in the job tracker I see only 1 reducer working. First it is reduce copy - can someone please explain what this means? After that it is reduce reduce. When I set the number of reduce tasks for a job
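
Worth noting for this question: mapred.tasktracker.reduce.tasks.maximum only caps concurrent reduce slots per node; the number of reducers launched for a job is mapred.reduce.tasks, which defaults to 1. A sketch of requesting more in a new-API driver (the count of 8 is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "example");
    job.setNumReduceTasks(8);   // illustrative; a job defaults to 1 reducer if never set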

Re: How to control the number of map tasks for each nodes?

2010-07-22 Thread Vitaliy Semochkin
On Thu, Jul 22, 2010 at 1:07 AM, Allen Wittenauer awittena...@linkedin.com wrote: On Jul 21, 2010, at 9:17 AM, Vitaliy Semochkin wrote: might I ask how you came to that result? In my cluster I use a number of mappers and reducers twice the number of cpus*cores I have This is probably

Distributed Updateable Cache

2010-07-22 Thread Vitaliy Semochkin
Hi, I need to do calculations that would benefit from storing information in a distributed updateable cache. What are the best practices for such things in hadoop? PS In case there is no good solution for my problem, here are the details and ideas I have. I'm going to count the unique visitors of a site
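
One cache-free alternative for the unique-visitor problem described above is to let the shuffle deduplicate: map emits the visitor id as the key, so each distinct id reaches exactly one reduce call. A rough sketch, assuming the id is the first tab-separated field of a log line (the log layout is invented):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emit the visitor id as the key; the shuffle groups duplicates together,
    // so a reducer can count unique visitors by counting distinct keys.
    public class VisitorIdMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final Text visitorId = new Text();
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t"); // invented log layout
            visitorId.set(fields[0]);                       // assume id is column 0
            context.write(visitorId, NullWritable.get());
        }
    }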

Re: How to control the number of map tasks for each nodes?

2010-07-21 Thread Vitaliy Semochkin
be no more than 8*2-2=14. Best Regards, Carp 2010/7/8 Vitaliy Semochkin vitaliy...@gmail.com Hi, in mapred-site.xml you should place <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>8</value> <description>the number of available cores on the tasktracker machines

Re: How to access Reporter in new API?

2010-07-09 Thread Vitaliy Semochkin
into the context object in the new API. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Mapper.Context.html http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Reducer.Context.html On Thu, Jul 8, 2010 at 3:40 AM, Vitaliy Semochkin
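
A minimal sketch of the new-API equivalents of the old Reporter calls, inside a map or reduce method (the counter group and name are made up):

    // Counters, status and liveness reporting all hang off the context now.
    context.getCounter("MyApp", "RecordsSeen").increment(1); // was Reporter.incrCounter
    context.setStatus("processing records");                 // was Reporter.setStatus
    context.progress();                                      // was Reporter.progress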

Re: How to control the number of map tasks for each nodes?

2010-07-08 Thread Vitaliy Semochkin
Hi, in mapred-site.xml you should place <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>8</value> <description>the number of available cores on the tasktracker machines for map tasks</description> </property> <property> <name>mapred.tasktracker.reduce.tasks.maximum</name>

How to access Reporter in new API?

2010-07-08 Thread Vitaliy Semochkin
Hi, I'm using the Mapper interface from the new Hadoop API. How do I access a Reporter instance in the new API? PS If someone knows of any article on logging and problem reporting in Hadoop, please post a link here. Thanks in advance, Vitaliy S

Re: why my Reduce Class does not work?

2010-07-05 Thread Vitaliy Semochkin
...@gmail.com wrote: You need @Override on your reduce method. Right now you are getting the identity reduce method. On 7/4/10, Vitaliy Semochkin vitaliy...@gmail.com wrote: Hi, I rewrote the WordCount sample to use the new Hadoop API; however, my reduce task doesn't launch. The result file always
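
The trap, concretely: in the new API reduce takes an Iterable rather than an Iterator, so a near-miss signature compiles but is never called, and the inherited identity reduce runs instead. With @Override the compiler flags the mismatch. A sketch of the matching signature for the WordCount types:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override  // fails to compile if the signature does not match Reducer.reduce
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }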

why my Reduce Class does not work?

2010-07-04 Thread Vitaliy Semochkin
Hi, I rewrote the WordCount sample to use the new Hadoop API; however, my reduce task doesn't launch. The result file always looks like some_word 1 some_word 1 another_word 1 another_word 1 ... Here is the code: import java.io.IOException; import java.util.StringTokenizer; import

Re: Question about disk space allocation in hadoop

2010-06-30 Thread Vitaliy Semochkin
set dfs.datanode.du.reserved to the number of bytes you want to reserve for non-HDFS usage. PS for search convenience, IMHO it is better to post such questions to hdfs-u...@hadoop.apache.org ;-) Regards, Vitaliy S On Tue, Jun 29, 2010 at 8:32 AM, Yu Li car...@gmail.com wrote: Hi all, As we all know,
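
A sketch of the corresponding hdfs-site.xml entry on each datanode (the 10 GB figure is illustrative; the property is per volume, in bytes):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value> <!-- 10 GB reserved per volume for non-HDFS use -->
    </property>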

masters/slaves files content

2010-06-21 Thread Vitaliy Semochkin
Hi, In a default installation the hadoop masters and slaves files contain localhost. Am I correct that the masters file contains the list of SECONDARY namenodes? If so, will the localhost node try to start a secondary namenode even if it already has one? Moreover, will datanodes try to contact themselves in order to
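
For reference, in the 0.20.x scripts conf/masters is indeed the list of hosts where start-dfs.sh launches the secondary namenode, and conf/slaves lists the hosts where datanode/tasktracker daemons start; an illustrative pair (hostnames invented):

    # conf/masters -- secondary namenode host(s)
    snn-host01
    # conf/slaves -- datanode / tasktracker hosts
    worker01
    worker02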