Re: How to balance reduce job

2013-04-16 Thread Ajay Srivastava
Tariq probably meant the distribution of keys in the (key, value) pairs emitted by the mapper. The partitioner distributes these pairs to reducers based on the key. If the keys are skewed, most of the records may go to the same reducer. Regards, Ajay Srivastava On 17-Apr-2013, at 11:08 AM
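To make the skew point concrete, here is a minimal plain-Java sketch. It uses the same formula as Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but hashes a `String` instead of a Hadoop `Text`. The class name `SkewDemo` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: how a hash partitioner spreads skewed keys over reducers.
class SkewDemo {
    // Same shape as Hadoop's default HashPartitioner formula.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many records each reducer would receive.
    static int[] recordsPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String k : keys) {
            counts[partition(k, numReducers)]++;
        }
        return counts;
    }

    // Skewed sample: one "hot" key dominates 100 otherwise distinct keys.
    static List<String> skewedSample() {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 900; i++) keys.add("hot");
        for (int i = 0; i < 100; i++) keys.add("key" + i);
        return keys;
    }
}
```

Whatever the number of reducers, all 900 "hot" records land on the single reducer that `partition("hot", n)` picks. A different partitioner cannot split them without changing the key itself.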

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
efore hand. If so, you need one more pass over dataset1 to identify the keys and store them for use with dataset2. Regards, Ajay Srivastava On 18-Apr-2013, at 3:51 PM, Azuryy Yu wrote: This is not suitable for his large dataset. --Send from my Sony mobile. On Apr 18, 2013 5:58 PM,

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
ll go to the same reduce call. I forgot to mention in my previous post that you also need to write a partitioner that partitions data based on the first part of the key. Regards, Ajay Srivastava On 18-Apr-2013, at 4:42 PM, zheyi rong wrote: Hi Ajay Srivastava, Thank you for your reply. Could you please expl
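A partitioner that looks only at the first part of a composite key could be sketched as below. `CompositeKey` is a hypothetical stand-in for the poster's key type. In real Hadoop the method would override `getPartition(key, value, numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner`. This version is plain Java so it runs without Hadoop on the classpath.

```java
import java.util.Objects;

// Hypothetical composite key: the first part decides the reducer,
// the second part only distinguishes records within that reducer.
final class CompositeKey {
    final String first;
    final String second;

    CompositeKey(String first, String second) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
    }
}

class FirstPartPartitioner {
    // Hash only the first part, so every key sharing it reaches the same reducer.
    static int getPartition(CompositeKey key, int numPartitions) {
        return (key.first.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Hashing the whole key instead would scatter records that share a first part across reducers. That is why the default partitioner is not enough here.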

Re: Cartesian product in hadoop

2013-04-18 Thread Ajay Srivastava
The approach I proposed needs m + n I/O for reading the datasets, not m + n + m*n. However, further I/O due to spills, and due to the reducers reading the mapper output, will be higher, because the mapper emits m + m*n tuples. Regards, Ajay Srivastava On 18-Apr-2013, at 5:40 PM
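Plugging in hypothetical sizes, m = 10^3 and n = 10^4, shows the trade-off. Input reads shrink, but the m*n term moves into the shuffle:

```latex
\begin{aligned}
\text{dataset reads (proposed)}    &= m + n      = 10^3 + 10^4         = 11\,000 \\
\text{dataset reads (alternative)} &= m + n + mn = 10^3 + 10^4 + 10^7  = 10\,011\,000 \\
\text{mapper output tuples}        &= m + mn     = 10^3 + 10^7         = 10\,001\,000
\end{aligned}
```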

Unexpected problem in creating temporary file

2013-07-19 Thread Ajay Srivastava
reason behind it? It is slowing the system down. Before these errors, there were a few "connection reset by peer" exceptions, which I guess are harmless. Regards, Ajay Srivastava

Re: Unexpected problem in creating temporary file

2013-07-19 Thread Ajay Srivastava
Any suggestions? I am stuck. Regards, Ajay Srivastava On 19-Jul-2013, at 5:54 PM, Ajay Srivastava wrote: Hi, I am seeing many such errors on a datanode: 2013-07-18 22:10:49,473 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistrat

Only log.index

2013-07-23 Thread Ajay Srivastava
Hi, I see that most of the tasks have only log.index created in the /opt/hadoop/logs/userlogs/jobId/task_attempt directory. When does this happen? Is there a config setting for this, or is it a bug? Regards, Ajay Srivastava

Re: Only log.index

2013-07-23 Thread Ajay Srivastava
/attempt_201307222115_0188_r_08_0 stdout:0 0 stderr:156 0 syslog:995 166247 It looks like log.index is pointing to another attempt's directory. Is it doing some kind of optimization? What is the purpose of log.index? Regards, Ajay Srivastava On 24-Jul-2013, at 11:09 AM, Vinod Kumar Vavilapalli wrote:

Re: Only log.index

2013-07-23 Thread Ajay Srivastava
Yes. That explains it and confirms my guess too :-) stderr:156 0 syslog:995 166247 What are these numbers? Byte offsets in the corresponding files from which this task's logs start. Regards, Ajay Srivastava On 24-Jul-2013, at 12:10 PM, Vinod Kumar Vavilapalli wrote: Ah, I should've gu
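Entries like those quoted above could be parsed as follows. The thread only says the first number is the byte offset where this task's logs start. Treating the second number as the length of this task's portion is an assumption, and the class name `LogIndexParser` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LogIndexParser {
    // Parses entries such as "stdout:0 0 stderr:156 0 syslog:995 166247"
    // into name -> {start, length}.
    // ASSUMPTION: each entry is NAME:START LENGTH; only START's meaning
    // (byte offset where this task's log begins) is stated in the thread.
    static Map<String, long[]> parse(String text) {
        Map<String, long[]> result = new LinkedHashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            int colon = tokens[i].indexOf(':');
            String name = tokens[i].substring(0, colon);
            long start = Long.parseLong(tokens[i].substring(colon + 1));
            long length = Long.parseLong(tokens[i + 1]);
            result.put(name, new long[] {start, length});
        }
        return result;
    }
}
```

Under that assumption, this attempt's syslog occupies bytes 995 up to 167242 (exclusive) of the shared file, and its stderr portion is empty.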

Non utf-8 chars in input

2012-09-10 Thread Ajay Srivastava
al char in my mapper, what should be the correct InputFormat class? Regards, Ajay Srivastava
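One way to inspect raw bytes inside a mapper is a decoder that first tries strict UTF-8 and otherwise falls back to another charset. This is a sketch, not the thread's answer, and falling back to ISO-8859-1 is an assumption about the input's encoding. In a Hadoop mapper, `bytes` and `length` would come from `Text.getBytes()` and `Text.getLength()`. The backing array can be longer than the valid data, so the length matters. The class name `LenientDecoder` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LenientDecoder {
    // Decodes bytes[0, length) as UTF-8 if valid, otherwise as ISO-8859-1
    // (ASSUMED fallback encoding; adjust to whatever the input really uses).
    static String decode(byte[] bytes, int length) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(bytes, 0, length)).toString();
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte to a char, so this cannot fail.
            return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
        }
    }
}
```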

Re: Non utf-8 chars in input

2012-09-11 Thread Ajay Srivastava
s, Rekha. On 11/09/12 12:37 PM, "Joshi, Rekha" wrote: Hi Ajay, Try SequenceFileAsBinaryInputFormat? Thanks, Rekha. On 11/09/12 11:24 AM, "Ajay Srivastava" wrote

Re: How to split a sequence file

2012-09-11 Thread Ajay Srivastava
Hi Jason, I am wondering about the use case for distributing records to mappers on the basis of key. If possible, could you please share your scenario? Is it a map-only job? Why not distribute records using a partitioner and do the processing in the reducers? Regards, Ajay Srivastava On 12-Sep-2012

Too many fetch-failures

2012-12-03 Thread Ajay Srivastava
min, but it has been going on for hours. Regards, Ajay Srivastava

Re: Too many fetch-failures

2012-12-03 Thread Ajay Srivastava
Thanks Harsh. The problem is resolved. The entry for one of the datanodes was missing from /etc/hosts. After adding the entry, the job finished without any problem. Regards, Ajay Srivastava On 04-Dec-2012, at 2:50 AM, Harsh J wrote: What version/distribution of Hadoop is this? On Mon, De

Query about Speculative Execution

2012-12-06 Thread Ajay Srivastava
regular. Regards, Ajay Srivastava

Re: Query about Speculative Execution

2012-12-06 Thread Ajay Srivastava
n to true, the system may spawn another instance of the mapper and use the output of whichever instance completes first. Best, Mahesh Balija, Calsoft Labs. On Thu, Dec 6, 2012 at 8:27 PM, Ajay Srivastava <ajay.srivast...@guavus.com> wrote: Hi, What is the behavior o
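The "first attempt to finish wins" idea can be illustrated with `ExecutorService.invokeAny`. This call returns the result of one task that completed successfully and cancels the ones still running. It is only an analogy using `java.util.concurrent`, not Hadoop's scheduler. Hadoop launches a backup attempt only for tasks it judges to be running slowly, not for every task. The class name and timings are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SpeculativeDemo {
    // Runs the same "task" as two attempts and keeps whichever finishes first;
    // invokeAny cancels the attempt that has not completed.
    static String runSpeculatively(long slowMillis, long fastMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> attempts = List.of(
                    () -> { Thread.sleep(slowMillis); return "attempt_0"; },
                    () -> { Thread.sleep(fastMillis); return "attempt_1"; });
            return pool.invokeAny(attempts);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupt anything still sleeping
        }
    }
}
```

For example, `runSpeculatively(1000, 10)` returns `"attempt_1"` because the backup finishes long before the straggler.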

io.sort.factor

2013-01-22 Thread Ajay Srivastava
increasing io.sort.mb as well as io.sort.factor will improve performance. Increasing io.sort.mb helped, but raising io.sort.factor above 10 does not seem to either improve or degrade the performance of the MapReduce job. Regards, Ajay Srivastava

Re: io.sort.factor

2013-01-22 Thread Ajay Srivastava
Hi Bharat, I am looking at log lines like "2013-01-22 07:35:42,923 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2". The spill number at the end of the message never goes beyond 6, so I assume you are correct. Regards, Ajay Srivastava On 23-Jan-2013, at 12:14 PM, bharath vissapragada wrote: Hi
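A simplified model explains why io.sort.factor made no difference. Assume spills are numbered from 0, so spills 0 through 6 mean 7 spill files per map task. Also assume each merge pass combines up to io.sort.factor segments into one. Then 7 files need only one pass at the default factor of 10, and a larger factor cannot reduce that. Hadoop's actual merger sizes its passes more cleverly than this, so treat the counts as approximate. The class name is hypothetical.

```java
class MergePasses {
    // Simplified model: each pass merges up to `factor` segments into one.
    // Returns the number of passes needed to reduce `segments` files to one.
    static int passes(int segments, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        int passes = 0;
        while (segments > 1) {
            segments = (segments + factor - 1) / factor; // ceil(segments / factor)
            passes++;
        }
        return passes;
    }
}
```

`passes(7, 10)` and `passes(7, 50)` both return 1. The factor starts to matter only once the number of spills exceeds it, e.g. `passes(11, 10)` returns 2.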

Spilled records

2013-01-23 Thread Ajay Srivastava
at I can tell the mapper not to write its final output to disk, so that reducers fetch the data from the mapper's main memory? Regards, Ajay Srivastava

Re: Need help optimizing reducer

2013-03-04 Thread Ajay Srivastava
Are you using a combiner? If not, that will be the first thing to do. On 05-Mar-2013, at 1:27 AM, Austin Chungath wrote: Hi all, I have 1 reducer and I have around 600 thousand unique keys coming to it. The total data is only around 30 MB. My logic doesn't allow me to have more than 1 red
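What a sum-style combiner does to map output can be sketched as local aggregation. Values with the same key are merged before they leave the map task, so the reducer receives one record per key per map task. The class name is hypothetical, and summing is an assumption about the poster's reduce logic. A combiner is only safe when the operation is associative and commutative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinerDemo {
    // Sums values per key locally, the way a sum-style combiner would
    // before map output is shuffled to the reducer.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> combined = new HashMap<>();
        for (Map.Entry<String, Long> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return combined;
    }
}
```

The saving depends on keys repeating within a single map task's output. If most of the 600 thousand keys appear only once per map task, the combiner will not shrink the data much.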

Re: How to shuffle (Key,Value) pair from mapper to multiple reducer

2013-03-13 Thread Ajay Srivastava
Emit (key, value) twice from the mapper, modifying the key as key' = (key, partId) so that each record becomes (key', value). In a custom partitioner, send each record to a reducer based on partId. Ignore the partId field in the reducer. Regards, Ajay Srivastava On 13-Mar-2013, at 2:29 PM, Vikas Jadhav
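The steps above (tag, route on partId, untag) can be sketched in plain Java. `TaggedKey` is a hypothetical stand-in; in a real job key' would have to be a Hadoop `WritableComparable`, and the partitioner would extend `org.apache.hadoop.mapreduce.Partitioner`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ReplicatingMapper {
    // Hypothetical composite key' = (key, partId).
    static final class TaggedKey {
        final String key;
        final int partId;

        TaggedKey(String key, int partId) {
            this.key = key;
            this.partId = partId;
        }
    }

    // Map side: emit one (key', value) copy per target partition; value is unchanged.
    static List<Map.Entry<TaggedKey, String>> emit(String key, String value, int[] targetParts) {
        List<Map.Entry<TaggedKey, String>> out = new ArrayList<>();
        for (int p : targetParts) {
            out.add(new AbstractMap.SimpleEntry<>(new TaggedKey(key, p), value));
        }
        return out;
    }

    // Custom partitioner: route purely on partId (assumed non-negative).
    static int getPartition(TaggedKey k, int numPartitions) {
        return k.partId % numPartitions;
    }

    // Reduce side: ignore partId and work with the original key.
    static String originalKey(TaggedKey k) {
        return k.key;
    }
}
```

Calling `emit("k", "v", new int[] {0, 1})` produces two records that the partitioner sends to reducers 0 and 1. Both reducers see the original key "k" once they drop partId.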