Re: splittable vs seekable compressed formats

2013-05-24 Thread Rahul Bhattacharjee
Yeah, I think John meant seeking to record boundaries. Thanks, Rahul On Fri, May 24, 2013 at 12:22 PM, Harsh J wrote: > SequenceFiles should be seekable provided you know/manage their sync > points during writes I think. With LZO this may be non-trivial. > > On Thu, May 23, 2013 at 11:01 PM,
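For reference, a minimal sketch of what managing sync points by hand can look like (Hadoop 1.x SequenceFile API; the path, key/value types, and record counts are illustrative assumptions, not from the thread):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SyncPointDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.seq"); // hypothetical path

        // Record the byte offset just before each forced sync marker.
        List<Long> syncPoints = new ArrayList<Long>();
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, LongWritable.class, Text.class);
        try {
          for (long i = 0; i < 10000; i++) {
            if (i % 1000 == 0) {
              syncPoints.add(writer.getLength()); // offset of the marker
              writer.sync();                      // force a sync marker here
            }
            writer.append(new LongWritable(i), new Text("record-" + i));
          }
        } finally {
          writer.close();
        }

        // Later: jump straight to the block that starts at record 5000.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          reader.sync(syncPoints.get(5)); // seek to the sync mark at/after this offset
          LongWritable key = new LongWritable();
          Text value = new Text();
          reader.next(key, value);        // first record after the sync point
          System.out.println(key + " -> " + value);
        } finally {
          reader.close();
        }
      }
    }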

Abort a job when a counter reaches a threshold

2013-05-24 Thread abhinav gupta
Hi, While running a map-reduce job that has only mappers, I have a counter that counts the number of failed documents. And after all the mappers are done, I want the job to fail if the total number of failed documents is above a fixed fraction. (I need it in the end because I don't know the

Re: Abort a job when a counter reaches a threshold

2013-05-24 Thread Harsh J
Yes, there is a job-level end-point upon success via OutputCommitter: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext) On Fri, May 24, 2013 at 1:13 PM, abhinav gupta wrote: > Hi, > > While running a map-red
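A simpler route (a sketch, not from the thread) is to check the counter in the driver after waitForCompletion() and fail the run there; the "Docs" counter group, its counter names, and the 5% threshold are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class FailOnCounterDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "doc-processing");
        // ... mapper class, input/output paths, etc. elided ...
        boolean ok = job.waitForCompletion(true);

        // Counter group/names are hypothetical; the mappers would increment them.
        long failed = job.getCounters().findCounter("Docs", "FAILED").getValue();
        long total = job.getCounters().findCounter("Docs", "TOTAL").getValue();
        if (!ok || (total > 0 && (double) failed / total > 0.05)) {
          System.exit(1); // treat the whole run as failed
        }
      }
    }

This only aborts after the job has finished; the OutputCommitter hook above is the place to veto the commit itself.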

Re: pauses during startup (maybe network related?)

2013-05-24 Thread Ted
Just an FYI in case anyone finds this thread in a web search: I just edited my /etc/hosts file and added a mapping of 127.0.0.1, and everything starts up almost instantly; the difference is night and day :) On 5/24/13, Harsh J wrote: > You are spot on about the DNS lookup slowing thing
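The added line was presumably of this form (the hostname is an assumption; the point is that the machine's own hostname resolves locally without a DNS round-trip):

    127.0.0.1   localhost myhostname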

how to stop the specified client computer OS to connect hadoop using super user

2013-05-24 Thread 麦树荣
hi, all Our hadoop is started by user "hadoop", and the user "hadoop" is the superuser. Therefore, a hadoop client on another computer can manipulate HDFS using the superuser "hadoop" as long as the client computer OS has the user "hadoop". Is there any way to stop the specified client computer OS

Re: how to stop the specified client computer OS to connect hadoop using super user

2013-05-24 Thread lxw
You can use iptables. -- Original Message -- From: "麦树荣"; Sent: Friday, May 24, 2013, 17:27; To: "user@hadoop.apache.org"; Subject: how to stop the specified client computer OS to connect hadoop using super user hi, all Our hadoop is started by user "hadoop", and the u
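A sketch of the kind of rule meant here (the client address and the NameNode RPC port are assumptions; 8020 is a common default):

    # drop traffic from one specific client to the NameNode RPC port
    iptables -A INPUT -s 192.168.1.50 -p tcp --dport 8020 -j DROP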

Re: diff between these 2 dirs

2013-05-24 Thread Sai Sai
Just wondering if someone can explain the difference between these 2 dirs: Contents of directory /home/satish/work/mapred/staging/satish/.staging and this dir: /hadoop/mapred/system Thanks Sai

Re: Hint on EOFException's on datanodes

2013-05-24 Thread Azuryy Yu
Maybe a network issue; the datanode received an incomplete packet. --Sent from my Sony mobile. On May 24, 2013 1:39 PM, "Stephen Boesch" wrote: > > On a smallish (10 node) cluster with only 2 mappers per node after a few > minutes EOFExceptions are cropping up on the datanodes: an example is shown > b

RE: how to stop the specified client computer OS to connect hadoop using super user

2013-05-24 Thread zangxiangyu
Hi. Why not try iptables :), which in fact has no relation to hadoop? Add all hadoop nodes and clients to a whitelist, and add XX to a blacklist. If all you want is to “stop the specified client computer OS to connect hadoop”, simply create one iptables rule. For long-term planning, Kerberos is a must. From:
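For the Kerberos route, these are the core-site.xml switches that enable strong authentication (a sketch only; a real setup also needs principals, keytabs, and per-daemon configuration):

    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>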

Please help me with heartbeat storm

2013-05-24 Thread Eremikhin Alexey
Hi all, I have a 29-server hadoop cluster in almost default configuration. After installing Hadoop 1.0.4 I've noticed that the JT and some TTs waste CPU. I started stracing their behaviour and found that some TTs send heartbeats at an unlimited rate, meaning hundreds per second. Daemon restart solves

Re: Reducer that outputs no key

2013-05-24 Thread Something Something
You can ignore this for now. I was able to get merging of files to work under Hadoop Streaming by using the following 2 options: -mapper "cut -f2-" -Dmapred.reduce.tasks=0 On Fri, May 24, 2013 at 12:55 AM, Something Something < mailinglist...@gmail.com> wrote: > Hello, > > Trying to use Had
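Put together, the full command would look roughly like this (the jar location and HDFS paths are illustrative; note that generic -D options must precede the streaming-specific options):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
        -D mapred.reduce.tasks=0 \
        -input /path/to/input \
        -output /path/to/merged \
        -mapper "cut -f2-"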

Error while using the Hadoop Streaming

2013-05-24 Thread Adamantios Corais
I tried this nice example: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ The python scripts work fine from my laptop (through the terminal), but they don't when I execute them on CDH3 (Pseudo-Distributed Mode). Any ideas? hadoop jar /home/yyy/Dropbox

Re: Child Error

2013-05-24 Thread Jim Twensky
Hi again, in addition to my previous post, I was able to get some error logs from the task tracker/data node this morning, and it looks like it might be a Jetty issue: 2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201305231647_0007_m_001

Re: Error while using the Hadoop Streaming

2013-05-24 Thread Jitendra Yadav
Hi, I have run Michael's python map reduce example several times without any issue. I think this issue is related to your file path 'mapper.py'. Are you using the python binary? Try this: hadoop jar /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar \ -input /user/yyy/

Re: Where to begin from??

2013-05-24 Thread Sanjay Subramanian
Hey guys, is there a way to dynamically change the input dir and output dir? I have the following CONSTANT directories in HDFS: * /path/to/input/-99-99 (empty directory) * /path/to/output/-99-99 (empty directory) A new directory with yesterday's date like /path/to/input/2013-05-23 g
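One common approach (a sketch, not from the thread): compute yesterday's date in the wrapper script and pass the dated paths to the job, assuming GNU date and a hypothetical driver class:

    YESTERDAY=$(date -d yesterday +%Y-%m-%d)
    hadoop jar myjob.jar com.example.MyJob \
        /path/to/input/$YESTERDAY /path/to/output/$YESTERDAY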

Re: Error while using the Hadoop Streaming

2013-05-24 Thread Adamantios Corais
Hi, Thanks a lot for your response. Unfortunately, I ran into the same problem. What do you mean by "python binary"? This is what I have in the very first line of both scripts: #!/usr/bin/python Any ideas? On Fri, May 24, 2013 at 7:41 PM, Jitendra Yadav wrote: > Hi, > > I have run Mic

Re: Error while using the Hadoop Streaming

2013-05-24 Thread Jitendra Yadav
Hi, In your first mail you were using the "/usr/bin/python" binary just after "-mapper"; I don't think we need the python executable to run this example. Make sure that you are using the correct paths of your files "mapper.py" and "reducer.py" while executing. ~Thanks On Fri, May 24, 2013 at 11:31 PM

Re: Error while using the Hadoop Streaming

2013-05-24 Thread Adamantios Corais
That's the point. I think I have chosen them right, but how could I double-check it? As you can see, the files "mapper.py" and "reducer.py" are on my laptop whereas the input file is on HDFS. Does this sound OK to you? On Fri, May 24, 2013 at 8:10 PM, Jitendra Yadav wrote: > Hi, > > In your first mail y
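Local scripts are fine as long as they are shipped with the job via -file; a sketch (local and HDFS paths are illustrative):

    hadoop jar hadoop-streaming-1.1.2.jar \
        -file /home/me/mapper.py  -mapper mapper.py \
        -file /home/me/reducer.py -reducer reducer.py \
        -input /user/me/input -output /user/me/output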

MiniYARNCluster logs

2013-05-24 Thread Prashant Kommireddi
Hey guys, We are using the MiniYARNCluster and trying to see where the NN, RM, and job logs can be found. We see the job logs are present on HDFS but not in any local dirs. Also, none of the master node logs (NN, RM) are available. Digging in a bit further (just looked at this 1 file), I see there is

RE: splittable vs seekable compressed formats

2013-05-24 Thread John Lilley
More specifically, seeking to a known location in the uncompressed data. So not just seeking to “the nearest record boundary”, but seeking to “position 1 in the uncompressed data”. I can see that if the writer kept track of this information on the side it would be available; my questio

Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi, I just installed Apache Flume 1.3.1 and am trying to run a small example to test it. Can anyone suggest how I can do this? I am going through the documentation right now. Thanks, Raj
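The Flume 1.3 user guide's smoke test is a single-agent netcat-to-logger configuration along these lines (the agent name a1 and the port are conventional choices, not requirements):

    # example.conf: single-node Flume agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sinks.k1.type = logger
    a1.channels.c1.type = memory
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    # start the agent, then "telnet localhost 44444" and type a line
    bin/flume-ng agent --conf conf --conf-file example.conf --name a1 \
        -Dflume.root.logger=INFO,console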

question of how to debug hadoop code and mahout code

2013-05-24 Thread qiaoresearcher
Hi all, is there a way to debug hadoop code in Eclipse step by step using the HDFS file system? Thanks,
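One way (a sketch using Hadoop 1.x property names): run the job with the local job runner but leave fs.default.name pointing at HDFS, so tasks execute inside the Eclipse JVM where breakpoints fire while input is still read from HDFS. The NameNode address is an assumption:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DebugInEclipse {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local");              // tasks run in this JVM
        conf.set("fs.default.name", "hdfs://localhost:8020"); // data stays on HDFS
        Job job = new Job(conf, "debug-me");
        job.setJarByClass(DebugInEclipse.class);
        // set your mapper/reducer classes here, then put breakpoints in them
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }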

Single Output file from STORE command

2013-05-24 Thread Mix Nin
The PIG STORE command produces multiple output files. I want a single output file, so I tried the command below: STORE (foreach (group NoNullData all) generate flatten($1)) into ''; This command produces one single file, but at the same time it forces a single reducer, which kills performanc
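One common workaround (not from the thread): keep the parallel reducers and concatenate the part files afterwards with getmerge; paths are illustrative:

    # merge all part-* files of the job output into one local file
    hadoop fs -getmerge /path/to/pig/output /local/path/single_file.txt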

Re: Apache Flume Properties File

2013-05-24 Thread Hitesh Shah
Hello Raj BCC-ing user@hadoop and user@hive Could you please not cross-post questions to multiple mailing lists? For questions on hadoop, go to user@hadoop. For questions on hive, please send them to the hive mailing list and not the user@hadoop mailing list. Likewise for flume. thanks -- H

Re: Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi, When I read the stuff on the internet about Flume, everything is mostly about the CDH distribution. I am aware that Flume is Cloudera's contribution, but I am using a strict Apache version in my research work. While reading all this, I want to make sure from the forum that Apache flume if ha

Re: Child Error

2013-05-24 Thread Jean-Marc Spaggiari
Hi Jim, Which JVM are you using? I don't think you have any memory issue; else you would have got some OOMEs... JM 2013/5/24 Jim Twensky > Hi again, in addition to my previous post, I was able to get some error > logs from the task tracker/data node this morning and looks like it might > be a j

Issue with data Copy from CDH3 to CDH4

2013-05-24 Thread samir das mohapatra
Hi all, We tried to pull data from an upstream cluster (running cdh3) to a downstream system (running cdh4), using *distcp* to copy the data; it was throwing an exception because of a version issue. I wanted to know: is there any solution to pull the data from CDH3 to CDH4 wi
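The usual cross-version recipe (a sketch; hosts and paths are assumptions) is to run distcp on the CDH4 side and read from CDH3 over the version-independent hftp protocol:

    # run from the CDH4 (destination) cluster; 50070 is the default hftp port
    hadoop distcp hftp://cdh3-namenode:50070/src/path \
        hdfs://cdh4-namenode:8020/dest/path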

Re: Issue with data Copy from CDH3 to CDH4

2013-05-24 Thread Jagat Singh
A bit unrelated but similar: copying sequence file data from a cdh3 to a cdh4 cluster. http://feedly.com/k/14KMYIk Thanks Jagat On May 25, 2013 1:50 PM, "samir das mohapatra" wrote: > Hi all, > > We tried to pull the data from upstream cluster(cdh3) which is > running cdh3 to down stream

Re: Where to begin from??

2013-05-24 Thread schhajed.iet
Did you try using the MultipleOutputs class? Sent from Windows Mail From: Sanjay Subramanian Sent: Friday, 24 May 2013 11:13 PM To: user@hadoop.apache.org Hey guys Is there a way to dynamically change the input dir and outputdir I have the following CONSTANT directories in
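For reference, a minimal sketch of the MultipleOutputs suggestion (new-API reducer; the Text types and the date string are illustrative assumptions): records can be routed to a dated subdirectory under the job's output path:

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class DatedReducer extends Reducer<Text, Text, Text, Text> {
      private MultipleOutputs<Text, Text> mos;

      @Override
      protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
      }

      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        for (Text value : values) {
          // lands under <output dir>/2013-05-23/part-r-* (date is illustrative)
          mos.write(key, value, "2013-05-23/part");
        }
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        mos.close();
      }
    }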