Yeah, I think John meant seeking to record boundaries.
Thanks,
Rahul
On Fri, May 24, 2013 at 12:22 PM, Harsh J wrote:
> SequenceFiles should be seekable provided you know/manage their sync
> points during writes I think. With LZO this may be non-trivial.
>
> On Thu, May 23, 2013 at 11:01 PM,
Hi,
While running a map-reduce job that has only mappers, I have a counter that
counts the number of failed documents. After all the
mappers are done, I want the job to fail if the total number of failed
documents is above a fixed fraction. (I need it at the end because I
don't know the
Yes, there is a job-level end-point upon success via OutputCommitter:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext)
On Fri, May 24, 2013 at 1:13 PM, abhinav gupta wrote:
> Hi,
>
> While running a map-red
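OutputCommitter.commitJob() is the job-level hook, but counters are not
straightforward to read from inside it; a simpler pattern many people use is
to check the counters in the driver after waitForCompletion(). A minimal
sketch, assuming a custom counter enum and a made-up 5% threshold (neither is
from the thread):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class FailedDocsDriver {
    // Mappers increment these, e.g.
    // context.getCounter(DocCounters.FAILED_DOCS).increment(1);
    public enum DocCounters { TOTAL_DOCS, FAILED_DOCS }

    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "doc-processing");
      // ...mapper class, input/output paths, etc. elided...
      boolean ok = job.waitForCompletion(true);

      long failed = job.getCounters().findCounter(DocCounters.FAILED_DOCS).getValue();
      long total = job.getCounters().findCounter(DocCounters.TOTAL_DOCS).getValue();
      double maxFailedFraction = 0.05;  // assumed threshold
      if (!ok || (total > 0 && (double) failed / total > maxFailedFraction)) {
        System.exit(1);  // report the run as failed to the caller/scheduler
      }
    }
  }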
Just an FYI in case anyone finds this thread in a web search:
I just edited my /etc/hosts file and added a mapping of
127.0.0.1
and everything starts up almost instantly; the difference is night and day :)
On 5/24/13, Harsh J wrote:
> You are spot on about the DNS lookup slowing thing
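(For anyone copying this fix: the added entry maps the machine's own hostname
to the loopback address; the hostname below is just an example, use whatever
your box resolves itself as.)

  127.0.0.1   localhost myhost.example.com myhost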
hi, all
Our hadoop is started by the user "hadoop", and the user "hadoop" is a superuser.
Therefore, a hadoop client on another computer can manipulate HDFS as the
superuser "hadoop" as long as the client computer's OS has a user "hadoop".
Is there any way to stop the specified client computer OS
You can use iptables.
-- Original Message --
From: "麦树荣" ;
Sent: Friday, May 24, 2013 17:27
To: "user@hadoop.apache.org" ;
Subject: how to stop the specified client computer OS to connect hadoop using
super user
hi, all
Our hadoop is started by user "hadoop",and the u
Just wondering if someone can explain the difference between these two dirs:
Contents of directory /home/satish/work/mapred/staging/satish/.staging
and this dir:
/hadoop/mapred/system
Thanks
Sai
Maybe a network issue; the datanode received an incomplete packet.
--Sent from my Sony mobile.
On May 24, 2013 1:39 PM, "Stephen Boesch" wrote:
>
> On a smallish (10 node) cluster with only 2 mappers per node after a few
> minutes EOFExceptions are cropping up on the datanodes: an example is shown
> b
Hi.
why not try iptables :) which in fact has no relation to hadoop. Add all
hadoop nodes and clients to a whitelist, and add XX to a blacklist if you only
want to "stop the specified client computer OS to connect hadoop"; simply
create one iptables rule.
For long-term planning, Kerberos is a must.
From:
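(A sketch of such a rule pair, with a placeholder subnet; 8020 is the default
NameNode RPC port, and you would repeat this for the other hadoop ports you
expose.)

  iptables -A INPUT -p tcp --dport 8020 -s 10.0.0.0/24 -j ACCEPT
  iptables -A INPUT -p tcp --dport 8020 -j DROP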
Hi all,
I have a 29-server hadoop cluster in an almost default configuration.
After installing Hadoop 1.0.4 I've noticed that the JT and some TTs waste CPU.
I started stracing their behaviour and found that some TTs send heartbeats
at an unbounded rate: hundreds per second.
A daemon restart solves
You can ignore this for now. I was able to get merging of files to work
under Hadoop Streaming by using the following 2 properties:
-mapper "cut -f2-"
-Dmapred.reduce.tasks=0
On Fri, May 24, 2013 at 12:55 AM, Something Something <
mailinglist...@gmail.com> wrote:
> Hello,
>
> Trying to use Had
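(For anyone finding this later, the full command would look roughly like the
following; the jar and paths are placeholders, and note that the generic -D
option has to come before the streaming-specific options.)

  hadoop jar /path/to/hadoop-streaming.jar \
    -Dmapred.reduce.tasks=0 \
    -input /path/to/input \
    -output /path/to/output \
    -mapper "cut -f2-"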
I tried this nice example:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
The python scripts work fine from my laptop (through the terminal), but
they don't when I execute them on CDH3 (Pseudo-Distributed Mode).
Any ideas?
hadoop jar
/home/yyy/Dropbox
Hi again, in addition to my previous post, I was able to get some error
logs from the task tracker/data node this morning, and it looks like it might
be a jetty issue:
2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
retrieve stdout log for task: attempt_201305231647_0007_m_001
Hi,
I have run Michael's python map reduce example several times without any
issue.
I think this issue is related to your file path 'mapper.py'. Are you using the
python binary?
try this,
hadoop jar
/home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar
\
-input /user/yyy/
Hey guys
Is there a way to dynamically change the input dir and output dir?
I have the following CONSTANT directories in HDFS
* /path/to/input/-99-99 (empty directory)
* /path/to/output/-99-99 (empty directory)
A new directory with yesterday's date like /path/to/input/2013-05-23 g
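(One way to do this, sketched driver-side with illustrative paths: compute
yesterday's date at submission time and build the paths from it.)

  import java.text.SimpleDateFormat;
  import java.util.Calendar;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Given a configured Job instance "job", before submission:
  Calendar cal = Calendar.getInstance();
  cal.add(Calendar.DATE, -1);  // yesterday
  String day = new SimpleDateFormat("yyyy-MM-dd").format(cal.getTime());
  FileInputFormat.addInputPath(job, new Path("/path/to/input/" + day));
  FileOutputFormat.setOutputPath(job, new Path("/path/to/output/" + day));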
Hi,
Thanks a lot for your response.
Unfortunately, I still run into the same problem.
What do you mean by "python binary"? This is what I have in the very first
line of both scripts: #!/usr/bin/python
Any ideas?
On Fri, May 24, 2013 at 7:41 PM, Jitendra Yadav
wrote:
> Hi,
>
> I have run Mic
Hi,
In your first mail you were using the "/usr/bin/python" binary file just after
"-mapper"; I don't think we need the python executable to run this example.
Make sure that you are using the correct paths of your files "mapper.py" and
"reducer.py" while executing.
~Thanks
On Fri, May 24, 2013 at 11:31 PM
That's the point. I think I have chosen them right, but how could I
double-check it? As you see, the files "mapper.py" and "reducer.py" are on my
laptop whereas the input file is on HDFS. Does this sound ok to you?
On Fri, May 24, 2013 at 8:10 PM, Jitendra Yadav
wrote:
> Hi,
>
> In your first mail y
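(For anyone finding this thread: the usual fix for that split is to ship the
local scripts to the cluster with -file; the paths below are placeholders.)

  hadoop jar /path/to/hadoop-streaming-1.1.2.jar \
    -input /user/yyy/input \
    -output /user/yyy/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file /home/yyy/mapper.py \
    -file /home/yyy/reducer.py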
Hey guys,
We are using the MiniYARNCluster and trying to see where the NN, RM, and job
logs can be found. We see the job logs are present on HDFS but not in any
local dirs. Also, none of the master node logs (NN, RM) are available.
Digging in a bit further (just looked at this 1 file), I see there is
More specifically, seeking to a known location in the uncompressed data. So
not just seeking to “the nearest record boundary”, but seeking to “position
1 in the uncompressed data”. I can see that if the writer kept track
of this information on the side it would be available; my questio
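(A sketch of that side-tracking idea; all names, record sizes, and offsets
here are illustrative, not from the thread. While writing, record
"uncompressed offset -> file position" each time you add a sync point, then
seek() the reader to the recorded position nearest the target.)

  import java.util.TreeMap;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  Configuration conf = new Configuration();
  Path path = new Path("/tmp/data.seq");  // placeholder
  FileSystem fs = path.getFileSystem(conf);

  // While writing: keep a side index of uncompressed offset -> file position.
  TreeMap<Long, Long> index = new TreeMap<Long, Long>();
  SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, conf, path, LongWritable.class, Text.class);
  long uncompressed = 0;
  index.put(0L, writer.getLength());  // position right after the file header
  for (long i = 0; i < 100000; i++) {
    Text value = new Text("record-" + i);
    writer.append(new LongWritable(i), value);
    uncompressed += value.getLength();
    if (i % 1000 == 999) {            // drop a sync marker periodically
      writer.sync();
      index.put(uncompressed, writer.getLength());
    }
  }
  writer.close();

  // While reading: jump to the recorded boundary at or before the target.
  SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
  long target = 123456L;              // wanted uncompressed position
  reader.seek(index.floorEntry(target).getValue());  // lands on a boundary
  // ...then read records forward to the exact position.
  reader.close();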
Hi,
I just installed Apache Flume 1.3.1 and am trying to run a small example to
test it. Can anyone suggest how I can do this? I am going through the
documentation right now.
Thanks,
Raj
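(For anyone searching later: the Flume user guide's single-node example boils
down to writing a small example.conf defining one source, channel and sink,
then starting an agent like this; "a1" is the agent name used in the guide's
sample config.)

  flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console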
hi all,
is there a way to debug hadoop code in eclipse step by step using the hdfs
file system?
thanks,
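(One common approach, not from this thread: run the job under the
LocalJobRunner so everything executes in a single JVM you can step through in
eclipse, while still pointing the file system at HDFS. A sketch using the
1.x-era config keys; the namenode host is a placeholder.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  Configuration conf = new Configuration();
  conf.set("mapred.job.tracker", "local");              // run tasks in-process
  conf.set("fs.default.name", "hdfs://namenode:8020");  // still read from HDFS
  Job job = new Job(conf, "debug-run");
  // ...set mapper/reducer/paths as usual, set breakpoints, then Debug As >
  // Java Application on the driver class.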
The PIG STORE command produces multiple output files. I want a single output
file, and I tried the command below:
STORE (foreach (group NoNullData all) generate flatten($1)) into '';
This command produces one single file but at the same time forces a
single reducer, which kills performanc
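(One workaround, not from the thread itself: keep the parallel STORE for
performance and merge the part files afterwards with getmerge; paths are
placeholders.)

  hadoop fs -getmerge /path/to/pig/output /local/path/merged_output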
Hello Raj
BCC-ing user@hadoop and user@hive
Could you please not cross-post questions to multiple mailing lists?
For questions on hadoop, go to user@hadoop. For questions on hive, please send
them to the hive mailing list and not the user@hadoop mailing list. Likewise
for flume.
thanks
-- H
Hi,
When I am reading all the stuff on the internet about Flume, everything is
mostly about the CDH distribution. I am aware that Flume is Cloudera's
contribution, but I am using a strict Apache version in my research work. While
reading all this, I wanted to make sure from the forum that Apache flume if ha
Hi Jim,
Which JVM are you using?
I don't think you have any memory issue; otherwise you would have got an OOME...
JM
2013/5/24 Jim Twensky
> Hi again, in addition to my previous post, I was able to get some error
> logs from the task tracker/data node this morning and looks like it might
> be a j
Hi all,
We tried to pull data from an upstream cluster running CDH3 down to a
downstream system running CDH4, using distcp to copy the data; it was
throwing exceptions due to a version issue.
I wanted to know if there is any solution to pull the data from CDH3 to CDH4
wi
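The usual answer for copying across incompatible versions is to run distcp on
the CDH4 cluster and read the CDH3 side over HFTP, which is read-only and
version-independent (hosts are placeholders; 50070 is the default HFTP port):

  hadoop distcp hftp://cdh3-namenode:50070/src/path hdfs://cdh4-namenode:8020/dest/path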
A bit unrelated but similar: copying SequenceFile data from a cdh3 to a
cdh4 cluster.
http://feedly.com/k/14KMYIk
Thanks
Jagat
On May 25, 2013 1:50 PM, "samir das mohapatra"
wrote:
> Hi all,
>
> We tried to pull the data from upstream cluster(cdh3) which is
> running cdh3 to down stream
Did you try using the MultipleOutputs class?
Sent from Windows Mail
From: Sanjay Subramanian
Sent: Friday, 24 May 2013 11:13 PM
To: user@hadoop.apache.org
Hey guys
Is there a way to dynamically change the input dir and outputdir
I have the following CONSTANT directories in
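(A sketch of the MultipleOutputs idea with illustrative names: the mapper
writes to a base output path chosen at runtime, relative to the job's output
directory, instead of a single fixed location.)

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

  public class DatedOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
      mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String day = "2013-05-23";  // in practice, derived from the record or config
      // The third argument is a base output path relative to the job output dir.
      mos.write(new Text("date"), value, day + "/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      mos.close();
    }
  }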