Re: YARN: "Unauthorized request to start container, Expired Token" causes job failure

2015-10-19 Thread ed
d not find the "yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs" setting in Cloudera Manager (I know that is Cloudera-specific), but I think I can set it manually if anyone thinks that is worth trying. Best Regards, Ed Dorsey On Fri, Oct 16, 2015 at 3:41 PM, ed <e

YARN: "Unauthorized request to start container, Expired Token" causes job failure

2015-10-16 Thread ed
assigned? I searched through JIRA but did not see any open issues that might relate to the error we're seeing. Are there any workarounds for this, or has anyone seen this happen before? Please let me know if there is any other information I can provide. Best Regards, Ed Dorsey

DataXceiver WRITE_BLOCK: Premature EOF from inputStream: Using Avro Multiple Outputs

2015-07-04 Thread ed
within what the nodes have (6 nodes with 90GB each). No errors in YARN either. Thank you! Best, Ed

Datanode denied communication with namenode

2014-07-26 Thread Ed Sweeney
don't want to add anything to the /etc/hosts files - shouldn't have to, since the long and short names all resolve properly. Seeing that the hostname field in the error message contains the IP, I tried using dfs.client.use.datanode.hostname = true, but no change. Any help appreciated! -Ed

Re: Datanode denied communication with namenode

2014-07-26 Thread Ed Sweeney
have added to your hosts. Please also check your exclude file, and the third point is to increase your DN heap size and start it. Thanks On Jul 27, 2014 1:01 AM, Ed Sweeney ed.swee...@falkonry.com wrote: All, New AWS cluster with Cloudera 4.3 RPMs. dfs.hosts contains 3 host names, they all

Re: webhdfs read error after successful pig job

2013-06-14 Thread Ed Serrano
has seen this, has diagnostic tips, or best of all, a solution, please let me know! Thanks, Adam -- - *Ed Serrano* Mobile: 972-897-5443

Copying data to hdfs

2011-12-13 Thread Steve Ed
Sorry for the layman question. What's the best way of writing data into HDFS from outside of the cluster? My customer is looking for wire-speed data ingest into HDFS. We are considering Flume, but initial performance results from Flume are very discouraging. Thanks in advance. Steve
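
For reference, a minimal sketch of pushing a local file into HDFS with the FileSystem API from a client outside the cluster, assuming the Hadoop client jars are on the classpath; the namenode URI and paths are placeholders:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsIngest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the cluster's namenode; the URI is a placeholder.
            conf.set("fs.default.name", "hdfs://namenode-host:8020");
            FileSystem fs = FileSystem.get(conf);
            InputStream in = new FileInputStream(args[0]);     // local source
            OutputStream out = fs.create(new Path(args[1]));   // HDFS destination
            // Stream the local file into HDFS, closing both streams when done.
            IOUtils.copyBytes(in, out, 4096, true);
        }
    }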

Moving data into HDFS

2011-11-22 Thread Steve Ed
Sorry for this novice question. I am trying to find the best way of moving (copying) data in and out of HDFS. There are a bunch of tools available, and I need to pick the one that offers the easiest way. I have seen a MapR presentation claiming to offer direct NFS mounts to feed data into HDFS.

RE: Sizing help

2011-11-11 Thread Steve Ed
:40 PM, Steve Ed sediso...@gmail.com wrote: I am a newbie to Hadoop and trying to understand how to size a Hadoop cluster. What factors should I consider when deciding the number of datanodes? Datanode configuration? CPU, memory? Amount of memory required for the namenode? My client

RE: NameNode corruption: NPE addChild at start up

2011-10-26 Thread Steve Ed
Did you ever consider keeping a backup copy of the FSImage on an NFS share? The best practice is to have reliable NFS storage mounted on the namenode and configure the site XML to keep a copy on the NFS mount. This will prevent FSImage loss. -Original Message- From: Markus Jelsma

Sizing help

2011-10-21 Thread Steve Ed
I am a newbie to Hadoop and trying to understand how to size a Hadoop cluster. What factors should I consider when deciding the number of datanodes? Datanode configuration? CPU, memory? Amount of memory required for the namenode? My client is looking at 1 PB of usable data and will be

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
Hello, The MapRunner class looks promising. I noticed it is in the deprecated mapred package, but I didn't see an equivalent class in the mapreduce package. Is this going to be ported to mapreduce, or is it no longer being supported? Thanks! ~Ed On Thu, Oct 21, 2010 at 6:36 AM, Harsh J

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
Just checked the Hadoop 0.21.0 API docs (I was looking in the wrong docs before) and it doesn't look like MapRunner is deprecated so I'll try catching the error there and will report back if it's a good solution. Thanks! ~Ed On Thu, Oct 21, 2010 at 11:23 AM, ed hadoopn...@gmail.com wrote

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
class (org.apache.hadoop.mapreduce.Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>) for their mappers. ~Ed On Thu, Oct 21, 2010 at 12:14 PM, ed hadoopn...@gmail.com wrote: Just checked the Hadoop 0.21.0 API docs (I was looking in the wrong docs before) and it doesn't look like MapRunner is deprecated so I'll

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
Thanks Tom! Didn't see your post before posting =) On Thu, Oct 21, 2010 at 1:28 PM, ed hadoopn...@gmail.com wrote: Sorry to keep spamming this thread. It looks like the correct way to implement MapRunnable using the new mapreduce classes (instead of the deprecated mapred) is to override

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
(context, "EOFException: Corrupt gzip file " + mFileName); } } On Thu, Oct 21, 2010 at 1:29 PM, ed hadoopn...@gmail.com wrote: Thanks Tom! Didn't see your post before posting =) On Thu, Oct 21, 2010 at 1:28 PM, ed hadoopn...@gmail.com wrote: Sorry to keep spamming this thread. It looks

Re: Setting num reduce tasks

2010-10-21 Thread ed
You could also try job.setNumReduceTasks(yourNumber); ~Ed On Thu, Oct 21, 2010 at 4:45 PM, Alex Kozlov ale...@cloudera.com wrote: Hi Matt, it might be that the parameter does not end up in the final configuration for a number of reasons. Can you check the job config xml in jt:/var/log
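
For reference, a minimal sketch of that call in context, using the new mapreduce API; the job name and reducer count are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "reducer-count-example");
            // Same effect as passing -D mapred.reduce.tasks=8 via ToolRunner.
            job.setNumReduceTasks(8);
            // ... set mapper, reducer, input/output paths, then submit
        }
    }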

LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread ed
of using LZO compression in Hadoop. Thank you! ~Ed

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread ed
our overriding the run() method and catching the EOFException works beautifully for processing files that might be corrupt or have errors. Thanks! ~Ed On Thu, Oct 21, 2010 at 2:07 PM, ed hadoopn...@gmail.com wrote: I overwrote the run() method in the mapper with a run() method (below
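
For reference, a minimal sketch of the run()-override approach this thread converges on, assuming the new mapreduce API; the key/value types and counter names are illustrative:

    import java.io.EOFException;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SkipCorruptFileMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void run(Context context) throws IOException, InterruptedException {
            setup(context);
            try {
                // The same loop as the default run(), wrapped so a corrupt gzip
                // stream ends this file's processing instead of failing the task.
                while (context.nextKeyValue()) {
                    map(context.getCurrentKey(), context.getCurrentValue(), context);
                }
            } catch (EOFException e) {
                // Record the bad file and let the task finish cleanly.
                context.getCounter("BadInput", "CorruptGzip").increment(1);
            } finally {
                cleanup(context);
            }
        }
    }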

Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread ed
Hi Todd, I don't have the code in front of me right now, but I was looking over the API docs and it looks like I forgot to call close() on the MultipleOutputs. I'll post back if that fixes the problem. If not, I'll put together a unit test. Thanks! ~Ed On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon t
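
For reference, a minimal sketch of the suspected fix, assuming the new-API MultipleOutputs: without close() in cleanup(), buffered compressed output (such as LZO) can be left truncated. The types and field names are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MultiOutputReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        // ... reduce() writes through mos.write(...) ...

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close(); // flushes and closes the extra writers (and their codecs)
        }
    }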

Custom reduce progress reporting

2010-10-20 Thread Ed Mazur
Is there a configuration property or some other mechanism for changing how reduce tasks report progress? E.g., instead of the shuffle, merge, and reduce each counting for 33%, can this be changed? The Hadoop Online Prototype did this, but I can't figure out how. Ed

Re: Upgrading Hadoop from CDH3b3 to CDH3

2010-10-20 Thread ed
I don't think there is a stable CDH3 yet, although we've been using CDH3B2 and it has been pretty stable for us. (At least I don't see it available on their website, and they JUST announced CDH3B3 last week at HadoopWorld.) ~Ed On Wed, Oct 20, 2010 at 5:57 AM, Abhinay Mehta abhinay.me

Re: Reduce function

2010-10-19 Thread ed
){ // split my key so that the bit flag is removed // take the modified key and mod it by numPartitions // return the result } } Of course Key and Value would be whatever Key and Value classes you're using. Hope that helps. ~Ed On Mon, Oct 18, 2010 at 8:58
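
For reference, a sketch reconstructing the pseudocode above as a custom partitioner; the '#' delimiter separating the flag from the natural key is an assumption:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class FlagStrippingPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            // Drop the bit-flag prefix so flagged and unflagged forms of a
            // key land on the same reducer.
            String k = key.toString();
            String natural = k.substring(k.indexOf('#') + 1);
            return (natural.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }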

Re: io.sort.mb maximum limit

2010-10-19 Thread ed
somewhere for the next reducer making the attempt so it might be counterproductive to try and eliminate spills. ~Ed On Tue, Oct 19, 2010 at 8:02 AM, Donovan Hide donovanh...@gmail.com wrote: Hi, is there a reason why the io.sort.mb setting is hard-coded to the maximum of 2047MB? MapTask.java 789

How to stop a mapper within a map-reduce job when you detect bad input

2010-10-19 Thread ed
this exception and tell hadoop to just ignore the file and move on? I think the exception is being thrown by the class reading in the Gzip file and not my mapper class. Is this correct? Is there a way to handle this type of error gracefully? Thank you! ~Ed

Re: Set number Reducer per machines.

2010-10-06 Thread ed
+); Sorry, that's probably not much help to you. ~Ed On Wed, Oct 6, 2010 at 8:04 AM, Pramy Bhats pramybh...@googlemail.com wrote: Hi Ed, I was using the following file for the mapreduce job. Cloud9/src/dist/edu/umd/cloud9/example/cooccur/ComputeCooccurrenceMatrixStripes.java thanks, --Pramod On Tue

Re: Problem with LzoTokenizedLoader with elephant-bird branch for Pig 0.7

2010-09-30 Thread ed
Hello, I'm using Cloudera's Hadoop CDH3B2--Hadoop-0.20.2+320 (based on Apache Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro). Thank you! ~Ed On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai rohan@inmobi.com wrote: Hi Which Hadoop/Pig version are you using? Regards Rohan ed

Re: Read/Writing into HDFS

2010-09-30 Thread ed
+HDFS ~Ed On Thu, Sep 30, 2010 at 7:59 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, I have set up a Hadoop cluster of 10 nodes. I want to know how we can read/write a file from HDFS (simply). Yes, I know there are commands; I read through the HDFS commands. bin/hadoop
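
For reference, a minimal sketch of reading a file back out of HDFS through the FileSystem API, assuming the cluster config is on the classpath; the path is a placeholder:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Open an HDFS file and print it line by line.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(new Path("/user/demo/input.txt"))));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }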

Re: Problem with LzoTokenizedLoader with elephant-bird branch for Pig 0.7

2010-09-29 Thread ed
) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Do you think I'm forgetting some required library? Thank you! ~Ed On Tue, Sep 28, 2010 at 2:10 PM, ed hadoopn

Re: Problem with LzoTokenizedLoader with elephant-bird branch for Pig 0.7

2010-09-28 Thread ed
Thank you Rohan, I really appreciate your help! I'll give it a shot and post back if it works. ~Ed On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai rohan@inmobi.com wrote: Just corrected/tested and pushed LzoTokenizedLoader to the personal fork. Hopefully it works now. Regards Rohan

Re: How to config Map only job to read .gz input files and output result in .lzo

2010-09-28 Thread ed
to make sure hadoop sees your jar and native library) Hope that works! ~Ed On Tue, Sep 28, 2010 at 3:06 PM, Steve Kuo kuosen...@gmail.com wrote: We have TB worth of XML data in .gz format where each file is about 20 MB. This dataset is not expected to change. My goal is to write a map-only
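
For reference, a sketch of the job setup being described: a map-only job (the .gz input is decompressed transparently by the codec factory) writing LZO-compressed output. It assumes the hadoop-lzo native libraries and its com.hadoop.compression.lzo.LzopCodec are installed; everything else is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GzToLzoJob {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "gz-to-lzo");
            job.setJarByClass(GzToLzoJob.class);
            job.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job,
                    com.hadoop.compression.lzo.LzopCodec.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // .gz input
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // .lzo output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }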

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread ed
Should this be something that needs to be added? Thank you for the help! ~Ed On Mon, Sep 27, 2010 at 11:18 AM, Ted Yu yuzhih...@gmail.com wrote: The setting should be fs.inmemory.size.mb On Mon, Sep 27, 2010 at 7:15 AM, pig hadoopn...@gmail.com wrote: Hi Sriguru, Thank you for the tips

Re: Reducer-side join example

2010-04-06 Thread Ed Kohlwey
Hi, Your question has an academic sound, so I'll give it an academic answer ;). Unfortunately, there are not really any good generalized (i.e., cross join a large matrix with a large matrix) methods for doing joins in map-reduce. The fundamental reason for this is that in the general case you're

Re: question on shuffle and sort

2010-03-30 Thread Ed Mazur
. This grouping is achieved by sorting, which means you see keys in increasing order. Ed

Re: Strange behavior regarding stout,stderr,syslog

2010-03-14 Thread Ed Mazur
? With it enabled, I've observed that all tasks associated with a particular JVM go to the same log. Ed

Re: sort done parallel or after copy ?

2010-03-05 Thread Ed Mazur
merged down to (at most) n files, and a final merge goes directly into the user reduce function. Ed On Fri, Mar 5, 2010 at 12:36 AM, prasenjit mukherjee prasen@gmail.com wrote: if I understand correctly reduce has 3 stages: copy, sort, reduce. Copy happens in parallel while mappers are still

Re: Writing a simple sort application for Hadoop

2010-02-28 Thread Ed Mazur
competition might be of interest to you. Ed [1] http://sortbenchmark.org/Yahoo2009.pdf On Sun, Feb 28, 2010 at 1:53 PM, aa...@buffalo.edu wrote: Hello, I am trying to write a simple sorting application for Hadoop. This is what I have thought of till now. Suppose I have 100 lines of data

Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-24 Thread Ed Mazur
intermediate spills. To fix this, you can try tuning the per-job configurables io.sort.mb and io.sort.record.percent. Look at the counters of a few map tasks to get an idea of how much data (io.sort.mb) and how many records (io.sort.record.percent) they produce. Ed On Wed, Feb 24, 2010 at 2:45 AM
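
For reference, a minimal sketch of setting those two knobs per job; the values are illustrative and should be derived from your own map tasks' counters:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SpillTuning {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("io.sort.mb", 200);                  // map-side sort buffer size in MB
            conf.setFloat("io.sort.record.percent", 0.10f);  // fraction of the buffer for record metadata
            Job job = new Job(conf, "spill-tuning-example");
            // ... configure and submit the job as usual
        }
    }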

Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Ed Mazur
on the reduce side too during the shuffle and multi-pass merge. Ed 2010/2/23 Tim Kiefer tim-kie...@gmx.de: Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring

Re: Strange behaviour from a custom Writable

2010-02-08 Thread Ed Mazur
they will all be references to the last item from the iterator. Ed On Mon, Feb 8, 2010 at 12:23 PM, James Hammerton james.hammer...@mendeley.com wrote: Hi, For a particular project I created a writable for holding a long and a double called LongDoublePair. My mapper outputs LongDoublePair values
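
For reference, a minimal sketch of the standard workaround: deep-copy each value before storing it, since Hadoop reuses one Writable instance across the reduce iterator. LongWritable stands in for the poster's LongDoublePair to keep the sketch self-contained:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableUtils;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CopyingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) {
            List<LongWritable> kept = new ArrayList<LongWritable>();
            for (LongWritable v : values) {
                // Storing 'v' directly would leave N references to one reused
                // object; clone() serializes and deserializes a fresh copy.
                kept.add(WritableUtils.clone(v, context.getConfiguration()));
            }
            // ... use 'kept' safely here
        }
    }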

Re: Strange behaviour from a custom Writable

2010-02-08 Thread Ed Mazur
org.apache.hadoop.examples.SecondarySort for a nice example. This lets Hadoop internals do some of the heavy lifting and removes the requirement that all values for a key fit in memory (though I guess if you only care about the top 20, your space requirement is still O(1)). Ed On Mon, Feb 8, 2010 at 5:58 PM, James Hammerton

Re: hadoop under cygwin issue

2010-02-03 Thread Ed Mazur
some light on why your first attempt failed. /user/brian/input should be a directory with several xml files. Ed On Wed, Feb 3, 2010 at 5:17 PM, Brian Wolf brw...@gmail.com wrote: Alex Kozlov wrote: Live Nodes (http://localhost:50070/dfshealth.jsp#LiveNodes): 0. Your datanode is dead

Re: Failed to install Hadoop on WinXP

2010-01-27 Thread Ed Mazur
I tried running 0.20.0 on XP too a few weeks ago and got stuck at the same spot. No problems with standalone mode. Any insight would be appreciated, thanks. Ed On Wed, Jan 27, 2010 at 11:41 AM, Yura Taras yura.ta...@gmail.com wrote: Hi all, I'm trying to deploy a pseudo-distributed cluster on my

Re: do all mappers finish before reducer starts

2010-01-26 Thread Ed Mazur
between parts 1 and 2 as the reduce memory buffer fills up, merges, and spills to disk. There is also overlap between parts 2 and 3 because the final merge is fed directly into the user reduce function to minimize the amount of data written to disk. Ed On Tue, Jan 26, 2010 at 5:27 PM, adeelmahmood

Avoiding value buffering in reduce

2010-01-16 Thread Ed Mazur
back into key/value pairs, unlike the (memory-consuming) ArrayWritable approach. Ed

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-13 Thread Ed Mazur
of something like <Text, IntArrayWritable>, is there a way to build and output the id array without buffering values? The only alternative I see is to instead use <Text, IntWritable> and repeat the term for every doc id, but this seems wasteful. Ed

Re: Questions about dfs and MapRed in the Hadoop.

2010-01-05 Thread Ed Mazur
that the underlying structure of an HDFS file is a collection of large blocks (64MB default) and that it is these blocks that are replicated. Ed

Re: Help on processing large amount of videos on hadoop

2009-12-22 Thread Ed Kohlwey
Hi Huazhong, Sounds like an interesting application. Here's a few tips. 1. If the frames are not independent, you should find a way to key them according to their order before dumping them in Hadoop so that they can be sorted as part of your map reduce task. BTW, the video won't appear split

Re: Can hadoop 0.20.1 programs runs on Amazon Elastic Mapreduce?

2009-12-16 Thread Ed Kohlwey
Last time I checked, EMR only runs 0.18.3. You can use EC2, though, which winds up being cheaper anyway. On Wed, Dec 16, 2009 at 8:51 PM, 松柳 lamfeeli...@gmail.com wrote: Hi all, I'm wondering whether Amazon has started to support the newest stable version of Hadoop, or whether we can still just use 0.18.3?

Re: multiple file input

2009-12-08 Thread Ed Kohlwey
One important thing to note is that, with cross products, you'll almost always get better performance if you can fit both files on a single node's disk rather than distributing the files. On Tue, Dec 8, 2009 at 9:18 AM, laser08150815 la...@laserxyz.de wrote: pmg wrote: I am evaluating

Re: RE: Using Hadoop in non-typical large scale user-driven environment

2009-12-02 Thread Ed Kohlwey
As far as replication goes, you should look at a project called Pastry. Apparently some people have used Hadoop MapReduce on top of it. You will need to be clever, however, in how you do your MapReduce, because you probably won't want the job to eat all the users' CPU time. On Dec 2, 2009 5:11 PM,

Re: New graphic interface for Hadoop - Contains: FileManager, Daemon Admin, Quick Stream Job Setup, etc

2009-11-18 Thread Ed Kohlwey
The tool looks interesting. You should consider providing the source for it. Is it written in a language that can run on platforms besides Windows? On Nov 17, 2009 10:40 AM, Cubic cubicdes...@gmail.com wrote: Hi list. This tool is a graphic interface for Hadoop. It may improve your productivity

Re: About Distribute Cache

2009-11-15 Thread Ed Kohlwey
Hi, What you can fit in the distributed cache generally depends on the available disk space on your nodes. With most clusters, 300 MB will not be a problem, but it depends on the cluster and the workload you're processing. On Sat, Nov 14, 2009 at 10:34 PM, 于凤东 fengdon...@gmail.com wrote: I have a
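
For reference, a minimal sketch of shipping such a file via the 0.20-era DistributedCache API; the HDFS path is a placeholder:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The file must already be in HDFS; each task node pulls a local copy.
            DistributedCache.addCacheFile(new URI("/user/demo/lookup.dat"), conf);
            // ... pass conf to the Job so tasks can read the cached file locally
        }
    }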