Re: hadoop ecosystem

2012-01-28 Thread Ted Yu
Same with Solr and Lily. On Sat, Jan 28, 2012 at 8:09 AM, Ted Yu wrote: > I think Bookkeeper should be included as well. > > > On Sat, Jan 28, 2012 at 7:59 AM, Ayad Al-Qershi wrote: > >> I'm compiling a list of all Hadoop ecosystem/sub projects ordered >> alphab

Re: hadoop ecosystem

2012-01-28 Thread Ted Yu
I think Bookkeeper should be included as well. On Sat, Jan 28, 2012 at 7:59 AM, Ayad Al-Qershi wrote: > I'm compiling a list of all Hadoop ecosystem/sub projects ordered > alphabetically and I need your help if I missed something. > >1. Ambari >2. Avro >3. Cascading >4. Cascalog

Re: HBase support in SHDP Was: Hadoop with Spring / Guice

2011-12-30 Thread Ted Yu
ep the feedback coming, > > P.S. just want to point out there's a dedicated forum for SHDP, if the > discussion becomes too Spring specific. > > [1] http://forum.springsource.org/forumdisplay.php?80-NoSQL > > On 12/30/2011 12:14 PM, Ted Yu wrote: > > Hi, Costin:

HBase support in SHDP Was: Hadoop with Spring / Guice

2011-12-30 Thread Ted Yu
Hi, Costin: I work on HBase. I went over http://static.springsource.org/spring-hadoop/docs/current/reference/hbase.html but didn't have time to download the source code. Is there a typo: 'does more then easily'? Should 'then' be 'than'? For the following config: May I ask what would the proxies

Re: what is mapred.reduce.parallel.copies?

2011-06-28 Thread Ted Yu
Which hadoop version are you using ? If it is 0.20.2, mapred.reduce.parallel.copies is the number of copying threads in ReduceTask. In the scenario you described, at least 2 concurrent connections to a single node would be made. I am not familiar with newer versions of hadoop. On Tue, Jun 28, 201
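
For illustration, a minimal sketch of overriding this setting per job (0.20-era mapred API assumed; the class name and the value 10 are only illustrative, the 0.20.2 default is 5):

    import org.apache.hadoop.mapred.JobConf;

    public class ParallelCopiesConfig {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Number of copier threads each reduce task uses to fetch map output;
        // raising it increases the number of concurrent connections per node.
        conf.setInt("mapred.reduce.parallel.copies", 10);
      }
    }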

Re: Stupid questions about combiners in ...hadoop.mapreduce

2011-05-23 Thread Ted Yu
Questions 2 and 3 can be answered relatively easily: Remember, the output of the combiner is going to be consumed by the reducer. So the output key/value classes of the combiner have to align with the input key/value classes of the reducer. On Mon, May 23, 2011 at 11:32 AM, Mike Spreitzer wrote:
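
To make the alignment concrete, here is a hedged word-count-style sketch using the org.apache.hadoop.mapreduce API; class and job names are illustrative. The combiner is just a Reducer whose output key/value classes (Text/IntWritable) match the real reducer's input classes:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CombinerAlignment {
      // Mapper emits Text/IntWritable pairs.
      public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
          }
        }
      }

      // Used as both combiner and reducer: its input (Text/IntWritable) matches the
      // mapper output, and its output (Text/IntWritable) matches the reducer input.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "combiner-alignment");
        job.setJarByClass(CombinerAlignment.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);  // combiner output == reducer input
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // add input/output paths, then job.waitForCompletion(true)
      }
    }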

Re: mapred.min.split.size

2011-03-18 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/O7sT4278lbG/but+it+seems+a+trade+off+with+the+number+of+files+that+have+to+be+shuffled+for+the&subj=RE+HDFS+block+size+v+s+mapred+min+split+size On Fri, Mar 18, 2011 at 12:54 PM, Pedro Costa wrote: > Hi > > What's the purpose of the parameter "mapred.min.

Re: Sequence File usage queries

2011-02-23 Thread Ted Yu
know it: > i) Is there any hadoop command that can do it ? > ii) Or we will have to provide some interface to the user to see the > metadata ? > > -JJ > > On Sat, Feb 19, 2011 at 9:17 AM, Ted Yu wrote: > >> Option 2 is better. >> Please see this in Sequen

Re: Sequence File usage queries

2011-02-19 Thread Ted Yu
Option 2 is better. Please see this in SequenceFile: public static Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, CompressionType compressio
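
A hedged sketch of option 2, storing user metadata in the SequenceFile header at write time and reading it back with Reader.getMetadata(); the path, metadata key, and key/value classes are only illustrative (0.20-era API assumed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.DefaultCodec;

    public class SeqFileMetadataDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.seq");  // illustrative path

        // Attach user metadata to the file header at creation time.
        SequenceFile.Metadata meta = new SequenceFile.Metadata();
        meta.set(new Text("schema.version"), new Text("1"));

        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, Text.class, IntWritable.class,
            conf.getInt("io.file.buffer.size", 4096),
            fs.getDefaultReplication(), fs.getDefaultBlockSize(),
            SequenceFile.CompressionType.NONE, new DefaultCodec(), null, meta);
        writer.append(new Text("key"), new IntWritable(1));
        writer.close();

        // Read the metadata back without scanning the records.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        Text version = reader.getMetadata().get(new Text("schema.version"));
        System.out.println("schema.version = " + version);
        reader.close();
      }
    }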

Re: Passing messages

2010-12-18 Thread Ted Yu
In your reducer, you can utilize Reporter (getCounter and incrCounter methods) to pass this information between reducers. On Sat, Dec 18, 2010 at 8:04 AM, Martin Becker <_martinbec...@web.de> wrote: > Hello everybody, > > I am wondering if there is a feature allowing (in my case) reduce > tasks to
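
A hedged sketch of that pattern with the old mapred API: each reduce task increments a shared counter through Reporter, and the driver reads the aggregated value once the job completes (the enum, class names, and reducer logic are illustrative):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterPassing {
      // Logical counter shared by all reduce tasks; the framework aggregates it.
      public enum ReduceStats { INTERESTING_KEYS }

      public static class MyReducer extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          reporter.incrCounter(ReduceStats.INTERESTING_KEYS, 1);
          output.collect(key, values.next());
        }
      }

      // Driver side: after the job finishes, read the aggregated total.
      public static void runAndPrint(JobConf conf) throws IOException {
        RunningJob job = JobClient.runJob(conf);
        long total = job.getCounters().getCounter(ReduceStats.INTERESTING_KEYS);
        System.out.println("interesting keys: " + total);
      }
    }

Note that counters are aggregated by the framework and are reliably visible only after the job completes; they are not a channel for live communication between running reduce tasks.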

Re: object Writable and Serialization

2010-12-10 Thread Ted Yu
Old bits: Can you try adding 'org.apache.hadoop.io.serializer.JavaSerialization,' to the following config ? "C:\hadoop-0.20.2\src\core\core-default.xml"(87,9): io.serializations By default, only org.apache.hadoop.io.serializer.WritableSerialization is included. On Fri, Dec 10, 2010 at 7:22 A
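
A hedged sketch of making that change per job instead of editing core-default.xml; WritableSerialization stays listed so ordinary Writable keys and values keep working:

    import org.apache.hadoop.conf.Configuration;

    public class EnableJavaSerialization {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Append JavaSerialization to the serializer list used by the framework.
        conf.set("io.serializations",
            "org.apache.hadoop.io.serializer.WritableSerialization,"
            + "org.apache.hadoop.io.serializer.JavaSerialization");
      }
    }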

Re: How to share Same Counter in Multiple Jobs?

2010-12-09 Thread Ted Yu
I wrote the following code today. We have our own flow execution logic which calls the following to collect counters. enum COUNT_COLLECTION { LOG,// log the counters ADD_TO_CONF// add counters to JobConf } protected static void collectCounters(Runni

Re: Memory Manager in Hadoop MR

2010-12-09 Thread Ted Yu
For 1, TMMT uses ProcessTree to check for tasks that run beyond their memory limits and kills them. On Thu, Dec 9, 2010 at 3:05 AM, Pedro Costa wrote: > Hi, > > 1 - Hadoop MR contains a TaskMemoryManagerThread class that is used to > manage memory usage of tasks running under a TaskTracker. Why Ha

Re: task statistics

2010-12-06 Thread Ted Yu
We use the following in our client: JobClient jobClient = new JobClient(jobConf); TaskReport[] mapTaskReports = jobClient.getMapTaskReports(job.getID()); if (mapTaskReports != null) { for (TaskReport tr : mapTaskReports) { TIPStatus tips = tr.getC
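
A hedged, slightly fuller version of the same client pattern; the method name and the choice of fields to print are illustrative, and the JobConf/JobID are assumed to come from the caller:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.TIPStatus;
    import org.apache.hadoop.mapred.TaskReport;

    public class TaskStatistics {
      // Print per-map-task status and progress for a submitted job.
      public static void printMapTaskReports(JobConf jobConf, JobID jobId) throws IOException {
        JobClient jobClient = new JobClient(jobConf);
        TaskReport[] mapTaskReports = jobClient.getMapTaskReports(jobId);
        if (mapTaskReports != null) {
          for (TaskReport tr : mapTaskReports) {
            TIPStatus tips = tr.getCurrentStatus();
            System.out.println(tr.getTaskID() + " status=" + tips
                + " progress=" + tr.getProgress());
          }
        }
      }
    }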

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
You can get the source code here: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.tar.gz On Thu, Nov 18, 2010 at 4:21 PM, Srihari Anantha Padmanabhan < sriha...@yahoo-inc.com> wrote: > I am using Hadoop 0.20.2. > > On Nov 18, 2010, at 4:14 PM, Ted Yu wrote: > > > hadoop > >

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
wrong. > > I can find only the following classes under mapreduce/lib/output > FileOutputCommitter.java FileOutputFormat.java NullOutputFormat.java > SequenceFileOutputFormat.java TextOutputFormat.java > > On Nov 18, 2010, at 3:40 PM, Ted Yu wrote: > > > ked at >

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
Have you looked at src/mapred/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java ? On Thu, Nov 18, 2010 at 3:15 PM, Srihari Anantha Padmanabhan < sriha...@yahoo-inc.com> wrote: > Hi, > > I am working on migrating a mapreduce program from using > org.apache.hadoop.mapred to org.apache.had

Re: MultipleOutputFormat

2010-10-31 Thread Ted Yu
From the Java doc: * Generate the file output file name based on the given key and the leaf file * name. The default behavior is that the file name does not depend on the * key. You can extend MultipleTextOutputFormat or MultipleSequenceFileOutputFormat to override the default naming. On Su
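
A hedged sketch of that override with the old mapred API; the class name and the key-based directory layout are illustrative:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Routes each record to an output file derived from its key, e.g. "US/part-00000".
    public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // 'name' is the default leaf file name (e.g. part-00000); prefix it with the key.
        return key.toString() + "/" + name;
      }
    }

Register it in the driver with jobConf.setOutputFormat(KeyBasedOutputFormat.class).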

Re: How to modify heartbeat message and its return message?

2010-10-28 Thread Ted Yu
Take a look at FSNamesystem.heartbeatCheck() On Wed, Oct 27, 2010 at 5:21 PM, Shen LI wrote: > Hi, thank you very much for your reply. > > I want to modify 0.20.2 > > > On Wed, Oct 27, 2010 at 7:12 PM, Ted Yu wrote: > >> Which hadoop version do you want to modif

Re: How to modify heartbeat message and its return message?

2010-10-27 Thread Ted Yu
Which hadoop version do you want to modify ? On Wed, Oct 27, 2010 at 2:28 PM, Shen LI wrote: > Hi, > > I want to modify the heartbeat message to carry more information from > worker nodes to master node, and also want to modify the return message of > heartbeat. Do you know which file and functi

Re: ClassCastException

2010-10-07 Thread Ted Yu
Have you checked http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLEventReader.html? On Thu, Oct 7, 2010 at 5:49 PM, Johannes.Lichtenberger < johannes.lichtenber...@uni-konstanz.de> wrote: > On 10/08/2010 02:38 AM, Johannes.Lichtenberger wrote: > > On 10/08/2010 0

Re: how to set system properties for mapper/reducer?

2010-10-07 Thread Ted Yu
, 2010 at 5:12 PM, Yin Lou wrote: > I don't understand. I want to pass something like > java.library.path="/home/.../libXXX.so" so that every mapper can load that > lib and use the native code. > > Could you give me an example? > > Thanks! > Yin > > &g

Re: ClassCastException

2010-10-07 Thread Ted Yu
< johannes.lichtenber...@uni-konstanz.de> wrote: > On 10/08/2010 12:01 AM, Ted Yu wrote: > > http://www.ibm.com/developerworks/xml/library/x-stax2.html > > I think the problem would be how to serialize XMLEvents or more > precisely I don't know if there's an existi

Re: how to set system properties for mapper/reducer?

2010-10-07 Thread Ted Yu
Take a look at bin/hadoop: hadoop: HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH" On Thu, Oct 7, 2010 at 12:04 PM, Yin Lou wrote: > Hi, > > Is there any way to pass system properties, like java.library.path to each > mapper/reducer? > > Thanks, > Yin >
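
If the goal is to make a native library visible to each mapper/reducer child JVM, one commonly used knob is mapred.child.java.opts; a hedged sketch follows (the heap size shown is the 0.20 default, and the library path is a hypothetical example):

    import org.apache.hadoop.mapred.JobConf;

    public class ChildJvmLibraryPath {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Child task JVMs are launched with these options, so the native library
        // directory becomes visible to every mapper and reducer.
        conf.set("mapred.child.java.opts",
            "-Xmx200m -Djava.library.path=/home/user/native/lib");
      }
    }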

Re: ClassCastException

2010-10-07 Thread Ted Yu
http://www.ibm.com/developerworks/xml/library/x-stax2.html On Thu, Oct 7, 2010 at 2:54 PM, Johannes.Lichtenberger < johannes.lichtenber...@uni-konstanz.de> wrote: > On 10/07/2010 05:41 PM, Ted Yu wrote: > > Since mFormatter.format() returns a String, you don't need to i

Re: ClassCastException

2010-10-07 Thread Ted Yu
Since mFormatter.format() returns a String, you don't need to introduce a newline. You can call paramOut.writeUTF() to save the String and call paramIn.readUTF() to read it back. The value class doesn't need to implement Comparable. On Thu, Oct 7, 2010 at 6:58 AM, Johannes.Lichtenberger < johannes.
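
A hedged sketch of a value class following that advice: a plain Writable (values need not be Comparable) that stores the formatted String via writeUTF/readUTF; the class and field names are illustrative:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Serializes one formatted String; note writeUTF limits the encoded form to 64KB.
    public class FormattedEventWritable implements Writable {
      private String formatted = "";

      public void set(String value) { this.formatted = value; }
      public String get() { return formatted; }

      public void write(DataOutput out) throws IOException {
        out.writeUTF(formatted);   // no explicit newline needed
      }

      public void readFields(DataInput in) throws IOException {
        formatted = in.readUTF();  // reads back exactly what writeUTF wrote
      }
    }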

Re: MiniMRCluster not found

2010-09-26 Thread Ted Yu
The source code is located at hadoop-0.20.2+320/src/test/org/apache/hadoop/mapred/MiniMRCluster.java On Sun, Sep 26, 2010 at 2:23 PM, Johannes.Lichtenberger < johannes.lichtenber...@uni-konstanz.de> wrote: > Hello, > > I'm currently trying to write testcases for my mapreduce application and > I'

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
t; otherwise it will say bash: hadoop: command not found. > > Thanks again :) for your time. > > Mario Maqueo > ITESM-CEM > > > > PS: "El sistema no puede encontrar la ruta especificada" = "The system > can't find the specified route" In case the span

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
ection> " didn't work. >> >> I am not very experienced with ssh, so I am sorry if this is basic stuff. >> >> Thanks, >> >> Mario Maqueo >> ITESM-CEM >> >> 2010/9/25 Ted Yu >> >> Mario: >>> Please produce a jar, place

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
Mario: Please produce a jar, place it on one of the servers in the cloud and run from there. On Sat, Sep 25, 2010 at 7:46 AM, Raja Thiruvathuru wrote: > MapReduce doesn't download the actual data, but it reads meta-data before > it starts MapReduce job > > > On Sat, Sep 25, 2010 at 7:55 AM, Mario

Re: custom task cleanup even when task is killed?

2010-09-13 Thread Ted Yu
Since I don't know what cleanup you're doing, just a reminder that you may run into: https://issues.apache.org/jira/browse/HADOOP-4829 On Mon, Sep 13, 2010 at 1:43 PM, Chase Bradford wrote: > Not yet :) I forgot about that one. > > Thanks Ted. > > On Mon, Sep 13, 2010

Re: custom task cleanup even when task is killed?

2010-09-13 Thread Ted Yu
Have you tried this ? Runtime.getRuntime().addShutdownHook(new ShutdownThread()); On Mon, Sep 13, 2010 at 1:33 PM, Chase Bradford wrote: > Thanks David, > > Unfortunately, that's only called when a task finishes consuming input > successfully. My issue deals with tasks that are killed (
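
A hedged sketch of that suggestion, registered from a task's setup/configure method; the scratch-directory cleanup is a hypothetical example. A shutdown hook runs when the child JVM exits normally or receives SIGTERM, but not if the process is killed with SIGKILL:

    import java.io.File;

    public class TaskCleanupHook {
      public static void register(final File scratchDir) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
          @Override
          public void run() {
            // Best-effort cleanup of task-local scratch data.
            File[] files = scratchDir.listFiles();
            if (files != null) {
              for (File f : files) {
                f.delete();
              }
            }
          }
        });
      }
    }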

Re: How to use distributed cache api

2010-09-07 Thread Ted Yu
Consider using the following to retrieve the file: Path[] cacheFiles = DistributedCache.getFileClassPaths(conf); BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString())); On Tue, Sep 7, 2010 at 6:02 AM, Cristi Cioriia < cristian-andrei.cior...@1and1.
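
Note that getFileClassPaths() returns files added with addFileToClassPath(); for files registered with addCacheFile(), the usual counterpart is getLocalCacheFiles(). A hedged sketch of that pairing, with a hypothetical HDFS path:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheFileJoin {
      // Driver side: register an HDFS file with the distributed cache.
      public static void addLookupFile(JobConf conf) throws IOException {
        DistributedCache.addCacheFile(URI.create("/user/demo/lookup.txt"), conf);
      }

      // Task side (e.g. in Mapper.configure): open the local copy of the cached file.
      public static BufferedReader openLookupFile(JobConf conf) throws IOException {
        Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
        return new BufferedReader(new FileReader(cacheFiles[0].toString()));
      }
    }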

Re: Read After Write Consistency in HDFS

2010-09-02 Thread Ted Yu
One possibility, given the asynchronous nature of your loader, is that the consumer job started before all files from the loader were completely written (propagated). Can you describe what problem you encountered with OutputCollector? On Thu, Sep 2, 2010 at 10:35 AM, Elton Pinto wrote: > Hello,

Re: Is continuous map reduce supported

2010-09-01 Thread Ted Yu
Try this: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/ChainReducer.html On Wed, Sep 1, 2010 at 8:33 PM, Lance Norskog wrote: > Dead link: > > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html > > On Wed, Sep 1, 201

Re: How to debug ReduceTask?

2010-08-28 Thread Ted Yu
Have you tried http://www.karmasphere.com/ ? On Sat, Aug 28, 2010 at 6:24 PM, Pedro Costa wrote: > Hi, I would like to debug the shuffle phase from a Reducer, but I > can't because the Reducer starts as a new process. I've tried all the > options that some pages says, but it doesn't work in the

Re: Null mapper?

2010-08-16 Thread Ted Yu
You're right. You need to specify a mapper. On Mon, Aug 16, 2010 at 3:21 PM, David Rosenstrauch wrote: > On 08/16/2010 05:48 PM, Ted Yu wrote: > >> No. >> >> On Mon, Aug 16, 2010 at 1:25 PM, David Rosenstrauch> >wrote: >> >> Is it
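
A hedged sketch of that answer with the new API: null is not accepted, but the base org.apache.hadoop.mapreduce.Mapper already passes records through unchanged, so it can serve as the identity mapper (the job name and key/value classes are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class IdentityMapperJob {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "identity-map");
        // The plain Mapper base class emits every input key/value unchanged.
        job.setMapperClass(Mapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // reducer, input/output formats and paths omitted
      }
    }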

Re: Null mapper?

2010-08-16 Thread Ted Yu
No. On Mon, Aug 16, 2010 at 1:25 PM, David Rosenstrauch wrote: > Is it possible for a M/R job to have no mapper? i.e.: > job.setMapperClass(null)? Or is it required that one at least use an > "identity mapper" (i.e., plain vanilla org.apache.hadoop.mapreduce.Mapper)? > > Thanks, > > DR >

Re: How to work around MAPREDUCE-1700

2010-08-12 Thread Ted Yu
Hopefully Cloudera will make a build for your needs. Before that happens, you can produce your own installation. We do this to produce HBase 0.20.6 + HBASE-2473 Cheers On Thu, Aug 12, 2010 at 4:21 PM, David Rosenstrauch wrote: > On 08/12/2010 07:02 PM, Ted Yu wrote: > >> How a

Re: How to work around MAPREDUCE-1700

2010-08-12 Thread Ted Yu
How about hack #3: maintain your installation of hadoop where you replace jackson jar with v1.5.4 jar ? On Thu, Aug 12, 2010 at 3:40 PM, David Rosenstrauch wrote: > Anyone have any ideas how I might be able to work around > https://issues.apache.org/jira/browse/MAPREDUCE-1700 ? It's quite a > th

Re: block errors

2010-07-24 Thread Ted Yu
Check the datanode log on 10.15.46.73. You should increase dfs.datanode.max.xcievers. On Tue, Jul 13, 2010 at 3:57 AM, Some Body wrote: > Hi All, > > I had a MR job that processed 2000 small (<3MB ea.) files and it took 40 > minutes on 8 nodes. > Since the files are sm

Re: INFO: Task Id : attempt_201007191410_0002_m_000000_0, Status : FAILED

2010-07-20 Thread Ted Yu
What hadoop version are you using ? I guess you haven't specified io.serializations in your hadoop conf. Then by default your class should implement org.apache.hadoop.io.Writable. On Tue, Jul 20, 2010 at 6:01 AM, Khaled BEN BAHRI < khaled.ben_ba...@it-sudparis.eu> wrote: > hi > > When i wrote a ma

Re: Error: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get

2010-07-18 Thread Ted Yu
line: List loc = mapLocations.get(host); If so, you can print the value of host, or the value of the URI u from which host is derived. Cheers On Fri, Jul 16, 2010 at 9:42 AM, Chinni, Ravi wrote: > Hadoop version: 0.20.2 > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] >

Re: Error: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get

2010-07-16 Thread Ted Yu
What version of hadoop are you using ? On Fri, Jul 16, 2010 at 8:56 AM, Chinni, Ravi wrote: > I am trying to run the terasort example with a small input on a 4 node > cluster. I just did the minimal configuration (fs.default.name, master, > slaves etc.), but did not do anything specific to ter

Re: specify different number of mapper tasks for different machines

2010-07-14 Thread Ted Yu
hadoop-daemon.sh also needs to be modified - it would wipe your custom config files: if [ "$HADOOP_MASTER" != "" ]; then echo rsync from $HADOOP_MASTER rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME" fi On

Re: Fixing a failed reduce task

2010-07-13 Thread Ted Yu
; > On Tue, Jul 13, 2010 at 4:51 PM, Ted Yu wrote: > >> A general solution for OOME is to reduce the size of input to (reduce) >> task so that each (reduce) task consumes less memory. >> >> >> On Tue, Jul 13, 2010 at 10:16 AM, Steve Lewis wrote: >> >>&g

Re: Fixing a failed reduce task

2010-07-13 Thread Ted Yu
A general solution for OOME is to reduce the size of the input to each (reduce) task so that each (reduce) task consumes less memory. On Tue, Jul 13, 2010 at 10:16 AM, Steve Lewis wrote: > I am running a map reduce job where a few reduce tasks fail with an out of > memory error - > Increasing the memory i

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-09 Thread Ted Yu
owever, when I use lzop for intermediate compression I > am still having trouble - the reduce phase now freezes at 99% and > eventually fails. > No immediate problem, because I can use the default codec. > But may be of concern to someone else. > > Thanks > > On Fri, Jul 9,

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-08 Thread Ted Yu
e version referenced on the wiki: > http://wiki.apache.org/hadoop/UsingLzoCompression > > I will try the latest version and see if that fixes the problem. > http://github.com/kevinweil/hadoop-lzo > > Thanks > > On Fri, Jul 9, 2010 at 3:22 AM, Todd Lipcon wrote: > > On Thu,

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-08 Thread Ted Yu
Todd fixed a bug where LZO header or block header data may fall on a read boundary: http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58 I am wondering if that is related to the issue you saw. On Wed, Jul 7, 2010 at 11:49 PM, bmdevelopment wrote: > A little more

Re: naming the output fle of reduce to the partition number

2010-07-08 Thread Ted Yu
Please take a look at getUniqueName() method of src/mapred/org/apache/hadoop/mapred/FileOutputFormat.java It retrieves "mapred.task.partition" On Thu, Jul 8, 2010 at 2:13 AM, Denim Live wrote: > Hi Everyone, > I am having some problem with naming the output file of each reduce task > with the pa
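
A hedged sketch of reading that property from within a task and building a file name from it, mirroring what getUniqueName() does; the prefix and digit formatting are illustrative:

    import java.text.NumberFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class PartitionNaming {
      // Builds a name like "result-00007" from the task's partition number.
      public static String partitionedName(JobConf conf, String prefix) {
        int partition = conf.getInt("mapred.task.partition", -1);
        NumberFormat fmt = NumberFormat.getInstance();
        fmt.setMinimumIntegerDigits(5);
        fmt.setGroupingUsed(false);
        return prefix + "-" + fmt.format(partition);
      }
    }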

Re: SerializationFactory NullPointerException

2010-06-30 Thread Ted Yu
You should add this: job.setInputFormatClass(TextInputFormat.class); And your TokenizerMapper should extend Mapper<...>. ... wrote: > Hi, > > My input looks like (userid, itemid) as follows: > ... > 122641863,5060057723326 > 123441107,9789020282948 > ... > > I tried to write a MapReduce Job with Mapper Int
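
A hedged sketch of a mapper for that "userid,itemid" input when TextInputFormat is set: the input key is the byte offset (LongWritable) and the value is the line; the output types (Text/Text) and class name are only an illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Parses lines of the form "userid,itemid" delivered by TextInputFormat.
    public class UserItemMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final Text userId = new Text();
      private final Text itemId = new Text();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        if (fields.length == 2) {          // skip malformed lines
          userId.set(fields[0]);
          itemId.set(fields[1]);
          context.write(userId, itemId);
        }
      }
    }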

Re: Need help with exception when mapper emits different key class from reducer

2010-06-19 Thread Ted Yu
There is no need to call job.setCombinerClass(); the combiner is optional. On Sat, Jun 19, 2010 at 10:01 AM, Steve Lewis wrote: > Wow - I cannot tell you how much I thank you - I totally missed the fact > that the exception is thrown in the combiner since I was seeing the > exception in the reducer

Re: Multithreaded Mapper and Map runner

2010-06-16 Thread Ted Yu
If only one thread is created to run the mapper/reducer, how would mapred.child.java.opts be effective? Please refer to src/mapred/org/apache/hadoop/mapred/TaskRunner.java, which is not very long. On Wed, Jun 16, 2010 at 9:10 PM, Jyothish Soman wrote: > > I have another doubt, for cross checking. The nu

Re: Out of Memory during Reduce Merge

2010-06-14 Thread Ted Yu
local/hadoop/hprof, but it seems the processes are not > dumping regardless, not sure why. Any ideas on why? I'm going to try more > general heap dump requirements, and see if I can get some logs. > > - Ruben > > > > > -- > *From:* Ted

Re: Out of Memory during Reduce Merge

2010-06-13 Thread Ted Yu
rkarounds you can think of? Is there > anything else I can send your way to give you more information as to what is > happening? > > Thanks, > > - Ruben > > > ------ > *From:* Ted Yu > *To:* mapreduce-user@hadoop.apache.org > *Sent:* S

Re: Out of Memory during Reduce Merge

2010-06-13 Thread Ted Yu
From the stack trace, they are two different problems. MAPREDUCE-1182 didn't solve all issues w.r.t. shuffling. See the 'Shuffle In Memory OutOfMemoryError' discussion, where OOME was reported even when MAPREDUCE-1182

Re: Setting a custom splitter and/or tracking files split?

2010-06-12 Thread Ted Yu
Try this: FileSplit fileSplit = (FileSplit) context.getInputSplit(); String sFileName = fileSplit.getPath().getName(); On Fri, Jun 11, 2010 at 9:14 AM, Steve Lewis wrote: > I am running into an issue that the splitter is reading the wrong file and > causing my program to fail - > I cannot find w
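
A hedged sketch placing that snippet in a new-API mapper's setup(), so each map task knows which input file its split came from; the output behavior (tagging each record with its source file) is illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileNameAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
      private String fileName;

      @Override
      protected void setup(Context context) {
        // Works when the input format produces FileSplits (e.g. TextInputFormat).
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        fileName = fileSplit.getPath().getName();
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        context.write(new Text(fileName), value);  // tag each record with its source file
      }
    }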

Re: Split files, index files and input files

2010-06-09 Thread Ted Yu
For 1, see http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html For 2, see http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputSplit.html On Wed, Jun 9, 2010 at 9:36 AM, psdc1978 wrote: > Hi, > > I'm facing difficulty in understanding all

Re: TaskTracker vs TaskInProgress

2010-06-09 Thread Ted Yu
Please refer to http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapred/TaskInProgress.html See method *getTaskToRun*(String taskTracker) On Wed, Jun 9, 2010 at 9:30 AM, psdc1978 wrote: > Hi

Re: almost sorted map output

2010-05-30 Thread Ted Yu
Check out https://issues.apache.org/jira/browse/HADOOP-3442 and https://issues.apache.org/jira/browse/HADOOP-3308 On Fri, May 28, 2010 at 10:58 PM, juber patel wrote: > Hello, > > Can Hadoop take advantage of the fact that the output of each map task > is almost sorted? > > On a related note, D

Re: Incorrect dates on job tracker history

2010-05-17 Thread Ted Yu
I didn't find an existing JIRA so I logged MAPREDUCE-1796 On Mon, May 17, 2010 at 6:44 PM, Hemanth Yamijala wrote: > James, > > > According to the job tracker history viewer on our Hadoop setup, all of > our > > recent jobs were run on Wed May 05 15:58:58 UTC 2010. When you click on > the > > ind

Re: Incorrect dates on job tracker history

2010-05-17 Thread Ted Yu
Which version of hadoop are you using ? Thanks On Monday, May 17, 2010, James Hammerton wrote: > Hi, > > According to the job tracker history viewer on our Hadoop setup, all of our > recent jobs were run on Wed May 05 15:58:58 UTC 2010. When you click on the > individual jobs you get the corre