Re: Child Error

2013-05-28 Thread Jim Twensky
Sometimes working, sometimes failing? Also, can you clear your tmp directory and make sure you have enough space in it before you retry? JM 2013/5/27 Jim Twensky jim.twen...@gmail.com Hi Jean, I switched to Oracle JDK 1.6 as you suggested and ran a job successfully this afternoon which lasted

Re: Child Error

2013-05-25 Thread Jim Twensky
2013/5/24 Jim Twensky jim.twen...@gmail.com Hi again, in addition to my previous post, I was able to get some error logs from the task tracker/data node this morning, and it looks like it might be a Jetty issue: 2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve

Re: Child Error

2013-05-24 Thread Jim Twensky
/browse/MAPREDUCE-2389 If so, how do I downgrade my Jetty version? Should I just replace the Jetty jar file in the lib directory with an earlier version and restart my cluster? Thank you. On Thu, May 23, 2013 at 7:14 PM, Jim Twensky jim.twen...@gmail.com wrote: Hello, I have a 20 node Hadoop

Child Error

2013-05-23 Thread Jim Twensky
Hello, I have a 20 node Hadoop cluster where each node has 8GB memory and an 8-core processor. I sometimes get the following error on a random basis: --- Exception in thread main

Re: Question on HDFS_BYTES_READ and HDFS_BYTES_WRITTEN

2013-05-17 Thread Jim Twensky
better to look at map/reduce input/output bytes form of counters instead. On Tue, May 14, 2013 at 10:41 PM, Jim Twensky jim.twen...@gmail.com wrote: I have an iterative MapReduce job that I run over 35 GB of data repeatedly. The output of the first job is the input to the second one

Question on HDFS_BYTES_READ and HDFS_BYTES_WRITTEN

2013-05-14 Thread Jim Twensky
I have an iterative MapReduce job that I run over 35 GB of data repeatedly. The output of the first job is the input to the second one and it goes on like that until convergence. I am seeing a strange behavior with the program run time. The first iteration takes 4 minutes to run and here is how

Re: Wrapping around BitSet with the Writable interface

2013-05-13 Thread Jim Twensky
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29 Regards Bertrand On Sun, May 12, 2013 at 8:24 PM, Jim Twensky jim.twen...@gmail.com wrote: I have large java.util.BitSet objects that I want to bitwise-OR using a MapReduce job. I decided to wrap around each object
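
A minimal sketch of such a wrapper, following the linked docs and assuming Java 7's BitSet.valueOf(byte[]) and toByteArray(); the class name is hypothetical:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

// Hypothetical wrapper: serializes the BitSet as a length-prefixed byte array.
public class BitSetWritable implements Writable {
  private BitSet bits = new BitSet();

  public BitSet get() { return bits; }

  // Bitwise-OR another wrapper into this one, e.g. inside a reducer.
  public void or(BitSetWritable other) { bits.or(other.bits); }

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] bytes = bits.toByteArray(); // Java 7+
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    bits = BitSet.valueOf(bytes); // Java 7+
  }
}
```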

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-08 Thread Jim Twensky
1. http://hama.apache.org 2. http://wiki.apache.org/hama/Benchmarks On Sat, Oct 6, 2012 at 1:31 AM, Jim Twensky jim.twen...@gmail.com wrote: Hi, I have a complex Hadoop job that iterates over large graph data multiple times until some convergence condition is met. I know that the map output goes

Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
Hi, I have a complex Hadoop job that iterates over large graph data multiple times until some convergence condition is met. I know that the map output goes to the local disk of each particular mapper first, and then fetched by the reducers before the reduce tasks start. I can see that this is an
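
A common driver-side pattern for this kind of iterate-until-convergence job, sketched here with hypothetical paths and counter names against the Hadoop 2 mapreduce API, is to chain full jobs in a loop and stop when a user counter reports no change:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("/graph/iter0"); // hypothetical seed input
    for (int i = 1; ; i++) {
      Path output = new Path("/graph/iter" + i);
      Job job = Job.getInstance(conf, "graph-iteration-" + i);
      // ... setJarByClass, mapper/reducer, key/value classes go here ...
      FileInputFormat.setInputPaths(job, input);
      FileOutputFormat.setOutputPath(job, output);
      if (!job.waitForCompletion(true)) break;
      // A user counter the reducer increments whenever a vertex changes.
      long changed = job.getCounters()
          .findCounter("convergence", "CHANGED_VERTICES").getValue();
      if (changed == 0) break;  // converged
      input = output;           // this iteration's output feeds the next
    }
  }
}
```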

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
, Oct 5, 2012 at 10:01 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi, I have a complex Hadoop job that iterates over large graph data multiple times until some convergence condition is met. I know that the map output goes to the local disk of each particular mapper first, and then fetched

Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce

2012-10-05 Thread Jim Twensky
, but not the shuffle? Or am I wrong? On Fri, Oct 5, 2012 at 11:13 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi Harsh, Yes, there is actually a hidden map stage, that generates new key,value pairs based on the last reduce output but I can create those records during the reduce step instead

FileSystem API - Moving files in HDFS

2011-05-13 Thread Jim Twensky
Hi, I'd like to move and copy files from one directory in HDFS to another one. I know there are methods in the FileSystem API that enable copying files between the local disk and HDFS, but I couldn't figure out how to do this between two paths both in HDFS. I think rename(Path src, Path dest) can
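
A minimal sketch of both operations (paths made up): FileSystem.rename() moves a file within HDFS, and FileUtil.copy() covers the copy case:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsMoveCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path("/user/jim/in/data.txt");  // hypothetical paths
    Path dst = new Path("/user/jim/out/data.txt");

    // Move: rename works across directories within the same FileSystem.
    boolean moved = fs.rename(src, dst);

    // Copy within HDFS: same FileSystem on both sides, keep the source.
    boolean copied = FileUtil.copy(fs, dst, fs,
        new Path("/user/jim/backup/data.txt"), false /* deleteSource */, conf);
    System.out.println("moved=" + moved + " copied=" + copied);
  }
}
```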

Tasktracker failing and getting black listed

2010-12-23 Thread Jim Twensky
Hi, I have a 16+1 node hadoop cluster where all tasktrackers (and datanodes) are connected to the same switch and share the exact same hardware and software configuration. When I run a hadoop job, one of the task trackers always produces one of these two errors ONLY during the reduce tasks and

Creating an RMI client inside a mapper

2010-04-29 Thread Jim Twensky
I'm trying to create an instance of an RMI client that queries a remote RMI server inside my Mapper class. My application runs smoothly without the RMI client. When I add: if (System.getSecurityManager() == null) { System.setSecurityManager(new SecurityManager()); } inside my Mapper's

Re: Questions on MultithreadedMapper

2010-04-28 Thread Jim Twensky
: Looking through MultithreadedMapRunner, map() seems to be the only method called by executorService: MultithreadedMapRunner.this.mapper.map(key, value, output, reporter); On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi, I've decided to refactor some of my

Questions on MultithreadedMapper

2010-04-27 Thread Jim Twensky
Hi, I've decided to refactor some of my Hadoop jobs and implement them using MultithreadedMapper.class but I got puzzled because of some unexpected error messages at run time. Here are some relevant settings regarding my Hadoop cluster: mapred.tasktracker.map.tasks.maximum = 1
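
For context, a minimal new-API setup sketch (MyMapper is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedSetup {
  // Hypothetical delegate; its map() must be thread-safe because several
  // threads run it concurrently against a shared record reader/writer.
  public static class MyMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> { }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "mt-example");
    job.setMapperClass(MultithreadedMapper.class);            // threaded shell
    MultithreadedMapper.setMapperClass(job, MyMapper.class);  // real work
    MultithreadedMapper.setNumberOfThreads(job, 8);
  }
}
```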

Re: I want to group similar keys in the reducer.

2010-03-15 Thread Jim Twensky
Hi Raymond, Take a look at http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setGroupingComparatorClass(java.lang.Class). I think this is what you want. Also make sure to implement a custom partitioner that only takes into account the first part of the key,
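
A sketch of such a partitioner, assuming a hypothetical composite Text key of the form primary|secondary:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Partition on the primary part of the composite key only, so every
// composite key for a given primary lands on the same reducer.
public class PrimaryPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    String primary = key.toString().split("\\|", 2)[0];
    return (primary.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```

Paired with a grouping comparator that likewise compares only the primary part, a single reduce() call then sees all the secondary values for one primary.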

Question on GroupingComparatorClass

2010-01-25 Thread Jim Twensky
Hi, I'm using a custom grouping comparator class to simulate a secondary sort on values, and I set it via Job.setGroupingComparatorClass (using Hadoop 0.20.x) inside my driver. I'm wondering if this class is also used when grouping the records in the combiner. Using a combiner greatly improves

Running Hadoop on demand

2009-12-28 Thread Jim Twensky
Hi, I'd like to get Hadoop running on a large University cluster which is used by many people to run different types of applications. We are currently using Torque to assign nodes and manage the queue. What I want to do is to enable people to request n processors, and automatically start Hadoop

Re: Running Hadoop on demand

2009-12-28 Thread Jim Twensky
http://hadoop.apache.org/common/docs/r0.20.1/hod_user_guide.html On Mon, Dec 28, 2009 at 12:14 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi, I'd like to get Hadoop running on a large University cluster which is used by many people to run different types of applications. We are currently using Torque

Creating a new configuration

2009-11-13 Thread Jim Twensky
The documentation on configuration states: Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the
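
As an illustration (the resource path is made up), the loading behavior can be controlled from code; passing false skips the default resources entirely:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();      // loads default + site resources
    Configuration bare = new Configuration(false); // defaults explicitly turned off
    bare.addResource(new Path("/etc/myapp/my-conf.xml")); // hypothetical resource
    // Later resources override earlier ones; get() can fall back to a default.
    System.out.println(conf.get("mapred.child.java.opts", "-Xmx200m"));
  }
}
```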

Re: Does hadoop delete the intermediate data

2009-08-31 Thread Jim Twensky
Hi Jeff, The problem may also be related to the large log files if you use the cluster for too many jobs. Check out your hadoop log directory and see how big it is. You can decrease the maximum size of a log file using one of the hadoop configuration files under conf. Jim On Mon, Aug 31, 2009

Re: Datanode high memory usage

2009-08-31 Thread Jim Twensky
The maximum and minimum amount of memory to be used by the task trackers can be specified inside the configuration files under conf. For instance, in order to allocate a maximum of 512 MB, you need to set: <property> <name>mapred.child.java.opts</name> <value>-Xmx512M</value> </property> Hope

Re: Improving import performance

2009-05-25 Thread Jim Twensky
are on a 2 core, you will probably have to set the CMS to incremental: -XX:+CMSIncrementalMode To prevent the CMS GC from starving out your main threads. Good luck with it! -ryan On Wed, Apr 29, 2009 at 3:33 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi, I'm doing some

Re: Performance of hbase importing

2009-04-29 Thread Jim Twensky
Hi Ryan, Have you got your new hardware? I was keeping an eye on your blog for the past few days but I haven't seen any updates there so I just decided to ask you on the list. If you have some results, would you like to give us some numbers along with hardware details? Thanks, Jim On Thu, Jan

Improving import performance

2009-04-29 Thread Jim Twensky
Hi, I'm doing some experiments to import large datasets to Hbase using a Map job. Before posting some numbers, here is a summary of my test cluster: I have 7 regionservers and 1 master. I also run HDFS datanodes and Hadoop tasktrackers on the same 7 regionservers. Similarly, I run the Hadoop

Re: Are SequenceFiles split? If so, how?

2009-04-20 Thread Jim Twensky
In addition to what Aaron mentioned, you can configure the minimum split size in hadoop-site.xml to have smaller or larger input splits depending on your application. -Jim On Mon, Apr 20, 2009 at 12:18 AM, Aaron Kimball aa...@cloudera.com wrote: Yes, there can be more than one InputSplit per
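
For instance (sizes purely illustrative), the minimum split can be raised either through the classic property or the newer mapreduce API helper:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Classic property route, as it would appear in hadoop-site.xml:
    conf.setLong("mapred.min.split.size", 128L * 1024 * 1024); // 128 MB
    // Equivalent helper in the newer mapreduce API:
    Job job = new Job(conf, "split-size-demo");
    FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
  }
}
```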

Re: getting DiskErrorException during map

2009-04-16 Thread Jim Twensky
Hadoop would be writing to /tmp. Hope this helps! Alex On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky jim.twen...@gmail.com wrote: Alex, Yes, I bounced the Hadoop daemons after I changed the configuration files. I also tried setting $HADOOP_CONF_DIR to the directory where my

Re: Hadoop basic question

2009-04-16 Thread Jim Twensky
http://wiki.apache.org/hadoop/FAQ#7 On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote: Will anyone guide me how to avoid the single point of failure of the master node. This is what I know. If the master node is down for some reason, the hadoop system is down and there is no way

Re: getting DiskErrorException during map

2009-04-15 Thread Jim Twensky
$HADOOP_CONF_DIR to the directory where hadoop-site.xml lives. For whatever reason your hadoop-site.xml (and the hadoop-default.xml you tried to change) are probably not being loaded. $HADOOP_CONF_DIR should fix this. Good luck! Alex On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky jim.twen

Re: Total number of records processed in mapper

2009-04-14 Thread Jim Twensky
Hi Andy, Take a look at this piece of code: Counters counters = job.getCounters(); counters.findCounter("org.apache.hadoop.mapred.Task$Counter", "REDUCE_INPUT_RECORDS").getCounter() This is for reduce input records but I believe there is also a counter for reduce output records. You should dig into
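
A sketch of the lookup once waitForCompletion() has returned; in later Hadoop versions the TaskCounter enum replaces the group/name string pair:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterDemo {
  // Call after job.waitForCompletion(true) has returned.
  static void printRecordCounts(Job job) throws Exception {
    Counters counters = job.getCounters();
    long in = counters.findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
    long out = counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();
    System.out.println("reduce in=" + in + " out=" + out);
  }
}
```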

Re: Map-Reduce Slow Down

2009-04-13 Thread Jim Twensky
Mithila, You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15 node cluster may not be

Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
I'm not sure if this is exactly what you want, but can you emit map records as: (cat, doc5) -> 3, (cat, doc1) -> 1, (cat, doc5) -> 1, and so on? This way, your reducers will get the intermediate key/value pairs as (cat, doc5) -> 3, (cat, doc5) -> 1, (cat, doc1) -> 1; then you can split the keys (cat, doc*)

Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
Oh, I forgot to mention that you should change your partitioner to send all the keys of the form (cat, *) to the same reducer, but it seems like Jeremy has been much faster than me :) -Jim On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky jim.twen...@gmail.com wrote: I'm not sure if this is exactly

Re: Map-Reduce Slow Down

2009-04-13 Thread Jim Twensky
. Mithila On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky jim.twen...@gmail.com wrote: Mithila, You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that and what was your input size? If you tried the word count application on a 516

getting DiskErrorException during map

2009-04-07 Thread Jim Twensky
Hi, I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes, 8 of them being task trackers. I'm getting the following error and my jobs keep failing when map processes start hitting 30%: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local

Re: Please help!

2009-03-31 Thread Jim Twensky
See the original Map Reduce paper by Google at http://labs.google.com/papers/mapreduce.html and please don't spam the list. -jim On Tue, Mar 31, 2009 at 6:15 PM, Hadooper kusanagiyang.had...@gmail.comwrote: Dear developers, Is there any detailed example of how Hadoop processes input?

Re: Using HDFS for common purpose

2009-01-27 Thread Jim Twensky
You may also want to have a look at this to reach a decision based on your needs: http://www.swaroopch.com/notes/Distributed_Storage_Systems Jim On Tue, Jan 27, 2009 at 1:22 PM, Jim Twensky jim.twen...@gmail.com wrote: Rasit, What kind of data will you be storing on Hbase or directly

Re: Suitable for Hadoop?

2009-01-21 Thread Jim Twensky
Ricky, Hadoop is primarily optimized for large files, usually files of size larger than one input split. However, there is an input format called MultiFileInputFormat which can be used to utilize Hadoop to work efficiently on smaller files. You can also override the isSplitable method of an input
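
A sketch of that override against the old mapred API; note the framework spells the method isSplitable, with a single t:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// One whole (small) file per map task: mark every file as non-splittable.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}
```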

Re: Indexed Hashtables

2009-01-15 Thread Jim Twensky
Delip, Why do you think Hbase will be an overkill? I do something similar to what you're trying to do with Hbase and I haven't encountered any significant problems so far. Can you give some more info on the size of the data you have? Jim On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao

Re: Merging reducer outputs into a single part-00000 file

2009-01-14 Thread Jim Twensky
Owen and Rasit, Thank you for the responses. I've figured out that mapred.reduce.tasks was set to 1 in my hadoop-default.xml and I didn't override it in my hadoop-site.xml configuration file. Jim On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley omal...@apache.org wrote: On Jan 14, 2009, at 12:46
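
For reference, the reducer count can also be set in the driver instead of the XML (old-API sketch below); one reduce task yields a single part-00000, and hadoop fs -getmerge <src> <localdst> is another way to concatenate many part files after the fact:

```java
import org.apache.hadoop.mapred.JobConf;

public class SingleReducerDemo {
  public static void main(String[] args) {
    JobConf conf = new JobConf(SingleReducerDemo.class);
    conf.setNumReduceTasks(1); // same effect as mapred.reduce.tasks=1
  }
}
```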

Re: regarding datamodel inside HBase

2009-01-13 Thread Jim Twensky
Shiraz, If you would like to read some more on what you can do with Hbase and compare it to an RDBMS you may also find this article helpful: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable Jim On Tue, Jan 13, 2009 at 10:16 AM, Jean-Daniel Cryans

Merging reducer outputs into a single part-00000 file

2009-01-10 Thread Jim Twensky
Hello, The original map-reduce paper states: After successful completion, the output of the map-reduce execution is available in the R output files (one per reduce task, with file names as specified by the user). However, when using Hadoop's TextOutputFormat, all the reducer outputs are combined in

Re: Accessing rows with number indexes

2009-01-10 Thread Jim Twensky
. Keep track of the highest prefix and use that range to select a prefix randomly. Then start a scanner at that prefix ~Tim. 2009/1/10 Jim Twensky jim.twen...@gmail.com: Hello, I have an HBase table that contains sentences as row keys and a few numeric values as columns. A simple abstract

Re: Combiner run specification and questions

2009-01-02 Thread Jim Twensky
Hello Saptarshi, E.g. if there are only 10 values corresponding to a key (as output by the mapper), will these 10 values go straight to the reducer or to the reducer via the combiner? It depends on whether or not you use JobConf.setCombinerClass(). If you don't, Hadoop does
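
A minimal illustration of the point, using the Hadoop 2 Job API and the bundled IntSumReducer (safe as a combiner because summation is commutative and associative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "combiner-demo");
    // Without this call, map output goes straight to the reducers.
    // With it, the framework *may* run the combiner zero or more times
    // per key, so the function must not change the final result.
    job.setCombinerClass(IntSumReducer.class);
  }
}
```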

Re: Shared thread safe variables?

2009-01-01 Thread Jim Twensky
grows really large more than makes up for it in the long run. - Aaron On Thu, Dec 25, 2008 at 2:22 AM, Jim Twensky jim.twen...@gmail.com wrote: Hello again, I think I found an answer to my question. If I write a new WritableComparable object that extends IntWritable and then overwrite

Re: Shared thread safe variables?

2008-12-25 Thread Jim Twensky
at each combiner/reducer. Jim On Wed, Dec 24, 2008 at 12:19 PM, Jim Twensky jim.twen...@gmail.com wrote: Hi Aaron, Thanks for the advice. I actually thought of using multiple combiners and a single reducer, but I was worried about the key sorting phase being a waste for my purpose

Shared thread safe variables?

2008-12-24 Thread Jim Twensky
Hello, I was wondering if Hadoop provides thread safe shared variables that can be accessed from individual mappers/reducers along with a proper locking mechanism. To clarify things, let's say that in the word count example, I want to know the word that has the highest frequency and how many

Re: Shared thread safe variables?

2008-12-24 Thread Jim Twensky
. Cheers, - Aaron On Wed, Dec 24, 2008 at 3:28 AM, Jim Twensky jim.twen...@gmail.com wrote: Hello, I was wondering if Hadoop provides thread safe shared variables that can be accessed from individual mappers/reducers along with a proper locking mechanism. To clarify things

Re: Using Hbase as data sink

2008-12-24 Thread Jim Twensky
, this question is related to Hadoop rather than Hbase and sorry if I'm asking something too obvious but I usually check the API documentations and the tutorials before asking questions and I got stuck. Thanks, Jim On Tue, Dec 23, 2008 at 10:05 AM, stack st...@duboce.net wrote: Jim Twensky

Re: Predefined counters

2008-12-22 Thread Jim Twensky
/browse/HADOOP-4043 a while back to address the fact they are not public. Please consider voting for it if you think it would be useful. Cheers, Tom On Mon, Dec 22, 2008 at 2:47 AM, Jim Twensky jim.twen...@gmail.com wrote: Hello, I need to collect some statistics using some of the counters

Re: Using Hbase as data sink

2008-12-22 Thread Jim Twensky
of the class, initialize it in the job initialization, and just reuse the same one in each reducer task. JG -Original Message- From: Jim Twensky [mailto:jim.twen...@gmail.com] Sent: Monday, December 22, 2008 12:38 PM To: hbase-user@hadoop.apache.org Subject: Using Hbase as data

Re: Using Hbase as data sink

2008-12-22 Thread Jim Twensky
,args[1]); ... } Notice that I don't have access to the partitioner unlike the initTableReduceJob method. Is there a way to overcome this? Thanks Jim On Mon, Dec 22, 2008 at 3:43 PM, stack st...@duboce.net wrote: Jim Twensky wrote: Hello Jonathan, Thanks for the fast response. Yes, my question

Predefined counters

2008-12-21 Thread Jim Twensky
Hello, I need to collect some statistics using some of the counters defined by the Map/Reduce framework, such as Reduce input records. I know I should use the getCounter method from Counters.Counter but I couldn't figure out how to use it. Can someone give me a two line example of how to read the

Re: Can hadoop sort by values rather than keys?

2008-09-24 Thread Jim Twensky
Sorting according to keys is a requirement for the map/reduce algorithm. I'd suggest running a second map/reduce phase on the output files of your application, using the values as keys in that second phase. I know that will increase the running time, but this is how I do it when I need to get my
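
A sketch of that second phase, assuming the first job wrote a SequenceFile of Text keys and IntWritable counts; the bundled InverseMapper performs exactly this swap as well:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Second-phase mapper: emit (count, word) so the shuffle sorts by the
// value of the first phase.
public class SwapMapper extends Mapper<Text, IntWritable, IntWritable, Text> {
  @Override
  protected void map(Text key, IntWritable value, Context context)
      throws IOException, InterruptedException {
    context.write(value, key);
  }
}
```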

Re: debugging hadoop application!

2008-09-24 Thread Jim Twensky
As far as I know, there is a Hadoop plug-in for Eclipse but it is not possible to debug when running on a real cluster. If you want to add watches and expressions to trace your programs or profile your code, I'd suggest looking at the log files or using other tracing tools such as xtrace (

Re: installing hadoop on a OS X cluster

2008-09-10 Thread Jim Twensky
Apparently you have one node with 2 processors where each processor has 4 cores. What do you want to use Hadoop for? If you have a single disk drive and multiple cores on one node, then a pseudo-distributed environment seems like the best approach to me as long as you are not dealing with large

Re: Hadoop Streaming and Multiline Input

2008-09-09 Thread Jim Twensky
If I understand your question correctly, you need to write your own FileInputFormat. Please see http://hadoop.apache.org/core/docs/r0.18.0/api/index.html for details. Regards, Tim On Sat, Sep 6, 2008 at 9:20 PM, Dennis Kubes [EMAIL PROTECTED] wrote: Is it possible to set a multiline text input

Question on Streaming

2008-09-09 Thread Jim Twensky
Hello, I need to use Hadoop Streaming to run several instances of a single program on different files. Before doing it, I wrote a simple test application as the mapper, which basically outputs the standard input without doing anything useful. So it looks like the following:

Different Map and Reduce output types - weird error message

2008-08-29 Thread Jim Twensky
Hello, I am working on a Hadoop application that produces different (key,value) types after the map and reduce phases so I'm aware that I need to use JobConf.setMapOutputKeyClass and JobConf.setMapOutputValueClass. However, I still keep getting the following runtime error when I run my

Re: Different Map and Reduce output types - weird error message

2008-08-29 Thread Jim Twensky
Here is the relevant part of my mapper: (...) private final static IntWritable one = new IntWritable(1); private IntWritable bound = new IntWritable(); (...) while(...) { output.collect(bound, one); } so I'm not sure why my mapper tries to output a

Re: Different Map and Reduce output types - weird error message

2008-08-29 Thread Jim Twensky
, which contradict the specified Mapper output types. If I'm correct, am I supposed to write a separate reducer for the local combiner in order to speed things up? Jim On Fri, Aug 29, 2008 at 6:30 PM, Jim Twensky [EMAIL PROTECTED] wrote: Here is the relevant part of my mapper
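
That is the crux: a combiner must emit the map output types, so a reducer that emits different types cannot double as one. A separate combiner along these lines (hypothetical, old-API, matching the IntWritable/IntWritable map output shown above) resolves it:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Combiner whose input and output types both match the map output types.
public class SumCombiner extends MapReduceBase
    implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
  public void reduce(IntWritable key, Iterator<IntWritable> values,
      OutputCollector<IntWritable, IntWritable> output, Reporter reporter)
      throws IOException {
    int total = 0;
    while (values.hasNext()) total += values.next().get();
    output.collect(key, new IntWritable(total)); // same types the mapper emits
  }
}
```

Register it with JobConf.setCombinerClass(SumCombiner.class) while keeping the original reducer for the reduce phase.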