sometimes working, sometimes failing?
Also, can you clear your tmp directory and make sure you have enough space
in it before you retry?
JM
2013/5/27 Jim Twensky jim.twen...@gmail.com
Hi Jean,
I switched to Oracle JDK 1.6 as you suggested and ran a job successfully
this afternoon which lasted
2013/5/24 Jim Twensky jim.twen...@gmail.com
Hi again, in addition to my previous post, I was able to get some error
logs from the task tracker/data node this morning, and it looks like it
might be a Jetty issue:
2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
retrieve
/browse/MAPREDUCE-2389
If so, how do I downgrade my Jetty version? Should I just replace the
Jetty jar file in the lib directory with an earlier version and restart my
cluster?
Thank you.
On Thu, May 23, 2013 at 7:14 PM, Jim Twensky jim.twen...@gmail.com wrote:
Hello, I have a 20-node Hadoop cluster where each node has 8 GB of memory
and an 8-core processor. I sometimes get the following error, seemingly at
random:
---
Exception in thread "main"
It is better to look at the map/reduce input/output bytes counters instead.
On Tue, May 14, 2013 at 10:41 PM, Jim Twensky jim.twen...@gmail.com
wrote:
I have an iterative MapReduce job that I run over 35 GB of data repeatedly.
The output of the first job is the input to the second one and it goes on
like that until convergence.
I am seeing strange behavior in the program's run time. The first
iteration takes 4 minutes to run, and here is how
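For context, the iteration itself is usually just a driver loop that
resubmits the job with the previous output as the next input; a minimal
sketch with the old JobClient API, where checkConvergence is a
hypothetical stand-in:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    boolean converged = false;
    for (int i = 0; !converged && i < 100; i++) {
      Path output = new Path(args[1] + "/iter" + i);
      JobConf conf = new JobConf(IterativeDriver.class);
      conf.setJobName("iteration-" + i);
      FileInputFormat.setInputPaths(conf, input);
      FileOutputFormat.setOutputPath(conf, output);
      // mapper, reducer and I/O formats omitted for brevity
      RunningJob job = JobClient.runJob(conf); // blocks until the job finishes
      converged = checkConvergence(job);       // hypothetical convergence test
      input = output;                          // this output feeds the next round
    }
  }

  private static boolean checkConvergence(RunningJob job) throws Exception {
    return false; // e.g. compare job.getCounters() against a threshold
  }
}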
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
Regards
Bertrand
On Sun, May 12, 2013 at 8:24 PM, Jim Twensky jim.twen...@gmail.com wrote:
I have large java.util.BitSet objects that I want to bitwise-OR using a
MapReduce job. I decided to wrap around each object
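A minimal sketch of the OR step, assuming Java 7's
BitSet.toByteArray()/valueOf() (per Bertrand's link above) and
BytesWritable as the serialized form; the Text key type is an assumption:

import java.io.IOException;
import java.util.Arrays;
import java.util.BitSet;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// ORs together all BitSets that share a key.
public class BitSetOrReducer
    extends Reducer<Text, BytesWritable, Text, BytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<BytesWritable> values, Context ctx)
      throws IOException, InterruptedException {
    BitSet acc = new BitSet();
    for (BytesWritable v : values) {
      // trim to getLength(): the backing array of BytesWritable may be padded
      acc.or(BitSet.valueOf(Arrays.copyOf(v.getBytes(), v.getLength())));
    }
    ctx.write(key, new BytesWritable(acc.toByteArray()));
  }
}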
1. http://hama.apache.org
2. http://wiki.apache.org/hama/Benchmarks
On Sat, Oct 6, 2012 at 1:31 AM, Jim Twensky jim.twen...@gmail.com wrote:
Hi,
I have a complex Hadoop job that iterates over large graph data
multiple times until some convergence condition is met. I know that
the map output goes to the local disk of each particular mapper first,
and is then fetched by the reducers before the reduce tasks start. I can
see that this is an
but not the shuffle? Or am I wrong?
On Fri, Oct 5, 2012 at 11:13 PM, Jim Twensky jim.twen...@gmail.com wrote:
Hi Harsh,
Yes, there is actually a hidden map stage that generates new
(key, value) pairs based on the last reduce output, but I can create
those records during the reduce step instead
Hi,
I'd like to move and copy files from one directory in HDFS to another
one. I know there are methods in the FileSystem API that enable
copying files between the local disk and HDFS, but I couldn't figure
out how to do this between two paths that are both in HDFS. I think
rename(Path src, Path dest) can
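For what it's worth, a hedged sketch of both operations (paths are
illustrative): rename moves a file within HDFS, and FileUtil.copy handles
an HDFS-to-HDFS copy:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsMoveCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path("/user/jim/in/part-00000");    // illustrative
    Path dst = new Path("/user/jim/moved/part-00000"); // illustrative

    boolean moved = fs.rename(src, dst); // move: both paths on the same HDFS

    // copy: same FileSystem on both sides, false = keep the source
    FileUtil.copy(fs, dst, fs, new Path("/user/jim/copy/part-00000"),
        false, conf);
  }
}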
Hi,
I have a 16+1 node Hadoop cluster where all tasktrackers (and
datanodes) are connected to the same switch and share the exact same
hardware and software configuration. When I run a Hadoop job, one of
the task trackers always produces one of these two errors, ONLY during
the reduce tasks, and
I'm trying to create an instance of an RMI client that queries a
remote RMI server inside my Mapper class. My application runs smoothly
without the RMI client. When I add:
if (System.getSecurityManager() == null) {
    System.setSecurityManager(new SecurityManager());
}
inside my Mapper's
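A hedged sketch of where that block typically lives in the old API;
MyRemoteService, its query() method, and the registry URL are hypothetical.
Note the task JVMs also need a java.security.policy granting the relevant
socket permissions once a SecurityManager is installed, which is a common
reason this fails only on the cluster:

import java.io.IOException;
import java.rmi.Naming;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RmiMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private MyRemoteService service; // hypothetical remote interface

  @Override
  public void configure(JobConf job) {
    if (System.getSecurityManager() == null) {
      System.setSecurityManager(new SecurityManager());
    }
    try {
      // one lookup per task JVM, reused by every map() call
      service = (MyRemoteService) Naming.lookup("//rmihost:1099/service");
    } catch (Exception e) {
      throw new RuntimeException("RMI lookup failed", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    out.collect(value, new Text(service.query(value.toString())));
  }
}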
Looking through MultithreadedMapRunner, map() seems to be the only method
called by executorService:
MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky jim.twen...@gmail.com wrote:
Hi,
I've decided to refactor some of my Hadoop jobs and implement them
using MultithreadedMapper.class, but I got puzzled by some
unexpected error messages at run time.
Here are some relevant settings regarding my Hadoop cluster:
mapred.tasktracker.map.tasks.maximum = 1
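For reference, with the new (org.apache.hadoop.mapreduce) API the thread
count is set on the job; a minimal sketch where WordMapper is a
hypothetical mapper, which must be thread-safe to run under
MultithreadedMapper:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multithreaded-example");
    job.setJarByClass(MultithreadedDriver.class);
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, WordMapper.class); // does the real work
    MultithreadedMapper.setNumberOfThreads(job, 8);            // threads per map task
    // input/output paths and formats omitted for brevity
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}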
Hi Raymond,
Take a look at
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setGroupingComparatorClass(java.lang.Class).
I think this is what you want. Also make sure to implement a custom
partitioner that only takes into account the first part of the key,
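A minimal sketch of such a partitioner, assuming a Text key laid out as
natural#secondary (the separator and key layout are assumptions):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Partitions on the natural key only, so every composite key sharing
// that prefix reaches the same reducer.
public class NaturalKeyPartitioner<V> extends Partitioner<Text, V> {
  @Override
  public int getPartition(Text key, V value, int numPartitions) {
    String natural = key.toString().split("#", 2)[0];
    return (natural.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}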
Hi,
I'm using a custom grouping comparator class to simulate a secondary
sort on values, and I set it via Job.setGroupingComparatorClass (using
Hadoop 0.20.x) inside my driver. I'm wondering if this class is also
used when grouping the records in the combiner.
Using a combiner greatly improves
Hi,
I'd like to get Hadoop running on a large university cluster which is
used by many people to run different types of applications. We are
currently using Torque to assign nodes and manage the queue. What I
want to do is to enable people to request n processors, and
automatically start Hadoop
http://hadoop.apache.org/common/docs/r0.20.1/hod_user_guide.html
On Mon, Dec 28, 2009 at 12:14 PM, Jim Twensky jim.twen...@gmail.com wrote:
The documentation on configuration states:
Unless explicitly turned off, Hadoop by default specifies two
resources, loaded in-order from the
Hi Jeff,
The problem may also be related to large log files if you use the
cluster for many jobs. Check your Hadoop log directory and see
how big it is. You can decrease the maximum size of a log file using
one of the Hadoop configuration files under conf.
Jim
On Mon, Aug 31, 2009
The maximum and minimum amount of memory to be used by the task
trackers can be specified inside the configuration files under conf.
For instance, in order to allocate a maximum of 512 MB, you need to
set:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512M</value>
</property>
Hope this helps.
are on a 2-core machine, you will probably have to set the CMS
collector to incremental mode:
-XX:+CMSIncrementalMode
to prevent the CMS GC from starving out your main threads.
Good luck with it!
-ryan
On Wed, Apr 29, 2009 at 3:33 PM, Jim Twensky jim.twen...@gmail.com
wrote:
Hi Ryan,
Have you got your new hardware? I was keeping an eye on your blog for the
past few days but I haven't seen any updates there so I just decided to ask
you on the list. If you have some results, would you like to give us some
numbers along with hardware details?
Thanks,
Jim
On Thu, Jan
Hi,
I'm doing some experiments to import large datasets into HBase using a Map
job. Before posting some numbers, here is a summary of my test cluster:
I have 7 regionservers and 1 master. I also run HDFS datanodes and Hadoop
tasktrackers on the same 7 regionservers. Similarly, I run the Hadoop
In addition to what Aaron mentioned, you can configure the minimum split
size in hadoop-site.xml to have smaller or larger input splits depending on
your application.
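For example, the old-style property (value illustrative, in bytes):

<property>
  <name>mapred.min.split.size</name>
  <!-- 128 MB minimum split: fewer, larger splits per job -->
  <value>134217728</value>
</property>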
-Jim
On Mon, Apr 20, 2009 at 12:18 AM, Aaron Kimball aa...@cloudera.com wrote:
Yes, there can be more than one InputSplit per
Hadoop would be writing to /tmp.
Hope this helps!
Alex
On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky jim.twen...@gmail.com
wrote:
Alex,
Yes, I bounced the Hadoop daemons after I changed the configuration
files.
I also tried setting $HADOOP_CONF_DIR to the directory where my
http://wiki.apache.org/hadoop/FAQ#7
On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote:
Will anyone guide me on how to avoid the single point of failure of the
master node?
This is what I know: if the master node is down for some reason, the Hadoop
system is down and there is no way
Set $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives. For
whatever reason, your hadoop-site.xml (and the hadoop-default.xml you tried
to change) are probably not being loaded. $HADOOP_CONF_DIR should fix
this.
Good luck!
Alex
On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky jim.twen
Hi Andy,
Take a look at this piece of code:
Counters counters = job.getCounters();
counters.findCounter("org.apache.hadoop.mapred.Task$Counter",
    "REDUCE_INPUT_RECORDS").getCounter();
This is for reduce input records but I believe there is also a counter for
reduce output records. You should dig into
Mithila,
You said all the slaves were being utilized in the 3 node cluster. Which
application did you run to test that and what was your input size? If you
tried the word count application on a 516 MB input file on both cluster
setups, then some of your nodes in the 15-node cluster may not be
I'm not sure if this is exactly what you want, but can you emit map records
as:
(cat, doc5) -> 3
(cat, doc1) -> 1
(cat, doc5) -> 1
and so on?
This way, your reducers will get the intermediate (key, value) pairs as:
(cat, doc5) -> 3
(cat, doc5) -> 1
(cat, doc1) -> 1
then you can split the keys (cat, doc*)
Oh, I forgot to mention that you should change your partitioner to send all
the keys of the form (cat, *) to the same reducer, but it seems like Jeremy
has been much faster than me :)
-Jim
On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky jim.twen...@gmail.com wrote:
Mithila
On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky jim.twen...@gmail.com
wrote:
Hi,
I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes, 8
of them being task trackers. I'm getting the following error and my jobs
keep failing when map processes start hitting 30%:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
valid local
See the original MapReduce paper by Google at
http://labs.google.com/papers/mapreduce.html and please don't spam the list.
-jim
On Tue, Mar 31, 2009 at 6:15 PM, Hadooper kusanagiyang.had...@gmail.com wrote:
Dear developers,
Is there any detailed example of how Hadoop processes input?
You may also want to have a look at this to reach a decision based on your
needs:
http://www.swaroopch.com/notes/Distributed_Storage_Systems
Jim
On Tue, Jan 27, 2009 at 1:22 PM, Jim Twensky jim.twen...@gmail.com wrote:
Rasit,
What kind of data will you be storing on HBase or directly
Ricky,
Hadoop is generally optimized for large files, usually files of size larger
than one input split. However, there is an input format called
MultiFileInputFormat which can be used to make Hadoop work efficiently
on smaller files. You can also override the isSplitable method of an input
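A hedged sketch of the latter with the old mapred API (note the framework
spells the method isSplitable):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Forces each file into exactly one split, i.e. one map task per file.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}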
Delip,
Why do you think HBase will be overkill? I do something similar to what
you're trying to do with HBase and I haven't encountered any significant
problems so far. Can you give some more info on the size of the data you
have?
Jim
On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao
Owen and Rasit,
Thank you for the responses. I've figured out that mapred.reduce.tasks was
set to 1 in my hadoop-default.xml and I didn't override it in my
hadoop-site.xml configuration file.
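For reference, the same setting can also be forced from the driver, which
takes precedence over both configuration files (the count and class name
are illustrative):

JobConf conf = new JobConf(MyJob.class);  // MyJob is a hypothetical driver class
conf.setNumReduceTasks(16);               // overrides mapred.reduce.tasks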
Jim
On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley omal...@apache.org wrote:
On Jan 14, 2009, at 12:46
Shiraz,
If you would like to read some more on what you can do with HBase and
compare it to an RDBMS, you may also find this article helpful:
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
Jim
On Tue, Jan 13, 2009 at 10:16 AM, Jean-Daniel Cryans
Hello,
The original map-reduce paper states: "After successful completion, the
output of the map-reduce execution is available in the R output files (one
per reduce task, with file names as specified by the user)." However, when
using Hadoop's TextOutputFormat, all the reducer outputs are combined in
Keep track of
the highest prefix and use that range to select a prefix randomly.
Then start a scanner at that prefix.
~Tim.
2009/1/10 Jim Twensky jim.twen...@gmail.com:
Hello,
I have an HBase table that contains sentences as row keys and a few
numeric values as columns. A simple abstract
Hello Saptarshi,
"E.g. if there are only 10 values corresponding
to a key (as output by the mapper), will these 10 values go straight
to the reducer or to the reducer via the combiner?"
It depends on whether you use the method JobConf.setCombinerClass().
If you don't, Hadoop does not run a combiner at all.
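A minimal sketch of wiring a combiner in, assuming the reduce logic is
associative and commutative so the same class can double as the combiner
(class names are hypothetical):

JobConf conf = new JobConf(WordCount.class);
conf.setMapperClass(WordCountMapper.class);
conf.setCombinerClass(WordCountReducer.class); // combiner reuses the reducer
conf.setReducerClass(WordCountReducer.class);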
grows really large more than makes up for it in the long run.
- Aaron
On Thu, Dec 25, 2008 at 2:22 AM, Jim Twensky jim.twen...@gmail.com
wrote:
Hello again,
I think I found an answer to my question. If I write a new
WritableComparable object that extends IntWritable and then override
at each combiner/reducer.
Jim
On Wed, Dec 24, 2008 at 12:19 PM, Jim Twensky jim.twen...@gmail.com wrote:
Hi Aaron,
Thanks for the advice. I actually thought of using multiple combiners and a
single reducer, but I was worried about the key sorting phase being a waste
for my purposes
Hello,
I was wondering if Hadoop provides thread-safe shared variables that can be
accessed from individual mappers/reducers along with a proper locking
mechanism. To clarify things, let's say that in the word count example, I
want to know the word that has the highest frequency and how many
Cheers,
- Aaron
On Wed, Dec 24, 2008 at 3:28 AM, Jim Twensky jim.twen...@gmail.com
wrote:
this question is related to
Hadoop rather than HBase, and sorry if I'm asking something too obvious, but
I usually check the API documentation and the tutorials before asking
questions and I got stuck.
Thanks,
Jim
On Tue, Dec 23, 2008 at 10:05 AM, stack st...@duboce.net wrote:
Jim Twensky
/browse/HADOOP-4043 a while back
to address the fact that they are not public. Please consider voting for it
if you think it would be useful.
Cheers,
Tom
On Mon, Dec 22, 2008 at 2:47 AM, Jim Twensky jim.twen...@gmail.com
wrote:
of the class, initialize it in the job initialization,
and just reuse the same one in each reducer task.
JG
-----Original Message-----
From: Jim Twensky [mailto:jim.twen...@gmail.com]
Sent: Monday, December 22, 2008 12:38 PM
To: hbase-user@hadoop.apache.org
Subject: Using Hbase as data
,args[1]);
...
}
Notice that I don't have access to the partitioner, unlike the
initTableReduceJob method. Is there a way to overcome this?
Thanks
Jim
On Mon, Dec 22, 2008 at 3:43 PM, stack st...@duboce.net wrote:
Jim Twensky wrote:
Hello Jonathan,
Thanks for the fast response. Yes, my question
Hello,
I need to collect some statistics using some of the counters defined by the
Map/Reduce framework, such as "Reduce input records". I know I should use
the getCounter method from Counters.Counter but I couldn't figure out how
to use it. Can someone give me a two-line example of how to read the
Sorting according to keys is a requirement of the map/reduce algorithm. I'd
suggest running a second map/reduce phase on the output files of your
application and using the values as keys in that second phase. I know that
will increase the running time, but this is how I do it when I need to get
my
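A sketch of that second phase's mapper, which just swaps key and value so
the shuffle sorts by the old values (the Text/IntWritable types are
assumptions; Hadoop also ships org.apache.hadoop.mapred.lib.InverseMapper
for exactly this swap):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits (count, word) so the framework's sort-by-key orders by count.
public class SwapMapper extends MapReduceBase
    implements Mapper<Text, IntWritable, IntWritable, Text> {
  public void map(Text key, IntWritable value,
      OutputCollector<IntWritable, Text> out, Reporter reporter)
      throws IOException {
    out.collect(value, key);
  }
}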
As far as I know, there is a Hadoop plug-in for Eclipse but it is not
possible to debug when running on a real cluster. If you want to add watches
and expressions to trace your programs or profile your code, I'd suggest
looking at the log files or using other tracing tools such as xtrace (
Apparently you have one node with 2 processors where each processor has 4
cores. What do you want to use Hadoop for? If you have a single disk drive
and multiple cores on one node, then a pseudo-distributed environment seems
like the best approach to me as long as you are not dealing with large
If I understand your question correctly, you need to write your own
FileInputFormat. Please see
http://hadoop.apache.org/core/docs/r0.18.0/api/index.html for details.
Regards,
Tim
On Sat, Sep 6, 2008 at 9:20 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
Is it possible to set a multiline text input
Hello, I need to use Hadoop Streaming to run several instances of a single
program on different files. Before doing it, I wrote a simple test
application as the mapper, which basically outputs the standard input
without doing anything useful. So it looks like the following:
Hello, I am working on a Hadoop application that produces different
(key,value) types after the map and reduce phases so I'm aware that I need
to use JobConf.setMapOutputKeyClass and JobConf.setMapOutputValueClass.
However, I still keep getting the following runtime error when I run my
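For reference, the usual driver wiring when map and reduce output types
differ (the concrete types and class names are illustrative). Note that if
a combiner is set, it must consume and emit the map output types, which is
a common source of exactly this kind of runtime error:

JobConf conf = new JobConf(MyApp.class);       // hypothetical driver class
conf.setMapOutputKeyClass(IntWritable.class);  // what the mapper emits
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);            // what the reducer emits
conf.setOutputValueClass(Text.class);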
Here is the relevant part of my mapper:
(...)
private final static IntWritable one = new IntWritable(1);
private IntWritable bound = new IntWritable();
(...)
while (...) {
    output.collect(bound, one);
}
so I'm not sure why my mapper tries to output a [...], which contradicts
the specified Mapper output types. If I'm correct, am I supposed to write a
separate reducer for the local combiner in order to speed things up?
Jim