Re: string conversion problems

2010-07-16 Thread Nikolay Korovaiko
Hi guys! Thank you very much for the help! I've actually tried both "\\t" and "\\s+", but neither of them worked... Even though ("") might not work for some other cases, it splits keys and values correctly for this particular one... I've also set my delimiter to a comma

Re: string conversion problems

2010-07-16 Thread cvkkumar
Hi, You could also try String [] tokens = line.split("\\s+"); Even this is just from eyeballing the code... Do let us know. Regards, CVK On Jul 16, 2010, at 1:33 PM, Jeff Bean wrote: Whitespace characters are funny. You showed me this code in the mapper: String [] tokens = line.split(""); Which does

Re: Killed : GC overhead limit exceeded

2010-07-16 Thread Ted Yu
Have you tried increasing memory beyond 1GB for your map task? I think you have noticed that both OOMEs came from Pattern.compile(). Please take a look at http://www.docjar.com/html/api/java/lang/String.java.html I would suggest pre-compiling the three patterns when setting up your mapper - basi
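A minimal sketch of that suggestion, assuming the new-API Mapper and placeholder regexes (the real three patterns live in the original job); the point is that each Pattern is compiled once per task in a field rather than on every map() call:

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizingMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Compiled once when the class loads, so Pattern.compile() no longer
  // dominates allocation and GC time inside the map loop.
  private static final Pattern FIELD_SPLIT = Pattern.compile("\\t");
  private static final Pattern WORD_SPLIT  = Pattern.compile("\\s+");
  private static final Pattern DIGITS      = Pattern.compile("\\d+");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = FIELD_SPLIT.split(value.toString());
    if (fields.length == 2) {
      context.write(new Text(fields[0]), new Text(fields[1]));
    }
  }
}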

Re: Alternatives to start-all.sh stop-all.sh func

2010-07-16 Thread Edson Ramiro
Hi Edward, If you're looking for a good tool to manage your nodes you should take a look at sdi [1] [1] http://sdi.sourceforge.net/ -- Edson Ramiro Lucas Filho {skype, twitter, gtalk}: erlfilho http://www.inf.ufpr.br/erlf07/ On 16 July 2010 13:35, Edward Capriolo wrote: > I remember when I w

Re: string conversion problems

2010-07-16 Thread Jeff Bean
Whitespace characters are funny. You showed me this code in the mapper: String [] tokens = line.split(""); which doesn't actually match a tab; that would be line.split("\t"). The empty-string split would still execute, and you'd have keys and values that look right going into the reducer, but you might not
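For reference, a tiny standalone illustration of the difference (the sample line is made up; only the two split calls matter):

public class SplitDemo {
  public static void main(String[] args) {
    String line = "a\tb";

    // Splitting on the empty string breaks the line into individual
    // characters, not into a key and a value.
    String[] chars = line.split("");

    // Splitting on the tab character yields the intended key/value pair.
    String[] kv = line.split("\t");
    System.out.println(kv[0] + " -> " + kv[1]);   // prints: a -> b
  }
}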

Re: Single Node with multiple mappers?

2010-07-16 Thread Moritz Krog
Hey :) thanks for the quick response. My system runs on an i7 together with about 8GB of RAM. The problem with my setup is that I'm using Hadoop to pump 40GB of JSON-encoded data hashes into a MySQL database. The data is in non-relational form and needs to be normalized before it can enter the D

Re: string conversion problems

2010-07-16 Thread Nikolay Korovaiko
First, thank you very much for the reply! So, this is my input: a\tb b\tc c\ta In other words, a map function initially receives the whole string a\tb as its value. And it processes my input data correctly. I actually changed my reduce function to simply emit merged pairs from a map's input for

Re: How to patch Hadoop 0.20.2 with symbolic links patch

2010-07-16 Thread Yujun Wu
Hello Eli, Thanks a lot for your info. I will try with the 0.21 release then. Regards, Yujun On Fri, 16 Jul 2010, Eli Collins wrote: > Hey Yujun, > > Symbolic links involves a number of patches. These patches have a > number of dependencies on code in trunk (eg FileContext) so applying > to 20

Re: How to patch Hadoop 0.20.2 with symbolic links patch

2010-07-16 Thread Eli Collins
Hey Yujun, Symbolic links involve a number of patches. These patches have a number of dependencies on code in trunk (e.g. FileContext), so applying them to 20.2 would be a lot of work. Symbolic links are in the first release candidate of the 21 release, so it's probably best to check it out if you need symlin

Preferred Java version

2010-07-16 Thread Raymond Jennings III
Is 1.6.0_17 or 1.6.0_20 preferred as the JRE for Hadoop? Thank you.

Alternatives to start-all.sh stop-all.sh func

2010-07-16 Thread Edward Capriolo
I remember when I was first setting up a Hadoop cluster wondering exactly what the SSH keys did, why, and whether they were needed. start-all.sh and stop-all.sh are good for what they do, but they are not very sophisticated. I wrote a blog post about using func with Hadoop to remotely manage your nodes. h

Re: string conversion problems

2010-07-16 Thread Jeff Bean
Is the tab the delimiter between records or between keys and values on the input? In other words, does the input file look like this: a\tb b\tc c\ta or does it look like this: a b\tb c\tc a\t ? Jeff On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko wrote: > Hi everyone, > > I hope thi

Changing Hadoop Niceness

2010-07-16 Thread Matt Pouttu-Clarke
We have a staging environment where the NameNode shares a machine with a DataNode and TaskTracker. Can anyone suggest a way to set different hadoop-env.sh values for DataNode and TaskTracker without having to duplicate the whole Hadoop conf directory? For example, to set a different HADOOP_NICENES

How to patch Hadoop 0.20.2 with symbolic links patch

2010-07-16 Thread Yujun Wu
Hello, I am new to Hadoop. Recently, I installed Hadoop 0.20.2 and it works. I tried to patch it with the symbolic links patch by Eli (Mr. Eli Collins): https://issues.apache.org/jira/browse/HDFS-245 (symlink41-hdfs.patch) I always got an error about a missing Hdfs.java. This is what I did: >cd

Re: Hadoop Training

2010-07-16 Thread Ken Krugler
Hi Mark - thanks for the kind words. For those starting out with Hadoop, there are 10 spots left this coming Thursday & Friday (July 22nd & 23rd). See http://bit.ly/hadoop-bootcamp for details and http://bit.ly/bootcamp-outline for an outline. Thanks, -- Ken On Jul 9, 2010, at 8:10am,

RE: Killed : GC overhead limit exceeded

2010-07-16 Thread Some Body
I tried again and connected to my task tracker via JMX but I still don't see what's wrong. Here's the log; it was spilling records, then ran out of memory? 2010-07-16 05:27:04,295 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 2010-07-16 05:27:04,295 INFO org.apache

Killed : GC overhead limit exceeded

2010-07-16 Thread Some Body
I'm seeing this error in my tasktracker's log: FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201007160344_0001_m_05_1 - Killed : GC overhead limit exceeded More detail from my task's log states: FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.l

Re: Single Node with multiple mappers?

2010-07-16 Thread Asif Jan
How is your data being split? Using the mapred.map.tasks property should let you specify how many maps you want to run (provided your input file is big enough to be split into multiple chunks) asif
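As a rough sketch with the old mapred API (the value 8 is purely illustrative), the hint can be set on the JobConf before submission; the framework still derives the actual number of map tasks from the input splits:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapCountHint.class);

    // A hint only: the real number of map tasks follows from the number of
    // input splits (file size, block size, input format).
    conf.setNumMapTasks(8);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);  // identity mapper/reducer by default
  }
}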

RE: Single Node with multiple mappers?

2010-07-16 Thread Michael Segel
Moritz, I'm not sure what you're doing, but raising the number of mappers in your configuration isn't a 'hint'. The number of mappers that you can run will depend on your configuration. You mention an i7, which is a quad-core CPU, but you don't mention the amount of memory you have available, o

Single Node with multiple mappers?

2010-07-16 Thread Moritz Krog
Hi everyone, I was curious if there is any option to use Hadoop in single-node mode in a way that enables the process to use more system resources. Right now, Hadoop uses one mapper and one reducer, leaving my i7 with about 20% CPU usage (1 core for Hadoop, .5 cores for my OS) basically idling.

Problem with DistributedCache after upgrading to CDH3b2

2010-07-16 Thread Jamie Cockrill
Dear All, We recently upgraded from CDH3b1 to b2 and ever since, all our MapReduce jobs that use the DistributedCache have failed. Typically, we add files to the cache prior to job startup, using addCacheFile(URI, conf), and then get them on the other side, using getLocalCacheFiles(conf). I believe
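For reference, a minimal sketch of that pattern with the 0.20-era DistributedCache API (the path /data/lookup.txt is made up):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheExample {
  // Before job submission: register an HDFS file that every task should get.
  public static void addLookupFile(Configuration conf) throws Exception {
    DistributedCache.addCacheFile(new URI("/data/lookup.txt"), conf);
  }

  // Inside the task (e.g. in the mapper's configure/setup): resolve the
  // local on-disk copies of the cached files.
  public static Path[] localCopies(Configuration conf) throws IOException {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}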

Re: how to do a reduce-only job

2010-07-16 Thread Asif Jan
You need to join these files into one; you could either do a map-side join or a reduce-side join. For a map-side join (slightly more involved), look at the example org.apache.hadoop.examples.Join. For a reduce-side join, simply create 2 mappers (one for each file) and one reducer (as long as you keep key-
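A sketch of that reduce-side join wiring with MultipleInputs from the old mapred API; the mapper/reducer bodies and the tab-separated record layout are assumptions, the MultipleInputs calls are the point:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class ReduceSideJoin {

  // Mapper for the first file: tags each record so the reducer knows its origin.
  public static class FileAMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      String[] kv = value.toString().split("\t", 2);
      out.collect(new Text(kv[0]), new Text("A\t" + (kv.length > 1 ? kv[1] : "")));
    }
  }

  // Mapper for the second file: same key/value types, different tag.
  public static class FileBMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      String[] kv = value.toString().split("\t", 2);
      out.collect(new Text(kv[0]), new Text("B\t" + (kv.length > 1 ? kv[1] : "")));
    }
  }

  // Reducer sees all tagged values for a key together and merges them.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      StringBuilder joined = new StringBuilder();
      while (values.hasNext()) {
        joined.append(values.next().toString()).append(" | ");
      }
      out.collect(key, new Text(joined.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceSideJoin.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    // One mapper per input file; both emit the same key/value types so that
    // records sharing a key meet in a single reduce() call.
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, FileAMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, FileBMapper.class);

    conf.setReducerClass(JoinReducer.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}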