Hi guys! Thank you very much for the help!
I've actually tried both "\\t" and "\\s+", but neither of them
worked...
Even though ("") might not work for some other cases, it splits keys
and values correctly for this particular one...
I've also set my delimiter to a comma
Hi,
You could also try
String [] tokens = line.split("\\s+");
This is just from eyeballing the code, though... Do let us know.
Regards,
CVK
On Jul 16, 2010, at 1:33 PM, Jeff Bean wrote:
Whitespace characters are funny. You showed me this code in the mapper:
String [] tokens = line.split("");
Which doesn't actually match a tab; that would be line.split("\t");
Have you tried increasing memory beyond 1GB for your map task?
I think you have noticed that both OOMEs came from Pattern.compile().
Please take a look at
http://www.docjar.com/html/api/java/lang/String.java.html
I would suggest pre-compiling the three patterns when setting up your mapper
- basi
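A minimal sketch of that suggestion, using the 0.20-era mapred API (the single tab pattern here stands in for whichever three patterns the job actually compiles): build each Pattern once per task instead of letting String.split() compile a fresh regex on every record.

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PrecompiledMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Compiled once when the class loads, not once per record;
  // String.split() recompiles its regex on every call.
  private static final Pattern TAB = Pattern.compile("\t");

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String[] tokens = TAB.split(value.toString());
    if (tokens.length >= 2) {
      output.collect(new Text(tokens[0]), new Text(tokens[1]));
    }
  }
}

If the tasks still run out of memory after that, raising the child JVM heap via the mapred.child.java.opts property (e.g. -Xmx2048m) is the usual next step.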
Hi Edward,
If you're looking for a good tool to manage your nodes
you should take a look at sdi [1]
[1] http://sdi.sourceforge.net/
--
Edson Ramiro Lucas Filho
{skype, twitter, gtalk}: erlfilho
http://www.inf.ufpr.br/erlf07/
On 16 July 2010 13:35, Edward Capriolo wrote:
> I remember when I w
Whitespace characters are funny. You showed me this code in the mapper:
String [] tokens = line.split("");
Which doesn't actually match a tab; that would be line.split("\t");
This would still execute, but you'd have keys and values that look right
going into the reducer, but you might not
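For reference, a small standalone demo of the difference between the three calls discussed in this thread (my own illustration, not code from the original mail):

import java.util.Arrays;

public class SplitDemo {
  public static void main(String[] args) {
    String line = "a\tb";

    // split("") splits between every character, so each character
    // (including the tab) becomes its own token; older JREs also
    // prepend an empty first element.
    System.out.println(Arrays.toString(line.split("")));

    // split("\t") splits on the tab itself: [a, b]
    System.out.println(Arrays.toString(line.split("\t")));

    // split("\\s+") splits on any run of whitespace: [a, b]
    System.out.println(Arrays.toString(line.split("\\s+")));
  }
}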
Hey :)
Thanks for the quick response. My system runs on an i7 together with
about 8GB of RAM. The problem with my setup is that I'm using Hadoop
to pump 40GB of JSON-encoded data hashes into a MySQL database. The
data is in non-relational form and needs to be normalized before it
can enter the DB
First, thank you very much for the reply!
So, this is my input:
a\tb
b\tc
c\ta
In other words, a map function initially receives the whole string a\tb as
its value.
And it processes my input data correctly. I actually changed my reduce
function to simply emit merged pairs from a map's input for
Hello Eli,
Thanks a lot for your info. I will try with the 0.21 release then.
Regards,
Yujun
On Fri, 16 Jul 2010, Eli Collins wrote:
> Hey Yujun,
>
> Symbolic links involve a number of patches. These patches have a
> number of dependencies on code in trunk (eg FileContext), so applying
> them to 20
Hey Yujun,
Symbolic links involve a number of patches. These patches have a
number of dependencies on code in trunk (eg FileContext), so applying
them to 20.2 would be a lot of work. Symbolic links are in the first
release candidate of the 0.21 release, so it's probably best to check it
out if you need symlinks.
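For anyone trying the 0.21 release candidate, a hedged sketch of what symlink creation looks like through the new FileContext API (the paths are placeholders, and I'm going from memory on the signature, so double-check the 0.21 javadoc):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class SymlinkExample {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());
    // Create /user/foo/link pointing at /user/foo/target,
    // creating parent directories if necessary.
    fc.createSymlink(new Path("/user/foo/target"),
                     new Path("/user/foo/link"),
                     true);
  }
}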
Is 1.6.0_17 or 1.6.0_20 preferred as the JRE for hadoop? Thank you.
I remember when I was first setting up a hadoop cluster, wondering
exactly what the SSH keys did, why, and whether they were needed.
start-all.sh and stop-all.sh are good for what they do but they are
not very sophisticated.
I wrote a blog about using func with hadoop to remotely manage your nodes.
h
Is the tab the delimiter between records or between keys and values on the
input?
In other words, does the input file look like this:
a\tb
b\tc
c\ta
or does it look like this:
a b\tb c\tc a\t
?
Jeff
On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko wrote:
> Hi everyone,
>
> I hope thi
We have a staging environment where the NameNode shares a machine with a
DataNode and TaskTracker.
Can anyone suggest a way to set different hadoop-env.sh values for DataNode
and TaskTracker without having to duplicate the whole Hadoop conf directory?
For example, to set a different HADOOP_NICENESS.
Hello,
I am new to hadoop. Recently, I installed Hadoop 0.20.2 and it works. I
tried to patch it with the symbolic links patch by Eli (Mr. Eli Collins):
https://issues.apache.org/jira/browse/HDFS-245
(symlink41-hdfs.patch)
I always got an error about a missing Hdfs.java. This is what I did:
>cd
Hi Mark - thanks for the kind words.
For those starting out with Hadoop, there are 10 spots left for this
coming Thursday & Friday (July 22nd & 23rd).
See http://bit.ly/hadoop-bootcamp for details, and http://bit.ly/bootcamp-outline
for an outline.
Thanks,
-- Ken
On Jul 9, 2010, at 8:10am,
I tried again and connected to my task tracker via JMX, but I still don't
see what's wrong.
Here's the log; it was spilling records, then ran out of memory?
2010-07-16 05:27:04,295 INFO org.apache.hadoop.mapred.MapTask: Spilling map
output: buffer full= true
2010-07-16 05:27:04,295 INFO org.apache
I'm seeing this error in my tasktracker's log.
FATAL org.apache.hadoop.mapred.TaskTracker:
Task: attempt_201007160344_0001_m_05_1
- Killed : GC overhead limit exceeded
More detail from my task's log:
FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError
How is your data being split?
Using the mapred.map.tasks property should let you specify how many
maps you want to run (provided your input file is big enough to
be split into multiple chunks).
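For example, with the old JobConf API (a sketch; the value 8 is arbitrary, and the framework treats it only as a hint, since the actual count is driven by the number of input splits):

import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
  public static JobConf configure(Class<?> jobClass) {
    JobConf conf = new JobConf(jobClass);
    // A hint only: the number of maps actually run is driven by the
    // number of input splits, but this raises the target.
    conf.setNumMapTasks(8);
    // Per-node parallelism is governed separately by
    // mapred.tasktracker.map.tasks.maximum in the tasktracker config.
    return conf;
  }
}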
asif
On Jul 16, 2010, at 11:03 AM, Moritz Krog wrote:
Hi everyone,
I was curious if there
Moritz,
I'm not sure what you're doing, but raising the number of mappers in your
configuration isn't a 'hint'.
The number of mappers that you can run will depend on your configuration. You
mention an i7, which is a quad-core CPU, but you don't mention the amount of
memory you have available, o
Hi everyone,
I was curious if there is any option to use Hadoop in single-node mode
in a way that enables the process to use more system resources.
Right now, Hadoop uses one mapper and one reducer, leaving my i7 at
about 20% CPU usage (1 core for Hadoop, .5 cores for my OS), basically
idling.
Dear All,
We recently upgraded from CDH3b1 to b2 and ever since, all our
mapreduce jobs that use the DistributedCache have failed. Typically,
we add files to the cache prior to job startup, using
addCacheFile(URI, conf) and then get them on the other side, using
getLocalCacheFiles(conf). I believe
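For context, the usage pattern described above looks roughly like this (a sketch with made-up paths, using the org.apache.hadoop.filecache.DistributedCache calls named in the mail):

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheUsage {
  // Driver side: register the file before submitting the job.
  public static void addToCache(JobConf conf) throws Exception {
    DistributedCache.addCacheFile(
        new URI("/user/foo/lookup.dat"), conf);
  }

  // Task side (e.g. in Mapper.configure()): resolve the local copies.
  public static Path[] localCopies(JobConf conf) throws Exception {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}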
You need to join these files into one; you could either do a map-side
join or a reduce-side join.
For a map-side join (slightly more involved), look at the example in
org.apache.hadoop.examples.Join.
For a reduce-side join, simply create 2 mappers (one for each file) and
one reduce (as long as you keep key-
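A rough skeleton of that reduce-side approach, using the old mapred API with MultipleInputs (the class names, the A/B tags, and the tab-separated record layout are my own assumptions, not from the original mail): each mapper tags its records with their source file, and all values for a key meet in one reduce call where they can be merged.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class ReduceSideJoin {

  // One mapper per input file; each tags its values with the source.
  public static class MapperA extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable k, Text v,
                    OutputCollector<Text, Text> out, Reporter r)
        throws IOException {
      String[] kv = v.toString().split("\t", 2);
      if (kv.length == 2) out.collect(new Text(kv[0]), new Text("A\t" + kv[1]));
    }
  }

  public static class MapperB extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable k, Text v,
                    OutputCollector<Text, Text> out, Reporter r)
        throws IOException {
      String[] kv = v.toString().split("\t", 2);
      if (kv.length == 2) out.collect(new Text(kv[0]), new Text("B\t" + kv[1]));
    }
  }

  // All tagged values for a key, from both files, arrive together.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> out, Reporter r)
        throws IOException {
      while (values.hasNext()) {
        out.collect(key, values.next()); // merge/emit joined records here
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceSideJoin.class);
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, MapperA.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        TextInputFormat.class, MapperB.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setReducerClass(JoinReducer.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}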