Hi Tom,
Not creating a temp file is the ideal, as it saves you from "wasting" the
local hard disk by writing an output file just before uploading it to Amazon
S3. There are a few problems though:
1) Amazon S3 PUTs need the file length up front. You could use a chunked POST,
but
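To make the length constraint concrete, here is a minimal sketch using plain
java.net.HttpURLConnection rather than any particular S3 library, with a
placeholder bucket URL and authentication headers omitted: a fixed-length PUT
needs the byte count before anything is streamed, which is why the output
usually gets buffered to a local file first.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class S3PutLengthSketch {
  public static void main(String[] args) throws Exception {
    byte[] body = "example object contents".getBytes("UTF-8");
    HttpURLConnection conn = (HttpURLConnection)
        new URL("https://example-bucket.s3.amazonaws.com/key").openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("PUT");
    // S3 wants Content-Length up front, so the full size must be known here.
    conn.setFixedLengthStreamingMode(body.length);
    // conn.setChunkedStreamingMode(4096); // avoids knowing the length, but a
    // chunked upload is not accepted for a plain S3 PUT.
    OutputStream out = conn.getOutputStream();
    out.write(body);
    out.close();
    System.out.println("HTTP " + conn.getResponseCode());
  }
}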
Stefan, there was a nasty memory leak in 1.6.x before 1.6.0_10. It
manifested itself during major GC. We saw this on Linux and Solaris,
and it dramatically improved with an upgrade.
C
On May 8, 2009, at 6:12 PM, Stefan Will wrote:
Hi,
I just ran into something rather scary: One of my datano
Hi all!
I cannot start hdfs successful. I checked log file and found following
message:
2009-05-09 08:17:55,026 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = haris1.asnet.local/
Hi,
I just ran into something rather scary: One of my datanode processes that
I'm running with -Xmx256M, and a maximum number of Xceiver threads of 4095,
had a virtual memory size of over 7GB (!). I know that the VM size on Linux
isn't necessarily equal to the actual memory used, but I wouldn't exp
Sorry, I misspelled your name, Jason.
George
georgep wrote:
>
> Hi Joe,
>
> Thank you for the reply, but do I need to include every supporting jar
> file in the application path? What is the --?
>
> George
>
>
> jason hadoop wrote:
>>
>> 1) when running under windows, include the cygwin bin di
Hi Joe,
Thank you for the reply, but do I need to include every supporting jar file
in the application path? What is the --?
George
jason hadoop wrote:
>
> 1) when running under windows, include the cygwin bin directory in your
> windows path environment variable
> 2) eclipse is not so good a
On Fri, May 8, 2009 at 1:11 PM, Ken Krugler wrote:
> You can set the mapred.child.java.opts on a per job basis
>> either via -D mapred.child.java.opts="java options" or via
>> conf.set("mapred.child.java.opts", "java options").
>>
>> Note: the conf.set must be done before the job is submitted.
>>
>
Most of the people with this need are using some variant of memcached, or
other distributed hash table.
On Fri, May 8, 2009 at 10:07 AM, Joe wrote:
>
> Hi,
> As a newcomer to Hadoop, I wonder if there is any efficient way to support
> shared content among all mappers. For example, to implement a neural net
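As a rough illustration of that pattern, here is a sketch of a mapper that
pulls a read-only copy of the shared structure out of memcached once per task.
It assumes the spymemcached client, a hypothetical cache host, and a
hypothetical key "nn-weights" holding the serialized data.

import java.io.IOException;
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SharedStateMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private byte[] sharedWeights;   // read-only copy of the shared structure

  @Override
  public void configure(JobConf job) {
    try {
      // Fetch the shared data once per task, not once per record.
      MemcachedClient cache =
          new MemcachedClient(new InetSocketAddress("cache-host", 11211));
      sharedWeights = (byte[]) cache.get("nn-weights");
      cache.shutdown();
    } catch (IOException e) {
      throw new RuntimeException("Could not reach memcached", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... use sharedWeights while processing each record ...
  }
}

For data that never changes during the job, Hadoop's DistributedCache is the
other common choice: ship the file with the job and read it in configure().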
i.e. found a company that provides DFS/MapReduce service?
http://www.cloudera.com/
Also:
http://aws.amazon.com/elasticmapreduce/
and
http://stampedehost.com
(any body used these guys? any feedback?)
Also ScaleUnlimited provides consulting/training for hadoop services:
http://www.scaleunlimi
Hi All,
I've been testing hdfs with a 3-datanode cluster, and I've noticed that if I
stop 1 datanode I can still read all the files, but the "hadoop dfs
-copyFromLocal" command fails. In the namenode web interface I can see that
it still thinks that datanode is alive and basically detects that it's
You can set the mapred.child.java.opts on a per job basis
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").
Note: the conf.set must be done before the job is submitted.
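For example, a minimal sketch with the old JobConf API and a hypothetical
driver class; the property is set before the job is handed to JobClient:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChildOptsExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ChildOptsExample.class);
    // per-job JVM options for the task child processes, set before submission
    conf.set("mapred.child.java.opts", "-Xmx512m");
    // ... mapper/reducer classes, input and output paths ...
    JobClient.runJob(conf);
  }
}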
On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger wrote:
You could
How do people handle logging in a Hadoop streaming job?
I'm currently looking at using syslog for this but would like to know of
other ways
people are doing this currently.
thanks
1) when running under windows, include the cygwin bin directory in your
windows path environment variable
2) eclipse is not so good at submitting supporting jar files; in your
application launch path, add -libjars path/hadoop--examples.jar.
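Note that -libjars, like -D, is handled by Hadoop's GenericOptionsParser, so
the driver normally needs to go through ToolRunner (or parse the generic
options itself) for it to take effect. A minimal sketch, with a hypothetical
driver class name:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCountDriver.class);
    // ... set input/output paths, mapper and reducer classes ...
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips -libjars/-D before handing the remaining args to run().
    System.exit(ToolRunner.run(new WordCountDriver(), args));
  }
}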
On Fri, May 8, 2009 at 10:13 AM, georgep wrote:
>
> Whe
You can set the mapred.child.java.opts on a per job basis
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").
Note: the conf.set must be done before the job is submitted.
On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger wrote:
> You could
Is it possible that two tasks are trying to write to the same file path?
On Fri, May 8, 2009 at 11:46 AM, Sasha Dolgy wrote:
> Hi Tom (or anyone else),
> Will SequenceFile allow me to avoid problems with concurrent writes to the
> file? I still continue to get the following exceptions/errors in
i am a big fan of happy, developed by freebase.com
http://code.google.com/p/happy/
http://research.freebase.com/
jeff
Aditya Desai wrote:
Hi All,
Is there any way that I can access the hadoop API through python? I am aware
that hadoop streaming can be used to create a mapper and reducer in
Hello,
We have installed Hadoop 0.19.1 on an 8 node cluster for an initial test. We
are planning to store and process data of approx. 400 TB with Hadoop and
HBase.
The Hadoop NameNode is on one of the machines, called s3, and all the nodes
are DataNodes (s2, s3, s4, s5, s6, s7, s8, s9). All th
You could add "-Xss" to the "mapred.child.java.opts" configuration
setting. That controls the Java stack size, which I think is the
relevant bit for you.
Cheers,
-- Philip
mapred.child.java.opts
-Xmx200m
Java opts for the task tracker child processes.
The following symbol, if pre
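A minimal sketch of that suggestion, assuming a hypothetical 1 MB per-thread
stack; the -Xss value simply rides along in the same mapred.child.java.opts
string as the heap setting:

import org.apache.hadoop.mapred.JobConf;

public class StackSizeExample {
  public static void limitThreadStacks(JobConf conf) {
    // smaller per-thread stacks so a task running many threads needs less memory
    conf.set("mapred.child.java.opts", "-Xmx200m -Xss1m");
  }
}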
Hi Tom (or anyone else),
Will SequenceFile allow me to avoid problems with concurrent writes to the
file? I still continue to get the following exceptions/errors in hdfs:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file /foo
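A SequenceFile by itself won't change the single-writer behaviour of HDFS, so
if two tasks really are opening the same path (as Tom asks above), the usual
fix is to give each task its own file. A rough sketch, with a hypothetical
output directory and using the attempt id the framework sets for each task:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class PerTaskOutput {
  public static FSDataOutputStream openForTask(JobConf job) throws IOException {
    String attempt = job.get("mapred.task.id");       // unique per task attempt
    Path path = new Path("/tmp/out/part-" + attempt); // no two tasks share a path
    FileSystem fs = path.getFileSystem(job);
    return fs.create(path, false);                    // fail rather than overwrite
  }
}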
Hi all,
Is there a specific reason why calling Reporter.incrCounter() with a
negative amount would fail? I see that the javadocs
(http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html)
say the amount must be non-negative, but in looking at the code I
don't see
Hi there,
For a very specific type of reduce task, we currently need to use a
large number of threads.
To avoid running out of memory, I'd like to constrain the Linux stack
size via a "ulimit -s xxx" shell script command before starting up
the JVM. I could do this for the entire system at bo
Hi Todd,
Sorry, my response got hung up in my outbox for a couple of days.. arghh
Confirmed that 1) we are not running out of space and 2) that our
mapred.local.dir directory is not in /tmp
Not sure if this is an ec2 problem with a mounted drive?
We had the same thing happen again, exact same
When run as a java application, the trace:
09/05/08 10:08:49 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed: Cannot run program
"whoami": CreateProcess error=2, The system cannot find the file specified
at
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupIn
Hi,
As a newcomer to Hadoop, I wonder if there is any efficient way to support
shared content among all mappers. For example, to implement a neural network
algorithm, I want the NN data structure to be accessible by all mappers.
Thanks for your comments!
- Joe
Grace wrote:
Thanks, all, for your replies.
I have run several times with different Java options for the Map/Reduce
tasks. However, there is not much difference.
Following is the example of my test setting:
Test A: -Xmx1024m -server -XXlazyUnlocking -XlargePages
-XgcPrio:deterministic -XXallocPrefetch
Perhaps we should revisit the implementation of NativeS3FileSystem so
that it doesn't always buffer the file on the client. We could have an
option to make it write directly to S3. Thoughts?
Regarding the problem with HADOOP-3733, you can work around it by
setting fs.s3.awsAccessKeyId and fs.s3.a
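For reference, the workaround looks roughly like this, with placeholder
credentials; the secret-key property name used here is the standard one that
pairs with fs.s3.awsAccessKeyId:

import org.apache.hadoop.conf.Configuration;

public class S3CredentialsExample {
  public static void configure(Configuration conf) {
    conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY_ID");
    conf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY");
  }
}

The same two properties can equally be put in the site configuration file.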
Trace:
Exception in thread "main" java.lang.ClassNotFoundException:
mapreduce.test.WordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188
http://www.cloudera.com/
On Fri, May 8, 2009 at 9:43 AM, PORTO aLET wrote:
> Hi All,
>
> Just wondering if anybody has any idea about making money from using
> hadoop?
> i.e. found a company that provides DFS/MapReduce service ? or something
> like
> that?
> Or maybe something else?
>
Great! Glad to see things are merging... At that point, PIG and Hive are
even more competitive with each other.
Rgds, Ricky
-Original Message-
From: Ashish Thusoo [mailto:athu...@facebook.com]
Sent: Thursday, May 07, 2009 11:11 AM
To: core-user@hadoop.apache.org
Subject: RE: PIG and H
Hi All,
Just wondering if anybody has any idea about making money from using hadoop?
i.e. found a company that provides DFS/MapReduce service ? or something like
that?
Or maybe something else?
I had the same problem as well.
Luckily I found another method:
This tutorial works.
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
> Dear Users,
>
> I configured Eclipse Europa according to the Yahoo tutorial on hadoop:
> http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html
>
> an
Hi Kevin,
The s3n filesystem treats each file as a single block, however you may
be able to split files by setting the number of mappers appropriately
(or setting mapred.max.split.size in the new MapReduce API in 0.20.0).
S3 supports range requests, and the s3n implementation uses them, so
it woul
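As a concrete example of the second option, a sketch against the 0.20 API Tom
mentions, with a hypothetical 64 MB target split size:

import org.apache.hadoop.conf.Configuration;

public class S3nSplitExample {
  public static void limitSplitSize(Configuration conf) {
    // cap the split size so a large s3n file is read by more than one mapper
    conf.setLong("mapred.max.split.size", 64L * 1024 * 1024);
  }
}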
Perhaps we should revisit the implementation of NativeS3FileSystem so
that it doesn't always buffer the file on the client. We could have an
option to make it write directly to S3. Thoughts?
Regarding the problem with HADOOP-3733, you can work around it by
setting fs.s3.awsAccessKeyId and fs.s3.aw
> mapred.reduce.tasks 1
You've only got one reduce task, as Jason correctly surmised. Try
setting it using
-D mapred.reduce.tasks=2
when you run your job, or by calling JobConf#setNumReduceTasks()
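i.e. something along these lines in the driver (a minimal sketch, hypothetical
class name):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoReducersExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TwoReducersExample.class);
    conf.setNumReduceTasks(2);   // same effect as -D mapred.reduce.tasks=2
    // ... mapper/reducer classes, input and output paths ...
    JobClient.runJob(conf);
  }
}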
Tom
On Fri, May 8, 2009 at 7:46 AM, Foss User wrote:
> On Thu, May 7, 2009 at 9:45 PM, jason
Currently, we are running our cluster in EC2 with HDFS stored on the local
(i.e. transient) disk. We don't want to deal with EBS, because it
complicates being able to spin up additional slaves as needed. We're looking
at moving to a combination of s3 (block) or s3n for data that we care about,
and
Can you post the entire error trace please?
On Fri, May 8, 2009 at 9:40 AM, George Pang wrote:
> Dear users,
> I got a "ClassNotFoundException" when running the WordCount example on
> hadoop using Eclipse. Does anyone know where the problem is?
>
> Thank you!
>
> George
>
Dear users,
I got a "ClassNotFoundException" when running the WordCount example on hadoop
using Eclipse. Does anyone know where the problem is?
Thank you!
George