RE: HDFS to S3 copy problems

2009-05-08 Thread Nowland, Ian
Hi Tom, Not creating a temp file is ideal, as it saves you from "wasting" local disk by writing an output file just before uploading the same file to Amazon S3. There are a few problems, though: 1) Amazon S3 PUTs need the file length up front. You could use a chunked POST, but
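To make point 1) concrete, here is a minimal Java sketch (standard library only, no S3 client) of why a client-side buffer is attractive. Spooling data of unknown length to a local temp file yields an exact byte count that can then be sent as the PUT's Content-Length. The class and method names are hypothetical, and this is not the NativeS3FileSystem implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: spool a stream of unknown length to a temp file so the
// exact Content-Length is known before a single S3 PUT. Not Hadoop code.
class SpoolBeforePut {

    // Where the spooled bytes live and how many there are.
    static final class Spooled {
        final Path file;
        final long contentLength;

        Spooled(Path file, long contentLength) {
            this.file = file;
            this.contentLength = contentLength;
        }
    }

    static Spooled spool(InputStream data) throws IOException {
        Path tmp = Files.createTempFile("s3-upload-", ".tmp");
        // REPLACE_EXISTING is needed because createTempFile already created the file.
        long length = Files.copy(data, tmp, StandardCopyOption.REPLACE_EXISTING);
        return new Spooled(tmp, length);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello, s3".getBytes(StandardCharsets.UTF_8);
        Spooled s = spool(new ByteArrayInputStream(payload));
        // A real uploader would now issue a PUT with this Content-Length
        // and stream the temp file as the request body.
        System.out.println("Content-Length: " + s.contentLength);
        Files.delete(s.file);
    }
}
```

Running main prints `Content-Length: 9` for the nine-byte sample payload. The cost is exactly the local-disk write Ian describes; avoiding it would require knowing the length in advance or a different upload mechanism.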

Re: Huge DataNode Virtual Memory Usage

2009-05-08 Thread Chris Collins
Stefan, there was a nasty memory leak in 1.6.x before 1.6 update 10. It manifested itself during major GC. We saw this on Linux and Solaris, and it improved dramatically with an upgrade. C On May 8, 2009, at 6:12 PM, Stefan Will wrote: Hi, I just ran into something rather scary: One of my datano

Error when start hadoop cluster.

2009-05-08 Thread nguyenhuynh.mr
Hi all! I cannot start HDFS successfully. I checked the log file and found the following message: 2009-05-09 08:17:55,026 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = haris1.asnet.local/

Huge DataNode Virtual Memory Usage

2009-05-08 Thread Stefan Will
Hi, I just ran into something rather scary: One of my datanode processes, which I'm running with -Xmx256M and a maximum of 4095 Xceiver threads, had a virtual memory size of over 7GB (!). I know that the VM size on Linux isn't necessarily equal to the actual memory used, but I wouldn't exp

Re: ClassNotFoundException

2009-05-08 Thread georgep
Sorry, I misspelled your name, Jason. George georgep wrote: > > Hi Joe, > > Thank you for the reply, but do I need to include every supporting jar > file in the application path? What is the --? > > George > > > jason hadoop wrote: >> >> 1) when running under windows, include the cygwin bin di

Re: ClassNotFoundException

2009-05-08 Thread georgep
Hi Joe, Thank you for the reply, but do I need to include every supporting jar file in the application path? What is the --? George jason hadoop wrote: > > 1) when running under windows, include the cygwin bin directory in your > windows path environment variable > 2) eclipse is not so good a

Re: Setting thread stack size for child JVM

2009-05-08 Thread Philip Zeyliger
On Fri, May 8, 2009 at 1:11 PM, Ken Krugler wrote: > You can set the mapred.child.java.opts on a per-job basis >> either via -D mapred.child.java.opts="java options" or via >> conf.set("mapred.child.java.opts", "java options"). >> >> Note: the conf.set must be done before the job is submitted. >> >

Re: Most efficient way to support shared content among all mappers

2009-05-08 Thread jason hadoop
Most of the people with this need are using some variant of memcached, or another distributed hash table. On Fri, May 8, 2009 at 10:07 AM, Joe wrote: > > Hi, > As a newcomer to Hadoop, I wonder whether there is an efficient way to support shared > content among all mappers. For example, to implement a neural net

Re: Make money from Hadoop ?

2009-05-08 Thread Amr Awadallah
i.e. found a company that provides DFS/MapReduce service? http://www.cloudera.com/ Also: http://aws.amazon.com/elasticmapreduce/ and http://stampedehost.com (has anybody used these guys? any feedback?) Also ScaleUnlimited provides consulting/training for hadoop services: http://www.scaleunlimi

Shorten interval between datanode going down and being detected as dead by namenode?

2009-05-08 Thread nesvarbu No
Hi All, I've been testing HDFS with a 3-datanode cluster, and I've noticed that if I stop one datanode, I can still read all the files, but the "hadoop dfs -copyFromLocal" command fails. In the namenode web interface I can see that it still thinks that datanode is alive and basically detects that it's
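For context, a minimal sketch of how long the NameNode waits before declaring a DataNode dead, assuming the formula used by FSNamesystem in Hadoop releases of this era: expiry = 2 × heartbeat.recheck.interval + 10 × dfs.heartbeat.interval. With the defaults (5-minute recheck, 3-second heartbeat) that comes to 630 seconds, so a stopped DataNode can look alive for over ten minutes. Verify the property names and defaults against your own version before tuning them; the class below is hypothetical.

```java
// Hypothetical helper mirroring the assumed expiry formula:
// expiry = 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval
class HeartbeatExpiry {

    // heartbeatIntervalSec: dfs.heartbeat.interval (seconds)
    // recheckIntervalMs: heartbeat.recheck.interval (milliseconds)
    static long expireIntervalMs(long heartbeatIntervalSec, long recheckIntervalMs) {
        long heartbeatIntervalMs = heartbeatIntervalSec * 1000L;
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        // Defaults: 3 s heartbeat, 5 min recheck -> 630000 ms (10.5 minutes)
        System.out.println(expireIntervalMs(3, 5 * 60 * 1000L));
        // Lowering only the recheck interval to 15 s -> 60000 ms (1 minute)
        System.out.println(expireIntervalMs(3, 15 * 1000L));
    }
}
```

The recheck interval dominates, so it is the first knob to look at; very short values mean more NameNode work and a higher chance of marking a merely slow DataNode as dead.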

Re: Setting thread stack size for child JVM

2009-05-08 Thread Ken Krugler
You can set mapred.child.java.opts on a per-job basis either via -D mapred.child.java.opts="java options" or via conf.set("mapred.child.java.opts", "java options"). Note: the conf.set must be done before the job is submitted. On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger wrote: You could

Logging in Hadoop Stream jobs

2009-05-08 Thread Mayuran Yogarajah
How do people handle logging in a Hadoop stream job? I'm currently looking at using syslog for this but would like to know of other ways people are doing this currently. thanks

Re: ClassNotFoundException

2009-05-08 Thread jason hadoop
1) When running under Windows, include the Cygwin bin directory in your Windows PATH environment variable. 2) Eclipse is not so good at submitting supporting jar files; in your application launch path, add -libjars path/hadoop--examples.jar. On Fri, May 8, 2009 at 10:13 AM, georgep wrote: > > Whe

Re: Setting thread stack size for child JVM

2009-05-08 Thread jason hadoop
You can set mapred.child.java.opts on a per-job basis either via -D mapred.child.java.opts="java options" or via conf.set("mapred.child.java.opts", "java options"). Note: the conf.set must be done before the job is submitted. On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger wrote: > You could

Re: large files vs many files

2009-05-08 Thread jason hadoop
Is it possible that two tasks are trying to write to the same file path? On Fri, May 8, 2009 at 11:46 AM, Sasha Dolgy wrote: > Hi Tom (or anyone else), > Will SequenceFile allow me to avoid problems with concurrent writes to the > file? I still continue to get the following exceptions/errors in
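Jason's question points at a common cause of AlreadyBeingCreatedException: two writers creating the same HDFS path. A minimal sketch of one way to avoid that, assuming each writer knows its task attempt ID: derive the file name from the attempt ID so that no two attempts (including retries and speculative attempts) share a path. The naming scheme, base directory, and attempt IDs below are illustrative, not taken from Sasha's job.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical naming scheme: one output file per task attempt, so no two
// writers ever try to create the same HDFS path.
class UniqueTaskPaths {

    static String outputPath(String baseDir, String taskAttemptId) {
        return baseDir + "/" + taskAttemptId + ".seq";
    }

    public static void main(String[] args) {
        // Illustrative attempt IDs; the last one is a second attempt of task 000001.
        String[] attempts = {
            "attempt_200905080000_0001_m_000000_0",
            "attempt_200905080000_0001_m_000001_0",
            "attempt_200905080000_0001_m_000001_1"
        };
        Set<String> paths = new HashSet<>();
        for (String attempt : attempts) {
            paths.add(outputPath("/tmp/example-output", attempt));
        }
        // Three attempts yield three distinct paths.
        System.out.println(paths.size());
    }
}
```

A retried attempt leaves its predecessor's file behind, so a real job would still need to keep only the output of the attempt that succeeded.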

Re: Using Hadoop API through python

2009-05-08 Thread Jeff Turner
I am a big fan of Happy, developed by freebase.com: http://code.google.com/p/happy/ http://research.freebase.com/ jeff Aditya Desai wrote: Hi All, Is there any way that I can access the Hadoop API through Python? I am aware that Hadoop streaming can be used to create a mapper and reducer in

Hadoop 0.19.1 with -d64 option on Solaris 5.10 and java6 doesn't start with exception "java.io.IOException: Invalid argument"

2009-05-08 Thread Alexandra Alecu
Hello, We have installed Hadoop 0.19.1 on an 8 node cluster for an initial test. We are planning to store and process data of approx. 400 TB with Hadoop and HBase. The Hadoop NameNode is on one of the machines, called s3, and all the nodes are DataNodes (s2, s3, s4, s5, s6, s7, s8, s9). All th

Re: Setting thread stack size for child JVM

2009-05-08 Thread Philip Zeyliger
You could add "-Xss" to the "mapred.child.java.opts" configuration setting. That's controlling the Java stack size, which I think is the relevant bit for you. Cheers, -- Philip mapred.child.java.opts -Xmx200m Java opts for the task tracker child processes. The following symbol, if pre
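Philip's suggestion amounts to editing one space-separated options string. A small, hypothetical Java helper that adds (or replaces) the -Xss flag in an existing mapred.child.java.opts value; the result would then be passed with -D mapred.child.java.opts="..." or conf.set("mapred.child.java.opts", ...) before the job is submitted. The stack size used here is only an illustrative value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: add or replace the -Xss flag in a
// mapred.child.java.opts value such as the default "-Xmx200m".
class ChildOpts {

    static String withStackSize(String opts, String stackSize) {
        List<String> kept = new ArrayList<>();
        for (String token : opts.trim().split("\\s+")) {
            // Drop empty tokens and any existing -Xss setting.
            if (!token.isEmpty() && !token.startsWith("-Xss")) {
                kept.add(token);
            }
        }
        kept.add("-Xss" + stackSize);
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Illustrative stack size; choose one that suits your threads.
        System.out.println(withStackSize("-Xmx200m", "256k"));
    }
}
```

Running main prints `-Xmx200m -Xss256k`. Note that -Xss sizes Java thread stacks inside the JVM, which is not the same thing as the shell-level `ulimit -s` Ken originally asked about.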

Re: large files vs many files

2009-05-08 Thread Sasha Dolgy
Hi Tom (or anyone else), Will SequenceFile allow me to avoid problems with concurrent writes to the file? I still continue to get the following exceptions/errors in HDFS: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /foo

Decrementing counters

2009-05-08 Thread Ken Krugler
Hi all, Is there a specific reason why calling Reporter.incrCounter() with a negative amount would fail? I see that the javadocs (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html) say the amount must be non-negative, but in looking at the code I don't see

Setting thread stack size for child JVM

2009-05-08 Thread Ken Krugler
Hi there, For a very specific type of reduce task, we currently need to use a large number of threads. To avoid running out of memory, I'd like to constrain the Linux stack size via a "ulimit -s xxx" shell script command before starting up the JVM. I could do this for the entire system at bo

Re: Infinite Loop Resending status from task tracker

2009-05-08 Thread Lance Riedel
Hi Todd, Sorry, my response got hung up in my outbox for a couple of days... arghh. Confirmed that 1) we are not running out of space and 2) our mapred.local.dir directory is not in /tmp. Not sure if this is an EC2 problem with a mounted drive? We had the same thing happen again, exact same

Re: ClassNotFoundException

2009-05-08 Thread georgep
When run as a Java application, the trace: 09/05/08 10:08:49 WARN fs.FileSystem: uri=file:/// javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupIn

Most efficient way to support shared content among all mappers

2009-05-08 Thread Joe
Hi, As a newcomer to Hadoop, I wonder whether there is an efficient way to support shared content among all mappers. For example, to implement a neural network algorithm, I want the NN data structure to be accessible by all mappers. Thanks for your comments! - Joe

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-08 Thread Steve Loughran
Grace wrote: Thanks, all, for your replies. I have run several times with different Java options for Map/Reduce tasks. However, there is not much difference. Following is an example of my test settings: Test A: -Xmx1024m -server -XXlazyUnlocking -XlargePages -XgcPrio:deterministic -XXallocPrefetch

Re: HDFS to S3 copy problems

2009-05-08 Thread Ken Krugler
Perhaps we should revisit the implementation of NativeS3FileSystem so that it doesn't always buffer the file on the client. We could have an option to make it write directly to S3. Thoughts? Regarding the problem with HADOOP-3733, you can work around it by setting fs.s3.awsAccessKeyId and fs.s3.a

Re: ClassNotFoundException

2009-05-08 Thread georgep
Trace: Exception in thread "main" java.lang.ClassNotFoundException: mapreduce.test.WordCount at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188

Re: Make money from Hadoop ?

2009-05-08 Thread Andy Liu
http://www.cloudera.com/ On Fri, May 8, 2009 at 9:43 AM, PORTO aLET wrote: > Hi All, > > Just wondering if anybody has any idea about making money from using > hadoop? > i.e. found a company that provides DFS/MapReduce service? or something > like > that? > Or maybe something else? >

RE: PIG and Hive

2009-05-08 Thread Ricky Ho
Great! Glad to see things are merging... At that point, PIG and Hive are even more competitive with each other. Rgds, Ricky -----Original Message----- From: Ashish Thusoo [mailto:athu...@facebook.com] Sent: Thursday, May 07, 2009 11:11 AM To: core-user@hadoop.apache.org Subject: RE: PIG and H

Make money from Hadoop ?

2009-05-08 Thread PORTO aLET
Hi All, Just wondering if anybody has any ideas about making money from using Hadoop? i.e., found a company that provides a DFS/MapReduce service, or something like that? Or maybe something else?

Re: On using Eclipse IDE

2009-05-08 Thread Reza
I had the same problem as well. Luckily I found another method: This tutorial works. http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html > Dear Users, > > I configure Eclipse Europa according to Yahoo tutorial on hadoop: > http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html > > an

Re: Mixing s3, s3n and hdfs

2009-05-08 Thread Tom White
Hi Kevin, The s3n filesystem treats each file as a single block; however, you may be able to split files by setting the number of mappers appropriately (or setting mapred.max.split.size in the new MapReduce API in 0.20.0). S3 supports range requests, and the s3n implementation uses them, so it woul
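Tom's point about range requests can be pictured with a short sketch: one S3 object is divided into byte ranges of at most a chosen split size, and each range could be fetched independently with an HTTP Range header (whose byte positions are inclusive). This is a simplified illustration, not Hadoop's FileInputFormat split logic; the class name, file length, and split size are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration: cut one object of fileLength bytes into ranges of
// at most maxSplitSize bytes, expressed as HTTP Range header values.
class RangeSplits {

    static List<String> rangeHeaders(long fileLength, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += maxSplitSize) {
            // HTTP byte ranges are inclusive, hence the -1.
            long last = Math.min(start + maxSplitSize, fileLength) - 1;
            ranges.add("bytes=" + start + "-" + last);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Made-up sizes: a 250-byte object split into 100-byte pieces.
        System.out.println(rangeHeaders(250, 100));
    }
}
```

Running main prints `[bytes=0-99, bytes=100-199, bytes=200-249]`. A real record reader must also handle records that straddle a range boundary, which this sketch ignores.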

Re: HDFS to S3 copy problems

2009-05-08 Thread Tom White
Perhaps we should revisit the implementation of NativeS3FileSystem so that it doesn't always buffer the file on the client. We could have an option to make it write directly to S3. Thoughts? Regarding the problem with HADOOP-3733, you can work around it by setting fs.s3.awsAccessKeyId and fs.s3.aw

Re: All keys went to single reducer in WordCount program

2009-05-08 Thread Tom White
> mapred.reduce.tasks 1 You've only got one reduce task, as Jason correctly surmised. Try setting it using -D mapred.reduce.tasks=2 when you run your job, or by calling JobConf#setNumReduceTasks() Tom On Fri, May 8, 2009 at 7:46 AM, Foss User wrote: > On Thu, May 7, 2009 at 9:45 PM, jason
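The behaviour Tom describes follows from the default partitioner's arithmetic. Below is a minimal sketch of the formula Hadoop's HashPartitioner uses, (hashCode & Integer.MAX_VALUE) % numReduceTasks, applied to plain Java strings. Hadoop's Text keys hash differently, so the exact partition numbers will not match a real job, but with one reduce task every key lands in partition 0. The class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Same arithmetic as Hadoop's default HashPartitioner, applied to String keys
// (Hadoop's Text keys hash differently, so real partition numbers will differ).
class PartitionDemo {

    static int partition(Object key, int numReduceTasks) {
        // Mask the sign bit so negative hash codes still give a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"hadoop", "map", "reduce", "word", "count"};
        for (int reducers : new int[] {1, 2}) {
            Map<Integer, Integer> keysPerPartition = new TreeMap<>();
            for (String w : words) {
                keysPerPartition.merge(partition(w, reducers), 1, Integer::sum);
            }
            // With 1 reducer, every key lands in partition 0.
            System.out.println(reducers + " reduce task(s): " + keysPerPartition);
        }
    }
}
```

Raising the reduce count (via -D mapred.reduce.tasks=2 or JobConf#setNumReduceTasks()) is what spreads the keys; the partitioner itself needs no change.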

Mixing s3, s3n and hdfs

2009-05-08 Thread Kevin Peterson
Currently, we are running our cluster in EC2 with HDFS stored on the local (i.e. transient) disk. We don't want to deal with EBS, because it complicates being able to spin up additional slaves as needed. We're looking at moving to a combination of s3 (block) or s3n for data that we care about, and

Re: ClassNotFoundException

2009-05-08 Thread tim robertson
Can you post the entire error trace, please? On Fri, May 8, 2009 at 9:40 AM, George Pang wrote: > Dear users, > I got "ClassNotFoundException" when running the WordCount example on Hadoop > using Eclipse. Does anyone know where the problem is? > > Thank you! > > George >

ClassNotFoundException

2009-05-08 Thread George Pang
Dear users, I got "ClassNotFoundException" when running the WordCount example on Hadoop using Eclipse. Does anyone know where the problem is? Thank you! George