Copying a file to specified nodes

2009-02-10 Thread Rasit OZDAS
Hi, We have thousands of files, each dedicated to a user. (Each user has access to other users' files, but uses it only rarely.) Each user runs map-reduce jobs on the cluster. So we should separate his/her files equally across the cluster, so that every machine can take part in the

Re: Re: Re: Re: Re: Regarding Hadoop multi cluster set-up

2009-02-10 Thread nitesh bhatia
in hadoop-site.xml change <value>master:54311</value> to <value>hdfs://master:54311</value> --nitesh On Tue, Feb 10, 2009 at 9:50 PM, shefali pawar shefal...@rediffmail.com wrote: I tried that, but it is not working either! Shefali On Sun, 08 Feb 2009 05:27:54 +0530 wrote I ran into this
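The fix above amounts to adding the hdfs:// scheme to fs.default.name. A minimal sketch of the relevant hadoop-site.xml fragment, using the master host and port from the thread:

```xml
<!-- hadoop-site.xml: fs.default.name needs the hdfs:// scheme -->
<!-- host "master" and port 54311 are taken from the thread above -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54311</value>
</property>
```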

Weird Results with Streaming

2009-02-10 Thread S D
[I'm starting a new thread because there is sufficiently new/weird info but I've attached an initial thread in case that might be useful.] I'm running Hadoop 0.19.0. I have an input file consisting of several lines that each contain a name and URL. The idea is pretty simple: download the contents

Loading native libraries

2009-02-10 Thread Mimi Sun
Hi, I'm new to Hadoop and I'm wondering what the recommended method is for using native libraries in mapred jobs. I've tried the following separately: 1. set LD_LIBRARY_PATH in .bashrc 2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in hadoop-env.sh 3. set -Djava.library.path=... for

Re: Loading native libraries

2009-02-10 Thread Arun C Murthy
On Feb 10, 2009, at 11:06 AM, Mimi Sun wrote: Hi, I'm new to Hadoop and I'm wondering what the recommended method is for using native libraries in mapred jobs. I've tried the following separately: 1. set LD_LIBRARY_PATH in .bashrc 2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in
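One era-appropriate approach, sketched under the assumption that the cluster runs 0.18/0.19: pass the library path to each task's child JVM via mapred.child.java.opts, since setting LD_LIBRARY_PATH in .bashrc or hadoop-env.sh affects the daemons' environment rather than the spawned task JVMs:

```xml
<!-- Sketch: point each task's child JVM at the native libraries. -->
<!-- /path/to/native/libs is a placeholder. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=/path/to/native/libs</value>
</property>
```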

anybody knows an apache-license-compatible impl of Integer.parseInt?

2009-02-10 Thread Zheng Shao
We need to implement a version of Integer.parseInt/atoi from byte[] instead of String to avoid the high cost of creating a String object. I wanted to take the open jdk code but the license is GPL: http://www.docjar.com/html/api/java/lang/Integer.java.html Does anybody know an implementation
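A clean-room sketch of such a parser (written from scratch here, so no GPL code is involved); it handles an optional sign and decimal digits but, for brevity, omits the overflow checks a production version would need:

```java
// Parse a signed decimal integer directly from a byte[] slice,
// avoiding the cost of materializing a String per parse.
public class ByteArrayParser {
    public static int parseInt(byte[] buf, int off, int len) {
        if (len == 0) throw new NumberFormatException("empty input");
        int i = off, end = off + len;
        boolean negative = buf[i] == '-';
        if (negative || buf[i] == '+') i++;
        if (i == end) throw new NumberFormatException("no digits");
        int result = 0;
        for (; i < end; i++) {
            int d = buf[i] - '0';
            if (d < 0 || d > 9) throw new NumberFormatException("bad digit");
            result = result * 10 + d;  // note: no overflow check in this sketch
        }
        return negative ? -result : result;
    }
}
```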

Best practices on splitting an input line?

2009-02-10 Thread Andy Sautins
I have a question. I've dabbled with different ways of tokenizing an input file line for processing. I've noticed in my somewhat limited tests that there seem to be some pretty reasonable performance differences between different tokenizing methods. For example, roughly it seems to split a
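Much of the difference usually comes down to String.split going through the regex machinery on every call. A minimal regex-free splitter on a single-character delimiter, as a sketch of the kind of alternative being compared:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split on a single character with indexOf/substring,
// skipping the regex interpretation that String.split performs.
public class FastSplit {
    public static List<String> split(String line, char delim) {
        List<String> parts = new ArrayList<String>();
        int start = 0, idx;
        while ((idx = line.indexOf(delim, start)) >= 0) {
            parts.add(line.substring(start, idx));
            start = idx + 1;
        }
        parts.add(line.substring(start));  // trailing field
        return parts;
    }
}
```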

Re: Loading native libraries

2009-02-10 Thread Mimi Sun
I see UnsatisfiedLinkError. Also I'm calling System.getProperty("java.library.path") in the reducer and logging it. The only thing that prints out is ...hadoop-0.18.2/bin/../lib/native/Mac_OS_X-i386-32 I'm using Cascading, not sure if that affects anything. - Mimi On Feb 10, 2009, at

Re: Copying a file to specified nodes

2009-02-10 Thread Jeff Hammerbacher
Hey Rasit, I'm not sure I fully understand your description of the problem, but you might want to check out the JIRA ticket for making the replica placement algorithms in HDFS pluggable (https://issues.apache.org/jira/browse/HADOOP-3799) and add your use case there. Regards, Jeff On Tue, Feb

Re: what's going on :( ?

2009-02-10 Thread Jeff Hammerbacher
Hey Mark, In NameNode.java, the DEFAULT_PORT specified for NameNode RPC is 8020. From my understanding of the code, your fs.default.name setting should have overridden this port to be 9000. It appears your Hadoop installation has not picked up the configuration settings appropriately. You might

File Transfer Rates

2009-02-10 Thread Wasim Bari
Hi, Could someone help me find some real figures (transfer rates) for Hadoop file transfer from the local filesystem to HDFS, S3, etc., and among storage systems (HDFS to S3, etc.)? Thanks, Wasim

Re: File Transfer Rates

2009-02-10 Thread Brian Bockelman
On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote: Hi, Could someone help me to find some real Figures (transfer rate) about Hadoop File transfer from local filesystem to HDFS, S3 etc and among Storage Systems (HDFS to S3 etc) Thanks, Wasim What are you looking for? Maximum possible

Re: File Transfer Rates

2009-02-10 Thread Mark Kerzner
Brian, I have a similar question: why does transfer from a local filesystem to a SequenceFile take so long (about 1 second per megabyte)? Thank you, Mark On Tue, Feb 10, 2009 at 4:46 PM, Brian Bockelman bbock...@cse.unl.edu wrote: On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote: Hi, Could someone

Re: File Transfer Rates

2009-02-10 Thread Brian Bockelman
On Feb 10, 2009, at 4:53 PM, Mark Kerzner wrote: Brian, I have a similar question: why does transfer from a local filesystem to a SequenceFile take so long (about 1 second per megabyte)? Hey Mark, I saw your question about speed the other day ... unfortunately, I didn't have any specific

Re: File Transfer Rates

2009-02-10 Thread Amit Chandel
With my setup I have been able to get 10 MBps write speed and 40 MBps read speed while writing multiple files (ranging from a few bytes to 100 MB) into SequenceFiles, and reading them back. The cluster has a 1 Gbps backbone. On Tue, Feb 10, 2009 at 5:53 PM, Mark Kerzner markkerz...@gmail.com wrote: Brian,
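For anyone wanting to reproduce such numbers, a rough benchmark sketch against the 0.18/0.19-era SequenceFile API (the path, record size, and record count are arbitrary placeholders; this measures raw append throughput only):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: time bulk appends of 1 MB records to a SequenceFile.
public class SeqFileWriteBench {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/tmp/bench.seq");  // placeholder path
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, out, Text.class, BytesWritable.class);
        byte[] payload = new byte[1 << 20];  // 1 MB per record
        BytesWritable value = new BytesWritable(payload);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            writer.append(new Text("key-" + i), value);
        }
        writer.close();
        System.out.println("wrote ~100 MB in "
            + (System.currentTimeMillis() - start) + " ms");
    }
}
```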

Re: Reporter for Hadoop Streaming?

2009-02-10 Thread scruffy323
Do you know how to access those counters programmatically after the job has run? S D-5 wrote: This does it. Thanks! On Thu, Feb 5, 2009 at 9:14 PM, Arun C Murthy a...@yahoo-inc.com wrote: On Feb 5, 2009, at 1:40 PM, S D wrote: Is there a way to use the Reporter interface (or
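For the old JobClient API (0.18/0.19), one way is to hold on to the RunningJob handle, sketched here with hypothetical group and counter names:

```java
// Sketch: JobClient.runJob blocks until completion and returns a
// RunningJob whose counters can then be read. "MyApp" and
// "RECORDS_PROCESSED" are placeholder names.
RunningJob running = JobClient.runJob(conf);
Counters counters = running.getCounters();
long processed = counters.getGroup("MyApp").getCounter("RECORDS_PROCESSED");
System.out.println("records processed: " + processed);
```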

Is there a way to tell whether you're in a map task or a reduce task?

2009-02-10 Thread Matei Zaharia
I'd like to write a combiner that shares a lot of code with a reducer, except that the reducer updates an external database at the end. As far as I can tell, since both combiners and reducers must implement the Reducer interface, there is no way to have this be the same class. Is there a

Re: anybody knows an apache-license-compatible impl of Integer.parseInt?

2009-02-10 Thread Min Zhou
Hey Zheng, Maybe you can try Ragel, which can compile very efficient code for your FSM from a regex. The atoi function produced by Ragel can run faster than glibc's. It also targets Java. http://www.complang.org/ragel/ On Wed, Feb 11, 2009 at 4:18 AM, Zheng Shao zs...@facebook.com wrote: We

Testing with Distributed Cache

2009-02-10 Thread Nathan Marz
I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/file.data at

Re: Is there a way to tell whether you're in a map task or a reduce task?

2009-02-10 Thread Owen O'Malley
On Feb 10, 2009, at 5:20 PM, Matei Zaharia wrote: I'd like to write a combiner that shares a lot of code with a reducer, except that the reducer updates an external database at the end. The right way to do this is to either do the update in the output format or do something like: class
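Besides the output-format route above, an old-API task can also inspect its own task attempt id, which Hadoop exposes to the task in the job configuration (the property name mapred.task.id is an assumption of this sketch). The fourth underscore-separated field of the id distinguishes map from reduce:

```java
// Sketch: infer task type from a task attempt id string such as
// attempt_200902101000_0001_m_000003_0, where the "m"/"r" field
// marks a map or reduce task.
public class TaskTypeCheck {
    public static boolean isMapTask(String taskAttemptId) {
        String[] parts = taskAttemptId.split("_");
        // parts: ["attempt", jobTimestamp, jobId, "m"|"r", taskId, attempt]
        return parts.length >= 4 && "m".equals(parts[3]);
    }
}
```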

stable version

2009-02-10 Thread Vadim Zaliva
Hi! Kind of a novice question, but I need to know which Hadoop version is considered stable. I was trying to run version 0.19, and I've seen numerous stability issues with it. Maybe version 0.18 is better suited for a production environment? Vadim

Re: Testing with Distributed Cache

2009-02-10 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host:

could this be an error in hadoop documentation or a bug

2009-02-10 Thread Mark Kerzner
Hi, the Quick Start (http://hadoop.apache.org/core/docs/current/quickstart.html) has this sample configuration: <name>fs.default.name</name> <value>hdfs://localhost:9000</value> but it does not seem to work: even though the daemons do listen on 9000, the following command always uses 8020: hadoop fs

Re: File Transfer Rates

2009-02-10 Thread Mark Kerzner
Brian, large files using command-line hadoop go fast, so it is something about my computer or network. I won't worry about this now, especially in light of Amit reporting fast writes and reads. Mark On Tue, Feb 10, 2009 at 5:00 PM, Brian Bockelman bbock...@cse.unl.edu wrote: On Feb 10, 2009,

Re: File Transfer Rates

2009-02-10 Thread Brian Bockelman
On Feb 10, 2009, at 11:09 PM, Mark Kerzner wrote: Brian, large files using command-line hadoop go fast, so it is something about my computer or network. I won't worry about this now, especially in light of Amit reporting fast writes and reads. You're creating files using SequenceFile,

Re: File Transfer Rates

2009-02-10 Thread Mark Kerzner
Brian, I saw that Stuart here (http://stuartsierra.com/2008/04/24/a-million-little-files) mentions slow writes to SequenceFile. If so, I will either use his tar approach or try to parallelize it if I can. On Tue, Feb 10, 2009 at 11:14 PM, Brian Bockelman bbock...@cse.unl.edu wrote: On Feb 10, 2009,

Re: File Transfer Rates

2009-02-10 Thread Brian Bockelman
Just to toss out some numbers (and because our users are making interesting numbers right now) Here's our external network router: http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets Here's the application-level transfer graph:

Re: File Transfer Rates

2009-02-10 Thread Mark Kerzner
I say, that's very interesting and useful. On Tue, Feb 10, 2009 at 11:37 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Just to toss out some numbers (and because our users are making interesting numbers right now) Here's our external network router:

Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-10 Thread Wu Wei
We got the same problem as you when using MultipleOutputFormat on both hadoop 0.18 and 0.19. On hadoop 0.18, increasing the xceivers count does not fix the problem. But we found many error messages complaining that xceiverCount exceeded the limit of concurrent xcievers in the datanode (running on