Re: stable version

2009-02-11 Thread Rasit OZDAS
Yes, version 0.18.3 is the most stable one. It has added patches, without unproven new functionality. 2009/2/11 Owen O'Malley omal...@apache.org: On Feb 10, 2009, at 7:21 PM, Vadim Zaliva wrote: Maybe version 0.18 is better suited for a production environment? Yahoo is mostly on 0.18.3 +

Re: Reporter for Hadoop Streaming?

2009-02-11 Thread Tom White
You can retrieve them from the command line using bin/hadoop job -counter job-id group-name counter-name Tom On Wed, Feb 11, 2009 at 12:20 AM, scruffy323 steve.mo...@gmail.com wrote: Do you know how to access those counters programmatically after the job has run? S D-5 wrote: This does
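For the programmatic side of the question, here is a rough sketch against the old org.apache.hadoop.mapred client API (the job id, group name, and counter name are placeholders, and exact method names can vary slightly between releases):

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class PrintCounter {
      public static void main(String[] args) throws Exception {
        String jobId = args[0];   // e.g. something like job_200902110000_0001
        String group = args[1];   // counter group name
        String name  = args[2];   // counter name
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(jobId);   // look the job up by id
        Counters counters = job.getCounters();
        long value = counters.getGroup(group).getCounter(name);
        System.out.println(group + ":" + name + " = " + value);
      }
    }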

Re: anybody knows an apache-license-compatible impl of Integer.parseInt?

2009-02-11 Thread Steve Loughran
Zheng Shao wrote: We need to implement a version of Integer.parseInt/atoi from byte[] instead of String to avoid the high cost of creating a String object. I wanted to take the open jdk code but the license is GPL: http://www.docjar.com/html/api/java/lang/Integer.java.html Does anybody know
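For reference, a minimal clean-room sketch of the kind of routine being asked for: parsing a signed decimal int straight out of a byte[] slice without building a String (overflow near Integer.MIN_VALUE/MAX_VALUE is not handled here):

    /** Parse a signed decimal integer from bytes[start, start+length). */
    public static int parseInt(byte[] bytes, int start, int length) {
      if (length <= 0) {
        throw new NumberFormatException("empty input");
      }
      int i = start;
      int end = start + length;
      boolean negative = false;
      if (bytes[i] == '-' || bytes[i] == '+') {
        negative = (bytes[i] == '-');
        i++;
      }
      if (i == end) {
        throw new NumberFormatException("sign with no digits");
      }
      int result = 0;
      for (; i < end; i++) {
        int digit = bytes[i] - '0';
        if (digit < 0 || digit > 9) {
          throw new NumberFormatException("invalid digit at offset " + i);
        }
        result = result * 10 + digit;  // note: no overflow check in this sketch
      }
      return negative ? -result : result;
    }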

Re: File Transfer Rates

2009-02-11 Thread Steve Loughran
Brian Bockelman wrote: Just to toss out some numbers (and because our users are making interesting numbers right now) Here's our external network router: http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets Here's the

Hadoop setup questions

2009-02-11 Thread bjday
Good morning everyone, I have a question about the correct setup for Hadoop. I have 14 Dell computers in a lab. Each is connected to the internet and each is independent of the others. All run CentOS. Logins are handled by NIS. If userA logs into the master and starts the daemons and userB logs

Re: stable version

2009-02-11 Thread Vadim Zaliva
The particular problem I am having is this one: https://issues.apache.org/jira/browse/HADOOP-2669 I am observing it in version 19. Could anybody confirm that it has been fixed in 18, as Jira claims? I am wondering why the bug fix for this problem might have been committed to the 18 branch but not 19.

Finding small subset in very large dataset

2009-02-11 Thread Thibaut_
Hi, Let's say the smaller subset has name A. It is a relatively small collection of 100 000 entries (could also be only 100), with nearly no payload as value. Collection B is a big collection with 10 000 000 entries (each key of A also exists in collection B), where the value for each key is

Re: stable version

2009-02-11 Thread Raghu Angadi
Vadim Zaliva wrote: The particular problem I am having is this one: https://issues.apache.org/jira/browse/HADOOP-2669 I am observing it in version 19. Could anybody confirm that it has been fixed in 18, as Jira claims? I am wondering why the bug fix for this problem might have been committed to

Re: Finding small subset in very large dataset

2009-02-11 Thread Amit Chandel
Are the keys in collection B unique? If so, I would like to try this approach: for each key/value pair of collection B, make a file out of it, with the file name given by the MD5 hash of the key and the value as its content, and then store all these files in a HAR archive. The HAR archive will create an
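As an illustration of the naming step above, a small sketch of turning a key into an MD5-based file name (the helper name is made up):

    import java.security.MessageDigest;

    /** Hypothetical helper: hex MD5 digest of a key, used as the file name. */
    public static String md5FileName(byte[] key) throws Exception {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(key)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    }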

can't edit the file that mounted by fuse_dfs by editor

2009-02-11 Thread zhuweimin
Hey all, I was trying to edit a file mounted by fuse_dfs with the vi editor, but the contents could not be saved. The command is like the following: [had...@vm-centos-5-shu-4 src]$ vi /mnt/dfs/test.txt The error message from the system log (/var/log/messages) is the following: Feb 12 09:53:48

Re: Finding small subset in very large dataset

2009-02-11 Thread Aaron Kimball
I don't see why a HAR archive needs to be involved. You can use a MapFile to create a scannable index over a SequenceFile and do lookups that way. But if A is small enough to fit in RAM, then there is a much simpler way: Write it out to a file and disseminate to all mappers via the
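The reply is truncated, but one common way to disseminate a small file to every mapper is the DistributedCache. A rough sketch against the 0.18-era API follows; the class name, the one-key-per-line file format, and the Text/Text record types of B are assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    /** Load the small key set A from the cache, emit only records of B whose keys are in A. */
    public class SubsetFilterMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

      private final Set<String> keysOfA = new HashSet<String>();

      public void configure(JobConf conf) {
        try {
          Path[] cached = DistributedCache.getLocalCacheFiles(conf);
          BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
          String line;
          while ((line = in.readLine()) != null) {
            keysOfA.add(line.trim());  // one key of A per line (assumed format)
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("Could not load key set A from cache", e);
        }
      }

      public void map(Text key, Text value, OutputCollector<Text, Text> out,
                      Reporter reporter) throws IOException {
        if (keysOfA.contains(key.toString())) {  // keep only records of B present in A
          out.collect(key, value);
        }
      }
    }

On the job side, the file holding A would be registered before submission with something like DistributedCache.addCacheFile(new URI("/user/someone/A.txt"), conf); the path is a placeholder.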

Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just 1.7GB of memory. The input size is 50GB, and as a result, my mapper splits it up automatically to 786 map tasks. This runs fine. However, I am setting the reduce task number to 18. This is where I get a java heap

Re: Reducer Out of Memory

2009-02-11 Thread Rocks Lei Wang
Maybe you need to allocate more JVM memory using the parameter -Xmx1024m On Thu, Feb 12, 2009 at 10:56 AM, Kris Jirapinyo kjirapi...@biz360.com wrote: Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just 1.7GB of memory. The input size is 50GB, and as a result, my

Re: Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
Darn that send button. Anyways, so I was wondering if my understanding is correct. There will be exactly as many output files as the number of reduce tasks I set. Thus, in my output directory from the reducer, I should always see only 18 files. However, if my understanding is

Re: Reducer Out of Memory

2009-02-11 Thread Kris Jirapinyo
I tried that, but with 1.7GB, that will not allow me to run 1 mapper and 1 reducer concurrently (as I think when you do -Xmx1024m it tries to reserve that physical memory?). Thus, to be safe, I set it to -Xmx768m. The error I get when I do 1024m is this: java.io.IOException: Cannot run program
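For reference, a minimal sketch of the two knobs being discussed, set through the old JobConf API (the values are simply the ones mentioned in this thread, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class JobMemorySetup {
      public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        conf.set("mapred.child.java.opts", "-Xmx768m"); // heap for each map/reduce child JVM
        conf.setNumReduceTasks(18); // also determines the number of part-nnnnn output files
        return conf;
      }
    }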

Re: Hadoop setup questions

2009-02-11 Thread Amar Kamat
bjday wrote: Good morning everyone, I have a question about the correct setup for Hadoop. I have 14 Dell computers in a lab. Each is connected to the internet and each is independent of the others. All run CentOS. Logins are handled by NIS. If userA logs into the master and starts the daemons

Re: Hadoop setup questions

2009-02-11 Thread james warren
Like Amar said. Try adding <property> <name>dfs.permissions</name> <value>false</value> </property> to your conf/hadoop-site.xml file (or flip the value in hadoop-default.xml), restart your daemons and give it a whirl. cheers, -jw On Wed, Feb 11, 2009 at 8:44 PM, Amar Kamat ama...@yahoo-inc.com wrote:

Re: Loading native libraries

2009-02-11 Thread Rasit OZDAS
I also have the same problem. It would be wonderful if someone had some info about this. Rasit 2009/2/10 Mimi Sun m...@rapleaf.com: I see UnsatisfiedLinkError. Also I'm calling System.getProperty(java.library.path) in the reducer and logging it. The only thing that prints out is

Re: Loading native libraries

2009-02-11 Thread Arun C Murthy
On Feb 10, 2009, at 12:24 PM, Mimi Sun wrote: I see UnsatisfiedLinkError. Also I'm calling System.getProperty(java.library.path) in the reducer and logging it. The only thing that prints out is ...hadoop-0.18.2/bin/../lib/native/Mac_OS_X-i386-32 I'm using Cascading, not sure if that
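The reply is cut off here, but one approach worth sketching (an assumption, not necessarily what Arun goes on to suggest) is to point java.library.path at the native library directory for the child JVMs via mapred.child.java.opts; the path below is a placeholder and must exist on every task tracker node:

    import org.apache.hadoop.mapred.JobConf;

    public class NativeLibSetup {
      public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // -Xmx200m keeps the default child heap; the library path is a placeholder.
        conf.set("mapred.child.java.opts",
                 "-Xmx200m -Djava.library.path=/path/to/native/libs");
        return conf;
      }
    }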

Re: what's going on :( ?

2009-02-11 Thread Rasit OZDAS
Hi Mark, Try adding an extra property to that file and check whether Hadoop recognizes it. This way you can find out whether Hadoop is actually using your configuration file. 2009/2/10 Jeff Hammerbacher ham...@cloudera.com: Hey Mark, In NameNode.java, the DEFAULT_PORT specified for NameNode RPC is 8020.
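A quick way to perform that check from Java (the marker property name is made up; if it prints null, the hadoop-site.xml you edited is not the one on the classpath):

    import org.apache.hadoop.conf.Configuration;

    public class ConfigCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration(); // loads hadoop-default.xml and hadoop-site.xml
        System.out.println("my.test.property = " + conf.get("my.test.property"));
        System.out.println("fs.default.name  = " + conf.get("fs.default.name"));
      }
    }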