Re: JNI and calling Hadoop jar files

2009-03-23 Thread Jeff Eastman
This looks somewhat similar to my Subtle Classloader Issue from yesterday. I'll be watching this thread too. Jeff Saptarshi Guha wrote: Hello, I'm using some JNI interfaces via R. My classpath contains all the jar files in $HADOOP_HOME and $HADOOP_HOME/lib. My class is public

Subtle Classloader Issue

2009-03-22 Thread Jeff Eastman
I'm trying to run the Dirichlet clustering example from (http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html). The command line: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.1.job org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job ... loads our

Re: RecordReader design heuristic

2009-03-18 Thread Jeff Eastman
Hi Josh, It seemed like you had a conceptual wire crossed and I'm glad to help out. The neat thing about Hadoop mappers is - since they are given a replicated HDFS block to munch on - the job scheduler has replication factor number of node choices where it can run each mapper. This means
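To make the point about replica choices concrete, here is a toy sketch in plain Python (not Hadoop code). The function name `pick_node`, the node names, and the "first free replica wins" policy are all assumptions for illustration; the real scheduler is more involved.

```python
def pick_node(replica_nodes, busy_nodes):
    """Return the first node holding a replica of the block that is not busy.

    replica_nodes: nodes that store a copy of the block (one per replica).
    busy_nodes:    nodes with no free map slot right now (hypothetical input).
    Returns None when every replica holder is busy, in which case a real
    scheduler would have to fall back to a non-local node.
    """
    for node in replica_nodes:
        if node not in busy_nodes:
            return node
    return None


# With a replication factor of 3 there are three candidate nodes per block.
replicas = ["node1", "node2", "node3"]
chosen = pick_node(replicas, busy_nodes={"node1"})
all_busy = pick_node(replicas, busy_nodes={"node1", "node2", "node3"})
```

The more replicas a block has, the more likely it is that at least one of the nodes holding it is free to run the mapper locally.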

Re: RecordReader design heuristic

2009-03-17 Thread Jeff Eastman
If you send a single point to the mapper, your mapper logic will be clean and simple. Otherwise you will need to loop over your block of points in the mapper. In Mahout clustering, I send the mapper individual points because the input file is point-per-line. In either case, the record reader
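A rough sketch of the two mapper shapes described above, written in plain Python rather than against the Hadoop API. The function names, the whitespace-separated point format, and the `(key, index)` output key are assumptions made only for this illustration.

```python
def parse_point(line):
    """Parse one whitespace-separated line into a list of floats."""
    return [float(x) for x in line.split()]


def map_single_point(key, line):
    """Record reader delivers one point per record: no loop in the mapper."""
    yield key, parse_point(line)


def map_point_block(key, block):
    """Record reader delivers a block of points: the mapper must loop."""
    for index, line in enumerate(block.splitlines()):
        if line.strip():  # skip blank lines inside the block
            yield (key, index), parse_point(line)


single = list(map_single_point(0, "1.0 2.0"))
block = list(map_point_block(0, "1 2\n\n3 4\n"))
```

Either way the per-point work is the same; the difference is only whether the iteration lives in the record reader or in the mapper.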

Re: RecordReader design heuristic

2009-03-17 Thread Jeff Eastman
Message- From: Jeff Eastman [mailto:j...@windwardsolutions.com] Sent: Tuesday, March 17, 2009 5:11 PM To: core-user@hadoop.apache.org Subject: Re: RecordReader design heuristic If you send a single point to the mapper, your mapper logic will be clean and simple. Otherwise you will need

Re: Hadoop 0.17 AMI?

2008-05-22 Thread Jeff Eastman
, 0.17.0 was released yesterday, from what I can tell. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jeff Eastman [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, May 21, 2008 11:18:56 AM Subject: Re: Hadoop 0.17 AMI? Any

Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. Both are similar in content but their differences reflect the rapid evolution of the project in the month that separates them in

Re: Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
find the code? Thanks! Tanton On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman [EMAIL PROTECTED] wrote: I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. Both are similar in content

Re: Hadoop 0.17 AMI?

2008-05-21 Thread Jeff Eastman
been released yet. I (or Mukund) is hoping to call a vote this afternoon or tomorrow. Nige On May 14, 2008, at 12:36 PM, Jeff Eastman wrote: I'm trying to bring up a cluster on EC2 using (http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the version to use because of the DNS

Hadoop 0.17 AMI?

2008-05-14 Thread Jeff Eastman
I'm trying to bring up a cluster on EC2 using (http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the version to use because of the DNS improvements, etc. Unfortunately, I cannot find a public AMI with this build. Is there one that I'm not finding or do I need to create one? Jeff

RE: Hadoop input path - can it have subdirectories

2008-04-01 Thread Jeff Eastman
My experience running with the Java API is that subdirectories in the input path do cause an exception, so the streaming file input processing must be different. Jeff Eastman -Original Message- From: Norbert Burger [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2008 9:46 AM

RE: Hadoop summit video capture?

2008-03-25 Thread Jeff Eastman
I don't know if there was a live version, but the entire summit was recorded on video so it will be available. BTW, it was an overwhelming success and the speakers are all well worth waiting for. I personally got a lot of positive feedback and interest in Mahout, so expect your inbox to explode in

RE: Performance / cluster scaling question

2008-03-21 Thread Jeff Eastman
21, 2008 2:36 PM To: core-user@hadoop.apache.org Subject: Re: Performance / cluster scaling question 3 - the default one... Jeff Eastman wrote: What's your replication factor? Jeff -Original Message- From: André Martin [mailto:[EMAIL PROTECTED] Sent: Friday, March 21

RE: Performance / cluster scaling question

2008-03-21 Thread Jeff Eastman
-user@hadoop.apache.org Subject: Re: Performance / cluster scaling question Right, I totally forgot about the replication factor... However sometimes I even noticed ratios of 5:1 for block numbers to files... Is the delay for block deletion/reclaiming an intended behavior? Jeff Eastman wrote

RE: why the value of attribute in map function will change ?

2008-03-16 Thread Jeff Eastman
Consider that your mapper and driver execute in different JVMs and cannot share static values. Jeff -Original Message- From: ma qiang [mailto:[EMAIL PROTECTED] Sent: Saturday, March 15, 2008 10:35 PM To: core-user@hadoop.apache.org Subject: why the value of attribute in map function
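The same effect can be shown with two operating-system processes in plain Python, as an analogy only (this is not Hadoop code). The variable name `SETTING` and the child script are hypothetical; passing the value on the command line stands in for whatever explicit channel the job uses, such as the job configuration.

```python
import subprocess
import sys

# Module-level value in the "driver" process, analogous to a Java static.
SETTING = "default"

# The "mapper" runs as a separate process. It starts from its own initial
# state and only learns the driver's value if it is passed in explicitly.
CHILD = (
    "SETTING = 'default'\n"
    "import sys\n"
    "print(SETTING, sys.argv[1])\n"
)


def run_child(explicit_value):
    """Run the child process and return (its own SETTING, the passed value)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, explicit_value],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


SETTING = "changed-in-driver"  # only changes the driver's memory
seen_static, seen_explicit = run_child(SETTING)
```

The child still sees its own initial value for the variable, and only the explicitly passed value crosses the process boundary.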

RE: Map/Reduce Type Mismatch error

2008-03-07 Thread Jeff Eastman
The key provided by the default FileInputFormat is not Text, but an integer offset into the split (which is not very useful IMHO). Try changing your mapper back to WritableComparable, Text. If you are expecting the file name to be the key, you will (I think) need to write your own InputFormat. Jeff
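For anyone unsure what an offset key looks like, here is a minimal plain-Python sketch (not the Hadoop implementation) of a line reader that pairs each line with the byte offset where it starts. The function name `line_records` and the sample data are assumptions for illustration.

```python
import io


def line_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of input."""
    offset = 0
    for raw in io.BytesIO(data):
        # The key is where the line starts; the value is the line itself.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # advance past the line and its newline


records = list(line_records(b"alpha\nbeta\ngamma\n"))
```

A mapper that declares Text as its input key type will not match keys like these, which is consistent with the type mismatch described in this thread.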

RE: Decompression Blues

2008-02-26 Thread Jeff Eastman
Message- From: Arun C Murthy [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 26, 2008 3:47 PM To: core-user@hadoop.apache.org Subject: Re: Decompression Blues Jeff, On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote: I'm processing a number of .gz compressed Apache and other logs using Hadoop

RE: Best Practice?

2008-02-11 Thread Jeff Eastman
about this, but now I won't. Thanks, Jeff -Original Message- From: Owen O'Malley [mailto:[EMAIL PROTECTED] Sent: Monday, February 11, 2008 10:40 AM To: core-user@hadoop.apache.org Subject: Re: Best Practice? On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote: I'm trying to wait until

Best Practice?

2008-02-09 Thread Jeff Eastman
What's the best way to get additional configuration arguments to my mappers and reducers? Jeff

RE: Best Practice?

2008-02-09 Thread Jeff Eastman
Well, I tried saving the OutputCollectors in an instance variable and writing to them during close and it seems to work. Jeff -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Saturday, February 09, 2008 4:21 PM To: core-user@hadoop.apache.org Subject: RE: Best

RE: Starting up a larger cluster

2008-02-08 Thread Jeff Eastman
I noticed that phenomenon right off the bat. Is that a designed feature or just an unhappy consequence of how blocks are allocated? Ted compensates for this by aggressively rebalancing his cluster often by adjusting the replication up and down, but I wonder if an improvement in the allocation

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Oops, should be TaskTracker. -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 12:24 PM To: core-user@hadoop.apache.org Subject: RE: Starting up a larger cluster Hi Ben, I've been down this same path recently and I think I understand your

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Hi Ben, I've been down this same path recently and I think I understand your issues: 1) Yes, you need the hadoop folder to be in the same location on each node. Only the master node actually uses the slaves file, to start up DataNode and JobTracker daemons on those nodes. 2) If you did not