Re: Read Little Endian Input File Format

2012-07-09 Thread Owen O'Malley
On Mon, Jul 9, 2012 at 1:33 PM, Mike S wrote: > The input file to my M/R job is a file with binary data (20 mix of > int, long, float and double per record) which are all saved in little > endian. I have implement my custom record reader to read a record and > to do so I am currently using the By

Re: Which hadoop version should I install in a production environment

2012-07-03 Thread Owen O'Malley
On Tue, Jul 3, 2012 at 1:19 PM, Pablo Musa wrote: > Which is the latest stable hadoop version to install in a production > environment with package manager support? > The current stable version of Hadoop is 1.0.3. It is available as both source and rpms from here: http://hadoop.apache.org/comm

Re: hadoop kerberos security / unix kdc

2012-06-29 Thread Owen O'Malley
On Fri, Jun 29, 2012 at 2:07 PM, Tony Dean wrote:
> Hadoop 1.0.3, JDK1.6.0_21 with JCE export jars for strong encryption.
You need to move up to a JDK later than 1.6.0_27. I'd suggest 1.6.0_31. For details, look at: http://wiki.apache.org/hadoop/HadoopJavaVersions
-- Owen

Re: hadoop kerberos security / unix kdc

2012-06-29 Thread Owen O'Malley
On Fri, Jun 29, 2012 at 1:50 PM, Tony Dean wrote:
> First, I’d like to thank the community for the time and effort they put into sharing their knowledge…
Which version of Hadoop are you running? Which JDK are you using? You probably need HDFS-2617 and JDK 1.6.0_31.
-- Owen

Re: Need example programs other than wordcount for hadoop

2012-06-29 Thread Owen O'Malley
On Fri, Jun 29, 2012 at 9:46 AM, Saravanan Nagarajan < saravanan.nagarajan...@gmail.com> wrote: > HI all, > > I ran word count examples in hadoop and it's very good starting point for > hadoop.But i am looking for more programs with advanced concept . If you > have any programs or suggestion, plea

Re: Hadoop security

2012-06-25 Thread Owen O'Malley
On Mon, Jun 25, 2012 at 8:02 AM, Fabio Pitzolu wrote: > Hi community! > I have a question concerning the Hadoop security, in particular I need some > advice to configure the Kerberos authentication: > > 1 - I have an Active Directory domain, do I have to connect the Linux > Hadoop nodes to the AD

Re: Terasort

2012-05-14 Thread Owen O'Malley
On Mon, May 14, 2012 at 10:40 AM, Barry, Sean F wrote: > I am having a bit of trouble understanding how the Terasort benchmark works, > especially the fundamentals of how the data is sorted. If the data is being > split into many chunks wouldn't it all have to be re-integrated back into the > e

Re: Is TeraGen's generated data deterministic?

2012-04-14 Thread Owen O'Malley
Yes, both versions of teragen are completely deterministic. They each use a random number generator with a fixed seed.
-- Owen
On Apr 14, 2012, at 1:53 PM, David Erickson wrote:
> Hi we are doing some benchmarking of some of our infrastructure and are using TeraGen/TeraSort to do the benchm

Re: Very strange Java Collection behavior in Hadoop

2012-03-19 Thread Owen O'Malley
On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak wrote: > Hi Owen O'Malley, > Thank you for that Instant reply. It's working now. Can you explain me > what you mean by "input to reducer is reused" in little detail? Each time the statement "Text value = val

Re: Very strange Java Collection behavior in Hadoop

2012-03-19 Thread Owen O'Malley
On Mon, Mar 19, 2012 at 10:52 PM, madhu phatak wrote:
> Hi All,
> I am using Hadoop 0.20.2. I am observing a Strange behavior of Java Collection's. I have following code in reducer
That is my fault. *sigh* The input to the reducer is reused. Replace:
list.add(value);
with:
list.add(new
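The reuse described above can be reproduced without Hadoop at all. The following is only a sketch: MutableText is a hypothetical stand-in for a mutable type such as Text, and collect() imitates a framework that refills one instance for every value. Neither name comes from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable value type such as Hadoop's Text.
    static final class MutableText {
        private String value = "";
        MutableText() {}
        MutableText(MutableText other) { this.value = other.value; } // copy constructor
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Imitates the framework: ONE instance is refilled for every input value.
    // With copy == false the list stores the shared instance each time.
    static List<String> collect(String[] inputs, boolean copy) {
        MutableText reused = new MutableText();
        List<MutableText> list = new ArrayList<>();
        for (String in : inputs) {
            reused.set(in);
            list.add(copy ? new MutableText(reused) : reused);
        }
        List<String> out = new ArrayList<>();
        for (MutableText t : list) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "c"};
        System.out.println(collect(inputs, false)); // all entries alias the same object
        System.out.println(collect(inputs, true));  // each entry is an independent copy
    }
}
```

Without the copy, every list entry ends up showing the last value seen, which matches the symptom reported in the thread.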

Re: High quality hadoop logo?

2012-03-01 Thread Owen O'Malley
On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley wrote: > Sorry, false alarm.  I was looking at the popup thumbnails in google image > search.  If I click all the way through, there are some high quality > versions available.  Why is the version on the Apache site (and the Wikipedia > page) so poor?

Re: Hadoop and Hibernate

2012-02-28 Thread Owen O'Malley
On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts wrote: > If I create an executable jar file that contains all dependencies required > by the MR job do all said dependencies get distributed to all nodes? You can make a single jar and that will be distributed to all of the machines that run the t

Re: Hadoop Opportunity

2012-02-19 Thread Owen O'Malley
The Hadoop PMC can create a Hadoop ecosystem specific job list if we want one. Would people find it useful? -- Owen On Feb 19, 2012, at 3:20 AM, Harsh J wrote: > Job-related mails must always go to the dedicated j...@apache.org > mailing list. For more information, see > http://www.apachenews.

Re: Hadoop Example in java

2012-02-17 Thread Owen O'Malley
On Fri, Feb 17, 2012 at 1:00 AM, vikas jain wrote: > > Hi All, > > I am looking for example in java for hadoop. I have done lots of search but > I have only found word count. Are there any other exapmple for the same. If you want to find them on the web, you can look in subversion: http://svn.ap

Re: Sorting text data

2012-02-08 Thread Owen O'Malley
On Wed, Feb 8, 2012 at 5:59 AM, sangroya wrote: > Hi, > > I tried to run the sort example by specifying the input format. But I got > the following error, while running it. You actually need a different mapper to make the whole thing work. I made a patch for Sort.java that should do the trick. h

Re: Regarding security in hadoop

2012-01-30 Thread Owen O'Malley
On Mon, Jan 30, 2012 at 12:45 AM, renuka wrote: > > > Hi All, > > As per the below link security feature•Security (strong authentication via > Kerberos authentication protocol) is added in hadoop 1.0.0 release. > http://www.infoq.com/news/2012/01/apache-hadoop-1.0.0 Actually, it was first release

Re: Automate Hadoop installation

2011-12-07 Thread Owen O'Malley
On Mon, Dec 5, 2011 at 2:32 AM, praveenesh kumar wrote: > Hi all, > > Can anyone guide me how to automate the hadoop installation/configuration > process? We are rapidly making progress on Ambari. Ambari is an Apache project that will deploy, configure, and administer Hadoop clusters with all of

Re: Authentication

2011-11-18 Thread Owen O'Malley
On Fri, Nov 18, 2011 at 6:52 AM, Jignesh Patel wrote: > > Harsh, > Does that mean to implement authentication we need to have oozie jars with > hadoop jars? To be clear, all of the functionality is in Hadoop. The user "oozie" was used as an example and we should probably change the example to loo

Re: source code of hadoop 0.20.2

2011-11-15 Thread Owen O'Malley
On Tue, Nov 15, 2011 at 5:23 AM, Uma Maheswara Rao G wrote: > http://svn.apache.org/repos/asf/hadoop/common/branches/ > all branches code will be under this. > You can choose required one. Actually, you are looking for the tag: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.2/

Re: Hadoop MapReduce Poster

2011-10-31 Thread Owen O'Malley
On Mon, Oct 31, 2011 at 6:14 AM, Mathias Herberts < mathias.herbe...@gmail.com> wrote: > Hi, > > I'm in the process of putting together a 'Hadoop MapReduce Poster' so > my students can better understand the various steps of a MapReduce job > as ran by Hadoop. Most of it is probably beneath the r

Re: Combiners

2011-10-31 Thread Owen O'Malley
On Mon, Oct 31, 2011 at 5:41 AM, Mathias Herberts <mathias.herbe...@gmail.com> wrote:
> Thanks for listing the 5 requirements, if you don't mind I'll add them to the Hadoop MapReducer Poster.
Sure.

Re: Combiners

2011-10-30 Thread Owen O'Malley
On Sat, Oct 29, 2011 at 3:52 AM, Mathias Herberts < mathias.herbe...@gmail.com> wrote: > My question is, what happens if the combiner outputs different keys > than what it is being fed? The output of the combiner will suffer two > flaws: > > 1. It won't be sorted > 2. It might end up in the wrong

Re: Sudoku Example Program Inputs

2011-10-18 Thread Owen O'Malley
On Tue, Oct 18, 2011 at 3:20 PM, Owen O'Malley wrote:
> On Tue, Oct 18, 2011 at 1:23 PM, Adam wrote:
>> Does anyone know the syntax for the sudoku example program input and if I can find some datasets for it?
> There is an example puzzle

Re: Sudoku Example Program Inputs

2011-10-18 Thread Owen O'Malley
On Tue, Oct 18, 2011 at 1:23 PM, Adam wrote:
> Does anyone know the syntax for the sudoku example program input and if I can find some datasets for it?
There is an example puzzle at: puzzle1.dta

Re: Disable Sorting?

2011-09-10 Thread Owen O'Malley
On Sat, Sep 10, 2011 at 12:33 PM, Meng Mao wrote: > Is there a way to collate the possibly large number of map output files, > though? You can make fewer mappers by setting the mapred.min.split.size to define the smallest input that will be given to a mapper. There isn't currently a way of get

Re: Binary content

2011-09-01 Thread Owen O'Malley
On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia wrote:
> Thanks! Is there a specific tutorial I can focus on to see how it could be done?
Take the word count example and change its output format to be SequenceFileOutputFormat:
job.setOutputFormatClass(SequenceFileOutputFormat.class);
and it wil

Re: Skipping Bad Records in M/R Job

2011-08-09 Thread Owen O'Malley
On Tue, Aug 9, 2011 at 5:28 PM, Maheshwaran Janarthanan < ashwinwa...@hotmail.com> wrote: > > Hi, > > I have written a Map reduce job which uses third party libraries to process > unseen data which makes job fail because of errors in records. > > I realized 'Skipping Bad Records' feature in Hadoop

Re: Is it ok to manually delta ~hadoop/mapred/local/taskTracker/archive/*

2011-08-09 Thread Owen O'Malley
On Tue, Aug 9, 2011 at 8:34 AM, Robert J Berger wrote: > Looks like I have something not configured particularly well so that > mapred/local/taskTracker/archive is a local filesystem and its filling > things up. > Configure the size of the distributed cache on each node using local.cache.size, w

Re: Which release to use?

2011-07-15 Thread Owen O'Malley
On Jul 15, 2011, at 7:58 AM, Michael Segel wrote: > So while you can use the Apache release, it may not make sense for your > organization to do so. (Said as I don the flame retardant suit...) I obviously disagree. *grin* Apache Hadoop 0.20.203.0 is the most stable and well tested release and

Re: Which release to use?

2011-07-14 Thread Owen O'Malley
On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote: > I'm a newbie and I am confused by the Hadoop releases. > I thought 0.21.0 is the latest & greatest release that I > should be using but I noticed 0.20.203 has been released > lately, and 0.21.X is marked "unstable, unsupported". > > Should

Re: Can Mapper get paths of inputSplits ?

2011-05-13 Thread Owen O'Malley
On Thu, May 12, 2011 at 10:16 PM, Mark question wrote:
> Who's filling the map.input.file and map.input.offset (ie. which class) so I can extend it to have a function to return these strings.
MapTask.updateJobWithSplit is the method doing the work.
-- Owen

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 9:23 PM, Mark question wrote:
> So there is no way I can see the other possible splits (start+length)? like some function that returns strings of map.input.file and map.input.offset of the other mappers ?
No, there isn't any way to do it using the public API.

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 8:59 PM, Mark question wrote: > Hi > > I'm using FileInputFormat which will split files logically according to > their sizes into splits. Can the mapper get a pointer to these splits? and > know which split it is assigned ? > Look at http://hadoop.apache.org/common/docs

Re: Stable Release

2011-04-29 Thread Owen O'Malley
On Thu, Apr 28, 2011 at 12:28 PM, Juan P. wrote: > Hi guys, > I wanted to know exactly which was the latest stable release of Hadoop. 0.20.2 is the current stable release. I actually rolled a 0.20.3 release candidate, but didn't call a vote on it since 0.20.203.0 will quickly supersede it. I've

Re: Applications creates bigger output than input?

2011-04-29 Thread Owen O'Malley
On Fri, Apr 29, 2011 at 5:02 AM, elton sky wrote: > For my benchmark purpose, I am looking for some non-trivial, real life > applications which creates *bigger* output than its input. Trivial example > I > can think about is cross join... > As you say, almost all cross join jobs have that proper

Re: Does it mean that single disk failure causes the whole datanode to fail?

2011-04-26 Thread Owen O'Malley
On Tue, Apr 26, 2011 at 6:46 AM, Xiaobo Gu wrote: > How can I download the patched version of hadoop, I only know the > initial versions of each release from the official download website. The 0.20.204 version is still being tested. I'd expect a release next month. You can look at the sources a

Re: Sequence.Sorter Performance

2011-04-25 Thread Owen O'Malley
The SequenceFile sorter is ok. It used to be the sort used in the shuffle. *grin* Make sure to set io.sort.factor and io.sort.mb to appropriate values for your hardware. I'd usually use io.sort.factor as 25 * drives and io.sort.mb is the amount of memory you can allocate to the sorting. -- Owen

Re: Does it mean that single disk failure causes the whole datanode to fail?

2011-04-25 Thread Owen O'Malley
On Mon, Apr 25, 2011 at 9:17 AM, Mathias Herberts < mathias.herbe...@gmail.com> wrote: > You can configure how many failed volumes a datanode can tolerate. That code doesn't handle the corner cases very well. In particular, we've had problems with nodes with bad drives causing problems when the

Re: Hadoop client jar dependencies

2011-03-01 Thread Owen O'Malley
On Tue, Mar 1, 2011 at 2:33 AM, Bryan Keller wrote: > I am writing an application that submits job jar files to the job tracker. > The application writes some files to HDFS among other things before > triggering the job. I am using the hadoop-core library in the Maven central > repository. Unfort

Slides and videos from Feb 2011 Bay Area HUG posted

2011-02-24 Thread Owen O'Malley
The February 2011 Bay Area HUG had a record turnout with 336 people signed up. We had two great talks:
* The next generation of Hadoop MapReduce by Arun Murthy
* The next generation of Hadoop Operations at Facebook by Andrew Ryan
The videos and slides are posted on Yahoo's blog: http://develo

Re: MRUnit and Herriot

2011-02-02 Thread Owen O'Malley
Please keep user questions off of general and use the user lists instead. This is defined here . MRUnit is for testing user's MapReduce applications. Herriot is for testing the framework in the presence of failures. -- Owen On Wed, Feb 2, 2011 at 5:44

Re: Problem write on HDFS

2011-01-26 Thread Owen O'Malley
Please direct user questions to common-user@hadoop.apache.org. -- Owen On Tue, Jan 25, 2011 at 3:27 AM, Alessandro Binhara wrote: > I build a servlet with a hadoop... > i think that tomcat enviroment will be find a hadoop-core-0.20.2.jar .. but > a get a same error > > *ype* Exception report > >

Re: Why Hadoop uses HTTP for file transmission between Map and Reduce?

2011-01-13 Thread Owen O'Malley
At some point, we'll replace Jetty in the shuffle, because it imposes too much overhead and go to Netty or some other lower level library. I don't think that using HTTP adds that much overhead although it would be interesting to measure that. -- Owen

Re: SequenceFiles and streaming or hdfs thrift api

2011-01-04 Thread Owen O'Malley
On Tue, Jan 4, 2011 at 10:02 AM, Marc Sturlese wrote: > The thing is I want this file to be a SequenceFile, where the key should be > a Text and the value a Thrift serialized object. Is it possible to reach > that goal? > I've done the work to support that in Java. See my patch in HADOOP-6685. It

Re: Caution using Hadoop 0.21

2010-11-15 Thread Owen O'Malley
I'm very sorry that you got burned by the change. Most MapReduce applications don't extend the Context classes since those are objects that are provided by the framework. In 0.21, we've marked which interfaces are stable and which are still evolving. We try and hold all of the interfaces stable, bu

Re: Problem with custom WritableComparable

2010-11-12 Thread Owen O'Malley
On Thu, Nov 11, 2010 at 4:29 PM, Aaron Baff wrote: > I'm having a problem with a custom WritableComparable that I created to use > as a Key object. I basically have a number of identifier's with a timestamp, > and I'm wanting to group the Identifier's together in the reducer, and order > the reco

Re: Prime number of reduces vs. linear hash function

2010-10-27 Thread Owen O'Malley
Prime numbers only matter if the hash function is bad and you are using a hash partitioner. In most cases, the hashes are fine and thus the number of reduces can be dictated by the desired degree of parallelism. -- Owen
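A small, Hadoop-free sketch of the point about bad hashes. The partition() formula below (mask the sign bit, then take the modulus) is the usual shape of a hash partitioner. The key set, multiples of 4, is an assumption chosen to play the role of a bad hash.

```java
public class HashPartitionDemo {
    // Usual hash-partitioner shape: clear the sign bit, then take the modulus.
    static int partition(Object key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Counts how many of the given keys land in each partition.
    static int[] histogram(int[] keys, int numReduces) {
        int[] counts = new int[numReduces];
        for (int k : keys) counts[partition(k, numReduces)]++;
        return counts;
    }

    // Builds keys 0, step, 2*step, ... whose Integer hash codes share a common factor.
    static int[] multiplesOf(int step, int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = i * step;
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = multiplesOf(4, 56);
        // 8 reduces share a factor with the keys: only partitions 0 and 4 receive data.
        System.out.println(java.util.Arrays.toString(histogram(keys, 8)));
        // 7 reduces (a prime): the same keys spread evenly.
        System.out.println(java.util.Arrays.toString(histogram(keys, 7)));
    }
}
```

With a well-mixed hash the first case would not be skewed either, which is why the number of reduces can normally be chosen for parallelism alone.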

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread Owen O'Malley
The block sizes were 2G. The input format made splits that were more than a block because that led to better performance. -- Owen

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread Owen O'Malley
Block sizes larger than 2**31 are known to not work. I haven't ever tracked down the problem, just set my block size to be smaller than that. -- Owen

Re: Hadoop - Solaris

2010-10-17 Thread Owen O'Malley
On Oct 16, 2010, at 1:08 PM, Bruce Williams wrote: If anyone with experience with Hadoop and Solaris can contact me off list, even to just say I am doing it and it is OK it would be appreciated. LinkedIn is currently running Hadoop on Solaris. Hopefully, Allen Wittenauer can get back to yo

Re: Architecture

2010-10-13 Thread Owen O'Malley
Here is a presentation from Hadoop Summit 2009 "HBase goes Realtime" that gives numbers for latency with HBase. Redirecting to common-user. http://bit.ly/aJEwYj -- Owen

Re: Why hadoop is written in java?

2010-10-10 Thread Owen O'Malley
The real answer is that Hadoop was written originally to support Nutch, which is in Java. Java has mostly served us well: it is reliable, has extremely powerful libraries, and is far easier to debug than C++. There are issues of course... Java's interface to the OS is very weak, object memory over

Re: Generating an Index for sequence files

2010-10-02 Thread Owen O'Malley
On Sat, Oct 2, 2010 at 5:25 AM, Harsh J wrote: > Maybe you should take a look at the TFile classes? The TFiles give you the meta information you want including row counts and an index that is integrated with the compression. The only downside is that you'll need to handle the serialization yourse

Re: Hive Configuration

2010-09-28 Thread Owen O'Malley
On Sep 28, 2010, at 2:18 PM, Matt Tanquary wrote:
> How do I change the port that it tries to connect to?
Please move this discussion over to hive-u...@hadoop.apache.org.
Thanks, Owen

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread Owen O'Malley
On Sep 17, 2010, at 7:29 AM, David Rosenstrauch wrote: On 09/16/2010 11:38 PM, Mark Kerzner wrote: Hi, any need for this, protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - does this need to be done? this.co

Re: changing SequenceFile format

2010-09-14 Thread Owen O'Malley
On Sep 13, 2010, at 9:19 PM, Matthew John wrote: To sum it up, I should be writing InputFormat , OutputFormat where I will be defining my RecordReader/Writer and InputSplits. Now, why cant I use the FpMetadata and FpMetaId I implemented as the value and key classes. Would not that solve a

Re: changing SequenceFile format

2010-09-13 Thread Owen O'Malley
On Sep 13, 2010, at 12:11 PM, Matthew John wrote: The terasort input you have implemented is text type. And the input is line format where as I am dealing with sequence binary file. For my requirement I have created two writable implementables for the key and value respectively I would j

Re: changing SequenceFile format

2010-09-13 Thread Owen O'Malley
On Sep 13, 2010, at 2:15 AM, Matthew John wrote: Hi guys, I wanted to take in file with input : .. binary sequence file (key and value length are constant) as input for the Sort (examples) . But as I understand the data in a standard Sequencefile of hadoop is in the format : .

Re: Sorting Numbers using mapreduce

2010-09-05 Thread Owen O'Malley
The critical item is that your map's output key should be IntWritable instead of Text. The default comparator for IntWritable will give you properly sorted numbers. If you stringify the numbers and output them as text, they'll get sorted as strings. -- Owen
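The difference between the two orderings can be seen with plain Java arrays. This is only an illustration: it uses int and String in place of IntWritable and Text, on the assumption that string ordering is lexicographic for ASCII digits.

```java
import java.util.Arrays;

public class NumericSortDemo {
    // Numeric ordering, analogous to comparing integer keys.
    static int[] sortAsInts(int[] values) {
        int[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Lexicographic ordering, analogous to comparing the stringified numbers.
    static String[] sortAsStrings(int[] values) {
        String[] strings = new String[values.length];
        for (int i = 0; i < values.length; i++) strings[i] = Integer.toString(values[i]);
        Arrays.sort(strings);
        return strings;
    }

    public static void main(String[] args) {
        int[] values = {10, 9, 100, 2};
        System.out.println(Arrays.toString(sortAsInts(values)));    // numeric order
        System.out.println(Arrays.toString(sortAsStrings(values))); // "10" and "100" sort before "2"
    }
}
```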

Re: Do I need to write a RawComparator if my custom writable is not used as a Key?

2010-09-02 Thread Owen O'Malley
No, RawComparator is only needed for Keys.
-- Owen
On Sep 2, 2010, at 3:35, Vitaliy Semochkin wrote:
> Hello,
>
> Do I need to write a RawComparator if my custom writable is not used as a Key to improve performance?
>
> Regards,
> Vitaliy S

Re: Job performance issue: output.collect()

2010-09-01 Thread Owen O'Malley
On Sep 1, 2010, at 5:18 AM, Oded Rosen wrote: I would like to know what happens in the output.collect line that takes lots of time, in order to cut down this job's running time. Please keep in mind that I have a combiner, and to my understanding different things happen to the map output when

Re: api doc incomplete

2010-09-01 Thread Owen O'Malley
On Sep 1, 2010, at 8:56 AM, Gang Luo wrote: Hi all, does anybody notice the online api doc is incomplete? At http://hadoop.apache.org/common/docs/current/api/ there is even no mapred or mapreduce package there. I remember I use it well before. What happen? When {common,hdfs,mapreduce}-0.21

Re: Combining Only Once?

2010-08-31 Thread Owen O'Malley
There used to be a compatibility switch, but I believe it was removed in 0.19 or 0.20. Can you describe what you are trying to accomplish? Combiners were always intended to only be used for operations that are idempotent, associative, and commutative. Clearly your combiner doesn't satisfy one of
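One way to read the requirement above is that a combiner may run zero, one, or several times over arbitrary groupings of the values, so the final answer must not depend on how the partial results were formed. The sketch below is not taken from the thread. It contrasts a sum, which survives that treatment, with a mean, which does not.

```java
public class CombinerDemo {
    // Summation: partial sums can be re-combined in any grouping.
    static long sum(long... xs) {
        long total = 0;
        for (long x : xs) total += x;
        return total;
    }

    // Arithmetic mean: a mean of partial means is generally NOT the overall mean.
    static double mean(double... xs) {
        double total = 0;
        for (double x : xs) total += x;
        return total / xs.length;
    }

    public static void main(String[] args) {
        // Combining partial sums gives the same total as summing everything at once.
        System.out.println(sum(sum(1, 2), sum(3, 4, 5)) + " vs " + sum(1, 2, 3, 4, 5));
        // Combining partial means gives a different answer from the true mean.
        System.out.println(mean(mean(1, 2), mean(3, 4, 5)) + " vs " + mean(1, 2, 3, 4, 5));
    }
}
```

A common workaround for the mean is to have the combiner emit a (sum, count) pair and divide only in the reducer.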

Re: Job in 0.21

2010-08-29 Thread Owen O'Malley
On Sun, Aug 29, 2010 at 4:39 PM, Mark wrote:
> How should I be creating a new Job instance in 0.21. It looks like Job(Configuration conf, String jobName) has been deprecated.
Go ahead and use that method. I have a jira open to undeprecate it.
-- Owen

Re: Command line arguments

2010-08-29 Thread Owen O'Malley
You would need to save the arguments into the Configuration (aka JobConf) that you create your job with. -- Owen

Re: Ivy

2010-08-27 Thread Owen O'Malley
On Aug 27, 2010, at 8:04 AM, Mark wrote:
> Is there a public ivy repo that has the latest hadoop? Thanks
The hadoop jars and poms should be pushed into the central Maven repositories, which Ivy uses.
-- Owen

Re: svn/git revisions for 0.20.2

2010-08-25 Thread Owen O'Malley
On Aug 25, 2010, at 3:20 PM, Johannes Zillmann wrote: Hey folks, can somebody tell me how to get the source versions from git/svn for hadoop-hdfs and hadoop-mapreduce ? In hadoop-common there are branches and tags for the release. But how to get the corresponding version of the other 2 pro

Re: Hadoop sorting algorithm on equal keys

2010-08-24 Thread Owen O'Malley
On Aug 24, 2010, at 2:21 AM, Teodor Macicas wrote: Hello, Let's say that we have two maps outputs which will be sorted before the reducer will start. Doesn't matter what {a,b0,b1,c} mean, but let's assume that b0=b1. Map output1 : a, b0 Map output2: c, b1 In this case we can have 2 diffe

Re: Where is Hadoop 20.3?

2010-08-14 Thread Owen O'Malley
I'll probably roll a 0.20.3 in a couple of weeks. -- Owen On Aug 14, 2010, at 5:32, thinke365 wrote: > > 0.20.3 is not released yet, the latest release is 0.21.0rc1 > > Pete Tyler wrote: >> >> Apologies for the newbie question but I think I'm a little lost. Hadoop >> 20.2 came out in Feb 20

Re: Passing information to Map Reduce

2010-08-13 Thread Owen O'Malley
he looks hopeful. However, at first glance it looks good for > distributing files but not instance data. Ideally I'm looking for something > similar to, say, objects being passed between client and server by RMI. > > -Pete > > On Aug 13, 2010, at 3:15 PM, Owen O'Mall

Re: Passing information to Map Reduce

2010-08-13 Thread Owen O'Malley
On Aug 13, 2010, at 12:55 PM, Pete Tyler wrote: I have only found two options, neither of which I really like, 1. Encode information in the job name string - a bit hokey and limited to strings I'd state this as encode the information into a string and add it to the JobConf. Look at the Ba

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread Owen O'Malley
On Aug 4, 2010, at 10:58 AM, David Rosenstrauch wrote:
> So my partitioner needs to implement Configurable, then not JobConfigurable. Tnx much!
ReflectionUtils.newInstance will use either Configurable or JobConfigurable (or both!). So implementing either one will work fine.
-- Owen

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread Owen O'Malley
On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote: Anyone know if there's any particular reason why the new Partitioner class doesn't implement JobConfigurable? (And, if not, whether there's any plans to fix this omission?) We're working on a somewhat complex partitioner, and it would

Re: Set variables in mapper

2010-08-03 Thread Owen O'Malley
On Aug 3, 2010, at 6:12 AM, Erik Test wrote: Really? This seems pretty nice. In the future, with your implementation, would the value always have to be wrapped in a MyMapper instance? How would parameters be removed if necessary? Sorry, I wasn't clear. I mean that if you make the sub-clas

Re: Set variables in mapper

2010-08-02 Thread Owen O'Malley
On Aug 2, 2010, at 9:17 AM, Erik Test wrote: I'm trying to set a variable in my mapper class by reading an argument from the command line and then passing the entry to the mapper from main. Is this possible? Others have already answered with the current solution of using JobConf to stor

Re: It is possible a bug,about BooleanWritable

2010-07-25 Thread Owen O'Malley
It is a bug. It was fixed as part of MAPREDUCE-365. The relevant fix is:
Index: src/java/org/apache/hadoop/io/BooleanWritable.java
===
--- src/java/org/apache/hadoop/io/BooleanWritable.java (revision 769338)
+++ src/java/org/apache/h

Re: WritableComparable question

2010-07-19 Thread Owen O'Malley
On Jul 19, 2010, at 2:15 PM, Raymond Jennings III wrote:
> The only way I could fix this was to re-initialize my vectors in the "public void readFields(DataInput in)" method. This does not seem like I should have to do this or do I ???
Yes, readFields has to clear the data structures. MapR
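A minimal sketch of why the clearing is needed, using only java.io. IntListRecord is a hypothetical type that mirrors the write/readFields pair. It does not implement the real Writable interface, so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadFieldsDemo {
    // Hypothetical record type with a Writable-style write/readFields pair.
    static final class IntListRecord {
        final List<Integer> values = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(values.size());
            for (int v : values) out.writeInt(v);
        }

        void readFields(DataInput in) throws IOException {
            values.clear(); // discard whatever the previous record left behind
            int n = in.readInt();
            for (int i = 0; i < n; i++) values.add(in.readInt());
        }
    }

    // Serializes two records, then reads both back into ONE reused instance.
    static List<Integer> readBothIntoOneInstance(int[] first, int[] second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int[] vals : new int[][] {first, second}) {
                IntListRecord r = new IntListRecord();
                for (int v : vals) r.values.add(v);
                r.write(out);
            }
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            IntListRecord reused = new IntListRecord();
            reused.readFields(in); // first record
            reused.readFields(in); // second record overwrites the first
            return new ArrayList<>(reused.values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readBothIntoOneInstance(new int[] {1, 2, 3}, new int[] {7}));
    }
}
```

Without the values.clear() call, the second read would append to the first record's contents instead of replacing them.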

Re: Terasort problem

2010-07-11 Thread Owen O'Malley
On Jul 10, 2010, at 4:29 AM, Tonci Buljan wrote: mapred.tasktracker.reduce.tasks.maximum <- Is this configured on every datanode separately? What number shall I put here? mapred.tasktracker.map.tasks.maximum <- same question as mapred.tasktracker.reduce.tasks.maximum Generally, RAM is the s

Re: Next Release of Hadoop version number and Kerberos

2010-07-10 Thread Owen O'Malley
On Wed, Jul 7, 2010 at 8:54 AM, Todd Lipcon wrote: > On Wed, Jul 7, 2010 at 8:29 AM, Ananth Sarathy > wrote: > > The Security/Kerberos support is a huge project that has been in progress > for several months, so the implementation spans tens (if not hundreds?) of > patches. Manually adding these p

Re: Terasort problem

2010-07-09 Thread Owen O'Malley
I would guess that you didn't set the number of reducers for the job, and it defaulted to 2. -- Owen

Re: Is the sort(in sort and shuffle) always required

2010-06-19 Thread Owen O'Malley
On Sat, Jun 19, 2010 at 9:16 AM, Saptarshi Guha wrote: > My question: is the sort (in the sort and shuffle) absolutely required? > If I wanted mapreduce to partition (using the map) and then aggregate(using > reduce) without a need for the keys to be sorted > is it possible to turn of the sorting?

Re: Using wget to download file from HDFS

2010-06-15 Thread Owen O'Malley
On Jun 15, 2010, at 9:30 AM, Jaydeep Ayachit wrote:
> Thanks, data node may not be known. Is it possible to direct url to namenode and namenode handling streaming by fetching data from various data nodes?
If you access the servlet on the NameNode, it will automatically redirect you to a da

Re: Caching in HDFS C API Client

2010-06-14 Thread Owen O'Malley
Indeed. On the terasort benchmark, I had to run intermediate jobs that were larger than ram on the cluster to ensure that the data was not coming from the file cache. -- Owen

Re: Is it possible ....!!!

2010-06-10 Thread Owen O'Malley
You can define your own socket factory by setting the configuration parameter: hadoop.rpc.socket.factory.class.default to a class name of a SocketFactory. It is also possible to define socket factories on a protocol by protocol basis. Look at the code in NetUtils.getSocketFactory. -- Owen
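For illustration, a custom factory can be written against the standard javax.net.SocketFactory class alone. The class name TimeoutSocketFactory and the 30-second read timeout are hypothetical. The public zero-argument constructor is there on the assumption that the class is instantiated by name from the configuration value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

public class TimeoutSocketFactory extends SocketFactory {
    static final int TIMEOUT_MS = 30_000; // hypothetical read timeout

    public TimeoutSocketFactory() {} // zero-argument constructor for creation by class name

    @Override
    public Socket createSocket() throws IOException {
        Socket s = new Socket(); // unconnected socket
        s.setSoTimeout(TIMEOUT_MS);
        return s;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort)
            throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddress, localPort));
        s.connect(new InetSocketAddress(address, port));
        return s;
    }

    // Helper for experimentation: the timeout carried by a fresh, unconnected socket.
    static int unconnectedTimeout() {
        try (Socket s = new TimeoutSocketFactory().createSocket()) {
            return s.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unconnectedTimeout());
    }
}
```

The fully qualified name of such a class would then be the value given to hadoop.rpc.socket.factory.class.default.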

Re: the same key in different reducers

2010-06-09 Thread Owen O'Malley
On Wed, Jun 9, 2010 at 3:15 PM, Alex Kozlov wrote: > So I assume it is entirely possible to write a partitioner that distributes > the same key to multiple reducers and it does not have to be > non-deterministic.  It can assign the partition based on the value. > > Is this correct? Yes. I've neve

Re: the same key in different reducers

2010-06-09 Thread Owen O'Malley
On Jun 9, 2010, at 1:17 AM, Oleg Ruchovets wrote:
> So is that case possible or every and every reducer has unique output key?
The partitioner controls which reduce a given key is sent to. If the partitioner is non-deterministic, the key can end up going to different reduces. If you are us

Re: calling C programs from Hadoop

2010-05-29 Thread Owen O'Malley
On Sat, May 29, 2010 at 12:52 PM, Asif Jan wrote:
> Look at Hadoop streaming, may be it is helpful to you.
There is also Pipes, which is the C++ interface to MapReduce.
-- Owen

Re: Encryption in Hadoop 0.20.1?

2010-05-27 Thread Owen O'Malley
On Thu, May 27, 2010 at 6:58 AM, Arv Mistry wrote: > Thanks for responding Ted. I did see that link before but there wasn't enough > details there for me to make sense of it. I'm not sure who Owen is ;( I'm Owen, although I think I've used at least 5 different email addresses on these lists at v

Re: Can a Partitioner access the Reporter?

2010-05-12 Thread Owen O'Malley
On May 11, 2010, at 11:06 PM, gmar wrote: I'd like to be able to have my customised Partitioner update counters in the Reporter. i.e. So that I know how many keys have been sent to each partition. So, is it possible for the partitioner to obtain a reference to the reporter? No, even in t

Re: Questions about SequenceFiles

2010-05-11 Thread Owen O'Malley
On Tue, May 11, 2010 at 7:48 AM, Ananth Sarathy wrote: > Ok,  how can I report that? File a jira on the project that manages the type. I assume it is Lucene in this case. >  Also, it seems that requiring a no argument constructor but using an > interface is kind of a broken paradigm. Shouldn't t

Re: Questions about SequenceFiles

2010-05-11 Thread Owen O'Malley
Assumptions for Writables that should be documented somewhere:
* Each type must have a 0 argument constructor.
* Each call to write must not assume any shared state.
* Each call to readFields must consume exactly the number of bytes produced by write.
SequenceFile also assumes:
* All keys a
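The Writable assumptions above can be sketched with java.io alone. PairRecord is a hypothetical type, not a Hadoop class. It has a zero-argument constructor, and its readFields reads back exactly what write produced, so two records stored back to back can be recovered and no bytes are left over.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableContractDemo {
    // Hypothetical record with a Writable-style write/readFields pair.
    static final class PairRecord {
        int id;
        String name = "";

        PairRecord() {} // zero-argument constructor, so instances can be created reflectively

        void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
        }

        void readFields(DataInput in) throws IOException {
            id = in.readInt();     // consumes the 4 bytes written by writeInt
            name = in.readUTF();   // consumes exactly the bytes written by writeUTF
        }
    }

    // Writes two records back to back, reads them into reflectively created
    // instances, and reports how many bytes are left in the stream.
    static String roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            PairRecord a = new PairRecord(); a.id = 1; a.name = "first";
            PairRecord b = new PairRecord(); b.id = 2; b.name = "second";
            a.write(out);
            b.write(out);

            ByteArrayInputStream raw = new ByteArrayInputStream(bytes.toByteArray());
            DataInputStream in = new DataInputStream(raw);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2; i++) {
                PairRecord r = PairRecord.class.getDeclaredConstructor().newInstance();
                r.readFields(in);
                sb.append(r.id).append(':').append(r.name).append(',');
            }
            return sb.append("left=").append(raw.available()).toString();
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

If readFields consumed too few or too many bytes, the second record would be decoded from the wrong offset.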

Re: Applying HDFS-630 patch to hadoop-0.20.2 tarball release?

2010-05-04 Thread Owen O'Malley
On Tue, May 4, 2010 at 10:03 AM, Joseph Chiu wrote: > Thanks Todd.    Where I really need help is to get up to speed on that > process of recompiling (and re-installing the build outputs) with ant. The place to look is in the wiki: http://wiki.apache.org/hadoop/HowToRelease It walks through the

Re: hadoop conf for dynamically changing ips

2010-03-26 Thread Owen O'Malley
On Mar 26, 2010, at 9:39 AM, Gokulakannan M wrote: I have a LAN in which the IPs of the machines will be changed dynamically by the DHCP sever. I think you'd need to use a NAT translation so that inside your cluster you have stable IP addrs in 10.x.x.x but the external IP addr

Re: DeDuplication Techniques

2010-03-26 Thread Owen O'Malley
On Mar 25, 2010, at 11:09 AM, Joseph Stein wrote: I have been researching ways to handle de-dupping data while running a map/reduce program (so as to not re-calculate/re-aggregate data that we have seen before[possibly months before]). So roughly, your problem is that you have large amounts o

Re: Measuring running times

2010-03-17 Thread Owen O'Malley
On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote: Hi everybody, as part of my project work at school I'm running some Hadoop jobs on a cluster. I'd like to measure exactly how long each phase of the process takes: mapping, shuffling (ideally divided in copying and sorting) and reducing.

Re: Security issue: hadoop fs shell bypass authentication?

2010-03-06 Thread Owen O'Malley
On Mar 5, 2010, at 4:49 PM, Allen Wittenauer wrote: On 3/5/10 1:57 PM, "jiang licht" wrote: So, this means that hadoop fs shell does not require any authentication and can be fired from anywhere? There is no authentication/security layer in any released version of Hadoop. True, althou

Re: problem building trunk

2010-02-26 Thread Owen O'Malley
On Feb 26, 2010, at 10:22 AM, Massoud Mazar wrote:
> I'm having issues building the trunk. I follow steps mentioned at http://wiki.apache.org/hadoop/BuildingHadoopFromSVN
It is a documentation error. Giri, can you update it with the current targets (ie. mvn-install)?
Thanks, Owen

Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-25 Thread Owen O'Malley
On Feb 25, 2010, at 10:20 AM, Allen Wittenauer wrote: Actually my hope is in the plan of hadoop to once establish a stable API (as planned) so that an upgrade will be backwards compatible. History shows you are in for a long wait. I hope not and I'm trying to make sure that isn't true. At

Re: Security Mechanisms in HDFS

2010-01-05 Thread Owen O'Malley
On Jan 5, 2010, at 7:44 AM, Yu Xi wrote: Could any hadoop gurus tell me what kinds of security mechanisms are already(or planed to be) implemented in hadoop filesystem? It looks like you've found the ones that are already there. You can see my slides about it here: http://www.slideshare.

Re: use List in reducer

2009-12-26 Thread Owen O'Malley
On Dec 26, 2009, at 5:00 PM, Bryan McCormick wrote:
> What appears to be going on is that the Iterable values seemed to be reusing the Text object being exposed in the for loop and just changing the content of the Text.
That is correct.
activeList.add(new Text(val.toString()));
It
