RE: Using set/list data types for intermediate keys

2010-01-31 Thread Jones, Nick
Jørn, I found it fairly quick and simple to implement WritableComparable in a specific class for the intermediate dataset. I needed two keys for every value to make sure each reducer had the right data. The class just used two longs internally and implemented the appropriate outputs for WritableC
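
The approach Nick describes (a key class wrapping two longs) might look roughly like the sketch below. His actual class is not shown in the thread, so the name LongPairKey and its field names are hypothetical. To stay compilable without the Hadoop jars, the sketch only implements java.lang.Comparable; in a real job the class would instead declare `implements WritableComparable<LongPairKey>` from org.apache.hadoop.io, which asks for the same three methods: write(DataOutput), readFields(DataInput), and compareTo.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical composite key built from two longs (a sketch, not Nick's code).
// Assumption: in a real Hadoop job this would implement
// org.apache.hadoop.io.WritableComparable<LongPairKey> instead of Comparable.
final class LongPairKey implements Comparable<LongPairKey> {
    private long first;
    private long second;

    // No-arg constructor: the framework instantiates keys reflectively
    // and then fills them in through readFields().
    LongPairKey() {}

    LongPairKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    // Serialize both fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    // Deserialize in exactly the order used by write().
    public void readFields(DataInput in) throws IOException {
        first = in.readLong();
        second = in.readLong();
    }

    // Sort by the first long, then break ties on the second.
    @Override
    public int compareTo(LongPairKey other) {
        int byFirst = Long.compare(first, other.first);
        return byFirst != 0 ? byFirst : Long.compare(second, other.second);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LongPairKey)) {
            return false;
        }
        LongPairKey k = (LongPairKey) o;
        return first == k.first && second == k.second;
    }

    // Equal keys must hash equally, since a hash-based partitioner
    // uses hashCode() to pick the reducer for each key.
    @Override
    public int hashCode() {
        return 31 * Long.hashCode(first) + Long.hashCode(second);
    }
}
```

Keeping equals(), hashCode(), and compareTo() consistent with each other is what ensures that all values for one key pair reach the same reducer and are grouped together there.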

Re: map side only behavior

2010-01-31 Thread Aaron Kimball
In a map-only job, map tasks will be connected directly to the OutputFormat. So calling output.collect() / context.write() in the mapper will emit data straight to files in HDFS without sorting the data. There is no sort buffer involved. If you want exactly one output file, follow Nick's advice. -

Re: DBOutputFormat Speed Issues

2010-01-31 Thread Aaron Kimball
Nick, I'm afraid that right now the only available OutputFormat for JDBC is that one. You'll note that DBOutputFormat doesn't really include much support for special-casing to MySQL or other targets. Your best bet is probably to copy the code from DBOutputFormat and DBConfiguration into some othe

Re: hadoop under cygwin issue

2010-01-31 Thread Aaron Kimball
Brian, it looks like you missed a step in the instructions. You'll need to format the HDFS filesystem instance before starting the NameNode server. You need to run:

$ bin/hadoop namenode -format

... then you can do:

$ bin/start-dfs.sh

Hope this helps,
- Aaron

On Sat, Jan 30, 2010 at 12:27 AM, Bria

Using set/list data types for intermediate keys

2010-01-31 Thread Jørn Schou-Rode
What are the options for using sets/lists as keys in the output from the mapper? My initial idea was to use ArrayWritable as the key type, but that is not allowed, as the class does not implement WritableComparable. Do I need to define a custom class, or is there some other set-like class in the Hadoo

Apache Hadoop Get Together Berlin March 2010

2010-01-31 Thread Isabel Drost
Hello, this is to announce the next Apache Hadoop Get Together Berlin:

When: March 10th, 5 p.m.
Where: Newthinking store Berlin

Talks scheduled so far:
* Bram Smeets (JTeam/ Amsterdam): Spatial Search.
* Dragan Milosevic (zanox/ Berlin: Produc

Re: Bible Code and some input format ideas

2010-01-31 Thread Edward Capriolo
On Tue, Jan 12, 2010 at 5:49 PM, Edward Capriolo wrote:
> On Tue, Jan 12, 2010 at 5:37 PM, Alan Gates wrote:
>> I'm guessing that you want to set the width of the text to avoid the issue
>> where if you split by block, then all splits but the first will have an
>> unknown offset.
>>
>> Most texts