RE: decompressing bzip2 data with a custom InputFormat

2012-03-16 Thread Tony Burton
Cool - thanks for the confirmation and link, Joey, very helpful. -----Original Message----- From: Joey Echeverria Sent: 14 March 2012 19:03 To: common-user@hadoop.apache.org Subject: Re: decompressing bzip2 data with a custom InputFormat Yes you have to deal with
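
The fix confirmed here (handling the decompression yourself inside the custom RecordReader) typically looks like the sketch below. This is not code from the thread: the class name and line-oriented reading are illustrative, and it assumes the new-API (org.apache.hadoop.mapreduce) RecordReader plus org.apache.hadoop.util.LineReader.

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class Bzip2AwareRecordReader extends RecordReader<LongWritable, Text> {
  private LineReader in;
  private final LongWritable key = new LongWritable();
  private final Text value = new Text();
  private long pos;

  @Override
  public void initialize(InputSplit genericSplit, TaskAttemptContext context)
      throws IOException {
    FileSplit split = (FileSplit) genericSplit;
    Configuration conf = context.getConfiguration();
    Path file = split.getPath();

    FileSystem fs = file.getFileSystem(conf);
    InputStream stream = fs.open(file);

    // getCodec() matches on the file extension; .bz2 resolves to BZip2Codec
    // in stock configurations. If a codec matched, wrap the raw stream so
    // every read returns decompressed bytes.
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
    if (codec != null) {
      stream = codec.createInputStream(stream);
    }
    in = new LineReader(stream, conf);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    key.set(pos);
    int bytesRead = in.readLine(value); // 0 bytes read means end of stream
    pos += bytesRead;
    return bytesRead > 0;
  }

  @Override public LongWritable getCurrentKey() { return key; }
  @Override public Text getCurrentValue() { return value; }
  @Override public float getProgress() { return 0.0f; } // hard to estimate over compressed input
  @Override public void close() throws IOException { if (in != null) in.close(); }
}

The matching InputFormat should keep compressed files unsplit (see the sketch further down) so a reader never lands mid-stream in compressed data.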

Re: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Joey Echeverria
To: common-user@hadoop.apache.org Subject: decompressing bzip2 data with a custom InputFormat Hi, I'm setting up a map-only job that reads large bzip2-compressed data files, parses the XML and writes out the same data in plain text format. My XML

RE: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Tony Burton
done? Thanks! Tony -----Original Message----- From: Tony Burton Sent: 12 March 2012 18:05 To: common-user@hadoop.apache.org Subject: decompressing bzip2 data with a custom InputFormat Hi, I'm setting up a map-only job that reads large bzip2-compressed

decompressing bzip2 data with a custom InputFormat

2012-03-12 Thread Tony Burton
Hi, I'm setting up a map-only job that reads large bzip2-compressed data files, parses the XML and writes out the same data in plain text format. My XML InputFormat extends TextInputFormat and has a RecordReader based upon the one you can see at http://xmlandhadoop.blogspot.com/ (my version of
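
The InputFormat half of that pattern is small. A sketch under the same assumptions as the reader above (illustrative names, new API), returning each compressed file as one split so no mapper starts mid-stream:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class XmlInputFormat extends TextInputFormat {

  // Hand each split to the custom reader sketched above instead of
  // TextInputFormat's default LineRecordReader.
  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new Bzip2AwareRecordReader();
  }

  // Compressed files are read whole; plain files may still be split.
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return new CompressionCodecFactory(context.getConfiguration())
        .getCodec(file) == null;
  }
}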

Custom InputFormat for Multiline Input File Hive/Hadoop

2011-10-10 Thread Mike Sukmanowsky
stackoverflow.com/questions/7692994/custom-inputformat-with-hive. If there's a resource someone can point me to, that'd also be great. Many thanks in advance, Mike

custom InputFormat and RecordReader

2010-05-25 Thread Mo Zhou
Hi, I am quite new to Hadoop. I wrote my own StreamFastaInputFormat and StreamFastaRecordReader in $hadoopbase/src/contrib/streaming/src/java/org/apache/hadoop/streaming/ and ran "ant" under the directory $hadoopbase/src/contrib/streaming/ using the default build.xml. However, it failed due to the

Re: Want to create custom inputformat to read from solr

2010-02-24 Thread Rekha Joshi
The last I heard, there were some discussions of creating the Solr index using Hadoop MapReduce instead, rather than pushing the Solr index into HDFS and so on. SOLR-1045 and SOLR-1301 can provide more info. Cheers, /R On 2/24/10 4:23 PM, "Rakhi Khatwani" wrote: Hi, Has anyone tried creatin

Want to create custom inputformat to read from solr

2010-02-24 Thread Rakhi Khatwani
Hi, Has anyone tried creating a custom InputFormat which reads from a Solr index for processing using MapReduce? Is it possible to do that, and how? Regards, Raakhi

Custom InputFormat: LineRecordReader.LineReader reads 0 bytes

2010-02-23 Thread Alexey Tigarev
Hi All! I am implementing a custom InputFormat. Its custom RecordReader uses LineRecordReader.LineReader inside. In some cases its read() method returns 0, i.e. reads 0 bytes. This happens even in a unit test where it reads from a regular file on a UNIX filesystem. What does it mean and how should I
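
For reference: LineReader.readLine(Text) returns the number of bytes it consumed, including the newline, so a return of 0 means nothing was left to read, i.e. the reader has hit end of stream, and the usual loop treats 0 as the stop condition. A minimal standalone sketch (the local-file setup is illustrative; org.apache.hadoop.util.LineReader has the same readLine() contract as the nested LineRecordReader.LineReader):

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class LineReaderDemo {
  public static void main(String[] args) throws IOException {
    LineReader reader = new LineReader(
        new FileInputStream(args[0]), new Configuration());
    Text line = new Text();
    int bytesRead;
    // readLine() fills 'line' and reports how many bytes it consumed;
    // 0 consumed bytes signals end of stream and ends the loop.
    while ((bytesRead = reader.readLine(line)) > 0) {
      System.out.println(bytesRead + " bytes: " + line);
    }
    reader.close();
  }
}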

Re: custom InputFormat

2010-01-12 Thread valentina kroshilina
text.write(out); } } -----Original Message----- From: valentina kroshilina Sent: 8 January 2010 12:05 To: common-user@hadoop.apache.org Subject: custom InputFormat I have LongWritab

RE: custom InputFormat

2010-01-08 Thread Jeff Zhang
common-user@hadoop.apache.org Subject: custom InputFormat I have a LongWritable, IncidentWritable key-value pair as output from one job that I want to read as input in my second job, where IncidentWritable is a custom Writable (see code below). How do I read IncidentWritable in my custom Reader? I don't know how

custom InputFormat

2010-01-08 Thread valentina kroshilina
I have a LongWritable, IncidentWritable key-value pair as output from one job that I want to read as input in my second job, where IncidentWritable is a custom Writable (see code below). How do I read IncidentWritable in my custom Reader? I don't know how to convert byte[] to IncidentWritable. Code
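
One common answer is to avoid converting byte[] by hand entirely: chain the two jobs with SequenceFiles, and the framework calls the Writable's own readFields() during deserialization. A sketch with hypothetical fields, since the thread's actual IncidentWritable code is not shown here:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class IncidentWritable implements Writable {
  // Hypothetical fields; the real class's fields are in the quoted code.
  private final Text description = new Text();
  private final IntWritable count = new IntWritable();

  // Serialization writes each field in a fixed order...
  @Override
  public void write(DataOutput out) throws IOException {
    description.write(out);
    count.write(out);
  }

  // ...and readFields() restores them in the same order. The framework
  // invokes this on a reflection-created instance, so no manual
  // byte[]-to-object conversion is ever needed.
  @Override
  public void readFields(DataInput in) throws IOException {
    description.readFields(in);
    count.readFields(in);
  }
}

With job one writing through SequenceFileOutputFormat and job two reading through SequenceFileInputFormat, the second job's mapper receives the LongWritable key and an already-deserialized IncidentWritable value, so no custom Reader is required.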

Re: Custom InputFormat, problem with constructors

2009-12-15 Thread Kevin Weil
FYI, building on Antonio's great work, I finally got around to making this InputFormat tonight: see http://github.com/kevinweil/IntegerListInputFormat. If people are interested, I'm happy to format it and license it appropriately and commit it to core Hadoop. Let me know, otherwise I'll just le

Re: Custom InputFormat, problem with constructors

2009-12-13 Thread Kevin Weil
Antonio, If you're interested in open sourcing this, I'd be interested in using/helping. We do something internally that's similar to this, and I've been meaning to write a general-purpose version for a while. Would love to see what you've done and contribute back any changes we make. Github?

Re: Custom InputFormat, problem with constructors

2009-12-11 Thread Antonio D'Ettole
Philip, that was quick and precise. I learned something today. Thank you! Antonio On Fri, Dec 11, 2009 at 8:20 PM, Philip Zeyliger wrote: Hi Antonio, Check out MapTask.java. When your job gets instantiated on the cluster, an InputSplit object is created for the task, using reflection.

Re: Custom InputFormat, problem with constructors

2009-12-11 Thread Philip Zeyliger
Hi Antonio, Check out MapTask.java. When your job gets instantiated on the cluster, an InputSplit object is created for the task, using reflection. An InputSplit is a Writable, and, like all writables, it gets created with an empty constructor and initialized with readFields(). If you implement
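
A minimal split satisfying that contract might look like the sketch below (old org.apache.hadoop.mapred API to match the thread's era; the class and fields are illustrative):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.mapred.InputSplit;

public class RangeInputSplit implements InputSplit {
  private long start;
  private long length;

  // Required: the framework creates the split via reflection, so a
  // public no-argument constructor must exist.
  public RangeInputSplit() {}

  public RangeInputSplit(long start, long length) {
    this.start = start;
    this.length = length;
  }

  public long getStart() { return start; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(start);
    out.writeLong(length);
  }

  // Called right after the empty constructor to repopulate the fields.
  @Override
  public void readFields(DataInput in) throws IOException {
    start = in.readLong();
    length = in.readLong();
  }

  @Override public long getLength() { return length; }
  @Override public String[] getLocations() { return new String[0]; } // no locality preference
}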

Custom InputFormat, problem with constructors

2009-12-11 Thread Antonio D'Ettole
Hi, I've been trying to code a pretty simple InputFormat. The idea is this: I have an array of numbers (say, the range [0-5000]) and I want each mapper to receive a split of size 500, i.e. 500 LongWritables. This is an excerpt from the class extending InputSplit: public class myInputSplit extend
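
Not the code from this thread, but a sketch of how the pieces could fit for the use case described, reusing the RangeInputSplit sketched above: getSplits() carves [0, 5000) into splits of 500, and each mapper's reader emits the numbers in its split as LongWritable keys.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class RangeInputFormat implements InputFormat<LongWritable, NullWritable> {
  private static final long TOTAL = 5000;      // end of the range in the post
  private static final long SPLIT_SIZE = 500;  // numbers per mapper

  // Ignore the numSplits hint and carve the range into fixed-size splits.
  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) {
    int n = (int) ((TOTAL + SPLIT_SIZE - 1) / SPLIT_SIZE);
    InputSplit[] splits = new InputSplit[n];
    for (int i = 0; i < n; i++) {
      long start = i * SPLIT_SIZE;
      splits[i] = new RangeInputSplit(start, Math.min(SPLIT_SIZE, TOTAL - start));
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, NullWritable> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter) {
    final RangeInputSplit split = (RangeInputSplit) genericSplit;
    return new RecordReader<LongWritable, NullWritable>() {
      private long emitted = 0;

      // Emit start, start+1, ... until the split's length is exhausted.
      @Override
      public boolean next(LongWritable key, NullWritable value) {
        if (emitted >= split.getLength()) {
          return false;
        }
        key.set(split.getStart() + emitted++);
        return true;
      }

      @Override public LongWritable createKey() { return new LongWritable(); }
      @Override public NullWritable createValue() { return NullWritable.get(); }
      @Override public long getPos() { return emitted; }
      @Override public float getProgress() {
        return split.getLength() == 0 ? 1.0f : (float) emitted / split.getLength();
      }
      @Override public void close() {}
    };
  }
}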