Thanks Ed! Can you also file an improvement JIRA under https://issues.apache.org/jira/browse/AVRO with a patch that changes it to make more sense?

On Thu, Jan 16, 2014 at 5:14 PM, ed <edor...@gmail.com> wrote:
> Hi Harsh,
>
> Thank you for your response, which was invaluable in helping me to figure
> out my issue. The Java-Doc is in fact incorrect when it states that
> AvroJob.setOutputSchema cannot accept non-Pair configs, as it turns out it
> can. What was throwing me off is that if you use AvroJob.setOutputSchema to
> set a non-Pair config, then you also need to call AvroJob.setMapOutputSchema
> (which does require the use of Pair). Otherwise, by default, the map output
> schema gets set to whatever you set in setOutputSchema, and if that is
> non-Pair you'll get an error at runtime.
>
> Maybe the JavaDoc should say something along the lines of:
>
>> Configure a job's output schema. If this is not a Pair schema then you
>> must explicitly set the job's map output schema using setMapOutputSchema.
>
> Thank you!
>
> Best Regards,
>
> Ed
>
> On Thu, Jan 16, 2014 at 6:47 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hello Ed,
>>
>> The AvroReducer per
>> http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapred/AvroReducer.html
>> has a simple spec of <K,V,OUT>, where OUT can be any record type and
>> not necessarily a Pair<KO,VO> type.
>>
>> AvroJob.setOutputSchema(…) should accept non-pair configs. I think its
>> java-doc is incorrect though. I wrote a test case yesterday at
>> http://issues.apache.org/jira/browse/AVRO-1439, in which I set a
>> non-Pair schema via the same call without any trouble. We could get
>> the java-doc fixed, if it is indeed wrong.
>>
>> On Thu, Jan 16, 2014 at 2:14 PM, ed <edor...@gmail.com> wrote:
>> > Hello,
>> >
>> > I am currently reading in lots of small avro files and then writing them
>> > out into one large avro file using Map Reduce MR1. I'm trying to do this
>> > using the AvroMapper and AvroReducer and it's almost working how I want.
>> >
>> > The problem right now is that it looks like I have to use
>> > "org.apache.avro.mapred.Pair" if I use "AvroJob.setOutputSchema". Is
>> > there a way to output a Pair schema from AvroReducer and have the "key"
>> > in that schema be ignored (i.e., not included in the output from the
>> > reducer)? Right now when I check the Reducer output there is an added
>> > field in each record called "key" which I'd like to not have there.
>> >
>> > Essentially I'm looking for something like NullWritable where the key
>> > will just be ignored in the final output.
>> >
>> > Thank you for any assistance or guidance you can provide!
>> >
>> > Best Regards,
>> >
>> > Ed
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
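
For readers landing on this thread, the setup Ed describes can be sketched
roughly as below. This is only an illustration under assumptions, not code
from the thread: the class names (MergeAvroFiles, MergeMapper, MergeReducer),
the "Event" record schema, and the choice of the "id" field as the map output
key are all hypothetical. It needs the Avro mapred and Hadoop MR1 jars on the
classpath. The point it shows is that the map output schema is set explicitly
to a Pair schema, while the job output schema is the plain record schema, so
no "key" field appears in the reducer output.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class MergeAvroFiles {

  // Hypothetical record schema; substitute the schema of the real input files.
  static final Schema RECORD_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"}]}");

  static final Schema KEY_SCHEMA = Schema.create(Schema.Type.STRING);

  // The map output has to be a Pair. Keying on "id" is an arbitrary choice here.
  public static class MergeMapper
      extends AvroMapper<GenericRecord, Pair<CharSequence, GenericRecord>> {
    @Override
    public void map(GenericRecord datum,
                    AvroCollector<Pair<CharSequence, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
      CharSequence key = datum.get("id").toString();
      collector.collect(new Pair<CharSequence, GenericRecord>(
          key, KEY_SCHEMA, datum, RECORD_SCHEMA));
    }
  }

  // The reducer emits the bare record (OUT is not a Pair), dropping the key.
  public static class MergeReducer
      extends AvroReducer<CharSequence, GenericRecord, GenericRecord> {
    @Override
    public void reduce(CharSequence key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
      for (GenericRecord value : values) {
        collector.collect(value);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(MergeAvroFiles.class);
    conf.setJobName("merge-avro-files");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    AvroJob.setInputSchema(conf, RECORD_SCHEMA);
    AvroJob.setMapperClass(conf, MergeMapper.class);
    AvroJob.setReducerClass(conf, MergeReducer.class);

    // Without this call the map output schema falls back to the job output
    // schema, which is not a Pair here and fails at runtime.
    AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(KEY_SCHEMA, RECORD_SCHEMA));

    // Non-Pair job output schema: records are written without a "key" field.
    AvroJob.setOutputSchema(conf, RECORD_SCHEMA);

    // A single reducer yields one output file, as in the original question.
    conf.setNumReduceTasks(1);

    JobClient.runJob(conf);
  }
}
```

The single-reducer setting is one way to end up with one large file; whether
it is appropriate depends on data volume, which the thread does not state.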