Hi Pradeep,

Yes, I implemented the outputSchema method and it fixed that issue.
We are also planning to evaluate storing intermediate and final results in Cassandra.

> Date: Mon, 4 Nov 2013 17:08:56 -0800
> Subject: Re: Java UDF and incompatible schema
> From: [email protected]
> To: [email protected]
>
> This is most likely because you haven't defined the outputSchema method of
> the UDF. The AS keyword merges the schema generated by the UDF with the
> user-specified schema. If the UDF does not override the method and specify
> the output schema, it is considered null and you will not be able to use AS
> to override the schema.
>
> Out of curiosity, if each one of your small files describes a user, is
> there any reason why you can't use a database (e.g. HBase) to store this
> information? It seems like any file-based storage may not be the best
> solution, given my extremely limited knowledge of your problem domain.
>
>
> On Mon, Nov 4, 2013 at 4:26 PM, Sameer Tilak <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I have written my custom parser, and since my files are small I am using
> > a sequence file for efficiency. Each file in the sequence file has info
> > about one user; I parse that file and would like to get a bag of tuples
> > for every user/file. In my Parser class I have implemented an exec
> > function that is called for each file/user. I then gather the info and
> > package it as tuples. Each user will generate multiple tuples since the
> > file is quite rich and complex. Is it correct to assume that the relation
> > AU will contain one bag per user?
> >
> > When I execute the following script, I get the following error. Any help
> > with this would be great!
> > ERROR 1031: Incompatable field schema: declared is
> > "bag_0:bag{:tuple(id:int,class:chararray,name:chararray,begin:int,end:int,probone:chararray,probtwo:chararray)}",
> > infered is ":Unknown"
> >
> > Java UDF code snippet:
> >
> > private void PopulateBag()
> > {
> >     for (MyItems item : items)
> >     {
> >         Tuple output = TupleFactory.getInstance().newTuple(7);
> >         output.set(0, item.getId());
> >         output.set(1, item.getClass());
> >         output.set(2, item.getName());
> >         output.set(3, item.Begin());
> >         output.set(4, item.End());
> >         output.set(5, item.Probabilityone());
> >         output.set(6, item.Probtwo());
> >         m_defaultDataBag.add(output);
> >     }
> > }
> >
> > public DefaultDataBag exec(Tuple input) throws IOException {
> >     try
> >     {
> >         this.ParseFile((String)input.get(0));
> >         this.PopulateBag();
> >         return m_defaultDataBag;
> >     } catch (Exception e) {
> >         System.err.println("Failed to process the input\n");
> >         return null;
> >     }
> > }
> >
> > Pig script:
> >
> > REGISTER /users/p529444/software/pig-0.11.1/contrib/piggybank/java/piggybank.jar;
> > REGISTER /users/p529444/software/pig-0.11.1/parser.jar;
> >
> > DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
> >
> > A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
> > DESCRIBE A;
> > STORE A INTO '/scratch/A';
> >
> > AU = FOREACH A GENERATE parser.Parser(key) AS {(id: int, class: chararray,
> >     name: chararray, begin: int, end: int, probone: chararray,
> >     probtwo: chararray)};
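[Editor's note] For reference, the fix discussed above can be sketched as an outputSchema override on the UDF. This is a minimal sketch only: the class name Parser and the field names/types are taken from the Pig script and error message in the thread, and the exact bag-wrapping shape is an assumption based on the common Pig EvalFunc idiom, not the author's actual code.

```java
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Sketch: only the outputSchema method is shown; exec and the parsing
// logic stay as in the quoted mail.
public abstract class Parser extends EvalFunc<DataBag> {

    // Declares the bag-of-tuples schema so Pig no longer infers ":Unknown"
    // and the AS clause in the script can be merged with it.
    @Override
    public Schema outputSchema(Schema input) {
        try {
            Schema tupleSchema = new Schema();
            tupleSchema.add(new Schema.FieldSchema("id", DataType.INTEGER));
            tupleSchema.add(new Schema.FieldSchema("class", DataType.CHARARRAY));
            tupleSchema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
            tupleSchema.add(new Schema.FieldSchema("begin", DataType.INTEGER));
            tupleSchema.add(new Schema.FieldSchema("end", DataType.INTEGER));
            tupleSchema.add(new Schema.FieldSchema("probone", DataType.CHARARRAY));
            tupleSchema.add(new Schema.FieldSchema("probtwo", DataType.CHARARRAY));
            // Wrap the tuple schema in a bag, matching the declared
            // bag{tuple(...)} shape shown in the ERROR 1031 message.
            return new Schema(new Schema.FieldSchema("bag_0", tupleSchema, DataType.BAG));
        } catch (FrontendException e) {
            return null;
        }
    }
}
```

Compiling this requires pig.jar on the classpath; with the schema declared, the AS clause in the FOREACH can rename or re-type the fields instead of failing against a null schema.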
