You are trying to read a string that represents a tuple using binary deserialization.
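To make the mismatch concrete, here is a minimal sketch using only java.io, with no Pig classes involved. The "field count followed by a UTF string" layout is a made-up stand-in for a binary record and is NOT Pig's actual wire format; the class and method names are hypothetical as well. It only shows that a binary reader takes the first bytes of a text line as if they were a binary header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class TextVsBinary {

    // Hypothetical binary layout (not Pig's): an int field count, then one UTF string.
    public static byte[] writeBinary(String value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(1);      // pretend header: number of fields
            out.writeUTF(value);  // length-prefixed string payload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the first four bytes as a big-endian int, the way a binary reader
    // would consume a header, regardless of what the bytes really are.
    public static int readLeadingInt(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] binary = writeBinary("a");
        // Shortened version of the text record from the question.
        byte[] text = "a b {(ST:NC)}".getBytes(StandardCharsets.UTF_8);

        // Binary input: the header comes back as written.
        System.out.println(readLeadingInt(binary));
        // Text input: the characters 'a', ' ', 'b', ' ' are taken as one int.
        System.out.println(Integer.toHexString(readLeadingInt(text)));
    }
}
```

For the binary bytes the header reads back as 1. For the text line the same call returns 0x61206220, which is just the characters "a b " reinterpreted as a number, so anything the reader does next is based on garbage.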
Pig has an abstraction called LoadFunc that knows how to read data off disk and turn it into tuples (yes, records are tuples). PigStorage is one such LoadFunc, and it reads data represented as strings, such as what you are trying to feed in. Other load funcs know how to read other serializations and interpret the data in very different ways (JSON, Avro, Thrift, records from a database, XML...). There is no way for Tuple.readFields to know what format you are trying to feed into it.

Tuple serialization is used for intermediate serialization between MR jobs and is not intended for the end user. You should be using the appropriate LoadFunc to create tuples (PigStorage in this case?), or create them in code as I demonstrated earlier.

You might find ReadToEndLoader helpful. It wraps a real LoadFunc and helps with some details of instantiating input formats, getting splits, etc.:
http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html

But really, you should just create the tuples you want in code rather than involve all of this machinery.

D

On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> Could someone help me answer this question if records (each line) == tuples?
>
> On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> I am writing unit test but I had a doubt. My understanding is that
>> complete record is a tuple. So record "a b
>> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
>> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}"
>> which is one line in a file is a tuple? But I somehow feel it's not right.
>> Could someone please clarify?
>>
>> Below is the code, my test is incomplete but just pasting it to show how I
>> am constructing this tuple.
>>
>>
>> TupleFactory mTupleFactory = TupleFactory.getInstance();
>> BagFactory mBagFactory = BagFactory.getInstance();
>>
>> @Test
>> public void evalFuncTest() throws IOException{
>>     String record = "a b
>> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
>> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}";
>>     Tuple t = mTupleFactory.newTuple();
>>     DataInput in = new DataInputStream(new
>> ByteArrayInputStream(record.getBytes()));
>>     t.readFields(in);
>> }
>>