Thanks for the response, that helps. I was thinking the same, but not knowing much about Pig, I wanted to clarify. I'll look at how to use PigStorage in my unit test.
On Sun, Apr 22, 2012 at 3:47 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> You are trying to read a string that represents a tuple using binary
> deserialization.
>
> Pig has an abstraction called LoadFunc that knows how to read data off
> disk and turn it into tuples (yes, records are tuples). PigStorage is
> one such LoadFunc, and it reads data represented as strings such as
> what you are trying to feed in. There are other load funcs that know
> how to read other serializations and interpret the data in very
> different ways (JSON, Avro, Thrift, records from a database, XML...).
> There is no way for Tuple.readFields to know what format you are
> trying to feed into it. Tuple serialization is used for intermediate
> serialization between MR jobs and is not intended for the end user.
>
> You should be using the appropriate LoadFunc to create tuples
> (PigStorage in this case?), or create them in code as I demonstrated
> earlier.
>
> You might find ReadToEndLoader, which wraps a real LoadFunc and helps
> with some details of instantiating input formats, getting splits, etc.,
> helpful:
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html
>
> But really, you should just create the tuples you want in code rather
> than involve all of this machinery.
>
> D
>
> On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> > Could someone help me answer this question: are records (each line) ==
> > tuples?
> >
> > On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >
> >> I am writing a unit test but I have a question. My understanding is that
> >> a complete record is a tuple. So is the record
> >> "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}",
> >> which is one line in a file, a tuple? But I somehow feel it's not right.
> >> Could someone please clarify?
> >>
> >> Below is the code. My test is incomplete, but I'm pasting it to show how I
> >> am constructing this tuple.
> >>
> >>     TupleFactory mTupleFactory = TupleFactory.getInstance();
> >>     BagFactory mBagFactory = BagFactory.getInstance();
> >>
> >>     @Test
> >>     public void evalFuncTest() throws IOException {
> >>         String record = "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}";
> >>         Tuple t = mTupleFactory.newTuple();
> >>         DataInput in = new DataInputStream(new ByteArrayInputStream(record.getBytes()));
> >>         t.readFields(in);
> >>     }
> >>
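Dmitriy's first point, that binary deserialization cannot parse a text rendering of a tuple, can be illustrated without Pig at all. The sketch below uses plain `java.io` as a stand-in (it does not call Pig's own tuple serialization, and the class name `BinaryVsTextSketch` is hypothetical): `DataInputStream.readUTF` expects the two-byte length prefix that `DataOutputStream.writeUTF` writes, so handing it raw text makes it misread the first two characters as a length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; java.io stands in for Pig's binary tuple format.
public class BinaryVsTextSketch {

    // Bytes produced by a binary writer can be read back by the matching reader.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(value); // 2-byte length prefix + modified UTF-8 bytes
            }
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plain text is not in that format: readUTF treats the first two bytes as a
    // length prefix. For a short string like the one below, that length exceeds
    // the remaining input, so the read throws (EOFException). Returns true if it threw.
    static boolean binaryReadOfTextFails(String text) {
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        try {
            in.readUTF();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String record = "a b {(ST:NC),(ZIP:28613)}";
        System.out.println(roundTrip(record).equals(record)); // true
        System.out.println(binaryReadOfTextFails(record));    // true
    }
}
```

Here `'a'` and `' '` are read as the length 0x6120 (24864 bytes), far more than the string contains. The same mismatch, in a Pig-specific form, is what happens when `Tuple.readFields` is given a text line.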
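For comparison with the quoted test, here is a minimal sketch of building the record directly in code, as Dmitriy recommends, instead of deserializing a string. It is not his earlier example (which is not quoted here); it assumes Pig's `TupleFactory`/`BagFactory` API (as in Pig 0.8) is on the classpath, and it assumes a schema of four fields: two chararrays followed by two bags, where each `(KEY:VALUE)` entry is a one-field tuple holding the whole `KEY:VALUE` string. The class name `RecordTupleSketch` and the variable names `address` and `profile` are hypothetical.

```java
import java.util.Arrays;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical helper: builds the quoted record as a Pig tuple in code
// rather than calling Tuple.readFields on a text line.
public class RecordTupleSketch {
    private static final TupleFactory TUPLE_FACTORY = TupleFactory.getInstance();
    private static final BagFactory BAG_FACTORY = BagFactory.getInstance();

    // Assumption: each "(KEY:VALUE)" entry is a single-field tuple
    // containing the whole "KEY:VALUE" string.
    static DataBag bagOf(String... entries) {
        DataBag bag = BAG_FACTORY.newDefaultBag();
        for (String entry : entries) {
            bag.add(TUPLE_FACTORY.newTuple(entry)); // newTuple(Object): one-field tuple
        }
        return bag;
    }

    // Assumption: the record has four fields: "a", "b", then two bags.
    static Tuple buildRecord() {
        DataBag address = bagOf("ST:NC", "ZIP:28613", "CITY:Xxxxxxx",
                "NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx");
        DataBag profile = bagOf("OCCUP:xxxxxxx xxxxx", "AGE:55 ", "MARITAL:Married");
        // newTuple(List): the list elements become the tuple's fields
        return TUPLE_FACTORY.newTuple(Arrays.<Object>asList("a", "b", address, profile));
    }

    public static void main(String[] args) throws ExecException {
        Tuple record = buildRecord();
        System.out.println(record.size());                     // 4
        System.out.println(((DataBag) record.get(2)).size());  // 4
        System.out.println(record);
    }
}
```

If the test targets an `EvalFunc`, a tuple built this way can be passed straight to the UDF's `exec(Tuple)` method; if the UDF expects keys and values as separate fields, `bagOf` would need to split each entry instead.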