You are trying to read a string that represents a tuple using binary
deserialization.

Pig has an abstraction called LoadFunc that knows how to read data off
disk and turn it into tuples (yes, records are tuples).  PigStorage is
one such LoadFunc, and it reads data represented as strings such as
what you are trying to feed in.  There are other load funcs that know
how to read other serializations and interpret the data in very
different ways (JSON, Avro, Thrift, records from a database, XML...).
There is no way for Tuple.readFields to know what format you are
trying to feed into it; it expects Pig's own binary encoding, not
text. Tuple serialization is used for intermediate serialization
between MR jobs and is not intended for the end user.
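
To make the mismatch concrete, here is a minimal sketch in plain Java
(no Pig classes involved) of what any DataInput-based reader sees when
it is handed the bytes of a text record. The class and method names
are made up for illustration, and the sketch makes no claim about the
exact layout of Pig's binary format; it only shows that the reader
gets raw character codes where it expects binary fields.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Pig.
class TextVersusBinary {

    // Wraps the UTF-8 bytes of a text record the same way the test in
    // the quoted message does.
    static DataInputStream asDataInput(String text) {
        return new DataInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    // The first byte a binary reader would consume: just the character
    // code of the first letter of the text.
    static int firstByteSeen(String text) throws IOException {
        return asDataInput(text).readByte();
    }

    // The first four bytes interpreted as a big-endian int, the way a
    // binary format might read a length or a field count.
    static int firstIntSeen(String text) throws IOException {
        return asDataInput(text).readInt();
    }

    public static void main(String[] args) throws IOException {
        String record = "a b {(ST:NC),(ZIP:28613)}";
        System.out.println("first byte: " + firstByteSeen(record));
        System.out.println("first int:  0x"
                + Integer.toHexString(firstIntSeen(record)));
    }
}
```

Whatever a binary deserializer expects at those positions, it gets the
codes for 'a', ' ', 'b', ' ' instead, so it cannot recover the fields
and bags that the text was meant to describe.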

You should be using the appropriate LoadFunc to create tuples
(PigStorage in this case?), or create them in code as I demonstrated
earlier.

You might find ReadToEndLoader, which wraps a real loadfunc and helps
with some details of instantiating input formats, getting splits, etc,
helpful: 
http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html

But really, you should just create the tuples you want in code rather
than involve all of this machinery.

D


On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> Could someone help me answer this question if records (each line) == tuples?
>
> On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> I am writing unit test but I had a doubt. My understanding is that
>> complete record is a tuple. So record "a b
>> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
>> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}"
>> which is one line in a file is a tuple? But I somehow feel it's not right.
>> Could someone please clarify?
>>
>> Below is the code, my test is incomplete but just pasting it to show how I
>> am constructing this tuple.
>>
>>
>>   TupleFactory mTupleFactory = TupleFactory.getInstance();
>>  BagFactory mBagFactory = BagFactory.getInstance();
>>
>>  @Test
>>  public void evalFuncTest() throws IOException{
>>   String record = "a b
>> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
>> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}";
>>   Tuple t = mTupleFactory.newTuple();
>>   DataInput in = new DataInputStream(new
>> ByteArrayInputStream(record.getBytes()));
>>   t.readFields(in);
>>  }
>>
