Hi

I've been stuck on this issue for a while and have been unable to get any help.

I was wondering if anyone can help.

I'm trying to load email messages into a messages relation but have been unable 
to, and I was wondering if anyone has a sample email dataset that would allow me 
to play around with this script:


The following is the code from the Agile Data Science book:

/* Load the emails in avro format (edit the path to match where you saved them) 
using the AvroStorage UDF from Piggybank */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();

I manually downloaded my Gmail mailbox, which came to about 350 MB, then tried 
loading that file into messages and got this error message:
*************************************

2014-03-03 01:52:26,294 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Encountered " "(" "( "" at line 1, column 84.
Was expecting one of:
"as" ...
"parallel" ...
";" ...
"." ...
"$" ...
*************************************


Details at logfile: /home/cloudera/pig_1393839871002.log



I then downloaded a sample email dataset and tried to load that one into the 
messages relation above, and I get the same error.
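
One thing that can be checked independently of the parse error above is whether 
a given file is in Avro format at all. Avro object container files begin with 
the four bytes "Obj" followed by the byte 1, whereas a raw mailbox export is 
usually plain text. The following is only a minimal sketch: the function name 
looks_like_avro is made up for illustration, and it assumes the file sits on 
the local filesystem rather than in HDFS.

```python
import sys

# First four bytes of every Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper for illustration; it only inspects the header and
    does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC


if __name__ == "__main__":
    # Usage: python check_avro.py FILE [FILE ...]
    for path in sys.argv[1:]:
        if looks_like_avro(path):
            print(path + ": starts with the Avro container header")
        else:
            print(path + ": does not start with the Avro container header")
```

If a file does not start with that header, it would need to be converted to 
Avro before AvroStorage could read it, regardless of how the parse error is 
resolved.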

Then I tried saving the following content from the book in a file and loading it 
into the relation, and I got the same error message.

Here is the content:
*************************************

*************************************



Will keep the weeds from taking over.



Russell Jurney datasyndrome.com
----

I have also tried emailing Russell but got no response.

I am wondering if anyone has a sample email dataset that will load with 
AvroStorage so I can try out my next steps.
Any help would be really appreciated.
Please let me know.
Thanks
Sai
