Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
Hello list, I have a file in my HDFS and I am reading this file and trying to store the data into an HBase table through the Pig shell. Here are the commands I am using: z = load '/mapin/testdata2.csv/part-m-0' using PigStorage(',') as (rowkey:int, id:int, age:float, gender:chararray,
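For reference, a minimal sketch of what the full load-and-store pipeline might look like; the snippet above is cut off, so the HBase table name and the column mappings here are assumptions for illustration, not the poster's actual script. Note that HBaseStorage uses the first field of each tuple as the row key, so it is not listed among the column descriptors:

```
-- Load the CSV from HDFS (schema continued from the truncated snippet;
-- trailing fields are guesses)
z = LOAD '/mapin/testdata2.csv/part-m-0' USING PigStorage(',')
    AS (rowkey:int, id:int, age:float, gender:chararray);

-- Store into a hypothetical HBase table; 'cf' is an assumed column family.
-- The first field (rowkey) becomes the HBase row key automatically.
STORE z INTO 'hbase://testtable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:id cf:age cf:gender');
```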

Re: Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
I don't think there is any problem with that as I am able to execute other queries, like loading data from an HBase table and storing it into another HBase table. Regards, Mohammad Tariq On Mon, Sep 3, 2012 at 1:57 PM, shashwat shriparv dwivedishash...@gmail.com wrote: What can conclude
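The HBase-to-HBase copy the poster says does work would look roughly like this; the table and column names below are made up for illustration (`-loadKey true` tells HBaseStorage to emit the row key as the first field):

```
-- Hypothetical example of a working table-to-table copy
raw = LOAD 'hbase://source_table'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'cf:col1 cf:col2', '-loadKey true')
      AS (rowkey:chararray, col1:chararray, col2:chararray);

STORE raw INTO 'hbase://dest_table'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1 cf:col2');
```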

Re: AvroStorage load and store, schema with maps

2012-09-03 Thread Johannes Schwenk
Thank you very much! I was confused because it seems to be OK to pass parameters to DEFINEd functions. If this does not actually work, trying to pass them should be a syntax error anyway. Maybe a parser exception could be thrown? Thanks again! Johannes On 23.08.2012 21:02, Cheolsoo Park wrote:
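The DEFINE pattern being discussed passes constructor arguments once, at definition time, rather than at each use site. A sketch, with an illustrative map-valued Avro schema (the alias and schema string are assumptions, not from the original thread):

```
-- Bind constructor arguments to an alias via DEFINE; the alias is then
-- used without arguments in LOAD/STORE clauses.
DEFINE AvroOut org.apache.pig.piggybank.storage.avro.AvroStorage(
    'schema', '{"type":"map","values":"string"}');

STORE data INTO 'output' USING AvroOut;
```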

Re: Unable to store data into HBase

2012-09-03 Thread chethan
STORE raw_data INTO 'hbase://sample_names' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname'); The above is an example of HBaseStorage. 1. It takes the column family and qualifier, which are internally separated by spaces; since you have used a comma for separation, this might

Re: Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
Thank you for the response. But even after removing the comma it's not working. I have noticed two strange things here: 1. If I read data from HBase and put it back into some HBase table, it works fine. 2. When I try the same thing using older versions, HBase (0.90.4) and Pig (0.9.1), it

UDF Performance Problem

2012-09-03 Thread James Newhaven
Hi, I'd appreciate it if anyone has any ideas/pointers regarding a Pig script and custom UDF I have written. I've found it runs too slowly on my Hadoop cluster to be useful... I have two million records inside a single 600MB file. For each record, I need to query a web service to retrieve

Re: UDF Performance Problem

2012-09-03 Thread Dmitriy Ryaboy
That's because you used GROUP ALL, which groups everything into one group, which by definition can go to only one reducer. What if instead you group into some large-enough number of buckets? A = LOAD 'records.txt' USING PigStorage('\t') AS (recordId:int); A_PRIME = FOREACH A generate *,
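The quoted script is cut off, but the bucketing pattern Dmitriy describes can be sketched as follows; the bucket count, the PARALLEL value, and the `myudfs.CallWebService` UDF are assumptions filled in for illustration, not the script from the original message:

```
A = LOAD 'records.txt' USING PigStorage('\t') AS (recordId:int);

-- Assign each record a random bucket id so the subsequent GROUP spreads
-- the records across many reducers instead of one.
A_PRIME = FOREACH A GENERATE *, (int)(RANDOM() * 100) AS bucket;

-- One group per bucket; PARALLEL requests that many reduce tasks.
B = GROUP A_PRIME BY bucket PARALLEL 100;

-- myudfs.CallWebService is a hypothetical UDF that processes one bag
-- of records (e.g. calling the web service per record).
C = FOREACH B GENERATE myudfs.CallWebService(A_PRIME);

STORE C INTO 'output' USING PigStorage('\t');
```

Each reducer then handles roughly 1/100th of the two million records, so the per-record web-service calls run concurrently across the cluster.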

Re: UDF Performance Problem

2012-09-03 Thread James Newhaven
Thanks Dmitriy, all sorted now. James On Mon, Sep 3, 2012 at 6:21 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: That's cause you used group all which groups everything into one group, which by definition can only go to one reducer. What if instead you group into some large-enough number of