Re: LOAD multiple files with glob

2012-11-26 Thread Bart Verwilst
To answer myself again, I compiled Pig 0.11 and Piggybank, and it's working very well now; globbing seems to be fully supported! Bart Verwilst wrote on 26.11.2012 15:33: To answer myself, could this be part of the solution? https://issues.apache.org/jira/browse/PIG-2492 Guess I'

Re: LOAD multiple files with glob

2012-11-26 Thread Bart Verwilst
To answer myself, could this be part of the solution? https://issues.apache.org/jira/browse/PIG-2492 Guess I'll have to wait for 0.11 then? Bart Verwilst wrote on 26.11.2012 14:19: 14:16:08 centos6-hadoop-hishiru ~ $ cat avro-test.pig REGISTER 'hdfs:///lib/avro-1.7.2.jar'

Re: LOAD multiple files with glob

2012-11-26 Thread Bart Verwilst
'. But the above error (Projected field [tracetype] does not exist) is not because of this. URISyntaxException is what you will get because of '[ ]'. Thanks, Cheolsoo On Sun, Nov 25, 2012 at 10:25 AM, Bart Verwilst wrote: Just tried this: -

Re: LOAD multiple files with glob

2012-11-26 Thread Bart Verwilst
doesn't work, but wildcard works? You're right. AvroStorage internally uses Hadoop path globbing, and Hadoop path globbing doesn't support '[ ]'. But the above error (Projected field [tracetype] does not exist) is not because of this. URISyntaxException is wh

Re: LOAD multiple files with glob

2012-11-25 Thread Bart Verwilst
files that reproduce the error? Thanks, Cheolsoo On Sun, Nov 25, 2012 at 1:02 AM, Bart Verwilst wrote: Hi, I've tried loading a csv with PigStorage(), getting this: txt = load '/import.mysql/trace_ejb3_**2011/part-m-0' USING PigStorage(','); describe txt; Sche

Re: LOAD multiple files with glob

2012-11-25 Thread Bart Verwilst
"name": "pkey", "type": "string"} ] } } } ] } Thanks! Kind regards, Bart Cheolsoo Park schreef op 25.11.2012 15:33: H

Re: LOAD multiple files with glob

2012-11-25 Thread Bart Verwilst
torage, not globbing. Try this with PigStorage. Russell Jurney twitter.com/rjurney On Nov 24, 2012, at 5:15 AM, Bart Verwilst wrote: Hello, Thanks for your suggestion! I switched my avro variable to avro = load '$INPUT' USING AvroStorage(); However, I get the same results this way: $ pig -

Re: LOAD multiple files with glob

2012-11-24 Thread Bart Verwilst
works. Change the line to accept parameters, like avro = load '$INPUT' USING AvroStorage(); bin/pig -p INPUT="/data/2012/trace_ejb3/2012-**01-0[12].avro" I think that if you don't use double quotes, the expansion is done by the OS. Please let us know if it doesn't work...
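The quoting point in this reply can be checked locally without Pig or HDFS. The following is a minimal sketch, not taken from the thread: it assumes a POSIX shell and uses a hypothetical temporary directory with empty files named like the ones in the original question. It only shows that an unquoted '[12]' pattern is expanded by the shell before the program sees it, while a double-quoted one is passed through literally.

```shell
#!/bin/sh
# Hypothetical local stand-ins for the HDFS files; created in a temp directory.
workdir=$(mktemp -d)
touch "$workdir/2012-01-01.avro" "$workdir/2012-01-02.avro" "$workdir/2012-01-03.avro"

# Unquoted pattern: the shell expands it into one argument per matching file.
set -- "$workdir"/2012-01-0[12].avro
unquoted_count=$#

# Double-quoted pattern: the shell passes it through as a single literal string,
# which is what a command such as pig -p INPUT="..." would then receive.
set -- "2012-01-0[12].avro"
quoted_count=$#
quoted_value=$1

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s): $quoted_value"

# Remove the temporary files.
rm -rf "$workdir"
```

Whether the load function then accepts the '[ ]' pattern is a separate question, which the later replies in this thread address.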

LOAD multiple files with glob

2012-11-23 Thread Bart Verwilst
Hello, I have the following files on HDFS: -rw-r--r-- 3 hdfs supergroup 22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro -rw-r--r-- 3 hdfs supergroup 240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro -rw-r--r-- 3 hdfs supergroup 324464635 2012-11-22 18:2

Reading Avro files with Pig

2012-11-19 Thread Bart Verwilst
Hi, I'm trying to read the Avro file I stored on HDFS, but I seem to be hitting a snag. I'm hoping some of you will be able to shed some light on this and allow me to continue my adventure! REGISTER 'hdfs:///lib/avro-1.7.2.jar'; REGISTER 'hdfs:///lib/json-simple-1.1.1.jar'; REGISTER 'hdfs://