Re: including multiple delimited fields (of unknown count) into one

2010-05-19 Thread Dmitriy Ryaboy
At the moment the answer is "Preprocessor", I believe.

On Wed, May 19, 2010 at 3:37 PM, Bill Graham wrote:
> Hi,
>
> Is there a way to read a collection (of unknown size) of tab-delimited
> values into a single data type (tuple?) during the LOAD phase?
>
> Here's specifically what I'm looking to

Re: including multiple delimited fields (of unknown count) into one

2010-05-19 Thread Bill Graham
Thanks Mridul, but how would I access the items in the numbered fields 3..N where I don't know what N is? Are you suggesting I pass A to a custom UDF to convert to a tuple of [time, count, rest_of_line]?

On Wed, May 19, 2010 at 4:11 PM, Mridul Muralidharan wrote:
>
> You can simply skip specifyi

Re: including multiple delimited fields (of unknown count) into one

2010-05-19 Thread Mridul Muralidharan
You can simply skip specifying the schema in the load, and access the fields either through the UDF or through positional indexes ($0, etc.). Like:

    A = load 'myfile' USING PigStorage();
    B = GROUP A by round_hour($0) PARALLEL $PARALLELISM;
    C = ...

Regards,
Mridul

On Thursday 20 May 2010 04:07

including multiple delimited fields (of unknown count) into one

2010-05-19 Thread Bill Graham
Hi,

Is there a way to read a collection (of unknown size) of tab-delimited values into a single data type (tuple?) during the LOAD phase?

Here's specifically what I'm looking to do. I have a given input file format of tab-delimited fields like so:

[timestamp] [count] [field1] [field2] [field2] .
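For readers following the "Preprocessor" suggestion made elsewhere in this thread, here is a minimal sketch in Python of what such a step might look like. It is not from the original posters. It assumes the first two tab-delimited fields are the timestamp and count, and it collapses every remaining field into a single third field joined by a comma. The function names, the comma separator, and the sample values are all hypothetical, and the approach breaks if a field can itself contain a comma.

```python
def split_record(line, fixed=2, sep="\t"):
    """Split one input line into the leading fixed fields and the rest.

    `fixed` is the number of leading fields with a known meaning
    (assumed here: timestamp and count).
    """
    parts = line.rstrip("\n").split(sep)
    return parts[:fixed], parts[fixed:]


def to_fixed_width_line(line, inner_sep=","):
    """Rewrite a line so it always has exactly three tab-delimited fields.

    The variable-length tail is joined with `inner_sep` (an assumed,
    hypothetical choice) so a fixed three-column schema can describe it.
    """
    head, rest = split_record(line)
    return "\t".join(head + [inner_sep.join(rest)])


if __name__ == "__main__":
    # Made-up sample record: timestamp, count, then three trailing fields.
    print(to_fixed_width_line("1274300000\t3\ta\tb\tc\n"))
```

After a pass like this, each line has three columns, so the load can declare a fixed schema and the third column can be split again later if needed.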

Re: Exception: Unable to find clone for op Project 4-16 Projections

2010-05-19 Thread Yonggang Qiao
Will definitely give it a try. Thanks!

On Tue, May 18, 2010 at 5:04 PM, Jeff Zhang wrote:
> And you can refer here http://issues.apache.org/jira/browse/PIG-240
> People have done some work for this issue although it is still not
> resolved completely.
>
>
>
> On Tue, May 18, 2010 at 12:34 PM, Ashu