At the moment the answer is "Preprocessor", I believe.
On Wed, May 19, 2010 at 3:37 PM, Bill Graham wrote:
> Hi,
>
> Is there a way to read a collection (of unknown size) of tab-delimited
> values into a single data type (tuple?) during the LOAD phase?
>
> Here's specifically what I'm looking to
Thanks Mridul, but how would I access the items in the numbered fields 3..N
where I don't know what N is? Are you suggesting I pass A to a custom UDF to
convert to a tuple of [time, count, rest_of_line]?
On Wed, May 19, 2010 at 4:11 PM, Mridul Muralidharan wrote:
>
> You can simply skip specifyi
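
One possible way to get a [time, count, rest_of_line] tuple without writing a custom UDF is sketched below. It assumes a Pig release that ships the STRSPLIT builtin (which may be newer than the version discussed in this thread), that 'myfile' is the input path used elsewhere in the thread, and that every line has at least three tab-separated fields. The alias and field names (raw, parts, ts, cnt, rest) are made up for illustration.

```pig
-- TextLoader reads each input line as a single chararray field.
raw = LOAD 'myfile' USING TextLoader() AS (line:chararray);

-- A split limit of 3 yields at most three pieces: the timestamp, the count,
-- and everything after the second tab, kept as one string.
parts = FOREACH raw GENERATE
    FLATTEN(STRSPLIT(line, '\t', 3)) AS (ts:chararray, cnt:chararray, rest:chararray);
```

The remaining fields stay packed in rest, so the number of trailing columns never has to be known up front; a UDF would still be needed to do anything field-by-field with them.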
You can simply skip specifying a schema in the load and access the
fields either through the UDF or through positional indexes ($0, $1, etc.).
For example:
A = LOAD 'myfile' USING PigStorage();
B = GROUP A BY round_hour($0) PARALLEL $PARALLELISM;
C = ...
Regards,
Mridul
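
A minimal sketch of the positional access described above, for the two leading fields whose positions are known. It assumes the first field can be treated as a string and the second as an integer count; the aliases and the names ts and cnt are illustrative only.

```pig
-- No AS clause: each record is a tuple with as many fields as the line has.
A = LOAD 'myfile' USING PigStorage('\t');

-- Fields loaded without a schema are bytearrays, so cast them on first use.
B = FOREACH A GENERATE (chararray)$0 AS ts, (long)$1 AS cnt;
```

Fields beyond the known positions are still present in each tuple of A; they just have no names, which is why a UDF that receives the whole tuple is the suggested route for handling them.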
On Thursday 20 May 2010 04:07
Hi,
Is there a way to read a collection (of unknown size) of tab-delimited
values into a single data type (tuple?) during the LOAD phase?
Here's specifically what I'm looking to do. I have a given input file format
of tab-delimited fields like so:
[timestamp] [count] [field1] [field2] [field2] .
will definitely give it a try. Thanks!
On Tue, May 18, 2010 at 5:04 PM, Jeff Zhang wrote:
> And you can refer here http://issues.apache.org/jira/browse/PIG-240
> People have done some work on this issue, although it is still not
> resolved completely.
>
>
>
> On Tue, May 18, 2010 at 12:34 PM, Ashu