Re: Reading fixed width files in pig

2012-11-06 Thread Віталій Тимчишин
Sorry for the late response, but in case you still need it: I've used a regular expression loader for this kind of task. The regexp would look something like ^(.{12})(.{10})$ for fields of width 12 and 10.

2012/10/23 ranjith raghunath wrote:
> Team,
>
> Are there any out-of-the-box load functions for fixed-width files?
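To show what that pattern does, here is a small Python sketch (not Pig code; the post does not say which regex loader was used). The helper names fixed_width_pattern and parse are hypothetical. It builds the anchored pattern from a list of widths and applies it to one record, so each capture group becomes one field.

```python
import re

def fixed_width_pattern(widths):
    """Build an anchored regex with one capture group per fixed-width field."""
    return "^" + "".join(f"(.{{{w}}})" for w in widths) + "$"

# For widths 12 and 10 this produces the same pattern as in the post.
pattern = re.compile(fixed_width_pattern([12, 10]))

def parse(line):
    """Return the fields of one record, or None if its length does not fit the layout."""
    m = pattern.match(line)
    return m.groups() if m else None
```

Fields keep any padding spaces; trimming them is a separate step the sketch leaves out.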

Re: accessing like array

2012-11-06 Thread Mohammad Tariq
Load the data into a relation and use 'generate' to take only the required fields from it and put them into another relation, then store the second relation into a file.

Regards,
Mohammad Tariq

On Tue, Nov 6, 2012 at 7:43 PM, jamal sasha wrote:
> Hi,
> I have data in form
> 1,0
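The load, generate, store steps above amount to a column projection. A Python sketch of the same idea, assuming comma-separated input; the question is cut off, so which columns to keep is unknown and the indices in the usage are hypothetical.

```python
def project(lines, indices, sep=","):
    """Keep only the fields at the given positions, like FOREACH ... GENERATE."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        out.append(sep.join(fields[i] for i in indices))
    return out

# Hypothetical usage: keep the first and third columns.
kept = project(["1,0,5", "2,1,7"], [0, 2])
```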

computing avg in pig

2012-11-06 Thread jamal sasha
I have data in format

1,1.2
2,1.3

and so on.

So basically this is an id, val combination where id is unique.

I want to calculate the average of all the values.

So here: avg(1.2, 1.3)

I was going through the documentation but most of the

Re: computing avg in pig

2012-11-06 Thread Alan Gates
-- the file is comma-separated; PigStorage splits on tab by default
A = load 'input_file' using PigStorage(',') as (id:int, val:double);
B = group A all;
C = foreach B generate AVG(A.val);

This groups all of your records into one bag and then takes the average of the second column.

Alan.

On Nov 6, 2012, at 11:19 AM, jamal sasha wrote:
>> I have data in format
>>
>> 1,1.2
>>
>> 2,1.3
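What "group A all" followed by AVG computes can be mirrored in a few lines of Python. This is only a model of the result, assuming comma-separated id,val lines; returning None for empty input is a choice of this sketch.

```python
def average_of_second_column(lines, sep=","):
    """Put every record in one group (like GROUP ... ALL) and average the second field."""
    values = [float(line.split(sep)[1]) for line in lines if line.strip()]
    if not values:
        return None  # no records: nothing to average (sketch's choice)
    return sum(values) / len(values)

avg = average_of_second_column(["1,1.2", "2,1.3"])
```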

Having troubles with PigStorage

2012-11-06 Thread William Oberman
I'm trying to experiment with Amazon EMR, and I currently have self-hosted Cassandra as my data source. I was going to try: Cassandra -> S3 -> EMR. I've traced my problems to PigStorage. At this point I can reproduce the problem "locally" without involving S3 or Amazon. In my local tes

Re: Having troubles with PigStorage

2012-11-06 Thread Cheolsoo Park
Hi Will,

>> data = LOAD 'hdfs://ZZZ/tmp/test' USING PigStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});

Can you please provide some data from this file (hdfs://ZZZ/tmp/test) to help us reproduce your problem? One or two rows would be sufficient.

Thanks,
Cheolsoo

Re: Having troubles with PigStorage

2012-11-06 Thread William Oberman
This is a dumb question, but PigStorage escapes the delimiter, right? I was assuming I didn't have to choose a delimiter that never appears in the data, because it would get escaped by the export process and unescaped by the import process.

On Tue, Nov 6, 2012 at 4:01 PM, Cheolsoo Park w

Re: Having troubles with PigStorage

2012-11-06 Thread Cheolsoo Park
>> This is a dumb question, but PigStorage escapes the delimiter, right?

No, it doesn't.

On Tue, Nov 6, 2012 at 1:29 PM, William Oberman wrote:
> This is a dumb question, but PigStorage escapes the delimiter, right? I
> was assuming I didn't have to select a delimiter such that it doesn't
> appe
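Since the delimiter is not escaped, a field containing it cannot survive a store/load round trip. A Python model of that behavior (not PigStorage's actual code; store_row and load_row are hypothetical names), using tab, which is PigStorage's default delimiter:

```python
def store_row(fields, delim="\t"):
    """Join fields with the delimiter; nothing inside a field is escaped."""
    return delim.join(fields)

def load_row(line, delim="\t"):
    """Split a stored line on every occurrence of the delimiter."""
    return line.split(delim)

original = ["row-1", "bytes\twith\ttabs"]
# The tabs inside the second field turn into field separators on reload,
# so two fields come back as four.
round_tripped = load_row(store_row(original))
```

With binary data such as raw UUIDs, any byte can occur, so no choice of delimiter is safe.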

Re: Having troubles with PigStorage

2012-11-06 Thread William Oberman
Wow, OK. That is completely unexpected. Thanks for the heads-up!

In my case, because part of my data is binary (UUIDs from Cassandra), any character can appear in the data, making PigStorage unhelpful ;-) I just tried AvroStorage in piggybank and that is able to store/load my data

Re: Having troubles with PigStorage

2012-11-06 Thread William Oberman
In case someone finds this thread because they hit the same issue, please vote for this bug: https://issues.apache.org/jira/browse/PIG-1271

On Tue, Nov 6, 2012 at 4:50 PM, William Oberman wrote:
> Wow, ok. That is completely unexpected. Thanks for the heads up!
>
> In my case, because part of my

Re: How to create an empty alias

2012-11-06 Thread Vitalii Tymchyshyn
Sorry for the late response, in case you still need this: you can try to read from file:/dev/null. This should work for most formats.

2012/10/18 Kevin LION wrote:
> Hello,
>
> I have a script which groups a lot of aliases and does some operations on them.
> But it can happen that I don't need one of these aliases.
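The trick works because the null device reads as an empty file, so a loader produces zero records instead of failing. A Python illustration of that property only (load_tab_separated is a hypothetical stand-in for a Pig LOAD; os.devnull resolves to /dev/null on Unix-like systems):

```python
import os

def load_tab_separated(path):
    """Read a tab-separated file into a list of field lists."""
    with open(path) as f:
        return [line.rstrip("\n").split("\t") for line in f]

# An empty "relation" without creating any file.
empty = load_tab_separated(os.devnull)
```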

RE: accessing like array

2012-11-06 Thread yogesh dhari
Hi Jamal,

Have you followed Mohammad Tariq's steps? Otherwise you can do it like this:

A = load '/filename' using PigStorage(',') as (id, val1, val2);
B = foreach A generate id, val2;
C = group B by id;
-- 'group' holds the grouping key, so each id appears once and no DISTINCT is needed
D = foreach C generate group as id, MAX(B.val2);
dump D;

Thanks & Regards,
Yogesh
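The group-by-id plus MAX script keeps, for each id, the largest val2. The same computation in Python, assuming three comma-separated fields per line with a numeric third field (the sample rows are made up for illustration):

```python
def max_val2_per_id(lines, sep=","):
    """Group records by id and keep the maximum val2, like GROUP BY id + MAX."""
    best = {}
    for line in lines:
        id_, _val1, val2 = line.rstrip("\n").split(sep)
        v = float(val2)
        if id_ not in best or v > best[id_]:
            best[id_] = v
    return best

result = max_val2_per_id(["1,a,3", "1,b,5", "2,c,4"])
```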