Hi
I have GROUP and FOREACH statements as below:
grouped = GROUP filterdata BY (page_name,web_session_id);
x = foreach grouped {
distinct_web_cookie_id= DISTINCT filterdata.web_cookie_id;
distinct_encrypted_customer_id= DISTINCT filterdata.encrypted_customer_id;
distinct_web_session_id= DISTINCT filterdata.web_session_id;
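For reference, a nested FOREACH block must end with a GENERATE statement. A minimal completed sketch (the generated expressions are illustrative, not from the original mail):

```pig
grouped = GROUP filterdata BY (page_name, web_session_id);
x = FOREACH grouped {
    distinct_web_cookie_id = DISTINCT filterdata.web_cookie_id;
    distinct_encrypted_customer_id = DISTINCT filterdata.encrypted_customer_id;
    -- the last statement in a nested block must be GENERATE
    GENERATE FLATTEN(group) AS (page_name, web_session_id),
             COUNT(distinct_web_cookie_id) AS cookie_cnt,
             COUNT(distinct_encrypted_customer_id) AS cust_cnt;
};
```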
Hi,
I am trying to query a data set on HDFS using PIG.
Data = LOAD '/user/xx/20130523/*';
x = FOREACH Data GENERATE cookie_id;
I get below error.
Invalid field projection. Projected field [cookie_id] does not exist
How do I find the column names in the bag "Data"? The developer who
created the
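Since the LOAD has no AS clause, Pig has no schema for Data, so named projections like cookie_id fail. DESCRIBE confirms this, and positional references always work. A sketch (the loader and schema shown are illustrative):

```pig
Data = LOAD '/user/xx/20130523/*' USING PigStorage();
DESCRIBE Data;                    -- reports that the schema for Data is unknown
x = FOREACH Data GENERATE $0;     -- positional reference instead of a name
-- or declare the schema yourself once you know the layout:
-- Data = LOAD '/user/xx/20130523/*' AS (cookie_id:chararray);
```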
I wrote a script as below.
Data = LOAD 'part-r-0' AS (session_start_gmt:long);
FilterData = FILTER Data BY session_start_gmt == 1369546091667;
I get below error
2013-07-01 22:48:06,510 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: For input string: "1369546091667"
In detail log it
...
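ERROR 1200 with 'For input string: "1369546091667"' is Pig failing to parse the literal as a 32-bit int, which that value overflows. Suffixing the literal with L types it as a long. A sketch, assuming the script above:

```pig
Data = LOAD 'part-r-0' AS (session_start_gmt:long);
-- a bare numeric literal is typed as int; 1369546091667 does not fit in 32 bits
FilterData = FILTER Data BY session_start_gmt == 1369546091667L;
```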
If I send only $inputDate, it's working fine. Am I missing anything?
Thanks
On Thu, Jun 20, 2013 at 12:02 PM, Mix Nin wrote:
> I want to pass multiple parameters to a pig script from a shell script.
>
> I tried below
>
> From Shell Script
>
> pig -f $ROOT_DIR/pig
I want to pass multiple parameters to a pig script from a shell script.
I tried below
From Shell Script
pig -f $ROOT_DIR/pig0.pig -param inputDatePig=$inputDate -param StartDate =$SDate -param EndDate=$EDate
PIG script is as follows
eventData = LOAD 'eap-prod://event' USING
com.engine.dat
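One likely culprit in the invocation above (if it is not just a line-wrap artifact of the archive) is the space in `-param StartDate =$SDate`: the shell splits it into two arguments, so the parameter never gets its value. Each -param needs a single name=value token. A small demonstration of the argument splitting (the date value is hypothetical):

```shell
SDate=20130601                       # hypothetical stand-in for the real date
set -- -param "StartDate=$SDate"     # correct: one name=value token per -param
echo "$2"                            # the token pig would receive
```

With the stray space, pig instead receives "StartDate" and "=20130601" as two separate arguments, and the parameter is left undefined.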
PigStorage) already handles that.
>
> Also, can you double-check your path is "/Output/part-m*" with forward
> slashes, as opposed to backward slashes?
>
>
> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin wrote:
>
> > I have a directory "Output2". It has file names as below
> >
I have a directory "Output2". It has file names as below:
-
_SUCCESS
part-m-0
part-m-1
part-m-2
part-m-3
.
.
.
.
part-m-00100
-
The above files are produced by the PIG STORE command.
I want to read the files starting with "part-m-" using a PIG command
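Pig's LOAD accepts Hadoop glob patterns, and files whose names start with _ or . (such as _SUCCESS) are skipped by the default input filter, so a glob over the part files is enough. A minimal sketch (path and loader illustrative):

```pig
parts = LOAD '/Output2/part-m-*' USING PigStorage();
```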
PIG STORE command produces multiple output files. I want a single output
file and I tried using command as below
STORE (foreach (group NoNullData all) generate flatten($1)) into '';
This command produces one single file, but at the same time forces a
single reducer, which kills performance.
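An alternative that keeps full reduce parallelism is to STORE normally and merge the part files afterwards with HDFS's getmerge (paths here are illustrative):

```shell
# concatenate all part files of the job output into one local file
hadoop fs -getmerge /user/xx/Output2 merged_output.txt
```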
as been generated.
>>
>> Regards,
>> Shahab
>>
>>
>> On Mon, May 13, 2013 at 2:16 PM, Mix Nin wrote:
>>
>>> OK, let me modify my requirement. I should have specified it at the
>>> beginning.
>>>
>>> I need to get the count
On Mon, May 13, 2013 at 11:37 PM, Mix Nin wrote:
>
>> It is a text file.
>>
>> If we want to use wc, we need to copy file from HDFS and then use wc, and
>> this may take time. Is there a way without copying file from HDFS to local
>> directory?
>>
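You don't need to copy the file down first: hadoop fs -cat can stream it straight into wc. A sketch using a local stand-in file in place of the HDFS path (on the cluster, replace the redirection with `hadoop fs -cat /path/part-r-* | wc -l`):

```shell
printf 'r1\nr2\nr3\n' > /tmp/part-r-00000   # local stand-in for the HDFS part file
wc -l < /tmp/part-r-00000                    # record count (3 here)
```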
pointers.
>
> What kind of files are we talking about? For text you can use wc; for
> Avro data files you can use avro-tools.
>
> Or get the job that Pig generates and fetch its counters from the
> JobTracker of your Hadoop cluster.
>
> Thanks,
> Rahul
>
>
Hello,
What is the best way to get the count of records in an HDFS file generated
by a PIG script.
Thanks
runs fine
On Wed, Mar 27, 2013 at 2:40 PM, Mix Nin wrote:
> I wrote a pig script as follows and stored it in x.pig file
>
> Data = LOAD '/' as ( )
> NoNullData= FILTER Data by qe is not null;
> STORE (foreach (group NoNullData all) generate flatten($1))
I wrote a pig script as follows and stored it in x.pig file
Data = LOAD '/' as ( )
NoNullData= FILTER Data by qe is not null;
STORE (foreach (group NoNullData all) generate flatten($1)) into
'exp/$inputDatePig';
evnt_dtl =LOAD 'exp/$inputDatePig/part-r-0' AS (cust,)
I execut
Hi
I have data in a file as follows. There are 3 columns separated by
semicolon (;). Each column has multiple values separated by comma (,).
11,22,33;144,244,344;yny;
I need output data in below format. It is like transposing values of each
column.
11 144 y
22 244 n
33 344 y
Can we write a Pig script to do this?
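Pig has no built-in "zip" across columns, but when the number of values per column is known and fixed (three in the sample), STRSPLIT plus explicit positions gets there. A hedged sketch (the file name, loader, and the fixed arity are assumptions), reading the third column character by character with SUBSTRING:

```pig
raw  = LOAD 'data.txt' AS (line:chararray);
-- split the three semicolon-separated columns
cols = FOREACH raw GENERATE FLATTEN(STRSPLIT(line, ';'))
                   AS (c1:chararray, c2:chararray, c3:chararray);
-- split each column's comma-separated values (fixed at three here)
vals = FOREACH cols GENERATE FLATTEN(STRSPLIT(c1, ',')) AS (a1, a2, a3),
                             FLATTEN(STRSPLIT(c2, ',')) AS (b1, b2, b3),
                             c3;
-- pair the i-th value of each column, one output row per position
rows = FOREACH vals GENERATE TOBAG(TOTUPLE(a1, b1, SUBSTRING(c3, 0, 1)),
                                   TOTUPLE(a2, b2, SUBSTRING(c3, 1, 2)),
                                   TOTUPLE(a3, b3, SUBSTRING(c3, 2, 3))) AS r;
out  = FOREACH rows GENERATE FLATTEN(r);
```

For a variable number of values per column, a small UDF that zips the split tuples would be the cleaner route.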