question about Pig local mode

2022-03-11 Thread Dominique De Vito
Hi, I am using Windows 10 + Git Bash for testing Pig local mode. I have just executed "pig -x local"... and got an error message. What surprised me is that when running local mode I got "hadoop-config.sh: No such file or directory". OK, I may have a non-perfect Hadoop installation

Re: newbie question regarding sorted data process, and sequential match of records

2015-03-20 Thread Arvind S
I am not an expert, learning just like you, but I will attempt to answer and provide some justification. IMO, depending on the sorted file naming / sorted data would restrict your processing logic. What you want to leverage is "parallelism", so let the digest and process logic be generic so th

newbie question regarding sorted data process, and sequential match of records

2015-03-16 Thread Troy X
Hi Experts, I'm trying to transform a couple of thousand delimited files stored on HDFS using Pig. Each file is between 20 and 200 MB in size. The files have very simple column definitions, e.g. event history: TimeStamp, Location, Source, Target, EventType, Description. The logic is as foll

elephantbird parse nested json question

2015-03-09 Thread Henry Chen
D 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]); data = FOREACH raw_data GENERATE json#'character_guid' AS (character_guid:chararray), json#'stat_key' AS (stat_key:chararray), json#'logged_a

Re: Question 1

2014-12-04 Thread Akhil Das
1. Who wants to write a bulky, buggy (possibly 100+ lines of code) MapReduce job, only to find at the end that the answer is wrong and you have to sit debugging the whole stack? That is where Pig and Hive come in. 2. Pig is more like an analytical tool, and the person who is writing the Pig job doesn't nee

Question 1

2014-12-04 Thread Mohan Krishna
Why did Pig come into the picture? What makes Hadoop focus on Pig?

A question about Pig join

2014-10-22 Thread Sameer Tilak
Hi All, I have the following problem. This is a statement in my Pig script: cust_joined = JOIN cust_filtered BY (LOW, HIGH, NORMAL), cust_conversion BY (Low, High, Normal); The datatypes are chararray for all these fields. So null should be interpreted as 'null', right? cust_filtered includes the

question: handling transactions with store functions

2014-01-17 Thread Segerlind, Nathan L
Hello. I am trying to write a storage function in Pig and I'd like to know what the guarantees are on the StoreFunc's prepareToWrite, cleanupOnFailure and cleanupOnSuccess methods. In particular, when are these functions called? Is it once per task or once per tuple? The store that I

Re: A question regarding schema

2013-12-31 Thread centerqi hu
I encountered your problem; you can handle it as follows. There are two ways to add a field: one is the append(object) method, the other is the set(index, object) method. My code: DataBag inputBag = (DataBag)input.get(0); for (Tuple t : inputBag) {

A question regarding schema

2013-12-30 Thread Sameer Tilak
Hi All, I have a UDF that returns a tuple. The number of elements in the tuple will differ for each user. For example: (userid1, item1, item2, item 100, item 400) (userid1, item1, item200) (userid1, item1, item2, item 100, item200, item250, item300, item 400) (userid1, item 100, item 200, item250, i

Re: Join Question

2013-09-05 Thread F. Jerrell Schivers
Hi Pradeep, This is exactly what I'm looking for. I was going to process this data inside a UDF anyway, so it's easy for me to pick out what I need. Many thanks. --Jerrell On Wed, 4 Sep 2013, Pradeep Gollakota wrote: I think there's probably some convoluted way to do this. First thing yo

Re: Join Question

2013-09-04 Thread Pradeep Gollakota
I think there's probably some convoluted way to do this. First thing you'll have to do is flatten your data. data1 = A, B _ X, X1 X, X2 Y, Y1 Y, Y2 Y, Y3 Then do a join by "B" onto your second dataset. This should produce the following data2 = data1::A, data1::B, data2::A, data2::B, data2::C
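A minimal Pig Latin sketch of the flatten-then-join pattern described above; the paths, aliases, and field names are hypothetical stand-ins for the poster's data:

    -- first dataset: rows like X, (X1, X2), with the members held in a bag
    data1 = LOAD 'input1' AS (a:chararray, members:bag{t:(b:chararray)});
    flat1 = FOREACH data1 GENERATE a, FLATTEN(members) AS b;
    -- second dataset: rows like X1, 4, 5, 6
    data2 = LOAD 'input2' USING PigStorage(',') AS (b:chararray, c1:int, c2:int, c3:int);
    -- join on the flattened member column, then regroup by the original key if needed
    joined    = JOIN flat1 BY b, data2 BY b;
    regrouped = GROUP joined BY flat1::a;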

Join Question

2013-09-04 Thread F. Jerrell Schivers
Howdy folks, Let's say I have a set of data that looks like this: X, (X1, X2) Y, (Y1, Y2, Y3) So there could be an unknown number of members of each tuple per row. I also have a second set of data that looks like this: X1, 4, 5, 6 X2, 3, 7, 3 I'd like to join these such that I get: X, (X1,

Re: question about syntax for nested evaluations using bincond

2013-07-15 Thread Alan Gates
No, both are equally correct. == has higher precedence than ?: Alan. On Jul 5, 2013, at 1:39 PM, mark meyer wrote: > hello, > > i am new to pig and have a question regarding the syntax arrangement for > nested evaluations using bincond. > > both of these seem to work and
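In other words, because == binds tighter than ?:, the two spellings are equivalent; a small illustration with a hypothetical relation and values:

    -- with explicit parentheses around the comparison
    C = FOREACH B GENERATE id, name, role, ((location == 'NY') ? 'east' : 'west') AS coast;
    -- without them; == is still evaluated before the bincond
    D = FOREACH B GENERATE id, name, role, (location == 'NY' ? 'east' : 'west') AS coast;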

question about syntax for nested evaluations using bincond

2013-07-05 Thread mark meyer
hello, i am new to pig and have a question regarding the syntax arrangement for nested evaluations using bincond. both of these seem to work and produce identical results. is one syntax "more" correct? C = foreach B generate id, name, role, ((locati

RE: A question on joins

2013-06-19 Thread Sameer Tilak
Thanks! Yes, that did the trick. Now I have the following statement: myresults = join userdeliverydetails by $3, deliverystatuscodes by $0; I am running pig in local mode. I have another question. dump myresults has the following statements at the end. However it does not print the resulting

Re: A question on joins

2013-06-19 Thread Barclay Dunn
it turns out that "output" is a reserved word in Pig. if you change your alias to "thingy" your script will work. Barclay Dunn On 6/19/13 3:10 AM, Sameer Tilak wrote: Dear Pig users, I am trying to do simple joins by following an example on a Blog. Your help will be great. UserDetails.txt
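A sketch of the fix, using the aliases that appear later in this thread:

    -- 'output' is reserved, so this alias fails to parse:
    -- output = JOIN userdeliverydetails BY $3, deliverystatuscodes BY $0;
    -- renaming the alias works:
    thingy = JOIN userdeliverydetails BY $3, deliverystatuscodes BY $0;
    DUMP thingy;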

A question on joins

2013-06-19 Thread Sameer Tilak
Dear Pig users, I am trying to do simple joins by following an example on a Blog. Your help will be great. UserDetails.txt 123456, Jim 456123, Tom 789123, Harry 789456, Richa DeliveryDetails.txt 123456, 001 456123, 002 789123, 003 789456, 004 DeliveryStatusCodes.txt 001, Delivered 002, Pending

Re: question about PigStorage

2013-05-22 Thread ????
Thank you, you are right. ------ Original message ------ From: "felix gao"; Sent: 2013-05-23 (Thu) 1:22; To: "user"; Subject: Re: question about PigStorage -- I think you spelled the storage wrong. It is PigStorage(), not PigStor

question about PigStorage

2013-05-22 Thread ????
in.] Details at logfile: /opt/hadoop/pig/pig-0.11.1/bin/pig_1369284615000.log My question: I store data in Hive, and Hive by default uses '\010' to split fields. How can I read Hive data from Pig? Thank you!
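A sketch of one way to read such data, assuming the table files really are delimited by the octal '\010' character the poster mentions (U+0008); the warehouse path and schema are hypothetical, and the escape syntax should be checked against the Pig version in use:

    hive_rows = LOAD '/user/hive/warehouse/mytable'
                USING PigStorage('\u0008')
                AS (col1:chararray, col2:int, col3:chararray);
    DUMP hive_rows;

If the table uses Hive's default field delimiter instead, that is octal '\001' (U+0001).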

Re: Pig question

2013-05-07 Thread abhishek
Coveney, Thanks for the reply Got the answer using nested foreach. a = load 'data' using PigStorage(','); b = foreach a { c = substring(col1,0,4); generate 332 as x, c; } Sent from my iPhone On May 7, 2013, at 4:33 AM, Jonathan Coveney wrote: > cdh-user to bcc >

Re: Pig question

2013-05-07 Thread Jonathan Coveney
cdh-user to bcc Your question doesn't make much sense...I think you may have left a piece off? 2013/5/7 abhishek > Hi all, > > In my script > > a = load 'data' using PigStorage(); > > b = foreach a generate > 342 as col1, > substring(x,0,4) as col2, &

Pig question

2013-05-06 Thread abhishek
Hi all, In my script a = load 'data' using PigStorage(); b = foreach a generate 342 as col1, substring(x,0,4) as col2, ; I want to use col2 later in foreach statement. derived col2 should be used below. Regards Abhi

Re: pig question

2013-04-27 Thread Russell Jurney
values = LOAD 'my_path' AS (id1:int, id2:chararray, value:int); overall = FOREACH (GROUP values BY id1) GENERATE group AS id1, value/MAX(value) as div_max; Russell Jurney http://datasyndrome.com On Apr 27, 2013, at 2:32 AM, jamal sasha wrote: > Hi, > I have data of format > > id1,id2, value >
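A slightly longer sketch that keeps every row and divides it by its group's maximum; the field names follow the thread and the path is hypothetical:

    values   = LOAD 'my_path' USING PigStorage(',') AS (id1:int, id2:chararray, value:int);
    grouped  = GROUP values BY id1;
    -- flatten the bag and carry the group max alongside every row
    with_max = FOREACH grouped GENERATE FLATTEN(values), (double)MAX(values.value) AS max_value;
    ratios   = FOREACH with_max GENERATE values::id1, values::id2,
               ((double)values::value) / max_value AS div_max;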

pig question

2013-04-27 Thread jamal sasha
Hi, I have data of format id1,id2, value 1 , abc, 2993 1, dhu, 9284 1,dus,2389 2, acs,29392 and so on For each id1, I want to find the maximum value and then divide value by max_value so in example above: 1,abc, 2993/9284 1,dhu ,9284/9284 1,dus, 2389/9284 2,acs, 29392/max_value_for_this id H

Re: Filter on tuple question, and how to deal with dirty data?

2013-04-19 Thread Ruslan Al-Fakikh
Hi: Q1: maybe there is something wrong with the UDF itself? Q2: How do you specify the data as dirty? If one of your 6 fields is null, then you could do something like: FILTER BY ($0 IS NULL OR $1 IS NULL...) Ruslan On Fri, Apr 19, 2013 at 6:57 AM, 何琦 wrote: > > Hi, > > Q1:I have a qu
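A sketch of that filter using the field names from the original question, keeping only rows where no field is null:

    raw   = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow, duration, count);
    clean = FILTER raw BY phoneNum IS NOT NULL AND tag IS NOT NULL AND flow IS NOT NULL
                      AND duration IS NOT NULL AND count IS NOT NULL;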

Filter on tuple question, and how to deal with dirty data?

2013-04-18 Thread 何琦
Hi, Q1:I have a question about how to use filter on tuple. The code is: REGISTER pig.jar; raw = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow, duration, count); sumed = FOREACH (GROUP r

Re: Join question

2013-04-01 Thread Mehmet Tepedelenlioglu
>2,1.0 > >as (1 + 3 + 5) / 3 = 3 >whereas in the example.. count(inpt) should give me 4? > >How do I achieve this. >Thanks > > > > > > > > >On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu > >wrote: >> >> Are your ids unique? &g

Re: Join question

2013-04-01 Thread jamal sasha
me 4? How do i achieve this. Thanks On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu wrote: > > Are your ids unique? > > On 4/1/13 2:06 PM, "jamal sasha" wrote: > > >Hi, > > I have a simple join question. > >base = load 'input1

Re: Join question

2013-04-01 Thread Mehmet Tepedelenlioglu
Are your ids unique? On 4/1/13 2:06 PM, "jamal sasha" wrote: >Hi, > I have a simple join question. >base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2); >stats = load 'input2' USING PigStorage(',') as (id1, me

Join question

2013-04-01 Thread jamal sasha
Hi, I have a simple join question. base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2); stats = load 'input2' USING PigStorage(',') as (id1, mean, median); joined = JOIN base BY id1, stats BY id1; final = FOREACH joined GENERATE bas
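After the JOIN, fields from both inputs keep their alias prefixes, so the duplicated id1 needs disambiguation in the final projection; a sketch continuing the script above (the projected columns are a guess at what the poster wants):

    base   = LOAD 'input1' USING PigStorage(',') AS (id1, field1, field2);
    stats  = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);
    joined = JOIN base BY id1, stats BY id1;
    -- base::id1 and stats::id1 are identical after an inner join; keep one of them
    final  = FOREACH joined GENERATE base::id1 AS id1, field1, field2, mean, median;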

Re: Question about properties for Loader

2013-02-24 Thread Dmitriy Ryaboy
Yuan wrote: > Thanks for the pointers Prashant. I will take a look at PigStorage. > > I have a system for storing metadata, so users don't have to specify it. > > With respect to the properties, I guess my question is, are the ones > passed in from the command li

Re: Question about properties for Loader

2013-02-24 Thread Jeff Yuan
Thanks for the pointers Prashant. I will take a look at PigStorage. I have a system for storing metadata, so users don't have to specify it. With respect to the properties, I guess my question is, are the ones passed in from the command line via -p stored in Property or Configuration fro

Re: Question about properties for Loader

2013-02-24 Thread Prashant Kommireddi
take a look at PigStorage that does something very similar (look for the method applySchema(Tuple tup) ) On Sun, Feb 24, 2013 at 3:33 PM, Jeff Yuan wrote: > I'm trying to write a loader, extending LoadFunc, to read a specific > file format. > > My question, how do I pass prop

Question about properties for Loader

2013-02-24 Thread Jeff Yuan
I'm trying to write a loader, extending LoadFunc, to read a specific file format. My question, how do I pass properties to it (for example the schema of the file type I'm loading)? Would it be using the -p parameter from the cmdline when issuing the query? The second part of the q

Question about setting default loader

2013-02-21 Thread Jeff Yuan
Hi, I'm a new user of pig, so I apologize if my question seems simplistic. Is there a way to specify (via configuration or cmdline input) a different loader to be used as default? What I mean is, if you don't specify explicitly in your load statement, PigStorage is used as a loader.

Re: Question regarding a custom LoadFunc implementation

2012-12-11 Thread Bill Graham
We had a yml file that mapped physical datasources to the loader that the generic one serves as a facade to. Now we're moving to an HCatalog based solution that handles that as well as the logical to physical resolution. Basically the mappings are stored in a DB. On Tue, Dec 11, 2012 at 8:20 AM,

Re: Question regarding a custom LoadFunc implementation

2012-12-11 Thread Prashant Kommireddi
Thanks Bill. Any ideas on how to hide the location of HDFS files from the end user? On Tue, Dec 11, 2012 at 9:42 PM, Bill Graham wrote: > I think the latter would be better. Since the LoadFunc would be decoupled > from the data exporter you could schedule the exporting independent of the > loadi

Re: Question regarding a custom LoadFunc implementation

2012-12-11 Thread Bill Graham
I think the latter would be better. Since the LoadFunc would be decoupled from the data exporter you could schedule the exporting independent of the loading. We do something similar, without the $query part. On Tue, Dec 11, 2012 at 1:10 AM, Prashant Kommireddi wrote: > I was working on a LoadFun

Re: Small question

2012-10-12 Thread Abhishek
Thanks Arun you are right. Sent from my iPhone On Oct 12, 2012, at 11:26 AM, Arun Ahuja wrote: > From my interpretation Hive coaelsce returns the first non-null value. > > So it seems you are just doing a null check on x and return y if it is > null and z otherwise? > > In Pig you could do s

Re: question

2012-10-12 Thread Arun Ahuja
Instead of count = foreach perCust generate group, COUNT(filtered_times.movie); use count = foreach perCust generate FLATTEN(group), COUNT(filtered_times.movie); FLATTEN is a special operator that replaces a tuple with the elements inside the tuple. On Thu, Oct 11, 2012 at 4:36 PM, jamal sasha

Re: Small question

2012-10-12 Thread Arun Ahuja
From my interpretation, Hive coalesce returns the first non-null value. So it seems you are just doing a null check on x and returning y if it is null and z otherwise? In Pig you could do something like: (x is null ? y : z) This is a standard ternary if/else. Don't see if the 0.00 actually plays

Small question

2012-10-12 Thread Abhishek
Hi all, how do I do a Hive coalesce statement in Pig? Example: CASE WHEN COALESCE(x, 0.00) = 0.00 THEN y ELSE z. How do I write this in Pig? Regards Abhi
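A Pig bincond roughly equivalent to the Hive CASE/COALESCE expression above, folding the null and 0.00 checks together (relation and field names hypothetical):

    -- CASE WHEN COALESCE(x, 0.00) = 0.00 THEN y ELSE z END
    result = FOREACH data GENERATE ((x IS NULL OR x == 0.00) ? y : z) AS picked;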

question

2012-10-11 Thread jamal sasha
> >I have a data file in format > > > > User, movie, price > > 123,abc,22.2 > > 123,daw,39 > > 123,abc,99 <-- Note that the user and movie are the same but the price is different > > > > I want to generate a pig script where I am counting how many times a user has rented a particular movie > > > > > > in

Re: Adding keywords to Pig Latin (Was: Question about UDFs and tuple ordering)

2012-10-09 Thread Gianmarco De Francisci Morales
taken some time to understand how a Logical Plan progresses to a > Physical and MR Plan (thanks for the boost, Alan!) > > My next question is centered around Logical Plan generation. If one were > to add a new keyword (sticking with the theme in my last message, say, > SUPERSPECIALJOIN), th

Adding keywords to Pig Latin (Was: Question about UDFs and tuple ordering)

2012-10-09 Thread Brian Stempin
I've taken some time to understand how a Logical Plan progresses to a Physical and MR Plan (thanks for the boost, Alan!) My next question is centered around Logical Plan generation. If one were to add a new keyword (sticking with the theme in my last message, say, SUPERSPECIALJOIN),

Re: A question of pig default load function

2012-10-07 Thread Prashant Kommireddi
The default loader is PigStorage, which takes '\t' as the delimiter. In your 2nd example, you need to explicitly specify the comma as the delimiter (load 'foo' using PigStorage(',') as ...) Sent from my iPhone On Oct 7, 2012, at 12:00 PM, yonghu wrote: > Dear all, > > When I load the data stored in txt file i
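For example (file name and schema hypothetical):

    -- default PigStorage splits on tabs:
    a = LOAD 'foo' AS (x:int, y:chararray);
    -- comma-separated data needs the delimiter spelled out:
    b = LOAD 'foo' USING PigStorage(',') AS (x:int, y:chararray);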

RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Awesome -- I really appreciate that insight. Is that recorded anywhere? If not, then perhaps I'll spend some time writing about how these things are implemented in the wiki for when others come along with similar questions. Thanks, Alan!

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Alan Gates
Many operators, such as join and group by, are not implemented by a single physical operation. Also, they are spread through the code as they have logical components and physical components. The logical components of join are in org.apache.pig.newplan.logical.relational.LOJoin.java. That gets

RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Thanks Russell -- That's really useful. Just for kicks and giggles: Where would I look in the code base to see how the JOIN keyword is implemented? I've found the built in functions, but not the keywords (JOIN, GROUP, etc). Perhaps that would give me some hints. Perhaps it'll show me that a

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Russell Jurney
You can write an EvalFunc UDF that depends on a sort, and there are several in piggybank that do so. COR (the correlate UDF) is such an example. You call these UDFs on a relation after ordering them. For example: answers = foreach (group data by key) { sorted = order data by value; generate m
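Filling out that pattern as a sketch; MySortedUDF is a hypothetical EvalFunc that relies on its input bag arriving in order, and the relation and field names are placeholders:

    answers = FOREACH (GROUP data BY key) {
        sorted = ORDER data BY value;
        GENERATE group AS key, MySortedUDF(sorted) AS result;
    };

In recent Pig versions the nested ORDER is usually pushed into the shuffle as a secondary sort rather than sorted in memory.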

Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Hi, I'm fairly new to writing UDFs and Pig in general. I want to be able to write a UDF that can take advantage of MapReduce's sorting of data. Specifically, I'm trying to conceive how I'd write a UDF to do a specialized join or a pivot. In both cases, sorting would be useful. EvalFunc seems

Re: Pig question.

2012-10-03 Thread Russell Jurney
0) convert dates to ISO format via CustomFormatToISO 1) convert dates to unix time longs via ISOToUnix 2) use foreach/generate with ternary operator to add/subtract hours based on the value of the other field 3) convert dates back to ISO format with UnixToISO Call it a day. Russell Jurney twitter
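A rough sketch of that recipe with the piggybank datetime converters; the class names should be verified against your piggybank jar, and the input format string, field names, and hour offsets are assumptions to adapt:

    REGISTER piggybank.jar;
    DEFINE CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
    DEFINE ISOToUnix         org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
    DEFINE UnixToISO         org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();

    rows    = LOAD 'txns' USING PigStorage(',')
              AS (id:int, amount:float, true_date:chararray, time:chararray, state:chararray);
    -- steps 0 and 1: custom format -> ISO -> unix milliseconds (format string is a guess)
    unixed  = FOREACH rows GENERATE id, amount, state,
              ISOToUnix(CustomFormatToISO(true_date, 'MM/dd/yyyy')) AS ts;
    -- step 2: ternary on state, +1 hour for CA and +5 hours for MA
    shifted = FOREACH unixed GENERATE id, amount, state,
              ((state == 'CA') ? (ts + 3600000L) : (ts + 5 * 3600000L)) AS ts;
    -- step 3: back to ISO
    result  = FOREACH shifted GENERATE id, amount, state, UnixToISO(ts) AS iso_date;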

Re: Pig question.

2012-10-03 Thread TianYi Zhu
Hi Jamal, you can write a UDF convert time between different time zones with following utilities, java.text.DateFormat; java.text.SimpleDateFormat; java.util.Date; java.util.TimeZone; Thanks, TianYi On Thu, Oct 4, 2012 at 12:53 AM, jamal sasha wrote: > Hi, > > I have a table in format: > >

Pig question.

2012-10-03 Thread jamal sasha
Hi, I have a table in the format: Id: int, amount: float, true_date: chararray, time: chararray, state: chararray. Fortunately, there are only two states in my db. So if the state is “CA”, then add +1 to the datetime; if the state is “MA”, then add +5 to the datetime, and then save the results. Also a c

Small question in pig

2012-10-02 Thread Abhishek
Hi all, I am fairly new to Pig. Can anyone tell me how to write the below Hive query in Pig Latin? In this query I am using a Cartesian join to achieve instr, or contains as in Java. Example col1 -- 145678341212 col2 -- %67834% insert into table t1 select t2.col1, t3.col2 from table2 t2 join table

Fwd: Question Regarding HBaseStorage Pig 0.8.1

2012-08-31 Thread Dan Therrien
Originally sent this on this thread http://www.mail-archive.com/user%40pig.apache.org/msg06085.html but can't find out how to reply to the thread. (Buttons at the bottom weren't working) I'm getting an error instantiating HBaseStorage ONLY when run on a cluster. Running in local mode with -x loc

Re: Question Regarding HBaseStorage Pig 0.8.1

2012-08-29 Thread Subir S
At least STORE was not working for me using the CDH3u3-bundled Pig. The conversation in this link seemed to match the issue in my case. * https://groups.google.com/a/cloudera.org/group/cdh-user/tree/browse_frm/month/2011-12/b968e9b45faea69f?rnum=141&_done=/a/cloudera.org/group/cdh-user/browse_fr

Re: Question Regarding HBaseStorage Pig 0.8.1

2012-08-25 Thread Dmitriy Ryaboy
It works. Dan, pig should have printed out the name of a file it's logging errors to. That file will have a more complete error trace. Can you send that? D On Sat, Aug 25, 2012 at 5:43 PM, Subir S wrote: > I think HBaseStorage does not work in this version of pig. There were > few JIRAs, I cann

Re: Question Regarding HBaseStorage Pig 0.8.1

2012-08-25 Thread Subir S
I think HBaseStorage does not work in this version of Pig. There were a few JIRAs; I cannot recall the numbers. On 8/25/12, Dan Therrien wrote: > I'm getting an error instantiating HBaseStorage ONLY when run on a cluster. > Running in local mode with -x local does not produce the error and my pig >

Question Regarding HBaseStorage Pig 0.8.1

2012-08-24 Thread Dan Therrien
I'm getting an error instantiating HBaseStorage ONLY when run on a cluster. Running in local mode with -x local does not produce the error and my pig script runs successfully and the data is properly written to HBase. The error I'm getting is below. java.lang.RuntimeException: could not instanti

Hadoop version question...

2012-08-07 Thread Dan Young
I noticed that Amazon EMR now supports Hadoop 1.0.3. Does Pig 0.10.x work with, or is it certified for, Hadoop 1.0.3? Regards, Dano

Re: Design question - parsing clickstream with query parameters

2012-06-18 Thread Mohit Anchlia
Does it make sense to just use UDF functions for each dimension. So for instance if there are 2 dimensions: 1. geo/network 2. visitor We write 2 UDFs that converts query parameters in respective format which then gets stored in 2 separate files for each dimension. I am thinking UDF functions woul

Re: Design question - parsing clickstream with query parameters

2012-06-15 Thread Mohit Anchlia
On Fri, Jun 15, 2012 at 3:34 PM, Jonathan Coveney wrote: > We just use the Java Map class, with the restriction that the key must be a > String. There are some helper methods in trunk to work with maps, and you > can use # to dereference, i.e. map#'key' > thanks! If you don't mind could you please s

Re: Design question - parsing clickstream with query parameters

2012-06-15 Thread Jonathan Coveney
We just use the Java Map class, with the restriction that the key must be a String. There are some helper methods in trunk to work with maps, and you can use # to dereference, i.e. map#'key' 2012/6/15 Mohit Anchlia > On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates wrote: > > > This seems reasonable, e
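A small sketch of what this looks like once the query parameters are held in a Pig map (loader, path, and keys hypothetical):

    hits = LOAD 'clickstream' AS (url:chararray, params:map[]);
    -- pull individual dimensions out of the map with #
    geo  = FOREACH hits GENERATE url, params#'country' AS country, params#'region' AS region;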

Re: Design question - parsing clickstream with query parameters

2012-06-15 Thread Mohit Anchlia
On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates wrote: > This seems reasonable, except it seems like it would make more sense to > convert query parameters to maps. By definition a query parameter is > key=value. And a map is easier to work with in general than a bag, since > there's no need to fla

Re: Design question - parsing clickstream with query parameters

2012-06-15 Thread Alan Gates
This seems reasonable, except it seems like it would make more sense to convert query parameters to maps. By definition a query parameter is key=value. And a map is easier to work with in general than a bag, since there's no need to flatten them. Alan. On Jun 11, 2012, at 10:55 AM, Mohit Anc

Design question - parsing clickstream with query parameters

2012-06-11 Thread Mohit Anchlia
I am looking at how to parse URLs with query parameters to process clickstream data. Are there any examples I can look at? The steps I envision are: 1) Read lines and convert query parameters into bags, i.e. a group of fields for a particular dimension table. So if Geo is one of the dimension

Re: Question regarding DefaultTuple(size) implementation

2012-05-27 Thread Subir S
Is this @highpointe spam? It is there for all mails (hadoop, hbase, pig). Can Hadoop mailing lists be spammed? On 5/27/12, highpointe wrote: > Here is my SS: 259 71 2451 > > On May 26, 2012, at 9:13 PM, Jonathan Coveney wrote: >> -user >> +dev >> >> Haha, I made this very same comment somewhere, a

Re: Question regarding DefaultTuple(size) implementation

2012-05-26 Thread highpointe
Here is my SS: 259 71 2451 On May 26, 2012, at 9:13 PM, Jonathan Coveney wrote: > -user > +dev > > Haha, I made this very same comment somewhere, and noticed the exact same > thing (I think I mention it in my SchemaTuple benchmarking). > > The reason is so that tuple.size() will return the ri

Re: Question regarding DefaultTuple(size) implementation

2012-05-26 Thread Jonathan Coveney
-user +dev Haha, I made this very same comment somewhere, and noticed the exact same thing (I think I mention it in my SchemaTuple benchmarking). The reason is so that tuple.size() will return the right value. Also, the expectation is that if you append, it goes to the end of all of the nulls, no

Re: Pig UDF question

2012-05-16 Thread Thejas Nair
On 5/15/12 4:18 PM, Mohit Anchlia wrote: I am trying to write a UDF that indexes data in Elasticsearch after converting it to JSON. I had 2 questions: 1. If I create a static member in the UDF class, is that one instance per mapper task? Yes, every mapper task uses a single JVM, so you would see one

Re: Pig UDF question

2012-05-15 Thread Mohit Anchlia
Thanks for the reference, Yes I am aware of it but I can't use it as is. For my future references also it would be good for me to know: 1. If I create a static member in UDF class is that one instance per mapper task? 2. Is there a method that gets called at the end of mapper method that I can

Re: Pig UDF question

2012-05-15 Thread Russell Jurney
Are you aware of Wonderdog, which already does this? Unfortunately, finding reusable pig components can be very hard, as they exist across many proprietary projects. https://github.com/infochimps/wonderdog A post explaining how to use it, end to end, is here: http://www.quora.com/Autocomplete/Wha

Pig UDF question

2012-05-15 Thread Mohit Anchlia
I am trying to write a UDF that indexes data in Elasticsearch after converting it to JSON. I had 2 questions: 1. If I create a static member in the UDF class, is that one instance per mapper task? 2. Is there a method that gets called at the end of the mapper that I can use for cleanup? I was wond

Question on Pig Script

2012-04-25 Thread Khamgaonwala, Hussain
Hello, I am new to Pig and I am trying to write a Pig script which will take the following records as input: (48514619041_20044,{(1),(2),(3),(4)}) (48514619041_20045,{(5),(6)}) (48514619041_20044,{(7),(8),(9),(10),(11)}) (48542605038_20045,{(12),(13)}) and provide me the following ou

Fwd: AvroStorage/Avro Schema Question

2012-04-10 Thread Russell Jurney
I am having trouble with ARRAY_ELEM getting injected into my pig data, when I store. Scott Carey had good insight into how to address the issue. -- Forwarded message -- From: Scott Carey Date: Mon, Apr 2, 2012 at 9:13 AM Subject: Re: AvroStorage/Avro Schema Question To: u

AvroStorage Question - ARRAY_ELEM bothers me. It called me stupid.

2012-03-30 Thread Russell Jurney
I sent this to the Avro list but got no reply, so I thought I'd try here. Is it possible to name string elements in the schema of an array? Specifically, below I want to name the email addresses in the from/to/cc/bcc/reply_to fields, so they don't get auto-named ARRAY_ELEM by Pig's AvroStorage.

Python to BoundScript encoding question

2012-03-16 Thread Alejandra García Rojas Martínez
Hello, I am having some problems with encoding at the moment of binding python script and pig script. I have the text "Catégorie" as a parameter for a pig script, and when binding with the pig script, it doesn't use the right encoding, and produces "Catgorie". This is my python script: # -*- codi

Re: question about AVG

2012-02-15 Thread Jonathan Coveney
it as an int, which is inefficient, I suppose you could > > make > > > a UDF that calculate the Average of chararrays by casting to an > int...but > > > then that raises the question of why you couldn't just load it as an > > x:int > > > in the first

Re: question about AVG

2012-02-15 Thread Prashant Kommireddi
which is inefficient, I suppose you could > make > > a UDF that calculate the Average of chararrays by casting to an int...but > > then that raises the question of why you couldn't just load it as an > x:int > > in the first place. > > > > So generally, you

Re: question about AVG

2012-02-15 Thread Haitao Yao
> then that raises the question of why you couldn't just load it as an x:int > in the first place. > > So generally, you need to do something like "foreach rel generate (int)x". > In this case that doesn't work as efficiently, but this is kind of a weird &

Re: question about AVG

2012-02-15 Thread Jonathan Coveney
then that raises the question of why you couldn't just load it as an x:int in the first place. So generally, you need to do something like "foreach rel generate (int)x". In this case that doesn't work as efficiently, but this is kind of a weird case. 2012/2/14 Haitao Yao >

question about AVG

2012-02-14 Thread Haitao Yao
Hi all, here's my pig script: A = load 'input' as (b:bag{t:(x:int, y:int)}); B = foreach A generate AVG(b.x); describe B; It works well. If b.x is a chararray, the problems arise: A = load 'input' as (b:bag{t:(x:chararray, y:int)}); B = foreach A generate AVG((int)b.x); 2012-02-15 14
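One workaround, if loading x as an int is not an option, is to cast inside a nested FOREACH (which needs Pig 0.10 or later) before calling AVG; a sketch:

    A = LOAD 'input' AS (b:bag{t:(x:chararray, y:int)});
    B = FOREACH A {
        casted = FOREACH b GENERATE (int)x AS x;
        GENERATE AVG(casted.x);
    };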

Re: Fw: Hadoop Cluster Question

2012-02-13 Thread Prashant Kommireddi
ng and make sure both first and second field is in > same order A B C D. > > Thanks > Jagaran > -- > *From:* Prashant Kommireddi > *To:* jagaran das > *Sent:* Monday, 13 February 2012 11:36 AM > > *Subject:* Re: Fw: Hadoop Cluster Quest

Re: Fw: Hadoop Cluster Question

2012-02-12 Thread Prashant Kommireddi
Apologies, I overlooked "One" associated with Part A of the question, and answered it for the case when the cluster is out of disk space. Part A 1. The job would not fail if there are more nodes where the task can be run (Hadoop places the task on other nodes when a particular node goes

Re: Fw: Hadoop Cluster Question

2012-02-12 Thread Dmitriy Ryaboy
; > - Forwarded Message - > From: jagaran das > To: "common-u...@hadoop.apache.org" > Sent: Sunday, 12 February 2012 9:33 PM > Subject: Hadoop Cluster Question > > > Hi, > A. If One of the Slave Node local disc space is full in a cluster ? > >

Re: Fw: Hadoop Cluster Question

2012-02-12 Thread Prashant Kommireddi
oop.apache.org" > Sent: Sunday, 12 February 2012 9:33 PM > Subject: Hadoop Cluster Question > > > Hi, > A. If One of the Slave Node local disc space is full in a cluster ? > > 1. Would a already started running Pig job fail ? > 2. Any new started pig job would fail ? >

Fw: Hadoop Cluster Question

2012-02-12 Thread jagaran das
- Forwarded Message - From: jagaran das To: "common-u...@hadoop.apache.org" Sent: Sunday, 12 February 2012 9:33 PM Subject: Hadoop Cluster Question Hi, A. If one of the slave nodes' local disk space is full in a cluster: 1. Would an already started, running Pig job fail?

Re: Question on how GroupBy and Join works in Pig

2012-02-09 Thread Dmitriy Ryaboy
I have a question on behavior of how Group By and Join works in Pig : > > Suppose I have Two data files: > > 1. cust_info > > 2. premium_data > > > cust_info: > > ID name region > > 2321 Austin Pondicherry > > 2375 Martin California &

Question on how GroupBy and Join works in Pig

2012-02-06 Thread praveenesh kumar
Hi everyone, I have a question on the behavior of how Group By and Join work in Pig. Suppose I have two data files: 1. cust_info 2. premium_data cust_info: ID name region 2321 Austin Pondicherry 2375 Martin California 4286 Lisa Chennai premium_data: ID premium

Re: Pig/Avro Question

2012-02-03 Thread Russell Jurney
Wow, thanks a ton! On Fri, Feb 3, 2012 at 1:17 PM, Stan Rosenberg < srosenb...@proclivitysystems.com> wrote: > Check the code in PigAvroInputFormat; it overrides 'listStatus' from > FileInputFormat so that files not ending > in .avro are filtered. > > stan > > On Fri, Feb 3, 2012 at 1:58 PM, Russ

Re: Pig/Avro Question

2012-02-03 Thread Stan Rosenberg
Check the code in PigAvroInputFormat; it overrides 'listStatus' from FileInputFormat so that files not ending in .avro are filtered. stan On Fri, Feb 3, 2012 at 1:58 PM, Russell Jurney wrote: > btw - the weird thing is... I've read the code.  There isn't a filter for > .avro in there.  Does Hado

Re: Pig/Avro Question

2012-02-03 Thread Russell Jurney
btw - the weird thing is... I've read the code. There isn't a filter for .avro in there. Does Hadoop, or Avro itself (not that I can see it is involved) do so? On Fri, Feb 3, 2012 at 10:55 AM, Russell Jurney wrote: > Hmmm I applied it, but I still can't open files that don't end in .avro > > On

Re: Pig/Avro Question

2012-02-03 Thread Russell Jurney
Hmmm I applied it, but I still can't open files that don't end in .avro On Fri, Feb 3, 2012 at 2:23 AM, Philipp wrote: > This patch fixes this issue: > > https://issues.apache.org/jira/browse/PIG-2492 > > > > On 02/03/2012 07:22 AM, Russell Jurne

Re: Pig/Avro Question

2012-02-03 Thread Philipp
This patch fixes this issue: https://issues.apache.org/jira/browse/PIG-2492 On 02/03/2012 07:22 AM, Russell Jurney wrote: I have the same bug. I read the code... there is no obvious fix. Arg. On Feb 2, 2012, at 10:07 PM, Something Something wrote: In my Pig script I have something like t

Re: Pig/Avro Question

2012-02-02 Thread Russell Jurney
I have the same bug. I read the code... there is no obvious fix. Arg. On Feb 2, 2012, at 10:07 PM, Something Something wrote: > In my Pig script I have something like this... > > %default MY_SCHEMA '/user/xyz/my-schema.json'; > > %default MY_AVRO > 'org.apache.pig.piggybank.storage.avro.Avr

Re: Question on generate semantics

2012-02-02 Thread Xiaomeng Wan
A bag of one tuple is still a bag; you need to flatten it: generate group, FLATTEN(biggest); Shawn On Wed, Feb 1, 2012 at 10:07 PM, Sid Stuart wrote: > Hi, > > I'm using Pig to analyze some log files. We would like to find the last > time a URL has been accessed. I've pulled out the path and the
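Putting that into the full last-access pattern from this thread, as a sketch (relation and field names hypothetical):

    logs    = LOAD 'access_log' AS (path:chararray, access_time:long);
    by_path = GROUP logs BY path;
    latest  = FOREACH by_path {
        ordered = ORDER logs BY access_time DESC;
        biggest = LIMIT ordered 1;
        -- biggest is a bag holding one tuple; FLATTEN turns it back into a field
        GENERATE group AS path, FLATTEN(biggest.access_time) AS last_access;
    };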

Question on generate semantics

2012-02-01 Thread Sid Stuart
Hi, I'm using Pig to analyze some log files. We would like to find the last time a URL has been accessed. I've pulled out the path and the time, but I'm having difficulty creating a relation of paths and latest access times. My thought was to group the relation by the path and then order the bags on the

Re: question about Pig e2e test, thanks

2011-12-01 Thread Daniel Dai
On Wed, Nov 30, 2011 at 3:25 PM, Zhang Xiaoyu wrote: > Hello, Everyone: > I am trying to run the Pig e2e test. I found the source from repo > http://svn.apache.org/repos/asf/pig/branches/branch-0.9/test/e2e/ > And there is a post describes how to kick of the run > https://cwiki.apache.org/confluen
