Re: Removing characters from a bag

2013-06-25 Thread Mohit Anchlia
We use newline as the row separator; however, some newlines also appear inside a column. So the data looks like this: Hello I \n am \n here Hello\n I am here. Those are 2 lines, but they get broken into 5 lines because of the \n characters in between, in addition to the real line ends. I tried to use foreach generate REPLACE('\n',
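A plain-Python model (not Pig) of why 2 rows become 5. A line-oriented loader splits on every \n while reading, before any FOREACH ... GENERATE runs, so REPLACE inside a FOREACH only ever sees the already-split fragments. The escaping shown at the end is an assumed workaround applied wherever the data is produced; it was not proposed in the thread, and `physical_lines` is a hypothetical helper.

```python
def physical_lines(records, escape_newlines=False):
    """Lines a newline-delimited reader sees after the records are written one per line."""
    parts = []
    for record in records:
        if escape_newlines:
            # Assumed workaround: turn each embedded newline into the two characters "\" and "n".
            record = record.replace("\n", "\\n")
        parts.append(record + "\n")  # "\n" is also the real row terminator
    return "".join(parts).splitlines()

# The two logical rows from the example above.
records = ["Hello I \n am \n here", "Hello\n I am here"]

print(len(physical_lines(records)))                        # 5
print(len(physical_lines(records, escape_newlines=True)))  # 2
```

With escaping, each logical row stays on one physical line, and the escaped sequence can be turned back into a newline after loading if needed.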

Pig giving priority to "non" Apache Hadoop

2013-06-25 Thread Mohammad Tariq
Hello list, Today I started Pig on my personal machine after a few weeks to give 0.11.1 a try. As soon as I issued bin/pig it threw this message on my terminal: apache@hadoop:/hadoop/projects/pig-0.11.1$ bin/pig 2013-06-26 06:05:45,121 [main] INFO org.apache.pig.Main - Apache Pig versi

Re: Adding files to distributed cache

2013-06-25 Thread Mark Wagner
Hi Phanish, EvalFuncs can implement getCacheFiles to register files that should be included in distributed cache: http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/EvalFunc.html#getCacheFiles() -Mark On Tue, Jun 25, 2013 at 11:37 AM, Phanish Lakkarasu < abhishek.do...@icloud.com> wrote: >

Adding files to distributed cache

2013-06-25 Thread Phanish Lakkarasu
Hello, How can I add multiple files to the distributed cache in a Pig UDF? Regards Abhi

Re: Multi line parameter value?

2013-06-25 Thread Johnny Zhang
Hi Sajid, can you try -param_file, with file content like this (basically multiple lines of parameters): foo = bazz = . Then, in your Pig script, it substitutes the values of foo and bazz in set my_param = 'SUM(foo) as bar, \ SUM (bazz) as boo'; is this w

Re: Multi line parameter value?

2013-06-25 Thread Sajid Raza
Based on the spec, the parameter file only works with single-line values. On Jun 25, 2013, at 11:49 AM, Johnny Zhang wrote: > http://wiki.apache.org/pig/ParameterSubstitution > > > On Tue, Jun 25, 2013 at 10:03 AM, Shahab Yunus wrote: > >> Have you tried using params file? I have not used it per
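Since parameter files take one value per line, one possible workaround is to build the single-line value from a list of aggregate expressions in whatever tool writes the file, then pass that file with -param_file. A minimal Python sketch; `param_line` is a hypothetical helper, and whether Pig requires this particular value to be quoted in the file is not verified here.

```python
def param_line(name, aggregates):
    """Join several aggregate expressions into one `name = value` line."""
    # Parameter files are line-based, so the value must not contain a newline.
    value = ", ".join(aggregates)
    return f"{name} = {value}"

# Aggregates taken from the set my_param example in this thread.
aggregates = ["SUM(foo) as bar", "SUM(bazz) as boo"]
print(param_line("my_param", aggregates))
# my_param = SUM(foo) as bar, SUM(bazz) as boo
```

The generated line could then be written to a file and supplied as `pig -param_file <file> script.pig`, so the macro receives the whole aggregate list as one parameter.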

Re: Multi line parameter value?

2013-06-25 Thread Johnny Zhang
http://wiki.apache.org/pig/ParameterSubstitution On Tue, Jun 25, 2013 at 10:03 AM, Shahab Yunus wrote: > Have you tried using params file? I have not used it personally but it > might work that way. > > Regards, > Shahab > > > On Tue, Jun 25, 2013 at 12:44 PM, Sajid Raza wrote: > > > I don't su

Re: Adding files to distributed cache

2013-06-25 Thread Prashant Kommireddi
Take a look here http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html under "Loading the Distributed Cache". On Tue, Jun 25, 2013 at 11:41 AM, abhishek wrote: > > > Hello, > > > How can I add multiple files to distributed cache in pig UDF. > > > > Regards > > Abhi >

Adding files to distributed cache

2013-06-25 Thread abhishek
> Hello, > How can I add multiple files to distributed cache in pig UDF. > > Regards > Abhi

Re: Multi line parameter value?

2013-06-25 Thread Shahab Yunus
Have you tried using a params file? I have not used it personally, but it might work that way. Regards, Shahab On Tue, Jun 25, 2013 at 12:44 PM, Sajid Raza wrote: > I don't suppose it's possible to specify a parameter with a multi line > value. > > My use case is that I have a macro that groups o

Multi line parameter value?

2013-06-25 Thread Sajid Raza
I don't suppose it's possible to specify a parameter with a multi-line value. My use case is that I have a macro that groups over an alias, and I would like to avoid hard-coding which aggregate values I compute. If I could pass in a multi-line parameter value, I would be able to do something like: se

Re: dereferencing bag of map

2013-06-25 Thread Abhinav Neelam
Use REGEX_EXTRACT_ALL. Something like this should work (untested, please verify): rel2 = foreach rel1 generate FLATTEN(REGEX_EXTRACT_ALL(attributes#'md','\\{"cld":"(\\w+)","sld":"(\\w+)"\\}')) AS (cld: chararray, sld: chararray); Tighten up the regex appropriately. On 24 June 2013 14:55, Suresh S
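One way to check the regex before running the Pig job is to try it locally. A plain-Python sketch, assuming REGEX_EXTRACT_ALL matches the whole string (Java `matches()` semantics); the None return for a non-match is a modelling choice, and the sample map values are hypothetical.

```python
import re

# Pattern from the reply as a Python raw string. Each "\\" in the Pig literal
# becomes a single "\" in the regex handed to Java, so this is the same regex.
# re.ASCII makes \w mean [A-Za-z0-9_], matching Java's default behaviour.
PATTERN = re.compile(r'\{"cld":"(\w+)","sld":"(\w+)"\}', re.ASCII)

def extract_all(value):
    """Model of REGEX_EXTRACT_ALL assuming whole-string matching; None if no match."""
    m = PATTERN.fullmatch(value)
    return m.groups() if m else None

# Hypothetical values of attributes#'md'.
print(extract_all('{"cld":"example","sld":"com"}'))  # ('example', 'com')
print(extract_all('{"cld":"my-site","sld":"com"}'))  # None: \w does not match "-"
```

The second case shows why the regex may need tightening: \w+ rejects hyphens, dots, and any extra keys or whitespace in the map value.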

Re: How can I filter tuples that have the same value in two fields

2013-06-25 Thread ????
I think the question is solved: my expression was wrong. After changing it to tmp14 = filter tmp13 by $1==$4; I get the right result. -- Original message -- From: ""; Sent: 2013-06-25 (Tue) 8:05; To: "user"; Subject: how can i filter tuple that have same

How can I filter tuples that have the same value in two fields

2013-06-25 Thread ????
How can I filter tuples that have the same value in two fields? My data is: (00,13803493583,0.4,00,66504185,0.10869565217391304) (00,0351-8596699,0.001949317738791423,00,66504185,0.10869565217391304) (00,0351-8596699,0
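Pig's positional references are zero-based ($0 is the first field), so a filter such as $1==$4 compares the second and fifth fields. A plain-Python model of that FILTER using the two complete rows above; the fields are kept as strings because the real schema of tmp13 is not shown, and `filter_equal` is a hypothetical helper, not a Pig API.

```python
# Two complete rows from the question (field types are an assumption).
rows = [
    ("00", "13803493583", "0.4", "00", "66504185", "0.10869565217391304"),
    ("00", "0351-8596699", "0.001949317738791423", "00", "66504185", "0.10869565217391304"),
]

def filter_equal(rows, i, j):
    """Model of `filter rel by $i == $j` with zero-based positions.

    In Pig, null == anything evaluates to null and FILTER drops rows whose
    condition is null, so rows with None at either position are dropped too.
    """
    return [r for r in rows
            if r[i] is not None and r[j] is not None and r[i] == r[j]]

print(len(filter_equal(rows, 0, 3)))  # 2: both rows hold "00" at $0 and $3
print(len(filter_equal(rows, 2, 5)))  # 0
```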

Re: Removing characters from a bag

2013-06-25 Thread Ruslan Al-Fakikh
Hi Mohit, I don't clearly understand your use case. It depends on how you read the input and how you use the newlines: as the row separator, or just inside a row as a normal character. Can you post a simple example of the input and the output you need? Thanks On Mon, Jun 24, 2013 at 10:18 PM, Mohit

Re: nested FOREACH statements

2013-06-25 Thread Ruslan Al-Fakikh
Hi! I haven't tried this script, but here is an idea: flattenned = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1); grouped = GROUP flattenned BY (initialGroup, lt, ln); counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattenned) AS aCount; groupedAgain = GROUP counted B
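The first three steps of the suggested script (FLATTEN, GROUP BY the composite key, COUNT) can be modelled in plain Python to see what `counted` should contain. A sketch on hypothetical data: data2 is assumed to hold (group, bag of (lt, ln)) pairs, although the real data1 bag may carry more fields.

```python
from collections import Counter

# Hypothetical input shaped like data2: (group, bag of (lt, ln) tuples).
data2 = [
    ("a", [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]),
    ("b", [(1.0, 2.0)]),
]

# FLATTEN: one (initialGroup, lt, ln) row per tuple in each bag.
flattened = [(g, lt, ln) for g, bag in data2 for (lt, ln) in bag]

# GROUP BY (initialGroup, lt, ln) followed by COUNT of each group.
counted = Counter(flattened)

print(counted[("a", 1.0, 2.0)])  # 2
print(len(counted))              # 3 distinct (initialGroup, lt, ln) keys
```

Grouping on the full (initialGroup, lt, ln) key keeps identical coordinates from different initial groups apart, which is why the initial group is carried through FLATTEN.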

Re: count total number of tuples in a bag?

2013-06-25 Thread Ruslan Al-Fakikh
Hi! What are you trying to do with define c COV('a','b','c') exactly? Can you try out = foreach grp generate group, COV(A.$0,A.$1,A.$2); without the define statement? Ruslan Al-Fakikh On Tue, Jun 18, 2013 at 1:17 PM, achile wandji wrote: > Hi, > I'm trying to compute a correlation with the scri
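COV(A.$0, A.$1, A.$2) is asked to compare three columns, so conceptually it yields one covariance per column pair. A plain-Python sketch of that quantity for cross-checking results, assuming the sample estimator (dividing by n - 1); whether Pig's COV divides by n or n - 1, and how it labels its output, are not confirmed here. The column names and values are hypothetical.

```python
from itertools import combinations

def sample_cov(xs, ys):
    """Sample covariance of two equal-length columns (n - 1 denominator)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pairwise_cov(columns):
    """Covariance for every pair of named columns."""
    return {(a, b): sample_cov(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}

# Hypothetical columns standing in for A.$0, A.$1, A.$2.
columns = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
print(pairwise_cov(columns))
# {('a', 'b'): 2.0, ('a', 'c'): -1.0, ('b', 'c'): -2.0}
```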