We use newline as the row separator; however, we are getting some newlines in a
column, so the data looks like this:
Hello I \n am \n here
Hello\n I am here
Those are 2 lines, but they get broken down into 5 lines because of the \n in
between as well as the real line ends. I tried to use foreach generate
REPLACE('\n',
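This plain-Python sketch (not Pig) shows why the two rows turn into 5 records. It splits the sample text on every newline, the way a line-oriented loader does. It assumes the `\n` sequences in the sample are real newline characters, and the variable names are illustrative. With the default PigStorage loader, which reads its input line by line, the split happens at load time. A later `FOREACH ... GENERATE REPLACE(...)` therefore only sees records that are already broken apart.

```python
# Two logical rows, each with newlines embedded inside a column
# (assumption: the "\n" in the sample data are real newline characters).
logical_rows = ["Hello I \n am \n here", "Hello\n I am here"]

# The file on disk: logical rows joined by the row separator, which is also "\n".
file_text = "\n".join(logical_rows) + "\n"

# A line-oriented reader cannot tell the two kinds of newline apart,
# so every "\n" ends a record.
records = file_text.splitlines()

print(len(logical_rows), "logical rows")
print(len(records), "records as read line by line")
for r in records:
    print(repr(r))
```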
Hello list,
Today I started Pig on my personal machine after a few weeks to
give 0.11.1 a try. As soon as I issued bin/pig, it threw this message on my
terminal:
apache@hadoop:/hadoop/projects/pig-0.11.1$ bin/pig
2013-06-26 06:05:45,121 [main] INFO org.apache.pig.Main - Apache Pig
versi
Hi Phanish,
EvalFuncs can implement getCacheFiles to register files that should be
included in distributed cache:
http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/EvalFunc.html#getCacheFiles()
-Mark
On Tue, Jun 25, 2013 at 11:37 AM, Phanish Lakkarasu <
abhishek.do...@icloud.com> wrote:
>
Hello,
How can I add multiple files to distributed cache in pig UDF.
Regards
Abhi
Hi, Sajid:
Can you try -param_file, with the content of the file looking like this
(basically multiple lines of parameters):
foo =
bazz = .
so in your Pig script, it substitutes the values of foo and bazz in
set my_param = 'SUM(foo) as bar, \
SUM (bazz) as boo';
is this w
Based on the spec, the parameter file only works with one-line values.
On Jun 25, 2013, at 11:49 AM, Johnny Zhang wrote:
> http://wiki.apache.org/pig/ParameterSubstitution
>
>
> On Tue, Jun 25, 2013 at 10:03 AM, Shahab Yunus wrote:
>
>> Have you tried using a params file? I have not used it per
http://wiki.apache.org/pig/ParameterSubstitution
On Tue, Jun 25, 2013 at 10:03 AM, Shahab Yunus wrote:
> Have you tried using a params file? I have not used it personally but it
> might work that way.
>
> Regards,
> Shahab
>
>
> On Tue, Jun 25, 2013 at 12:44 PM, Sajid Raza wrote:
>
> > I don't su
Take a look here
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html under
"Loading the Distributed Cache".
On Tue, Jun 25, 2013 at 11:41 AM, abhishek wrote:
> Hello,
> How can I add multiple files to distributed cache in pig UDF.
>
> Regards
> Abhi
Have you tried using a params file? I have not used it personally but it
might work that way.
Regards,
Shahab
On Tue, Jun 25, 2013 at 12:44 PM, Sajid Raza wrote:
> I don't suppose it's possible to specify a parameter with a multi-line
> value.
>
> My use case is that I have a macro that groups o
I don't suppose it's possible to specify a parameter with a multi-line
value.
My use case is that I have a macro that groups over an alias, and I would
like to not hard-code what aggregate values I compute.
If I could pass in a multi-line parameter value, I would be able to do
something like:
se
Use REGEX_EXTRACT_ALL
Something like this should work (untested, please verify)
rel2 = foreach rel1 generate
FLATTEN(REGEX_EXTRACT_ALL(attributes#'md','\\{"cld":"(\\w+)","sld":"(\\w+)"\\}'))
AS (cld: chararray, sld: chararray);
Tighten up the regex appropriately.
On 24 June 2013 14:55, Suresh S
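The REGEX_EXTRACT_ALL pattern suggested above can be checked outside Pig before the script runs. This plain-Python sketch uses the same regex with Pig's string escaping removed, so Pig's `'\\{'` becomes the regex `\{`. It uses `re.fullmatch` on the assumption that REGEX_EXTRACT_ALL, like Java's `Matcher.matches()`, needs the pattern to match the whole value. The sample `md` values and the helper name `extract_all` are hypothetical.

```python
import re

# Pig literal '\\{"cld":"(\\w+)","sld":"(\\w+)"\\}' after Pig unescapes the backslashes.
PATTERN = re.compile(r'\{"cld":"(\w+)","sld":"(\w+)"\}')

def extract_all(value):
    """Hypothetical helper: return all captured groups if the whole value matches, else None."""
    m = PATTERN.fullmatch(value)
    return m.groups() if m else None

# Hypothetical attributes#'md' value.
print(extract_all('{"cld":"com","sld":"example"}'))  # ('com', 'example')

# \w+ does not match "-", so a hyphenated name makes the whole match fail.
print(extract_all('{"cld":"com","sld":"my-site"}'))  # None
```

The second case shows where "tightening up the regex" matters. If the real values can contain characters outside `\w`, the character classes need widening.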
I think the question is solved.
My expression was wrong. After changing it to
tmp14 = filter tmp13 by $1==$4; I get the right result.
------------------ Original message ------------------
From: "";
Date: Tue, Jun 25, 2013 8:05
To: "user";
Subject: how can i filter tuple that have same
How can I filter tuples that have the same value in two fields?
My data is:
(00,13803493583,0.4,00,66504185,0.10869565217391304)
(00,0351-8596699,0.001949317738791423,00,66504185,0.10869565217391304)
(00,0351-8596699,0
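The fix in the reply keeps a tuple only when two positional fields hold the same value. This plain-Python sketch mirrors that idea on the two complete sample rows above, plus one hypothetical row that does not match. Positions are 0-based like Pig's `$0`, `$1`, and so on. Which pair to compare depends on the actual schema of `tmp13`. Here the sketch uses `$0` and `$3`, which are equal in the sample rows. The helper name `filter_equal` is illustrative.

```python
# Rows as chararray fields, 0-based like Pig's $0, $1, ...
rows = [
    ("00", "13803493583", "0.4", "00", "66504185", "0.10869565217391304"),
    ("00", "0351-8596699", "0.001949317738791423", "00", "66504185", "0.10869565217391304"),
    # Hypothetical row whose $0 and $3 differ.
    ("01", "0351-8596699", "0.5", "00", "66504185", "0.1"),
]

def filter_equal(rows, i, j):
    """Keep only rows whose fields at positions i and j are equal (like FILTER ... BY $i == $j)."""
    return [r for r in rows if r[i] == r[j]]

kept = filter_equal(rows, 0, 3)
print(len(kept))  # 2
```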
Hi Mohit,
I don't clearly understand your use case. It depends on how you read the
input and how you use the newlines: as the row separator, or just as a normal
character inside a row.
Can you give a simple example of the input and output that you need?
Thanks
On Mon, Jun 24, 2013 at 10:18 PM, Mohit
Hi!
I haven't tried this script, but here is an idea:
flattenned = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1);
grouped = GROUP flattenned BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattenned)
AS aCount;
groupedAgain = GROUP counted B
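The first steps of that idea are: flatten, group by the composite key `(initialGroup, lt, ln)`, then count. They can be mimicked in plain Python to check the expected counts on a small sample before running the Pig script. The input rows below are hypothetical, and the truncated remainder of the script (`groupedAgain ...`) is not reproduced.

```python
from collections import Counter

# Hypothetical flattened rows (initialGroup, lt, ln), standing in for the output of
# FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1).
flattened = [
    ("g1", 52.5, 13.4),
    ("g1", 52.5, 13.4),
    ("g1", 48.1, 11.6),
    ("g2", 52.5, 13.4),
]

# GROUP flattenned BY (initialGroup, lt, ln) followed by COUNT(flattenned):
# one count per distinct composite key.
counted = Counter(flattened)

for key, count in sorted(counted.items()):
    print(key, count)
```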
Hi!
What are you trying to do with define c COV('a','b','c') exactly?
Can you try
out = foreach grp generate group, COV(A.$0,A.$1,A.$2);
without the define statement?
Ruslan Al-Fakikh
On Tue, Jun 18, 2013 at 1:17 PM, achile wandji wrote:
> Hi,
> I'm trying to compute a correlation with the scri