Hi Vineet,
Check the UI to see what resources you are using for each of those jobs
from the full Pig script (i.e., >lynx localhost:9100). Sometimes a job
generated from one module of your Pig script may not use the entire cluster,
e.g. a job having $REDUCERS mappers but only one reducer, impacting
Hi Dan,
In the last line of the script snippet you mentioned above:
D = FOREACH D1 GENERATE FLATTEN(group) AS (category_id, category_name),
(int)COUNT(D0) AS cat_id_1_count:int;
I wanted to perform a group-level operation, such as a conditional
additive operation over the rows of each group. How can
Vineet,
Pig 0.12 and later support the IN clause for filtering:
X = FILTER A BY (f1==8) OR (NOT (f2+f3 > f1)) OR (f1 IN (9, 10, 11));
Ryan
On Thu, Oct 30, 2014 at 11:09 PM, Vineet Mishra
wrote:
> Hi Dan,
>
> Thanks for your response, although
>
> FILTER cat_ids BY (category_id == 1);
>
> is working
Hi Vineet,
I believe I understand more about what you are asking for now. Note that
the FILTER statement will only allow through tuples matching the
predicate logic condition specified in the filter (in this case your
massive disjunction statement hard-coded in the script). Below is a
poss
Hi Dan,
Thanks for your response, although
FILTER cat_ids BY (category_id == 1);
is working fine but having multiple IN matches would make the script very
large, so considering my IN clause contains multiple values, such as
sum_up_count = FILTER cat_ids BY category_id IN (1,2,3,4,5. . . 100);
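One way to avoid a very long IN list, sketched here under assumptions rather than taken from the thread: keep the wanted ids in a small side file and join against it. The file names, the `cat_count` field, and the `wanted` relation are hypothetical; only `cat_ids` and `category_id` come from the snippet above.

```pig
-- Hypothetical input: the thread does not show the real schema of cat_ids.
cat_ids = LOAD 'cat_ids.tsv' USING PigStorage('\t')
          AS (category_id:int, cat_count:long);

-- Hypothetical side file with one wanted category_id per line.
wanted_raw = LOAD 'wanted_ids.txt' AS (category_id:int);
wanted     = DISTINCT wanted_raw;  -- duplicates would multiply joined rows

-- Inner join keeps only rows whose category_id appears in the wanted list.
-- 'replicated' loads the small right-hand relation into memory.
joined = JOIN cat_ids BY category_id, wanted BY category_id USING 'replicated';

sum_up_count = FOREACH joined GENERATE
    cat_ids::category_id AS category_id,
    cat_ids::cat_count   AS cat_count;

-- If the ids really are a contiguous range, a plain range predicate is enough:
-- sum_up_count = FILTER cat_ids BY category_id >= 1 AND category_id <= 100;
```

The join form keeps the id list out of the script, so it can change without editing the Pig code.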
Hi Vineet,
Not entirely sure I'm understanding the problem correctly, but perhaps the
error you are getting can be fixed by:
sum_up_count = FILTER cat_ids BY (category_id == 1);
I think that having a clearer description of your use case and input
data sets along with your current pig script in
Hi Dan,
I am trying to put a FILTER inside a FOREACH; the description of the group
(on which the FOREACH iterates) is given below. I am trying
to count the tuples that pass the filter,
Describe grp:
grp: {group: (a::category_id: int,a::category_name: chararray),joind:
{(a:
Hi Anil,
Not entirely sure how this question pertains to Hadoop or Pig, but to
answer your question I guess it really depends what you want to learn
about.
Given your Data courses and ETL/BI background, I'd say if you want to
keep within ETL/BI, then perhaps consider the real-time streaming
f
Dear All,
I recently finished my BigData training and I'm given one more course for free.
Could someone advise me which one of the below three I should go for?
My experience lies in ETL and Reporting.
• Cassandra
• Cloud Computing with AWS
• Apache Storm
Reque
Hi Dan/Lorand,
Thanks for sharing this beautiful resource and knowledge, I will definitely
go through it and let you know should I encounter any issues.
Thanks!
On Tue, Oct 28, 2014 at 7:49 PM, Dan DeCapria, CivicScience <
dan.decap...@civicscience.com> wrote:
> Hi Vineet,
>
> Expanding upon Lo
Hi Vineet,
Expanding upon Lorand's resources, please note this all really depends on
your actual use case. When blocking out code to transform from SQL to Pig
latin, it's usually a good idea to just flow-chart plan the logical process
of what you want to do - just like you would for SQL queries.
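As a rough illustration of that flow-charting idea, here is a minimal sketch of a JOIN, then WHERE, then GROUP BY query written as one Pig relation per step. The tables, file names, and fields (`items`, `categories`, `price`) are hypothetical and not from the thread.

```pig
-- SQL being translated (hypothetical tables and columns):
--   SELECT c.category_id, c.category_name, COUNT(*) AS n
--   FROM items i JOIN categories c ON i.category_id = c.category_id
--   WHERE i.price > 10
--   GROUP BY c.category_id, c.category_name;

items = LOAD 'items.tsv' USING PigStorage('\t')
        AS (item_id:int, category_id:int, price:double);
categories = LOAD 'categories.tsv' USING PigStorage('\t')
        AS (category_id:int, category_name:chararray);

-- WHERE: filter before the join so less data is shuffled.
filtered = FILTER items BY price > 10.0;

-- JOIN ... ON
joined = JOIN filtered BY category_id, categories BY category_id;

-- GROUP BY: after a join, fields are disambiguated with the relation prefix.
grouped = GROUP joined BY (categories::category_id, categories::category_name);

-- SELECT list: flatten the group key and count the rows in each group.
result = FOREACH grouped GENERATE
    FLATTEN(group) AS (category_id, category_name),
    COUNT_STAR(joined) AS n;

DUMP result;
```

Each SQL clause maps to one named relation, which makes it easy to DESCRIBE or DUMP an intermediate step while debugging.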
Hi Vineet,
I'd recommend you have a look at these excellent resources:
http://hortonworks.com/blog/pig-eye-for-the-sql-guy/
http://mortar-public-site-content.s3-website-us-east-1.amazonaws.com/Mortar-Pig-Cheat-Sheet.pdf
http://www.slideshare.net/trihug/practical-pig/11
--Lorand
On 28/10/14 14:
Hi,
I am looking to transform a SQL statement consisting of multiple clauses
in the same query: specifically, a JOIN followed by some
condition (WHERE) and finally a grouping on some fields (GROUP BY).
Can I have a link or a brief guide on how I can implement
this k/o of comp