Reporter: Viraj Bhat
Fix For: 0.9.0
In realistic scenarios we need to split a dataset into segments by using LIMIT,
and would like to achieve that within the same Pig script. Here is a case:
{code}
A = load '$DATA' using PigStorage(',') as (id, pvs);
B
3.0, 0.2.0, 0.1.0
Reporter: Viraj Bhat
I am hoping that in Pig if I type
{quote} "c = cogroup a by foo, b by bar", the fields c.group, c.foo and c.bar
should all map to c.$0 {quote}
This would improve the readability of the Pig script.
Here's a real use case:
{co
Affects Versions: 0.7.0, 0.6.0, 0.5.0, 0.4.0
Reporter: Viraj Bhat
I have created a RANDOMINT function which generates random numbers between 0
and a specified value (exclusive). For example, RANDOMINT(4) gives random
numbers between 0 and 3 (inclusive).
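The range described above can be illustrated with a minimal sketch. This is not the reporter's UDF (which would be a Pig EvalFunc); `random_int` is a hypothetical helper name used only to show the intended semantics, assuming the upper bound is excluded as in the RANDOMINT(4) example.

```python
import random


def random_int(upper, rng=random):
    """Return a random integer in [0, upper - 1].

    Hypothetical stand-in for the RANDOMINT semantics described in the
    report: random_int(4) yields one of 0, 1, 2, 3.
    """
    if upper <= 0:
        raise ValueError("upper must be a positive integer")
    # randrange(upper) excludes the upper bound itself
    return rng.randrange(upper)


# Draw a batch of values to show that they stay inside the expected range.
samples = [random_int(4) for _ in range(1000)]
```

Every value in `samples` falls between 0 and 3; passing a seeded `random.Random` instance as `rng` makes the sequence reproducible.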
{code}
$hadoop fs -cat rand.dat
f
g
h
i
j
k
l
Support to 2 level nested foreach
-
Key: PIG-1631
URL: https://issues.apache.org/jira/browse/PIG-1631
Project: Pig
Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Viraj Bhat
What I
: Viraj Bhat
I want to place the parameters of a Pig script in a param_file.
But instead of this file being in the local file system where I run my java
command, I want this to be on HDFS.
{code}
$ java -cp pig.jar org.apache.pig.Main -param_file hdfs://namenode/paramfile
myscript.pig
{code
[
https://issues.apache.org/jira/browse/PIG-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910414#action_12910414
]
Viraj Bhat commented on PIG-1615:
-
I tested this on Pig 0.8, but with a downloaded ver
Versions: 0.7.0, 0.6.0
Reporter: Viraj Bhat
Fix For: 0.8.0
I have a Pig script of this form, which I used inside a workflow system such as
Oozie.
{code}
A = load '$INPUT' using PigStorage();
store A into '$OUTPUT';
{code}
I run this as with Mult
[
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-282:
---
Release Note:
This feature allows specifying a Hadoop Partitioner for the following operations:
GROUP/COGROUP
[
https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1586:
Description:
I have a Pig script as a template:
{code}
register Countwords.jar;
A = $INPUT;
B = FOREACH A
)
Key: PIG-1586
URL: https://issues.apache.org/jira/browse/PIG-1586
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Viraj Bhat
I have a Pig script as a template:
{code}
register Countwords.jar;
A = $INPUT;
B
: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0, 0.6.0
Reporter: Viraj Bhat
Here is my directory structure on HDFS which I want to access using Pig.
This is a sample, but in the real use case I have more than 100 of these
directories.
{code}
$ hadoop fs
Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
I have a simple Pig script which uses the XMLLoader after the Piggybank is
built.
{code}
register piggybank.jar;
A = load '/user/viraj/capacity-scheduler.xml.gz
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Viraj Bhat
I am trying to use the MultiStorage piggybank UDF
{code}
register pig-svn/trunk/contrib/piggybank/java/piggybank.jar;
A = load '/user/viraj/largebucketinput.txt' using PigStorage('\u0001') as
(
[
https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895858#action_12895858
]
Viraj Bhat commented on PIG-1537:
-
Hi Olga, I have given the specific script with UDF
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Viraj Bhat
I have a script of this pattern which uses 2 StoreFuncs:
{code}
register loader.jar
register piggy-bank/java/build/storage.jar;
%DEFAULT OUTPUTDIR /user/viraj/prunecol/
ss_sc_0 = LOAD '/
[
https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1537:
Description:
I have a script of this pattern which uses 2 StoreFuncs:
{code}
register loade
Equating aliases does not work (B = A)
--
Key: PIG-1529
URL: https://issues.apache.org/jira/browse/PIG-1529
Project: Pig
Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Viraj
: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
I am doing a self join:
Input file is tab separated:
{code}
1 one
1 uno
2 two
2 dos
3 three
3 tres
{code}
vi...@machine~/pigscripts
[
https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864963#action_12864963
]
Viraj Bhat commented on PIG-1345:
-
Richard thanks for suggesting a workaround. The e
[
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat reopened PIG-1378:
-
Pradeep,
After rerunning with patch the following revision
Apache Pig version 0.8.0-dev (r940560)
compiled
[
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861134#action_12861134
]
Viraj Bhat commented on PIG-798:
Ashutosh thanks for clarifying, we will wait till that
[
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861106#action_12861106
]
Viraj Bhat commented on PIG-1211:
-
Ashutosh, yes as more and more people adopt Pig,
[
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861097#action_12861097
]
Viraj Bhat commented on PIG-798:
Hi Ashutosh,
Yes that is possible, I know that we ca
[
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-798:
---
Affects Version/s: 0.6.0
0.5.0
0.4.0
0.3.0
[
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860452#action_12860452
]
Viraj Bhat commented on PIG-798:
Hi Ashutosh,
The problem here is not about using the
[
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1339:
Affects Version/s: 0.7.0
0.8.0
> International characters in column names
[
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860445#action_12860445
]
Viraj Bhat commented on PIG-1339:
-
Hi Ashutosh this does not work in trunk. I am using
[
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860419#action_12860419
]
Viraj Bhat commented on PIG-1211:
-
Ashutosh, I feel that the user may not be intereste
[
https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860397#action_12860397
]
Viraj Bhat commented on PIG-1345:
-
In which release will PIG-908 be fixed?
Does it guara
[
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859384#action_12859384
]
Viraj Bhat commented on PIG-1378:
-
har:// currently works in Pig 0.7 when the hdfs loca
[
https://issues.apache.org/jira/browse/PIG-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat resolved PIG-829.
Fix Version/s: 0.7.0
Resolution: Fixed
Pig 0.7 yields the correct result.
{code}
x = LOAD 'some
[
https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat resolved PIG-518.
Fix Version/s: 0.7.0
Resolution: Fixed
> LOBinCond exception in LogicalPlanValidationExecutor w
[
https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857157#action_12857157
]
Viraj Bhat commented on PIG-518:
The above script generates the following error in Pig
[
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1378:
Description:
I am trying to use har (Hadoop Archives) in my Pig script.
I can use them through the HDFS
: Viraj Bhat
Fix For: 0.7.0
I am trying to use har (Hadoop Archives) in my Pig script.
I can use them through the HDFS shell
{noformat}
$hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw--- 5 viraj users1537234 2010-04-14 09:49
/jira/browse/PIG-1377
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0, 0.7.0
Reporter: Viraj Bhat
I have a Zebra script which generates a huge number of mappers, around 400K. The
mapred.jobtracker.maxtasks.per.job is currently set at
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0, 0.7.0
Reporter: Viraj Bhat
The script loads data from BinStorage(), flattens columns, and then sorts on
the second column in descending order. The ORDER BY fails with a
ClassCastException
{code
[
https://issues.apache.org/jira/browse/PIG-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat resolved PIG-756.
Resolution: Fixed
Fix Version/s: 0.7.0
https://issues.apache.org/jira/browse/PIG-1053 fixes this issue
[
https://issues.apache.org/jira/browse/PIG-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854762#action_12854762
]
Viraj Bhat commented on PIG-756:
In Pig 0.7 we have moved local mode of Pig to local mod
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
For easier debugging, it would be nice to find out where in the Pig script my
warnings are coming from.
The only known process is to comment out lines in the Pig script and see if
these
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
There is a particular case where I was running with the latest trunk of Pig.
{code}
$java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
[main] INFO
[
https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1341:
Component/s: impl
Summary: Cannot convert DataByteArray to Chararray and results in
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Script reads in BinStorage data and tries to convert a column which is in
DataByteArray to Chararray.
{code}
raw = load 'sampledata' using BinStorage() as (col1,col2, col3)
Affects Versions: 0.6.0
Reporter: Viraj Bhat
There is a particular use case in which someone specifies a column name in
international characters.
{code}
inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお);
describe inputdata;
dump inputd
[
https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1308:
Description:
Simple script fails to read files from BinStorage() and fails to submit jobs to
JobTracker
]
Key: PIG-1308
URL: https://issues.apache.org/jira/browse/PIG-1308
Project: Pig
Issue Type: Bug
Reporter: Viraj Bhat
Fix For: 0.7.0
Simple script fails to read files from BinStorage() and fails to
://issues.apache.org/jira/browse/PIG-1305
Project: Pig
Issue Type: Bug
Components: documentation
Reporter: Viraj Bhat
Fix For: 0.7.0
The Pig Reference Manual needs to be updated:
Relational Operators
Syntax:
LOAD 'data' [USIN
Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Viraj Bhat
I have the following txt files which are bzipped: \t =
{code}
$ bzcat A.txt.bz2
1\ta
2\taa
$bzcat B.txt.bz2
1\tb
2\tbb
$cat *.bz2 > test/mymerge.bz2
$bzcat test/mymerge.bz2
1\ta
2\taa
1\tb
2\
---
Key: PIG-1281
URL: https://issues.apache.org/jira/browse/PIG-1281
Project: Pig
Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.8.0
This is more of an enhancement request, where we
URL: https://issues.apache.org/jira/browse/PIG-1278
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.7.0
I have a script which uses Map data, and runs a UDF, which creates random
numbers and
[
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840389#action_12840389
]
Viraj Bhat commented on PIG-1272:
-
Now with Pig 0.7 or trunk we have the following e
: Viraj Bhat
Fix For: 0.7.0
For a simple script, the column pruner optimization removes certain columns from
the original relation, which produces wrong results.
Input file "kv" contains the following columns (tab separated):
{code}
a 1
a 2
a 3
b 4
[
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840339#action_12840339
]
Viraj Bhat commented on PIG-1252:
-
A modified version of the script works, does this hav
[
https://issues.apache.org/jira/browse/PIG-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1263:
Description:
I have a Pig script which I am experimenting upon. [[Albeit this is not
optimized and can be
/browse/PIG-1263
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I have a Pig script which I am experimenting upon. [[Albeit this is not
optimized and can be done in variety of
[
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1252:
Description:
I have a script which uses SPLIT but does not use one of the split
branches. The skeleton
: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.7.0
I have a script which uses SPLIT but does not use one of the split
branches. The skeleton of the script is as follows:
{code}
loadData = load '/user/viraj/zebradata
-
Key: PIG-1247
URL: https://issues.apache.org/jira/browse/PIG-1247
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For
Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.7.0
I have a program which generates different types of Map fields and stores them
using PigStorage.
{code}
A = load '/user/viraj/three.txt' using PigStorage();
B = foreach A generate ['a'#'12'] as
[
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat reopened PIG-1194:
-
Hi Richard,
I ran the script attached to the ticket and found that the map tasks fail
with the
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831251#action_12831251
]
Viraj Bhat commented on PIG-1131:
-
Ashutosh I was able to recreate a similar problem u
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831248#action_12831248
]
Viraj Bhat commented on PIG-1131:
-
Olga I marked it as critical since we mention that
: documentation
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.7.0
To get help at the grunt shell I do the following:
grunt>touchz
010-02-04 00:59:28,714 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1000: Error during parsing. Encountered " "touc
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.8.0
I have a Pig script which is structured in the following way
{code}
register cp.jar
dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col
[
https://issues.apache.org/jira/browse/PIG-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-531:
---
Fix Version/s: 0.5.0
Hi Olga,
I think we have a way to handle it in multi-query optimization. Is it
[
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-940:
---
Affects Version/s: (was: 0.3.0)
0.5.0
Fix Version/s: 0.7.0
> Cross site H
[
https://issues.apache.org/jira/browse/PIG-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1174:
Fix Version/s: 0.7.0
> Creation of output path should be done by storage funct
Affects Versions: 0.5.0, 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
Fix For: 0.6.0
Attachments: inputdata.txt
I have a simple Pig script which takes 3 columns, one of which is null.
{code}
input = load 'inputdata.txt' using PigSt
[
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1194:
Attachment: inputdata.txt
Testdata to run with this script
> ERROR 2055: Received Error while process
[
https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800315#action_12800315
]
Viraj Bhat commented on PIG-1187:
-
Hi Jeff,
This is specific to the data we are using
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I have a set of Pig statements which dump an international dataset.
{code}
INPUT_OBJECT = load 'internationalcode';
describe INPUT_OBJECT;
dump INPUT_OBJECT;
{code}
Sample output
Dear Hadoop and Pig Users,
This is just to let you know that the submission deadline for ICS'10 (
http://www.ics-conference.org/) is two weeks from today. ICS is a
premier forum for research in cloud/distributed computing and the most
of the work/research we do in CCDI. The CFP of the conferen
[
https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792061#action_12792061
]
Viraj Bhat commented on PIG-1157:
-
Hi Richard,
Thanks for your suggestion, it w
[
https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1157:
Attachment: oomreplicatedjoin.pig
replicatedjoinexplain.log
Explain output and Pig script
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
Hi all,
I have a script which does 2 replicated joins in succession. Please note that
the inputs do not exist on the HDFS.
{code}
A = LOAD '/tmp/abc
[
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788481#action_12788481
]
Viraj Bhat commented on PIG-1144:
-
Hi Daniel,
Thanks again for your input. This is mor
[
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788439#action_12788439
]
Viraj Bhat commented on PIG-1144:
-
Hi Daniel,
One more thing to note is that the Last So
[
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788436#action_12788436
]
Viraj Bhat commented on PIG-1144:
-
This happens on the real cluster, where the sorting
[
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1144:
Attachment: brokenparallel.out
genericscript_broken_parallel.pig
Script and explain output
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Fix For: 0.7.0
Hi all,
I have a Pig script where I set the parallelism using the following set
construct:
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788387#action_12788387
]
Viraj Bhat commented on PIG-1131:
-
Hi Pradeep,
So the workaround for this is for the
[
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1131:
Attachment: simplejoinscript.pig
junk2.txt
junk1.txt
Dummy datasets and pig
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Priority: Critical
Fix For: 0.7.0
I have a simple script, which does a JOIN.
{code}
input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
describe input1;
input2 = load '/us
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Priority: Minor
Fix For: 0.6.0
As a Hadoop user I want to control the Job name for my analysis via the command
line using the following construct:
java -cp pig.jar:$HADOOP_HOME/conf
: Improvement
Components: documentation
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
In the Pig 0.5 release we have the option of setting the default reduce
parallelism for a script using the following construct:
set default_parallel 100
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Priority: Minor
Fix For: 0.6.0
I have a Pig script in which I specify the number of records to limit as a long
type.
{code}
A = LOAD '/user/viraj/echo.txt' AS (txt:chararray);
B = L
e/PIG-1084
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
Hi all,
We have a host of Join optimizations that have been implemented recently in
Pig to improve performa
Reporter: Viraj Bhat
Fix For: 0.5.0
Hi all,
I am looking at some tips for optimizing Pig programs (Pig Cookbook) using the
PARALLEL keyword.
http://hadoop.apache.org/pig/docs/r0.5.0/cookbook.html#Use+PARALLEL+Keyword
We know that currently Pig 0.5 uses Hadoop 20 (as its default
[
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773744#action_12773744
]
Viraj Bhat commented on PIG-1060:
-
Hi Ankur and Richard,
I have a script which demonstr
: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I have a script which first does a union of these schemas and then does an ORDER
BY on the result.
{code}
f1 = LOAD '1.txt' as (key:chararray, v:chararray);
f2 = LOAD '2.txt' as (k
Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I have 2 tab separated files, "1.txt" and "2.txt"
$ cat 1.txt
1 2
2 3
$ cat 2.txt
1 2
2
[
https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-1031:
Description:
I have data stored in a text file as:
{(4153E765)}
{(AF533765)}
I try reading it using
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Reporter: Viraj Bhat
Fix For: 0.5.0, 0.6.0
I have data stored in a text file as:
{(4153E765)}
{(AF533765)}
I try reading it using PigStorage as:
{code}
A = load
---
Key: PIG-978
URL: https://issues.apache.org/jira/browse/PIG-978
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I have a Pig script of this
[
https://issues.apache.org/jira/browse/PIG-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758962#action_12758962
]
Viraj Bhat commented on PIG-974:
It turns out that the problem was due to single qu
[
https://issues.apache.org/jira/browse/PIG-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-974:
---
Attachment: studenttab10k
Testdata
> Issues with mv command when used after store when using -param_f
Issue Type: Bug
Affects Versions: 0.6.0
Environment: Hadoop 18 and 20
Reporter: Viraj Bhat
Fix For: 0.6.0
Attachments: studenttab10k
I have a Pig script which moves the final output to another HDFS directory to
signal completion, so that another
[
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749722#action_12749722
]
Viraj Bhat commented on PIG-940:
One important point to add:
{code}
localmachine.company
Components: impl
Affects Versions: 0.3.0
Environment: Hadoop 20
Reporter: Viraj Bhat
Fix For: 0.3.0
I have a script which does the following: access data from a remote HDFS
location (via an HDFS installed at hdfs://remotemachine1.company.com/) [[as I
do
[
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-921:
---
Attachment: joinusecase.pig
B.txt
A.txt
Script with test data.
> Strange