[jira] Created: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT

2010-09-30 Thread Viraj Bhat (JIRA)
Reporter: Viraj Bhat Fix For: 0.9.0 In realistic scenarios we need to split a dataset into segments by using LIMIT, and would like to achieve that goal within the same pig script. Here is a case: {code} A = load '$DATA' using PigStorage(',') as (id, pvs); B

[jira] Created: (PIG-1634) Multiple names for the "group" field

2010-09-20 Thread Viraj Bhat (JIRA)
3.0, 0.2.0, 0.1.0 Reporter: Viraj Bhat I am hoping that in Pig if I type {quote} c = cogroup a by foo, b by bar, the fields c.group, c.foo and c.bar should all map to c.$0 {quote} This would improve the readability of the Pig script. Here's a real use case: {co

[jira] Created: (PIG-1633) Using an alias withing Nested Foreach causes indeterminate behaviour

2010-09-20 Thread Viraj Bhat (JIRA)
Affects Versions: 0.7.0, 0.6.0, 0.5.0, 0.4.0 Reporter: Viraj Bhat I have created a RANDOMINT function which generates random numbers between 0 and a specified value. For example, RANDOMINT(4) gives random numbers between 0 and 3 (inclusive) {code} $hadoop fs -cat rand.dat f g h i j k l
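The RANDOMINT semantics described in this report (RANDOMINT(4) yields 0 through 3) can be sketched in plain Python as follows. This is a hypothetical stand-in, not the reporter's UDF, which is not shown in the snippet; the name `random_int` and the optional seeded generator are assumptions made so the behaviour can be checked.

```python
import random
from typing import Optional


def random_int(upper: int, rng: Optional[random.Random] = None) -> int:
    """Hypothetical sketch of RANDOMINT(upper): a random integer in [0, upper - 1]."""
    if upper <= 0:
        raise ValueError("upper must be a positive integer")
    if rng is None:
        rng = random.Random()
    # randrange excludes its stop value, so random_int(4) returns 0, 1, 2 or 3
    return rng.randrange(upper)


if __name__ == "__main__":
    seeded = random.Random(42)
    # Each call draws a fresh value; nothing ties one call's result to another's
    print([random_int(4, seeded) for _ in range(10)])
```

Because a function like this returns a new value on every evaluation, any plan that evaluates the same expression more than once will see different numbers for what looks like one field.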

[jira] Created: (PIG-1631) Support to 2 level nested foreach

2010-09-20 Thread Viraj Bhat (JIRA)
Support to 2 level nested foreach - Key: PIG-1631 URL: https://issues.apache.org/jira/browse/PIG-1631 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Viraj Bhat What I

[jira] Created: (PIG-1630) Support param_files to be loaded into HDFS

2010-09-20 Thread Viraj Bhat (JIRA)
: Viraj Bhat I want to place the parameters of a Pig script in a param_file. But instead of this file being in the local file system where I run my java command, I want this to be on HDFS. {code} $ java -cp pig.jar org.apache.pig.Main -param_file hdfs://namenode/paramfile myscript.pig {code

[jira] Commented: (PIG-1615) Return code from Pig is 0 even if the job fails when using -M flag

2010-09-16 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910414#action_12910414 ] Viraj Bhat commented on PIG-1615: - I tested this on Pig 0.8, but with a downloaded ver

[jira] Created: (PIG-1615) Return code from Pig is 0 even if the job fails when using -M flag

2010-09-16 Thread Viraj Bhat (JIRA)
Versions: 0.7.0, 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 I have a Pig script of this form, which I used inside a workflow system such as Oozie. {code} A = load '$INPUT' using PigStorage(); store A into '$OUTPUT'; {code} I run this as with Mult
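A workflow system such as Oozie judges whether an action succeeded from the process exit status, which is why a return code of 0 after a failed job matters. The Python sketch below shows the kind of check a wrapper might perform; the `run_step` helper and `StepFailed` exception are hypothetical, and child Python processes stand in for the actual Pig command line.

```python
import subprocess
import sys
from typing import Sequence


class StepFailed(RuntimeError):
    """Raised when a wrapped command exits with a non-zero status."""


def run_step(cmd: Sequence[str]) -> int:
    """Run cmd, wait for it, and raise StepFailed unless it exits with status 0."""
    completed = subprocess.run(list(cmd))
    if completed.returncode != 0:
        raise StepFailed(f"{cmd[0]} exited with status {completed.returncode}")
    return completed.returncode


if __name__ == "__main__":
    # Stand-in commands: a child Python process that succeeds, then one that fails
    run_step([sys.executable, "-c", "pass"])
    try:
        run_step([sys.executable, "-c", "import sys; sys.exit(2)"])
    except StepFailed as err:
        print("detected failure:", err)
```

If Pig exits with 0 even though its job failed, as this issue reports, a check like this cannot tell the difference, and the workflow continues as if the step had succeeded.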

[jira] Updated: (PIG-282) Custom Partitioner

2010-09-15 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-282: --- Release Note: This feature allows specifying a Hadoop Partitioner for the following operations: GROUP/COGROUP
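For context, Hadoop's default HashPartitioner sends a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; a custom Partitioner replaces that rule. The Python sketch below reproduces the default rule for Java `String` keys so assignments can be checked by hand. Pig's own key wrapper types are not modeled, and the helper names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode(): sum of s[i] * 31**(n-1-i), wrapped to a signed 32-bit int."""
    h = 0
    for ch in s:
        # assumes BMP characters, where one Python character is one Java UTF-16 unit
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret the unsigned 32-bit value as signed, matching Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_partitions: int) -> int:
    """Mirror of Hadoop's default HashPartitioner.getPartition for a String key."""
    # masking with Integer.MAX_VALUE (0x7FFFFFFF) clears the sign bit, so the index is never negative
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in ["a", "ab", "hello"]:
        print(key, java_string_hash(key), hash_partition(key, 4))
```

Keys with skewed hash values all land on the same reducer under this rule, which is one common motivation for plugging in a custom Partitioner.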

[jira] Updated: (PIG-1586) Parameter subsitution using -param option runs into problems when substituing entire pig statements in a shell script (maybe this is a bash problem)

2010-08-31 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1586: Description: I have a Pig script as a template: {code} register Countwords.jar; A = $INPUT; B = FOREACH A

[jira] Created: (PIG-1586) Parameter subsitution using -param option runs into problems when substituing entire pig statements in a shell script (maybe this is a bash problem)

2010-08-31 Thread Viraj Bhat (JIRA)
) Key: PIG-1586 URL: https://issues.apache.org/jira/browse/PIG-1586 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Viraj Bhat I have a Pig script as a template: {code} register Countwords.jar; A = $INPUT; B
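The title suggests the shell may be involved. Assuming the statement is passed on the command line without quoting, POSIX word splitting breaks it into several arguments before Pig sees it. Python's `shlex.split`, which follows POSIX shell quoting rules, shows the difference. The command lines below are hypothetical illustrations, not taken from the report.

```python
import shlex

# Hypothetical command lines: the value passed for INPUT is an entire Pig statement.
unquoted = "pig -param INPUT=load 'data' using PigStorage() template.pig"
quoted = "pig -param \"INPUT=load 'data' using PigStorage()\" template.pig"

# shlex.split applies POSIX shell word splitting and quote removal
unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

if __name__ == "__main__":
    # Unquoted: the statement is spread over several words and its single quotes are stripped
    print(unquoted_args)
    # Quoted: the whole name=value pair arrives as one argument with the inner quotes intact
    print(quoted_args)
```

Only in the quoted form would a template line such as `A = $INPUT;` receive the complete statement.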

[jira] Created: (PIG-1576) Difference in Semantics between Load statement in Pig and HDFS client on Command line

2010-08-27 Thread Viraj Bhat (JIRA)
: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0, 0.6.0 Reporter: Viraj Bhat Here is my directory structure on HDFS which I want to access using Pig. This is a sample, but in the real use case I have more than 100 of these directories. {code} $ hadoop fs

[jira] Created: (PIG-1561) XMLLoader in Piggybank does not support bz2 or gzip compressed XML files

2010-08-23 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat I have a simple Pig script which uses the XMLLoader after the Piggybank is built. {code} register piggybank.jar; A = load '/user/viraj/capacity-scheduler.xml.gz'

[jira] Created: (PIG-1547) Piggybank MultiStorage does not scale when processing around 7k records per bucket

2010-08-17 Thread Viraj Bhat (JIRA)
Issue Type: Bug Affects Versions: 0.7.0 Reporter: Viraj Bhat I am trying to use the MultiStorage piggybank UDF {code} register pig-svn/trunk/contrib/piggybank/java/piggybank.jar; A = load '/user/viraj/largebucketinput.txt' using PigStorage('\u0001') as (

[jira] Commented: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-05 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895858#action_12895858 ] Viraj Bhat commented on PIG-1537: - Hi Olga, I have given the specific script with UDF'

[jira] Created: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-04 Thread Viraj Bhat (JIRA)
Issue Type: Bug Affects Versions: 0.7.0 Reporter: Viraj Bhat I have a script of this pattern, which uses 2 StoreFuncs: {code} register loader.jar register piggy-bank/java/build/storage.jar; %DEFAULT OUTPUTDIR /user/viraj/prunecol/ ss_sc_0 = LOAD '/

[jira] Updated: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-04 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1537: Description: I have a script of this pattern, which uses 2 StoreFuncs: {code} register loade

[jira] Created: (PIG-1529) Equating aliases does not work (B = A)

2010-07-30 Thread Viraj Bhat (JIRA)
Equating aliases does not work (B = A) -- Key: PIG-1529 URL: https://issues.apache.org/jira/browse/PIG-1529 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Viraj

[jira] Created: (PIG-1528) Enable use of similar aliases when doing a join :(ERROR 1108: Duplicate schema alias:)

2010-07-30 Thread Viraj Bhat (JIRA)
: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat I am doing a self join: Input file is tab separated: {code} 1 one 1 uno 2 two 2 dos 3 three 3 tres {code} vi...@machine~/pigscripts

[jira] Commented: (PIG-1345) Link casting errors in POCast to actual lines numbers in Pig script

2010-05-06 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864963#action_12864963 ] Viraj Bhat commented on PIG-1345: - Richard thanks for suggesting a workaround. The e

[jira] Reopened: (PIG-1378) har url not usable in Pig scripts

2010-05-03 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat reopened PIG-1378: - Pradeep, After rerunning with patch the following revision Apache Pig version 0.8.0-dev (r940560) compiled

[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-04-26 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861134#action_12861134 ] Viraj Bhat commented on PIG-798: Ashutosh thanks for clarifying, we will wait till that

[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-04-26 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861106#action_12861106 ] Viraj Bhat commented on PIG-1211: - Ashutosh, yes as more and more people adopt Pig,

[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-04-26 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861097#action_12861097 ] Viraj Bhat commented on PIG-798: Hi Ashutosh, Yes that is possible, I know that we ca

[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-798: --- Affects Version/s: 0.6.0 0.5.0 0.4.0 0.3.0

[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860452#action_12860452 ] Viraj Bhat commented on PIG-798: Hi Ashutosh, The problem here is not about using the

[jira] Updated: (PIG-1339) International characters in column names not supported

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1339: Affects Version/s: 0.7.0 0.8.0 > International characters in column names

[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860445#action_12860445 ] Viraj Bhat commented on PIG-1339: - Hi Ashutosh this does not work in trunk. I am using

[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860419#action_12860419 ] Viraj Bhat commented on PIG-1211: - Ashutosh, I feel that the user may not be intereste

[jira] Commented: (PIG-1345) Link casting errors in POCast to actual lines numbers in Pig script

2010-04-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860397#action_12860397 ] Viraj Bhat commented on PIG-1345: - In which release will PIG:908 be fixed? Does it guara

[jira] Commented: (PIG-1378) har url not usable in Pig scripts

2010-04-21 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859384#action_12859384 ] Viraj Bhat commented on PIG-1378: - har:// currently works in Pig 0.7 when the hdfs loca

[jira] Resolved: (PIG-829) DECLARE statement stop processing after special characters such as dot "." , "+" "%" etc..

2010-04-14 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-829. Fix Version/s: 0.7.0 Resolution: Fixed Pig 0.7 yields the correct result. {code} x = LOAD 'some

[jira] Resolved: (PIG-518) LOBinCond exception in LogicalPlanValidationExecutor when providing default values for bag

2010-04-14 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-518. Fix Version/s: 0.7.0 Resolution: Fixed > LOBinCond exception in LogicalPlanValidationExecutor w

[jira] Commented: (PIG-518) LOBinCond exception in LogicalPlanValidationExecutor when providing default values for bag

2010-04-14 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857157#action_12857157 ] Viraj Bhat commented on PIG-518: The above script generates the following error in Pig

[jira] Updated: (PIG-1378) har url not usable in Pig scripts

2010-04-14 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1378: Description: I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS

[jira] Created: (PIG-1378) har url not usable in Pig scripts

2010-04-14 Thread Viraj Bhat (JIRA)
: Viraj Bhat Fix For: 0.7.0 I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users1537234 2010-04-14 09:49

[jira] Created: (PIG-1377) Pig/Zebra fails without proper error message when the mapred.jobtracker.maxtasks.per.job exceeds threshold

2010-04-13 Thread Viraj Bhat (JIRA)
/jira/browse/PIG-1377 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0, 0.7.0 Reporter: Viraj Bhat I have a Zebra script which generates a huge number of mappers, around 400K. The mapred.jobtracker.maxtasks.per.job is currently set at

[jira] Created: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag

2010-04-12 Thread Viraj Bhat (JIRA)
Issue Type: Bug Components: impl Affects Versions: 0.6.0, 0.7.0 Reporter: Viraj Bhat Script loads data from BinStorage(), then flattens columns and then sorts on the second column in descending order. The order by fails with the ClassCastException {code

[jira] Resolved: (PIG-756) UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path

2010-04-07 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-756. Resolution: Fixed Fix Version/s: 0.7.0 https://issues.apache.org/jira/browse/PIG-1053 fixes this issue

[jira] Commented: (PIG-756) UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path

2010-04-07 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854762#action_12854762 ] Viraj Bhat commented on PIG-756: In Pig 0.7 we have moved local mode of Pig to local mod

[jira] Created: (PIG-1345) Link casting errors in POCast to actual lines numbers in Pig script

2010-03-31 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat For the purpose of easy debugging, it would be nice to find out where in the Pig script my warnings are coming from. The only known process is to comment out lines in the Pig script and see if these

[jira] Created: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails

2010-03-30 Thread Viraj Bhat (JIRA)
Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat There is a particular case where I was running with the latest trunk of Pig. {code} $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig [main] INFO

[jira] Updated: (PIG-1341) Cannot convert DataByeArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED

2010-03-30 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1341: Component/s: impl Summary: Cannot convert DataByeArray to Chararray and results in

[jira] Created: (PIG-1341) Cannot convert DataByeArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED

2010-03-30 Thread Viraj Bhat (JIRA)
Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Script reads in BinStorage data and tries to convert a column which is in DataByteArray to Chararray. {code} raw = load 'sampledata' using BinStorage() as (col1,col2, col3)

[jira] Created: (PIG-1339) International characters in column names not supported

2010-03-30 Thread Viraj Bhat (JIRA)
Affects Versions: 0.6.0 Reporter: Viraj Bhat There is a particular use-case in which someone specifies a column name to be in International characters. {code} inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお); describe inputdata; dump inputd

[jira] Updated: (PIG-1308) Inifinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]

2010-03-18 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1308: Description: Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker

[jira] Created: (PIG-1308) Inifinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]

2010-03-18 Thread Viraj Bhat (JIRA)
] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Fix For: 0.7.0 Simple script fails to read files from BinStorage() and fails to

[jira] Created: (PIG-1305) Document in Load statement syntax that Pig and underlying M/R does not handle concatenated bz2 and gz files correctly

2010-03-17 Thread Viraj Bhat (JIRA)
://issues.apache.org/jira/browse/PIG-1305 Project: Pig Issue Type: Bug Components: documentation Reporter: Viraj Bhat Fix For: 0.7.0 The Pig Reference Manual needs to be updated: Relational Operators Syntax: LOAD 'data' [USIN

[jira] Created: (PIG-1304) Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input

2010-03-17 Thread Viraj Bhat (JIRA)
Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Viraj Bhat I have the following txt files which are bzipped: \t = {code} $ bzcat A.txt.bz2 1\ta 2\taa $bzcat B.txt.bz2 1\tb 2\tbb $cat *.bz2 > test/mymerge.bz2 $bzcat test/mymerge.bz2 1\ta 2\taa 1\tb 2\
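The `bzcat` output above shows that bzip2 allows several compressed streams to be concatenated into one file, and gzip likewise allows multiple members; a reader that stops after the first stream silently loses the rest. Python's `bz2.decompress` and `gzip.decompress` both handle multi-stream input, so this sketch shows what a complete decode should yield. It uses the sample rows from the report, assuming one tab-separated record per line, and builds the files in memory rather than on HDFS.

```python
import bz2
import gzip

# Sample rows from the report, assuming one tab-separated record per line
part_a = b"1\ta\n2\taa\n"
part_b = b"1\tb\n2\tbb\n"

# Like `cat A.txt.bz2 B.txt.bz2 > mymerge.bz2`: two complete bzip2 streams back to back
merged_bz2 = bz2.compress(part_a) + bz2.compress(part_b)
# The gzip equivalent: two gzip members in a single file
merged_gz = gzip.compress(part_a) + gzip.compress(part_b)

# bz2.decompress and gzip.decompress both continue past the first stream/member
decoded_bz2 = bz2.decompress(merged_bz2)
decoded_gz = gzip.decompress(merged_gz)

# A single BZ2Decompressor stops at the end of the first stream and leaves the rest unread
single = bz2.BZ2Decompressor()
first_only = single.decompress(merged_bz2)
leftover = single.unused_data

if __name__ == "__main__":
    print(decoded_bz2 == part_a + part_b, decoded_gz == part_a + part_b)
    print(first_only == part_a, len(leftover) > 0)
```

The `first_only` case illustrates the failure mode in general terms: output that contains only the first file's records, with no error raised.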

[jira] Created: (PIG-1281) Detect org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple type of errors at Compile Type during creation of logical plan

2010-03-05 Thread Viraj Bhat (JIRA)
--- Key: PIG-1281 URL: https://issues.apache.org/jira/browse/PIG-1281 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 This is more of an enhancement request, where we

[jira] Created: (PIG-1278) Type mismatch in key from map: expected org.apache.pig.impl.io.NullableFloatWritable, recieved org.apache.pig.impl.io.NullableText

2010-03-05 Thread Viraj Bhat (JIRA)
URL: https://issues.apache.org/jira/browse/PIG-1278 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 I have a script which uses Map data, and runs a UDF, which creates random numbers and

[jira] Commented: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840389#action_12840389 ] Viraj Bhat commented on PIG-1272: - Now with Pig 0.7 or trunk we have the following e

[jira] Created: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Viraj Bhat (JIRA)
: Viraj Bhat Fix For: 0.7.0 For a simple script the column pruner optimization removes certain columns from the original relation, which produces wrong results. Input file "kv" contains the following columns (tab separated) {code} a 1 a 2 a 3 b 4

[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-02 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840339#action_12840339 ] Viraj Bhat commented on PIG-1252: - A modified version of the script works, does this hav

[jira] Updated: (PIG-1263) Script producing varying number of records when COGROUPing value of map data type with and without types

2010-02-25 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1263: Description: I have a Pig script which I am experimenting upon. [[Albeit this is not optimized and can be

[jira] Created: (PIG-1263) Script producing varying number of records when COGROUPing value of map data type with and without types

2010-02-25 Thread Viraj Bhat (JIRA)
/browse/PIG-1263 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a Pig script which I am experimenting upon. [[Albeit this is not optimized and can be done in a variety of

[jira] Updated: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-02-22 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1252: Description: I have a script which uses split but somehow does not use one of the split branches. The skeleton

[jira] Created: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-02-22 Thread Viraj Bhat (JIRA)
: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 I have a script which uses split but somehow does not use one of the split branches. The skeleton of the script is as follows {code} loadData = load '/user/viraj/zebradata'

[jira] Created: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error

2010-02-19 Thread Viraj Bhat (JIRA)
- Key: PIG-1247 URL: https://issues.apache.org/jira/browse/PIG-1247 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For

[jira] Created: (PIG-1243) Passing Complex map types to and from streaming causes a problem

2010-02-18 Thread Viraj Bhat (JIRA)
Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 I have a program which generates different types of map fields and stores them using PigStorage. {code} A = load '/user/viraj/three.txt' using PigStorage(); B = foreach A generate ['a'#'12'] as

[jira] Reopened: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-02-10 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat reopened PIG-1194: - Hi Richard, I ran the script attached on the ticket and found that the map tasks fail with the

[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831251#action_12831251 ] Viraj Bhat commented on PIG-1131: - Ashutosh I was able to recreate a similar problem u

[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831248#action_12831248 ] Viraj Bhat commented on PIG-1131: - Olga I marked it as critical since we mention that

[jira] Created: (PIG-1220) Document unknown keywords as missing or to do in future

2010-02-03 Thread Viraj Bhat (JIRA)
: documentation Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 To get help at the grunt shell I do the following: grunt>touchz 010-02-04 00:59:28,714 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "touc

[jira] Created: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-01-28 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col

[jira] Updated: (PIG-531) Way for explain to show 1 plan at a time

2010-01-27 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-531: --- Fix Version/s: 0.5.0 Hi Olga, I think we have a way to handle it in multi-query optimization. Is it

[jira] Updated: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2010-01-27 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-940: --- Affects Version/s: (was: 0.3.0) 0.5.0 Fix Version/s: 0.7.0 > Cross site H

[jira] Updated: (PIG-1174) Creation of output path should be done by storage function

2010-01-27 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1174: Fix Version/s: 0.7.0 > Creation of output path should be done by storage funct

[jira] Created: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-01-15 Thread Viraj Bhat (JIRA)
Affects Versions: 0.5.0, 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: inputdata.txt I have a simple Pig script which takes 3 columns, one of which is null. {code} input = load 'inputdata.txt' using PigSt

[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-01-15 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1194: Attachment: inputdata.txt Testdata to run with this script > ERROR 2055: Received Error while process

[jira] Commented: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified

2010-01-14 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800315#action_12800315 ] Viraj Bhat commented on PIG-1187: - Hi Jeff, This is specific to the data we are using

[jira] Created: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified

2010-01-13 Thread Viraj Bhat (JIRA)
Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a set of Pig statements which dump an international dataset. {code} INPUT_OBJECT = load 'internationalcode'; describe INPUT_OBJECT; dump INPUT_OBJECT; {code} Sample output

CFP for 24th International Conference on Supercomputing (ICS 2010, Tsukuba, Japan)

2010-01-05 Thread Viraj Bhat
Dear Hadoop and Pig Users, This is just to let you know that the submission deadline for ICS'10 ( http://www.ics-conference.org/) is two weeks from today. ICS is a premier forum for research in cloud/distributed computing, the area that covers most of the work/research we do in CCDI. The CFP of the conferen

[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM

2009-12-17 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792061#action_12792061 ] Viraj Bhat commented on PIG-1157: - Hi Richard, Thanks for your suggestion, it w

[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM

2009-12-16 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1157: Attachment: oomreplicatedjoin.pig replicatedjoinexplain.log Explain output and Pig script

[jira] Created: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM

2009-12-16 Thread Viraj Bhat (JIRA)
Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc'
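In a Pig replicated (fragment-replicate) join, the first input is streamed while every input after it must fit in memory on each map task. A minimal Python sketch of that idea, with small in-memory lists standing in for relations, shows where the memory goes; the function name and sample data are hypothetical, and this is a plain hash join rather than Pig's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


def replicated_join(
    big: Iterable[Row], small: Iterable[Row], big_key: int, small_key: int
) -> Iterator[Row]:
    """Hash join that keeps `small` entirely in memory and streams `big` once."""
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in small:
        # the whole replicated input is materialized here; this is the memory cost
        table[row[small_key]].append(row)
    for row in big:
        # .get avoids inserting empty lists for keys with no match
        for match in table.get(row[big_key], ()):
            yield row + match


if __name__ == "__main__":
    a = [(1, "x"), (2, "y"), (3, "z")]
    b = [(1, "one"), (2, "two")]
    c = [(1, "uno")]
    # Two joins in succession: the output of the first is streamed into the second
    ab = list(replicated_join(a, b, 0, 0))
    print(list(replicated_join(ab, c, 0, 0)))
```

Only the replicated side is held in a table; the streamed side can be arbitrarily large, which is why the choice of which relation to replicate matters.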

[jira] Commented: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788481#action_12788481 ] Viraj Bhat commented on PIG-1144: - Hi Daniel, Thanks again for your input. This is mor

[jira] Commented: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788439#action_12788439 ] Viraj Bhat commented on PIG-1144: - Hi Daniel, One more thing to note is that the Last So

[jira] Commented: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788436#action_12788436 ] Viraj Bhat commented on PIG-1144: - This happens on the real cluster, where the sorting

[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1144: Attachment: brokenparallel.out genericscript_broken_parallel.pig Script and explain output

[jira] Created: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)
Issue Type: Bug Components: impl Affects Versions: 0.7.0 Environment: Hadoop 20 cluster with multi-node installation Reporter: Viraj Bhat Fix For: 0.7.0 Hi all, I have a Pig script where I set the parallelism using the following set construct: "

[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2009-12-09 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788387#action_12788387 ] Viraj Bhat commented on PIG-1131: - Hi Pradeep, So the workaround for this is for the

[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines

2009-12-07 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1131: Attachment: simplejoinscript.pig junk2.txt junk1.txt Dummy datasets and pig

[jira] Created: (PIG-1131) Pig simple join does not work when it contains empty lines

2009-12-07 Thread Viraj Bhat (JIRA)
Affects Versions: 0.7.0 Reporter: Viraj Bhat Priority: Critical Fix For: 0.7.0 I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/us

[jira] Created: (PIG-1124) Unable to set Custom Job Name using the -Dmapred.job.name parameter

2009-12-03 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Priority: Minor Fix For: 0.6.0 As a Hadoop user I want to control the Job name for my analysis via the command line using the following construct: java -cp pig.jar:$HADOOP_HOME/conf

[jira] Created: (PIG-1123) Popularize usage of default_parallel keyword in Cookbook and Latin Manual

2009-12-03 Thread Viraj Bhat (JIRA)
: Improvement Components: documentation Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 In the Pig 0.5 release we have the option of setting the default reduce parallelism for a script using the following construct: set default_parallel 100
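A minimal sketch of the construct described in PIG-1123, assuming Pig 0.5 or later. The input path, output path and schema are hypothetical placeholders. It shows `set default_parallel` supplying the reducer count for every reduce-side operator in the script, with a per-operator `PARALLEL` clause taking precedence. Note that PIG-1144 reports a case on a Hadoop 20 cluster where the default was not applied as expected, so checking the actual reducer count of each job is worthwhile.

{code}
-- Script-wide default number of reducers (Pig 0.5+).
set default_parallel 100;

-- Hypothetical input file and schema, for illustration only.
A = LOAD 'input.txt' USING PigStorage('\t') AS (key:chararray, val:int);

-- No PARALLEL clause: this GROUP uses the script-wide default (100 reducers).
B = GROUP A BY key;
C = FOREACH B GENERATE group, COUNT(A) AS cnt;

-- An explicit PARALLEL clause overrides default_parallel for this operator only.
D = ORDER C BY cnt DESC PARALLEL 20;

STORE D INTO 'output';
{code}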

[jira] Created: (PIG-1101) Pig parser does not recognize its own data type in LIMIT statement

2009-11-20 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Priority: Minor Fix For: 0.6.0 I have a Pig script in which I specify the number of records to limit as a long type. {code} A = LOAD '/user/viraj/echo.txt' AS (txt:chararray); B = L

[jira] Created: (PIG-1084) Pig CookBook documentation "Take Advantage of Join Optimization" additions: Merge and Skewed Join

2009-11-10 Thread Viraj Bhat (JIRA)
e/PIG-1084 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 Hi all, We have a host of Join optimizations that have been implemented recently in Pig to improve performa

[jira] Created: (PIG-1081) PigCookBook use of PARALLEL keyword

2009-11-10 Thread Viraj Bhat (JIRA)
Reporter: Viraj Bhat Fix For: 0.5.0 Hi all, I am looking at some tips for optimizing Pig programs (Pig Cookbook) using the PARALLEL keyword. http://hadoop.apache.org/pig/docs/r0.5.0/cookbook.html#Use+PARALLEL+Keyword We know that currently Pig 0.5 uses Hadoop 20 (as its default

[jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-11-04 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773744#action_12773744 ] Viraj Bhat commented on PIG-1060: - Hi Ankur and Richard, I have a script which demonstr

[jira] Created: (PIG-1065) Indeterminate behaviour of Union when there are 2 non-matching schemas

2009-10-29 Thread Viraj Bhat (JIRA)
: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a script which first does a union of these schemas and then does an ORDER BY of this result. {code} f1 = LOAD '1.txt' as (key:chararray, v:chararray); f2 = LOAD '2.txt' as (k

[jira] Created: (PIG-1064) Behaviour of COGROUP with and without schema when using "*" operator

2009-10-29 Thread Viraj Bhat (JIRA)
Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have 2 tab separated files, "1.txt" and "2.txt" $ cat 1.txt 1 2 2 3 $ cat 2.txt 1 2 2

[jira] Updated: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double

2009-10-20 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-1031: Description: I have data stored in a text file as: {(4153E765)} {(AF533765)} I try reading it using

[jira] Created: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double

2009-10-20 Thread Viraj Bhat (JIRA)
Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.5.0 Reporter: Viraj Bhat Fix For: 0.5.0, 0.6.0 I have data stored in a text file as: {(4153E765)} {(AF533765)} I try reading it using PigStorage as: {code} A = load

[jira] Created: (PIG-978) ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization

2009-09-25 Thread Viraj Bhat (JIRA)
Key: PIG-978 URL: https://issues.apache.org/jira/browse/PIG-978 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a Pig script of this

[jira] Commented: (PIG-974) Issues with mv command when used after store when using -param_file/-param options

2009-09-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758962#action_12758962 ] Viraj Bhat commented on PIG-974: It turns out that the problem was due to single qu

[jira] Updated: (PIG-974) Issues with mv command when used after store when using -param_file/-param options

2009-09-23 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-974: --- Attachment: studenttab10k Testdata > Issues with mv command when used after store when using -param_f

[jira] Created: (PIG-974) Issues with mv command when used after store when using -param_file/-param options

2009-09-23 Thread Viraj Bhat (JIRA)
Issue Type: Bug Affects Versions: 0.6.0 Environment: Hadoop 18 and 20 Reporter: Viraj Bhat Fix For: 0.6.0 Attachments: studenttab10k I have a Pig script which moves the final output to another HDFS directory to signal completion, so that another

[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-08-31 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749722#action_12749722 ] Viraj Bhat commented on PIG-940: One important point to add: {code} localmachine.company

[jira] Created: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-08-31 Thread Viraj Bhat (JIRA)
Components: impl Affects Versions: 0.3.0 Environment: Hadoop 20 Reporter: Viraj Bhat Fix For: 0.3.0 I have a script which does the following: access data from a remote HDFS location (via an HDFS installed at: hdfs://remotemachine1.company.com/ ) [[as I do

[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-08-13 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-921: --- Attachment: joinusecase.pig B.txt A.txt Script with test data. > Strange
