Re: [DISCUSS] Apache Pig bylaws

2010-10-05 Thread Daniel Dai
+1 Olga Natkovich wrote: I am fine with them "as-is" Olga -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Tuesday, October 05, 2010 1:16 PM To: Thejas M Nair Cc: pig-u...@hadoop.apache.org Subject: Re: [DISCUSS] Apache Pig bylaws Comments inlined. However, I

Re: [VOTE] Bylaws for the Pig project

2010-10-07 Thread Daniel Dai
+1 Matt Tanquary wrote: +1 On Thu, Oct 7, 2010 at 10:09 AM, Olga Natkovich wrote: +1 -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Thursday, October 07, 2010 9:23 AM To: user@pig.apache.org Subject: [VOTE] Bylaws for the Pig project I propose that we adop

Re: Passing Hadoop Site Configurations in classpath is not recommended for Local Mode

2010-11-17 Thread Daniel Dai
This is a defect in Pig 0.7. Pig 0.8 will automatically exclude hadoop config file in local mode (https://issues.apache.org/jira/browse/PIG-1338) Daniel Michael Sundell wrote: It turns out that Pig calls $HADOOP_HOME/bin/hadoop-config.sh Inside this script this is set by default (among other t

Re: How can I join a dataset that was loaded without an explicit schema definition?? I have 185 columns

2010-11-18 Thread Daniel Dai
It is a bug, which is addressed in Pig 0.8 soon to come. You can use the option "-t PruneColumns" to run it with 0.7. Daniel Mallya, Ashok wrote: Hello, I have a dataset with more than 180 columns to which I want to join (based on two columns) to another. I would like not to have to enu

Re: Access to file name?

2010-11-23 Thread Daniel Dai
I remember we did something similar before. FileSplit.getPath() does have a hold of file name. Here is a sample code: public class PigStorageWithInputPath extends PigStorage { Path path = null; @Override public void prepareToRead(RecordReader reader, PigSplit split) { super.pre

Re: pass configuration param to UDF

2010-11-23 Thread Daniel Dai
The only hook in frontend for a UDF is outputSchema. You can put your property into UDFContext in outputSchema, and read back in exec. public String exec(Tuple input) throws IOException { UDFContext context = UDFContext.getUDFContext(); String a = context.getUDFProperties(this.

Re: Access to file name?

2010-11-23 Thread Daniel Dai
Sure. Thanks Dmitriy Ryaboy wrote: Daniel, Can you drop this on the wiki? -D On Tue, Nov 23, 2010 at 10:27 AM, Daniel Dai wrote: I remember we did something similar before. FileSplit.getPath() does have a hold of file name. Here is a sample code: public class PigStorageWithInputPath

Re: Question on getting TotalCount - X records

2010-11-29 Thread Daniel Dai
Limit only takes constant. So "limit sorted_asc (COUNT(*kws*) - 5)" does not work. You will need a UDF, which returns DataBag. One example is org.apache.pig.builtin.COR, which returns DataBag. Basically, you can write a UDF like this: public class BagTest extends EvalFunc { @Override publi

Re: Correlating timestamps

2010-11-29 Thread Daniel Dai
Inner filter cannot access outside fields. It can only access fields inside the base alias. Writing a UDF to process the whole tuple will work. Preaggregating on userid will help performance since we do not need to aggregate again in Pig job. Daniel -Original Message- From: Marko Mus

Re: pass configuration param to UDF

2010-11-29 Thread Daniel Dai
't use pig.properties since the property passed to UDF are per pig script specific, not a global setting. How do I pass a -D option to pig script run (pig -f myscript.pig)? Thanks. On Tue, Nov 23, 2010 at 6:12 PM, Daniel Dai wrote: The only hook in frontend for a UDF is outputSchema. You ca

Re: Reading the output of EXPLAIN

2010-11-29 Thread Daniel Dai
Sorry there is no document to describe explain result so far. If you want to find out the alias -> job mapping in the explain result, look at "Map Reduce Plan" section of explain result, every node represents a mapreduce job, and you will find alias included in this job. I would also suggest u

Re: LOAD data USING to parse data in order to obtain the AS as desired.

2010-11-30 Thread Daniel Dai
Try this: table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff); pared = foreach table generate n1, n2; grouped = group pared by n1; counted = foreach grouped generate group, (double)COUNT(pared.n2)/COUNT_STAR(pared.n2) as ratio; ordered = order counted by ratio desc; limi

Re: Writing filter function that takes constructor param?

2010-11-30 Thread Daniel Dai
Pig always instantiates a UDF using the constructor parameters given in the "define" statement. CONTAINS_STRINGS(haystack) only passes haystack to CONTAINS_STRINGS.exec(); it does not re-initialize the UDF. Daniel Zach Bailey wrote: I am trying to do what seems like should be a simple task using

Re: Cast bytearray

2010-11-30 Thread Daniel Dai
No, bag assumes all tuples inside it share the same schema. Daniel Matt Tanquary wrote: I have the following bytearray: | F | bytearray| | | {(l1n2), (0,0), (1)} | | | {(l2n2), (0,0), (1)} | | |

Re: decide to use Pig

2010-11-30 Thread Daniel Dai
Since I am a Pig developer, I will say "do everything Pig" :). To be frank, if these 9 functions are all you want, you can easily convert them into Pig, but you will not gain much if none of the 9 functions can utilize existing UDFs. Here is one way you can do it: * Write a UDF LineProcess: p

Re: Find variants of a term in relation A from a field in relation B

2010-12-02 Thread Daniel Dai
Can you convert it into an equal join problem? That's the case mapreduce can handle efficiently. Not sure if it addresses your problem, but here is a sample script. a = load 'A' as (a0:chararray); b = foreach a generate LOWER(a0) as b0; c = load 'B' as (c0:chararray); d = foreach c generate LOWER(

Re: Easy question...difference between this::form and this.form?

2010-12-06 Thread Daniel Dai
After join, cross, foreach flatten, Pig will automatically add "base_alias::" prefix. All other cases use "." Daniel Jonathan Coveney wrote: It's very hard to search for this among the docs because it's so generic, so I thought I'd ask... I'm sure the answer is painfully easy. Taking a look a

Re: FOREACH and FLATTEN Syntax

2010-12-07 Thread Daniel Dai
When you flatten a bag, you get items inside the tuple. The foreach statement is wrong, you should change it to: flat_foo = FOREACH foo GENERATE FLATTEN($0) as (f1, f2, f3, f4, f5); DUMP flat_foo; (a, b, c, d, e) (1, 2, 3, 4, 5) ... (f,g,h,i,j) (6,7,8,9,10) subset_foo = FOREACH flat_foo GENERAT

Re: Custom UDF + Grouping - Unexpected Output

2010-12-08 Thread Daniel Dai
It is not expected. I would think something is wrong inside NormalizeListUDF. Make sure you feed a bag of tuples which has the schema (int, int) inside your UDF. If you can post your UDF, I can tell better. Daniel Michael Moss wrote: Hello, I'm having an issue with a script that uses an EvalFunc

Re: Custom UDF + Grouping - Unexpected Output

2010-12-09 Thread Daniel Dai
INTEGER); fields.add(f1); fields.add(f2); Schema tupleInner = new Schema(fields); Schema.FieldSchema tupleSchema = new Schema.FieldSchema("t1", tupleInner, DataType.TUPLE); Schema bagInner = new Schema(tupleSchema); Schema.FieldSchema bagSchema = new Schema.FieldSchema("bag", bagInner, DataType

Re: should the following query work?

2010-12-09 Thread Daniel Dai
You can slice a bag, but not a bag of bag. If you do want to project x, do it early: A = load 'foo.txt' using PigStorage as (x : chararray, y : int); B = group A by x; B1 = foreach B generate group, A.x as Ax; C = group B1 by group; E = foreach C generate B1.(group, Ax); Daniel Kris Coward wro

Re: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage

2010-12-10 Thread Daniel Dai
Looks like hadoop client jar does not match the version of server side. Are you using hadoop 0.20.2 from Apache? Daniel -Original Message- From: felix gao Sent: Thursday, December 09, 2010 5:48 PM To: pig-u...@hadoop.apache.org Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2

Re: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage

2010-12-10 Thread Daniel Dai
6.1.14.jar native commons-logging-1.0.4.jar jackson-core-asl-1.5.2.jar jsp-2.1 oro-2.0.8.jar please tell me how to get this working with pig Thanks, Felix On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai wrote: Looks like hadoop client jar does not

Re: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage

2010-12-10 Thread Daniel Dai
x On Fri, Dec 10, 2010 at 11:10 AM, Daniel Dai wrote: I didn't use Cloudera distribution before. Pig bundles Apache hadoop 0.20.2 client library. If Cloudera made some changes to hadoop, that could be an issue. One thing you can try is build hadoop20.jar by yourself ( http://behemoth.s

Re: Question on getting TotalCount - X records

2010-12-13 Thread Daniel Dai
name. Not sure what "104" stands for? How do I access each field in topkws? I need to join reportdate,appid and keyword in topkws with another file. Appreciate any help thanks Sheeba On Sun, Nov 28, 2010 at 2:07 AM, Daniel Dai wrote: Limit only takes constant. So "limit sor

Re: Question on getting TotalCount - X records

2010-12-14 Thread Daniel Dai
Yes, actually it is much easier: public Schema outputSchema(Schema input) { return input; } Daniel Sheeba George wrote: Hi Daniel Is it possible to get the schema string from the "input" param rather than hardcoding? Thanks Sheeba On Mon, Dec 13, 2010 at 11:53 PM, Daniel

Re: How to divide by the minimum number in a set in Pig?

2010-12-14 Thread Daniel Dai
This is what you need (on 0.8): loaded = LOAD 'whatever' AS (whatever:chararray, icare:int); min_generated = FOREACH loaded GENERATE icare; min_group = GROUP min_generated ALL; min = FOREACH min_group GENERATE MIN(min_generated) as m; generated = FOREACH loaded GENERATE whatever, icare/min.m; Da
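A rough illustration of what the script above computes, written as plain Java rather than Pig: one pass finds the minimum, a second pass divides each value by it. The class and method names are hypothetical and nothing here is Pig API; the sketch divides as double, which is a choice made here and not something the script states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: mirrors the two-pass shape of the script.
final class DivideByMinSketch {
    // Pass 1: find the minimum of the column (the GROUP ... ALL + MIN step).
    static int min(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        int m = values.get(0);
        for (int v : values) {
            m = Math.min(m, v);
        }
        return m;
    }

    // Pass 2: divide every value by that single scalar (the final FOREACH).
    static List<Double> divideByMin(List<Integer> values) {
        int m = min(values);
        if (m == 0) {
            throw new ArithmeticException("minimum is zero");
        }
        List<Double> out = new ArrayList<>();
        for (int v : values) {
            out.add((double) v / m); // cast first to avoid integer truncation
        }
        return out;
    }
}
```

The minimum is computed once and reused for every row, which is the role the single-row relation "min" plays in the script.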

Re: Streaming bags

2010-12-14 Thread Daniel Dai
Unfortunately ForEach inner plan does not support stream now. Here are some choices: 1. You can customize input/output of your perl script. Check http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#DEFINE, search "About Input and Output" 2. Use UDF instead of stream result = FOREACH awesome_i

Re: Help debugging an "unexpected problem during optimization"?

2010-12-15 Thread Daniel Dai
Which version of Pig are you using? I find some syntax error in your script. Is this the script you actually run? Here is the syntax error I find: 1. What is ahh, ooh? 2. Alias cannot be "group", it is a keyword 3. "sort = ORDER counts BY cnt DESC; ". Do you mean "sort = ORDER count BY cnt DESC

Re: increment counters in Pig UDF

2010-12-15 Thread Daniel Dai
Yes, you can use EvalFunc.warn(Object o, String msg, Enum warningEnum). Daniel Dexin Wang wrote: Is it possible to increment a counter in Pig UDF (in either Load/Eval/Store Func). Since we have access to counters using the org.apache.hadoop.mapred.Reporter: http://hadoop.apache.org/common/doc

Re: Help debugging an "unexpected problem during optimization"?

2010-12-16 Thread Daniel Dai
in the actual script. Logic wise, however, should it work? 2010/12/15 Daniel Dai Which version of Pig are you using? I find some syntax error in your script. Is this the script you actually run? Here is the syntax error I find: 1. What is ahh, ooh? 2. Alias cannot be "group", i

Pig 0.8.0 is released!

2010-12-17 Thread Daniel Dai
Pig team is happy to announce Pig 0.8.0 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be found at http://pig.apache.org/. The highlights of this release are scalar, custom partitioner

Re: Overview of features in Pig 0.8

2010-12-20 Thread Daniel Dai
That's very comprehensive. Can we put a link on Pig wiki? Daniel Dmitriy Ryaboy wrote: Pig users, I wrote up a short overview of some new features in Pig 0.8: https://squarecog.wordpress.com/2010/12/19/new-features-in-apache-pig-0-8/ Cheers -Dmitriy

Re: How to contribute to Pig Wiki ?

2010-12-21 Thread Daniel Dai
Thanks Charles! Everyone can edit Pig wiki. Just go to the wiki page, register an account, and you can edit. We are looking forward to your contribution! Daniel Charles Gonçalves wrote: Guys, I'm starting to use pig (0.8) now and I went to Pig Wiki for some directives and tutorials. I alre

Re: set reducer timeout with pig

2010-12-21 Thread Daniel Dai
True, however there is one bug in 0.7. We fix it in 0.8. https://issues.apache.org/jira/browse/PIG-1760 Daniel Ashutosh Chauhan wrote: Ideally you need not to do that. Pig automatically takes care of progress reporting in its operator. Do you have a pig script which fails because of reporting

Re: Concat on bags?

2010-12-21 Thread Daniel Dai
You will need a UDF to concat bag items. Daniel Matt Tanquary wrote: This set results from a JOIN: (04f4c2fd-8be2-41c3-b045-283de80909ba,1966,2L) (04f4c2fd-8be2-41c3-b045-283de80909ba,3845,2L) Using PIG, I group this and get: (669a4b47-d3c3-4950-9ec0-f1e24064d9d9,{(669a4b47-d3c3-4950-9ec0-f1

Re: UDFContext in 0.8 LoadFunc?

2011-01-05 Thread Daniel Dai
You are right. setLocation is called in frontend, however, it is in the context of InputFormat.getSplits() and it is too late to save anything in UDFContext. Your best bet is relativeToAbsolutePath, which is called in frontend and you can save your stuff in UDFContext. Daniel -Original Me

Re: Pig error: Unable to create input splits

2011-01-11 Thread Daniel Dai
I tried the JSON loader you mentioned on 0.7, and it seems to work fine for me. I didn't get the error message you mention. Are you still seeing those errors? Daniel Geoffrey Gallaway wrote: Hello, I'm looking for some clues to help me fix an annoying error I'm getting using Pig. I need to parse a large J

Re: General LoadFunc for nested Json

2011-01-18 Thread Daniel Dai
Currently, we treat all map values as bytearray. However, if you project the map value later in the script, you have a chance to cast the map value. E.g.: a = load '1.json' using JSONLoader() as (m:map[]); b = foreach a generate (map[])m#'key' as v; c = foreach b generate (long)v; But you cannot c

Re: Projection on the wrong columns, possible bug

2011-01-18 Thread Daniel Dai
Thank you for reporting. I checked the latest 0.8 code, and the issue is fixed. We have fixed a couple of issues since the release of 0.8. You can get those fixes by checking out the code from svn and building it yourself: svn co https://svn.apache.org/repos/asf/pig/branches/branch-0.8 Daniel Spyros Kotoulas w

Re: Error 2042 in version 0.8.0

2011-01-24 Thread Daniel Dai
Can you send me the script you are running? Thanks Kaluskar, Sanjay wrote: I am seeing this stack when running a script that runs fine in 0.5.0, 0.6.0 and 0.7.0. Is this a known issue? ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. org.apache.pig.impl.logica

Re: python udf doesnt work

2011-01-24 Thread Daniel Dai
Put build/ivy/lib/Pig/jython-2.5.0.jar in your classpath (if not there, do ant first). This is a bug we need to fix. Daniel Xiaomeng Wan wrote: Hi, I want to write a python udf to split string into bags #!/usr/bin/python import re

Re: Tuple Question

2011-02-01 Thread Daniel Dai
You cannot get size of tuple using SIZE. Use ARITY instead. Daniel Xavier Stevens wrote: I've written a regular expression EvalFunc similar to ExtractAll except this is called FindAll. It returns a tuple of all strings found that match the given pattern. The syntax looks like this: A = FOREA

Re: StoreFunc Schema

2011-02-01 Thread Daniel Dai
Looks like you should be able to get ResourceSchema in checkSchema, as long as the schema for the alias is not null. Daniel Dan Harvey wrote: Hey, I'm just porting a json StoreFunc class method I wrote from pig 0.6 to pig 0.8 so I can take advantage of the schema that the Store's can use from

Re: Tuple Question

2011-02-01 Thread Daniel Dai
tuple. Expected input is a tuple, * output is an integer. * @deprecated Use {@link SIZE} instead. */ public class ARITY extends EvalFunc { On Tue, Feb 1, 2011 at 12:10 PM, Daniel Dai wrote: You cannot get size of tuple using SIZE. Use ARITY instead. Daniel Xavier Stevens wrote: I've

Re: TupleSize implemented incorrectly?

2011-02-01 Thread Daniel Dai
This is definitely a bug. Can you open a Jira ticket? Daniel Eric Tschetter wrote: I'm looking at Pig's TupleSize implementation and wondering if it's implemented correctly: @Override public Long exec(Tuple input) throws IOException { try{ if (input == null) return

Re: [VOTE] Sponsoring Howl as an Apache Incubator project

2011-02-02 Thread Daniel Dai
+1 Olga Natkovich wrote: +1 -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Wednesday, February 02, 2011 1:19 PM To: user@pig.apache.org Subject: [VOTE] Sponsoring Howl as an Apache Incubator project Howl is a table management system built to provide metadata an

Re: Pig 0.8 and PigMix

2011-02-03 Thread Daniel Dai
Yes, we do use it for 0.8. Daniel Renato Marroquín Mogrovejo wrote: Hey all, I wanted to know if the patch from https://issues.apache.org/jira/browse/PIG-200 is safe for Pig0.8, and how to apply it is the same way as shown in the JIRA. Thanks. Renato M.

Re: TupleSize implemented incorrectly?

2011-02-03 Thread Daniel Dai
Thanks, Eric! Eric Tschetter wrote: https://issues.apache.org/jira/browse/PIG-1841 --Eric On Tue, Feb 1, 2011 at 3:03 PM, Daniel Dai wrote: This is definitely a bug. Can you open a Jira ticket? Daniel Eric Tschetter wrote: I'm looking at Pig's TupleSize implemen

Re: Pig 0.8 and PigMix

2011-02-07 Thread Daniel Dai
y suggestion or advice is highly appreciated! Thanks in advance. Renato M. 2011/2/3 Daniel Dai Yes, we do use it for 0.8. Daniel Renato Marroquín Mogrovejo wrote: Hey all, I wanted to know if the patch from https://issues.apache.org/jira/browse/PIG-200 is safe for Pig0.8, and how

Re: Error in new logical plan. Try -Dpig.usenewlogicalplan=false

2011-02-07 Thread Daniel Dai
There could be a bug in the new logical plan. First, try checking out the latest Pig 0.8 from https://svn.apache.org/repos/asf/pig/branches/branch-0.8 and see if the issue goes away. If not, please report the bug by creating a Jira. Daniel Alex McLintock wrote: I am developing a new UDF for load

Re: Weird stack trace NullableBytesWritable vs NullableText

2011-02-07 Thread Daniel Dai
In Pig 0.9, we will detect group/join key type dynamically (PIG-1277), and will provide typed map. This will solve the map value type problem. Daniel Alex McLintock wrote: I am using maps a lot so I guess this is related to PIG-919 which is closed but not really fixed. https://issues.apache.

Re: Repetitive pig scripts...

2011-02-07 Thread Daniel Dai
Also take a look at http://wiki.apache.org/pig/TuringCompletePig. You can embed Pig in a Python script. This feature is already checked into trunk and will be available in 0.9. Daniel Alex McLintock wrote: I'm trying to understand the best way of setting up repeated processing of continuously

Re: Pig 0.8: DESCRIBE and DUMP are in disagreement after a GROUP BY and a FLATTEN

2011-02-16 Thread Daniel Dai
Yes, it is fixed by PIG-998. Doing a describe on trunk will get: data: {f0: chararray,b1::t1: (f1: chararray,f2: int),b3: {(f3: chararray)}} Daniel Alan Gates wrote: The issue here is that describe is incorrectly removing the second level of tuple, even though dump is doing the right thing.

Re: FLATTEN custom bags

2011-02-16 Thread Daniel Dai
Hi, Aniket, Does myLoader implement LoadMetadata? If it does, what schema does it return? I suspect that your schema for the bag does not set the twolevelaccess flag (though we are working to drop it in 0.9). Daniel Aniket Mokashi wrote: Hi, I have a custom loader that creates and returns a tuple of i

Re: FOREACH GENERATE after if else condition

2011-02-22 Thread Daniel Dai
I just tried your script. I can see the wrong output in 0.8 release, but it is fixed on current 0.8 branch (http://svn.apache.org/repos/asf/pig/branches/branch-0.8). Check out the 0.8 branch and try again. Daniel Bill Graham wrote: Our version (I work with Sonia) is this: Apache Pig version

Re: Bug with flattening a bag element returned from a UDF?

2011-02-23 Thread Daniel Dai
Looks like a bug. Create a Jira for it: https://issues.apache.org/jira/browse/PIG-1866 Thanks, Daniel Ryan Tecco wrote: This seems like it should work: register '/tmp/test-udfs.jar'; /* package test.udfs; import java.io.IOException; import org.apache.pig.EvalFunc;

Re: UDF problem: Java Heap space

2011-02-24 Thread Daniel Dai
Hi, Aniket, What is your Pig script? Is the UDF in map side or reduce side? Daniel Dmitriy Ryaboy wrote: That's a max of 3.3K single-character strings. Even with the java overhead that shouldn't be more than a meg right? none of these should make it out of young gen assuming the list "cats" doe

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-02-28 Thread Daniel Dai
Not sure if I get your question. In 0.8, Pig combines small files into one map, so it is possible that you get fewer output files. If that is your concern, you can try disabling split combination using "-Dpig.splitCombination=false" Daniel Charles Gonçalves wrote: I tried to process a big number of sm

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-03-01 Thread Daniel Dai
ntain only data from 2010-10-21. And if I process all the logs with an awk script I got the correct answer. On Mon, Feb 28, 2011 at 3:29 PM, Daniel Dai wrote: Not sure if I get your

Re: PerformanceTimerFactory error?

2011-03-10 Thread Daniel Dai
PerformanceTimerFactory is bundled in pig.jar. I can't think of any reason why Pig cannot find this class. Also the invoking code is in the main code path, so every run will go over it. Do you see this error every time? Try do a clean rebuild and run it again. Daniel On 03/08/2011 12:13 PM, J

Re: STORE with variable?

2011-03-10 Thread Daniel Dai
You may try custom partitioner. http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#partitionby https://issues.apache.org/jira/browse/PIG-282. Daniel On 03/08/2011 02:04 PM, Dexin Wang wrote: Unfortunately, it doesn't work. Seems the same problem as in https://issues.apache.org/jira/browse/P

Re: Converting Pig DataTypes to Java Data Types

2011-03-10 Thread Daniel Dai
Forget your attachment? :) On 03/10/2011 03:04 PM, Jonathan Holloway wrote: I ran into an issue tonight with parsing log lines whereby I had to generate a schema in a user defined function. Part of that involved converting various values into their associated data types, but I couldn't see a w

Re: Schema?

2011-03-17 Thread Daniel Dai
In 0.9, you can use the syntax: m:[{(c:chararray, m1:[chararray])}] Daniel On 03/17/2011 09:18 AM, Alan Gates wrote: Currently there is no way to specify the schema for values in the map up front. You have to cast them when you bring them out of the map. We hope to resolve that in 0.9. Alan.

Re: Unable to access Map within a tuple.

2011-03-17 Thread Daniel Dai
Hi, Deepak, Can you be more specific? I did some simple test and cannot reproduce. What is your query? UDF? Daniel On 03/16/2011 11:24 PM, deepak kumar v wrote: Hi, Below are list of tuples generated after flattening a bag . (day, age, name, address, ['k1#v1','k2#v2']), (12/2,22,deepak,newy

Re: Preserve newlines in field

2011-03-22 Thread Daniel Dai
Which Pig version are you using? If you are using Pig 0.7/0.8, line parsing is handled by hadoop TextInputFormat. You need to override the behavior of TextInputFormat in order to do that. You need to derive a new TextInputFormat which preserves newline characters, and feed it to your LoadFunc(getInpu

Re: packages problem with eclipse

2011-03-22 Thread Daniel Dai
If all you need is to write a UDF, you only need to add pig.jar into library of your eclipse project. The wiki page is to set up the environment to develop Pig core code. Daniel On 03/22/2011 06:51 AM, Baraa Mohamad wrote: Hello there, I want to write a UDF in java so I tried to add pig to e

Re: logging in pig

2011-03-22 Thread Daniel Dai
Pig output goes to STDOUT; info goes to STDERR. If you want to log both, use: pig > filename 2>&1 You can open a file to log inside a UDF, but your logs will be on different worker nodes. For debugging purposes, I usually print some debugging output to STDOUT and check the JobTracker UI. Dani

Re: Unable to access Map within a tuple.

2011-03-23 Thread Daniel Dai
in the tuple But this threw the following error $0.$1 throws java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389) Regards, Deepak item 0-3 are of type char array and item4 is a map. I iterate through these tuples

Re: has this been reported? (bug)?

2011-03-24 Thread Daniel Dai
Thanks for reporting. It seems to be a new bug. I will file a Jira. Daniel On 03/24/2011 03:13 PM, Corbin Hoenes wrote: badsite.com127.0.0.1 goodsite.com/1?foo=truegoodsite.com127.0.0.1

Re: has this been reported? (bug)?

2011-03-24 Thread Daniel Dai
Open https://issues.apache.org/jira/browse/PIG-1935 for it. Daniel On 03/24/2011 04:21 PM, Daniel Dai wrote: Thanks for reporting. It seems to be a new bug. I will file a Jira. Daniel On 03/24/2011 03:13 PM, Corbin Hoenes wrote: badsite.com127.0.0.1 goodsite.com/1?foo=true

Re: ORDER statement leads to 'Cannot convert a Unknown to a String'

2011-03-25 Thread Daniel Dai
Sounds like aLoad() feed a data type Pig cannot understand. Daniel On 03/25/2011 10:44 AM, Andreas Paepcke wrote: Hi, Has anyone seen the following? I am getting an error when running ORDER: ERROR 1071: Cannot convert a Unknown to a String The error occurs in DataType.java:885. At the en

Re: ORDER statement leads to 'Cannot convert a Unknown to a String'

2011-03-25 Thread Daniel Dai
When you say "Store D into a tmp file", which store func are you using? On 03/25/2011 10:44 AM, Andreas Paepcke wrote: Hi, Has anyone seen the following? I am getting an error when running ORDER: ERROR 1071: Cannot convert a Unknown to a String The error occurs in DataType.java:885. At th

Re: reducer throttling?

2011-03-25 Thread Daniel Dai
You can control map size by setting "pig.maxCombinedSplitSize", "mapred.max.split.size", "mapred.min.split.size". The first one is a Pig parameter and the last two are Hadoop parameters. Daniel On 03/24/2011 06:18 PM, Dexin Wang wrote: Thanks for your explanation Alex. In some cases, there isn't e

Re: Unexpected data type -1 found in stream

2011-03-30 Thread Daniel Dai
The Jira ticket is https://issues.apache.org/jira/browse/PIG-1826 Daniel On 03/29/2011 02:08 PM, Jonathan Coveney wrote: This has definitely been seen before. I made a JIRA ticket back in the day for it. 2011/3/29 Xavier Stevens The value is a mixture of types. I'll go through and spit out w

Re: pig 0.8 : examples in cookbook; quoting key names in map dereferences

2011-04-04 Thread Daniel Dai
Thanks for reporting. Opened https://issues.apache.org/jira/browse/PIG-1960 for that. Daniel On 04/04/2011 09:38 AM, William F. Dowling wrote: I am a new pig and hadoop user, working my way through some simple examples in http://pig.apache.org/docs/r0.8.0/cookbook.html In the section "Reduce

Re: Loading arbitrary objects

2011-04-07 Thread Daniel Dai
You need a LoadFunc. Check http://pig.apache.org/docs/r0.8.0/udf.html#Load+Functions about how to write a LoadFunc. Daniel On 04/06/2011 06:30 PM, Mark wrote: If I wanted to load arbitrary objects into some tuples what classes should I be looking at? Would I need some of storage class? For e

Re: Pig filter against flatten column

2011-04-07 Thread Daniel Dai
Which version of Pig are you using? Previous version of Pig have trouble cast nested types. Can you try latest trunk? Daniel On 04/07/2011 05:26 AM, Badrinarayanan S wrote: Hi, I am trying to run a filter against a column which is the result of a flatten operation. But the filter clause thr

Re: Projecting on a pair of columns inside FOREACH() gives error 2213

2011-04-07 Thread Daniel Dai
This is a real bug. Open https://issues.apache.org/jira/browse/PIG-1978 for it. Thanks. Daniel On 04/07/2011 08:32 AM, william.dowl...@thomsonreuters.com wrote: I have a relation built by grouping the join (TCRaw) of a pair of basic relations (SrcFuid and NewCitationRel): grunt> describe TC

Re: Join issue

2011-04-07 Thread Daniel Dai
Null columns from different relations are not deemed equal in a join. This is consistent with SQL. Daniel On 04/07/2011 11:19 AM, Marko Musnjak wrote: Hi, I'm trying to do a left outer join of two files, on eight keys, but it always seems that the keys don't match. I'm able to reproduce thi
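To illustrate the rule in the reply above, here is a minimal sketch of an inner equi-join on one nullable key, in plain Java. It is not Pig's join implementation, and the class and method names are hypothetical. The only point is that a row whose key is null is skipped on both sides, so null never matches null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of SQL-style null handling in an equi-join.
final class NullKeyJoinSketch {
    // Returns {leftIndex, rightIndex} pairs for rows whose keys are equal and non-null.
    static List<int[]> innerJoin(List<String> left, List<String> right) {
        // Build a hash index over the right side, leaving out null keys.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int j = 0; j < right.size(); j++) {
            String key = right.get(j);
            if (key == null) {
                continue; // a null key never matches anything
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(j);
        }
        // Probe with the left side, again skipping null keys.
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            String key = left.get(i);
            if (key == null) {
                continue; // not even another null
            }
            for (int j : index.getOrDefault(key, Collections.emptyList())) {
                matches.add(new int[] {i, j});
            }
        }
        return matches;
    }
}
```

With left keys (a, null, b) and right keys (null, b, a), the sketch returns two matches and neither null row takes part.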

Re: Projecting on a pair of columns inside FOREACH() gives error 2213

2011-04-07 Thread Daniel Dai
Before we can get a patch, run Pig with the flag -Dpig.exec.nosecondarykey=true Daniel On 04/07/2011 03:35 PM, Daniel Dai wrote: This is a real bug. Open https://issues.apache.org/jira/browse/PIG-1978 for it. Thanks. Daniel On 04/07/2011 08:32 AM, william.dowl...@thomsonreuters.com wrote

Re: Pig filter against flatten column

2011-04-08 Thread Daniel Dai
Message- From: Daniel Dai [mailto:jiany...@yahoo-inc.com] Sent: Friday, April 08, 2011 3:53 AM To: user@pig.apache.org Subject: Re: Pig filter against flatten column Which version of Pig are you using? Previous version of Pig have trouble cast nested types. Can you try latest trunk? Daniel On 04/07

Re: Dereferencing columns of nested bags

2011-04-08 Thread Daniel Dai
Bag dereference results in a bag with fewer columns. It does not reduce the nesting level. $1 refers to visits: {(timestamp: bytearray,visit: {(Key: chararray,Value: chararray)})} $1.$1 slices the second column of the bag; all it does is drop the timestamp column from the bag "visits". The bag is still the
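The behaviour described above can be sketched in plain Java by modelling a bag as a list of tuples and a tuple as a list of fields. This is only an analogy with hypothetical names, not Pig's data model classes: projecting one column out of a bag gives back a bag of one-field tuples, so an inner bag stays exactly as deeply nested as before.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model: bag = List of tuples, tuple = List of fields.
final class BagProjectionSketch {
    // Models bag.$col: keep one column of every tuple.
    // The result is still a bag of tuples, each with a single field.
    static List<List<Object>> project(List<List<Object>> bag, int col) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            out.add(Collections.singletonList(tuple.get(col)));
        }
        return out;
    }
}
```

Projecting column 1 of a "visits" bag whose tuples are (timestamp, visit) yields tuples of the form (visit), where visit is still the inner bag; reaching the Key and Value fields takes a further step, such as flattening.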

Re: Dereferencing columns of nested bags

2011-04-11 Thread Daniel Dai
.$1; This will result in exceptions. While A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, visit:bag{details:tuple(Key:chararray, Value:chararray)})}); B = FOREACH A generate $1.$0; works. Regards, Mridul On Saturday 09 April 2011 02:39 AM, Daniel Dai wrote: Bag

Re: Error 6015? Any thoughts?

2011-04-15 Thread Daniel Dai
From the stack, it seems the exception is thrown in "order by" statement. Can you post your complete script? Daniel On 04/15/2011 07:46 AM, Brian Adams wrote: I have a python script which does some date/time epoch conversion which is sent to a pig job. However, it seems to error out all the ti

Re: Looking up two fields in a relation with another relation

2011-04-15 Thread Daniel Dai
This is a known bug, it is fixed on 0.8 svn. You can check out from http://svn.apache.org/repos/asf/pig/branches/branch-0.8, or wait for 0.8.1 coming in a few days. Daniel On 04/15/2011 01:45 PM, Jay Hacker wrote: I'm trying to replace a couple of fields in a relation with values looked up in

Re: Looking up two fields in a relation with another relation

2011-04-18 Thread Daniel Dai
I believe it is PIG-1705. Daniel On 04/18/2011 12:02 PM, Jay Hacker wrote: Thanks. Which Jira issue number is it? On Fri, Apr 15, 2011 at 9:07 PM, Daniel Dai wrote: This is a known bug, it is fixed on 0.8 svn. You can check out from http://svn.apache.org/repos/asf/pig/branches/branch-0.8

Re: Looking up two fields in a relation with another relation

2011-04-21 Thread Daniel Dai
e-use from what I saw ... Regards, Mridul On Tuesday 19 April 2011 03:11 AM, Daniel Dai wrote: I believe it is PIG-1705. Daniel On 04/18/2011 12:02 PM, Jay Hacker wrote: Thanks. Which Jira issue number is it? On Fri, Apr 15, 2011 at 9:07 PM, Daniel Daiwrote: This is a known bug, it i

Re: Looking up two fields in a relation with another relation

2011-04-22 Thread Daniel Dai
f I am not wrong, PIG-1705 talks about conflicting alias's in a join : interesting to see how that affects Jay Hacker's issue where there is no alias re-use from what I saw ... Regards, Mridul On Tuesday 19 April 2011 03:11 AM, Daniel Dai wrote: I believe it is PIG-1705. Daniel On 04/1

Re: Error Executing a Fragment Replicated Join

2011-04-27 Thread Daniel Dai
Do you see the failure in the first job (sampling) or second job? Do you see the exception right after the job kicks off? If the replicated side is too large, you will probably see a "Java heap exception" rather than a job setup exception. It looks more like an environment issue. Check if you can run r

Re: Error Executing a Fragment Replicated Join

2011-04-27 Thread Daniel Dai
There should be only one job. Thanks to Thejas for pointing that out. Daniel -Original Message- From: Daniel Dai Sent: Wednesday, April 27, 2011 7:18 PM To: user@pig.apache.org Cc: Renato Marroquín Mogrovejo ; pig-u...@hadoop.apache.org Subject: Re: Error Executing a Fragment Replicated Join Do

Re: Error Executing a Fragment Replicated Join

2011-05-02 Thread Daniel Dai
| |---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/sr.dat:PigStorage('|')) - 1-87 Global sort: false 2011/4/28 Daniel Dai: There should be only one job. Thanks to Thejas for pointing that out. Daniel -Original Message- From: Daniel Dai Sent:

Re: release notes?

2011-05-13 Thread Daniel Dai
You can: 1. Check CHANGES.txt, which lists all the issues fixed in 0.8.1 2. Go to Jira and search for tickets with fix version 0.8.1 Daniel On 05/13/2011 12:36 PM, Corbin Hoenes wrote: Is there a change log for the 0.8.1 release? release notes.txt just mentions "bug fixes"

Re: embedded pig error

2011-05-13 Thread Daniel Dai
Sounds like a Hadoop job setup exception. Go to the job tracker UI; you may be able to locate the job and check what happened during job setup. Daniel On 05/11/2011 05:45 PM, Jianting Cao wrote: I'm trying to embed pig into java program. I tried two approaches, none of them works. Approach 1: I fo

Re: Question about immediately projecting on a strsplit() return tuple...

2011-05-17 Thread Daniel Dai
This is an issue in 0.8.1. Open a Jira for it: https://issues.apache.org/jira/browse/PIG-2077. However, in 0.9 it is not an issue. Daniel On 05/17/2011 12:20 PM, Daniel Eklund wrote: I can absolutely open a ticket... Can you confirm though that the expression I am using STRSPLIT(timestamp

Re: How to make a UDF that can take a variable number of arguments while using getArgToFuncMapping?

2011-05-20 Thread Daniel Dai
It is not yet supported. See https://issues.apache.org/jira/browse/PIG-1577 Daniel On 05/20/2011 10:42 AM, Jonathan Coveney wrote: My goal is to be able to make functions like GREATER(a,b,c...) which can take any number of columns, and for each row will give the greater of them. I also want to

Re: can I not project into the group tuple from FILTER?

2011-05-20 Thread Daniel Dai
It seems the stack does not match your statement. Do you have another filter which uses "not" and "is null" in your script? Daniel On 05/20/2011 12:22 PM, Daniel Eklund wrote: If I can access the implicit 'group' column from within FOREACH like this: GROUPED = GROUP InputRelVar by (firstDim,second

Re: specifying the schema with a LoadFunc

2011-05-20 Thread Daniel Dai
It was changed to LoadMetadata.getSchema() starting with 0.7. Daniel On 05/20/2011 02:20 PM, Sweet, Nate wrote: Hi, I have a LoadFunc that loads data using a complex schema. I don't want to have to specify the schema every time. LoadFunc used to have a method "determineSchema". The current docs re

Re: how to operate a map type

2011-05-23 Thread Daniel Dai
I cannot think of a way without writing a UDF. You can write two UDFs: * GetKey, input a map, output the key of the map * GetValues, input a bag of maps, output a bag of map values The script is like: b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m; c = foreach b generate GetKe
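
The core logic of the two UDFs described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the thread: it uses java.util collections as stand-ins for Pig's map and bag types, the class name MapUdfSketch and both method names are hypothetical, and it assumes GetKey is meant to return the first key of the map. A real UDF would wrap this logic in a class extending Pig's EvalFunc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain-Java stand-ins for the GetKey / GetValues UDF logic.
// java.util.Map stands in for a Pig map, java.util.List for a Pig bag.
public class MapUdfSketch {

    // GetKey stand-in: returns the first key of the map, or null if the map is null or empty.
    public static String getKey(Map<String, Object> map) {
        if (map == null || map.isEmpty()) {
            return null;
        }
        return map.keySet().iterator().next();
    }

    // GetValues stand-in: collects the values of every map in the "bag" into one list.
    public static List<Object> getValues(List<Map<String, Object>> bag) {
        List<Object> values = new ArrayList<>();
        if (bag == null) {
            return values;
        }
        for (Map<String, Object> map : bag) {
            if (map != null) {
                values.addAll(map.values());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("http://example.com", "page content");
        System.out.println(getKey(m));
        System.out.println(getValues(Collections.singletonList(m)));
    }
}
```

Null and empty inputs are handled explicitly here because a UDF typically has to tolerate missing fields; how the original UDFs treated those cases is not stated in the thread.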

Re: specifying the schema with a LoadFunc

2011-05-23 Thread Daniel Dai
I'm just guessing at this point. I must say I am very frustrated with the general lack of (and incorrect) documentation for Pig. I understand the project is evolving rapidly, but IMO documentation should not suffer. -Nate -Original Message- From: Daniel Dai [mailto:jiany...@yahoo
