+1
Olga Natkovich wrote:
I am fine with them "as-is"
Olga
-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com]
Sent: Tuesday, October 05, 2010 1:16 PM
To: Thejas M Nair
Cc: pig-u...@hadoop.apache.org
Subject: Re: [DISCUSS] Apache Pig bylaws
Comments inlined. However, I
+1
Matt Tanquary wrote:
+1
On Thu, Oct 7, 2010 at 10:09 AM, Olga Natkovich wrote:
+1
-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com]
Sent: Thursday, October 07, 2010 9:23 AM
To: user@pig.apache.org
Subject: [VOTE] Bylaws for the Pig project
I propose that we adop
This is a defect in Pig 0.7. Pig 0.8 will automatically exclude the Hadoop
config file in local mode (https://issues.apache.org/jira/browse/PIG-1338)
Daniel
Michael Sundell wrote:
It turns out that Pig calls $HADOOP_HOME/bin/hadoop-config.sh
Inside this script this is set by default (among other t
It is a bug, which is fixed in the upcoming Pig 0.8. With 0.7, you can
run it with the option "-t PruneColumns".
Daniel
Mallya, Ashok wrote:
Hello,
I have a dataset with more than 180 columns to which I want to join (based on two columns) to another.
I would like not to have to enu
I remember we did something similar before. FileSplit.getPath() does
give you the file name.
Here is some sample code:
public class PigStorageWithInputPath extends PigStorage {
Path path = null;
@Override
public void prepareToRead(RecordReader reader, PigSplit split) {
super.pre
The only frontend hook for a UDF is outputSchema. You can put your
property into UDFContext in outputSchema, and read it back in exec.
public String exec(Tuple input) throws IOException {
UDFContext context = UDFContext.getUDFContext();
String a =
context.getUDFProperties(this.
Sure.
Thanks
Dmitriy Ryaboy wrote:
Daniel,
Can you drop this on the wiki?
-D
On Tue, Nov 23, 2010 at 10:27 AM, Daniel Dai wrote:
I remember we did something similar before. FileSplit.getPath() does give
you the file name.
Here is some sample code:
public class PigStorageWithInputPath
Limit only takes a constant. So "limit sorted_asc (COUNT(*kws*) - 5)" does not
work.
You will need a UDF, which returns DataBag. One example is
org.apache.pig.builtin.COR, which returns DataBag. Basically, you can write
a UDF like this:
public class BagTest extends EvalFunc {
@Override
publi
Inner filter cannot access outside fields. It can only access fields inside
the base alias. Writing a UDF to process the whole tuple will work.
Preaggregating on userid will help performance since we do not need to
aggregate again in the Pig job.
Daniel
-Original Message-
From: Marko Mus
't use pig.properties since the property passed to UDF
are per pig script specific, not a global setting. How do I pass a -D option
to pig script run (pig -f myscript.pig)?
Thanks.
On Tue, Nov 23, 2010 at 6:12 PM, Daniel Dai wrote:
The only hook in frontend for a UDF is outputSchema. You ca
Sorry, there is no documentation describing the explain output so far. If you
want to find the alias -> job mapping in the explain output, look at the
"Map Reduce Plan" section: every node represents a
MapReduce job, and you will find the aliases included in that job.
I would also suggest u
Try this:
table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
pared = foreach table generate n1, n2;
grouped = group pared by n1;
counted = foreach grouped generate group,
(double)COUNT(pared.n2)/COUNT_STAR(pared.n2) as ratio;
ordered = order counted by ratio desc;
limi
Pig always instantiates a UDF using the constructor parameters defined in the
"define" statement. CONTAINS_STRINGS(haystack) only passes haystack to
CONTAINS_STRINGS.exec(). It will not re-initialize the UDF.
Daniel
Zach Bailey wrote:
I am trying to do what seems like should be a simple task using
No, bag assumes all tuples inside it share the same schema.
Daniel
Matt Tanquary wrote:
I have the following bytearray:
| F | bytearray|
| | {(l1n2), (0,0), (1)} |
| | {(l2n2), (0,0), (1)} |
| |
Since I am a Pig developer, I will say "do everything Pig" :).
To be frank, if these 9 functions are all you want, you can easily
convert them into Pig, but you will not gain much if none of the 9
functions can utilize existing UDFs. Here is one way you can do it:
* Write a UDF LineProcess:
p
Can you convert it into an equality join problem? That's the case MapReduce
can handle efficiently. Not sure if it addresses your problem, but here is
a sample script.
a = load 'A' as (a0:chararray);
b = foreach a generate LOWER(a0) as b0;
c = load 'B' as (c0:chararray);
d = foreach c generate LOWER(
After join, cross, foreach flatten, Pig will automatically add
"base_alias::" prefix. All other cases use "."
Daniel
Jonathan Coveney wrote:
It's very hard to search for this among the docs because it's so generic, so
I thought I'd ask... I'm sure the answer is painfully easy.
Taking a look a
When you flatten a bag, you get items inside the tuple. The foreach
statement is wrong, you should change it to:
flat_foo = FOREACH foo GENERATE FLATTEN($0) as (f1, f2, f3, f4, f5);
DUMP flat_foo;
(a, b, c, d, e)
(1, 2, 3, 4, 5)
...
(f,g,h,i,j)
(6,7,8,9,10)
subset_foo = FOREACH flat_foo GENERAT
It is not expected. I suspect something is wrong inside
NormalizeListUDF. Make sure you feed a bag of tuples which has the schema
(int, int) to your UDF. If you can post your UDF, I can tell more.
Daniel
Michael Moss wrote:
Hello,
I'm having an issue with a script that uses an EvalFunc
INTEGER);
fields.add(f1);
fields.add(f2);
Schema tupleInner = new Schema(fields);
Schema.FieldSchema tupleSchema = new Schema.FieldSchema("t1", tupleInner,
DataType.TUPLE);
Schema bagInner = new Schema(tupleSchema);
Schema.FieldSchema bagSchema = new Schema.FieldSchema("bag", bagInner,
DataType
You can slice a bag, but not a bag of bags. If you do want to project x,
do it early:
A = load 'foo.txt' using PigStorage as (x : chararray, y : int);
B = group A by x;
B1 = foreach B generate group, A.x as Ax;
C = group B1 by group;
E = foreach C generate B1.(group, Ax);
Daniel
Kris Coward wro
Looks like the Hadoop client jar does not match the version on the server side. Are
you using Hadoop 0.20.2 from Apache?
Daniel
-Original Message-
From: felix gao
Sent: Thursday, December 09, 2010 5:48 PM
To: pig-u...@hadoop.apache.org
Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2
6.1.14.jar native
commons-logging-1.0.4.jar jackson-core-asl-1.5.2.jar jsp-2.1
oro-2.0.8.jar
please tell me how to get this working with pig
Thanks,
Felix
On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai wrote:
Looks like hadoop client jar does not
x
On Fri, Dec 10, 2010 at 11:10 AM, Daniel Dai wrote:
I haven't used the Cloudera distribution before. Pig bundles the Apache Hadoop 0.20.2
client library. If Cloudera made some changes to Hadoop, that could be an
issue.
One thing you can try is build hadoop20.jar by yourself (
http://behemoth.s
name. Not sure what "104" stands for?
How do I access each field in topkws? I need to join reportdate,appid and
keyword in topkws with another file.
Appreciate any help
thanks
Sheeba
On Sun, Nov 28, 2010 at 2:07 AM, Daniel Dai wrote:
Limit only takes constant. So "limit sor
Yes, actually it is much easier:
public Schema outputSchema(Schema input) {
return input;
}
Daniel
Sheeba George wrote:
Hi Daniel
Is it possible to get the schema string from the "input" param rather than
hardcoding?
Thanks
Sheeba
On Mon, Dec 13, 2010 at 11:53 PM, Daniel
This is what you need (on 0.8):
loaded = LOAD 'whatever' AS (whatever:chararray, icare:int);
min_generated = FOREACH loaded GENERATE icare;
min_group = GROUP min_generated ALL;
min = FOREACH min_group GENERATE MIN(min_generated) as m;
generated = FOREACH loaded GENERATE whatever, icare/min.m;
Da
Unfortunately, the ForEach inner plan does not support stream yet. Here are
some choices:
1. You can customize input/output of your perl script. Check
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#DEFINE, search
"About Input and Output"
2. Use UDF instead of stream
result = FOREACH awesome_i
Which version of Pig are you using? I found some syntax errors in your
script. Is this the script you actually ran?
Here are the syntax errors I found:
1. What is ahh, ooh?
2. Alias cannot be "group", it is a keyword
3. "sort = ORDER counts BY cnt DESC; ". Do you mean "sort = ORDER count
BY cnt DESC
Yes, you can use EvalFunc.warn(Object o, String msg, Enum warningEnum).
Daniel
Dexin Wang wrote:
Is it possible to increment a counter in Pig UDF (in either Load/Eval/Store
Func).
Since we have access to counters using the
org.apache.hadoop.mapred.Reporter:
http://hadoop.apache.org/common/doc
in the
actual script.
Logic wise, however, should it work?
2010/12/15 Daniel Dai
Which version of Pig are you using? I found some syntax errors in your
script. Is this the script you actually ran?
Here are the syntax errors I found:
1. What is ahh, ooh?
2. Alias cannot be "group", i
The Pig team is happy to announce the Pig 0.8.0 release.
Apache Pig provides a high-level data-flow language and execution
framework for parallel computation on Hadoop clusters.
More details about Pig can be found at http://pig.apache.org/.
The highlights of this release are scalar, custom partitioner
That's very comprehensive. Can we put a link on Pig wiki?
Daniel
Dmitriy Ryaboy wrote:
Pig users,
I wrote up a short overview of some new features in Pig 0.8:
https://squarecog.wordpress.com/2010/12/19/new-features-in-apache-pig-0-8/
Cheers
-Dmitriy
Thanks Charles! Everyone can edit Pig wiki. Just go to the wiki page,
register an account, and you can edit. We are looking forward to your
contribution!
Daniel
Charles Gonçalves wrote:
Guys,
I'm starting to use pig (0.8) now and I went to Pig Wiki for some
directives and tutorials.
I alre
True, however there is one bug in 0.7. We fix it in 0.8.
https://issues.apache.org/jira/browse/PIG-1760
Daniel
Ashutosh Chauhan wrote:
Ideally you need not to do that. Pig automatically takes care of
progress reporting in its operator. Do you have a pig script which
fails because of reporting
You will need a UDF to concat bag items.
Daniel
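As a rough illustration of what such a UDF would do, here is a minimal sketch in plain Java. It is not taken from the thread: the bag is modeled as a java.util.List so the example runs without pig.jar, and BagConcatSketch and concat are hypothetical names. In a real UDF the same loop would sit inside EvalFunc.exec and iterate over the DataBag.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: joins the items of a "bag" into one delimited string.
final class BagConcatSketch {
    static String concat(List<?> items, String delimiter) {
        if (items == null) {
            return null; // mirror the usual UDF convention of null in, null out
        }
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Object item : items) {
            if (item != null) { // skip null items rather than printing "null"
                joiner.add(item.toString());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Values borrowed from the second field of the JOIN output quoted below.
        System.out.println(concat(Arrays.asList(1966, 3845), ","));
    }
}
```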
Matt Tanquary wrote:
This set results from a JOIN:
(04f4c2fd-8be2-41c3-b045-283de80909ba,1966,2L)
(04f4c2fd-8be2-41c3-b045-283de80909ba,3845,2L)
Using PIG, I group this and get:
(669a4b47-d3c3-4950-9ec0-f1e24064d9d9,{(669a4b47-d3c3-4950-9ec0-f1
You are right. setLocation is called in frontend, however, it is in the
context of InputFormat.getSplits() and it is too late to save anything in
UDFContext. Your best bet is relativeToAbsolutePath, which is called in
frontend and you can save your stuff in UDFContext.
Daniel
-Original Me
I tried the JSON loader you mentioned on 0.7, and it seems to work fine for me. I
didn't get the error message you mention. Are you still seeing those errors?
Daniel
Geoffrey Gallaway wrote:
Hello, I'm looking for some clues to help me fix an annoying error I'm
getting using Pig.
I need to parse a large J
Currently, we treat all map values as bytearray. However, if you project
the map value later in the script, you have a chance to cast the map
value. E.g.:
a = load '1.json' using JSONLoader() as (m:map[]);
b = foreach a generate (map[])m#'key' as v;
c = foreach b generate (long)v;
But you cannot c
Thank you for reporting. I checked the latest 0.8 code, and the issue is
fixed. We have fixed a couple of issues since the release of 0.8. You can get
those fixes by checking out the code from svn and building it yourself:
svn co https://svn.apache.org/repos/asf/pig/branches/branch-0.8
Daniel
Spyros Kotoulas w
Can you send me the script you are running?
Thanks
Kaluskar, Sanjay wrote:
I am seeing this stack when running a script that runs fine in 0.5.0,
0.6.0 and 0.7.0. Is this a known issue?
ERROR 2042: Error in new logical plan. Try
-Dpig.usenewlogicalplan=false.
org.apache.pig.impl.logica
Put build/ivy/lib/Pig/jython-2.5.0.jar in your classpath (if not there,
do ant first). This is a bug we need to fix.
Daniel
Xiaomeng Wan wrote:
Hi,
I want to write a python udf to split string into bags
#!/usr/bin/python
import re
You cannot get size of tuple using SIZE. Use ARITY instead.
Daniel
Xavier Stevens wrote:
I've written a regular expression EvalFunc similar to ExtractAll except
this is called FindAll. It returns a tuple of all strings found that
match the given pattern. The syntax looks like this:
A = FOREA
Looks like you should be able to get ResourceSchema in checkSchema, as
long as the schema for the alias is not null.
Daniel
Dan Harvey wrote:
Hey,
I'm just porting a json StoreFunc class method I wrote from pig 0.6 to pig
0.8 so I can take advantage of the schema that the Store's can use from
tuple. Expected input is a tuple,
* output is an integer.
* @deprecated Use {@link SIZE} instead.
*/
public class ARITY extends EvalFunc {
On Tue, Feb 1, 2011 at 12:10 PM, Daniel Dai wrote:
You cannot get size of tuple using SIZE. Use ARITY instead.
Daniel
Xavier Stevens wrote:
I've
This is definitely a bug. Can you open a Jira ticket?
Daniel
Eric Tschetter wrote:
I'm looking at Pig's TupleSize implementation and wondering if it's
implemented correctly:
@Override
public Long exec(Tuple input) throws IOException {
try{
if (input == null) return
+1
Olga Natkovich wrote:
+1
-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com]
Sent: Wednesday, February 02, 2011 1:19 PM
To: user@pig.apache.org
Subject: [VOTE] Sponsoring Howl as an Apache Incubator project
Howl is a table management system built to provide metadata an
Yes, we do use it for 0.8.
Daniel
Renato Marroquín Mogrovejo wrote:
Hey all,
I wanted to know if the patch from
https://issues.apache.org/jira/browse/PIG-200 is safe for Pig0.8, and how to
apply it is the same way as shown in the JIRA.
Thanks.
Renato M.
Thanks, Eric!
Eric Tschetter wrote:
https://issues.apache.org/jira/browse/PIG-1841
--Eric
On Tue, Feb 1, 2011 at 3:03 PM, Daniel Dai wrote:
This is definitely a bug. Can you open a Jira ticket?
Daniel
Eric Tschetter wrote:
I'm looking at Pig's TupleSize implemen
y suggestion or advice is highly appreciated!
Thanks in advance.
Renato M.
2011/2/3 Daniel Dai
Yes, we do use it for 0.8.
Daniel
Renato Marroquín Mogrovejo wrote:
Hey all,
I wanted to know if the patch from
https://issues.apache.org/jira/browse/PIG-200 is safe for Pig0.8, and
how
There could be a bug in the new logical plan. First, try checking out
the latest Pig 0.8 from
https://svn.apache.org/repos/asf/pig/branches/branch-0.8 and see if the
issue goes away. If not, please report the bug by creating a Jira.
Daniel
Alex McLintock wrote:
I am developing a new UDF for load
In Pig 0.9, we will detect group/join key type dynamically (PIG-1277),
and will provide typed map. This will solve the map value type problem.
Daniel
Alex McLintock wrote:
I am using maps a lot so I guess this is related to PIG-919 which is closed
but not really fixed.
https://issues.apache.
Also take a look at http://wiki.apache.org/pig/TuringCompletePig. You
can embed Pig into a Python script. This feature is already checked into
trunk and will be available in 0.9.
Daniel
Alex McLintock wrote:
I'm trying to understand the best way of setting up repeated processing of
continuously
Yes, it is fixed by PIG-998. Doing a describe on trunk will get:
data: {f0: chararray,b1::t1: (f1: chararray,f2: int),b3: {(f3: chararray)}}
Daniel
Alan Gates wrote:
The issue here is that describe is incorrectly removing the second
level of tuple, even though dump is doing the right thing.
Hi, Aniket,
Does myLoader implement LoadMetadata? If it does, what schema does it
return? I suspect that your schema for the bag does not set the twolevelaccess
flag (though we are working to drop it in 0.9).
Daniel
Aniket Mokashi wrote:
Hi,
I have a custom loader that creates and returns a tuple of i
I just tried your script. I can see the wrong output in 0.8 release, but
it is fixed on current 0.8 branch
(http://svn.apache.org/repos/asf/pig/branches/branch-0.8). Check out the
0.8 branch and try again.
Daniel
Bill Graham wrote:
Our version (I work with Sonia) is this:
Apache Pig version
Looks like a bug. I created a Jira for it:
https://issues.apache.org/jira/browse/PIG-1866
Thanks,
Daniel
Ryan Tecco wrote:
This seems like it should work:
register '/tmp/test-udfs.jar';
/*
package test.udfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
Hi, Aniket,
What is your Pig script? Is the UDF in map side or reduce side?
Daniel
Dmitriy Ryaboy wrote:
That's a max of 3.3K single-character strings. Even with the java overhead
that shouldn't be more than a meg right?
none of these should make it out of young gen assuming the list "cats"
doe
Not sure if I get your question. In 0.8, Pig combines small files into
one map, so it is possible you get fewer output files. If that is your
concern, you can try to disable split combination using
"-Dpig.splitCombination=false"
Daniel
Charles Gonçalves wrote:
I tried to process a big number of sm
ntain only data from 2010-10-21.
And if I process all the logs with an awk script I got the
correct answer.
On Mon, Feb 28, 2011 at 3:29 PM, Daniel Dai <jiany...@yahoo-inc.com> wrote:
> Not sure if I get your
PerformanceTimerFactory is bundled in pig.jar. I can't think of any
reason why Pig cannot find this class. Also, the invoking code is in the
main code path, so every run will go through it. Do you see this error
every time? Try doing a clean rebuild and run it again.
Daniel
On 03/08/2011 12:13 PM, J
You may try custom partitioner.
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#partitionby
https://issues.apache.org/jira/browse/PIG-282.
Daniel
On 03/08/2011 02:04 PM, Dexin Wang wrote:
Unfortunately, it doesn't work.
Seems the same problem as in https://issues.apache.org/jira/browse/P
Forgot your attachment? :)
On 03/10/2011 03:04 PM, Jonathan Holloway wrote:
I ran into an issue tonight with parsing log lines whereby I had to
generate a schema in a user defined function.
Part of that involved converting various values into their associated
data types, but I couldn't see a w
In 0.9, you can use the syntax:
m:[{(c:chararray, m1:[chararray])}]
Daniel
On 03/17/2011 09:18 AM, Alan Gates wrote:
Currently there is no way to specify the schema for values in the map
up front. You have to cast them when you bring them out of the map.
We hope to resolve that in 0.9.
Alan.
Hi, Deepak,
Can you be more specific? I did some simple test and cannot reproduce.
What is your query? UDF?
Daniel
On 03/16/2011 11:24 PM, deepak kumar v wrote:
Hi,
Below are list of tuples generated after flattening a bag .
(day, age, name, address, ['k1#v1','k2#v2']),
(12/2,22,deepak,newy
Which Pig version are you using? If you are using Pig 0.7/0.8, line
parsing is handled by Hadoop's TextInputFormat. You need to override the
behavior of TextInputFormat in order to do that. You need to derive a
new TextInputFormat which preserves newline characters, and feed it to your
LoadFunc(getInpu
If all you need is to write a UDF, you only need to add pig.jar to the
libraries of your Eclipse project. The wiki page is for setting up the
environment to develop Pig core code.
Daniel
On 03/22/2011 06:51 AM, Baraa Mohamad wrote:
Hello there,
I want to write a UDF in java so I tried to add pig to e
Pig output goes to STDOUT, info goes to STDERR. If you want to log both,
use pig > filename 2>&1
You can open a log file inside a UDF, but your logs will be spread across
different worker nodes. For debugging purposes, I usually print some debugging
output to STDOUT and check the JobTracker UI.
Dani
in the the tuple
But this threw the following error
$0.$1 throws java.lang.ClassCastException: java.lang.String cannot be cast
to org.apache.pig.data.Tuple
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)
Regards,
Deepak
item 0-3 are of type char array and item4 is a map.
I iterate through these tuples
Thanks for reporting. It seems to be a new bug. I will file a Jira.
Daniel
On 03/24/2011 03:13 PM, Corbin Hoenes wrote:
badsite.com127.0.0.1
goodsite.com/1?foo=truegoodsite.com127.0.0.1
Opened https://issues.apache.org/jira/browse/PIG-1935 for it.
Daniel
On 03/24/2011 04:21 PM, Daniel Dai wrote:
Thanks for reporting. It seems to be a new bug. I will file a Jira.
Daniel
On 03/24/2011 03:13 PM, Corbin Hoenes wrote:
badsite.com127.0.0.1
goodsite.com/1?foo=true
Sounds like aLoad() feeds Pig a data type it cannot understand.
Daniel
On 03/25/2011 10:44 AM, Andreas Paepcke wrote:
Hi,
Has anyone seen the following?
I am getting an error when running ORDER:
ERROR 1071: Cannot convert a Unknown to a String
The error occurs in DataType.java:885. At the en
When you say "Store D into a tmp file", which store func are you using?
On 03/25/2011 10:44 AM, Andreas Paepcke wrote:
Hi,
Has anyone seen the following?
I am getting an error when running ORDER:
ERROR 1071: Cannot convert a Unknown to a String
The error occurs in DataType.java:885. At th
You can control map size by setting "pig.maxCombinedSplitSize",
"mapred.max.split.size", "mapred.min.split.size". The first one is a Pig
parameter and the last two are Hadoop parameters.
Daniel
On 03/24/2011 06:18 PM, Dexin Wang wrote:
Thanks for your explanation Alex.
In some cases, there isn't e
The Jira ticket is https://issues.apache.org/jira/browse/PIG-1826
Daniel
On 03/29/2011 02:08 PM, Jonathan Coveney wrote:
This has definitely been seen before. I made a JIRA ticket back in the day
for it.
2011/3/29 Xavier Stevens
The value is a mixture of types. I'll go through and spit out w
Thanks for reporting. Opened
https://issues.apache.org/jira/browse/PIG-1960 for that.
Daniel
On 04/04/2011 09:38 AM, William F. Dowling wrote:
I am a new pig and hadoop user, working my way through some simple
examples in http://pig.apache.org/docs/r0.8.0/cookbook.html
In the section "Reduce
You need a LoadFunc. Check
http://pig.apache.org/docs/r0.8.0/udf.html#Load+Functions about how to
write a LoadFunc.
Daniel
On 04/06/2011 06:30 PM, Mark wrote:
If I wanted to load arbitrary objects into some tuples what classes
should I be looking at? Would I need some of storage class?
For e
Which version of Pig are you using? Previous versions of Pig have trouble
casting nested types. Can you try the latest trunk?
Daniel
On 04/07/2011 05:26 AM, Badrinarayanan S wrote:
Hi,
I am trying to run a filter against a column which is the result of a
flatten operation. But the filter clause thr
This is a real bug. Opened https://issues.apache.org/jira/browse/PIG-1978
for it. Thanks.
Daniel
On 04/07/2011 08:32 AM, william.dowl...@thomsonreuters.com wrote:
I have a relation built by grouping the join (TCRaw) of a pair of basic
relations (SrcFuid and NewCitationRel):
grunt> describe TC
Null columns from different relations are not treated as equal in a join.
This is consistent with SQL.
Daniel
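The null-key rule above can be illustrated with a minimal sketch in plain Java. This is not Pig's join implementation; NullKeyJoinSketch, keysMatch and innerJoin are hypothetical names, and relations are modeled as simple lists of keys so the example runs without pig.jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of SQL-style key matching: a null key never equals anything,
// not even another null.
final class NullKeyJoinSketch {
    static boolean keysMatch(Object left, Object right) {
        return left != null && right != null && left.equals(right);
    }

    // Returns the keys that appear in both inputs under the rule above.
    static List<String> innerJoin(List<String> leftKeys, List<String> rightKeys) {
        List<String> matched = new ArrayList<>();
        for (String l : leftKeys) {
            for (String r : rightKeys) {
                if (keysMatch(l, r)) {
                    matched.add(l);
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", null, "b");
        List<String> right = Arrays.asList("a", null, "c");
        // Only "a" matches; the two null keys are not paired with each other.
        System.out.println(innerJoin(left, right));
    }
}
```

In a left outer join the left row with a null key would still appear in the output, just with nulls on the right side, which can look as if the keys "don't match".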
On 04/07/2011 11:19 AM, Marko Musnjak wrote:
Hi,
I'm trying to do a left outer join of two files, on eight keys, but it
always seems that the keys don't match. I'm able to reproduce thi
Until a patch is available, run Pig with the flag
-Dpig.exec.nosecondarykey=true
Daniel
On 04/07/2011 03:35 PM, Daniel Dai wrote:
This is a real bug. Opened https://issues.apache.org/jira/browse/PIG-1978
for it. Thanks.
Daniel
On 04/07/2011 08:32 AM, william.dowl...@thomsonreuters.com wrote
Message-
From: Daniel Dai [mailto:jiany...@yahoo-inc.com]
Sent: Friday, April 08, 2011 3:53 AM
To: user@pig.apache.org
Subject: Re: Pig filter against flatten column
Which version of Pig are you using? Previous versions of Pig have trouble
casting nested types. Can you try the latest trunk?
Daniel
On 04/07
Bag dereference results in a bag with fewer columns. It does not reduce the
nesting level.
$1 refers to visits: {(timestamp: bytearray,visit: {(Key:
chararray,Value: chararray)})}
$1.$1 slices the second column of the bag; all it does is drop the timestamp
column from the bag "visits". The bag is still the
.$1;
This will result in exceptions.
While
A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp,
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$0;
works.
Regards,
Mridul
On Saturday 09 April 2011 02:39 AM, Daniel Dai wrote:
Bag
From the stack trace, it seems the exception is thrown in the "order by"
statement. Can you post your complete script?
Daniel
On 04/15/2011 07:46 AM, Brian Adams wrote:
I have a python script which does some date/time epoch conversion which
is sent to a pig job. However, it seems to error out all the ti
This is a known bug; it is fixed on the 0.8 svn branch. You can check it out from
http://svn.apache.org/repos/asf/pig/branches/branch-0.8, or wait for
0.8.1, which is coming in a few days.
Daniel
On 04/15/2011 01:45 PM, Jay Hacker wrote:
I'm trying to replace a couple of fields in a relation with values
looked up in
I believe it is PIG-1705.
Daniel
On 04/18/2011 12:02 PM, Jay Hacker wrote:
Thanks. Which Jira issue number is it?
On Fri, Apr 15, 2011 at 9:07 PM, Daniel Dai wrote:
This is a known bug; it is fixed on the 0.8 svn branch. You can check it out from
http://svn.apache.org/repos/asf/pig/branches/branch-0.8
e-use from what I saw ...
Regards,
Mridul
On Tuesday 19 April 2011 03:11 AM, Daniel Dai wrote:
I believe it is PIG-1705.
Daniel
On 04/18/2011 12:02 PM, Jay Hacker wrote:
Thanks. Which Jira issue number is it?
On Fri, Apr 15, 2011 at 9:07 PM, Daniel Dai wrote:
This is a known bug, it i
f I am not wrong, PIG-1705 talks about conflicting alias's in a join :
interesting to see how that affects Jay Hacker's issue where there is no
alias re-use from what I saw ...
Regards,
Mridul
On Tuesday 19 April 2011 03:11 AM, Daniel Dai wrote:
I believe it is PIG-1705.
Daniel
On 04/1
Do you see the failure in the first job (sampling) or the second job? Do you
see the exception right after the job kicks off?
If the replicated side is too large, you will probably see a "Java heap
exception" rather than a job setup exception. It looks more like an environment
issue. Check if you can run r
There should be only one job. Thanks to Thejas for pointing that out.
Daniel
-Original Message-
From: Daniel Dai
Sent: Wednesday, April 27, 2011 7:18 PM
To: user@pig.apache.org
Cc: Renato Marroquín Mogrovejo ; pig-u...@hadoop.apache.org
Subject: Re: Error Executing a Fragment Replicated Join
Do
|
|---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/sr.dat:PigStorage('|'))
- 1-87
Global sort: false
2011/4/28 Daniel Dai:
There should be only one job. Thanks to Thejas for pointing that out.
Daniel
-Original Message- From: Daniel Dai
Sent:
You can:
1. Check CHANGES.txt, which lists all the issues fixed in 0.8.1
2. Go to Jira and search for tickets with fix version 0.8.1
Daniel
On 05/13/2011 12:36 PM, Corbin Hoenes wrote:
Is there a change log for the 0.8.1 release?
release notes.txt just mentions "bug fixes"
Sounds like a Hadoop job setup exception. Go to the JobTracker UI; you may
be able to locate the job and check what happened in job setup.
Daniel
On 05/11/2011 05:45 PM, Jianting Cao wrote:
I'm trying to embed pig into java program. I tried two approaches, none of
them works.
Approach 1:
I fo
This is an issue in 0.8.1. Opened a Jira for it:
https://issues.apache.org/jira/browse/PIG-2077. However, in 0.9 it is
not an issue.
Daniel
On 05/17/2011 12:20 PM, Daniel Eklund wrote:
I can absolutely open a ticket... Can you confirm though that the
expression I am using
STRSPLIT(timestamp
It is not yet supported. See https://issues.apache.org/jira/browse/PIG-1577
Daniel
On 05/20/2011 10:42 AM, Jonathan Coveney wrote:
My goal is to be able to make functions like GREATER(a,b,c...) which can
take any number of columns, and for each row will give the greater of them.
I also want to
It seems the stack trace does not match your statement. Do you have another filter
which uses "not" and "is null" in your script?
Daniel
On 05/20/2011 12:22 PM, Daniel Eklund wrote:
If I can access the implicit 'group' column from within FOREACH like this:
GROUPED = GROUP InputRelVar by (firstDim,second
It was changed to LoadMetadata.getSchema() starting in 0.7.
Daniel
On 05/20/2011 02:20 PM, Sweet, Nate wrote:
Hi,
I have a LoadFunc that loads data using a complex schema. I don't want to have to specify
the schema every time. LoadFunc used to have a method "determineSchema". The
current docs re
I cannot think of a way without writing a UDF. You can write two UDFs:
* GetKey, input a map, output the key of the map
* GetValues, input a bag of maps, output a bag of map values
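The core logic of the two UDFs listed above might look roughly like the following. This is only a sketch in plain Java under stated assumptions: Pig maps and bags are modeled with java.util.Map and java.util.List so it runs without pig.jar, each map is assumed to hold a single entry, and MapUdfSketch, getKey and getValues are hypothetical names. In a real UDF this logic would sit inside EvalFunc.exec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the GetKey and GetValues UDFs.
final class MapUdfSketch {
    // GetKey: returns the key of a map (the first key if there are several),
    // or null for a null or empty map.
    static String getKey(Map<String, ?> map) {
        if (map == null || map.isEmpty()) {
            return null;
        }
        return map.keySet().iterator().next();
    }

    // GetValues: collects the values of every map in the "bag", skipping null maps.
    static List<Object> getValues(List<? extends Map<String, ?>> maps) {
        List<Object> values = new ArrayList<>();
        if (maps == null) {
            return values;
        }
        for (Map<String, ?> map : maps) {
            if (map != null) {
                values.addAll(map.values());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(getKey(Collections.singletonMap("url", "content")));
        System.out.println(getValues(Arrays.asList(
                Collections.singletonMap("a", 1),
                Collections.singletonMap("b", 2))));
    }
}
```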
The script is like:
b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
c = foreach b generate GetKe
I'm just guessing at this point.
I must say I am very frustrated with the general lack of (and incorrect)
documentation for Pig. I understand the project is evolving rapidly, but IMO
documentation should not suffer.
-Nate
-Original Message-
From: Daniel Dai [mailto:jiany...@yahoo