Re: Re: Weird bug of REPLACE

2012-09-04 Thread MiaoMiao
Pity the documentation for REPLACE doesn't mention regex at all. Thank you so much for your reply; being able to know what's going on is such a relief. Now I can trust myself with Pig a little more. On Sat, 18 Aug 2012 at 00:05:29 AM, Cheolsoo Park wrote: > Hi, > If you look at the source code

Re: Extremely slow when loading small amount of data from HBase

2012-09-04 Thread 某因幡
After merging ~8000 regions down to ~4000 on an 8-node cluster, things are getting better. Should I continue merging? 2012/8/29 Dmitriy Ryaboy : > Can you try the same scans with a regular hbase mapreduce job? If you see the > same problem, it's an hbase issue. Otherwise, we need to see the script

Re: Extremely slow when loading small amount of data from HBase

2012-09-04 Thread Dmitriy Ryaboy
I think the hbase folks recommend something like 40 regions per node per table, but I might be misremembering something. Have you tried emailing the hbase users list? On Sep 4, 2012, at 3:39 AM, 某因幡 wrote: > After merging ~8000 regions to ~4000 on an 8-node cluster the things > is getting bett

Re: schema of pig flatten

2012-09-04 Thread Russell Jurney
You must cast explicitly: b = foreach a generate (int)foo as foo:int; Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com On Sep 4, 2012, at 4:17 AM, Huo Zhu wrote: > i recently meet this problem in my work, it's about pig flatten. i use a > simple example to express i

UNION with bytearray

2012-09-04 Thread Xavier Stevens
I'm trying to do a UNION on two datasets with identical schemas (k:bytearray, v:chararray). When using the UNION operator like so: combined_data = UNION dataset1, dataset2; I get the following error: java.lang.RuntimeException: Unexpected data type java.util.ArrayList found in stream. Note on

Re: Count of all the rows

2012-09-04 Thread Alan Gates
Expressions in Pig can only have tuples and bags. A tuple is a single record. It has a defined number of fields. Those fields are in a defined order. They can be given names and types may be assigned to them. Thus it is reasonable to speak of the 3rd field or the field named "user". That fi

Re: Re: Weird bug of REPLACE

2012-09-04 Thread Bill Graham
Opened a JIRA to better clarify the docs here: https://issues.apache.org/jira/browse/PIG-2905 On Tue, Sep 4, 2012 at 12:37 AM, MiaoMiao wrote: > Pity the document of REPLACE doesn't mention about regex at all. Thank > you so much for your reply, being able to know what's going on is such > a rel

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-09-04 Thread Lauren Blau
Unfortunately, I can't put together an example without sharing the custom jsonloader and data, but I've worked around this by explicitly storing and reloading the data. It sounds like you have it backwards in your attempt to be sneaky, though. The data actually is an int and should be sorted numericall

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-09-04 Thread Lauren Blau
I think I finally found the culprit. There is a load like this: a = load '/foobar' using CustomJsonLoader('baz') as (m:map[]); -- loading an untyped map then there is a flatten, a1 = foreach a generate a#'id' as id: chararray, flatten(a#'listvals') as (listvals: map[]); -- another untyped map a

Re: schema of pig flatten

2012-09-04 Thread Huo Zhu
Script 2 and script 3 both cast explicitly, so why are the results different? On 4 September 2012 22:04, Russell Jurney wrote: > You must cast explicitly: > > b = foreach a generate (int)foo as foo:int; > > Russell Jurney > twitter.com/rjurney > russell.jur...@gmail.com > datasyndrome.com > > On Sep 4,

Berkeley lecture on Pig

2012-09-04 Thread Jon Coveney
In partnership with Berkeley, Twitter employees are giving a number of lectures to a class on big data. I did one on Pig. I thought some might find it useful. Slides: http://db.tt/VJfy1Ixz Full presentation: http://blogs.ischool.berkeley.edu/i290-abdt-s12/ If you have any comments let me know (

Re: Berkeley lecture on Pig

2012-09-04 Thread Prashant Kommireddi
Nice! On Sep 4, 2012, at 11:34 PM, Jon Coveney wrote: > In partnership with Berkeley, Twitter employees are giving a number of > lectures to a class on big data. I did one on Pig. I thought some might find > it useful. > > Slides: > http://db.tt/VJfy1Ixz > > Full presentation: > http://blogs.i