Re: has this been reported? (bug)?

2011-03-24 Thread Daniel Dai
Open https://issues.apache.org/jira/browse/PIG-1935 for it. Daniel On 03/24/2011 04:21 PM, Daniel Dai wrote: Thanks for reporting. It seems to be a new bug. I will file a Jira. Daniel On 03/24/2011 03:13 PM, Corbin Hoenes wrote: badsite.com127.0.0.1 goodsite.com/1?foo=truegoodsit

Re: reducer throttling?

2011-03-24 Thread Dexin Wang
Thanks for your explanation, Alex. In some cases, there isn't even a reduce phase. For example, we have some raw data that, after our custom LOAD function and a filter function, goes directly into the DB. And since we don't have control over the number of mappers, we end up with too many DB writers. That's

Re: Anti-Joins

2011-03-24 Thread Alan Gates
A = load 'input1' as (x, y);
B = load 'input2' as (u, v);
C = cogroup A by x, B by u;
D = filter C by IsEmpty(B);
E = foreach D generate flatten(A);
Alan.
On Mar 24, 2011, at 4:28 PM, mike st. john wrote: Are there any examples of Anti-Joins using Pig. Thanks Msj
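For readers less familiar with cogroup, here is a rough Python sketch of what Alan's script computes: the rows of A whose key has no matching key in B. The function name anti_join and the tuple layout (key in the first field) are assumptions made for illustration, not part of Pig, and the sketch ignores Pig's handling of null keys.

```python
from collections import defaultdict


def anti_join(a_rows, b_rows):
    """Return the rows of A whose first field matches no first field in B."""
    # Cogroup step: for every key, collect one bag of A rows and one bag of B rows.
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row[0]][0].append(row)
    for row in b_rows:
        groups[row[0]][1].append(row)

    # Filter step (IsEmpty(B)) followed by flatten(A):
    # keep only groups with an empty B bag and emit their A rows.
    result = []
    for a_bag, b_bag in groups.values():
        if not b_bag:
            result.extend(a_bag)
    return result


if __name__ == "__main__":
    # Hypothetical sample data, standing in for 'input1' and 'input2'.
    a = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
    b = [(2, "x"), (4, "y")]
    print(anti_join(a, b))
```

Keys that appear only in B produce a group with an empty A bag, so they contribute nothing to the result, which is the same reason flatten(A) drops them in the Pig version.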

Anti-Joins

2011-03-24 Thread mike st. john
Are there any examples of Anti-Joins using Pig? Thanks, Msj

Re: has this been reported? (bug)?

2011-03-24 Thread Daniel Dai
Thanks for reporting. It seems to be a new bug. I will file a Jira. Daniel On 03/24/2011 03:13 PM, Corbin Hoenes wrote: badsite.com127.0.0.1 goodsite.com/1?foo=truegoodsite.com127.0.0.1

Re: LoadCaster, LoadStoreCaster usage and encoded output

2011-03-24 Thread jacob
We're still using a fork, unfortunately. Jeremy is referencing the one in trunk as far as I know, though. Here we're waiting for when we switch from our weird version of hbase (0.89somethingsomething) to 0.90 to make the switch. --jacob On Thu, 2011-03-24 at 15:10 -0700, Dmitriy Ryaboy wrote: > That

has this been reported? (bug)?

2011-03-24 Thread Corbin Hoenes
Wondering if someone has reported this bug in pig 0.8 (maybe it's been fixed?) data.txt (tab separated file, bad site has no canonical_url populated): badsite.com127.0.0.1 goodsite.com/1?foo=truegoodsite.com127.0.0.1 data = LOAD 'data.txt' using PigStorage() as (referrer:chararray

Re: LoadCaster, LoadStoreCaster usage and encoded output

2011-03-24 Thread Dmitriy Ryaboy
That's a good point about HBaseStorage not using the caster. I don't use it in prod so forgot to put it in. Jacob, are you guys using a fork or are you back on the official loader version? On Thu, Mar 24, 2011 at 12:03 PM, Jeremy Hanna wrote: > Hmmm, that never calls the bytesToLong method even w

Re: LoadCaster, LoadStoreCaster usage and encoded output

2011-03-24 Thread Jeremy Hanna
Hmmm, that never calls the bytesToLong method even with that specified in the schema. I wonder if it's that when using a Cassandra validator on a column, Cassandra tries its best to make the best guess about the value's type which may not be compatible with the pig basic types (in this case Cas

Re: LoadCaster, LoadStoreCaster usage and encoded output

2011-03-24 Thread jacob
Hmm. I bet I know what the issue is. It's not fun though. I'm thinking that loadcaster probably isn't even called unless you explicitly name the types in the schema declaration. Try loading with: rows = load 'cassandra://MyKeyspace/MyColumnFamily' using CassandraStorage() as (key:chararray, co

LoadCaster, LoadStoreCaster usage and encoded output

2011-03-24 Thread Jeremy Hanna
I see that there are a few LoadCaster implementations in pig 0.8. There's the Utf8StorageConverter, the HBaseBinaryConverter, and a couple of others. The HBaseStorage class uses the Utf8StorageConverter by default but can be configured to use the HBaseBinaryConverter. Also it's just used as a