We're still using a fork, unfortunately. Jeremy is referencing the one in trunk, as far as I know. We're waiting until we switch from our weird version of HBase (0.89-something-something) to 0.90 before making the move.
--jacob

On Thu, 2011-03-24 at 15:10 -0700, Dmitriy Ryaboy wrote:
> That's a good point about HBaseStorage not using the caster. I don't use
> it in prod, so I forgot to put it in.
> Jacob, are you guys using a fork, or are you back on the official loader
> version?
>
> On Thu, Mar 24, 2011 at 12:03 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> >
> > Hmmm, that never calls the bytesToLong method, even with that specified
> > in the schema. I wonder if, when using a Cassandra validator on a
> > column, Cassandra tries to make the best guess about the value's type,
> > which may not be compatible with the Pig basic types (in this case
> > Cassandra's LongType). So it never gets to the bytesToX methods of the
> > load caster. Talking with Brandon Williams a little, I may need to take
> > it to a higher level and do the appropriate casting.
> >
> > On Mar 24, 2011, at 1:42 PM, jacob wrote:
> >
> > > Hmm. I bet I know what the issue is. It's not fun though. I'm thinking
> > > that the loadcaster probably isn't even called unless you explicitly
> > > name the types in the schema declaration.
> > >
> > > Try loading with:
> > >
> > > rows = load 'cassandra://MyKeyspace/MyColumnFamily' using
> > > CassandraStorage() as (key:chararray, columns: bag{T:
> > > tuple(name:chararray, value:long)});
> > >
> > > to see if it correctly treats the LongType values. This isn't good,
> > > though, since obviously not all of the values are longs. However, if
> > > it does work for the LongTypes, we know we're on the right track.
> > >
> > > We may have to go in and explicitly check the types of each column and
> > > cast manually.
> > >
> > > --jacob
> > >
> > > On Thu, 2011-03-24 at 13:11 -0500, Jeremy Hanna wrote:
> > >> I see that there are a few LoadCaster implementations in Pig 0.8.
> > >> There's the Utf8StorageConverter, the HBaseBinaryConverter, and a
> > >> couple of others.
> > >>
> > >> The HBaseStorage class uses the Utf8StorageConverter by default but
> > >> can be configured to use the HBaseBinaryConverter. Also, it's only
> > >> used as a LoadCaster, and I don't see where it uses a StoreCaster at
> > >> all - the LoadFunc interface has a getLoadCaster method to override,
> > >> but I can't find anything that has a getStoreCaster or
> > >> getLoadStoreCaster method to override.
> > >>
> > >> Anyway, I'm using the Cassandra loadfunc and getting LongType data
> > >> returned with some special characters, and I thought it might be
> > >> because I'm not using a LoadCaster to convert to Pig types. So I
> > >> tried both the Utf8StorageConverter and my own
> > >> CassandraBinaryConverter (implementing LoadStoreCaster) to convert
> > >> between Cassandra types and Pig basic types. Neither works, though,
> > >> and I'm still getting the special-character stuff when I dump to the
> > >> console.
> > >>
> > >> Any ideas on why LongTypes would be returning something like this: �
> > >> as a value in a tuple? It shows up as a normal Long value in the
> > >> cassandra cli. Oh, and I'm loading it with:
> > >>
> > >> rows = load 'cassandra://MyKeyspace/MyColumnFamily' using
> > >> CassandraStorage() as (key, columns: bag{T: tuple(name, value)});
> > >> A = limit rows 10;
> > >> dump A;
> > >>
> > >> The value is the thing that is coming out seemingly encoded.
> > >>
> > >> Thanks,
> > >>
> > >> Jeremy
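[Editor's note: the symptom discussed in this thread can be reproduced outside Pig and Cassandra. The sketch below is plain Java with no Pig dependencies; the `bytesToLong` method here only mirrors the shape of the LoadCaster method of the same name and is not the real interface. Cassandra's LongType serializes a long as 8 big-endian bytes, so dumping those raw bytes as text produces unprintable "special characters", while decoding them as a big-endian long recovers the value.]

```java
import java.nio.ByteBuffer;

public class LongTypeDemo {
    // Mirrors what a LoadCaster.bytesToLong implementation would do for
    // Cassandra's LongType: interpret the raw column value as an 8-byte
    // big-endian long. (Illustrative sketch, not the actual Pig interface.)
    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong(); // ByteBuffer defaults to big-endian
    }

    public static void main(String[] args) {
        // Simulate the raw bytes Cassandra stores for the long value 42.
        byte[] raw = ByteBuffer.allocate(8).putLong(42L).array();

        // Rendering the bytes as text is what produces the garbled output
        // seen in `dump`: here, seven 0x00 bytes followed by 0x2A.
        System.out.println("as text: " + new String(raw));

        // Decoding them as a big-endian long recovers the value: 42.
        System.out.println("decoded: " + bytesToLong(raw));
    }
}
```

This is consistent with the diagnosis in the thread: unless the caster's bytesToLong-style conversion actually runs (or an equivalent cast is applied), the consumer sees the serialized bytes rather than the number.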