We're still using a fork, unfortunately. Jeremy is referencing the one in
trunk as far as I know, though. Here we're waiting until we move from our
weird version of hbase (0.89somethingsomething) to 0.90 before making the
switch.

--jacob

On Thu, 2011-03-24 at 15:10 -0700, Dmitriy Ryaboy wrote:
> That's a good point about HBaseStorage not using the caster. I don't use it
> in prod, so I forgot to put it in.
> Jacob, are you guys using a fork or are you back on the official loader
> version?
> 
> On Thu, Mar 24, 2011 at 12:03 PM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote:
> 
> > Hmmm, that never calls the bytesToLong method even with that type
> > specified in the schema.  I wonder if, when a Cassandra validator is set
> > on a column, Cassandra makes its best guess about the value's type, and
> > that guess may not be compatible with the pig basic types (in this case
> > Cassandra's LongType).  So it's never getting to the bytesToX methods of
> > the load caster.  Talking with Brandon Williams a little, I may need to
> > take it to a higher level and do appropriate casting.
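
For context, Cassandra's LongType serializes a long as 8 big-endian bytes, so dumping the raw bytes as text produces the garbage characters described below. A minimal standalone sketch of the decode a bytesToLong-style cast has to perform (plain Java; the class name is illustrative and this is not tied to Pig's actual LoadCaster interface):

```java
import java.nio.ByteBuffer;

public class LongTypeDecode {
    // Decode an 8-byte big-endian value, the wire format Cassandra's
    // LongType uses. Class and method names here are illustrative.
    static long bytesToLong(byte[] b) {
        if (b.length != 8) {
            throw new IllegalArgumentException("LongType values are 8 bytes");
        }
        return ByteBuffer.wrap(b).getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 42}; // 42L as stored by LongType
        System.out.println(bytesToLong(raw));   // prints 42
    }
}
```

If bytes like these are instead passed through a UTF-8 text conversion, most of them map to control or replacement characters, which matches the symptoms in the thread.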
> >
> > On Mar 24, 2011, at 1:42 PM, jacob wrote:
> >
> > > Hmm. I bet I know what the issue is. It's not fun though. I'm thinking
> > > that the loadcaster probably isn't even called unless you explicitly
> > > name the types in the schema declaration.
> > >
> > > Try loading with:
> > >
> > > rows = load 'cassandra://MyKeyspace/MyColumnFamily' using
> > > CassandraStorage() as (key:chararray, columns: bag{T:
> > > tuple(name:chararray, value:long)});
> > >
> > > to see if it correctly treats the longtype values. This isn't good
> > > though, since obviously not all of the values are longs. However, if it
> > > does work for the longtypes, we know we're on the right track.
> > >
> > > We may have to go in and explicitly check the types of each column and
> > > cast manually.
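
The per-column manual casting suggested above could be sketched like this: a plain-Java stand-in where a hand-built map plays the role of the column family's validator metadata (the column names and validator strings are made up for illustration; a real loader would read them from the column family definition):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ManualColumnCast {
    // Hypothetical validator table; a real loader would read this from
    // the column family definition rather than hard-coding it.
    static final Map<String, String> VALIDATORS = new HashMap<String, String>();
    static {
        VALIDATORS.put("count", "LongType");
        VALIDATORS.put("name", "UTF8Type");
    }

    static Object castColumn(String column, byte[] value) {
        String validator = VALIDATORS.containsKey(column)
                ? VALIDATORS.get(column) : "BytesType";
        if ("LongType".equals(validator)) {
            return ByteBuffer.wrap(value).getLong(); // 8-byte big-endian
        } else if ("UTF8Type".equals(validator)) {
            return new String(value, StandardCharsets.UTF_8);
        }
        return value; // leave unknown types as raw bytes
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 7};
        System.out.println(castColumn("count", raw)); // prints 7
        System.out.println(castColumn("name",
                "bob".getBytes(StandardCharsets.UTF_8))); // prints bob
    }
}
```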
> > >
> > > --jacob
> > >
> > > On Thu, 2011-03-24 at 13:11 -0500, Jeremy Hanna wrote:
> > >> I see that there are a few LoadCaster implementations in pig 0.8.
> > >> There's the Utf8StorageConverter, the HBaseBinaryConverter, and a
> > >> couple of others.
> > >>
> > >> The HBaseStorage class uses the Utf8StorageConverter by default but
> > >> can be configured to use the HBaseBinaryConverter.  Also, it's just
> > >> used as a LoadCaster and I don't see where it uses a StoreCaster at
> > >> all - the LoadFunc interface has a getLoadCaster method to override,
> > >> but I can't find anything that has a getStoreCaster or
> > >> getLoadStoreCaster method to override.
> > >>
> > >> Anyway, I'm using the Cassandra loadfunc and getting LongType data
> > >> returned with some special characters, and I thought it might be
> > >> because I'm not using a LoadCaster to convert to Pig types.  So I
> > >> tried both the Utf8StorageConverter and my own
> > >> CassandraBinaryConverter (implementing LoadStoreCaster) to convert
> > >> between Cassandra types and Pig basic types.  Neither works though,
> > >> and I'm still getting the special character stuff when I dump to the
> > >> console.
> > >>
> > >> Any ideas on why LongTypes would be returning something like this: �
> > >> as a value in a tuple?  It's showing up as a normal Long value on the
> > >> cassandra cli.  Oh, and I'm loading it with:
> > >>
> > >> rows = load 'cassandra://MyKeyspace/MyColumnFamily' using
> > >> CassandraStorage() as (key, columns: bag{T: tuple(name, value)});
> > >> A = limit rows 10;
> > >> dump A;
> > >>
> > >> The value is the thing that is coming out seemingly encoded.
> > >>
> > >> Thanks,
> > >>
> > >> Jeremy
> > >
> > >
> >
> >

