Hmm. I bet I know what the issue is. It's not fun, though. I suspect the LoadCaster isn't even called unless you explicitly name the types in the schema declaration.
Try loading with:

rows = load 'cassandra://MyKeyspace/MyColumnFamily' using CassandraStorage()
    as (key:chararray, columns: bag{T: tuple(name:chararray, value:long)});

to see whether it treats the LongType values correctly. This isn't a real fix, since obviously not all of the values are longs. However, if it does work for the LongTypes, we know we're on the right track. We may have to go in and explicitly check the type of each column and cast manually.

--jacob

On Thu, 2011-03-24 at 13:11 -0500, Jeremy Hanna wrote:
> I see that there are a few LoadCaster implementations in pig 0.8. There's
> the Utf8StorageConverter, the HBaseBinaryConverter, and a couple of others.
>
> The HBaseStorage class uses the Utf8StorageConverter by default but can be
> configured to use the HBaseBinaryConverter. Also it's just used as a
> LoadCaster and I don't see where it uses a StoreCaster at all - like the
> LoadFunc interface has a getLoadCaster method to override, but I can't find
> anything that has a getStoreCaster or getLoadStoreCaster method to override.
>
> Anyway, so I'm using the Cassandra loadfunc and getting LongType data
> returned with some special characters and I thought it might be because I'm
> not using a LoadCaster to convert to Pig types. So I tried both the
> Utf8StorageConverter as well as created my own CassandraBinaryConverter
> (implementing LoadStoreCaster) to convert from Cassandra types to and from
> Pig basic types. Neither works though and I'm still getting the special
> character stuff when I dump to the console.
>
> Any ideas on why LongTypes would be returning something like this: � as a
> value in a tuple? It's showing up just as a normal Long value on the
> cassandra cli. Oh, and I'm loading it with:
> rows = load 'cassandra://MyKeyspace/MyColumnFamily' using CassandraStorage()
> as (key, columns: bag{T: tuple(name, value)});
> A = limit rows 10;
> dump A;
>
> The value is the thing that is coming out seemingly encoded.
>
> Thanks,
>
> Jeremy
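For reference, here is a minimal sketch of one way a long value could end up looking like the garbled character in the dump above. It is written in Python rather than Pig so it runs on its own. It assumes the LongType value is stored as an 8-byte big-endian signed integer and that, with no declared type, the raw bytes get printed as if they were text. The function names and the sample value 200 are made up for illustration; none of this is taken from the CassandraStorage code.

```python
import struct


def encode_long(value: int) -> bytes:
    """Pack an integer as an 8-byte big-endian signed long (assumed storage format)."""
    return struct.pack(">q", value)


def decode_long(raw: bytes) -> int:
    """Interpret 8 raw bytes as a big-endian signed long."""
    return struct.unpack(">q", raw)[0]


def decode_as_text(raw: bytes) -> str:
    """Interpret the same bytes as UTF-8 text, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


raw = encode_long(200)            # hypothetical column value
print(repr(decode_as_text(raw)))  # NUL characters followed by a replacement character
print(decode_long(raw))           # the numeric value, recovered by a binary decode
```

If this is what is happening, a converter that parses the bytes as text cannot recover the number, while a binary decode of the same bytes can.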