Hmmm, not quite there yet. :-/

I installed:
- HBase 0.20.6
- Cloudera CDH3b3 Hadoop (0.20.2)
- Pig 0.8 (since official download is empty (?) I fetched the Pig trunk from SVN and built it)

Now it complains about "Failed to create DataStorage". Any ideas? Should I upgrade Hadoop too?

This is getting a bit complicated to install. :)

I would appreciate some pointers - google revealed nothing useful.

Thanks,

Anze

On Tuesday 26 October 2010, Anze wrote:
> Great! :)
>
> Thanks for helping me out.
>
> All the best,
>
> Anze
>
> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> > I think that you might be able to get away with 20.2 if you don't use
> > the filtering options.
> >
> > On Mon, Oct 25, 2010 at 3:39 PM, Anze <anzen...@volja.net> wrote:
> > > Dmitriy, thanks for the answer!
> > >
> > > The problem with upgrading to HBase 0.20.6 is that cloudera doesn't
> > > ship it yet and we would like to keep our install at "official"
> > > versions, even if beta. Of course, since this is a development /
> > > testing cluster, we could bend the rules if really necessary...
> > >
> > > I have written a small MR job (actually, just "M" job :) that exports
> > > the tables to files (allowing me to use Pig 0.7), but that is a bit
> > > cumbersome and slow.
> > >
> > > If I install the latest Pig (0.8), will it work at all with HBase
> > > 0.20.2? In other words, are scan filters (which were fixed in 0.20.6)
> > > needed as part of user-defined parameters or as part of Pig
> > > optimizations in reading from HBase? Hope my question makes sense...
> > > :)
> > >
> > > Thanks again,
> > >
> > > Anze
> > >
> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> > >> Anze, the reason we bumped up to 20.6 in the ticket was because HBase
> > >> 20.2 had a bug in it. Ask the HBase folks, but I'd say you should
> > >> upgrade.
> > >> FWIW we upgraded to 20.6 from 20.2 a few months back and it's been
> > >> working smoothly.
> > >>
> > >> The Elephant-Bird hbase loader for pig 0.6 does add row keys and most
> > >> of the other features we added to the built-in loader for pig 0.8
> > >> (notably, it does not do storage).
> > >> But I don't recommend downgrading
> > >> to pig 0.6, as 7 and especially 8 are great improvements to the
> > >> software.
> > >>
> > >> -D
> > >>
> > >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <anzen...@volja.net> wrote:
> > >> > Hi all!
> > >> >
> > >> > I am struggling to find a working solution to load data from HBase
> > >> > directly. I am using Cloudera CDH3b3 which comes with Pig 0.7. What
> > >> > would be the easiest way to load data from HBase?
> > >> > If it matters: we need the rows to be included, too.
> > >> >
> > >> > I have checked ElephantBird, but it seems to require Pig 0.6. I
> > >> > could downgrade, but it seems... well... :)
> > >> >
> > >> > On the other hand, loading from HBase with rows is only added in Pig
> > >> > 0.8: https://issues.apache.org/jira/browse/PIG-915
> > >> > https://issues.apache.org/jira/browse/PIG-1205
> > >> > But judging from the last issue Pig 0.8 requires HBase 0.20.6?
> > >> >
> > >> > I can install latest Pig from source if needed, but I'd rather leave
> > >> > Hadoop and HBase at their versions (0.20.2 and 0.89.20100924
> > >> > respectively).
> > >> >
> > >> > Should I write my own UDF? I'd appreciate some pointers.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Anze