Re: loading from HBase - Pig 0.7

Anze Tue, 26 Oct 2010 06:33:08 -0700

Hmmm, not quite there yet. :-/

I installed:
- HBase 0.20.6
- Cloudera CDH3b3 Hadoop (0.20.2)
- Pig 0.8 (the official download seems to be empty, so I fetched Pig trunk
from SVN and built it)


Now it complains with "Failed to create DataStorage". Any ideas? Should I
upgrade Hadoop too?

This is getting a bit complicated to install. :)

I would appreciate some pointers; Google revealed nothing useful.

Thanks,

Anze


On Tuesday 26 October 2010, Anze wrote:
> Great! :)
> 
> Thanks for helping me out.
> 
> All the best,
> 
> Anze
> 
> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> > I think that you might be able to get away with HBase 0.20.2 if you
> > don't use the filtering options.
> > 
> > On Mon, Oct 25, 2010 at 3:39 PM, Anze <anzen...@volja.net> wrote:
> > > Dmitriy, thanks for the answer!
> > > 
> > > The problem with upgrading to HBase 0.20.6 is that Cloudera doesn't
> > > ship it yet, and we would like to keep our install at "official"
> > > versions, even if beta. Of course, since this is a development /
> > > testing cluster, we could bend the rules if really necessary...
> > > 
> > > I have written a small MR job (actually, just an "M" job :) that
> > > exports the tables to files (allowing me to use Pig 0.7), but that is
> > > a bit cumbersome and slow.
> > > 
> > > If I install the latest Pig (0.8), will it work at all with HBase
> > > 0.20.2? In other words, are scan filters (which were fixed in 0.20.6)
> > > needed only for user-defined parameters, or also for Pig's own
> > > optimizations when reading from HBase? Hope my question makes
> > > sense... :)
> > > 
> > > Thanks again,
> > > 
> > > Anze
> > > 
> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> > >> Anze, we bumped the requirement up to 0.20.6 in the ticket because
> > >> HBase 0.20.2 had a bug in it. Ask the HBase folks, but I'd say you
> > >> should upgrade.
> > >> FWIW we upgraded to 0.20.6 from 0.20.2 a few months back and it's
> > >> been working smoothly.
> > >> 
> > >> The Elephant-Bird HBase loader for Pig 0.6 does add row keys and most
> > >> of the other features we added to the built-in loader for Pig 0.8
> > >> (notably, it does not do storage). But I don't recommend downgrading
> > >> to Pig 0.6, as 0.7 and especially 0.8 are great improvements to the
> > >> software.
> > >> 
> > >> -D
> > >> 
> > >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <anzen...@volja.net> wrote:
> > >> > Hi all!
> > >> > 
> > >> > I am struggling to find a working solution to load data from HBase
> > >> > directly. I am using Cloudera CDH3b3 which comes with Pig 0.7. What
> > >> > would be the easiest way to load data from HBase?
> > >> > If it matters: we need the row keys to be included, too.
> > >> > 
> > >> > I have checked ElephantBird, but it seems to require Pig 0.6. I
> > >> > could downgrade, but it seems... well... :)
> > >> > 
> > >> > On the other hand, loading row keys from HBase was only added in
> > >> > Pig 0.8: https://issues.apache.org/jira/browse/PIG-915
> > >> > https://issues.apache.org/jira/browse/PIG-1205
> > >> > But judging from the last issue, Pig 0.8 requires HBase 0.20.6?
> > >> > 
> > >> > I can install the latest Pig from source if needed, but I'd rather
> > >> > leave Hadoop and HBase at their current versions (0.20.2 and
> > >> > 0.89.20100924 respectively).
> > >> > 
> > >> > Should I write my own UDF? I'd appreciate some pointers.
> > >> > 
> > >> > Thanks,
> > >> > 
> > >> > Anze
