See https://github.com/trendmicro/jgit-hbase

Use branch 'jgit.storage.hbase.v4'

Last night I loaded all of the following repositories into a small HBase
cluster running on my laptop (ZooKeeper + master + 3 regionservers):

    cascading
    cascading.hbase
    cascading.jruby
    cascalog
    flume
    gremlins
    hadoop-lzo
    hadoop-private
    hbase-private
    hive
    jgit-hbase
    mahout
    solr (git svn from asf solr svn)
    webtable
    wukong
    ycsb
    zookeeper

and last but not least

    linux-2.6

There were no errors observed during this process. 

The resulting table had 2,136,790 rows.

Even with everything on my laptop, the performance was not too bad. Timings
for pushing the full linux-2.6 repo, with 1.8M objects:

   Counting objects (local git): 50.9 seconds
   Transferring data (local git, jgit server, hbase write): 43.8 seconds
   Resolving deltas (jgit server): 2 minutes 55.1 seconds
   Store object index entries (jgit server, hbase write): 1 minute 26.9 seconds

   Total: 5 minutes 53.6 seconds
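
For a rough sense of where the time goes, here is a minimal Python sketch
over the numbers above. Nothing in it is measured anew: the phase names are
shortened labels, the 1.8M object count is approximate, and the throughput
figure is therefore only a ballpark. The per-phase figures need not add up
exactly to the reported total, so the sketch prints both.

```python
# Phase timings in seconds, copied from the figures above.
phases = {
    "counting objects": 50.9,
    "transferring data": 43.8,
    "resolving deltas": 2 * 60 + 55.1,
    "storing object index entries": 1 * 60 + 26.9,
}
reported_total = 5 * 60 + 53.6  # total as reported above
objects = 1_800_000             # approximate ("1.8M objects")

phase_sum = sum(phases.values())

# Share of each phase relative to the sum of the phases.
for name, secs in phases.items():
    print(f"{name:30s} {secs:6.1f} s  {100 * secs / phase_sum:5.1f} %")

print(f"sum of phases:  {phase_sum:.1f} s")
print(f"reported total: {reported_total:.1f} s")

# Ballpark only: the object count is rounded.
print(f"approx. throughput: {objects / reported_total:.0f} objects/s")
```

Resolving deltas on the jgit server is the largest single phase, at roughly
half of the total.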

Confirmed the result can be cloned back out of the server. 

During this process there were three splits and region migrations between the
regionservers in the background. Watching the progress of the jgit server on
the backchannel, I saw no human-visible pauses from these.

I configured the HBase schema to use LZO compression. The total storage
consumed by the HBase table containing the above repositories is 926 MB.
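
As a back-of-the-envelope check, the storage figure can be divided by the
row count reported earlier. This is a sketch under assumptions: it is not
stated whether "MB" means 10^6 or 2^20 bytes, so both are shown, and the
average covers all of the repositories together, not linux-2.6 alone.

```python
rows = 2_136_790   # row count reported above
size_mb = 926      # table size reported above, LZO-compressed

# Assumption: "MB" could mean 10**6 or 2**20 bytes; show both.
for label, unit in (("10^6-byte MB", 10**6), ("2^20-byte MB", 2**20)):
    print(f"{label}: {size_mb * unit / rows:.0f} bytes per row")
```

Either way the average comes out at a few hundred bytes per row after
compression.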

I suppose I need to do a massive import of GitHub to make a reasonable test 
case from here.

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


 