See https://github.com/trendmicro/jgit-hbase
Use branch 'jgit.storage.hbase.v4'

Last night I loaded all of the following repositories into a small HBase
cluster running on my laptop (zk + master + 3 rs):

    cascading
    cascading.hbase
    cascading.jruby
    cascalog
    flume
    gremlins
    hadoop-lzo
    hadoop-private
    hbase-private
    hive
    jgit-hbase
    mahout
    solr (git svn from asf solr svn)
    webtable
    wukong
    ycsb
    zookeeper

and last but not least

    linux-2.6

There were no errors observed during this process. The resulting table had
2,136,790 rows. And even with it all on my laptop the performance was not
too bad.

Times on pushing up the full linux-2.6 repo with 1.8M objects:

    Counting objects (local git): 50.9 seconds
    Transferring data (local git, jgit server, hbase write): 43.8 seconds
    Resolving deltas (jgit server): 2 minutes 55.1 seconds
    Store object index entries (jgit server, hbase write): 1 minute 26.9 seconds

    Total: 5 minutes 53.6 seconds

Confirmed the result can be cloned back out of the server.

During this process there were three splits and region migrations between
the regionservers in the background. Watching the progress of the jgit
server on the backchannel, these did not introduce any human-visible pauses.

I configured the HBase schema to use LZO compression. The total storage
consumed by the HBase table containing the above repositories is 926 MB.

I suppose I need to do a massive import of GitHub to make a reasonable test
case from here.

   - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)