Hi Murali,

As Julien told you, you could rely on Nutch 2.x, which already uses Gora. This gives you some advantages, such as being able to read the data with MapReduce or directly from the data store you have chosen. Besides this, Gora integrates with Giraph, which would let you run graph algorithms out of the box on top of your data. There are many ways to accomplish what you want to do; you just have to choose the one that best suits your goals.

Renato M.

2014-05-29 15:07 GMT+02:00 Julien Nioche <lists.digitalpeb...@gmail.com>:

> The GORA API provides wrappers for MapReduce. That's what Nutch 2.x uses as
> an abstraction over the storage.
>
> If your data volumes are going to be high, I'd question the relevance of
> storing the data in a relational database but that's a different subject.
>
>
> On 29 May 2014 13:45, Murali Parth <murparthnu...@gmail.com> wrote:
>
> > Hi Julien,
> >          Thanks for the email. Thanks for the suggestion . We are
> > going to explore the Gora APIs ?
> >
> > Our data volumes are going to be high, can we use the Gora API with Map
> > reduce ?.
> >
> > Please suggest
> >
> > Thanks
> > Murali
> >
> >
> > On Thu, May 29, 2014 at 12:49 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
> >
> > > Hi Murali
> > >
> > > Why not using the GORA API to read from HBase?
> > >
> > > Julien
> > >
> > >
> > > On 28 May 2014 23:18, Murali Parth <murparthnu...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >        We are trying to use Nutch in our project. This is my first
> > > > project with Nutch and Hbase.
> > > >
> > > > I was able to make Nutch write to Hbase. When I go into the hbase shell and
> > > > use the scan command I see data.
> > > >
> > > > I started writing a map reduce to get the data out of Hbase. Our intention
> > > > is to do some massaging and write the cleaned data into RDBMS.
> > > >
> > > > In my Map program I am not able to see the data I see through the scan
> > > > command.
> > > >
> > > > Question : How do I read Nutch crawl data from Hbase .
> > > >
> > > > Map Program is
> > > >
> > > > protected void map(
> > > >         ImmutableBytesWritable rowkey,
> > > >         Result result,
> > > >         Context context) {
> > > >
> > > >     NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map =
> > > >             result.getMap();
> > > >     for (Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> columnFamilyEntry : map.entrySet())
> > > >     {
> > > >         NavigableMap<byte[], NavigableMap<Long, byte[]>> columnMap =
> > > >                 columnFamilyEntry.getValue();
> > > >         for (Entry<byte[], NavigableMap<Long, byte[]>> columnEntry : columnMap.entrySet())
> > > >         {
> > > >             NavigableMap<Long, byte[]> cellMap = columnEntry.getValue();
> > > >             for (Entry<Long, byte[]> cellEntry : cellMap.entrySet())
> > > >             {
> > > >                 System.out.println(String.format("Key : %s, Value : %s",
> > > >                         Bytes.toString(columnEntry.getKey()),
> > > >                         Bytes.toString(cellEntry.getValue())));
> > > >             }
> > > >
> > > >         }
> > > >     }
> > > >
> > > >
> > > > I see the following in the console
> > > >
> > > > Key : st, Value :
> > > >
> > > >
> > > > Any help would be appreciated
> > > >
> > > > Thanks
> > > > Murali
> > > >
> > >
> > >
> > > --
> > >
> > > Open Source Solutions for Text Engineering
> > >
> > > http://digitalpebble.blogspot.com/
> > > http://www.digitalpebble.com
> > > http://twitter.com/digitalpebble
> > >
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
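
A note on the "Key : st, Value :" output quoted above: one possible explanation, and it is only an assumption since the thread does not confirm it, is that the `st` column holds a binary-encoded integer (for example a status code written as a 4-byte big-endian value, the layout produced by HBase's `Bytes.toBytes(int)`) rather than text. Printing such a value as a string yields control characters, which most consoles show as nothing. The sketch below uses only the Java standard library to illustrate the difference; `CellValueDecoder` and its methods are hypothetical names, not part of Nutch, Gora, or HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Nutch, Gora, or HBase.
class CellValueDecoder {

    // Decodes a 4-byte big-endian int. ASSUMPTION: the cell was written in the
    // layout that HBase's Bytes.toBytes(int) produces.
    static int asInt(byte[] raw) {
        if (raw == null || raw.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(raw).getInt();
    }

    // Interprets the bytes as UTF-8 text, roughly what Bytes.toString does.
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Renders the bytes as hex so that non-printable values stay visible.
    static String asHex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulated cell value: the int 2 encoded as 4 big-endian bytes.
        byte[] raw = ByteBuffer.allocate(Integer.BYTES).putInt(2).array();

        System.out.println("as text: [" + asText(raw) + "]"); // control characters
        System.out.println("as hex : " + asHex(raw));
        System.out.println("as int : " + asInt(raw));
    }
}
```

If the hex dump of a cell shows four bytes that are not printable text, decoding with `Bytes.toInt` in the mapper (or reading the row through Gora, as Julien and Renato suggest, so that fields arrive already typed) is probably a better fit than `Bytes.toString`.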