Just a thought: alternatively, if you want to keep things simpler, you could use Nutch 1.x and write a custom IndexWriter to send the data into the RDBMS of your choice. The cleaning of the data could then be done with indexing filters.
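To make the shape of that idea concrete, here is a rough, self-contained sketch. It deliberately does not use the real Nutch classes: the map-based document, the clean() method and the BufferingWriter class are all hypothetical stand-ins. clean() plays the role an indexing filter would play, and BufferingWriter plays the role of the custom IndexWriter. A real writer would replace the in-memory list with inserts into your database (for example a batched JDBC PreparedStatement), which is left out here so the example runs without a driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RdbmsIndexSketch {

    /** Hypothetical stand-in for a crawled document: field name -> value. */
    public static Map<String, String> doc(String url, String title, String content) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("url", url);
        d.put("title", title);
        d.put("content", content);
        return d;
    }

    /**
     * Cleaning step (the part an indexing filter would do).
     * Collapses whitespace in the content, trims the title, and returns
     * null to signal that a document with no usable content should be skipped.
     */
    public static Map<String, String> clean(Map<String, String> doc) {
        String content = doc.get("content");
        if (content == null || content.trim().isEmpty()) {
            return null;
        }
        Map<String, String> out = new LinkedHashMap<>(doc);
        out.put("content", content.trim().replaceAll("\\s+", " "));
        String title = doc.get("title");
        out.put("title", title == null ? "" : title.trim());
        return out;
    }

    /**
     * Writer step (the part a custom index writer would do).
     * Rows are buffered on write() and only become visible on commit();
     * a real implementation would execute the database inserts in commit().
     */
    public static class BufferingWriter {
        public final List<String[]> pending = new ArrayList<>();
        public final List<String[]> committed = new ArrayList<>();

        public void write(Map<String, String> doc) {
            pending.add(new String[] {
                doc.get("url"), doc.get("title"), doc.get("content")
            });
        }

        public void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> crawled = new ArrayList<>();
        crawled.add(doc("http://example.com/a", "  Page A ", "some   text\n here"));
        crawled.add(doc("http://example.com/b", "Empty page", "   "));

        BufferingWriter writer = new BufferingWriter();
        for (Map<String, String> d : crawled) {
            Map<String, String> cleaned = clean(d);
            if (cleaned != null) {   // skip documents the cleaning step rejected
                writer.write(cleaned);
            }
        }
        writer.commit();

        for (String[] row : writer.committed) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The point of the split is that the cleaning logic stays independent of the database: the writer only ever sees documents that are already in the form you want to store.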

On 28 May 2014 23:18, Murali Parth <murparthnu...@gmail.com> wrote:

> Hello,
> We are trying to use Nutch in our project. This is my first
> project with Nutch and Hbase.
>
> I was able to make Nutch write to Hbase. When I go into the hbase shell and
> use the scan command I see data.
>
> I started writing a map reduce to get the data out of Hbase. Our intention
> is to do some massaging and write the cleaned data into RDBMS.
>
> In my Map program I am not able to see the data I see through the scan
> command.
>
> Question : How do I read Nutch crawl data from Hbase .
>
> Map Program is
>
> protected void map(
>         ImmutableBytesWritable rowkey,
>         Result result,
>         Context context) {
>
>     NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map =
>             result.getMap();
>     for (Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> columnFamilyEntry : map.entrySet())
>     {
>         NavigableMap<byte[], NavigableMap<Long, byte[]>> columnMap =
>                 columnFamilyEntry.getValue();
>         for (Entry<byte[], NavigableMap<Long, byte[]>> columnEntry : columnMap.entrySet())
>         {
>             NavigableMap<Long, byte[]> cellMap = columnEntry.getValue();
>             for (Entry<Long, byte[]> cellEntry : cellMap.entrySet())
>             {
>                 System.out.println(String.format("Key : %s, Value : %s",
>                         Bytes.toString(columnEntry.getKey()),
>                         Bytes.toString(cellEntry.getValue())));
>             }
>         }
>     }
>
> I see the following in the console
>
> Key : st, Value :
>
> Any help would be appreciated
>
> Thanks
> Murali

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble