Just a thought: alternatively, if you want to keep things simpler, you
could use Nutch 1.x and write a custom IndexWriter to send the data into
the RDBMS of your choice. The data could be cleaned with Indexing
Filters.
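
On the empty "Key : st, Value :" output in the quoted message: one likely
cause, assuming the default Gora HBase mapping where the "st" qualifier holds
the page status as a 4-byte big-endian int rather than text, is that
Bytes.toString() decodes those bytes as UTF-8 and yields only unprintable
control characters. The sketch below uses only the JDK to show the effect;
the class name CellDecode and the sample status value are hypothetical, and
the storage layout is an assumption to check against your own mapping file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CellDecode {
    // Decode a 4-byte big-endian int. ByteBuffer is big-endian by default,
    // which matches the layout produced by HBase's Bytes.toBytes(int).
    public static int decodeInt(byte[] raw) {
        if (raw == null || raw.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        return ByteBuffer.wrap(raw).getInt();
    }

    // Decode the same bytes as UTF-8 text, roughly what Bytes.toString() does.
    public static String decodeText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical cell value: a status code of 2 stored as a binary int.
        byte[] status = {0, 0, 0, 2};
        // Read as text, the value is four control characters and looks empty.
        System.out.println("as text: [" + decodeText(status) + "]");
        // Read as an int, the value is recovered.
        System.out.println("as int : " + decodeInt(status));
    }
}
```

If that assumption holds, each column needs to be decoded according to its
field type instead of being passed through Bytes.toString() uniformly.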

On 28 May 2014 23:18, Murali Parth <murparthnu...@gmail.com> wrote:

> Hello,
> We are trying to use Nutch in our project. This is my first project with
> Nutch and HBase.
> I was able to make Nutch write to HBase. When I go into the HBase shell and
> use the scan command, I see data.
> I started writing a MapReduce job to get the data out of HBase. Our intention
> is to do some massaging and write the cleaned data into an RDBMS.
> In my map program I am not able to see the data I see through the scan
> command.
> Question: How do I read Nutch crawl data from HBase?
> The map program is:
> protected void map(
>         ImmutableBytesWritable rowkey,
>         Result result,
>         Context context) {
>     // family -> qualifier -> timestamp -> value
>     NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map =
>             result.getMap();
>     for (Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> columnFamilyEntry
>             : map.entrySet()) {
>         NavigableMap<byte[], NavigableMap<Long, byte[]>> columnMap =
>                 columnFamilyEntry.getValue();
>         for (Entry<byte[], NavigableMap<Long, byte[]>> columnEntry
>                 : columnMap.entrySet()) {
>             NavigableMap<Long, byte[]> cellMap = columnEntry.getValue();
>             for (Entry<Long, byte[]> cellEntry : cellMap.entrySet()) {
>                 System.out.println(String.format("Key : %s, Value :    %s",
>                         Bytes.toString(columnEntry.getKey()),
>                         Bytes.toString(cellEntry.getValue())));
>             }
>         }
>     }
> }
> I see the following in the console:
> Key : st, Value :
> Any help would be appreciated.
> Thanks
> Murali


Open Source Solutions for Text Engineering

