How to use Hbase with Nutch

2009-08-25 Thread ilayaraja
Hello, I am trying to run the NutchBase code on Hadoop/HBase in local mode. I have set up the environment and everything is working fine. I was also able to create the table using the HBase shell. But I am not clear on how to use the InjectorHbase program to inject a set of seed URLs into my webtable

Nutch Performance Improvements

2009-08-25 Thread Fuad Efendi
Hello, A few years ago I noticed some performance bottlenecks in Nutch; checking the source code now, they are still the same. 1. RegexURLNormalizer and similar plugins: it is a singleton, and its main method is synchronized. It would be better to have a per-thread, non-synchronized instance; but how to mak
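The per-thread idea raised in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the class PerThreadNormalizer, its SESSION_ID rule, and the get() accessor are hypothetical names invented for this sketch and are not Nutch's RegexURLNormalizer or any part of its plugin API. It uses the standard java.lang.ThreadLocal so that each thread gets its own instance and normalize() needs no synchronization.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for a regex-based URL normalizer.
 * Not Nutch's RegexURLNormalizer; names and the single rule are assumptions.
 */
final class PerThreadNormalizer {

    // A compiled Pattern is immutable and thread-safe, so all threads share it.
    private static final Pattern SESSION_ID =
            Pattern.compile("(?i);jsessionid=[^?#]*");

    // One normalizer per thread, created lazily on first access from that thread.
    private static final ThreadLocal<PerThreadNormalizer> INSTANCE =
            ThreadLocal.withInitial(PerThreadNormalizer::new);

    // Mutable scratch state; safe without locks because it is never shared.
    private final StringBuilder scratch = new StringBuilder();

    private PerThreadNormalizer() {
    }

    /** Returns the calling thread's own instance. */
    static PerThreadNormalizer get() {
        return INSTANCE.get();
    }

    /** Strips a ";jsessionid=..." path parameter; deliberately not synchronized. */
    String normalize(String url) {
        scratch.setLength(0);
        scratch.append(SESSION_ID.matcher(url).replaceAll(""));
        return scratch.toString();
    }
}
```

A fetcher thread would call PerThreadNormalizer.get().normalize(url) instead of going through a shared, synchronized singleton. The trade-off is one instance of the mutable state per thread, which is usually small next to the cost of lock contention.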

RE: Nutch Performance Improvements

2009-08-25 Thread Fuad Efendi
I forgot to add: for "Allow Redirects" to work properly we also need cookie handling in HttpClient... Most "stateful" websites generate links inside the HTML with session tokens if they find that the client does not support cookies; but if HttpClient supports them, we are forced to allow redirects (although new

Re: Nutch Performance Improvements

2009-08-25 Thread Ken Krugler
On Aug 25, 2009, at 9:50am, Fuad Efendi wrote: I forgot to add: for “Allow Redirects” to work properly we also need cookie handling in HttpClient... Most “stateful” websites generate links inside the HTML with session tokens if they find that the client does not support cookies; but if HttpClient s