Hi,
I have used the keywords plugin to crawl the meta keyword details, which
works fine.
Here is the issue that occurs when a keyword is present.
1. Say the keyword is RedHat (capital R and H, as crawled from the site).
On searching with keywords:RedHat, Nutch returns 0 results.
Bu
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/663/changes
--
[...truncated 2223 lines...]
A src/plugin/protocol-http/src/test/org/apache/nutch
A src/plugin/protocol-http/src/test/org/apache/nutch/protocol
A src/plugin/prot
If you are talking about Nutch Contents which are stored in the segments
during fetching of pages, then you would need to write a MapReduce job to
read in the Contents object and do whatever processing you desire.
Dennis
oSilvio wrote:
Very useful information, thanks!
But in order to extract t
I've seen it now, thanks for the attention.
oSilvio wrote:
>
> Very useful information, thanks!
> But in order to extract the data inside those files (like HTML pages), I
> can find no algorithm provided by Nutch, nor the process used to store
> the data. Do you know if it is possible to extract
Very useful information, thanks!
But in order to extract the data inside those files (like HTML pages), I can
find no algorithm provided by Nutch, nor the process used to store the
data. Do you know if it is possible to extract it using Lucene?
Dennis Kubes-2 wrote:
>
> The nutch databases are e