Dear Guys,
We are working on a search engine, and we have to use version 2.x (due to its
ability to connect to HBase). We tried tens of re-crawling scripts, but none of
them works. Are there any re-crawling scripts for Nutch 2.x?
We also added "db.fetch.interval.default" to the "nutch-site.xml" file but
Hi Vangelis,
> Cons: Scoring is not used for selection.
> Domains (hosts) at the start of a region (mapper input) have the highest
> chance to get selected.
>
> I guess that the first line is wrong and should be updated.
AFAICS, that belongs to the section "Things for future development", or rather
"Sug
Hi Michael,
Does it work if the metatags in "index.parse.md" are lowercased?
index.parse.md
metatag.groupsallowed,metatag.gtitle
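For illustration, the two lines above would sit in nutch-site.xml roughly as follows. This is only a sketch: it assumes the usual Hadoop-style property layout that nutch-site.xml uses, and whether lowercase names actually fix the problem is exactly what is being asked here, not something confirmed.

```xml
<!-- nutch-site.xml fragment (sketch): metatag names written in lowercase -->
<property>
  <name>index.parse.md</name>
  <value>metatag.groupsallowed,metatag.gtitle</value>
</property>
```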
See https://issues.apache.org/jira/browse/NUTCH-1561
Sorry, that has been an open issue for a year now.
If you find time to review the patch, that would be great!
Thanks,
Sebas
Hi all,
I have a bunch of HTML files sitting in my file system. I know the http:// URL
of each HTML file.
If I just fetch from my file system, I will have file:// URLs, but I would like
to map them to the http:// address or to any arbitrary address.
Is there any halfway non-hackish possibility f