Otis,
Thanks, I'll try the generate.optimal.url.ordering property on my next
crawl. As for JobStream.py, see my reply to Dennis.
Regards,
Svein Willassen
2008/4/15, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>
> Svein,
> Could you update the page with JobStream.py with your corrections?
>
> As fo
2008/4/15, Dennis Kubes <[EMAIL PROTECTED]>:
>
> You are using an older version of the script. I will get you the newer
> version. At least in the newer version of the script, the URLs are pulled
> as topN, meaning the best URLs are pulled and a master crawldb is updated
> after each run. Meaning
You are using an older version of the script. I will get you the newer
version. At least in the newer version of the script, the URLs are
pulled as topN, meaning the best URLs are pulled and a master crawldb is
updated after each run. Meaning the best URLs get fetched. I will send
you the ne
Svein,
Could you update the page with JobStream.py with your corrections?
As for URL generation, have you seen
https://issues.apache.org/jira/browse/NUTCH-570 ?
Want to try it and see if it improves things for you?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch