Re: JobStream.py

2008-04-15 Thread Svein Yngvar Willassen
Otis, Thanks, I'll try the generate.optimal.url.ordering property on my next crawl. As for JobStream.py, see my reply to Dennis. Regards, Svein Willassen 2008/4/15, [EMAIL PROTECTED] <[EMAIL PROTECTED]>: > > Svein, > Could you update the page with JobStream.py with your corrections? > > As fo

Re: JobStream.py

2008-04-15 Thread Svein Yngvar Willassen
2008/4/15, Dennis Kubes <[EMAIL PROTECTED]>: > > You are using an older version of the script. I will get you the newer > version. At least in the newer version of the script, the urls are pulled > as topN meaning the best urls are pulled and a master crawldb is updated > after each run. Meaning

Re: JobStream.py

2008-04-15 Thread Dennis Kubes
You are using an older version of the script. I will get you the newer version. At least in the newer version of the script, the urls are pulled as topN meaning the best urls are pulled and a master crawldb is updated after each run. Meaning the best urls get fetched. I will send you the ne
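Dennis describes a generate/fetch/update cycle: each run pulls the top N URLs by score, fetches them, and then updates a master crawldb. Below is a minimal Python sketch of that loop. It assumes an in-memory dict standing in for the master crawldb and a caller-supplied `fetch_fn` that returns scored outlinks. All names (`CrawlEntry`, `generate_top_n`, `update_crawldb`, `crawl`, `fetch_fn`) are hypothetical. This is not the actual JobStream.py code or Nutch's Generator API.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CrawlEntry:
    # Hypothetical stand-in for a master crawldb record.
    score: float
    fetched: bool = False


def generate_top_n(crawldb, n):
    """Return the n highest-scoring URLs that have not been fetched yet."""
    candidates = ((e.score, url) for url, e in crawldb.items() if not e.fetched)
    return [url for _, url in heapq.nlargest(n, candidates)]


def update_crawldb(crawldb, fetched, discovered):
    """Mark fetched URLs and merge newly discovered outlinks into the master db."""
    for url in fetched:
        crawldb[url].fetched = True
    for url, score in discovered.items():
        entry = crawldb.get(url)
        if entry is None:
            crawldb[url] = CrawlEntry(score)
        elif not entry.fetched and score > entry.score:
            entry.score = score  # keep the best score seen so far


def crawl(crawldb, fetch_fn, n, rounds):
    """Run up to `rounds` cycles of generate(topN) -> fetch -> updatedb."""
    history = []
    for _ in range(rounds):
        segment = generate_top_n(crawldb, n)
        if not segment:
            break  # nothing left to fetch
        discovered = {}
        for url in segment:
            for link, score in fetch_fn(url).items():
                discovered[link] = max(score, discovered.get(link, score))
        # The master db is updated after every run, so the next
        # generate step sees the scores of freshly discovered URLs.
        update_crawldb(crawldb, segment, discovered)
        history.append(segment)
    return history
```

Because the master db is updated after every round, a high-scoring URL found in round one (here `d`) outranks older low-scoring entries in round two. That is the "best urls get fetched" behavior described above.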

Re: JobStream.py

2008-04-15 Thread ogjunk-nutch
Svein, Could you update the page with JobStream.py with your corrections? As for URL generation, have you seen https://issues.apache.org/jira/browse/NUTCH-570 ? Want to try it and see if it improves things for you? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Origi