[jira] Commented: (NUTCH-16) boost documents matching a url pattern

2006-01-28 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-16?page=comments#action_12364354 ] byron miller commented on NUTCH-16: Cool, an inverse of this plugin would be great, or an enhancement of this for +/- values based on patterns, as I think lowering score of

[jira] Commented: (NUTCH-79) Fault tolerant searching.

2006-01-28 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-79?page=comments#action_12364357 ] byron miller commented on NUTCH-79: Piotr, any update on this? Have you been able to run with this, or are you still working out the kinks?

[jira] Commented: (NUTCH-14) NullPointerException NutchBean.getSummary

2006-01-28 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-14?page=comments#action_12364358 ] byron miller commented on NUTCH-14: Are you still hitting this, Stefan?

[jira] Commented: (NUTCH-134) Summarizer doesn't select the best snippets

2006-01-20 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12363400 ] byron miller commented on NUTCH-134: Thanks Erik, I was able to pull down the highlighter and I'll be loading it up on mozdex.com to test out over the weekend (1/21/2006).

[jira] Commented: (NUTCH-183) MapReduce has a series of problems concerning task-allocation to worker nodes

2006-01-20 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-183?page=comments#action_12363477 ] byron miller commented on NUTCH-183: As Mr. Burns would say, "eggcelent". I'll give this a try. BTW, is it possible to implement functionality that would start jobs that are

[jira] Created: (NUTCH-159) Specify temp/working directory for crawl

2005-12-31 Thread byron miller (JIRA)
Specify temp/working directory for crawl
Key: NUTCH-159
URL: http://issues.apache.org/jira/browse/NUTCH-159
Project: Nutch
Type: Bug
Components: fetcher, indexer
Versions: 0.8-dev
Environment: Linux/Debian

[jira] Commented: (NUTCH-123) Cache.jsp some times generate NullPointerException

2005-12-31 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-123?page=comments#action_12361473 ] byron miller commented on NUTCH-123: Perhaps you should try the cache servlet, as it dumps out the data as it sees it.

[jira] Commented: (NUTCH-42) enhance search.jsp such that it can also returns XML

2005-12-31 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-42?page=comments#action_12361474 ] byron miller commented on NUTCH-42: Safe to close (done). We have XML/OpenSearch in latest trunk and other branches.

[jira] Created: (NUTCH-158) Process Sitemap data in text, rss or xml format as well as OAI-PMH

2005-12-29 Thread byron miller (JIRA)
Process Sitemap data in text, rss or xml format as well as OAI-PMH
Key: NUTCH-158
URL: http://issues.apache.org/jira/browse/NUTCH-158
Project: Nutch
Type: New Feature
Components: fetcher

[jira] Commented: (NUTCH-155) Remove web gui from the distribution to contrib and use OpenSearch Servlet

2005-12-29 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-155?page=comments#action_12361398 ] byron miller commented on NUTCH-155: I don't know how I feel about removing the JSP stuff into a contrib and then fluffing it up more with the potential to support other

[jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results

2005-12-28 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12361348 ] byron miller commented on NUTCH-92: Has there been any advancement on this front?

[jira] Commented: (NUTCH-134) Summarizer doesn't select the best snippets

2005-12-28 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12361350 ] byron miller commented on NUTCH-134: Where is the Lucene summarizer from the contrib? I'm not seeing anything obvious (unless it's under a different name).

[jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments

2005-12-27 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-95?page=comments#action_12361300 ] byron miller commented on NUTCH-95: Number 2 sounds great, but wouldn't you always want the latest scoring document since that should reflect the latest updatedb and rank of

[jira] Commented: (NUTCH-55) Create dmoz.org search plugin - incorporate the dmoz.org title/category/description if available

2005-12-27 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-55?page=comments#action_12361301 ] byron miller commented on NUTCH-55: You can close this ticket; it is a duplicate of ticket NUTCH-59.

[jira] Commented: (NUTCH-134) Summarizer doesn't select the best snippets

2005-12-07 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12359649 ] byron miller commented on NUTCH-134: I would take more CPU for better summaries any day :) CPU power is cheaper than manual intervention! If any testing is needed, don't

[jira] Commented: (NUTCH-39) pagination in search result

2005-10-30 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12356374 ] byron miller commented on NUTCH-39: I'm using the above code snippet on mozdex and have run across some strange issues. For example, if you search for cnn.com it doesn't show up

[jira] Commented: (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag

2005-10-25 Thread byron miller (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-49?page=comments#action_12355864 ] byron miller commented on NUTCH-49: Can something like this be adapted to use the regex filter as well? It would be nice to say "new only" and match URLs of x type or x link