Re: Urlfilter Patch

2005-11-28 Thread Ken Krugler
On Mon, 2005-11-28 at 11:44 -0800, Doug Cutting wrote: Rod Taylor wrote: > Add a few more extensions which I commonly see and cannot be parsed > (that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc. [ ... ] > # skip image and other suffixes we can't yet parse > --\.(gif|GIF

Re: Nutch WebDb storage alternatives: Revisited

2005-11-28 Thread Doug Cutting
Dalton, Jeffery wrote: I would propose that, even when crawling large web collections, the updates may not always be proportional to the total size of the database if you want to keep your index fresh. One of the goals of a web search engine is to be an accurate representation of what is found
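Jeff's point about update volume matters most for batch-style storage. One common way to apply a batch of WebDb updates is a single sequential merge of two key-sorted streams. The Java sketch below is a minimal illustration of that approach, not Nutch's WebDb code. It assumes an in-memory list of (URL, status) pairs sorted by URL and uses hypothetical class and method names. Because every existing record is copied, the work grows with the size of the whole database, not with the size of the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SortedMergeUpdate {

    /**
     * Hypothetical sketch: merges a batch of updates into the database in one
     * sequential pass. Both lists must be sorted by key (e.g. URL). When a key
     * appears in both, the update wins. Every database record is copied to the
     * output, so the work is proportional to db.size() + updates.size(),
     * however small the update batch is.
     */
    public static List<Map.Entry<String, String>> merge(List<Map.Entry<String, String>> db,
                                                        List<Map.Entry<String, String>> updates) {
        List<Map.Entry<String, String>> out = new ArrayList<>(db.size() + updates.size());
        int i = 0, j = 0;
        while (i < db.size() || j < updates.size()) {
            if (j == updates.size()) {
                out.add(db.get(i++));            // only old records remain
            } else if (i == db.size()) {
                out.add(updates.get(j++));       // only new records remain
            } else {
                int cmp = db.get(i).getKey().compareTo(updates.get(j).getKey());
                if (cmp < 0) {
                    out.add(db.get(i++));        // unchanged record
                } else if (cmp > 0) {
                    out.add(updates.get(j++));   // newly discovered key
                } else {
                    out.add(updates.get(j++));   // updated record replaces the old one
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> db = List.of(
                Map.entry("http://a.example/", "fetched"),
                Map.entry("http://b.example/", "fetched"),
                Map.entry("http://c.example/", "unfetched"));
        List<Map.Entry<String, String>> updates = List.of(
                Map.entry("http://b.example/", "refetched"),
                Map.entry("http://d.example/", "unfetched"));
        for (Map.Entry<String, String> e : merge(db, updates)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running main prints the four merged records in URL order, with b.example's status replaced by the update. The trade-off is that a sequential merge touches every record even for a tiny batch. A random-access structure pays per update instead. Which one is cheaper therefore depends on how large the batch is relative to the database.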

RE: Nutch WebDb storage alternatives: Revisited

2005-11-28 Thread Dalton, Jeffery
Sorry for the delay in my response, holiday busyness. My comments are in-line with response below. Your feedback would be greatly appreciated. - Jeff -----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 16, 2005 1:30 PM To: nutch-dev@lucene.apache.org

Re: Urlfilter Patch

2005-11-28 Thread Rod Taylor
On Mon, 2005-11-28 at 11:44 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Add a few more extensions which I commonly see and cannot be parsed > > (that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc. > > [ ... ] > > > # skip image and other suffixes we can't yet parse > > --\.(

Re: Urlfilter Patch

2005-11-28 Thread Doug Cutting
Rod Taylor wrote: Add a few more extensions which I commonly see and cannot be parsed (that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc. [ ... ] # skip image and other suffixes we can't yet parse --\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|
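This thread is about growing the suffix-exclusion rule and keeping both lower- and upper-case variants (gif|GIF, XLS, PPS). In the filter file, a leading `-` on a rule line marks an exclusion. The doubled `--` in the quoted patch is the diff's removed-line marker in front of such a rule. The Java sketch below performs the same kind of check with a hypothetical SuffixFilter class, not Nutch's URL filter plugin. It combines the extensions visible in these messages into one pattern. Because the patch's full regex is truncated in the archive, the list covers only those extensions. The sketch also uses Java's (?i:...) inline flag so each suffix is listed once regardless of case. Returning null for a rejected URL is a convention chosen for this sketch.

```java
import java.util.regex.Pattern;

public class SuffixFilter {

    // Hypothetical suffix list assembled only from extensions named in this thread.
    // (?i:...) makes the alternation case-insensitive, so "gif", "GIF" and "Gif"
    // all match without separate entries.
    private static final Pattern SKIP = Pattern.compile(
            "\\.(?i:gif|jpg|ico|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|mso|jar|bz2|pps|dot)$");

    /** Returns null if the URL ends in a skipped suffix, otherwise the URL unchanged. */
    public static String filter(String url) {
        return SKIP.matcher(url).find() ? null : url;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/index.html",
            "http://example.com/report.XLS",
            "http://example.com/archive.tar.bz2",
            "http://example.com/logo.Gif"
        };
        for (String u : urls) {
            System.out.println(u + " -> " + (filter(u) == null ? "skipped" : "kept"));
        }
    }
}
```

Case-insensitive matching also catches mixed-case forms such as .Gif, which an explicit gif|GIF list misses. The cost is that the rule can no longer keep one case variant while skipping another.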

Re: [proposal] Generic Markup Language Parser

2005-11-28 Thread Doug Cutting
Andrzej Bialecki wrote: Gentlemen, please let's keep a civilized tone to this exchange, or take it off the list. +1 Doug

Re: translation in the Italian language

2005-11-28 Thread palombo
Yes, some minor things. Adriano Palombo

Re: translation in the Italian language

2005-11-28 Thread Piotr Kosiorowski
Hi Adriano, I have your previous email on my TODO list. I have not had time to commit it yet. Are there any changes from the previous version? Regards, Piotr [EMAIL PROTECTED] wrote: Hi, I hope that you will publish my Italian translation of Nutch. Is it possible to also translate the homepage of the

translation in the Italian language

2005-11-28 Thread palombo
Hi, I hope that you will publish my Italian translation of Nutch. Is it possible to also translate the homepage of the Nutch site? Please answer me. Thanks, Adriano Palombo

Summary length

2005-11-28 Thread rupa priya
Hi All, I want to increase the summary length for the search results. I tried changing the searcher.summary.context and searcher.summary.length settings in the nutch-site.xml file, but it didn't work. Is there any other solution available? Thanks, Rupa
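The Java sketch below illustrates what the two settings control. It assumes that searcher.summary.context is the number of terms kept before and after each matching term and that searcher.summary.length is the total number of terms shown. It is a hypothetical keyword-in-context summarizer, not Nutch's own summarizer, and it does not explain why the override had no effect. Whitespace tokenization and lower-case query terms are simplifying assumptions.

```java
import java.util.Locale;
import java.util.Set;

public class ContextSummary {

    /**
     * Hypothetical keyword-in-context summarizer (not Nutch's code).
     * Keeps up to `context` tokens before and after every token matching a
     * query term (query terms are expected in lower case), emits at most
     * `length` tokens in total, and marks skipped stretches with " ... ".
     */
    public static String summarize(String text, Set<String> query, int context, int length) {
        String[] tokens = text.trim().split("\\s+");
        boolean[] keep = new boolean[tokens.length];

        // Mark each query hit plus its surrounding context window.
        for (int i = 0; i < tokens.length; i++) {
            if (query.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                int from = Math.max(0, i - context);
                int to = Math.min(tokens.length - 1, i + context);
                for (int k = from; k <= to; k++) {
                    keep[k] = true;
                }
            }
        }

        // Emit kept tokens in order until the total length budget is used up.
        StringBuilder sb = new StringBuilder();
        int emitted = 0;
        boolean gap = false;
        for (int i = 0; i < tokens.length && emitted < length; i++) {
            if (!keep[i]) {
                gap = true;
                continue;
            }
            if (sb.length() > 0) {
                sb.append(gap ? " ... " : " ");
            }
            sb.append(tokens[i]);
            emitted++;
            gap = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Nutch is an open source web search engine built on Lucene and it crawls the web";
        // prints "built on Lucene and it"
        System.out.println(summarize(text, Set.of("lucene"), 2, 20));
    }
}
```

Under these assumptions, raising the context value widens each fragment, while the length value caps the whole summary. Raising only one of them may therefore change little.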

Re: Need metadata transport.

2005-11-28 Thread Stefan Groschupf
Hi Marcel, for version 0.7.x you can use a patch I uploaded to JIRA: http://issues.apache.org/jira/browse/NUTCH-59 For version 0.8 this will not work anymore. I already discussed the metadata issue with Doug and how we can solve it in 0.8, but I haven't found any time to write somethi

Need metadata transport.

2005-11-28 Thread marcel . schnippe
Hi dear Nutchers, I have implemented HTTP session support for Nutch. A patch will be released as soon as I have switched to MapReduce. I am crawling an intranet CMS. I was successful in indexing the PDFs. If I follow the link in the search result pane, the PDFs are not retrieved by the client's browse