On Mon, 2005-11-28 at 11:44 -0800, Doug Cutting wrote:
Rod Taylor wrote:
> Add a few more extensions which I commonly see and cannot be parsed
> (that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc.
[ ... ]
> # skip image and other suffixes we can't yet parse
>
--\.(gif|GIF
Dalton, Jeffery wrote:
I would propose that, even when crawling large web collections, the
updates may not always be proportional to the total size of the database
if you want to keep your index fresh. One of the goals of a web search
engine is to be an accurate representation of what is found
Sorry for the delay in my response; holiday busyness. My comments are
inline below. Your feedback would be greatly
appreciated.
- Jeff
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 16, 2005 1:30 PM
To: nutch-dev@lucene.apache.org
Rod Taylor wrote:
Add a few more extensions which I commonly see and cannot be parsed
(that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc.
[ ... ]
# skip image and other suffixes we can't yet parse
--\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|
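The rule above uses the regex URL filter convention, where a leading `-` rejects URLs matching the pattern (the extra leading `-` is presumably a diff marker). The Java sketch below is a minimal, hypothetical illustration of first-match-wins filtering, not Nutch's own RegexURLFilter class. It assumes substring (find-style) matching and assumes that a URL matching no rule is rejected; the class name, the suffix list (a few entries from the rule above plus Rod's proposed additions) and the example URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a first-match-wins URL filter in the style of
 * regex-urlfilter.txt: each rule is '+' (accept) or '-' (reject) followed by
 * a regular expression. This is NOT Nutch's RegexURLFilter class.
 */
public class SuffixUrlFilter {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses a rule line such as "-\\.(gif|jpg)$" or "+.". */
    public void addRule(String line) {
        if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
            throw new IllegalArgumentException("rule must start with + or -: " + line);
        }
        rules.add(new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1))));
    }

    /** The first rule whose pattern occurs anywhere in the URL decides. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false; // assumption: a URL that matches no rule is rejected
    }

    public static void main(String[] args) {
        SuffixUrlFilter filter = new SuffixUrlFilter();
        // Illustrative suffixes only; (?i) makes the match case-insensitive,
        // so pairs like "XLS"/"xls" and "PPS"/"pps" need a single entry.
        filter.addRule("-(?i)\\.(gif|jpg|ico|css|zip|ppt|xls|gz|mov|mso|jar|bz2|pps|dot)$");
        filter.addRule("+."); // accept everything else

        String[] urls = {
            "http://example.com/report.XLS",
            "http://example.com/archive.tar.bz2",
            "http://example.com/index.html"
        };
        for (String url : urls) {
            System.out.println((filter.accepts(url) ? "accept " : "reject ") + url);
        }
    }
}
```

Run as-is, it should print `reject` for the `.XLS` and `.bz2` URLs and `accept` for `index.html`. The inline `(?i)` flag is one way to avoid listing upper- and lower-case variants of each suffix separately, as the thread's list currently does.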
Andrzej Bialecki wrote:
Gentlemen, please let's keep a civilized tone to this exchange, or take
it off the list.
+1
Doug
Yes, some minor things.
Adriano Palombo
This message was sent using IMP, the Internet Messaging Program.
Hi Adriano,
I have your previous email on my TODO list. I have not had time to commit it
yet. Are there any changes from the previous version?
Regards
Piotr
[EMAIL PROTECTED] wrote:
Hi,
I hope that you will publish my Italian translation of Nutch.
Is it also possible to translate the homepage of the
Hi,
I hope that you will publish my Italian translation of Nutch.
Is it also possible to translate the homepage of the Nutch site?
Please answer me.
Thanks
Adriano Palombo
Hi All,
I want to increase the summary length for the search results. I tried changing
the searcher.summary.context and searcher.summary.length settings in the
nutch-site.xml file, but it didn't work.
Is any other solution available?
Thanks,
Rupa
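To make the two settings concrete, here is a hypothetical Java sketch, not Nutch's own summarizer. It assumes, as the property names suggest, that `searcher.summary.context` is the number of terms kept on each side of a query hit and that `searcher.summary.length` caps the total number of terms shown. Tokenization is naive whitespace splitting, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical illustration, NOT Nutch's summarizer: each query hit keeps up
 * to `context` terms on either side, and the summary is capped at `length` terms.
 */
public class WindowSummary {

    public static String summarize(String text, Set<String> queryTerms,
                                   int context, int length) {
        // Naive whitespace tokenization; punctuation is not stripped.
        String[] tokens = text.trim().split("\\s+");
        boolean[] keep = new boolean[tokens.length];

        // Mark a window of `context` terms around every hit.
        for (int i = 0; i < tokens.length; i++) {
            if (queryTerms.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                int from = Math.max(0, i - context);
                int to = Math.min(tokens.length - 1, i + context);
                for (int j = from; j <= to; j++) {
                    keep[j] = true;
                }
            }
        }

        // Emit kept terms in order, joining separate windows with "...",
        // and stop once `length` terms have been emitted.
        StringBuilder sb = new StringBuilder();
        int emitted = 0;
        boolean gap = false;
        for (int i = 0; i < tokens.length && emitted < length; i++) {
            if (!keep[i]) {
                gap = sb.length() > 0; // only mark a gap after some output
                continue;
            }
            if (gap) {
                sb.append(" ...");
                gap = false;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]);
            emitted++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        Set<String> query = new HashSet<>(Arrays.asList("fox", "dog"));
        System.out.println(summarize(text, query, 1, 20)); // brown fox jumps ... lazy dog
        System.out.println(summarize(text, query, 1, 2));  // brown fox
    }
}
```

In this sketch, raising `length` alone changes nothing once every hit window already fits under the cap; only a larger `context` adds more text around each hit.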
Hi Marcel,
For version 0.7.x you can use a patch I uploaded to JIRA:
http://issues.apache.org/jira/browse/NUTCH-59
For version 0.8 this will not work anymore.
I have already discussed the metadata issue with Doug, and how we can
solve it in 0.8, but I haven't found any time to write somethi
Hi dear nutchers,
I have implemented HTTP session support for Nutch. A patch will be
released as soon as I have switched to MapReduce.
I am crawling an intranet CMS. I was successful in indexing the PDFs.
If I follow the link in the search result pane, the PDFs are not retrieved
by the client's browse