[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ]
Chris A. Mattmann updated NUTCH-245:
Attachment: NUTCH-245.Mattmann.patch.txt
Here's the patch for the plugin DTD file. I got a lot of info from:
http://help.eclipse.org/help31/index.jsp?to
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ]
Chris A. Mattmann updated NUTCH-245:
Description: Currently, the plugin.xml file does not have a DTD or XML
Schema associated with it, and most people just go look at an existing plugin's
p
robot parser to restrict.
-
Key: NUTCH-247
URL: http://issues.apache.org/jira/browse/NUTCH-247
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.8-dev
Reporter: Stefan Groschupf
Priority: Minor
Fix For: 0.8-dev
You can go even further and load the entire index into RAM using a RAM disk. How
big of an index are you talking about?
-Ledio
-Original Message-
From: Dennis Kubes [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 11, 2006 3:51 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Swap with Nutch
larryp wrote:
Hi, I'm trying to get Nutch to load its index into swap, as I believe it will
give better performance than having it as a file on the hard drive, since it
will be mapped as virtual memory. Has anyone ever attempted this? Any
suggestions as to how one might force the index into swap?
Hi, I'm trying to get Nutch to load its index into swap, as I believe it will
give better performance than having it as a file on the hard drive, since it
will be mapped as virtual memory. Has anyone ever attempted this? Any
suggestions as to how one might force the index into swap?
Thanks in advan
Thanks. I'll go through your rel-tag plugin in version 0.8 and use it as a
basis for adding my hreview code.
--
View this message in context:
http://www.nabble.com/Microformats-Support---HReview-t1433896.html#a3869485
Sent from the Nutch - Dev forum at Nabble.com.
> I have noticed that there are the beginnings of microformats support
> (rel-tag) in nutch version 0.8.
Hi Mike, I created this plugin to play around a little with
microformats.
It can be a kind of "tutorial" for people who want to add support for
further microformats.
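For readers unfamiliar with rel-tag, here is a minimal sketch of the idea in Python. It is not the Nutch plugin's code; it only assumes the usual rel-tag convention, where a link marked rel="tag" names its tag in the last path segment of the href. The class and function names (RelTagParser, extract_rel_tags) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Collects tag names from <a rel="tag" href="..."> links (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" in rels and href:
            # Assumption: the tag name is the last path segment of the link target.
            path = urlparse(href).path.rstrip("/")
            name = unquote(path.rsplit("/", 1)[-1])
            if name:
                self.tags.append(name)


def extract_rel_tags(html):
    """Return the rel-tag names found in an HTML fragment, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Other microformats such as hreview or hcard are marked up with class names rather than rel values, so a parser for them would inspect the class attribute instead.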
> Is anyone still w
I have noticed that there are the beginnings of microformats support
(rel-tag) in nutch version 0.8. Is anyone still working on adding other
microformats (hreview, hcard)?
If so, I would be interested in helping and/or collaborating. I already
created a simple hreview parser using nutch versi
> > Piotr, please keep oro-2.0.8 in pmd-ext
> I do not agree here - we are going to make a new release next week and
> releasing with two versions of oro does not look nice. oro is a quite
> stable product and changes are in fact minimal:
> http://svn.apache.org/repos/asf/jakarta/oro/trunk/CHANGES
O
[
http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374049 ]
Chris Schneider commented on NUTCH-246:
---
A few more details:
Stefan and I were able to reproduce this problem using either an injection set
of 4500 URLs or a larger set
segment size is never as big as topN or crawlDB size in a distributed
deployment
-
Key: NUTCH-246
URL: http://issues.apache.org/jira/browse/NUTCH-246
Project: Nutch
Type: Bug
Versi
I didn't even think about that. trying it out now :)
thanks,
-byron
--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Hi Byron,
>
> This sounds like the url filter problem.
> Please try to remove the "-.*(/.+?)/.*?\1/.*?\1/"
> from regex-urlfilter.txt just for a test and tell us if this
> m
Hi Byron,
This sounds like the url filter problem.
Please try to remove the "-.*(/.+?)/.*?\1/.*?\1/" from
regex-urlfilter.txt just for a test and tell us if this solves
the problem.
Thanks.
Stefan
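To see what the rule quoted above rejects, here is a small Python sketch. It is not Nutch code; it assumes the leading "-" marks an exclusion rule and that the remaining expression is searched for anywhere in the URL. The helper names (parse_rule, is_excluded) and the example.com URLs are made up for illustration.

```python
import re

# The line quoted from regex-urlfilter.txt: "-" (exclude) followed by the regex.
RULE = r"-.*(/.+?)/.*?\1/.*?\1/"


def parse_rule(rule):
    """Split a filter line into (include, compiled_pattern)."""
    sign, expr = rule[:1], rule[1:]
    if sign not in ("+", "-"):
        raise ValueError("rule must start with '+' or '-'")
    return sign == "+", re.compile(expr)


def is_excluded(url, rule=RULE):
    """True if an exclusion rule matches somewhere in the URL (assumed semantics)."""
    include, pattern = parse_rule(rule)
    return (not include) and pattern.search(url) is not None


# A path segment that repeats three times, as in a crawler loop, is rejected.
looping = "http://example.com/foo/bar/foo/bar/foo/"
# An ordinary URL with distinct segments passes.
normal = "http://example.com/a/b/c/index.html"
```

Here is_excluded(looping) is true and is_excluded(normal) is false. The backreference \1 forces the engine to try many candidate segments, so temporarily removing the rule, as suggested above, is a reasonable way to check whether it is what stalls the run.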
On 11.04.2006 at 14:43, Byron Miller wrote:
i get nightly to run, but it never c
I get the nightly to run, but it never completes anything.
It always gets stuck at 98% here and there. I'll try
today's build and see what happens.
--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Hi,
>
> looks like the latest nightly build is broken.
> Looks like the jar that comes with the nightly bu