[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Attachment: NUTCH-245.Mattmann.patch.txt Here's the patch for the plugin DTD file. I got a lot of info from: http://help.eclipse.org/help31/index.jsp?to

[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Description: Currently, the plugin.xml file does not have a DTD or XML Schema associated with it, and most people just go look at an existing plugin's p

[jira] Created: (NUTCH-247) robot parser to restrict.

2006-04-11 Thread Stefan Groschupf (JIRA)
robot parser to restrict. - Key: NUTCH-247 URL: http://issues.apache.org/jira/browse/NUTCH-247 Project: Nutch Type: Bug Components: fetcher Versions: 0.8-dev Reporter: Stefan Groschupf Priority: Minor Fix For: 0.8-dev

RE: Swap with Nutch

2006-04-11 Thread Ledio Ago
You can go even further and load the whole index into RAM using a RAM disk. How big an index are you talking about? -Ledio -----Original Message----- From: Dennis Kubes [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 11, 2006 3:51 PM To: nutch-dev@lucene.apache.org Subject: Re: Swap with Nutch

Re: Swap with Nutch

2006-04-11 Thread Dennis Kubes
larryp wrote: Hi, I'm trying to get Nutch to load its index into swap, as I believe it will give better performance than having it as a file on the hard drive, since it will be mapped as virtual memory. Has anyone ever attempted this? Any suggestions as to how one might force the index into swap?

Swap with Nutch

2006-04-11 Thread larryp
Hi, I'm trying to get Nutch to load its index into swap, as I believe it will give better performance than having it as a file on the hard drive, since it will be mapped as virtual memory. Has anyone ever attempted this? Any suggestions as to how one might force the index into swap? Thanks in advan

Re: Microformats Support - HReview

2006-04-11 Thread mikeyc
Thanks. I'll go through your rel-tag plugin in version 0.8 and use it as a basis for adding my hreview code. -- View this message in context: http://www.nabble.com/Microformats-Support---HReview-t1433896.html#a3869485 Sent from the Nutch - Dev forum at Nabble.com.

Re: Microformats Support - HReview

2006-04-11 Thread Jérôme Charron
> I have noticed that there are the beginnings of microformats support > (rel-tag) in nutch version 0.8. Hi Mike, I created this plugin to play around a little with microformats. It can serve as a kind of "tutorial" for people who want to add support for further microformats. > Is anyone still w

Microformats Support - HReview

2006-04-11 Thread mikeyc
I have noticed that there are the beginnings of microformats support (rel-tag) in nutch version 0.8. Is anyone still working on adding other microformats (hreview, hcard)? If so, I would be interested in helping and/or collaborating. I already created a simple hreview parser using nutch versi

Re: PMD integration

2006-04-11 Thread Jérôme Charron
> > Piotr, please keep oro-2.0.8 in pmd-ext > I do not agree here - we are going to make a new release next week and > releasing with two versions of oro does not look nice. oro is quite > stable product and changes are in fact minimal: > http://svn.apache.org/repos/asf/jakarta/oro/trunk/CHANGES O

[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-11 Thread Chris Schneider (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374049 ] Chris Schneider commented on NUTCH-246: A few more details: Stefan and I were able to reproduce this problem using either an injection set of 4500 URLs or a larger set

[jira] Created: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-11 Thread Stefan Groschupf (JIRA)
segment size is never as big as topN or crawlDB size in a distributed deployement - Key: NUTCH-246 URL: http://issues.apache.org/jira/browse/NUTCH-246 Project: Nutch Type: Bug Versi

Re: nighly build brocken?

2006-04-11 Thread Byron Miller
I didn't even think about that. Trying it out now :) Thanks, -byron --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: > Hi Byron, > > This sounds like the url filter problem. > Please try to remove the "-.*(/.+?)/.*?\1/.*?\1/" > from regex-urlfilter.txt just for a test and tell us if this > m

Re: nighly build brocken?

2006-04-11 Thread Stefan Groschupf
Hi Byron, This sounds like the URL filter problem. Please try removing the "-.*(/.+?)/.*?\1/.*?\1/" rule from regex-urlfilter.txt, just as a test, and tell us whether this solves the problem. Thanks. Stefan On 11.04.2006 at 14:43, Byron Miller wrote: i get nightly to run, but it never c
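
For readers unfamiliar with the rule quoted in this message, here is a minimal sketch of what the pattern matches. It is only an illustration: Nutch applies the rule through Java regular expressions, while this sketch uses Python's `re` module as a stand-in, and it assumes the rule is applied as a "find anywhere in the URL" match. The leading "-" in regex-urlfilter.txt marks the rule as an exclusion and is not part of the regular expression. The function name and the example.com URLs are hypothetical.

```python
import re

# Regular expression copied from the message, without the leading "-"
# (in regex-urlfilter.txt the "-" prefix marks the rule as an exclusion).
REPEATED_SEGMENT = re.compile(r".*(/.+?)/.*?\1/.*?\1/")


def is_excluded(url: str) -> bool:
    """Return True if the URL contains a path segment repeated three times.

    Hypothetical helper: assumes the filter searches anywhere in the URL.
    """
    return REPEATED_SEGMENT.search(url) is not None


# A URL where the segment "/img" occurs three times, as in a crawler loop.
looping = "http://example.com/img/x/img/y/img/z/"
# A URL where "/img" occurs only twice.
normal = "http://example.com/img/x/img/y/"

print(is_excluded(looping))
print(is_excluded(normal))
```

The rule targets URLs in which the same path segment recurs three or more times, which is typical of crawler traps. Patterns that combine lazy quantifiers with backreferences can require a great deal of backtracking on long URLs, so temporarily removing the rule is a cheap way to test whether it is involved in a stalled run.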

Re: nighly build brocken?

2006-04-11 Thread Byron Miller
I get the nightly to run, but it never completes anything; it always gets stuck at 98% here and there. I'll try today's build and see what happens. --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: > Hi, > > looks like the latest nightly build is broken. > Looks like the jar that comes with the nightly bu