from:"David Wallace"

[Nutch-dev] Re: svn commit: r384219 - /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java

2006-03-08 Thread David Wallace

I know this is off on a tangent, but: One huge advantage to filtering in the FetchListTool (or is that the Generator, I'm still on 0.7?) is that you can generate separate fetch lists for separate "scopes", or subsets of your crawl data. You can then give your users some control over which of se

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread David Wallace

Hi Stefan, I think these are fine things to be doing. Just two points: (1) Why not just always pass the NutchConf to the constructor of any class that needs it? Instead of distinguishing between the case of whether the class will use 1 or 2 configuration parameters; or more than that. Just for

[Nutch-dev] Re: version branches / two products

2005-12-15 Thread David Wallace

ely separate Nutch products? If it is, then now is probably the right time to do so. Regards, David Wallace. This email may contain legally privileged information and is intended only for the addressee. It is not ne

[Nutch-dev] Re: version branches / two products

2005-12-15 Thread David Wallace

Would it be worthwhile discussing the pros and cons of having two completely separate Nutch products? If it is, then now is probably the right time to do so. Regards, David Wallace. This email may contain legally privileged information and is intended only for the addressee. It is not necessarily the

[Nutch-dev] Filter at fetch list time

2005-04-26 Thread David Wallace

Hi all. Has anyone written a version of the FetchListTool that only adds a URL to the fetch list if it complies with a particular Regex URL filter? If so, would they be prepared to share? I need to do something like this, but I dislike re-inventing wheels. Essentially, I'm doing an intranet-ty

Re: [Nutch-dev] Crawl-urlfilter cann't deals with relative urls appropriately ??

2005-04-14 Thread David Wallace

Hello Cao, The problem is not that the URLs are relative - it's the ? and = characters. Try changing the line [EMAIL PROTECTED] to [EMAIL PROTECTED] and the problem will go away. Kind regards, David. From: "cao yuzhong" <[EMAIL PROTECTED]> To: nutch-dev@incubator.apache.org Date: Thu, 14 Apr 2

Re: [Nutch-dev] Feature request - pluggable Analyzer

2005-04-13 Thread David Wallace

OK Jack, but the details of my analyser aren't particularly exciting. I need to index a site that has a mixture of documents in English and Te Reo Maori (indigenous language of New Zealand). Vowels in Te Reo Maori are sometimes written with short overlines (also known as macrons), to indicate a

[Nutch-dev] Feature request - pluggable Analyzer

2005-04-11 Thread David Wallace

Hi all, I have found a need to do document analysis other than that which is provided by the NutchDocumentAnalyzer class. I have written my own Analyzer class, and I need to plug it into the Nutch framework. What I've done is the following, and I'd like to suggest that it be made part of the main

[Nutch-dev] Question re index merge call in crawl tool

2005-04-11 Thread David Wallace

Hi all, I am trying to understand Nutch a little better, so that I can evaluate its suitability for a project I am soon to embark on. I have been studying the code in CrawlTool.java (used for an "intranet search"). The line that bothers me is the call to IndexMerger.main(), near the end of main()

9 matches
