RE: [Nutch-dev] HTTP->HTTPS redirects

2004-10-07 Thread Clermont, Roger C EIS-NWD Contractor
> I need to be able to get to HTTP status codes. > Response is nice and generic, but for my purposes it also hides too > much. > > > >Most protocols return status codes in responses, I think. > > If my assumption above is correct, then adding public abstract int > getCode() method would make sens

[Nutch-dev] cc for wiki as well?

2004-10-07 Thread Stefan Groschupf
Hi, wouldn't it be sensible to put the wiki content under Creative Commons as well? Sorry, I don't know how to change the template for new page creation; maybe this can only be done by the admin. Stefan

Re: [Nutch-dev] HTTP->HTTPS redirects

2004-10-07 Thread ogjunk-nutch
--- Andy Hedges <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] wrote: > > >I like the elegance of this. > >Would it also be possible to add this to Response: > > > >public abstract int getCode(); > > > > > What do you intend to do with the code? I think the idea is that all > protocol codes/o

Re: [Nutch-dev] new plugin cvs project?

2004-10-07 Thread Stefan Groschupf
Doug, I would like to suggest creating a new project "plugins" in our cvs and store most of the plugins there. I like having a single tree. That way, when an API changes, it's easy to update and test all of the plugins. Having most of the plugins in a separate cvs project does not mean that we ca

Re: [Nutch-dev] HTTP->HTTPS redirects

2004-10-07 Thread Andy Hedges
[EMAIL PROTECTED] wrote: I like the elegance of this. Would it also be possible to add this to Response: public abstract int getCode(); What do you intend to do with the code? I think the idea is that all protocol codes/outcomes can be mapped to the response class below. Most protocols return s

Re: [Nutch-dev] Re: [Nutch-general] 404

2004-10-07 Thread Andy Hedges
Doug Cutting wrote: Andy Hedges wrote: What I would like, I think, is to completely remove all record of 404s at fetch time as and when they are found from the db and the segments. Stepping back a bit, I think what you'd like is if the most recent attempt to fetch a URL resulted in a 404 then t

Re: [Nutch-dev] new plugin cvs project?

2004-10-07 Thread Doug Cutting
Stefan Groschupf wrote: I would like to suggest creating a new project "plugins" in our cvs and store most of the plugins there. I like having a single tree. That way, when an API changes, it's easy to update and test all of the plugins. I think that it would be good to have a minimal nutch an

[Nutch-dev] new plugin cvs project?

2004-10-07 Thread Stefan Groschupf
Hi, it is very nice to see that more and more plugins are born. I would like to suggest creating a new project "plugins" in our cvs and store most of the plugins there. I think that it would be good to have a minimal nutch and people then can add ftp, pdf, language cc and other plugins. From my

Re: [Nutch-dev] HTTP->HTTPS redirects

2004-10-07 Thread ogjunk-nutch
I like the elegance of this. Would it also be possible to add this to Response: public abstract int getCode(); Most protocols return status codes in responses, I think. Otis --- Doug Cutting <[EMAIL PROTECTED]> wrote: > Clermont, Roger C EIS-NWD Contractor wrote: > > My idea is along the lines
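The getCode() suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: ResponseSketch, HttpResponseSketch and GetCodeDemo are hypothetical names invented here, not Nutch's actual Response class, and the only element taken from the thread is the proposed "public abstract int getCode();" signature.

```java
/** Hypothetical protocol-neutral response; NOT the real Nutch Response class. */
abstract class ResponseSketch {
    /** The method proposed in the thread: expose the protocol's status code. */
    public abstract int getCode();

    /** Assumed accessor for the fetched bytes, added only for illustration. */
    public abstract byte[] getContent();
}

/** Hypothetical HTTP-flavoured implementation that simply stores its status. */
final class HttpResponseSketch extends ResponseSketch {
    private final int status;
    private final byte[] body;

    HttpResponseSketch(int status, byte[] body) {
        this.status = status;
        this.body = body;
    }

    @Override
    public int getCode() {
        return status;
    }

    @Override
    public byte[] getContent() {
        return body;
    }
}

final class GetCodeDemo {
    /** A caller can branch on the code without knowing which protocol produced it. */
    static String describe(ResponseSketch r) {
        int code = r.getCode();
        if (code == 404) {
            return "gone";
        }
        if (code >= 300 && code < 400) {
            return "redirect";
        }
        if (code >= 200 && code < 300) {
            return "ok";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(new HttpResponseSketch(200, new byte[0])));
        System.out.println(describe(new HttpResponseSketch(302, new byte[0])));
        System.out.println(describe(new HttpResponseSketch(404, new byte[0])));
    }
}
```

The numeric ranges used in describe() are HTTP conventions; whether other protocols would map their outcomes onto the same ranges is exactly the open question raised in the replies.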

Re: [Nutch-dev] Bug in LocalFileSystem

2004-10-07 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: Well, I can speak only for v. 0.6 ... Without patching the LocalFileSystem, and when using different partitions for temporary files and target files you _will_ trigger the bug. So, yes, it's likely that this is the reason. Michael, should I commit the

[Nutch-dev] (no subject)

2004-10-07 Thread Smarty

Re: [Nutch-dev] HTTP->HTTPS redirects

2004-10-07 Thread Doug Cutting
Clermont, Roger C EIS-NWD Contractor wrote: My idea is along the lines of throwing a RedirectException from the protocol, catching it in the appropriate place and then handling it. As a design rule, Exceptions should not be used for normal control flow, and redirects are arguably normal control fl
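The design rule quoted above (redirects as ordinary results rather than exceptions) might look something like the sketch below. It is not the code discussed in the thread: FetchResult, RedirectSketch, fetchOnce, resolve and the MAX_REDIRECTS limit are all hypothetical names and values chosen for illustration, and the "web" map is a stand-in for a real protocol plugin.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fetch outcome; a redirect is just another value, not an exception. */
final class FetchResult {
    final int code;        // protocol status code, e.g. 200 or 302
    final String location; // redirect target, or null when there is none

    FetchResult(int code, String location) {
        this.code = code;
        this.location = location;
    }

    boolean isRedirect() {
        return code >= 300 && code < 400 && location != null;
    }
}

final class RedirectSketch {
    /** Assumed limit, chosen arbitrarily for this sketch. */
    static final int MAX_REDIRECTS = 3;

    /** Stand-in for a protocol plugin: looks up canned responses in a map. */
    static FetchResult fetchOnce(Map<String, FetchResult> web, String url) {
        FetchResult r = web.get(url);
        return r != null ? r : new FetchResult(404, null);
    }

    /** Follows redirects with a plain loop; no exception is used for control flow. */
    static String resolve(Map<String, FetchResult> web, String url) {
        String current = url;
        for (int i = 0; i <= MAX_REDIRECTS; i++) {
            FetchResult r = fetchOnce(web, current);
            if (!r.isRedirect()) {
                return current + " -> " + r.code;
            }
            current = r.location;
        }
        return current + " -> too many redirects";
    }

    public static void main(String[] args) {
        Map<String, FetchResult> web = new HashMap<>();
        web.put("http://example.com/", new FetchResult(302, "https://example.com/"));
        web.put("https://example.com/", new FetchResult(200, null));
        System.out.println(resolve(web, "http://example.com/"));
    }
}
```

Because the caller inspects the result and loops, the HTTP->HTTPS hop is handled in one place, and a redirect cycle ends at the limit instead of unwinding the stack.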

Re: [Nutch-dev] Getting HTTP status code from Fetcher?

2004-10-07 Thread Doug Cutting
[EMAIL PROTECTED] wrote: I'm trying to get the HTTP status codes (e.g. 200, 404, 302...) from Fetcher's output, but can't find a way to access them. I poked around the sources a bit and see that the HTTP status codes are used inside net/nutch/protocol/http/Http.java, but I don't see them used

Re: [Nutch-dev] rtf being filtered by regex templates

2004-10-07 Thread Doug Cutting
Applied. Thanks. Doug Andy Hedges wrote: I noticed that files with the rtf extension are being filtered by the regex templates. So I have produced this patch to remove them. If someone could commit this then that would be great. Thanks, Andy Index: conf/crawl-urlfilter.txt.template =

Re: [Nutch-dev] Bug in LocalFileSystem

2004-10-07 Thread Doug Cutting
Andrzej Bialecki wrote: Well, I can speak only for v. 0.6 ... Without patching the LocalFileSystem, and when using different partitions for temporary files and target files you _will_ trigger the bug. So, yes, it's likely that this is the reason. Michael, should I commit the patch or are you pl

Re: [Nutch-dev] Re: [Nutch-general] 404

2004-10-07 Thread Doug Cutting
Andy Hedges wrote: What I would like, I think, is to completely remove all record of 404s at fetch time as and when they are found from the db and the segments. Stepping back a bit, I think what you'd like is if the most recent attempt to fetch a URL resulted in a 404 then the URL should not sho

Re: [Nutch-dev] patch for "View as Plain Text"

2004-10-07 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Attached is a patch to provide "View as Plain Text" feature. A new op OP_PARSETEXT is introduced in DistributedSearch.java. text.jsp is added. This looks great to me. Thanks again! Doug

Re: [Nutch-dev] bug in distributed search

2004-10-07 Thread Doug Cutting
[EMAIL PROTECTED] wrote: The NullPointerException was thrown due to a null string site in UTF8.writeString(out, site) at line 63 in Hit.java. The bug can be fixed by if (site == null) site = ""; in either Hit.java or UTF8.java. Which one is the preferred place? I agree that Hit.java is preferred. Tha
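The null guard described above can be illustrated with a small stand-alone sketch. It does not reproduce Hit.java or Nutch's UTF8 class: HitSketch is a hypothetical class, and java.io.DataOutputStream.writeUTF is used here only as a stand-in for UTF8.writeString, since it likewise throws a NullPointerException when handed a null string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a search hit with an optional site field. */
final class HitSketch {
    private final String site; // may be null when no site is known

    HitSketch(String site) {
        this.site = site;
    }

    /** Guards in the caller, as preferred in the thread: null becomes "". */
    void write(DataOutputStream out) throws IOException {
        String s = (site == null) ? "" : site;
        out.writeUTF(s); // stand-in for UTF8.writeString(out, site)
    }

    /** Helper for the demo: number of bytes the hit serializes to. */
    static int serializedLength(String site) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            new HitSketch(site).write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // A null site is written as an empty string instead of failing.
        System.out.println(serializedLength(null));
        System.out.println(serializedLength("example.com"));
    }
}
```

Putting the guard in the hit rather than in the string writer keeps the writer strict, so other callers that pass null by mistake still fail loudly.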

Re: [Nutch-dev] Crawling without script and cygwin (win32)

2004-10-07 Thread Doug Cutting
Meyer-Trexler, Carsten (K-DOK-1/E) wrote: I'm continuously getting "java.io.IOException: File does not exist" when using the crawl command. Can you please send the full output? So I tried to use the java-class directly with the command: "java -classpath %classpath%;nutch-nightly.jar net.nutch.tools

[Nutch-dev] [ nutch-Bugs-1042000 ] ontology supported query refinement

2004-10-07 Thread SourceForge.net
Bugs item #1042000, was opened at 2004-10-07 05:54 Message generated for change (Comment added) made by mjpan You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1042000&group_id=59548 Category: None Group: None Status: Open Resolution: None Priority: 5 Submi

Re: [Nutch-dev] Are we getting - Block-Level Link Analysis

2004-10-07 Thread Antonio Gulli
tigger . wrote: Hi All, are the people at Nutch going down the road of Block-Level Link Analysis? Maybe we could have it as an option. Paul Do you mean block level like the http://www.stanford.edu/~sdkamvar/papers/blockrank.pdf ? Does someone have any information if Google real

[Nutch-dev] [ nutch-Bugs-1042000 ] ontology supported query refinement

2004-10-07 Thread SourceForge.net
Bugs item #1042000, was opened at 2004-10-07 01:54 Message generated for change (Comment added) made by otis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1042000&group_id=59548 Category: None Group: None Status: Open Resolution: None Priority: 5 Submit

Re: [Nutch-dev] Index Errors

2004-10-07 Thread Jason Boss
Hey Paul, I just went through a lot of segments that I found I had some issues with. If any of the data gets corrupted at all, I have seen the java.io.FileNotFoundException. Upon trying to merge a lot of segments, I think I cleaned up a lot of corruption and saved what segments I could.

Re: [Nutch-dev] error on fetch

2004-10-07 Thread Jason Boss
Hey Paul, I am by no means any wealth of knowledge, as I constantly get owed on this. For the time being, on the same version you are running, I am having fairly good luck. I sent a crawler for the page that gave you a java.net.SocketTimeoutException error and it returned fine without locking up th