> I need to be able to get to HTTP status codes.
> Response is nice and generic, but for my purposes it also hides too
> much.
>
> > >Most protocols return status codes in responses, I think.
>
> If my assumption above is correct, then adding public abstract int
> getCode() method would make sens
Hi,
wouldn't it be sensible to put the wiki content under Creative
Commons as well?
Sorry, I don't know how to change the template for new page creation;
maybe this can only be done by the admin.
Stefan
---
This SF.net email is sponsored by
--- Andy Hedges <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
>
> >I like the elegance of this.
> >Would it also be possible to add this to Response:
> >
> >public abstract int getCode();
> >
> >
> What do you intend to do with the code? I think the idea is that all
> protocol codes/o
Doug,
I would like to suggest creating a new project "plugins" in our cvs
and storing most of the plugins there.
I like having a single tree. That way, when an API changes, it's easy
to update and test all of the plugins.
Having most of the plugins in a separate cvs project does not mean that we
ca
[EMAIL PROTECTED] wrote:
I like the elegance of this.
Would it also be possible to add this to Response:
public abstract int getCode();
What do you intend to do with the code? I think the idea is that all
protocol codes/outcome can be mapped to the response class below.
Most protocols return s
Doug Cutting wrote:
Andy Hedges wrote:
What I would like, I think, is to completely remove all record of
404s at fetch time as and when they are found from the db and the
segments.
Stepping back a bit, I think what you'd like is if the most recent
attempt to fetch a URL resulted in a 404 then t
Stefan Groschupf wrote:
I would like to suggest creating a new project "plugins" in our cvs and
storing most of the plugins there.
I like having a single tree. That way, when an API changes, it's easy
to update and test all of the plugins.
I think that it would be good to have a minimal nutch an
Hi,
it is very nice to see that more and more plugins are being born.
I would like to suggest creating a new project "plugins" in our cvs
and storing most of the plugins there.
I think that it would be good to have a minimal nutch and people then
can add ftp, pdf, language cc and other plugins.
From my
I like the elegance of this.
Would it also be possible to add this to Response:
public abstract int getCode();
Most protocols return status codes in responses, I think.
Otis
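A minimal sketch of the proposal above, assuming Response is an abstract class that each protocol implements. Only the name Response and the signature `public abstract int getCode();` come from the thread; HttpResponse, StatusDemo and isNotFound are hypothetical names used for illustration and are not the actual Nutch classes.

```java
// Hypothetical sketch: not the actual Nutch Response class.
abstract class Response {
    /** Protocol-specific status code, e.g. 200 or 404 for HTTP. */
    public abstract int getCode();
}

// Hypothetical HTTP implementation that simply stores the code it was given.
class HttpResponse extends Response {
    private final int code;

    HttpResponse(int code) {
        this.code = code;
    }

    @Override
    public int getCode() {
        return code;
    }
}

class StatusDemo {
    /** Returns true when the response carries the HTTP "not found" code. */
    static boolean isNotFound(Response r) {
        return r.getCode() == 404;
    }
}
```

With an accessor like this, a caller could react to a 404 without knowing which protocol produced the response, although the meaning of a numeric code would still differ from protocol to protocol.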
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Clermont, Roger C EIS-NWD Contractor wrote:
> > My idea is along the lines
Doug Cutting wrote:
Andrzej Bialecki wrote:
Well, I can speak only for v. 0.6 ... Without patching the
LocalFileSystem, and when using different partitions for temporary
files and target files you _will_ trigger the bug. So, yes, it's
likely that this is the reason.
Michael, should I commit the
Clermont, Roger C EIS-NWD Contractor wrote:
My idea is along the lines of throwing a
RedirectException from the protocol, catching it in the appropriate place and
then handling it.
As a design rule, Exceptions should not be used for normal control flow,
and redirects are arguably normal control fl
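One way to read the design rule above is that a redirect can be reported as an ordinary return value and followed in a loop, rather than thrown as an exception. The sketch below illustrates that idea only; RedirectSketch, Result, fetch and fetchFollowing are hypothetical names, and the lookup table stands in for a real protocol fetch.

```java
import java.util.Map;

// Hypothetical sketch: redirects handled as return values, not exceptions.
class RedirectSketch {
    /** Outcome of one fetch: a status code and, for redirects, a target. */
    static final class Result {
        final int code;
        final String location;

        Result(int code, String location) {
            this.code = code;
            this.location = location;
        }
    }

    /** Stand-in for a protocol fetch: looks the URL up in a fixed table. */
    static Result fetch(Map<String, Result> pages, String url) {
        Result r = pages.get(url);
        return r != null ? r : new Result(404, null);
    }

    /** A 301 or 302 with a target location counts as a redirect here. */
    static boolean isRedirect(Result r) {
        return (r.code == 301 || r.code == 302) && r.location != null;
    }

    /** Follows at most maxRedirects redirects and returns the last result. */
    static Result fetchFollowing(Map<String, Result> pages, String url, int maxRedirects) {
        Result r = fetch(pages, url);
        for (int i = 0; i < maxRedirects && isRedirect(r); i++) {
            r = fetch(pages, r.location);
        }
        return r;
    }
}
```

The redirect limit keeps a redirect cycle from looping forever; if the limit is reached, the caller simply receives the last redirect result and can decide what to do with it.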
[EMAIL PROTECTED] wrote:
I'm trying to get the HTTP status codes (e.g. 200, 404, 302...)
from Fetcher's output, but can't find a way to access them. I poked
around the sources a bit and see that the HTTP status codes are used
inside net/nutch/protocol/http/Http.java, but I don't see them used
Applied. Thanks.
Doug
Andy Hedges wrote:
I noticed that files with the rtf extension are being filtered by the
regex templates. So I have produced this patch to remove them. If
someone could commit this then that would be great.
Thanks,
Andy
Index: conf/crawl-urlfilter.txt.template
=
Andrzej Bialecki wrote:
Well, I can speak only for v. 0.6 ... Without patching the
LocalFileSystem, and when using different partitions for temporary files
and target files you _will_ trigger the bug. So, yes, it's likely that
this is the reason.
Michael, should I commit the patch or are you pl
Andy Hedges wrote:
What I would like, I think, is to completely
remove all record of 404s at fetch time as and when they are found from
the db and the segments.
Stepping back a bit, I think what you'd like is if the most recent
attempt to fetch a URL resulted in a 404 then the URL should not sho
[EMAIL PROTECTED] wrote:
Attached is a patch to provide "View as Plain Text" feature.
A new op OP_PARSETEXT is introduced in DistributedSearch.java.
text.jsp is added.
This looks great to me. Thanks again!
Doug
[EMAIL PROTECTED] wrote:
The NullPointerException was thrown, due to a null string site
in UTF8.writeString(out, site)
at line 63 in Hit.java
The bug can be fixed by
if (site == null)
site = "";
in either Hit.java or UTF8.java. Which one is preferred place?
I agree that Hit.java is preferred.
Tha
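The null guard discussed above can be sketched as follows. This is an illustration only: HitSketch and writeSite are hypothetical names, and the standard `DataOutput.writeUTF` is used as a stand-in for Nutch's `UTF8.writeString`, whose exact behaviour is not shown in the thread.

```java
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of the guard proposed for Hit.java.
class HitSketch {
    /** Writes the site string, substituting "" when the site is null. */
    static void writeSite(DataOutput out, String site) throws IOException {
        if (site == null) {
            site = ""; // guard from the thread: never pass null to the writer
        }
        // Assumption: writeUTF stands in for UTF8.writeString(out, site).
        out.writeUTF(site);
    }
}
```

Placing the guard at the call site keeps the string writer strict about its input, which matches the preference for Hit.java expressed in the reply.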
Meyer-Trexler, Carsten (K-DOK-1/E) wrote:
I'm continuously getting "java.io.IOException: File does not exist" when
using the crawl command.
Can you please send the full output?
So I tried to use the java-class directly with the
command:
"java -classpath %classpath%;nutch-nightly.jar net.nutch.tools
Bugs item #1042000, was opened at 2004-10-07 05:54
Message generated for change (Comment added) made by mjpan
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1042000&group_id=59548
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submi
tigger . wrote:
Hi All
Are the people at Nutch going down the road of Block-Level Link
Analysis? Maybe we could have it as an option.
Paul
Do you mean block level like the
http://www.stanford.edu/~sdkamvar/papers/blockrank.pdf ?
Does someone have any information if google real
Bugs item #1042000, was opened at 2004-10-07 01:54
Message generated for change (Comment added) made by otis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1042000&group_id=59548
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submit
Hey Paul,
I just went through a lot of segments that I found I had some
issues with. If any of the data gets corrupted at all, I have seen the
java.io.FileNotFoundException.
Upon trying to merge a lot of segments, I think I cleaned up a lot of
corruption and saved what segments I could.
Hey Paul,
I am by no means a wealth of knowledge, as I constantly get owned on this.
For the time being on the same version you are running I am having fairly
good luck.
I sent a crawler for the page that gave you a
java.net.SocketTimeoutException error and it returned fine without locking
up th
24 matches