Charitha, what is your value for $s1 in this example? ("echo $s1" to find out)
For more logging detail, try "nutch fetch -logLevel finest $s1"
--matt
On Thu, 20 Jan 2005 00:03:38 -0600, Charitha Tillekeratne
<[EMAIL PROTECTED]> wrote:
> I am trying to use Nutch by following the tutorial on a Wind
I am trying to use Nutch by following the tutorial on a Windows system
using Cygwin. When executing the fetch command I get the following
error (at the end). I added a print statement in LocalFileSystem.java
and found out that it was looking for the file "-l\fetchlist\data".
Any idea on how to fix
1. Does Nutch support URL aliases? That is, can the URL I crawled be
different from the URL I display on the search results page?
2. Does Nutch support file system crawling and database crawling?
What would be the configuration?
3. So far the Nutch documentation I can find is:
1. the tutorial
2.
1. Yup, the outlink config fixes the problem.
2. The segread -fix is one way to save the broken data. This can be a
workaround for the problem. How much time would it take to copy the
data compared to crawling? I think copying data from the local disk is
still faster than restarting a new crawl
Three queries, somewhat related:
1. I'm feeling stupid, but I can't figure out the right syntax for a
file URL for the crawler for Nutch on Windows/Cygwin. Suggestions?
I've tried:
file:///c:/foo
file:///c|/foo [Netscape style]
and a few others.
2. A while ago, Andy Hedges mentioned that h
Bugs item #1105652, was opened at 2005-01-19 18:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1105652&group_id=59548
Category: fetcher
Group: None
Status: Open
Resolution: N
Antigen for Exchange found Unknown infected with HTML/Bofr virus.
The file is currently Removed. The message, "[Nutch-dev] Confirmation", was
sent from [EMAIL PROTECTED] and was discovered in First Storage Group\Nikola
Midich\Inbox\Mailing Lists\Nutch-Dev
located at Perfectinfo/First Administrati
Bugs item #1077261, was opened at 2004-12-01 21:34
Message generated for change (Comment added) made by mkangas
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1077261&group_id=59548
Category: tools
Group: mainline
Status: Open
Resolution: None
Priority:
I have seen different samples of Nutch in the wiki. Are there any samples
of Nutch in its default state (out of the box)?
Thanks
Nutch wrote:
>> Hi,
>>
>> I have been testing Nutch on our Intranet site but since we have "&" in
>> our URLs Nutch doesn't work very well. Is there some way of getting Nutch
>> to accept URLs containing &?
> Yes, just add it to the allowed characters in the regex-urlfilter.t
Bugs item #1077261, was opened at 2004-12-02 03:34
Message generated for change (Comment added) made by abial
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1077261&group_id=59548
Category: tools
Group: mainline
Status: Open
Resolution: None
Priority: 5
Bugs item #1077258, was opened at 2004-12-02 03:27
Message generated for change (Comment added) made by abial
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1077258&group_id=59548
Category: web db
Group: mainline
>Status: Closed
>Resolution: Accepted
Pri
Bugs item #1077173, was opened at 2004-12-02 00:44
Message generated for change (Comment added) made by abial
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1077173&group_id=59548
Category: indexer
Group: mainline
>Status: Closed
Resolution: None
Priorit
Nutch wrote:
>> Hi,
>>
>> I have been testing Nutch on our Intranet site but since we have "&" in
>> our URLs Nutch doesn't work very well. Is there some way of getting Nutch
>> to accept URLs containing &?
> Yes, just add it to the allowed characters in the regex-urlfilter.txt
> config file
Thank you all for your input.
--- Ken Meltsner <[EMAIL PROTECTED]> wrote:
> [...] have Windows, while Java or C++ solutions (POI, *WVWare*, OpenOffice)
> run on Linux/Unix as well.
I didn't know about WVWare.
Did you have a chance to use it? How does it compare to POI or OpenOffice/UNO?
Nutch wrote:
Hi,
I have been testing Nutch on our Intranet site but since we have "&" in
our URLs Nutch doesn't work very well. Is there some way of getting Nutch
to accept URLs containing &?
Yes, just add it to the allowed characters in the regex-urlfilter.txt
config file.
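
For anyone following this thread, here is a rough sketch of how a filter file of this kind is evaluated: each rule is a "+" (accept) or "-" (reject) followed by a regular expression, the first rule that matches decides, and a URL that matches no rule is ignored. This is a Python illustration, not Nutch's own code. The STRICT and RELAXED rule sets, the function names, and the example URL are all made up for the example; check your own regex-urlfilter.txt to see which rule actually rejects "&" in your setup.

```python
import re

# Hypothetical rule sets for illustration only; these are NOT the contents
# of the regex-urlfilter.txt shipped with Nutch.
STRICT = """
# reject URLs containing any of these characters
-[?*!@=&]
# accept anything else
+.
"""

RELAXED = """
# same rule with "&" taken out of the rejected set
-[?*!@=]
# accept anything else
+.
"""


def parse_rules(text):
    """Turn filter-file text into a list of (allow, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL matching no rule is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


if __name__ == "__main__":
    url = "http://intranet.example/docs/a&b.html"  # hypothetical URL
    print(accepts(parse_rules(STRICT), url))
    print(accepts(parse_rules(RELAXED), url))
```

With the strict rules the example URL is rejected by the first rule; with "&" removed from the rejected set it falls through to the accept-all rule.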
--
Best regards,
Hi,
I have been testing Nutch on our Intranet site but since we have "&" in
our URLs Nutch doesn't work very well. Is there some way of getting Nutch
to accept URLs containing &?
Thanks
Fredrik
Gavin Chan wrote:
We are evaluating nutch for our internet and intranet crawling.
However, I am encountering the following problems/questions when using
it and would like to seek your comments/suggestions:
1. Not all URLs in an HTML page are crawled/indexed.
* After massaging the URL filters and
18 matches