My replies inline.
On Fri, Apr 4, 2008 at 12:47 PM, Vineet Garg <[EMAIL PROTECTED]> wrote:
> Hi
>
> Thanks for the response. Maybe I was not clear in expressing myself.
>
> I am crawling a parent directory in my 'home' on a Linux machine, so my
> URLs have to begin with file: and not http:.
I have tried that, but it does not work.
[EMAIL PROTECTED] wrote:
Hello Vineet,
Try using regex-urlfilter instead of crawl-urlfilter.
Regards,
Arkadi
-----Original Message-----
From: Vineet Garg [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2008 10:34 PM
To: nutch-user@lucene.apach
Hi
Thanks for the response. Maybe I was not clear in expressing myself.
I am crawling a parent directory in my 'home' on a Linux machine, so my
URLs have to begin with file: and not http:. I have defined the file
protocol, and the crawl itself is okay. My question is though I have modified
the c
Find my reply inline.
On Wed, Apr 2, 2008 at 5:04 PM, Vineet Garg <[EMAIL PROTECTED]> wrote:
> Hi,
> I am using Nutch to crawl the local file system. I am crawling with bin/nutch
> crawl urls -dir crawl -depth 5 -topN 500 >& crawl.log.
> But Nutch is fetching files, e.g. .css or .png files, which I ha
Hello Vineet,
Try using regex-urlfilter instead of crawl-urlfilter.
Regards,
Arkadi
> -----Original Message-----
> From: Vineet Garg [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 02, 2008 10:34 PM
> To: nutch-user@lucene.apache.org
> Subject: Nutch fetching skipped files
>
> Hi,
> I am usi
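
For anyone following the thread: the sketch below illustrates how a rule list in the "+regex / -regex" style of Nutch's URL filter files could decide which file: URLs are kept. It is only a Python model of the idea, not Nutch code. The two rules, the function name accept, and the sample paths are made up for illustration, and the "first matching rule wins, no match means reject" behaviour is an assumption that should be checked against the comments in your own regex-urlfilter.txt.

```python
import re

# Hypothetical rules, written in the "+regex / -regex" style of a URL filter file.
# Order matters in this sketch: the first rule whose regex matches decides.
RULES = [
    r"-\.(css|png)$",  # skip stylesheets and PNG images
    r"+^file:",        # accept everything else on the local file system
]


def accept(url, rules=RULES):
    """Return True if the first matching rule is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False


if __name__ == "__main__":
    for url in (
        "file:/home/user/docs/index.html",
        "file:/home/user/docs/style.css",
    ):
        print(url, accept(url))
```

If the skip rule were listed after the accept rule, the .css URL would be accepted in this model, so rule order is worth checking when skipped files are still being fetched.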