Did you check crawl-urlfilter.txt? All the domain names that you'd like to
crawl have to be mentioned there.
e.g.
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*mersin\.edu\.tr/
+^http://([a-z0-9]*\.)*tubitak\.gov\.tr/
Also check the property db.ignore.external.links in nutch-default.xml. It
should be set to false so that links to external hosts are kept.
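For completeness, the override would live in nutch-site.xml; a sketch of the property block (the description text is mine, not copied from nutch-default.xml):

```xml
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
  <description>If false, outlinks to other hosts are kept, so the linkdb
  can record links from mersin.edu.tr pages to tubitak.gov.tr.</description>
</property>
```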
I have a problem with Nutch. My project is link analysis. I crawled
"www.mersin.edu.tr", analysed the linkdb, and saw all the links within
mersin.edu.tr. But I also need to find links to other sites, for example
www.tubitak.gov.tr, and I cannot find them. How can I find these links?
Please help.
Excellent, I'll have a look at the patch.
Thanks, T
On 23/03/2010 19:25, Julien Nioche wrote:
Hi Toby,
Have a look at https://issues.apache.org/jira/browse/NUTCH-655
The patch has been committed to the SVN repository and should allow you to
do exactly what you described.
HTH
Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com
On 23 March 2010 17:35, Toby Cole wrote:
Hi Nutch list,
We're using Nutch for what basically amounts to an intranet crawl (just
a few domains). We have a huge inject list, as the site contains a lot of
Ajax pages.
What I'm wondering is: is there a simple way of getting the injected
URLs to have a higher default score?
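If I remember the NUTCH-655 patch correctly, the Injector reads optional tab-separated key=value metadata after each seed URL, with nutch.score as a reserved key that seeds the page's initial score. A sketch of such a seed file (URLs and values are illustrative):

```
http://www.example.com/ajax/page1	nutch.score=10.0
http://www.example.com/ajax/page2	nutch.score=10.0
```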
On Thu, Apr 23, 2009 at 12:09 PM, askNutch wrote:
> Can Hadoop run in a VMware machine?
I am running a Hadoop cluster where each node is a VMware virtual
machine. So, yes, it is possible. As long as you are able to connect
to sockets from one virtual machine to another, I don't see why you
couldn't.
askNutch wrote:
hi kubes:
Thank you for your answers! I'm sorry that I didn't express my question
clearly. I run Nutch on only one machine, and I can't debug Hadoop in
Nutch, because Hadoop is only present as a jar in the lib directory.
How can I debug the Hadoop source in Nutch?
Build Hadoop from scratch.
And to my surprise, the tutorial "RunNutchIn…"
Dennis Kubes wrote:
>
>
> Alexander Aristov wrote:
>
>> Why not to post such mails personally if you address to single person?
>>
>> Want to know other opinions?
>>
>
> I would :)
>
> Dennis
>
>
>
>> Best Regards
>> Alexander Aristov
askNutch wrote:
hi Kubes:
You are the expert! Can you tell me what development environment you use
to develop Nutch?
Linux, Ubuntu (usually the most recent), Sun JDK, Core 2 laptop (although
hoping to upgrade to a sagernotebook.com quad core soon).
hi Kubes:
You are the expert!
Can you tell me what development environment you use to develop Nutch,
such as an IDE etc.?
I want to debug Nutch.
Thank you!
inalasuresh wrote:
Hi,
I uncommented refine-query.jsp and refine-query-init.jsp in search.jsp.
I searched for a bike keyword and it gave results. Before that I tried
running the application both with and without the comments, but it gave
the same result in both cases. So could anyone please suggest what the…
inalasuresh wrote:
Hi,
Can anyone help me? I am new to Nutch.
What is the use of subcollections.xml, and when is it called?
Please respond to my query.
Thanks & regards,
Suresh
Hi,
Subcollections is a plugin for tagging indexed documents whose URLs match
the patterns defined in subcollections.xml, so that searches can later be
restricted to a named subcollection.
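For reference, a subcollection is defined in conf/subcollections.xml along these lines (the name and whitelist URL here are illustrative; whitelist/blacklist entries are URL prefixes that include or exclude documents):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<subcollections>
  <subcollection>
    <name>docs</name>
    <id>docs</id>
    <whitelist>http://www.example.com/docs/</whitelist>
    <blacklist></blacklist>
  </subcollection>
</subcollections>
```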
fetcher.verbose
false
If true, fetcher will log more verbosely.
- Original Message
From: kevin <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Thursday, December 21, 2006 10:55:38 PM
Subject: Hi...How to set Nutch-0.8.1 to save logs into log files when running
the
Hi,
How do I set Nutch 0.8.1 to save logs into log files when running the
crawl job? Is this set in nutch-site.xml, or in another configuration
file?
Thanks for your help in advance!
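Nutch 0.8.x logs through log4j rather than through nutch-site.xml; logging destinations are set in conf/log4j.properties. A minimal sketch that sends crawl output to a daily-rolling file (the file name and appender name are just examples):

```properties
log4j.rootLogger=INFO,DRFA
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=logs/hadoop.log
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```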
--
kevin
hi
I have a problem now. I want to crawl only the pages whose URLs contain
"...item_detail", but I must start crawling from www..com. If I set rules
in crawl-urlfilter.txt to match only those pages, I can't get the pages I
want at all. So what should I do now?
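URL filters in crawl-urlfilter.txt are applied before fetching, so if only the item_detail pages are accepted, the listing pages that link to them are never fetched and their outlinks are never discovered. A common workaround is to accept the whole host at crawl time and restrict to item_detail pages at index time. The rules are ordinary regexes tried in order, first match wins; a quick way to sanity-check a rule set outside Nutch (the host and URLs below are made up, and Python's re is standing in for the Java regex engine Nutch actually uses):

```python
import re

# Ordered filter rules as in crawl-urlfilter.txt: '+' accepts, '-' rejects.
# Accepting the whole host lets listing pages be fetched and parsed,
# so the item_detail pages they link to can be discovered.
rules = [
    ("+", re.compile(r"^http://([a-z0-9]*\.)*example\.com/")),
    ("-", re.compile(r".")),  # reject everything else
]

def accepts(url):
    """Return True if the first rule matching the URL is an accept rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False

print(accepts("http://www.example.com/item_detail?id=42"))  # True: on-site
print(accepts("http://other.org/item_detail?id=42"))        # False: off-site
```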
Andrzej,
Cheers! Good to know. Thanks!
r/d
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 5:01 PM
To: nutch-user@lucene.apache.org
Subject: Re: hi all
Dan Morrill wrote:
> Since you are using Luke to see the index: there is nothing that states
> they support any character set. When you run your search, do you see good
> characters, or do you see gork? Luke may not be able to understand the
> ISO character sets. (Hypothesis.)
Hi,
(I'm the guy behind Luke.)
Luke uses UTF-8, because that's what Lucene uses internally.
Subject: Re: hi all
Thanks for the advice!
Now I know what's up. But my OS is WinXP (Chinese); it supports Chinese
very well. And I used Luke to see the index, and there are messy
characters when crawling the Chinese webs. So, how can I deal with it?
Any reply will be appreciated.
On 4/2/06
Your OS may not have the language pack installed to properly support
Chinese. Personally, I would download the language pack for your operating
system and see what happens.
r/d
-Original Message-
From: kauu [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 7:48 AM
To: nutch-user@lucene.apache.org
Subject: hi all
hi all:
I've got a big problem when crawling FTP. It seems that Nutch couldn't
parse or index the files named in Chinese. The command looks like:
bin/nutch crawl urls.txt -dir test.dir
(I've modified crawl-urlfilter.txt:)
# skip file:, ftp:, & mailto: urls
#-^(file|ftp|mailto):
Hi Kumar,
I'm not a Nutch expert, but I think you'd need to re-crawl all URLs to
determine if they changed since the last crawl, yes? Depending on what
you're doing, you might re-crawl URLs that are most frequently accessed
by users, or keep track per crawl of how often pages change.
Hi,
I am trying to create a site which will crawl a handful of sites. I am
using whole-web crawling to crawl these sites. The problem is I don't
know how to do incremental crawling, i.e. only fetch and update the
webpages which have changed since they were last crawled.
Thank you all.
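A rough sketch of one incremental round with the 0.8-style command-line tools (directory names are illustrative; generate only selects URLs whose fetch interval has elapsed, so rounds after the first act as a recrawl):

```
# one recrawl round: select due URLs, fetch them, fold results back in
bin/nutch generate crawl/crawldb crawl/segments
segment=`ls -d crawl/segments/* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb crawl/crawldb $segment
```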