Hi all
This is my crawl-urlfilter.txt
==
# The url filter file used by the crawl command.
# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.
# Each non-comment, non-blank line con
I think Lucene would be better because you get both Lucene and Nutch users.
But Nutch also works.
On 8/21/07, Lyndon Maydwell <[EMAIL PROTECTED]> wrote:
> Sounds like a good idea to me.
>
--
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies
Sounds like a good idea to me.
> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.htm
>
> The link above is not working...
Change the extension from htm to html and it works.
JohnM
> Naess, Ronny wrote:
> >
> > Take a look at this article
> > http://today.java.net/pub/a/today/2006/02/16/introduction-to-
In the trunk I can see some extra folders like 'contrib', 'site'. What
are they? These are not present in nutch-0.9.
Is there an IRC channel for nutch-users? If not, could we join
#nutch at irc.freenode.net to discuss and help each other out?
Find my replies inline.
On 8/21/07, Naresh Saxena <[EMAIL PROTECTED]> wrote:
> I am unable to find out where to begin. Kindly please help me.
>
> 1. How can I build nutch-0.9.war after developing a patch and patching it?
'ant war' should do it. See the target name 'war' in 'build.xml'.
> 2. Wher
I am unable to find out where to begin. Kindly please help me.
1. How can I build nutch-0.9.war after developing a patch and patching it?
2. Where are the files for the Nutch web gui located in the source?
On 8/21/07, Michael Wechner <[EMAIL PROTECTED]> wrote:
> Naresh Saxena wrote:
>
> >hi,
> >
Naresh Saxena wrote:
Hi,
This is me posting again since I did not get any response. I know it
is too soon to expect a response and I am probably being a little too
impatient, but I would just like a "yes" or "no" from you all:
are result page navigation links available at all?
Hi,
This is me posting again since I did not get any response. I know it
is too soon to expect a response and I am probably being a little too
impatient, but I would just like a "yes" or "no" from you all:
are result page navigation links available at all? If not
available, I would
Hi friends,
I am facing a problem with Nutch. I don't like the default search
results page of Nutch. It does not have any navigation links such as page
1, 2, 3, 4, etc., the way Google, Yahoo, etc. do.
Is there any patch for it?
Cheers,
Naresh
Matt,
(I didn't notice your message until this morning...)
I implemented this as well but I am using Nutch 0.8 so that the generated
Lucene index version matches what our front end searcher is using. I
implemented this in a manner similar to scoring. The first step was to add
a crawl depth fiel
On 8/21/07, Smith Norton <[EMAIL PROTECTED]> wrote:
> Thank you for the quick replies. If I do not wish to use svn and the
> svn diff am I allowed to submit patches against Nutch 0.9 generated
> using diff -au command?
Of course you are allowed :). However, most developers use latest
trunk so send
Thank you for the quick replies. If I do not wish to use svn and the
svn diff am I allowed to submit patches against Nutch 0.9 generated
using diff -au command?
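For anyone who has not produced a patch this way before, here is a minimal sketch of what `diff -au` emits, using two throwaway files in a scratch directory. The file names and contents are made up for illustration and are not part of Nutch.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical original file and an edited copy of it.
printf 'line one\nline two\n' > example.txt.orig
printf 'line one\nline two, edited\n' > example.txt

# -a treats the files as text, -u asks for unified format.
# diff exits with 1 when the files differ, so only a status above 1 is an error.
status=0
diff -au example.txt.orig example.txt > my-change.patch || status=$?
[ "$status" -le 1 ] || exit "$status"

# Show the generated patch.
cat my-change.patch
```

The resulting file is a unified diff, the same general format that `svn diff` prints, so it can be attached to a mail or an issue like any other patch.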
On 8/21/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> On 8/21/07, Smith Norton <[EMAIL PROTECTED]> wrote:
> > Is there any guideline for
On 8/21/07, Smith Norton <[EMAIL PROTECTED]> wrote:
> Is there any guideline for submission of Patches? Should the patch be
> against the last stable version, i.e Nutch 0.9 or something else like
> the latest code in the CVS?
Nutch wiki has a good article: http://wiki.apache.org/nutch/HowToContrib
Smith Norton wrote:
Is there any guideline for submission of Patches? Should the patch be
against the last stable version, i.e Nutch 0.9 or something else like
the latest code in the CVS?
IIRC Nutch is using SVN (http://subversion.tigris.org/faq.html)
I also keep finding about 'trunk' ver
Hi,
There was a problem in my config files and I have fixed it,
but I am still getting the same error.
This is part of the log, which says it is not including certain plugins,
including Protocol-smb...
2007-08-20 10:15:28,891 DEBUG plugin.PluginRepository - parsing:
/var/www/html/nutch9loc/plugins/
Hi,
No, I tried a different search string, but it still returned 0 results.
One more thing: for indexing the local file system, I changed
nutch-site.xml as follows:
plugin.includes
protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)
h
Is there any guideline for submitting patches? Should the patch be
against the last stable version, i.e. Nutch 0.9, or something else like
the latest code in the CVS?
I also keep coming across the term 'trunk' very often. I am a newbie to the
open source way of doing projects, so it would be great if you coul
You are searching for 'apache' in the search results. Are you sure the
word 'apache' should exist in the search results?
You can try some other string instead of 'apache' that you know would
surely exist in one of the websites that you have crawled.
There are a number of other things that could g
Yes, thanks, that solved my problem. However, to check the
integrity of the indexes I ran the following command:
bin/nutch org.apache.nutch.searcher.NutchBean apache
but it returns 0 hits. Can you please tell me what I am missing?
Thanks in advance.
Regards,
Sachin.
> You need to
You need to set the following properties in 'conf/nutch-site.xml'.
In the example below I have left the agent description, agent URL,
etc. empty, but ideally you should set them so that the owner of a
website can find out who is crawling the site and how to reach them.
http.agent.name
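As a rough illustration of what such a block can look like, the sketch below writes a minimal set of properties to a scratch file so it can be reviewed before anything is merged into 'conf/nutch-site.xml'. The value `MyTestCrawler` and the scratch file are placeholders made up for this example. The `http.agent.description`, `http.agent.url` and `http.agent.email` names are assumed to match the ones in your nutch-default.xml, so check them against your copy.

```shell
#!/bin/sh
# Write the example to a scratch file instead of overwriting a real config.
out=$(mktemp)

cat > "$out" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- placeholder value: replace with your crawler's name -->
    <value>MyTestCrawler</value>
  </property>
  <property>
    <name>http.agent.description</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
  </property>
</configuration>
EOF

# Print the file for review.
cat "$out"
```

Filling in the description, URL and email values is what lets a site owner identify and contact whoever runs the crawler.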