Date: Sunday, February 8, 2015 at 10:56 AM
To: "user-ow...@nutch.apache.org"
Subject: Newbie
>I am new to Nutch - is there a learning resource anywhere?
>
>Thanks
>Trevor
Hi everyone,
This is my first post so apologies if this is not the correct question to ask.
I have followed the wiki tutorial and I am getting the error below. I am
running in local mode and don't have Hadoop installed. Can you please help,
as I have no clue what's going wrong.
Thanks.
Nish
Hello everyone, I'm a newbie to nutch... sorry if the question is silly...
I've installed Nutch according to the steps of the official tutorial.
Everything seems ok, and the crawl completes (just with some errors on
specific pages), but I cannot get any result through the browser s
Hi Guys,
I am attempting to run Nutch using Cygwin, and I am having the following
problem:
P.S. I added hadoop-core to the lib folder already.
I appreciate any insight or comment you guys may have.
$ bin/crawl -i urls/ TestCrawl/ 2
Injecting seed URLs
/cygdrive/c/apache-nutch-1.11/bin/nutch i
Hi all,
I am very new to Nutch, and to Lucene as well. I have a few questions about
Nutch; I know they are very basic, but I could not get clear-cut answers
out of googling. The questions are,
- If I have to crawl just 5-6 web sites or URLs, should I use an intranet
crawl or whole
done with Tika (amongst others)
Lewis
From: Roberto [rmez...@infinito.it]
Sent: 04 May 2011 11:36
To: user@nutch.apache.org
Subject: Newbie: No search result
Hello everyone, I'm a newbie to nutch... sorry if the question is silly...
I've installed Nutch according to the
Thank you very much Lewis! It seems ok now...
The property "searcher.dir" was not set at all (the tutorial does not
mention it...).
I edited /var/lib/tomcat6/webapps/nutch/WEB-INF/classes/nutch-site.xml
this way:
searcher.dir
file:/var/apache-nutch-1.1-bin/crawl.test/
the "file:" pref
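For reference, the fix described here amounts to a property block like the
following in nutch-site.xml. This is an illustrative sketch for a Nutch
1.1-era search web app: the value is the crawl directory quoted in this
thread, and the surrounding XML is the standard property layout, so verify
it against your own nutch-default.xml:

```xml
<!-- Sketch: searcher.dir pointing the search web app at the crawl
     directory from this thread (path is illustrative). -->
<property>
  <name>searcher.dir</name>
  <value>file:/var/apache-nutch-1.1-bin/crawl.test/</value>
</property>
```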
>Now nutch web search works, but just for one of two sites configured
Just to clarify, are you saying that the pages you configured have been
fetched, processed and indexed but do not feature when you submit a query or
that Nutch is failing to fetch one site when you are crawling?
>moreover
Ok, everything seems to work now. I've just created four separate
'conf' and 'url' files (two sites with two language versions each) and
four tomcat nutch instances, following this guide:
http://wiki.apache.org/nutch/GettingNutchRunningWithDebian
Thank you again for your help!
Hi,
I have a manual script for my first crawl; can anyone explain these commands
step by step:
*Initialize the crawldb*
bin/nutch inject urls/
*Generate URLs from crawldb*
bin/nutch generate -topN 80
*Fetch generated URLs*
bin/nutch fetch -all
*Parse fetched URLs*
bin/nutch parse -all
*Update database*
bin/nutch updatedb
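Taken together, the commands in this script form one crawl round. A minimal
annotated sketch: the `run` wrapper is mine (it prints each command when
`bin/nutch` is absent, so the sequence can be dry-run anywhere), and the
flags mirror the ones quoted in the script above.

```shell
# One Nutch crawl round, annotated. 'run' is a hypothetical wrapper that
# prints the command when bin/nutch is absent (dry run) and runs it
# otherwise.
run() {
  if [ -x bin/nutch ]; then bin/nutch "$@"; else echo "bin/nutch $*"; fi
}

run inject urls/        # initialize the crawldb with the seed URLs
run generate -topN 80   # select up to 80 top-scoring URLs to fetch
run fetch -all          # download the generated URLs
run parse -all          # extract text and outlinks from the fetched pages
run updatedb            # merge new and updated URLs back into the crawldb
# Repeating generate/fetch/parse/updatedb crawls one level deeper per round.
```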
Hi Sas,
See response inline :)
On Wed, Jun 15, 2016 at 5:36 AM, wrote:
> From: "Jamal, Sarfaraz"
> To: "'user@nutch.apache.org'"
> Cc:
> Date: Mon, 13 Jun 2016 17:36:44 -0400
> Subject: Newbie Question, hadoop error?
> Hi Guys,
>
> I am
Hi Guys,
I have Nutch 'working', more or less, and I am now ready to index it to Solr.
I already have a solr environment up and running and now wish to index a few
websites.
I have read through the documentation and I believe I have to do something like
this:
Instead of this:
"cp ${NUTCH_RUNTIME_
Because I am also a newbie.
Best of luck with your Nutch learning!
On Mon, Jan 24, 2011 at 9:04 PM, .: Abhishek :. wrote:
> Hi all,
>
> I am very new to Nutch and Lucene as well. I am having few questions about
> Nutch, I know they are very much basic but I could not get clear cut
> a
tes, you can use both cases but an intranet crawl
> gives you more control and speed.
> 2. After the first crawl, the recrawl interval for the same sites is 30 days
> by default (db.fetcher.interval); you can change it to suit your own
> convenience.
> 3. I've no idea about the third question.
>
Subject: Re: Few questions from a newbie
Refer to NutchBean.java for their question. You can run it from the command
line to test the index.
If you use Solr indexing, it is going to be much simpler; they have a Solr
Java client.
Sent from my iPhone
On Jan 24, 2011, at 8:07 PM, Amna
questions from a newbie
How to use solr to index nutch segments?
What is the meaning of db.fetcher.interval? Does this mean that if I run the
same crawl command before 30 days it will do nothing?
Thanks.
Alex.
-Original Message-
From: Charan K
To: user
Cc: user
Sent: Mon, Jan 24
your time!

On Tue, Jan 25, 2011 at 2:33 PM, charan kumar <charan.ku...@gmail.com> wrote:
> db.fetcher.interval: It means that URLs which were fetched in the last
> 30 days will not be fetched. Or: a URL is eligible for refetch
> only after 30 days of the last crawl.
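The default behind this answer lives in nutch-default.xml and can be
overridden in nutch-site.xml. A hedged sketch: in Nutch 1.x-era releases the
property is db.fetcher.interval.default with the value in seconds, while
older docs (and this thread) call it db.fetcher.interval in days, so check
the name and units against your release's nutch-default.xml:

```xml
<!-- Override in nutch-site.xml to change the default refetch interval.
     Property name/units are from Nutch 1.x-era defaults; verify against
     your release. -->
<property>
  <name>db.fetcher.interval.default</name>
  <value>2592000</value> <!-- 30 days, in seconds -->
</property>
```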
e examples are fairly
dated now.
Hope this helps
From: .: Abhishek :. [ab1s...@gmail.com]
Sent: 26 January 2011 03:02
To: markus.jel...@openindex.io
Cc: user@nutch.apache.org
Subject: Re: Few questions from a newbie
Thanks a bunch Markus.
By the way, is
Hi list,
I have given the set of urls as
http://is.gd/Jt32Cf
http://is.gd/hS3lEJ
http://is.gd/Jy1Im3
http://is.gd/QoJ8xy
http://is.gd/e4ct89
http://is.gd/WAOVmd
http://is.gd/lhkA69
http://is.gd/3OilLD
. 43 such urls
And I have run the crawl command bin/nutch crawl urls/ -dir crawl -depth 3
You probably have to literally click on each URL to get the URL it's
referencing. Those are URL shorteners and probably won't play nicely with a
crawler because of the redirection.
Adam
Sent from my iPhone
On Jan 26, 2011, at 8:02 AM, Arjun Kumar Reddy
wrote:
> Hi list,
>
> I have given t
I am developing an application based on Twitter feeds, so 90% of the URLs
will be short URLs.
So it is difficult for me to manually convert all these URLs to the actual
URLs. Do we have any other solution for this?
Thanks and regards,
Arjun Kumar Reddy
On Wed, Jan 26, 2011 at 7:09 PM, Estrada Gr
Hello,
you have to use the short-URL APIs and get the long URLs... it's a bit
complex, as you have to determine whether the URL is short, then determine
the URL-shortening service used, e.g. tinyurl.com, bit.ly or goo.gl, and then
use their respective API and send in the URL and they will return the long
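The first step described here, deciding whether a URL points at a shortening
service at all, can be sketched with a small helper. The function name and
the domain list are mine and far from exhaustive:

```shell
# Hypothetical helper: does this URL point at a known shortening service?
# The domain list is illustrative, not exhaustive.
is_short_url() {
  # strip the scheme, then everything after the first slash, leaving the host
  host=$(printf '%s\n' "$1" | sed -E 's#^[a-z]+://##; s#/.*##')
  case "$host" in
    bit.ly|goo.gl|tinyurl.com|is.gd|t.co) return 0 ;;
    *) return 1 ;;
  esac
}

is_short_url "http://is.gd/Jt32Cf" && echo "short"
is_short_url "https://nutch.apache.org/" || echo "not short"
```

Once a URL is known to be short, the service's own API, or simply following
the HTTP redirect, yields the long form.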
Yeah, hi Mambe,
Thanks for the feedback. I have mentioned the details of my application in
the post above.
I have tried doing this crawling job using PHP multi-cURL and I am getting
results which are good enough, but the problem I am facing is that it is
taking a hell of a lot of time to get the contents of
even if the url being crawled is shortened, it will still lead nutch to the
actual link and nutch will fetch it
Churchill Nanje Mambe
237 77545907,
AfroVisioN Founder, President,CEO
www.camerborn.com/mambenanje
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www
You can put fetching of external and internal links to false and increase
the depth.
I am trying Nutch for the first time. I created an automated Docker setup
to load
Nutch 2 + HBase (I had tried Cassandra but could not get it to work, so I
thought I'd start with HBase to give it a try).
The project is available at https://github.com/bizmate/nutch
and with docker compose you can start
Hi Jamal - don't use managed schema with Solr 6.0 and/or 6.1. Just copy over
the schema Nutch provides and you are good to go.
Markus
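The "copy over the schema" step from this reply might look like the
following; both paths are assumptions, so point them at your own Nutch
runtime and Solr core layout:

```shell
# Sketch of copying the schema Nutch ships to a Solr core.
# NUTCH_RUNTIME and SOLR_CORE defaults are assumptions; override them.
NUTCH_RUNTIME=${NUTCH_RUNTIME:-$HOME/apache-nutch-1.12/runtime/local}
SOLR_CORE=${SOLR_CORE:-/opt/solr/server/solr/nutch}

if [ -f "$NUTCH_RUNTIME/conf/schema.xml" ]; then
  cp "$NUTCH_RUNTIME/conf/schema.xml" "$SOLR_CORE/conf/schema.xml"
else
  echo "schema.xml not found under $NUTCH_RUNTIME/conf" >&2
fi
```

After copying, reload (or restart) the Solr core so the new schema is picked
up.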
-Original message-
> From:Jamal, Sarfaraz
> Sent: Friday 15th July 2016 15:47
> To: user@nutch.apache.org
> Subject: Newbie Nutc
.
Regards
Mike
From: Arjun Kumar Reddy
To: user@nutch.apache.org
Date: 26.01.2011 15:43
Subject: Re: Few questions from a newbie
).
> I think there are four redirects needed to get the given url content. So
> you have to increase the depth for your crawling.
>
> Regards
> Mike
>
> From: Arjun Kumar Reddy
> To: user@nutch.apache.org
> Date: 26.01.2011 15:43
> Subject:
I've been crawling the user groups, and I feel like Nutch can do this by
default, but I just can't seem to crack it.
I want to grab meta tags from indexed pages and insert them in the
database. Specifically, I'll have some meta tags that identify the type of
content on the page, so that I can g
Hi Diego,
On Mon, May 9, 2016 at 2:32 AM, wrote:
>
> From: diego gullo
> To: user@nutch.apache.org
> Cc:
> Date: Sat, 7 May 2016 09:41:00 +0100
> Subject: Newbie trouble - Hbase class not found
> I am trying Nutch for the first time. I created an automated docker setu
Hi Diego,
The PR at https://github.com/apache/nutch/pull/111 will solve your issue.
Thanks
On Mon, May 16, 2016 at 11:40 AM, wrote:
>
> From: diego gullo
> To: user@nutch.apache.org
> Cc:
> Date: Sun, 15 May 2016 20:04:05 +0100
> Subject: Re: Newbie trouble - Hbase class no
sspath? Is it an environment variable?
Thanks,
Sas
-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
Sent: Wednesday, June 15, 2016 11:46 PM
To: user@nutch.apache.org
Subject: [E] Re: Newbie Question, hadoop error?
Hi Sas,
See response inline :)
On W
Hi,
So I'm looking to add standard keyword and description metadata to my index.
I'm referencing NUTCH-809 (https://issues.apache.org/jira/browse/NUTCH-809) and
it includes a patch file that appears to be for a file in the source at the
following location:
src/plugin/index-metatags/src/java/at
The issue is still open. As a result, the patch file has not been applied
to any released version.
Faruk
2011/11/29 John Dhabolt
Whoops, forgot to reply all and left the mailing list out of my response.
- Forwarded Message -
From: John Dhabolt
To: Faruk Berksöz
Sent: Tuesday, November 29, 2011 4:59 PM
Subject: Re: Newbie question about non-trunk plug-in locations
Re: Newbie question about non-trunk plug-in locations
Hi Frank,
Thank you for the reply. Is the original file(s) available somewhere that I can
download and apply the patch to? Since there was a discussion about something
that appears to be broken in the current version without the patch, I was j