utch.apache.org
> Subject: Newbie Nutch/Solr Question(s)
>
> Hi Guy,
>
> I have nutch 'working' relatively, and I am now ready to index it to solr.
>
> I already have a solr environment up and running and now wish to index a few
> websites.
>
> I have read through
Hi Guy,
I have nutch 'working' relatively, and I am now ready to index it to solr.
I already have a solr environment up and running and now wish to index a few
websites.
I have read through the documentation and I believe I have to do something like
this:
Instead of this:
"cp
M
To: user@nutch.apache.org
Subject: [E] Re: Newbie Question, hadoop error?
Hi Sas,
See response inline :)
On Wed, Jun 15, 2016 at 5:36 AM, <user-digest-h...@nutch.apache.org> wrote:
> From: "Jamal, Sarfaraz" <sarfaraz.ja...@verizonwireless.com.invalid>
> To: "'user@nutc
> Date: Mon, 13 Jun 2016 17:36:44 -0400
> Subject: Newbie Question, hadoop error?
> Hi Guys,
>
> I am attempting to run nutch using cygwin,
Is this Nutch 1.11 binary distribution you mean?
> and I am having the following problem:
> Ps. I added Hadoop-core to the lib folder
15 May 2016 20:04:05 +0100
> Subject: Re: Newbie trouble - Hbase class not found
> Hi Lewis
>
> I have changed the build for the docker containers and in the weekend sent
> the PR for the logs folder. The original problem I had is still persistent.
>
> To reproduce
>
>
>
wrote:
>
>> Hi Diego,
>>
>> On Mon, May 9, 2016 at 2:32 AM, <user-digest-h...@nutch.apache.org>
>> wrote:
>>
>> >
>> > From: diego gullo <diegogu...@gmail.com>
>> > To: user@nutch.apache.org
>> > Cc:
>> >
M, <user-digest-h...@nutch.apache.org> wrote:
>
> >
> > From: diego gullo <diegogu...@gmail.com>
> > To: user@nutch.apache.org
> > Cc:
> > Date: Sat, 7 May 2016 09:41:00 +0100
> > Subject: Newbie trouble - Hbase class not found
> > I
Hi Diego,
On Mon, May 9, 2016 at 2:32 AM, <user-digest-h...@nutch.apache.org> wrote:
>
> From: diego gullo <diegogu...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Date: Sat, 7 May 2016 09:41:00 +0100
> Subject: Newbie trouble - Hbase class not found
> I am t
I am trying Nutch for the first time. I created an automated Docker setup
to load
Nutch 2 + HBase (I had tried Cassandra but could not get it to work, so I
thought I would start with HBase to give it a try).
The project is available at https://github.com/bizmate/nutch
and with docker compose you can
Hi,
I have a manual script for my first crawl. Can anyone explain these commands
step by step:
*Initialize the crawldb*
bin/nutch inject urls/
*Generate URLs from crawldb*
bin/nutch generate -topN 80
*Fetch generated URLs*
bin/nutch fetch -all
*Parse fetched URLs*
bin/nutch parse -all
*Update
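A minimal sketch of the four commands quoted above, wrapped in Python only so that the order and the arguments are explicit. The function names (crawl_cycle_commands, run_cycle) and the parameters seed_dir, top_n and nutch are hypothetical names chosen for illustration. The comments are a reading of what each step does, not an authoritative description. The update step is cut off in the post, so it is deliberately left out rather than guessed.

```python
import subprocess


def crawl_cycle_commands(seed_dir="urls/", top_n=80, nutch="bin/nutch"):
    """Return the four quoted commands, in order, as argument lists."""
    return [
        # Initialize the crawldb: load the seed URLs found under seed_dir.
        [nutch, "inject", seed_dir],
        # Generate URLs from the crawldb: select at most top_n URLs to fetch.
        [nutch, "generate", "-topN", str(top_n)],
        # Fetch generated URLs: download every generated batch.
        [nutch, "fetch", "-all"],
        # Parse fetched URLs: extract text and outlinks from what was fetched.
        [nutch, "parse", "-all"],
    ]


def run_cycle(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first one that fails."""
    for argv in commands:
        runner(argv, check=True)
```

From the Nutch runtime directory, run_cycle(crawl_cycle_commands()) would execute the four steps once. Passing a different runner makes it possible to inspect the commands without running them.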
...@merrows.co.uk tre...@merrows.co.uk
Date: Sunday, February 8, 2015 at 10:56 AM
To: user-ow...@nutch.apache.org user-ow...@nutch.apache.org
Subject: Newbie
I am new to Nutch - is there a learning resource anywhere?
Thanks
Trevor
Hi everyone,
This is my first post so apologies if this is not the correct question to ask.
I have followed the wiki tutorial and I am getting the below error. I am
running in the local mode and don't have hadoop installed. Can you please help
as I have no clue what's going wrong.
Thanks.
, November 29, 2011 4:59 PM
Subject: Re: Newbie question about non-trunk plug-in locations
Hi Frank,
Thank you for the reply. Are the original file(s) available somewhere that I can
download and apply the patch to? Since there was a discussion about something
that appears to be broken in the current
Hi,
So I'm looking to add standard keyword and description metadata to my index.
I'm referencing NUTCH-809 (https://issues.apache.org/jira/browse/NUTCH-809) and
it includes a patch file that appears to be for a file in the source at the
following location:
The issue is still open. As a result, the patch file has not been applied
to any version.
Faruk
2011/11/29 John Dhabolt myco...@yahoo.com
Hi,
So I'm looking to add standard keyword and description metadata to my
index. I'm referencing NUTCH-809 (
Hello everyone, I'm a newbie to nutch... sorry if the question is silly...
I've installed Nutch according to the steps of the official tutorial.
Everything seems ok, and the crawl completes (just with some error on
specific pages), but I cannot get any result through the browser search.
My
From: Roberto [rmez...@infinito.it]
Sent: 04 May 2011 11:36
To: user@nutch.apache.org
Subject: Newbie: No search result
Hello everyone, I'm a newbie to nutch... sorry if the question is silly...
I've installed Nutch according to the steps of the official tutorial.
Everything seems ok
Now nutch web search works, but just for one of two sites configured
Just to clarify, are you saying that the pages you configured have been
fetched, processed and indexed but do not feature when you submit a query or
that Nutch is failing to fetch one site when you are crawling?
moreover
, 2011 8:24 pm
Subject: Re: Few questions from a newbie
Refer to NutchBean.java for their question. You can run that from the
command line to test the index.
If you use Solr indexing, it is going to be much simpler; they have a
solr
.
Alex.
-Original Message-
From: Charan K charan.ku...@gmail.com
To: user user@nutch.apache.org
Cc: user user@nutch.apache.org
Sent: Mon, Jan 24, 2011 8:24 pm
Subject: Re: Few questions from a newbie
are fairly
dated now.
Hope this helps
From: .: Abhishek :. [ab1s...@gmail.com]
Sent: 26 January 2011 03:02
To: markus.jel...@openindex.io
Cc: user@nutch.apache.org
Subject: Re: Few questions from a newbie
Thanks a bunch Markus.
By the way, is there some
Hi list,
I have given the set of urls as
http://is.gd/Jt32Cf
http://is.gd/hS3lEJ
http://is.gd/Jy1Im3
http://is.gd/QoJ8xy
http://is.gd/e4ct89
http://is.gd/WAOVmd
http://is.gd/lhkA69
http://is.gd/3OilLD
. 43 such urls
And I have run the crawl command bin/nutch crawl urls/ -dir crawl -depth 3
I am developing an application based on Twitter feeds, so 90% of the URLs
will be short URLs.
So it is difficult for me to manually convert all these URLs to the actual
URLs. Do we have any other solution for this?
Thanks and regards,
Arjun Kumar Reddy
On Wed, Jan 26, 2011 at 7:09 PM, Estrada
.
Regards
Mike
From: Arjun Kumar Reddy charjunkumar.re...@iiitb.net
To: user@nutch.apache.org
Date: 26.01.2011 15:43
Subject: Re: Few questions from a newbie
I am developing an application based on Twitter feeds, so 90% of the URLs
will be short URLs.
So it is difficult for me
Hello,
You have to use the short-URL APIs to get the long URLs. It is a bit
complex: you have to determine whether the URL is short, then determine which
URL-shortening service was used (e.g. tinyurl.com, bit.ly or goo.gl), and then
use that service's API: send in the short URL and it will return the long
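The first step described above (deciding whether a URL is short) can be sketched as a host check. Note that the second function swaps in a different technique from the one the reply describes: instead of calling each shortening service's own API, it simply follows HTTP redirects and reports where the request ends up. This is a sketch under stated assumptions, not a tested crawler component: SHORTENER_HOSTS is an illustrative list (the three services named in the reply plus is.gd from the seed list), and is_short_url, expand_url and expand_all are hypothetical names.

```python
import urllib.request
from urllib.parse import urlparse

# Illustrative list only; extend it with whichever services appear in your feeds.
SHORTENER_HOSTS = {"tinyurl.com", "bit.ly", "goo.gl", "is.gd"}


def is_short_url(url, hosts=SHORTENER_HOSTS):
    """Return True if the URL's host is one of the known shortening services."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in hosts


def expand_url(url, timeout=10):
    """Follow HTTP redirects (not a service API) and return the final URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.url


def expand_all(urls, expand=expand_url):
    """Expand only the URLs that look short; leave the others untouched."""
    return [expand(u) if is_short_url(u) else u for u in urls]
```

The expanded list could then be written to the seed file before injecting. Following redirects costs one request per short URL, so for large feeds it may be worth caching the results.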
Yeah, hi Mambe,
Thanks for the feedback. I have mentioned the details of my application in
the above post.
I have tried doing this crawling job using PHP multi-curl, and I am getting
results which are good enough, but the problem I am facing is that it is
taking a hell of a lot of time to get the contents
You can set fetch external and internal links to false and increase the depth.
-Original Message-
From: Churchill Nanje Mambe mambena...@afrovisiongroup.com
To: user user@nutch.apache.org
Sent: Wed, Jan 26, 2011 8:03 am
Subject: Re: Few questions from a newbie
even if the url
.
-Original Message-
From: Charan K charan.ku...@gmail.com
To: user user@nutch.apache.org
Cc: user user@nutch.apache.org
Sent: Mon, Jan 24, 2011 8:24 pm
Subject: Re: Few questions from a newbie
Refer to NutchBean.java for their question. You can run that from the
command
line
Hi all,
I am very new to Nutch, and to Lucene as well. I have a few questions about
Nutch. I know they are very basic, but I could not get clear-cut answers
from googling. The questions are:
- If I have to crawl just 5-6 web sites or URL's should I use intranet
crawl or
because I am also a newbie
Best of luck with learning Nutch
On Mon, Jan 24, 2011 at 9:04 PM, .: Abhishek :. ab1s...@gmail.com wrote:
Hi all,
I am very new to Nutch and Lucene as well. I am having few questions about
Nutch, I know they are very much basic but I could not get clear cut
answers
questions from a newbie
How do I use Solr to index Nutch segments?
What is the meaning of db.fetcher.interval? Does this mean that if I run the
same crawl command before 30 days have passed, it will do nothing?
Thanks.
Alex.
-Original Message-
From: Charan K charan.ku...@gmail.com
To: user user