RE: Newbie Nutch/Solr Question(s)

2016-07-18 Thread Markus Jelsma
utch.apache.org > Subject: Newbie Nutch/Solr Question(s) > > Hi Guy, > > I have nutch 'working' relatively, and I am now ready to index it to solr. > > I already have a solr environment up and running and now wish to index a few > websites. > > I have read through

Newbie Nutch/Solr Question(s)

2016-07-15 Thread Jamal, Sarfaraz
Hi Guy, I have Nutch 'working', relatively speaking, and I am now ready to index it to Solr. I already have a Solr environment up and running and now wish to index a few websites. I have read through the documentation and I believe I have to do something like this: Instead of this: "cp

RE: [E] Re: Newbie Question, hadoop error?

2016-06-16 Thread Jamal, Sarfaraz
M To: user@nutch.apache.org Subject: [E] Re: Newbie Question, hadoop error? Hi Sas, See response inline :) On Wed, Jun 15, 2016 at 5:36 AM, <user-digest-h...@nutch.apache.org> wrote: > From: "Jamal, Sarfaraz" <sarfaraz.ja...@verizonwireless.com.invalid> > To: "'user@nutc

Re: Newbie Question, hadoop error?

2016-06-15 Thread Lewis John Mcgibbney
t; Date: Mon, 13 Jun 2016 17:36:44 -0400 > Subject: Newbie Question, hadoop error? > Hi Guys, > > I am attempting to run nutch using cygwin, Is this Nutch 1.11 binary distribution you mean? > and I am having the following problem: > Ps. I added Hadoop-core to the lib folder

Re: Newbie trouble - Hbase class not found

2016-05-16 Thread Lewis John Mcgibbney
15 May 2016 20:04:05 +0100 > Subject: Re: Newbie trouble - Hbase class not found > Hi Lewis > > I have changed the build for the docker containers and in the weekend sent > the PR for the logs folder. The original problem I had is still persistent. > > To reproduce > > >

Re: Newbie trouble - Hbase class not found

2016-05-15 Thread diego gullo
wrote: > >> Hi Diego, >> >> On Mon, May 9, 2016 at 2:32 AM, <user-digest-h...@nutch.apache.org> >> wrote: >> >> > >> > From: diego gullo <diegogu...@gmail.com> >> > To: user@nutch.apache.org >> > Cc: >> >

Re: Newbie trouble - Hbase class not found

2016-05-10 Thread diego gullo
M, <user-digest-h...@nutch.apache.org> wrote: > > > > > From: diego gullo <diegogu...@gmail.com> > > To: user@nutch.apache.org > > Cc: > > Date: Sat, 7 May 2016 09:41:00 +0100 > > Subject: Newbie trouble - Hbase class not found > > I

Re: Newbie trouble - Hbase class not found

2016-05-09 Thread Lewis John Mcgibbney
Hi Diego, On Mon, May 9, 2016 at 2:32 AM, <user-digest-h...@nutch.apache.org> wrote: > > From: diego gullo <diegogu...@gmail.com> > To: user@nutch.apache.org > Cc: > Date: Sat, 7 May 2016 09:41:00 +0100 > Subject: Newbie trouble - Hbase class not found > I am t

Newbie trouble - Hbase class not found

2016-05-07 Thread diego gullo
I am trying Nutch for the first time. I created an automated Docker setup to load Nutch 2 + HBase (I had tried Cassandra but could not get it to work, so I thought I would start with HBase to give it a try). The project is available at https://github.com/bizmate/nutch and with docker compose you can

Re-Crawling Basic Syntax - newbie

2015-09-30 Thread Muhamad Muchlis
Hi, I have a manual script for my first crawl. Can anyone explain these commands step by step:
*Initialize the crawldb*
bin/nutch inject urls/
*Generate URLs from crawldb*
bin/nutch generate -topN 80
*Fetch generated URLs*
bin/nutch fetch -all
*Parse fetched URLs*
bin/nutch parse -all
*Update
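To make the order of the steps explicit, here is a minimal Python sketch that lists the commands quoted in the message without running them. The names `CRAWL_STEPS` and `plan` are hypothetical helpers introduced only for illustration. The final "Update" step is cut off in the archive snippet, so it is left out rather than guessed.

```python
import shlex

# Steps copied from the message above. The final "Update" step is truncated
# in the archive snippet, so it is deliberately omitted rather than guessed.
CRAWL_STEPS = [
    ("Initialize the crawldb", "bin/nutch inject urls/"),
    ("Generate URLs from crawldb", "bin/nutch generate -topN 80"),
    ("Fetch generated URLs", "bin/nutch fetch -all"),
    ("Parse fetched URLs", "bin/nutch parse -all"),
]


def plan(steps=CRAWL_STEPS):
    """Return (description, argv) pairs in order, without executing anything."""
    return [(description, shlex.split(command)) for description, command in steps]


# Print a numbered dry-run listing of the crawl cycle.
for number, (description, argv) in enumerate(plan(), start=1):
    print(f"{number}. {description}: {' '.join(argv)}")
```

Each step consumes the output of the one before it, which is why the order matters: URLs must be injected before they can be generated, fetched, and parsed.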

Re: Newbie

2015-02-08 Thread Mattmann, Chris A (3980)
...@merrows.co.uk tre...@merrows.co.uk Date: Sunday, February 8, 2015 at 10:56 AM To: user-ow...@nutch.apache.org user-ow...@nutch.apache.org Subject: Newbie I am new to Nutch - is there a learning resource anywhere? Thanks Trevor

[Nutch-newbie] Installation error

2013-05-17 Thread Shah, Nishant
Hi everyone, This is my first post so apologies if this is not the correct question to ask. I have followed the wiki tutorial and I am getting the below error. I am running in the local mode and don't have hadoop installed. Can you please help as I have no clue what's going wrong. Thanks.

Re: Fw: Newbie question about non-trunk plug-in locations

2011-11-30 Thread Elisabeth Adler
, November 29, 2011 4:59 PM Subject: Re: Newbie question about non-trunk plug-in locations Hi Frank, Thank you for the reply. Is the original file(s) available somewhere that I can download and apply the patch to? Since there was a discussion about something that appears to be broken in the current

Newbie question about non-trunk plug-in locations

2011-11-29 Thread John Dhabolt
Hi, So I'm looking to add standard keyword and description metadata to my index. I'm referencing NUTCH-809 (https://issues.apache.org/jira/browse/NUTCH-809) and it includes a patch file that appears to be for a file in the source at the following location:

Re: Newbie question about non-trunk plug-in locations

2011-11-29 Thread Faruk Berksöz
The issue is still open. As a result, the patch file has not been applied to any version. Faruk 2011/11/29 John Dhabolt myco...@yahoo.com Hi, So I'm looking to add standard keyword and description metadata to my index. I'm referencing NUTCH-809 (

Newbie: No search result

2011-05-04 Thread Roberto
Hello everyone, I'm a newbie to nutch... sorry if the question is silly... I've installed Nutch according to the steps of the official tutorial. Everything seems ok, and the crawl completes (just with some error on specific pages), but I cannot get any result through the browser search. My

RE: Newbie: No search result

2011-05-04 Thread McGibbney, Lewis John
From: Roberto [rmez...@infinito.it] Sent: 04 May 2011 11:36 To: user@nutch.apache.org Subject: Newbie: No search result Hello everyone, I'm a newbie to nutch... sorry if the question is silly... I've installed Nutch according to the steps of the official tutorial. Everything seems ok

RE: Newbie: No search result

2011-05-04 Thread McGibbney, Lewis John
Now nutch web search works, but just for one of two sites configured Just to clarify, are you saying that the pages you configured have been fetched, processed and indexed but do not feature when you submit a query or that Nutch is failing to fetch one site when you are crawling? moreover

Re: Few questions from a newbie

2011-01-26 Thread Julien Nioche
, 2011 8:24 pm Subject: Re: Few questions from a newbie Refer to NutchBean.java for the their question. You can run that from the command line to test the index. If you use Solr indexing, it is going to be much simpler; they have a solr

Re: Few questions from a newbie

2011-01-26 Thread .: Abhishek :.
. Alex. -Original Message- From: Charan K charan.ku...@gmail.com To: user user@nutch.apache.org Cc: user user@nutch.apache.org Sent: Mon, Jan 24, 2011 8:24 pm Subject: Re: Few questions from a newbie

RE: Few questions from a newbie

2011-01-26 Thread McGibbney, Lewis John
are fairly dated now. Hope this helps From: .: Abhishek :. [ab1s...@gmail.com] Sent: 26 January 2011 03:02 To: markus.jel...@openindex.io Cc: user@nutch.apache.org Subject: Re: Few questions from a newbie Thanks a bunch Markus. By the way, is there some

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
Hi list, I have given the set of urls as http://is.gd/Jt32Cf http://is.gd/hS3lEJ http://is.gd/Jy1Im3 http://is.gd/QoJ8xy http://is.gd/e4ct89 http://is.gd/WAOVmd http://is.gd/lhkA69 http://is.gd/3OilLD . 43 such urls And I have run the crawl command bin/nutch crawl urls/ -dir crawl -depth 3

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
I am developing an application based on Twitter feeds, so 90% of the URLs will be short URLs. So, it is difficult for me to manually convert all these URLs to actual URLs. Do we have any other solution for this? Thanks and regards, Arjun Kumar Reddy On Wed, Jan 26, 2011 at 7:09 PM, Estrada

Antwort: Re: Few questions from a newbie

2011-01-26 Thread Mike Zuehlke
. Regards Mike From: Arjun Kumar Reddy charjunkumar.re...@iiitb.net To: user@nutch.apache.org Date: 26.01.2011 15:43 Subject: Re: Few questions from a newbie I am developing an application based on Twitter feeds, so 90% of the URLs will be short URLs. So, it is difficult for me

Re: Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
a newbie I am developing an application based on Twitter feeds, so 90% of the URLs will be short URLs. So, it is difficult for me to manually convert all these URLs to actual URLs. Do we have any other solution for this? Thanks and regards, Arjun Kumar Reddy On Wed, Jan 26, 2011 at 7

Re: Few questions from a newbie

2011-01-26 Thread Churchill Nanje Mambe
Hello, you have to use the short URL APIs to get the long URLs. It's a bit complex: you have to determine whether the URL is short, then determine which URL shortening service was used (e.g. tinyurl.com, bit.ly or goo.gl), and then use that service's API, sending in the URL, and they will return the long
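The first two steps described above (deciding whether a URL is short and which service issued it) can be sketched in Python as follows. This is only an illustration under stated assumptions: `SHORTENER_HOSTS`, `shortener_for` and `expand_all` are hypothetical names, the host list contains only services mentioned in this thread, and the per-service resolver is passed in as a plain callable because the actual shortener APIs are not described in the message.

```python
from urllib.parse import urlsplit

# Shortening services mentioned in this thread; illustrative, not exhaustive.
SHORTENER_HOSTS = {"tinyurl.com", "bit.ly", "goo.gl", "is.gd"}


def shortener_for(url):
    """Return the shortening service host for url, or None if it is not a known short URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host if host in SHORTENER_HOSTS else None


def expand_all(urls, resolvers):
    """Expand known short URLs with per-service resolver callables (hypothetical).

    resolvers maps a service host to a function taking the short URL and
    returning the long one. URLs without a matching resolver pass through unchanged.
    """
    expanded = []
    for url in urls:
        resolve = resolvers.get(shortener_for(url))
        expanded.append(resolve(url) if resolve else url)
    return expanded
```

Keeping the resolver separate from the classification means the seed list can be expanded once, before it is handed to the crawler, and each service's API call can be swapped in without touching the detection logic.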

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
Yeah. Hi Mambe, Thanks for the feedback. I have mentioned the details of my application in the above post. I have tried doing this crawling job using PHP multi-curl and I am getting results which are good enough, but the problem I am facing is that it is taking a hell of a lot of time to get the contents

Re: Few questions from a newbie

2011-01-26 Thread alxsss
you can put fetch external and internal links to false and increase depth. -Original Message- From: Churchill Nanje Mambe mambena...@afrovisiongroup.com To: user user@nutch.apache.org Sent: Wed, Jan 26, 2011 8:03 am Subject: Re: Few questions from a newbie even if the url

Re: Few questions from a newbie

2011-01-25 Thread .: Abhishek :.
. -Original Message- From: Charan K charan.ku...@gmail.com To: user user@nutch.apache.org Cc: user user@nutch.apache.org Sent: Mon, Jan 24, 2011 8:24 pm Subject: Re: Few questions from a newbie Refer to NutchBean.java for the their question. You can run that from the command line

Re: Few questions from a newbie

2011-01-25 Thread Markus Jelsma
charan.ku...@gmail.com To: user user@nutch.apache.org Cc: user user@nutch.apache.org Sent: Mon, Jan 24, 2011 8:24 pm Subject: Re: Few questions from a newbie Refer to NutchBean.java for the their question. You can run that from the command line to test the index

Few questions from a newbie

2011-01-24 Thread .: Abhishek :.
Hi all, I am very new to Nutch and Lucene as well. I have a few questions about Nutch. I know they are very basic, but I could not get clear-cut answers from googling. The questions are: - If I have to crawl just 5-6 web sites or URLs, should I use intranet crawl or

Re: Few questions from a newbie

2011-01-24 Thread Amna Waqar
cz i m also a newbie Best of luck with nutch learning On Mon, Jan 24, 2011 at 9:04 PM, .: Abhishek :. ab1s...@gmail.com wrote: Hi all, I am very new to Nutch and Lucene as well. I have a few questions about Nutch. I know they are very basic, but I could not get clear-cut answers

Re: Few questions from a newbie

2011-01-24 Thread Charan K
m also a newbie Best of luck with nutch learning On Mon, Jan 24, 2011 at 9:04 PM, .: Abhishek :. ab1s...@gmail.com wrote: Hi all, I am very new to Nutch and Lucene as well. I have a few questions about Nutch. I know they are very basic, but I could not get clear-cut answers out

RE: Few questions from a newbie

2011-01-24 Thread Chris Woolum
questions from a newbie How to use solr to index nutch segments? What is the meaning of db.fetcher.interval? Does this mean that if I run the same crawl command before 30 days it will do nothing? Thanks. Alex. -Original Message- From: Charan K charan.ku...@gmail.com To: user user