Re: Newbie

2015-02-08 Thread Mattmann, Chris A (3980)
te: Sunday, February 8, 2015 at 10:56 AM To: "user-ow...@nutch.apache.org" Subject: Newbie > I am new to Nutch - is there a learning resource anywhere? > Thanks, Trevor

[Nutch-newbie] Installation error

2013-05-17 Thread Shah, Nishant
Hi everyone, This is my first post so apologies if this is not the correct question to ask. I have followed the wiki tutorial and I am getting the below error. I am running in the local mode and don't have hadoop installed. Can you please help as I have no clue what's going wrong. Thanks. Nish

Newbie: No search result

2011-05-04 Thread Roberto
Hello everyone, I'm a newbie to nutch... sorry if the question is silly... I've installed Nutch according to the steps of the official tutorial. Everything seems OK, and the crawl completes (just with some errors on specific pages), but I cannot get any result through the browser s

Newbie Question, hadoop error?

2016-06-13 Thread Jamal, Sarfaraz
Hi Guys, I am attempting to run nutch using cygwin, and I am having the following problem: Ps. I added Hadoop-core to the lib folder already - I appreciate any insight or comment you guys may have - $ bin/crawl -i urls/ TestCrawl/ 2 Injecting seed URLs /cygdrive/c/apache-nutch-1.11/bin/nutch i

Few questions from a newbie

2011-01-24 Thread .: Abhishek :.
Hi all, I am very new to Nutch and Lucene as well. I have a few questions about Nutch. I know they are very basic, but I could not get clear-cut answers by googling. The questions are: - If I have to crawl just 5-6 websites or URLs, should I use intranet crawl or whole

RE: Newbie: No search result

2011-05-04 Thread McGibbney, Lewis John
done with Tika (amongst others) Lewis From: Roberto [rmez...@infinito.it] Sent: 04 May 2011 11:36 To: user@nutch.apache.org Subject: Newbie: No search result Hello everyone, I'm a newbie to nutch... sorry if the question is silly... I've installed Nutch according to the

Re: Newbie: No search result

2011-05-04 Thread Roberto
Thank you very much Lewis! It seems OK now... The property "searcher.dir" was not set at all (the tutorial does not mention it...). I edited /var/lib/tomcat6/webapps/nutch/WEB-INF/classes/nutch-site.xml this way: searcher.dir file:/var/apache-nutch-1.1-bin/crawl.test/ the "file:" pref
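Roberto's fix amounts to one extra property in the web app's nutch-site.xml. As a minimal sketch, assuming the usual Hadoop-style `<configuration>`/`<property>`/`<name>`/`<value>` layout of Nutch site files, the edit can be scripted with Python's standard ElementTree; the helper name `set_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Set or add <property><name>NAME</name><value>VALUE</value></property>
    in a Hadoop-style <configuration> document and return the new XML text."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # tolerate a <property> with no <value> yet
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry for this name: append a new <property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    site = "<configuration></configuration>"
    print(set_property(site, "searcher.dir",
                       "file:/var/apache-nutch-1.1-bin/crawl.test/"))
```

ElementTree drops XML comments and the XML declaration when re-serialising, so keep a copy of the original nutch-site.xml before overwriting it.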

RE: Newbie: No search result

2011-05-04 Thread McGibbney, Lewis John
>Now nutch web search works, but just for one of two sites configured Just to clarify, are you saying that the pages you configured have been fetched, processed and indexed but do not feature when you submit a query or that Nutch is failing to fetch one site when you are crawling? >moreover

Re: Newbie: No search result

2011-05-04 Thread Roberto
OK, everything seems to work now. I've just created four separate 'conf' and 'url' files (two sites with two language versions each) and four Tomcat Nutch instances, following this guide: http://wiki.apache.org/nutch/GettingNutchRunningWithDebian Thank you again for your help!

Re-Crawling Basic Syntax - newbie

2015-09-30 Thread Muhamad Muchlis
Hi, I have a manual script for my first crawl. Can anyone explain these commands step by step? *Initialize the crawldb* bin/nutch inject urls/ *Generate URLs from crawldb* bin/nutch generate -topN 80 *Fetch generated URLs* bin/nutch fetch -all *Parse fetched URLs* bin/nutch parse -all *Update databas
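The order of the commands in the post is the important part: `inject` runs once to seed the crawl db, and every later round repeats generate, fetch, parse and then a database update that records fetch results and newly found links for the next round. A minimal sketch that lays out that sequence, using the commands and flags exactly as written in the post; the helper name is hypothetical, and because the update command is cut off in the archive it is deliberately left out rather than guessed.

```python
import shlex
from typing import List


def crawl_commands(seed_dir: str = "urls/", top_n: int = 80,
                   rounds: int = 1) -> List[List[str]]:
    """Return the bin/nutch invocations from the post, in order.

    inject runs once; each round then generates a batch of at most top_n
    URLs, fetches it and parses it. The post's update step is truncated in
    the archive, so it is not filled in here.
    """
    commands = [["bin/nutch", "inject", seed_dir]]
    for _ in range(rounds):
        commands.append(["bin/nutch", "generate", "-topN", str(top_n)])
        commands.append(["bin/nutch", "fetch", "-all"])
        commands.append(["bin/nutch", "parse", "-all"])
        # ... the (truncated) database-update step would go here ...
    return commands


if __name__ == "__main__":
    for cmd in crawl_commands(rounds=2):
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```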

Re: Newbie Question, hadoop error?

2016-06-15 Thread Lewis John Mcgibbney
Hi Sas, See response inline :) On Wed, Jun 15, 2016 at 5:36 AM, wrote: > From: "Jamal, Sarfaraz" > To: "'user@nutch.apache.org'" > Cc: > Date: Mon, 13 Jun 2016 17:36:44 -0400 > Subject: Newbie Question, hadoop error? > Hi Guys, > I am

Newbie Nutch/Solr Question(s)

2016-07-15 Thread Jamal, Sarfaraz
Hi Guys, I have Nutch relatively 'working', and I am now ready to index it into Solr. I already have a Solr environment up and running and now wish to index a few websites. I have read through the documentation and I believe I have to do something like this: Instead of this: "cp ${NUTCH_RUNTIME_

Re: Few questions from a newbie

2011-01-24 Thread Amna Waqar
cz i m also a newbie Best of luck with nutch learning On Mon, Jan 24, 2011 at 9:04 PM, .: Abhishek :. wrote: > Hi all, > > I am very new to Nutch and Lucene as well. I am having few questions about > Nutch, I know they are very much basic but I could not get clear cut > a

Re: Few questions from a newbie

2011-01-24 Thread Charan K
tes, you can use either approach, but intranet crawl gives you more control and speed. 2. After the first crawl, the recrawl interval for the same sites is 30 days by default (db.fetcher.interval); you can change it according to your own convenience. 3. I've no idea about the third question.
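The interval described here is a simple time comparison: a URL that has already been fetched becomes eligible again only once its last fetch time plus the interval has passed, so rerunning the same crawl earlier skips those pages (URLs not yet fetched can still be generated). A minimal sketch with a hypothetical helper name. The 30-day value is the default quoted in the thread; as far as we know, the corresponding property in Nutch 1.x's nutch-default.xml is `db.fetch.interval.default`, given in seconds (2592000), and adaptive fetch schedules can change per-URL intervals, which this sketch ignores.

```python
from datetime import datetime, timedelta

# 30 days, the default quoted in the thread. Written in seconds because, as far
# as we know, Nutch 1.x's db.fetch.interval.default is specified in seconds.
DEFAULT_FETCH_INTERVAL = timedelta(seconds=2_592_000)


def is_due_for_refetch(last_fetch: datetime, now: datetime,
                       interval: timedelta = DEFAULT_FETCH_INTERVAL) -> bool:
    """A fetched URL becomes eligible again once last_fetch + interval has passed."""
    return now >= last_fetch + interval


if __name__ == "__main__":
    last = datetime(2011, 1, 1)
    print(is_due_for_refetch(last, datetime(2011, 1, 24)))  # False: 23 days later
    print(is_due_for_refetch(last, datetime(2011, 2, 1)))   # True: 31 days later
```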

Re: Few questions from a newbie

2011-01-24 Thread alxsss
Subject: Re: Few questions from a newbie Refer to NutchBean.java for the third question. You can run that from the command line to test the index. If you use Solr indexing, it is going to be much simpler; they have a Solr Java client. Sent from my iPhone On Jan 24, 2011, at 8:07 PM, Amna

RE: Few questions from a newbie

2011-01-24 Thread Chris Woolum
questions from a newbie How to use solr to index nutch segments? What is the meaning of db.fetcher.interval? Does this mean that if I run the same crawl command before 30 days it will do nothing? Thanks. Alex. -Original Message- From: Charan K To: user Cc: user Sent: Mon, Jan 24

Re: Few questions from a newbie

2011-01-24 Thread charan kumar
ing of db.fetcher.interval? Does this mean that if I run the same crawl command before 30 days it will do nothing? > Thanks. > Alex. > -Original Message- > From: Charan K > To: user > Cc: user > Sent: Mon, Jan 24, 2011

Re: Few questions from a newbie

2011-01-25 Thread .: Abhishek :.
ll do nothing? > Thanks. > Alex. > -Original Message- > From: Charan K > To: user > Cc: user > Sent: Mon, Jan 24, 2011 8:24 pm > Subject: Re: Few questions from a ne

Re: Few questions from a newbie

2011-01-25 Thread Markus Jelsma
> Thanks. > Alex. > -Original Message- > From: Charan K > To: user

Re: Few questions from a newbie

2011-01-25 Thread .: Abhishek :.
l command before 30 days it will do nothing? > Thanks. > Alex.

Re: Few questions from a newbie

2011-01-26 Thread Julien Nioche
d in the last 30 days will not be fetched. Or a URL is eligible for refetch only after 30 days of last crawl. > On Mon, Jan 24, 2011 at 9:23 PM, wrote: > How to use solr to index nutch segments? > What is the meaning of db.fetcher.interval? Does this mean that if I

Re: Few questions from a newbie

2011-01-26 Thread .: Abhishek :.
your time! > On Tue, Jan 25, 2011 at 2:33 PM, charan kumar <charan.ku...@gmail.com> wrote: > db.fetcher.interval: It means that URLs which were fetched in the last 30 days will not be fetched. Or a URL is eligible for refetch

RE: Few questions from a newbie

2011-01-26 Thread McGibbney, Lewis John
e examples are fairly dated now. Hope this helps From: .: Abhishek :. [ab1s...@gmail.com] Sent: 26 January 2011 03:02 To: markus.jel...@openindex.io Cc: user@nutch.apache.org Subject: Re: Few questions from a newbie Thanks a bunch Markus. By the way, is

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
Hi list, I have given the set of urls as http://is.gd/Jt32Cf http://is.gd/hS3lEJ http://is.gd/Jy1Im3 http://is.gd/QoJ8xy http://is.gd/e4ct89 http://is.gd/WAOVmd http://is.gd/lhkA69 http://is.gd/3OilLD . 43 such urls And I have run the crawl command bin/nutch crawl urls/ -dir crawl -depth 3

Re: Few questions from a newbie

2011-01-26 Thread Estrada Groups
You probably have to literally click on each URL to get the URL it's referencing. Those are URL shorteners and probably won't play nicely with a crawler because of the redirection. Adam Sent from my iPhone On Jan 26, 2011, at 8:02 AM, Arjun Kumar Reddy wrote: > Hi list, > > I have given t

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
I am developing an application based on Twitter feeds, so 90% of the URLs will be short URLs. So, it is difficult for me to manually convert all these URLs to actual URLs. Do we have any other solution for this? Thanks and regards, Arjun Kumar Reddy On Wed, Jan 26, 2011 at 7:09 PM, Estrada Gr

Re: Few questions from a newbie

2011-01-26 Thread Churchill Nanje Mambe
Hello, you have to use the short URL APIs and get the long URLs... it's a bit complex, as you have to determine whether the URL is short, then determine the URL-shortening service used (e.g. tinyurl.com, bit.ly or goo.gl), and then you use their respective API and send in the URL and they will return the long
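Instead of calling each shortener's own API, a service-agnostic alternative (a swapped-in technique, not what this post describes) is to follow HTTP redirects until a non-redirect response comes back. A minimal sketch using only the Python standard library; the function names are hypothetical, `max_hops=5` is an arbitrary cap, and some shorteners may not answer `HEAD` requests, in which case a `GET` would be needed.

```python
import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def head_location(url: str) -> Optional[str]:
    """Send one HEAD request; return the Location header if it is a redirect."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        if resp.status in REDIRECT_STATUSES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def expand(url: str,
           get_location: Callable[[str], Optional[str]] = head_location,
           max_hops: int = 5) -> str:
    """Follow redirects from a short URL until a non-redirect (or max_hops)."""
    for _ in range(max_hops):
        location = get_location(url)
        if location is None:
            break
        url = urljoin(url, location)  # Location headers may be relative
    return url
```

`expand(short_url)` needs network access; passing a different `get_location` (for example a dictionary's `get`) makes the redirect-following logic testable offline.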

Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
Yeah, hi Mambe, thanks for the feedback. I have mentioned the details of my application in the above post. I have tried doing this crawling job using PHP multi-cURL and I am getting results which are good enough, but the problem I am facing is that it is taking a hell of a lot of time to get the contents of

Re: Few questions from a newbie

2011-01-26 Thread Churchill Nanje Mambe
even if the url being crawled is shortened, it will still lead nutch to the actual link and nutch will fetch it

Re: Few questions from a newbie

2011-01-26 Thread alxsss
you can put fetch external and internal links to false and increase depth. -Original Message- From: Churchill Nanje Mambe To: user Sent: Wed, Jan 26, 2011 8:03 am Subject: Re: Few questions from a newbie even if the url being crawled is shortened, it will still lead nutch

Newbie trouble - Hbase class not found

2016-05-07 Thread diego gullo
I am trying Nutch for the first time. I created an automated Docker setup to load Nutch 2 + HBase (I had tried Cassandra but could not get it to work, so I thought I'd start with HBase to give it a try). The project is available at https://github.com/bizmate/nutch and with docker compose you can start

RE: Newbie Nutch/Solr Question(s)

2016-07-18 Thread Markus Jelsma
Hi Jamal - don't use managed schema with Solr 6.0 and/or 6.1. Just copy over the schema Nutch provides and you are good to go. Markus -Original message- > From:Jamal, Sarfaraz > Sent: Friday 15th July 2016 15:47 > To: user@nutch.apache.org > Subject: Newbie Nutc
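Markus's advice boils down to a file copy into the Solr core's configuration. A minimal sketch, assuming Nutch's schema lives at `conf/schema.xml` in the Nutch install and the core has its own `conf/` directory; both paths and the helper name are hypothetical. Depending on the Solr setup, solrconfig.xml probably also needs `<schemaFactory class="ClassicIndexSchemaFactory"/>` so that Solr reads schema.xml instead of the managed schema, and the core must be reloaded afterwards; check the Solr reference guide for your version.

```python
import shutil
from pathlib import Path


def install_nutch_schema(nutch_conf: Path, solr_core_conf: Path) -> Path:
    """Copy Nutch's schema.xml into a Solr core's conf/ directory.

    Both paths are hypothetical placeholders for your own install layout.
    An existing schema.xml in the core is kept as schema.xml.bak.
    """
    src = nutch_conf / "schema.xml"
    dst = solr_core_conf / "schema.xml"
    if dst.exists():
        shutil.copyfile(dst, dst.with_name("schema.xml.bak"))
    shutil.copyfile(src, dst)
    return dst
```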

Reply: Re: Few questions from a newbie

2011-01-26 Thread Mike Zuehlke
. Regards Mike From: Arjun Kumar Reddy To: user@nutch.apache.org Date: 26.01.2011 15:43 Subject: Re: Few questions from a newbie I am developing an application based on twitter feeds...so 90% of the url's will be short urls. So, it is difficult for me to manually convert all

Re: Re: Few questions from a newbie

2011-01-26 Thread Arjun Kumar Reddy
). > I think there are four redirects needed to get the given URL's content, so you have to increase the depth for your crawling. > Regards > Mike > From: Arjun Kumar Reddy > To: user@nutch.apache.org > Date: 26.01.2011 15:43 > Subject:
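Mike's point is that every redirect in the chain costs one crawl round. Assuming the fetcher records each redirect target for the next round instead of following it immediately (our understanding of Nutch 1.x's default `http.redirect.max` of 0; raising that property is the other lever), the depth needed is the number of redirects plus one, so four redirects need a depth of 5 rather than the `-depth 3` used earlier in the thread. A minimal sketch with a hypothetical helper name and a hypothetical redirect map:

```python
from typing import Dict


def crawl_rounds_needed(seed: str, redirects: Dict[str, str]) -> int:
    """Rounds (crawl depth) needed to reach the final page from seed, assuming
    each redirect target is only fetched in the following round."""
    rounds = 1  # round 1 fetches the seed itself
    seen = {seed}
    url = seed
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        rounds += 1  # each hop costs one more generate/fetch/update round
    return rounds


if __name__ == "__main__":
    # Hypothetical chain with four redirects, as Mike describes.
    chain = {
        "http://short.example/x": "http://hop1.example/",
        "http://hop1.example/": "http://hop2.example/",
        "http://hop2.example/": "http://hop3.example/",
        "http://hop3.example/": "http://final.example/page",
    }
    print(crawl_rounds_needed("http://short.example/x", chain))  # 5
```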

Another question from a meta tag newbie

2011-01-31 Thread Joshua J Pavel
I've been crawling the user groups, and I feel like Nutch can do this by default, but I just can't seem to crack it. I want to grab meta tags from indexed pages and insert them in the database. Specifically, I'll have some meta tags that identify the type of content on the page, so that I can g
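Joshua's question is about Nutch's own parsing and indexing plugins, which this sketch does not use. As a swapped-in, standalone illustration of the extraction step only, Python's standard `html.parser` can pull `name`/`content` pairs out of a page; the class and function names, the lowercasing of names, and the example meta name `content-type-tag` are all hypothetical choices.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html: str) -> Dict[str, str]:
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


if __name__ == "__main__":
    page = ('<html><head><meta name="content-type-tag" content="howto">'
            '<META NAME="Keywords" CONTENT="nutch, crawl"/></head></html>')
    print(extract_meta(page))
```

Self-closing `<meta ... />` tags are handled too, because `HTMLParser`'s default `handle_startendtag` delegates to `handle_starttag`.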

Re: Newbie trouble - Hbase class not found

2016-05-09 Thread Lewis John Mcgibbney
Hi Diego, On Mon, May 9, 2016 at 2:32 AM, wrote: > From: diego gullo > To: user@nutch.apache.org > Cc: > Date: Sat, 7 May 2016 09:41:00 +0100 > Subject: Newbie trouble - Hbase class not found > I am trying Nutch for the first time. I created an automated docker setu

Re: Newbie trouble - Hbase class not found

2016-05-09 Thread diego gullo
> On Mon, May 9, 2016 at 2:32 AM, wrote: > From: diego gullo > To: user@nutch.apache.org > Cc: > Date: Sat, 7 May 2016 09:41:00 +0100 > Subject: Newbie trouble - Hbase class not found > I am trying Nutch for the first time. I crea

Re: Newbie trouble - Hbase class not found

2016-05-15 Thread diego gullo
> wrote: > Hi Diego, > On Mon, May 9, 2016 at 2:32 AM, > wrote: > From: diego gullo > To: user@nutch.apache.org > Cc: > Date: Sat, 7 May 2016 09:41:00 +0100 > Subject: Newbie t

Re: Newbie trouble - Hbase class not found

2016-05-16 Thread Lewis John Mcgibbney
Hi Diego, The PR at https://github.com/apache/nutch/pull/111 will solve your issue. Thanks On Mon, May 16, 2016 at 11:40 AM, wrote: > > From: diego gullo > To: user@nutch.apache.org > Cc: > Date: Sun, 15 May 2016 20:04:05 +0100 > Subject: Re: Newbie trouble - Hbase class no

RE: [E] Re: Newbie Question, hadoop error?

2016-06-16 Thread Jamal, Sarfaraz
sspath? Is it an environment variable? Thanks, Sas -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Wednesday, June 15, 2016 11:46 PM To: user@nutch.apache.org Subject: [E] Re: Newbie Question, hadoop error? Hi Sas, See response inline :) On W

RE: [E] Re: Newbie Question, hadoop error?

2016-06-16 Thread Jamal, Sarfaraz
M To: user@nutch.apache.org Subject: [E] Re: Newbie Question, hadoop error? Hi Sas, See response inline :) On Wed, Jun 15, 2016 at 5:36 AM, wrote: > From: "Jamal, Sarfaraz" > To: "'user@nutch.apache.org'" > Cc: > Date: Mon, 13 Jun 2016 17:36:44 -0400 > Subj

Newbie question about non-trunk plug-in locations

2011-11-29 Thread John Dhabolt
Hi, So I'm looking to add standard keyword and description metadata to my index. I'm referencing NUTCH-809 (https://issues.apache.org/jira/browse/NUTCH-809) and it includes a patch file that appears to be for a file in the source at the following location: src/plugin/index-metatags/src/java/at

Re: Newbie question about non-trunk plug-in locations

2011-11-29 Thread Faruk Berksöz
The issue is still open. As a result, the patch file has not been applied to any version. Faruk 2011/11/29 John Dhabolt > Hi, > So I'm looking to add standard keyword and description metadata to my index. I'm referencing NUTCH-809 (https://issues.apache.org/jira/browse/NUTCH-809) and it

Fw: Newbie question about non-trunk plug-in locations

2011-11-29 Thread John Dhabolt
Whoops, forgot to reply all and left the mailing list out of my response. - Forwarded Message - From: John Dhabolt To: Faruk Berksöz Sent: Tuesday, November 29, 2011 4:59 PM Subject: Re: Newbie question about non-trunk plug-in locations Hi Frank, Thank you for the reply. Is the

Re: Fw: Newbie question about non-trunk plug-in locations

2011-11-30 Thread Elisabeth Adler
Re: Newbie question about non-trunk plug-in locations Hi Frank, Thank you for the reply. Is the original file(s) available somewhere that I can download and apply the patch to? Since there was a discussion about something that appears to be broken in the current version without the patch, I was j