Re: Integrate nutch with solr

2018-10-23 Thread Elizabeth Haubert
Hi Dinesh, This article <https://opensourceconnections.com/blog/2014/05/24/crawling-with-nutch/> is quite old (Nutch 1.x, Solr 4.x), but the high-level steps are still pretty much the same: get your Java set up, kick off a Solr <http://lucene.apache.org/solr/guide/7_5/installing-

Re: Integrate nutch with solr

2018-10-22 Thread Stephen Bianamara
To highlight Shawn's point, Nutch leverages Solr. That means that Nutch defines the integration and is responsible for providing its documentation. On Mon, Oct 22, 2018, 4:14 PM Atita Arora wrote: > and > https://lobster1234.github.io/2017/08/14/search-with-nutch-mongodb-solr/ >

Re: Integrate nutch with solr

2018-10-22 Thread Atita Arora
and https://lobster1234.github.io/2017/08/14/search-with-nutch-mongodb-solr/ On Tue, Oct 23, 2018 at 1:12 AM Atita Arora wrote: > I think this should be kind of useful : > > > https://blog.building-blocks.com/building-a-search-engine-with-nutch-and-solr-in-10-minutes/ > > I i

Re: Integrate nutch with solr

2018-10-22 Thread Atita Arora
I think this should be kind of useful : https://blog.building-blocks.com/building-a-search-engine-with-nutch-and-solr-in-10-minutes/ I integrated Aperture with Solr way back in 2008. On Mon, Oct 22, 2018 at 11:27 PM Dinesh Sundaram wrote: > Thanks Shawn for the reply, yes I do have s

Re: Integrate nutch with solr

2018-10-22 Thread Shawn Heisey
On 10/22/2018 3:26 PM, Dinesh Sundaram wrote: Thanks Shawn for the reply. Yes, I do have some questions on Solr too. Can you please share the steps on the Solr side to integrate Nutch, or are no steps needed in Solr? Since I have no idea what has to happen on the nutch side, I really can't

Re: Integrate nutch with solr

2018-10-22 Thread Dinesh Sundaram
Thanks Shawn for the reply. Yes, I do have some questions on Solr too. Can you please share the steps on the Solr side to integrate Nutch, or are no steps needed in Solr? On Thu, Oct 18, 2018 at 8:35 PM Shawn Heisey wrote: > On 10/18/2018 12:35 PM, Dinesh Sundaram wrote: > > Can you please

Re: Integrate nutch with solr

2018-10-18 Thread Shawn Heisey
On 10/18/2018 12:35 PM, Dinesh Sundaram wrote: Can you please share the steps to integrate nutch 2.3.1 with solrcloud 7.1.0. You will need to speak to the nutch project about how to configure their software to interact with Solr.  If you have questions about Solr itself, we can answer those.

Integrate nutch with solr

2018-10-18 Thread Dinesh Sundaram
Hi Team, Can you please share the steps to integrate nutch 2.3.1 with solrcloud 7.1.0. Thanks, Dinesh Sundaram

RE: crawling all links of same domain in nutch in solr

2014-07-29 Thread Markus Jelsma
domain in nutch in solr Hi, Can anyone tell me how to crawl all other pages of the same domain? For example, I'm feeding a website http://www.techcrunch.com/ in seed.txt. The following property is added in nutch-site.xml: <property> <name>db.ignore.internal.links</name> <value>false</value>

crawling all links of same domain in nutch in solr

2014-07-28 Thread Vivekanand Ittigi
Hi, Can anyone tell me how to crawl all other pages of the same domain? For example, I'm feeding a website http://www.techcrunch.com/ in seed.txt. The following property is added in nutch-site.xml: <property> <name>db.ignore.internal.links</name> <value>false</value> <description>If true, when adding new links
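
For reference, a minimal Python sketch of how the property quoted in this thread could be written out as a nutch-site.xml fragment. Only the property name `db.ignore.internal.links` and the value `false` come from the post; the helper name `nutch_property` is hypothetical, and the excerpt does not confirm whether this setting alone makes Nutch follow same-domain links.

```python
import xml.etree.ElementTree as ET


def nutch_property(name, value):
    """Build a <property> element with <name> and <value> children (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Property name and value are taken from the post above.
fragment = ET.tostring(
    nutch_property("db.ignore.internal.links", "false"), encoding="unicode"
)
print(fragment)
```

The printed fragment would still need to be placed inside the `<configuration>` element of nutch-site.xml by hand.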

Re: nutch 1.4, solr 3.4 configuration error

2013-06-07 Thread Tuğcem Oral
I had a similar error. I couldn't find any documentation on which Nutch and Solr versions are compatible. For instance, we're using Nutch 1.6 on Hadoop 1.0.4 with solrj 3.4.0 and index crawled segments to Solr 4.2.0. But I remember that I could find a compatible version of solrj for nutch 1.4

nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread Isaac Stennett
I am trying to configure nutch 1.4 with solr 3.4. I configured everything and when I run the command: ./nutch crawl urls -dir myCrawl2 -solr http://localhost:8080 -depth 2 -topN 2 I get the following error: java.io.IOException: Job failed! SolrDeleteDuplicates: starting at 2013-06-06 15:49:30

Re: nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread bbarani
Can you check whether you have the correct solrj client library version in both Nutch and the Solr server? -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-1-4-solr-3-4-configuration-error-tp4068724p4068733.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: nutch 1.4, solr 3.4 configuration error

2013-06-06 Thread Chris Hostetter
information: : 1) The solr admin screen comes up fine in the browser. At which URL does the Solr admin screen come up fine in your browser? Best guess... 1) you have solr installed such that it uses the webcontext /solr but you gave the wrong url to nutch (ie: try -solr http://localhost:8080/solr
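
The suggestion above (a missing /solr web context in the URL given to Nutch) can be sketched in Python. This is only an illustration under assumptions: the function name `ensure_context` is hypothetical, and the default context `solr` is taken from the reply; your deployment may use a different context.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_context(url, context="solr"):
    """Append the web context when the URL has no path; otherwise keep the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path:
        # No path at all: assume the webapp is deployed under /<context>.
        path = "/" + context
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(ensure_context("http://localhost:8080"))
print(ensure_context("http://localhost:8080/solr/"))
```

Both calls yield a URL ending in /solr, which is the form the reply suggests passing to the -solr option.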

Re: duplicated URL sent from Nutch to solr index

2012-12-03 Thread Xi Shen
to confirm an expected behavior of solr: Assuming we have <uniqueKey>id</uniqueKey> in schema.xml for solr, when we send the same URL from nutch to solr multiple times, would there be ONLY ONE entry for that URL, but the content (if changed) and timestamp would be updated

Re: duplicated URL sent from Nutch to solr index

2012-12-02 Thread Xi Shen
we have <uniqueKey>id</uniqueKey> in schema.xml for solr, when we send the same URL from nutch to solr multiple times, would there be ONLY ONE entry for that URL, but the content (if changed) and timestamp would be updated? Thanks! Joe -- Regards, David Shen http://about.me/davidshen

Re: duplicated URL sent from Nutch to solr index

2012-12-02 Thread Joe Zhang
wrote: Dear list, I just want to confirm an expected behavior of solr: Assuming we have <uniqueKey>id</uniqueKey> in schema.xml for solr, when we send the same URL from nutch to solr multiple times, would there be ONLY ONE entry for that URL, but the content (if changed) and timestamp

Re: duplicated URL sent from Nutch to solr index

2012-12-02 Thread Joe Zhang
the same URL from nutch to solr multiple times, would there be ONLY ONE entry for that URL, but the content (if changed) and timestamp would be updated? Thanks! Joe -- Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
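
The uniqueKey question in this thread can be pictured with a toy Python model. This is not Solr code and the excerpts here do not show the list's answer; it only assumes Solr's default overwrite-on-add behaviour, where a document added with an existing uniqueKey value replaces the earlier one. The URL and field names other than `id` are made up for illustration.

```python
def add_document(index, doc, unique_key="id"):
    """Toy model: store doc under its uniqueKey value, replacing any earlier copy."""
    index[doc[unique_key]] = doc


index = {}
# Same URL sent twice, with changed content and timestamp (illustrative values).
add_document(index, {"id": "http://example.com/", "content": "old", "tstamp": 1})
add_document(index, {"id": "http://example.com/", "content": "new", "tstamp": 2})

print(len(index), index["http://example.com/"]["content"])
```

Under that assumption there is one entry per URL, carrying the content and timestamp of the most recent add.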

Re: nutch and solr

2012-02-27 Thread alessio crisantemi
Now everything works! I have another problem if I use a connector with my solr-nutch. This is the error: Grave: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -11 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068

Re: nutch and solr

2012-02-25 Thread alessio crisantemi
is because Nutch is unable to find a URL in the URL location that you provide. Kindly ensure there is a URL there. -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3773089.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: nutch and solr

2012-02-24 Thread tamanjit.bin...@yahoo.co.in
The empty path message is because Nutch is unable to find a URL in the URL location that you provide. Kindly ensure there is a URL there. -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3773089.html Sent from the Solr - User mailing list archive

Re: nutch and solr

2012-02-22 Thread alessio crisantemi
will be for different domains. So for each domain folder in urls folder there has to be a corresponding folder (with the same name) in the crawl folder. -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html Sent from the Solr - User mailing list

nutch and solr

2012-02-21 Thread alessio crisantemi
I am trying to configure Nutch (1.4) with my Solr 3.2. But when I run the command bin/nutch inject crawl/crawldb urls it doesn't work, and it replies with "can't convert a empty path". Why, in your opinion? tx a.

Re: nutch and solr

2012-02-21 Thread tamanjit.bin...@yahoo.co.in
-- folder name The folder name will be for different domains. So for each domain folder in urls folder there has to be a corresponding folder (with the same name) in the crawl folder. -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html Sent from
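
The advice in this thread (make sure the URL location passed to `bin/nutch inject` actually contains a URL) can be checked before running the command. A minimal Python sketch, assuming the seed directory holds plain-text files with one URL per line; the function name `seed_urls` and the `#` comment convention are assumptions, not something stated in the thread.

```python
import tempfile
from pathlib import Path


def seed_urls(url_dir):
    """Collect non-blank, non-comment lines from every file directly in url_dir."""
    urls = []
    for path in sorted(Path(url_dir).iterdir()):
        if path.is_file():
            for line in path.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    urls.append(line)
    return urls


# Demonstration on a throwaway directory: empty first, then with one seed file.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = seed_urls(tmp)
    (Path(tmp) / "seed.txt").write_text("http://example.com/\n\n")
    filled_result = seed_urls(tmp)

print(empty_result, filled_result)
```

If the list comes back empty for the directory you pass to inject, that matches the situation the reply describes.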

nutch in solr

2012-02-05 Thread alessio crisantemi
Hi All, I have some problems with integration of Nutch in Solr and Tomcat. I followed the Nutch tutorial for integration and now I can crawl a website: all works right. But if I try the Solr integration, I can't index into Solr. The Nutch output after the command follows: bin/nutch crawl urls -solr

Re: nutch in solr

2012-02-05 Thread Matthew Parker
Doesn't Tomcat run on port 8080, and not port 8983? Or did you change Tomcat's default port to 8983? On Feb 5, 2012 5:17 AM, alessio crisantemi alessio.crisant...@gmail.com wrote: Hi All, I have some problems with integration of Nutch in Solr and Tomcat. I followed the Nutch tutorial

Re: nutch in solr

2012-02-05 Thread tamanjit.bin...@yahoo.co.in
alessio crisantemi-2, I think you got it.. Check the jars in nutch lib and see if the solr n solrj jars are same... That could be the issue -- View this message in context: http://lucene.472066.n3.nabble.com/nutch-in-solr-tp3716969p3717542.html Sent from the Solr - User mailing list archive

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
some problems with integration of Nutch in Solr and Tomcat. I followed the Nutch tutorial for integration and now I can crawl a website: all works right. But if I try the Solr integration, I can't index into Solr. The Nutch output after the command follows: bin/nutch crawl urls -solr http

Re: nutch in solr

2012-02-05 Thread Matthew Parker
in Solr and Tomcat. I followed the Nutch tutorial for integration and now I can crawl a website: all works right. But if I try the Solr integration, I can't index into Solr. The Nutch output after the command follows: bin/nutch crawl urls -solr http://127.0.0.1:8983/solr/ -depth 3

Re: nutch in solr

2012-02-05 Thread Geek Gamer
Looks like the solrj version in the Nutch classpath is different from the Solr version on the server. Can you post the versions for both Nutch and Solr? On Sun, Feb 5, 2012 at 10:24 PM, alessio crisantemi alessio.crisant...@gmail.com wrote: no, all run on port 8983. .. 2012/2/5 Matthew Parker mpar

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
on server, can you post the versions for both nutch and solr? On Sun, Feb 5, 2012 at 10:24 PM, alessio crisantemi alessio.crisant...@gmail.com wrote: no, all run on port 8983. .. 2012/2/5 Matthew Parker mpar...@apogeeintegration.com Doesn't tomcat run on port 8080, and not port

Re: nutch in solr

2012-02-05 Thread Geek Gamer
solrj is the Solr Java client library, so there seem to be two versions, 1.4.1 and 3.4.0, which are incompatible. You can do the following: refer to https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c78281df4d to see how to upgrade the solr version in nutch, the above

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
/c66bf35ff4f86393413621b3b889b1c78281df4d to see how to upgrade the solr version in nutch, the above example replaces solr 1.4.0 with 3.1.0. On Sun, Feb 5, 2012 at 11:02 PM, alessio crisantemi alessio.crisant...@gmail.com wrote: If I look at the Solr and Nutch libs I find: apache-solr-solrj-1.4.1.jar on Solr
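
The version check discussed in this thread (comparing the solrj jar in Nutch's lib directory with the one shipped with the Solr server) can be sketched in Python. The jar name `apache-solr-solrj-1.4.1.jar` and the versions 1.4.1 and 3.4.0 come from the posts; the second jar name, the function names, and the "same major version" rule are assumptions for illustration, a crude heuristic rather than a compatibility guarantee.

```python
import re

# Matches the version at the end of names like "apache-solr-solrj-1.4.1.jar".
JAR_RE = re.compile(r"solrj-(\d+(?:\.\d+)*)\.jar$")


def solrj_version(jar_name):
    """Return the version as a tuple of ints, or None if the name does not match."""
    match = JAR_RE.search(jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def same_major(jar_a, jar_b):
    """Heuristic only: treat the two jars as compatible when major versions match."""
    va, vb = solrj_version(jar_a), solrj_version(jar_b)
    return va is not None and vb is not None and va[0] == vb[0]


nutch_jar = "apache-solr-solrj-1.4.1.jar"  # name quoted in the thread
server_jar = "apache-solr-solrj-3.4.0.jar"  # assumed name for the 3.4.0 jar
print(solrj_version(nutch_jar), solrj_version(server_jar), same_major(nutch_jar, server_jar))
```

A mismatch like this one is what the replies point to as the likely cause of the indexing failure.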

Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-13 Thread Geek Gamer
...@zudiewiener.com wrote: I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not built) and tomcat6 following this (and some other) links http://wiki.apache.org/nutch/RunningNutchAndSolr I have added the nutch schema and can access/view this schema via the admin page. nutch also works as I can

Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-13 Thread Leo Subscriptions
hope that helps, On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions llsub...@zudiewiener.com wrote: I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not built) and tomcat6 following this (and some other) links http://wiki.apache.org/nutch/RunningNutchAndSolr I have

Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-13 Thread Markus Jelsma
If you're using Solr anyway, you'd better upgrade to Nutch 1.3 with Solr 3.x support. Works like a charm. Thanks, Leo On Wed, 2011-07-13 at 11:31 +0530, Geek Gamer wrote: you need to update the solrj libs to 3.x version. the java bin format has changed . I made the change a few

nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

2011-07-12 Thread Leo Subscriptions
I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not built) and tomcat6 following this (and some other) links http://wiki.apache.org/nutch/RunningNutchAndSolr I have added the nutch schema and can access/view this schema via the admin page. nutch also works as I can perform

Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I

Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Can you let me know when and where you were getting the error? A screen-shot will be helpful. On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston serenity.kenings...@gmail.com wrote: Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did

Re: Apache Nutch and Solr Integration

2011-07-05 Thread Markus Jelsma
You are using the crawl job so you must specify the URL to your Solr instance. The newly updated wiki has your answer: http://wiki.apache.org/nutch/bin/nutch_crawl Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained

Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
: Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache

Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I downloaded both the softwares

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
after it's being parsed, you need to do it later on. On Wednesday 09 February 2011 04:29:44 .: Abhishek :. wrote: Hi all, I am a newbie to nutch and solr. Well relatively much newer to Solr than Nutch :) I have been using nutch for past two weeks, and I wanted to know if I can query

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
in the seed.txt and does not proceed further from there. So I am just a bit confused. Why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get a feeling that I am missing something that the author of the blog (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed

Re: Nutch and Solr search on the fly

2011-02-09 Thread Erick Erickson
not proceed further from there. So I am just a bit confused. Why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get a feeling that I am missing something that the author of the blog (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed everyone would

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
am just a bit confused. Why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get a feeling that I am missing something that the author of the blog (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed everyone would know. Thanks, Abi On Wed, Feb 9

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Erick, Thanks a bunch for the response. Could be a chance... but all I am wondering is where to specify the depth in the whole process in the URL http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/? I tried specifying it during the fetcher phase but it was just ignored

Re: Nutch and Solr search on the fly

2011-02-09 Thread charan kumar
:. ab1s...@gmail.com wrote: Hi Erick, Thanks a bunch for the response. Could be a chance... but all I am wondering is where to specify the depth in the whole process in the URL http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/? I tried specifying it during the fetcher phase

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Charan, Thanks for the clarifications. The link I have been referring to( http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say anything about using the crawl? Do I have to do it after the last step mentioned? Thanks, Abi On Thu, Feb 10, 2011 at 12:58 AM, charan kumar

Re: [Nutch] and Solr integration

2011-01-03 Thread Adam Estrada
, Dec 20, 2010 at 4:21 PM, Anurag anurag.it.jo...@gmail.com wrote: Why are you using solrindex in the argument? It is used when we need to index the crawled data in Solr. For more, read http://wiki.apache.org/nutch/NutchTutorial . Also, for Nutch-Solr integration this is a very useful blog: http

Re: [Nutch] and Solr integration

2011-01-03 Thread Adam Estrada
BLEH! facepalm This is entirely possible to do in a single step AS LONG AS YOU GET THE SYNTAX CORRECT ;-) http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/ bin/nutch crawl urls -dir crawl

[Nutch] and Solr integration

2010-12-20 Thread Adam Estrada
All, I have a couple websites that I need to crawl and the following command line used to work I think. Solr is up and running and everything is fine there and I can go through and index the site but I really need the results added to Solr after the crawl. Does anyone have any idea on how to make

Re: [Nutch] and Solr integration

2010-12-20 Thread Anurag
Why are you using solrindex in the argument? It is used when we need to index the crawled data in Solr. For more, read http://wiki.apache.org/nutch/NutchTutorial . Also, for Nutch-Solr integration this is a very useful blog: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ I integrated nutch

Re: [Nutch] and Solr integration

2010-12-20 Thread Adam Estrada
. Also for nutch-solr integration this is very useful blog http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ I integrated nutch and solr and it works well. Thanks On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] ml-node+2122347-622655030-146...@n3.nabble.comml-node%2b2122347

problem with integrated nutch into solr

2010-04-06 Thread toocrazymail
Hi, I'm new to using Apache Nutch and Solr... Does anyone on the list have experience indexing Nutch crawls into Solr? The main problem is that, e.g., PDF documents crawled by Nutch (along with the other content from the crawled site) aren't queryable after Solr indexing... e.g. query in nutch: bin

Re: Please help me integrate Nutch with Solr

2008-12-29 Thread Andrzej Bialecki
Tony Wang wrote: Thanks Otis. I've just downloaded NUTCH-442_v8.patch <https://issues.apache.org/jira/secure/attachment/12391810/NUTCH-442_v8.patch> from https://issues.apache.org/jira/browse/NUTCH-442, but the patching process gave me lots of errors, see below: This patch will be integrated within

Re: Please help me integrate Nutch with Solr

2008-12-28 Thread Tony Wang
is CentOS 5.2 by the way. Thanks! Tony On Sun, Dec 28, 2008 at 10:18 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Tony, I think you should ignore the advice/code from foofactory blog and just go with NUTCH-442, as that's most likely going to result in the official Nutch-Solr

Re: Please help me integrate Nutch with Solr

2008-12-27 Thread Dingding Ye
understanding either Nutch or Solr. My suggestion is to first play only with Nutch and learn how to run various Nutch steps, all the way to generating an index. Then play with Solr (and forget about Nutch) by following the Solr tutorial. Once you get Solr by itself working, you will understand

Please help me integrate Nutch with Solr

2008-12-26 Thread Tony Wang
=on&hl.fl=content I followed this guide to integrate Nutch with Solr: http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html. I wonder what could be wrong with my integration. I use CentOS 5.2, Tomcat6 and the latest Nutch and Solr nightly builds. Thanks! Tony -- Signature

Re: Please help me integrate Nutch with Solr

2008-12-26 Thread Otis Gospodnetic
/ -- Lucene - Solr - Nutch - Original Message From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 26, 2008 11:20:06 AM Subject: Please help me integrate Nutch with Solr I got the web interface to work here: http://208.64.71.46:8080/search.jsp

Re: Please help me integrate Nutch with Solr

2008-12-26 Thread Tony Wang
wrong from the information you provided below. Are there any errors in the log? Are you sure your solr home is set correctly? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tony Wang ivyt...@gmail.com To: solr-user

Re: Please help me integrate Nutch with Solr

2008-12-26 Thread Otis Gospodnetic
/example/solr/data/index/ directory. You will need to adjust the schema to match the Lucene/Nutch index fields, too. But honestly, it looks like you are starting from the middle without really following things step-by-step and without really understanding either Nutch or Solr. My suggestion

Re: Nutch with SOLR

2007-09-26 Thread Doğacan Güney
blog, or post it as a patch for inclusion in nutch/contrib (if sami is ok with that). If you have issues with how to use the solr client api, solr-user is here to help. I've done this. Apparently someone else has taken on the solr-nutch job and made it a bit more complicated (which is good

Re: Nutch with SOLR

2007-09-26 Thread Brian Whitman
On Sep 26, 2007, at 4:04 AM, Doğacan Güney wrote: NUTCH-442 is one of the issues that I want to really see resolved. Unfortunately, I haven't received many (as in, none) comments, so I haven't made further progress on it. I am probably your target customer but to be honest all we care about

Re: Nutch with SOLR

2007-09-25 Thread Ian Holsman
with SOLR Daniel, We just started to test/research the possibility of integrating Nutch and Solr, so it would be nice to hear any advice as well. Thanks, DT www.ejizn.com - Original Message - From: Daniel Clark [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, September 25, 2007 1:23 PM

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
/contrib (if sami is ok with that). If you have issues with how to use the solr client api, solr-user is here to help. I've done this. Apparently someone else has taken on the solr-nutch job and made it a bit more complicated (which is good for the long term) than sami's original patch

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
But we still use a version of Sami's patch that works on both trunk nutch and trunk solr (solrj.) I sent my changes to sami when we did it, if you need it let me know... I put my files up here: http://variogr.am/latest/?p=26 -b

Re: Nutch with SOLR

2007-09-25 Thread Ian Holsman
Thanks Brian. I'm sure this will help lots of people. Brian Whitman wrote: But we still use a version of Sami's patch that works on both trunk nutch and trunk solr (solrj.) I sent my changes to sami when we did it, if you need it let me know... I put my files up here: http://variogr.am