Re: Building Lucene index with Nutch 1.4

Emre Çelikten Thu, 07 Jun 2012 20:22:47 -0700

Hello again,

I managed to do it. Getting the entire thing to work was tricky. I had to
resort to a hack.

I will post how I managed to do it here soon, for anyone who might be
interested in the future.

Thanks again.

Best,

Emre

On Fri, Jun 8, 2012 at 12:33 AM, Emre Çelikten <e...@celikten.name> wrote:

> Hello Markus,
>
> Thanks very much for your help.
>
> I have looked at the Nutch source. I think I need to write a different
> version of the indexSolr method in SolrIndexer.java, is that right? The
> current signature is:
>
> public void indexSolr(String solrUrl, Path crawlDb, Path linkDb,
>       List<Path> segments, boolean noCommit, boolean deleteGone,
>       String solrParams)
>
> In the new method I will try to replace the "String solrUrl" parameter
> with "SolrServer server" and pass in my own SolrServer instance that was
> created in the application. Do you think this is the right approach?
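A minimal sketch of the refactoring proposed above, assuming the goal is simply to let the caller hand in an already-constructed server object instead of a URL. To keep it compilable with only the JDK, `DocumentServer`, `InMemoryServer`, `RemoteServer` and the simplified `indexSolr` overloads are hypothetical stand-ins (for SolrJ's `SolrServer`, an embedded server, and Nutch's method). The real Nutch method takes more parameters and runs a Hadoop job, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IndexerSketch {

    /** Hypothetical stand-in for SolrJ's SolrServer; NOT the real SolrJ interface. */
    interface DocumentServer {
        void add(Map<String, String> doc);
        void commit();
    }

    /** Hypothetical in-process server (think embedded Solr) that keeps documents in memory. */
    static class InMemoryServer implements DocumentServer {
        final List<Map<String, String>> committed = new ArrayList<>();
        private final List<Map<String, String>> pending = new ArrayList<>();

        @Override
        public void add(Map<String, String> doc) { pending.add(doc); }

        @Override
        public void commit() {
            // Make buffered documents "visible", like a commit on a real index.
            committed.addAll(pending);
            pending.clear();
        }
    }

    /** Hypothetical placeholder for an HTTP client; this sketch only buffers documents locally. */
    static class RemoteServer extends InMemoryServer {
        final String url;
        RemoteServer(String url) { this.url = url; }
    }

    /** Old-style entry point: builds a server from the URL, then delegates to the new overload. */
    static void indexSolr(String solrUrl, List<Map<String, String>> docs, boolean noCommit) {
        indexSolr(new RemoteServer(solrUrl), docs, noCommit);
    }

    /** New overload: the application supplies a server it created itself. */
    static void indexSolr(DocumentServer server, List<Map<String, String>> docs, boolean noCommit) {
        for (Map<String, String> doc : docs) {
            server.add(doc);
        }
        if (!noCommit) {
            server.commit();
        }
    }

    public static void main(String[] args) {
        InMemoryServer server = new InMemoryServer();
        List<Map<String, String>> docs =
            List.of(Map.of("id", "http://example.com/", "content", "speech recognition"));
        indexSolr(server, docs, false);
        System.out.println("committed documents: " + server.committed.size());
    }
}
```

Keeping the URL-based method as a thin wrapper means existing callers keep working, while the application can pass in a server it constructed itself (for example an embedded one).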
>
> Best,
>
> Emre
>
>
> On Thu, Jun 7, 2012 at 11:27 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>
>> Hello!
>>
>> Sounds very interesting. Solr can run embedded in a Java application
>> using EmbeddedSolrServer. You will need to make some changes to the
>> SolrIndexer tool in Nutch, though.
>>
>> Cheers
>>
>> -----Original message-----
>> > From: Emre Çelikten <e...@celikten.name>
>> > Sent: Thu 07-Jun-2012 22:24
>> > To: user@nutch.apache.org
>> > Subject: Building Lucene index with Nutch 1.4
>> >
>> > Hello everybody,
>> >
>> > As part of a project, I am working on a FOSS tool that will build
>> > language models from data obtained from the web; these models will
>> > then be used for speech recognition. I plan to keep this tool compact
>> > by encapsulating as much as I can in a single Java application, so
>> > that the user does not have to install and configure tons of stuff.
>> >
>> > I have managed to set up Nutch and can crawl a website from within a
>> > Java application. The next thing I need to do is search for certain
>> > keywords in the crawled data. I have read that the ability to build
>> > Lucene indexes has been removed from Nutch and that we now need to use
>> > Solr instead. The way Solr works (servlets, HTTP) is not really
>> > appropriate for a tool that only needs search functionality hidden
>> > from the user.
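If all the application needs is to look up keywords in the crawled text, the core idea behind a Lucene index is an inverted index that maps each term to the documents containing it. The toy class below is a hypothetical, JDK-only illustration of that idea (`KeywordIndex`, `addDocument` and `search` are invented names). It is not Lucene and lacks analyzers, ranking and on-disk storage, so for real use Lucene itself or an embedded Solr would be the natural choice.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy inverted index; illustrates the concept only, not Lucene's API. */
class KeywordIndex {
    // term -> sorted set of document ids (e.g. page URLs) containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits text on runs of non-letter/non-digit characters, lower-cases, records postings. */
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns ids of documents containing every keyword (AND semantics), sorted. */
    Set<String> search(String... keywords) {
        Set<String> result = null;
        for (String keyword : keywords) {
            Set<String> docs = postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with the postings seen so far
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.addDocument("http://example.com/a", "Speech recognition with language models");
        index.addDocument("http://example.com/b", "Language models for machine translation");
        System.out.println(index.search("language", "models"));
        System.out.println(index.search("speech"));
    }
}
```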
>> >
>> > What would you recommend I do in this case? Is there really no way to
>> > build Lucene indexes? I could not find anything other than
>> > recommendations to use Solr instead. Should I try an older version of
>> > Nutch?
>> >
>> > Thanks in advance,
>> >
>> > Emre
>> >
>>
>
>
