Re: Simple Faceted Searching out of the box

Joachim Martin Fri, 22 Sep 2006 13:02:58 -0700

I think you will find that this architecture is quite common. What
commercial packages provide (remember you are getting this for free!)
are the tools for managing the dynamic export of data out of your
database into the full-text search engine.

Solr provides a very easy way to do this, but yes, you have to do some
programming to automate it.

Two common ways of doing this: 1) write a component that periodically
checks for new/updated database content and submits it to Solr; 2) write
a trigger in the database that immediately posts to Solr (I would use
JMS or some other asynchronous messaging system for this). I'm sure
there are other solutions.
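
As a rough illustration of option 1, here is a minimal sketch of a
single polling pass. Everything named in it is an assumption made for
the example, not something from this thread: the `jobs` table, its
`updated_at` column (an increasing integer timestamp), and the `submit`
callback that stands in for whatever actually sends a document to Solr.
Running it periodically (cron, a sleep loop) is left out.

```python
import sqlite3


def poll_once(conn, since, submit):
    """Submit every row changed after `since`; return the new high-water mark."""
    rows = conn.execute(
        "SELECT id, title, updated_at FROM jobs "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    for job_id, title, updated_at in rows:
        # `submit` is a hypothetical callback; in practice it would post to Solr.
        submit({"id": job_id, "title": title})
        since = updated_at
    return since


# Demo against an in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "Reporter", 100), (2, "Editor", 200)],
)

submitted = []
mark = poll_once(conn, 0, submitted.append)      # first pass picks up both rows
conn.execute("UPDATE jobs SET title = 'Senior Editor', updated_at = 300 WHERE id = 2")
mark = poll_once(conn, mark, submitted.append)   # second pass picks up only the edit
```

Because an edited row is simply submitted again, inserts and edits go
through the same code path, which matches the "send an ADD again"
behaviour described in the quoted message.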

When/if MySQL full-text search is as good as Solr/Lucene, you can cut
out one of the steps.

I could see a component added to Solr that did #1 above for you. MG4J
has a simple loader that takes a SQL query and indexes the result
(JdbcDocumentCollection). For Solr, you'd want to be able to handle
multi-valued fields, which complicates things.
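
To make the multi-valued complication concrete: a SQL join returns one
row per value, so the rows have to be regrouped into one document with
repeated `<field>` elements before they go into a Solr `<add>` message.
The sketch below assumes a hypothetical result shape of
(doc_id, field_name, value) tuples and hypothetical field names; it is
not how MG4J or any Solr component does it.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def rows_to_add_xml(rows):
    """Group (doc_id, field_name, value) rows into one <doc> per doc_id."""
    docs = OrderedDict()
    for doc_id, name, value in rows:
        docs.setdefault(doc_id, []).append((name, value))

    add = ET.Element("add")
    for doc_id, fields in docs.items():
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = str(doc_id)
        for name, value in fields:
            # A multi-valued field is just the same field name repeated.
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical join output: one job with two categories.
rows = [
    (1, "title", "Reporter"),
    (1, "category", "news"),
    (1, "category", "full-time"),
]
xml_payload = rows_to_add_xml(rows)
```

The fields used here would also have to be declared in the Solr schema,
with the repeated one marked as multi-valued.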

If this architecture bothers technical folks, they either are accustomed
to using very expensive software, or haven't been doing this very long.

Of course, I am trying to figure out a way to make Solr more like a
database, so there you go...

--Joachim

Tim Archambault wrote:

Okay, I'll use an example.

A recruitment (jobs) customer goes onto our website and posts an online
job posting to our newspaper website. Upon insert into the database, I
need to generate an xml file to be sent to SOLR to ADD as a record to
the search engine. Same goes for an edit: my database updates the record
and then I have to send an ADD statement to Solr again to commit my
change. 2x the work.

I've been talking with other papers about Solr and I think what bothers
many is that there is a deposit of information in a structured database
here [named A], then we have another set of basically the same data over
here [named B] and they don't understand why they have to manage two
different sets of data [A & B] that are virtually the same thing. Many
foresee a maintenance nightmare. I've come to the conclusion that
there's somewhat of a disconnect between what a database does and what a
search engine does. I accept that the redundancy is necessary given the
very different tasks that each performs [keep in mind I'm still naive to
the programming details here, I understand conceptually].

In writing this to you another thought came to mind. Maybe there are
alternative ways to inject records into Solr outside the bounds of the
cygwin and CURL examples I've been using. Maybe that is the question we
need to be asking. What are some alternative ways to populate Solr?
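
One possible answer: the curl examples are only HTTP POSTs of XML to
Solr's update handler, so any language with an HTTP client can do the
same from inside the application. A minimal Python sketch follows. The
URL is an assumption (the address used by a default local example
install) and the `id` field is hypothetical. The requests are only
built here, not sent.

```python
import urllib.request

# Assumption: a local Solr instance at the example install's address.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_payload, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying a Solr XML update message."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# An add followed by a commit, mirroring the two curl calls.
add_request = build_update_request(
    '<add><doc><field name="id">42</field></doc></add>'
)
commit_request = build_update_request("<commit/>")

# To actually send them against a running Solr:
#     urllib.request.urlopen(add_request)
#     urllib.request.urlopen(commit_request)
```

Called right after the database insert or update, this removes the
separate "generate an XML file, then run curl" step, though the
application is still responsible for keeping the two stores in sync.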

Enough said, it's Friday afternoon.

Have a great weekend.

Tim

On 9/22/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Sep 22, 2006, at 2:45 PM, Tim Archambault wrote:
> I believe there's a way to access MSSQL, MySQL etc. directly with
> Lucene,
> but not sure how to do this with SOLR.

Nope.  Lucene is a pure search engine, with no hooks to databases, or
document parsers, etc.  Lots of folks have built these kinds of
things on top of Lucene, but the Lucene core is purely the text engine.

How would you envision communicating with Solr with a database in the
picture?   How would the entire database be initially indexed?  How
would changes to the database trigger Solr updates?   I'm not quite
clear on what it would mean for Solr to work with a database directly
so I'm curious.

        Erik
