I think you will find that this architecture is quite common. What commercial
packages provide (remember, you are getting this for free!) are the tools for
managing the dynamic export of data out of your database into the full-text
search engine.

Solr provides a very easy way to do this, but yes, you have to do some programming
to automate it.

There are two common ways of doing this:

1) Write a component that periodically checks for new/updated database content
   and submits it to Solr.
2) Write a trigger in the database that immediately posts to Solr (I would use
   JMS or some other asynchronous messaging system for this).

I'm sure there are other solutions.
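
A rough sketch of option 1, for illustration only. It assumes a hypothetical
`jobs` table with an `updated_at` column that is bumped on every insert or edit,
uses an in-memory SQLite database as a stand-in for the real one, and takes a
hypothetical `submit` callback where the actual post to Solr would go:

```python
import sqlite3


def fetch_changed(conn, since):
    """Return (id, title, updated_at) rows modified after the watermark."""
    cur = conn.execute(
        "SELECT id, title, updated_at FROM jobs "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def sync_once(conn, since, submit):
    """Submit every changed row and return the new watermark."""
    for row_id, title, updated_at in fetch_changed(conn, since):
        submit({"id": row_id, "title": title})  # placeholder for the Solr post
        since = max(since, updated_at)
    return since


# Stand-in database; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "Reporter", 100), (2, "Editor", 200)],
)

sent = []
watermark = sync_once(conn, 0, sent.append)  # first pass sends both rows

# A later edit bumps updated_at, so the next pass picks up only that row.
conn.execute("UPDATE jobs SET title = 'Senior Editor', updated_at = 300 WHERE id = 2")
watermark = sync_once(conn, watermark, sent.append)
```

A real component would call `sync_once` on a timer and persist the watermark
between runs, so a restart does not resubmit everything.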

When/if MySQL full-text search is as good as Solr/Lucene, you can cut out one of the steps.

I could see a component added to Solr that did #1 above for you. MG4J has a
simple loader that takes a SQL query and indexes the result
(JdbcDocumentCollection). For Solr, you'd want to be able to handle multi-valued
fields, which complicates things.
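
To show why multi-valued fields complicate a query-driven loader, here is a
small sketch. A one-to-many join returns one row per value, so the rows have to
be collapsed back into one document per key before indexing. The `jobs` and
`job_categories` tables and the `rows_to_documents` helper are made up for the
example, and SQLite stands in for the real database:

```python
import sqlite3


def rows_to_documents(rows):
    """Collapse (id, title, category) join rows into one dict per id."""
    docs = {}
    for doc_id, title, category in rows:
        doc = docs.setdefault(
            doc_id, {"id": doc_id, "title": title, "category": []}
        )
        if category is not None:  # LEFT JOIN yields None when there is no match
            doc["category"].append(category)
    return list(docs.values())


# Hypothetical schema: one job can have many categories.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE job_categories (job_id INTEGER, category TEXT);
    INSERT INTO jobs VALUES (1, 'Reporter');
    INSERT INTO jobs VALUES (2, 'Editor');
    INSERT INTO job_categories VALUES (1, 'news');
    INSERT INTO job_categories VALUES (1, 'sports');
    """
)

rows = conn.execute(
    """
    SELECT j.id, j.title, c.category
    FROM jobs j LEFT JOIN job_categories c ON c.job_id = j.id
    ORDER BY j.id, c.category
    """
).fetchall()

documents = rows_to_documents(rows)
```

With a single-valued schema the loader could index each row as it arrives; here
it has to know which column is the key and which columns repeat.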

If this architecture bothers technical folks, they either are accustomed to using very
expensive software, or haven't been doing this very long.

Of course, I am trying to figure out a way to make Solr more like a database, so there
you go...

--Joachim

Tim Archambault wrote:

Okay, I'll use an example.

A recruitment (jobs) customer goes onto our newspaper website and posts an
online job listing. Upon insert into the database, I need to generate an XML
file to be sent to Solr to ADD as a record to the search engine. The same goes
for an edit: my database updates the record and then I have to send an ADD
statement to Solr again to commit my change. 2x the work.
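
For what it's worth, the insert and the edit can share one code path. The
sketch below builds the `<add>` message from a dict; the field names and the
`build_add_xml` helper are illustrative, and it assumes the Solr schema declares
`id` as the unique key, in which case a second add with the same `id` replaces
the earlier document:

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Render a dict as a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple becomes one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical job posting; the field names are assumptions.
posting = {"id": "job-42", "title": "Staff Reporter", "description": "News & features"}
insert_xml = build_add_xml(posting)

# An edit produces the same kind of message with the same id.
posting["title"] = "Senior Staff Reporter"
update_xml = build_add_xml(posting)
```

Either message still needs a `<commit/>` sent afterwards before the change shows
up in searches.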

I've been talking with other papers about Solr and I think what bothers many
is that there is a deposit of information in a structured database here
[named A], then we have another set of basically the same data over here
[named B], and they don't understand why they have to manage two different
sets of data [A & B] that are virtually the same thing. Many foresee a
maintenance nightmare.

I've come to the conclusion that there's somewhat of a disconnect between what
a database does and what a search engine does. I accept that the redundancy is
necessary given the very different tasks that each performs [keep in mind I'm
still naive about the programming details here; I understand it conceptually].

In writing this to you another thought came to mind. Maybe there are
alternative ways to inject records into Solr outside the bounds of the
Cygwin and curl examples I've been using. Maybe that is the question we need
to be asking: what are some alternative ways to populate Solr?
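
One possible answer, sketched here as an illustration: the curl examples are
just HTTP POSTs of XML, so any language with an HTTP client can do the same
thing. The URL below is an assumption (a local Solr with the update handler at
`/solr/update`), and `build_update_request` and `post_update` are made-up helper
names:

```python
import urllib.request

# Assumed location of the update handler; adjust for the actual install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml, url=SOLR_UPDATE_URL):
    """Wrap an XML update message in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_update(xml, url=SOLR_UPDATE_URL):
    """Send the message to Solr and return the response body as text."""
    with urllib.request.urlopen(build_update_request(xml, url)) as response:
        return response.read().decode("utf-8")


# Building the request needs no running server; sending it does.
request = build_update_request(
    '<add><doc><field name="id">job-42</field></doc></add>'
)
```

With a Solr instance running, calling `post_update(...)` with the add message
and then `post_update("<commit/>")` would do the same job as the two curl
commands.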

Enough said, it's Friday afternoon.

Have a great weekend.

Tim

On 9/22/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:



On Sep 22, 2006, at 2:45 PM, Tim Archambault wrote:
> I believe there's a way to access MSSQL, MySQL etc. directly with
> Lucene,
> but not sure how to do this with SOLR.

Nope.  Lucene is a pure search engine, with no hooks to databases, or
document parsers, etc.  Lots of folks have built these kinds of
things on top of Lucene, but the Lucene core is purely the text engine.

How would you envision communicating with Solr with a database in the
picture?   How would the entire database be initially indexed?  How
would changes to the database trigger Solr updates?   I'm not quite
clear on what it would mean for Solr to work with a database directly
so I'm curious.

        Erik
