On 1/11/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 1/11/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> I'd like to be able to add/update documents from an SQL query.

Me too... it's been on the todo list a long time.
A lot of people have data in databases, and it's a shame to require
code to index their data if it can be expressed in SQL.

If it were less common, I'd say it would be better as a standalone
app talking to Solr over XML/HTTP, but given that it's *such* a common
case, I'd support it going into the core.

I'd envision query args instead of XML though... something that could
be generated by a browser.

overwrite=true&sql=SELECT * FROM my_stats_table&etc

To keep /update clean, perhaps updating from SQL should get its own servlet.
/updateSQL?overwrite=true&sql=SELECT * FROM my_stats_table&etc
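
One wrinkle with putting raw SQL in the query string: the spaces have to be
URL-encoded before a client sends it. Here is a minimal sketch of how a client
might build the request, assuming the /updateSQL path and the overwrite/sql
parameter names from the example above. None of this exists in Solr yet, and
the UpdateSqlUrl class is made up purely for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the request path for the *proposed* /updateSQL servlet.
class UpdateSqlUrl {
    static String build(boolean overwrite, String sql) {
        // URLEncoder.encode(String, Charset) requires Java 10 or later.
        // It turns spaces into '+' and leaves '*', '-', '_' and '.' untouched.
        return "/updateSQL?overwrite=" + overwrite
             + "&sql=" + URLEncoder.encode(sql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints /updateSQL?overwrite=true&sql=SELECT+*+FROM+my_stats_table
        System.out.println(build(true, "SELECT * FROM my_stats_table"));
    }
}
```

A browser form would do this encoding on its own, so this only matters for
hand-built URLs and scripted clients.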

I think connection settings and driver should be set by the request,
not through configuration.

People would need to add an SQL driver (mysql-connector-java-5.0.4.jar
or whatever) to the /lib directory for anything to work.


The big question in my mind is if the database schema is simple enough
for something like this to work... esp w.r.t multi-valued fields.

I like the idea of adding a 'separator' token that could split a
single string into multiple fields.
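
Roughly what I have in mind for the separator, as a sketch only. The
SeparatorSplit class and splitValues method are hypothetical names, and
trimming whitespace and dropping empty pieces are my assumptions, not
something we have agreed on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits one column value into several field values.
class SeparatorSplit {
    static List<String> splitValues(String columnValue, String separator) {
        List<String> values = new ArrayList<>();
        if (columnValue == null) {
            return values; // SQL NULL yields no values
        }
        // Pattern.quote treats the separator literally, so "|" or "." are safe.
        for (String part : columnValue.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed); // assumption: skip empty pieces
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(splitValues("red| green |blue", "|"));
    }
}
```

Each returned value would then be added to the document as its own field
entry under the same field name.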

Multiple values in a database may be in multiple rows... can we handle
that case somehow?

If the rows are sorted by the ID, we could keep building a document
until the ID is different from the previous one.  Multiple rows would
keep adding to the same document.

Requiring rows to be sorted by document ID seems like an OK
restriction - the alternative is to load everything into memory until
you hit the end of the result set.
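
A sketch of the "keep building until the ID changes" loop, so it is clear
that only one document is held in memory at a time. In the real servlet the
rows would come from a JDBC ResultSet; here each row is a plain Map so the
example runs without a database. RowGrouper, the idColumn argument and the
sink callback are all hypothetical, and collapsing repeated values (which a
join produces for the single-valued columns) is my assumption.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper: merges consecutive rows with the same ID into one document.
class RowGrouper {
    static void group(Iterator<Map<String, String>> rows, String idColumn,
                      Consumer<Map<String, Set<String>>> sink) {
        String currentId = null;
        Map<String, Set<String>> doc = null; // the only document kept in memory
        while (rows.hasNext()) {
            Map<String, String> row = rows.next();
            String id = row.get(idColumn);
            if (doc == null || !Objects.equals(id, currentId)) {
                if (doc != null) {
                    sink.accept(doc); // ID changed: previous document is complete
                }
                doc = new LinkedHashMap<>();
                currentId = id;
            }
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (e.getValue() != null) {
                    // A set drops values repeated across joined rows (assumption).
                    doc.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                       .add(e.getValue());
                }
            }
        }
        if (doc != null) {
            sink.accept(doc); // flush the last document
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("id", "1", "tag", "a"),
            Map.of("id", "1", "tag", "b"),
            Map.of("id", "2", "tag", "c"));
        group(rows.iterator(), "id", doc -> System.out.println(doc));
    }
}
```

If the rows are not sorted, the same ID simply shows up as two separate
documents, so the ORDER BY would be the caller's responsibility.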


What types of joins can we handle?


Anything you can pass to:
 ResultSet resultSet = stmt.executeQuery( query );

If this throws an exception, it will be passed to the user.

I can't think of any (or don't know about any) specific join functions
that would be problematic.

- - - - - -

In my real use cases, documents are made from some fields that are
directly from SQL and others that have complex logic behind them.  The
SQL fields (stats) must be updated frequently and the others only
once.  This is why I also need to implement an 'update' mode.
