: now I have simple Lucene Indexes that basically re-created once daily and
: that simply isn't doing the job for about 30% of my content.

do you mean it takes too long to index all your content, so you can only do part of it, or do you mean it's not indexing some of your content "well"?

: For indexing news articles for instance, I want the article, all reader
: comments, photos, links, multimedia files associated with the article to be
: indexed together as one entity so that if Chris Hostetter commented on the
: "high cost of heating oil in Maine" article, I can find the article by
: searching on your name, etc....

this is a great example of the last 20% of the problem i was talking about ... knowing *when* to reindex a modified record. even if you have a perfect mechanism for identifying/flattening all of the data that should go in a Document, and a perfect method for detecting when any of that data has changed, it probably isn't practical/efficient to reindex on every change.

you might want to say that creating, deleting or modifying the "core" aspects of a news article (ie: title, dek, byline, body, categories, publish date) should trigger an immediate index update, but for things like user comments it might make more sense to have a batch process that runs every N minutes and reindexes any article that has had comments added in the last N minutes ... except maybe you want to be more responsive to comments added to "recent" articles, so maybe you configure two separate instances of that cron job: one where N is small but it only looks at articles published today, and another where N is larger and it looks at older articles.

...these are the kinds of tradeoffs that typically have to be made between indexing data quickly and getting good performance out of your index, and it's why i've never tried to build a "general purpose" indexer for Solr -- the needs of different indexes are too different for it to make much sense.

Besides: if it were that easy, Google would have a hosted solution with a REST API and everyone would just use them to search their sites. :)

-Hoss
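
A minimal sketch of the tiered reindexing policy described above, in Python. Everything named here is hypothetical and only for illustration: the `Article` record, the `CORE_FIELDS` set, both function names, the 5 minute and 1 hour values for N, and the use of "published within the last 24 hours" as a stand-in for "published today". It only picks which articles are due; actually sending the flattened documents to the index is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical names for the "core" aspects of an article: a change to any
# of these should trigger an immediate index update.
CORE_FIELDS = {"title", "dek", "byline", "body", "categories", "publish_date"}


@dataclass
class Article:
    id: str
    published: datetime
    last_comment: datetime  # time of the most recent reader comment


def needs_immediate_reindex(changed_fields):
    """True if any core field changed; comment-only changes wait for the batch job."""
    return bool(CORE_FIELDS & set(changed_fields))


def articles_to_reindex(articles, now, window, max_age=None, min_age=None):
    """Return ids of articles that received a comment within `window` of `now`.

    max_age: only consider articles published at most this long ago.
    min_age: only consider articles published more than this long ago.
    """
    due = []
    for a in articles:
        age = now - a.published
        if max_age is not None and age > max_age:
            continue
        if min_age is not None and age <= min_age:
            continue
        if now - a.last_comment <= window:
            due.append(a.id)
    return due


def fast_job(articles, now):
    # Small N, recent articles only (assumed: N = 5 minutes, age <= 24 hours).
    return articles_to_reindex(
        articles, now, window=timedelta(minutes=5), max_age=timedelta(days=1)
    )


def slow_job(articles, now):
    # Larger N, older articles only (assumed: N = 1 hour, age > 24 hours).
    return articles_to_reindex(
        articles, now, window=timedelta(hours=1), min_age=timedelta(days=1)
    )
```

Each job would be scheduled (from cron, say) at the same interval as its window, so a comment on a fresh article shows up in search within a few minutes while comments on older articles are picked up in the hourly pass.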