On Sunday, November 30, 2003, at 02:36 PM, Stefano Mazzocchi wrote:
Lucene scalability is not impaired by the number of documents. You basically create a document/token matrix, then build a hashtable of the tokens, and from a token you get the documents (modulo how ranking is performed, through, I believe, sorting by Euclidean distance in the document vector space between the query and the documents found).

That's nice: it has been used for decades in all full-text search engines and can be optimized a lot (and Lucene is a nice implementation of those algorithms).
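
To make the token-to-documents lookup described above concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy under stated assumptions, not Lucene's implementation: the names `build_index` and `search` and the sample documents are made up for illustration, and ranking is left out entirely.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, token):
    # A single hash lookup: the cost of finding the entry does not
    # depend on how many documents were indexed.
    return index.get(token.lower(), set())

# Hypothetical sample data, for illustration only.
docs = {
    1: "Lucene indexes tokens",
    2: "relational query over properties",
    3: "Lucene query syntax",
}
index = build_index(docs)
```

With this data, `search(index, "lucene")` yields the ids of documents 1 and 3 without scanning document 2.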

But how do I use this for something that looks a lot like a relational query?

The more we discuss it, the more I am coming to the conclusion that Lucene may not be the right approach for properties, but I cannot say for sure. At the very least, content indexing would have to be separate from property indexing, since properties are more likely to change than content, and updating a document in a Lucene index means removing it and re-adding it.
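
As a rough illustration of why a separate property index would help, the sketch below keeps properties in their own small index and implements an update as a remove followed by an add. `PropertyIndex` and its methods are hypothetical names in plain Python, not Lucene classes; the point is only that a property change would not touch whatever index holds the document content.

```python
from collections import defaultdict

class PropertyIndex:
    """Toy index of (property, value) terms; hypothetical, not a Lucene API."""

    def __init__(self):
        self.postings = defaultdict(set)  # (name, value) -> document ids
        self.stored = {}                  # document id -> properties indexed for it

    def add(self, doc_id, props):
        self.stored[doc_id] = dict(props)
        for term in props.items():
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        # Drop every term that was indexed for this document.
        for term in self.stored.pop(doc_id, {}).items():
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def update(self, doc_id, props):
        # No in-place edit: remove the old entry, then index the new one.
        self.remove(doc_id)
        self.add(doc_id, props)

props = PropertyIndex()
props.add("doc1", {"state": "draft"})
props.update("doc1", {"state": "published"})
```

After the update, `doc1` is listed only under the `("state", "published")` term.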


My biggest fear is hitting the O(n) complexity: it might still run like a breeze with 100 documents, but could crawl on its knees if you reach 10000... and by the time you realize this, it's where you need the repository the most because your data gets big and unmanageable without a repository!

Erik suggests that there could be ways to index documents and their properties into Lucene and then use DASL on it. What I want to understand is the algorithmic complexity of such an approach.

If it can be made O(1) or even O(log(n)), I'm sold. But if this gets O(f(n)), where n is the number of documents and f(n) grows faster than log(n), well, we have a problem.

Like you said in the first sentence above, though, Lucene scalability is not impaired by the number of documents. It is impaired by the number of terms. And for a property that has only a few values (like your workflow example earlier), it merely finds the term being queried for and then returns all the document IDs that match (not even the full documents, just their IDs initially).
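
A small sketch of that lookup, assuming a sorted term dictionary with one list of document IDs per term. The layout, the `state:...` term format, and the sample IDs are assumptions for illustration, not a description of Lucene's internals. Finding the term is a binary search over the terms, so it grows with log(number of terms), and the result is just the stored IDs.

```python
from bisect import bisect_left

# Sorted term dictionary and parallel postings lists (document IDs only).
# Hypothetical workflow-state property with three distinct values.
terms = ["state:approved", "state:draft", "state:published"]
postings = [[2, 5], [1, 3, 4], [6]]

def doc_ids(term):
    """Binary-search the term dictionary and return the matching IDs."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return []  # term not present in the index
```

Here `doc_ids("state:draft")` returns the three stored IDs directly; no document is loaded and no other term's postings are visited.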


Erik

