Various Ideas from ApacheCon

2007-05-07 Thread Grant Ingersoll
Hey Gang, Back from ApacheCon in Amsterdam, and thought I would give a bit of a report on a few things that were interesting related to Lucene. First off, there was a very high level of interest in Lucene and Solr, which was great to see. In doing a training and a talk, a couple of things t

Re: Various Ideas from ApacheCon

2007-05-07 Thread robert engels
I think the 'updating documents' issue is almost always related to unique document updates, where there exists some "primary unique key" for the document. Is this true? If so, maybe a de facto standard like an indexed/stored/non-tokenized field of OID should be used. If so, it would be eas

Re: Various Ideas from ApacheCon

2007-05-07 Thread Ian Holsman
Grant Ingersoll wrote:
> 2. How does Lucene search compare w/ using built-in DB search? Has anyone done a study comparing Lucene performance/quality to the likes of MySQL/Postgres/Oracle? A related question is always how to integrate the two.
Hi Grant. When we initially investigated using

Re: Various Ideas from ApacheCon

2007-05-07 Thread Grant Ingersoll
Yep, my advice always is use a db for what a db is designed for (set manipulation) and use Lucene for what it is good for, but some people were commenting that DB text search is improving in terms of quality. I could see that, if flexible indexing gets implemented, we could implement other

Re: Various Ideas from ApacheCon

2007-05-07 Thread Chris Hostetter
: I think the 'updating documents' issue is almost always related to
: unique document updates, where there exists some "primary unique key"
: for the document. Is this true?
: if so, it would be easy to add the following to IndexModifier:
:
: addDocument(Document)
: updateDocument(Document)
the

Re: Various Ideas from ApacheCon

2007-05-07 Thread Chris Hostetter
: Yep, my advice always is use a db for what a db is designed for (set
: manipulation) and use Lucene for what it is good for, but some people
careful how you word that advice ... Solr's first use case was faceted browsing because using Lucene to generate BitSets and computing the intersection co
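
As a rough illustration of the BitSet idea mentioned above, here is a minimal sketch using only java.util.BitSet. It is not Solr's or Lucene's code: the class name, the helper methods, and the facet labels are all hypothetical, and it assumes each BitSet has one bit per document id.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: one bit per document id. Nothing here is Lucene or Solr API.
public class FacetCountSketch {

    // Count the documents that match the query AND carry the facet value.
    static int intersectionCount(BitSet queryMatches, BitSet facetDocs) {
        BitSet both = (BitSet) queryMatches.clone(); // and() mutates, so work on a copy
        both.and(facetDocs);
        return both.cardinality();
    }

    // One intersection count per facet value, in insertion order.
    static Map<String, Integer> facetCounts(BitSet queryMatches, Map<String, BitSet> facets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : facets.entrySet()) {
            counts.put(e.getKey(), intersectionCount(queryMatches, e.getValue()));
        }
        return counts;
    }

    // Hypothetical helper: build a BitSet from a list of document ids.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet query = bits(0, 2, 3, 5);             // documents matching the user's query
        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put("color:red", bits(0, 1, 2));      // made-up facet values
        facets.put("color:blue", bits(3, 4));
        System.out.println(facetCounts(query, facets));
    }
}
```

The point of the sketch is only that each facet count is a set intersection followed by a bit count, which is the kind of set manipulation the reply is alluding to.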

Re: Various Ideas from ApacheCon

2007-05-07 Thread robert engels
I am not sure I agree with that. Document management systems are quite common these days, and people are used to "checking out" a document, making changes, and checking the entire document back in. In many ways Lucene can be viewed as a self-contained document management system if you store eve

Re: Various Ideas from ApacheCon

2007-05-07 Thread Chris Hostetter
: I am not sure I agree with that.
i don't think i understand what part you don't agree with :)
: Document management systems are quite common these days, and people
: are used to "checking out" a document, making changes, and checking
: the entire document back in.
:
: In many ways Lucene can b

Re: Various Ideas from ApacheCon

2007-05-08 Thread Grant Ingersoll
I agree with your characterization. We love the speed and performance of Lucene, but the updating process just doesn't feel right in that context. I think the common use case that comes to mind is tagging a document. Every time a doc gets tagged, you have to rebuild it, or manage multip

Re: Various Ideas from ApacheCon

2007-05-08 Thread Doron Cohen
> : If the user is savvy enough to 'rebuild' their documents from an
> : external source, then the fields do not need to be stored (just the
> : OID field for convenience).
>
> it's this rebuilding that people tend to dislike about the delete/re-add
> process that's currently necessary to "update"
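
For readers unfamiliar with the delete/re-add pattern being discussed, here is a minimal toy sketch of it, keyed on an OID field as suggested earlier in the thread. It deliberately does not use Lucene: the in-memory list, the class name, and every method are hypothetical stand-ins, and the "tags" field is a made-up example of the tagging use case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index: an append-only list of field maps. Not Lucene's API.
public class UpdateByKeySketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Adding never checks for an existing OID, so repeated adds create duplicates.
    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Remove every document whose OID field equals the given key; returns how many were removed.
    int deleteByOid(String oid) {
        int before = docs.size();
        docs.removeIf(d -> oid.equals(d.get("OID")));
        return before - docs.size();
    }

    // "Update" is delete-by-key followed by re-adding the complete, rebuilt document.
    void updateDocument(Map<String, String> doc) {
        String oid = doc.get("OID");
        if (oid == null) {
            throw new IllegalArgumentException("document has no OID field");
        }
        deleteByOid(oid);
        addDocument(doc);
    }

    List<Map<String, String>> findByOid(String oid) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (oid.equals(d.get("OID"))) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Hypothetical helper: build a two-field document.
    static Map<String, String> doc(String oid, String tags) {
        Map<String, String> m = new HashMap<>();
        m.put("OID", oid);
        m.put("tags", tags);
        return m;
    }

    public static void main(String[] args) {
        UpdateByKeySketch index = new UpdateByKeySketch();
        index.addDocument(doc("42", "lucene"));
        // Tagging the document again means rebuilding and re-adding the whole thing.
        index.updateDocument(doc("42", "lucene apachecon"));
        System.out.println(index.findByOid("42"));
    }
}
```

Note that the caller has to supply the whole document on every update, which is exactly the rebuilding step the quoted text says people dislike.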

Re: Various Ideas from ApacheCon

2007-05-09 Thread James liu
I think the top things Lucene/Solr should do are:
1: easier use with less code
2: distributed index and search
3: management of these index and search servers
4: test methods or tools
I don't agree.
2007/5/8, Grant Ingersoll <[EMAIL PROTECTED]>: Yep, my advice always is use a db for what a db is designed fo

Re: Various Ideas from ApacheCon

2007-05-10 Thread J. Delgado
The ever growing presence of mingled structured and unstructured data is a fact of life and modern systems we have to deal with. Clearly, the tendency is that full-text indexing is moving towards DB functionality, i.e. fields for projection/filtering, sorting, faceted queries, transactional CRUD