Re: Some new SOLR features

Jason Rutherglen Tue, 16 Sep 2008 07:12:35 -0700

Hello Ryan,

>  SQL database such as H2


Mainly to offer joins and to be able to perform hierarchical queries,
along with any other types of queries a hybrid SQL search system would
offer.  This is something that is best built into SOLR rather than
Lucene.  A lot of SOLR users work with SQL databases as well, so it
would seem natural to integrate the two.  The Summize realtime search
system that Twitter purchased also worked by integrating with MySQL.
The way to do something similar in Lucene would be to integrate with a
Java SQL database.  Hierarchical queries could probably be performed
faster using this method (though I could be wrong, if there is a
better way).
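
To make the idea concrete, here is a rough sketch of "index each row as
it is added, then join the search hits against relational data".  It is
only an illustration under loud assumptions: SQLite stands in for H2, a
toy in-memory term map stands in for a Lucene index, and the table
names and the helpers add_doc and search_with_join are made up for this
example.  None of this is existing SOLR or H2 API.

```python
import sqlite3
from collections import defaultdict

# Toy stand-in for a Lucene index (assumption): term -> set of row ids.
inverted_index = defaultdict(set)


def add_doc(conn, doc_id, category_id, body):
    """Insert a row and index it right away, instead of in a batch rebuild."""
    conn.execute(
        "INSERT INTO docs (id, category_id, body) VALUES (?, ?, ?)",
        (doc_id, category_id, body),
    )
    for term in body.lower().split():
        inverted_index[term].add(doc_id)


def search_with_join(conn, term):
    """Look up matching ids in the text index, then join them in SQL."""
    ids = sorted(inverted_index.get(term.lower(), ()))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = (
        "SELECT d.id, c.name FROM docs d "
        "JOIN categories c ON c.id = d.category_id "
        f"WHERE d.id IN ({placeholders}) ORDER BY d.id"
    )
    return conn.execute(sql, ids).fetchall()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, category_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)", [(1, "search"), (2, "databases")]
)
add_doc(conn, 10, 1, "realtime search with Lucene")
add_doc(conn, 11, 2, "joins in a SQL database")
add_doc(conn, 12, 2, "realtime updates to a table")

results = search_with_join(conn, "realtime")
print(results)
```

The point of the sketch is that the text index only has to return row
ids; the join itself is left to the database, which is the part Lucene
does not offer on its own.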

> to have multiple lucene indexes within a single SolrCore?

I don't like the whole multi core thing from an administrative
perspective.  It means each index needs a separate schema,
configuration, etc.  That becomes hard to manage if 10+ indexes are
required, and it is definitely not as simple as an SQL database, which
does not require so many separate directories and so much manual
configuration.  It would be simple to add this to SOLR.  In general,
though, I have trouble figuring out many of the design decisions of
SOLR, and so I hesitate to implement things that seem to go against
the SOLR design model (is there one?).

> 9. Distributed search and updates using an object serialization which

Where would I start with integrating this into SOLR?  I need some help
on that part of it.  Tell me what's best and I'll integrate it; it
should be the easiest item on the list.

Jason

On Mon, Sep 15, 2008 at 11:44 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>>
>
> Here are my gut reactions to this list... in general, most of this comes
> down to "sounds great, if someone did the work I'm all for it"!
>
> Also, no need to post to solr-user AND solr-dev, probably better to think of
> solr-user as a superset of solr-dev.
>
>
>> 1. Machine learning based suggest feature
>> https://issues.apache.org/jira/browse/LUCENE-626 which is similar to
>> what Google does in their suggest implementation.  The Fuzzy based
>> spellchecker is ok, but it would be better to incorporate user
>> behavior.
>> 2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
>> and work being planned for IndexWriter
>> 3. Realtime untokenized field updates
>> https://issues.apache.org/jira/browse/LUCENE-1292
>
> Without knowing the details of these patches, everything sounds great.
>
> In my view, SOLR should offer a nice interface to anything in lucene
> core/contrib
>
>>
>> 4. BM25 Scoring
>
> Again, no idea, but if it is implemented in lucene, yes
>
>>
>> 5. Integration with an open source SQL database such as H2.  This
>> would mean under the hood, SOLR would enable storing data in a
>> relational database to allow for joins and things.  It would need to
>> be combined with realtime updates.  H2 has Lucene integration but it
>> is the usual index everything at once, non-incrementally.  The new
>> system would simply index as a new row in a table is added.  The SOLR
>> schema could allow for certain fields being stored in an SQL database.
>
> Sounds interesting -- what is the basic problem you are addressing?
>
> (It seems you are pointing to something specific, and describing your
> solution)
>
>
>>
>> 6. SOLR schema allowing for multiple indexes without using the
>> multicore.  The indexes could be defined like SQL tables in the
>> schema.xml file.
>
> Is this just a configuration issue?  I definitely hope we can make
> configuration easier in the future.
>
> As is, a custom handler can look at multiple indexes... why is there a need
> to have multiple lucene indexes within a single SolrCore?
>
>
>>
>> 6. Crowd by feature ala GBase
>> http://code.google.com/apis/base/attrs-queries.html#crowding which is
>> similar to Field Collapsing.  I am thinking it is advantageous from a
>> performance perspective to obtain an excessive amount of results, then
>> filter down the result set, rather than first sort a result set.
>
> Again, sounds great!  I would love to see it.
>
>>
>> 7. Improved relevance based on user clicks of individual query results
>> for individual queries.  This can be thought of as similar to what
>> Digg does.  I'm sure Google does something similar.  It is a feature
>> that would be of value to almost any SOLR implementation.
>
> Agreed -- if there is a good way to quickly update a field used for
> sorting/scoring, this would happen
>
>>
>> 8. Integration of LocalSolr into the standard SOLR distribution.
>> Location is something many sites use these days and is standard in
>> GBase and most likely other products like FAST.
>
> I'm working on it....  will be a lucene contrib package and cooked into the
> core solr distribution.
>
>
>>
>> 9. Distributed search and updates using an object serialization, which
>> could use https://issues.apache.org/jira/browse/LUCENE-1336.  This
>> allows span queries, custom payload queries, custom similarities, and
>> custom analyzers, without compiling and deploying a new SOLR war
>> file to individual servers.
>
>
> sounds good (but I have no technical basis to say so)
>
>
> ryan
>
>
