[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.

Ryan McKinley (JIRA) Tue, 12 Jun 2007 19:17:48 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504095
 ]


Ryan McKinley commented on SOLR-193:
------------------------------------


For background: this class has functionality used by other issues, including 
SOLR-104 and SOLR-139.  For a while I tried keeping the functionality in separate 
patches, but it became too much of a nightmare to maintain.  Perhaps it would 
be better to leave out the edge cases and just focus on the SolrDocument 
interface now...


> what is setDistinctByDefault, or setDistinctOrderMatters ?
> 

These options let you say whether the field values should be backed by a 
Set<String> or a List<String>.  DistinctOrderMatters says whether the set should 
be a HashSet<String> or a LinkedHashSet<String>.

These were useful for SOLR-104, where a SQL join may return duplicate rows but 
you only want to keep the distinct values for each field.

Now that you point it out (and there is a good chance it will be in trunk 
soon), it would make more sense to implement these features as different 
subclasses of SimpleSolrDocument.
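
To make the three choices concrete, here is a minimal sketch.  It assumes that 
"distinct" means the values are held in a java.util.Set and that "order 
matters" means insertion order is preserved.  The class and method names 
(FieldValuesSketch, newBacking) are hypothetical and are not taken from the 
patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;

class FieldValuesSketch {
    // Hypothetical helper: picks the backing collection from the two flags.
    public static Collection<String> newBacking(boolean distinct, boolean orderMatters) {
        if (!distinct) {
            return new ArrayList<String>();        // keeps duplicates and order
        }
        return orderMatters
            ? new LinkedHashSet<String>()          // distinct, insertion order kept
            : new HashSet<String>();               // distinct, order unspecified
    }

    public static void main(String[] args) {
        String[] rows = {"b", "a", "b"};           // e.g. duplicates from a SQL join
        Collection<String> list = newBacking(false, false);
        Collection<String> ordered = newBacking(true, true);
        for (String r : rows) {
            list.add(r);
            ordered.add(r);
        }
        System.out.println(list);     // [b, a, b]
        System.out.println(ordered);  // [b, a]
    }
}
```

With subclasses, each variant would simply fix one of these choices instead of 
exposing the two flags.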


> Also, what is the purpose/use of DocumentBuilder.build and 
> DocumentBuilder.loadStoredFields 

This is for SOLR-139.  To 'modify' a document, you load the existing Document, 
change it, then store it back.

These two functions can happily live in a new class, and could be attached to 
SOLR-139.
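
The load / change / store cycle can be sketched as follows.  This is only an 
illustration: an in-memory map stands in for the index, and the names 
(ModifySketch, increment) are hypothetical.  It does not use 
DocumentBuilder.build or DocumentBuilder.loadStoredFields from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class ModifySketch {
    // Stand-in for the index: document id -> stored field values.
    public static final Map<String, Map<String, Object>> index =
        new HashMap<String, Map<String, Object>>();

    // Hypothetical INCREMENT-style modify: load, change, store back.
    public static void increment(String id, String field, int delta) {
        Map<String, Object> stored = index.get(id);
        if (stored == null) {
            throw new IllegalArgumentException("no document with id " + id);
        }
        // 1. load a copy of the existing stored fields
        Map<String, Object> doc = new HashMap<String, Object>(stored);
        // 2. change it (a missing field counts as 0)
        Object cur = doc.get(field);
        int current = (cur == null) ? 0 : (Integer) cur;
        doc.put(field, current + delta);
        // 3. store it back, replacing the old version
        index.put(id, doc);
    }
}
```

Note that the change step needs the field's real type (an Integer here) rather 
than its string form.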


>   2) I thought the SolrDocument API was for incoming documents ... 

I hope it is also useful for modifying existing Documents and transforming 
incoming/outgoing documents (but I'll raise that issue later ;)


> I think it's a mistake to try and have one single Interface for all three. 
> ... At the very least there should be a separate API for the indexing side 
> and the query side (because of the boost issue) which can be  
> subclass/superclass relationships.
> 

This sounds fine.  We should *definitely* solve any known problems with the 
Lucene document interface.  Just using an interface (rather than a concrete 
class) will be a huge help.

Is the only difference between the input Document and output Document that it 
has boosts?

Should we have:
 SolrDocument
   + BoostedSolrDocument

 or

 SolrDocument
   + IndexSolrDocument

Any thoughts on the common use case where I want to pull a document out of the 
index (no boosts), change it, then put it back?  Do I need to make a new class 
and copy all the fields?  Should SOLR-20 be able to index a SolrDocument (no 
boosts) as well as a BoostedSolrDocument?  I think so...
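
A rough sketch of the first option, to show how one indexing entry point could 
accept both kinds of document.  All method names here (getFieldValues, 
addField, getBoost, setBoost, boostFor) and the two simple implementations are 
hypothetical, and the default boost of 1.0 for a plain SolrDocument is an 
assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentHierarchySketch {
    /** Query-side view: field names and values, no boosts. */
    interface SolrDocument {
        Collection<String> getFieldNames();
        Collection<Object> getFieldValues(String name);
        void addField(String name, Object value);
    }

    /** Index-side view: the same document plus per-field boosts. */
    interface BoostedSolrDocument extends SolrDocument {
        float getBoost(String name);
        void setBoost(String name, float boost);
    }

    static class SimpleDoc implements SolrDocument {
        private final Map<String, List<Object>> fields =
            new LinkedHashMap<String, List<Object>>();

        public Collection<String> getFieldNames() { return fields.keySet(); }

        public Collection<Object> getFieldValues(String name) {
            List<Object> v = fields.get(name);
            return v == null ? new ArrayList<Object>() : v;
        }

        public void addField(String name, Object value) {
            List<Object> v = fields.get(name);
            if (v == null) {
                v = new ArrayList<Object>();
                fields.put(name, v);
            }
            v.add(value);
        }
    }

    static class SimpleBoostedDoc extends SimpleDoc implements BoostedSolrDocument {
        private final Map<String, Float> boosts = new HashMap<String, Float>();

        public float getBoost(String name) {
            Float b = boosts.get(name);
            return b == null ? 1.0f : b;   // unset boost falls back to 1.0
        }

        public void setBoost(String name, float boost) { boosts.put(name, boost); }
    }

    /** What an indexer could do: accept either type, defaulting the boost to 1.0. */
    public static float boostFor(SolrDocument doc, String field) {
        if (doc instanceof BoostedSolrDocument) {
            return ((BoostedSolrDocument) doc).getBoost(field);
        }
        return 1.0f;
    }
}
```

Under this arrangement a document pulled out of the index could be changed and 
handed straight back to the indexer without copying its fields into a new 
class.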


Thanks for looking at this!  


> General SolrDocument interface to manage field values.
> ------------------------------------------------------
>
>                 Key: SOLR-193
>                 URL: https://issues.apache.org/jira/browse/SOLR-193
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>         Attachments: SOLR-193-SolrDocument.patch, 
> SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch, 
> SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch, 
> SOLR-193-SolrDocument.patch
>
>
> In an effort to make SOLR-139 (the "modify" command) more manageable, I 
> extracted out a large chunk.  This patch adds a general SolrDocument 
> interface and includes a concrete implementation (SimpleSolrDoc).
> SOLR-139 needs some way to transport document values independent of the 
> Lucene Document.  This is required for the INCREMENT command and useful for 
> modifying documents.  SolrDocument is also generally useful for SOLR-20.
> - - - - - -
> The one (potentially) controversial part is that I added a function to 
> FieldType:
>  public Object toExternalValue(Fieldable f);
> This asks each field type to convert its Fieldable into its real type, for 
> example IntField.java has:
>  public Integer toExternalValue(Fieldable f) {
>    return Integer.valueOf( toExternal(f) );
>  }
> By default, it returns a string value.  If this addition is too much, there 
> are other (less clean) ways to handle the INCREMENT command.  My real 
> motivation for this addition is that it makes it possible to implement an 
> embeddable SOLR-20 client that does not need an HTTP connection. 
> - - - -
> The SimpleSolrDoc implementation was written for SOLR-20.  It needs to play 
> nice with EL, so it implements a few extra map functions that may not seem 
> necessary:
>  ${doc.values['name']} gets a collection
>  ${doc.valueMap['name']} gets a single value for the field
> - - - -
> The tests cover all "toExternalValue" changes in schema.*  
> SimpleSolrDoc and DocumentBuilder have 100% test coverage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
