[ 
https://issues.apache.org/jira/browse/SOLR-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-810.
---------------------------------

    Resolution: Won't Fix

SPRING_CLEANING_2013: we can reopen if necessary.

> changes for javabin format
> --------------------------
>
>                 Key: SOLR-810
>                 URL: https://issues.apache.org/jira/browse/SOLR-810
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> For storage purposes javabin can be quite inefficient if we write one
> document at a time, because the field names may be written again for each
> document.
> javabin can be as efficient as a format like thrift/protocol buffers if
> we do not pay the price of a string per name. We can easily achieve this
> with a new type, KNOWN_STRING.
> KNOWN_STRING is like an EXTERN_STRING, except that the strings are
> preconfigured names held in a map of index -> string. The known string
> list can probably carry a version. The client must use a version of the
> known string list that is at least as new as the server's.
> An example looks like:
> {code}
> 1:responseHeader
> 2:QTime
> 3:status
> {code}
> A newer version of the string list can add a new string at a new index, but
> it must never change the index of an existing string. This is similar to an
> IDL file for thrift/protocol buffers, but without any of those complexities.
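
The append-only rule for the known string list can be checked mechanically.
A minimal sketch in Python, assuming the list is held as a plain index ->
string map; the function name and the "params" entry are hypothetical and not
part of the proposal:

```python
from typing import Dict

def is_compatible_upgrade(old: Dict[int, str], new: Dict[int, str]) -> bool:
    """Return True if every index in `old` still maps to the same string in `new`."""
    return all(new.get(index) == name for index, name in old.items())

# Version 1 is the example list from the issue description.
v1 = {1: "responseHeader", 2: "QTime", 3: "status"}

# Adds a string at a new index: allowed. ("params" is a made-up entry.)
v2 = {**v1, 4: "params"}

# Swaps the indices of two existing strings: not allowed.
v3 = {1: "responseHeader", 2: "status", 3: "QTime"}
```

A check like this could run wherever a new list version is introduced, so
that data written against an older list stays readable.
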
> So when an EXTERN_STRING is about to be written, the writer first looks it up
> in the KNOWN_STRING map. If it is present, it is written as a KNOWN_STRING
> instead of an EXTERN_STRING. The value written is the index.
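
The write path described here could look roughly like the sketch below. It
is in Python and is not the real javabin wire format: the tag bytes, the
2-byte index/length field, and the function names are all assumptions made
for illustration, and the fallback branch simply writes the full string each
time rather than reproducing how EXTERN_STRING itself reuses strings.

```python
import io
import struct

# Hypothetical tag bytes for this sketch only; not the real javabin tag values.
TAG_KNOWN_STRING = 0x01
TAG_EXTERN_STRING = 0x02

# Preconfigured index -> string list, taken from the example in the issue.
KNOWN_STRINGS = {1: "responseHeader", 2: "QTime", 3: "status"}
# Reverse map used by the writer to find the index for a name.
KNOWN_INDEX = {name: index for index, name in KNOWN_STRINGS.items()}

def write_name(out, name):
    """Write `name` as a known-string index if possible, else as a full string."""
    index = KNOWN_INDEX.get(name)
    if index is not None:
        # Known name: tag byte plus a 2-byte index, no string bytes at all.
        out.write(struct.pack(">BH", TAG_KNOWN_STRING, index))
    else:
        # Unknown name: tag byte, 2-byte length, then the UTF-8 bytes.
        data = name.encode("utf-8")
        out.write(struct.pack(">BH", TAG_EXTERN_STRING, len(data)))
        out.write(data)

def read_name(inp):
    """Read back a name written by write_name."""
    tag, value = struct.unpack(">BH", inp.read(3))
    if tag == TAG_KNOWN_STRING:
        return KNOWN_STRINGS[value]
    return inp.read(value).decode("utf-8")
```

In this sketch a known field name always costs 3 bytes per document,
regardless of its length, which is the saving the proposal is after.
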
> Another addition could be a zip string type. This is useful when javabin is
> used for storing data. In storage, the performance cost of
> serialization/deserialization may not be as important as the space itself.
> This type may also have a minimum size for compression: only large strings
> (say > 2KB?) may need to be compressed.
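
A size threshold for the zip string type might be sketched as follows, in
Python. The proposal does not name a compression algorithm, so the use of
zlib (deflate) is an assumption, as are the constant and function names; the
2048-byte cutoff just mirrors the tentative "say > 2KB?" figure.

```python
import zlib

# Tentative cutoff from the proposal ("say > 2KB?"); the exact value is open.
MIN_COMPRESS_BYTES = 2048

def encode_string(text):
    """Return (compressed_flag, payload); compress only strings above the cutoff."""
    raw = text.encode("utf-8")
    if len(raw) > MIN_COMPRESS_BYTES:
        return True, zlib.compress(raw)
    # Small strings are stored as-is to avoid the compression overhead.
    return False, raw

def decode_string(compressed, payload):
    """Invert encode_string using the stored flag."""
    raw = zlib.decompress(payload) if compressed else payload
    return raw.decode("utf-8")
```

The flag stands in for what would be a distinct type tag in the format, so a
reader knows whether to decompress before decoding.
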
