IndexableBinaryStringTools (was FieldCache)

Mathias Walter Tue, 02 Nov 2010 09:24:35 -0700

Hi,

> > [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte
> > array. The size was increased to 7 characters (= 14 bytes)
> > which is still a gain of more than 50 percent compared to the UTF8
> > encoding. BTW: I found no sample how to use the
> > IndexableBinaryStringTools class except in the unit tests.
> 
> IndexableBinaryStringTools will eventually be deprecated and then dropped,
> in favor of native indexable/searchable binary terms.  More work is
> required before these are possible, though.
> 
> Well-maintained unit tests are not a bad way to describe functionality...


Sure, but there is no unit test for Solr.

> > I assume that the char[] returned from IndexableBinaryStringTools.encode
> > is encoded in UTF-8 again and then stored. At some point
> > the information is lost and cannot be recovered.
> 
> Can you give an example?  This should not happen.

It's hard to give example output, because the binary string representation 
contains unprintable characters. I'll try to explain what I'm doing.

The character array returned by IndexableBinaryStringTools.encode looks like 
the following:

char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0};

Then I add it to a SolrInputDocument:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", new String(encoded));

If I now print the SolrInputDocument using System.out.println(doc), the String 
representation of the character array is correct.

Then I add it to the Solr server, whose index is held in a RAMDirectory:

ArrayList<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc);
solrServer.add(docs);
solrServer.commit();

... and immediately retrieve it as follows:

SolrQuery query = new SolrQuery();
query.setQuery("*:*");
QueryResponse rsp = solrServer.query(query);
SolrDocumentList docList = rsp.getResults();
System.out.println(docList);

Now the string representation of the SolrDocument's ID looks different from 
that of the SolrInputDocument.

If I do not create a new String in doc.addField, only the string representation 
of the array's address is added to the SolrInputDocument.

BTW: I've tested it with EmbeddedSolrServer and Solr/Lucene trunk.

Why has the string representation changed? From the changed string I cannot 
decode the correct ID.
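
Since the characters are unprintable, comparing them as numbers is easier than
comparing printed strings. Below is a minimal JDK-only sketch (no Solr or
Lucene involved) that dumps each char as its numeric value and checks whether
the String/UTF-8 conversion by itself changes the array. The class name
RoundTripCheck and the helpers utf8RoundTrip and dump are made up for this
illustration; the sketch says nothing about what Solr does internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    // Hypothetical helper: chars -> String -> UTF-8 bytes -> String -> chars.
    static char[] utf8RoundTrip(char[] chars) {
        byte[] utf8 = new String(chars).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    // Hypothetical helper: renders each char as a decimal number so that
    // unprintable characters can be compared by eye.
    static String dump(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same array as in the message above.
        char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0};
        char[] back = utf8RoundTrip(encoded);
        System.out.println(dump(encoded));
        System.out.println(dump(back));
        System.out.println(Arrays.equals(encoded, back));
    }
}
```

Applying dump(...toCharArray()) to the ID string of both the SolrInputDocument
and the returned SolrDocument would show exactly which characters differ.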

--
Kind regards,
Mathias
