Hi,

> > [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte
> > array. The size was increased to 7 characters (= 14 bytes)
> > which is still a gain of more than 50 percent compared to the UTF8
> > encoding. BTW: I found no sample how to use the
> > IndexableBinaryStringTools class except in the unit tests.
>
> IndexableBinaryStringTools will eventually be deprecated and then dropped, in
> favor of native indexable/searchable binary terms. More work is required
> before these are possible, though.
>
> Well-maintained unit tests are not a bad way to describe functionality...

Sure, but there is no unit test for Solr.

> > I assume that the char[] returned from IndexableBinaryStringTools.encode
> > is encoded in UTF-8 again and then stored. At some point
> > the information is lost and cannot be recovered.
>
> Can you give an example? This should not happen.

It's hard to give example output, because the binary string
representation contains unprintable characters. I'll try to explain
what I'm doing.

The character array returned by IndexableBinaryStringTools.encode
looks like this:

    char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0};

Then I add it to a SolrInputDocument:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", new String(encoded));

If I now print the SolrInputDocument using System.out.println(doc),
the String representation of the character array is correct.

Then I add it to a RAMDirectory:

    ArrayList<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add(doc);
    solrServer.add(docs);
    solrServer.commit();

... and immediately retrieve it as follows:

    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    QueryResponse rsp = solrServer.query(query);
    SolrDocumentList docList = rsp.getResults();
    System.out.println(docList);

Now the string representation of the SolrDocument's ID looks different
from that of the SolrInputDocument.

If I do not create a new String in doc.addField, only the default
string representation of the array (its address) is added to the
SolrInputDocument.

BTW: I've tested this with EmbeddedSolrServer and Solr/Lucene trunk.

Why has the string representation changed? From the changed string I
cannot decode the correct ID.

--
Kind regards,
Mathias
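
One way to narrow this down is to test the UTF-8 assumption on its own, without Solr. The sketch below uses only the JDK and the char values quoted above; the class name Utf8RoundTripCheck and its helper methods are made up for illustration. It converts the chars to UTF-8 bytes and back, then compares the numeric values rather than printing the unprintable characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of Lucene or Solr.
public class Utf8RoundTripCheck {

    // Encodes the chars as UTF-8 bytes and decodes those bytes again.
    static char[] roundTrip(char[] chars) {
        byte[] utf8 = new String(chars).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    // Renders chars as numeric code units so unprintable values stay visible.
    static String asNumbers(char[] chars) {
        int[] codes = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            codes[i] = chars[i];
        }
        return Arrays.toString(codes);
    }

    public static void main(String[] args) {
        // Same values as the encoded ID in the message above.
        char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0};
        char[] decoded = roundTrip(encoded);

        System.out.println("before: " + asNumbers(encoded));
        System.out.println("after:  " + asNumbers(decoded));
        System.out.println("equal:  " + Arrays.equals(encoded, decoded));
    }
}
```

If the two arrays compare equal here, plain UTF-8 encoding is probably not where the value changes, and the difference would have to come from some other step between addField and the query response. Applying asNumbers to the ID string returned in the SolrDocument would show exactly which code units differ.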