Sorry, I didn't think this through. You're right, it's still the same problem.

On 16 Apr 2014 17:40, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
> Why? I want stored=false, at which point a multivalued field is just offset
> values in the dictionary. I'd still have to reconstruct from offsets.
>
> Or am I missing something?
>
> Regards,
> Alex
>
> On 16/04/2014 10:59 pm, "Ramkumar R. Aiyengar" <andyetitmo...@gmail.com> wrote:
>
> > Logically, if you tokenize and put the results in a multivalued field, you
> > should be able to get all the values in sequence?
> >
> > On 16 Apr 2014 16:51, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > If I use very basic tokenizers, e.g. space-based with no filters, can I
> > > reconstruct the text from the tokenized form?
> > >
> > > So, "This is a test" -> "This", "is", "a", "test" -> "This is a test"?
> > >
> > > I know we store enough information, but I don't know the internal API
> > > well enough to know what I should be looking at for a reconstruction
> > > algorithm.
> > >
> > > Any hints?
> > >
> > > The XY problem is that I want to store a large amount of highly repetitive
> > > text in Solr. I want the index to be as small as possible, so I
> > > thought that if I just pre-tokenized it, my dictionary would be quite small.
> > > And I will be reconstructing some final form anyway.
> > >
> > > The other option is to just use compression on the stored field, but
> > > I assume that does not take cross-document efficiencies into account.
> > > And it will be a read-only index after the build, so I don't care about
> > > updates messing things up.
> > >
> > > Regards,
> > > Alex
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > > proficiency
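For illustration, here is a minimal sketch of the "reconstruct from offsets" idea discussed in this thread. It assumes a plain whitespace split and an index that keeps, for each distinct term, every occurrence's position and start/end character offsets (similar in spirit to term vectors stored with positions and offsets). It is plain Python, not the Lucene/Solr API; `Token`, `whitespace_tokenize`, `invert` and `reconstruct` are hypothetical names used only for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical per-occurrence record: (position, start_offset, end_offset).
Posting = Tuple[int, int, int]


class Token(NamedTuple):
    term: str
    position: int      # index of the token in the stream (0, 1, 2, ...)
    start_offset: int  # character offset where the token starts
    end_offset: int    # character offset just past the token's last char


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on runs of whitespace, recording positions and offsets."""
    tokens: List[Token] = []
    i, n, pos = 0, len(text), 0
    while i < n:
        # Skip the whitespace gap before the next token.
        while i < n and text[i].isspace():
            i += 1
        if i == n:
            break
        start = i
        while i < n and not text[i].isspace():
            i += 1
        tokens.append(Token(text[start:i], pos, start, i))
        pos += 1
    return tokens


def invert(tokens: List[Token]) -> Dict[str, List[Posting]]:
    """Build a tiny term -> occurrences map; each distinct term is kept once."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for tok in tokens:
        index[tok.term].append((tok.position, tok.start_offset, tok.end_offset))
    return dict(index)


def reconstruct(index: Dict[str, List[Posting]], fill: str = " ") -> str:
    """Rebuild the text by laying each term back down at its start offset.

    Offsets give the length of each gap between tokens but not which
    whitespace characters filled it, so gaps are padded with `fill`.
    Trailing whitespace after the last token is not recoverable.
    """
    occurrences = sorted(
        (start, end, term)
        for term, postings in index.items()
        for (_position, start, end) in postings
    )
    parts: List[str] = []
    cursor = 0
    for start, end, term in occurrences:
        parts.append(fill * (start - cursor))  # pad the gap before this token
        parts.append(term)
        cursor = end
    return "".join(parts)


print(reconstruct(invert(whitespace_tokenize("This is a test"))))  # This is a test
```

Sorting by position alone would recover the token sequence, which is the multivalued-field idea; the offsets additionally recover each gap's width. The original whitespace characters are lost, though: "This  is\ta test" comes back as "This  is a test".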