Sorry, my fault, I overlooked this part of your message: "How do I get the file name included in each snippet fragment - this again needs exploring on my end". No, the solution I proposed doesn't address that. :(
Edward

On Mon, Feb 17, 2020 at 14:03, Srijan <shree...@gmail.com> wrote:

> You know what, I think I missed a major description in my earlier email. I
> want to be able to return additional data from stored fields alongside the
> snippets during highlighting. In this case, the filename where this snippet
> came from. Not sure your approach would address that.
>
> On Mon, Feb 17, 2020, 10:44 Edward Ribeiro <edward.ribe...@gmail.com>
> wrote:
>
> > Hi,
> >
> > You may try to create two kinds of docs forming a parent-child
> > relationship without nesting. Like
> >
> > <doc>
> > <id>894</id>
> > <type>parent</type>
> > ...
> > </doc>
> >
> > <doc>
> > <id>3213</id>
> > <type>child</type>
> > <parent_id>894</parent_id>
> > <metadata field 1>xxx
> > <file_content_en_US> portion of file 1
> > <file_content_en_US> remaining portion of file 1
> > ...
> > </doc>
> >
> > Then you can add metadata for each child doc. The search can be done on
> > child docs but if you need to group you can use the join query parser (it
> > has some limitations though) or grouping by parent_id.
> >
> > Cheers,
> > Edward
> >
> > On Mon, Feb 17, 2020 at 12:25, Srijan <shree...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a data model where the operational "Object" can have one or more
> > > files attached. Indexing these objects in Solr means indexing all
> > > metadata info and the contents of the files. For file contents what I
> > > have right now is a single multi-valued field (for each locale).
> > >
> > > Example:
> > > <doc>
> > > <metadata field 1>xxx
> > > <metadata field 2>yyy
> > > <file_content_en_US> portion of file 1
> > > <file_content_en_US> remaining portion of file 1
> > > <file_content_en_US> portion of file 2
> > > <file_content_en_US> contents from file 2 again...
> > > ...
> > > </doc>
> > >
> > > Search is easy and everything's been working fine. We recently
> > > introduced highlighting functionality on these file content fields.
> > > Again, a straightforward use case. The next requirement is where things
> > > get a little tricky. We want to be able to return the name of the file
> > > (generalizing this - or some other metadata info related to the file
> > > content field). If our data model had a 1:1 relation between our
> > > operational object and the file it contains, the file name would have
> > > been just another field on the main doc, but unfortunately that's not
> > > the case - each file content field could belong to any file.
> > >
> > > There are a couple of potential solutions I have been thinking of:
> > >
> > > 1. Use nested docs to preserve the logical grouping of file content and
> > > the file info where this content is coming from. This could potentially
> > > work but I haven't done any testing yet (I know highlighting doesn't
> > > work on nested docs, for example).
> > >
> > > 2. Encode the file name in the file content fields themselves. The file
> > > name will be removed during indexing but will be stored. How do I get
> > > the file name included in each snippet fragment - this again needs
> > > exploring on my end.
> > >
> > > Another approach I have been thinking of is extending the StoredField
> > > to also store additional metadata information. So basically when a
> > > stored field is retrieved, or a fragment is returned, I also have
> > > additional information associated with the stored field. Can someone
> > > tell me if this is a terrible idea and I should not be pursuing it?
> > >
> > > Is there something else I can try?
> > >
> > > Thanks a lot,
> > > Srijan
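
For reference, the flat parent/child layout suggested in the thread could be sketched roughly as below. This is only an illustration under assumptions, not a tested setup: the `file_name` field, the child id scheme, and the helper names are hypothetical and do not appear in the original messages. The `type`, `parent_id`, and `file_content_en_US` fields follow the example above, and the request parameters (`hl`, `hl.fl`, `group`, `group.field`, `{!join from=... to=...}`) are standard Solr parameters. The code only builds documents and query strings; it does not talk to a Solr server.

```python
from urllib.parse import urlencode


def child_docs(parent_id, files):
    """Build one flat (non-nested) child document per attached file.

    `files` maps a file name to the list of content chunks of that file.
    `file_name` and the id scheme are assumptions made for this sketch.
    """
    docs = []
    for n, (file_name, chunks) in enumerate(files.items()):
        docs.append({
            "id": f"{parent_id}-file-{n}",      # hypothetical id scheme
            "type": "child",
            "parent_id": parent_id,
            "file_name": file_name,             # hypothetical metadata field
            "file_content_en_US": list(chunks),
        })
    return docs


def grouped_highlight_params(term):
    """Search child docs, highlight their content, group hits by parent."""
    return {
        "q": f"file_content_en_US:{term}",
        "fq": "type:child",
        "fl": "id,parent_id,file_name",
        "hl": "true",
        "hl.fl": "file_content_en_US",
        "group": "true",
        "group.field": "parent_id",
    }


def parent_join_params(term):
    """Return parent docs whose children match, via the join query parser."""
    return {
        "q": "{!join from=parent_id to=id}" + f"file_content_en_US:{term}",
        "fq": "type:parent",
    }


if __name__ == "__main__":
    for doc in child_docs("894", {"report.txt": ["portion of file 1",
                                                 "remaining portion of file 1"]}):
        print(doc)
    print("/select?" + urlencode(grouped_highlight_params("portion")))
    print("/select?" + urlencode(parent_join_params("portion")))
```

With one child document per file, each returned child would carry its own `file_name`, and since Solr keys highlighting results by document id, the snippets for a child could in principle be matched back to that name on the client side. Whether this fits the requirement would still need checking against the real schema.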