You know what, I think I left out an important detail in my earlier email. I
want to be able to return additional data from stored fields alongside the
snippets during highlighting; in this case, the name of the file each snippet
came from. I'm not sure your approach would address that.

On Mon, Feb 17, 2020, 10:44 Edward Ribeiro <edward.ribe...@gmail.com> wrote:

> Hi,
>
> You could try creating two kinds of docs that form a parent-child
> relationship without nesting, like this:
>
> <doc>
> <id>894</id>
> <type>parent</type>
>
> ...
> </doc>
>
> <doc>
> <id>3213</id>
> <type>child</type>
> <parent_id>894</parent_id>
> <metadata field 1>xxx
> <file_content_en_US> portion of file 1
> <file_content_en_US> remaining portion of file 1
> ...
> </doc>
>
> Then you can add metadata to each child doc. Search can be done on the
> child docs, and if you need to group results you can use the join query
> parser (it has some limitations, though) or group by parent_id.
>
> Cheers,
> Edward
>
>
> On Mon, Feb 17, 2020, 12:25 Srijan <shree...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a data model where the operational "Object" can have one or more
> > files attached. Indexing these objects in Solr means indexing all
> > metadata info and the contents of the files. For file contents, what I
> > have right now is a single multi-valued field (for each locale).
> >
> > Example:
> > <doc>
> > <metadata field 1>xxx
> > <metadata field 2>yyy
> > <file_content_en_US> portion of file 1
> > <file_content_en_US> remaining portion of file 1
> > <file_content_en_US> portion of file 2
> > <file_content_en_US> contents from file 2 again...
> > ...
> > </doc>
> >
> > Search is easy and everything's been working fine. We recently introduced
> > highlighting functionality on these file content fields. Again, a
> > straightforward use case. The next requirement is where things get a
> > little tricky. We want to be able to return the name of the file (or,
> > more generally, some other metadata related to the file content field).
> > If our data model had a 1:1 relation between our operational object and
> > the file it contains, the file name would have been just another field on
> > the main doc, but unfortunately that's not the case: each file content
> > field could belong to any file.
> >
> > There are a couple of potential solutions I have been thinking of:
> > 1. Use nested docs to preserve the logical grouping of file content and
> > the file info where this content is coming from. This could potentially
> > work but I haven't done any testing yet (I know highlighting doesn't work
> > on nested docs, for example).
> >
> > 2. Encode the file name in the file content fields themselves. The file
> > name will be removed during indexing but will be stored. How to get the
> > file name included in each snippet fragment still needs exploring on my
> > end.
> >
> > Another approach I have been thinking about is extending the StoredField
> > to also store additional metadata. So basically, when a stored field is
> > retrieved, or a fragment is returned, I also have additional information
> > associated with the stored field. Can someone tell me whether this is a
> > terrible idea that I should not be pursuing?
> >
> > Is there something else I can try?
> >
> > Thanks a lot,
> > Srijan
> >
>
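
A minimal sketch of how the un-nested parent/child layout quoted above might
also return the file name with each snippet. It assumes one child doc per file
and a stored field on that doc called file_name (a hypothetical name; it does
not appear in the thread). Solr's highlighting section is keyed by document
id, so each snippet can be matched back to the stored fields of the child doc
it came from. The helper names are illustrative only, and Python is used just
for the example.

```python
from urllib.parse import urlencode

CONTENT_FIELD = "file_content_en_US"  # field name taken from the thread
FILE_NAME_FIELD = "file_name"         # hypothetical stored field on each child doc


def build_child_query(term):
    """Build a query string that searches child docs and asks for highlighting."""
    params = [
        ("q", f"{CONTENT_FIELD}:{term}"),
        ("fq", "type:child"),                       # restrict to child docs
        ("fl", f"id,parent_id,{FILE_NAME_FIELD}"),  # stored fields to return
        ("hl", "true"),
        ("hl.fl", CONTENT_FIELD),
    ]
    return urlencode(params)


def snippets_with_file_names(solr_response):
    """Pair every highlight snippet with stored fields of the doc it came from."""
    docs_by_id = {doc["id"]: doc for doc in solr_response["response"]["docs"]}
    results = []
    # The highlighting section maps doc id -> field name -> list of snippets.
    for doc_id, fields in solr_response.get("highlighting", {}).items():
        doc = docs_by_id.get(doc_id, {})
        for snippet in fields.get(CONTENT_FIELD, []):
            results.append({
                "parent_id": doc.get("parent_id"),
                "file_name": doc.get(FILE_NAME_FIELD),
                "snippet": snippet,
            })
    return results
```

The string from build_child_query would be appended to the collection's
/select URL, and the parsed JSON response passed to snippets_with_file_names.
Grouping by parent_id changes the response layout, so it is left out of this
sketch.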
