Hi,

I have a data model where the operational "Object" can have one or more
files attached. Indexing these objects in Solr means indexing all metadata
info and the contents of the files. For file contents, what I have right now
is a single multi-valued field (one per locale).

Example:
<doc>
<metadata field 1>xxx
<metadata field 2>yyy
<file_content_en_US> portion of file 1
<file_content_en_US> remaining portion of file 1
<file_content_en_US> portion of file 2
<file_content_en_US> contents from file 2 again...
...
</doc>

Search is easy and everything's been working fine. We recently introduced
highlighting functionality on these file content fields. Again, a
straightforward use case. The next requirement is where things get a little
tricky. We want to be able to return the name of the file (or, more
generally, some other metadata related to the file content field). If our
data model had a 1:1 relation between our operational object and the file it
contains, the file name would have been just another field on the main doc,
but unfortunately that's not the case: each file content value could
belong to any of the attached files.

There are a couple of potential solutions I have been thinking of:
1. Use nested docs to preserve the logical grouping of file content and the
information about the file this content comes from. This could potentially
work, but I haven't done any testing yet (I know highlighting doesn't work
on nested docs, for example).

2. Encode the file name in the file content fields themselves. The file
name would be stripped during indexing but kept in the stored value. How to
get the file name included in each snippet fragment still needs exploring
on my end.
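
To make the two options concrete, here is a rough sketch of the document
shapes they imply, built as plain Python dicts. Everything named here is an
assumption for illustration only: the `file_name` field, the ids, the file
names and the separator character are hypothetical, and `_childDocuments_`
is the child-document key as I understand Solr's JSON update format, so
check it against your Solr version. The index-time step that strips the
prefix in option 2 (for example some pattern-replace char filter in the
analysis chain) is not shown.

```python
import json

# Hypothetical separator; assumed never to occur in file names or content.
SEP = "\u001f"


def encode_chunk(file_name, text):
    """Option 2: prefix a content chunk with the name of its file."""
    if SEP in file_name:
        raise ValueError("file name must not contain the separator")
    return f"{file_name}{SEP}{text}"


def decode_chunk(stored_value):
    """Split a stored value back into (file_name, text).

    Returns (None, stored_value) when no separator is present.
    """
    file_name, sep, text = stored_value.partition(SEP)
    if not sep:
        return None, stored_value
    return file_name, text


def flat_doc(doc_id, metadata, chunks):
    """Option 2: one multi-valued field, file name encoded in each value."""
    doc = {"id": doc_id, **metadata}
    doc["file_content_en_US"] = [encode_chunk(n, t) for n, t in chunks]
    return doc


def nested_doc(doc_id, metadata, chunks):
    """Option 1: one child document per chunk, each with its own file name."""
    children = [
        {
            "id": f"{doc_id}!chunk{i}",  # hypothetical child id scheme
            "file_name": name,           # hypothetical field name
            "file_content_en_US": text,
        }
        for i, (name, text) in enumerate(chunks)
    ]
    return {"id": doc_id, **metadata, "_childDocuments_": children}


# Made-up sample data mirroring the example document above.
chunks = [
    ("file1.pdf", "portion of file 1"),
    ("file1.pdf", "remaining portion of file 1"),
    ("file2.txt", "portion of file 2"),
]

print(json.dumps(nested_doc("obj-1", {"metadata_field_1": "xxx"}, chunks), indent=2))
```

With option 2 the client would call something like `decode_chunk` on each
stored value it gets back; whether a highlighted fragment still carries
the prefix depends on the highlighter and fragment boundaries, which this
sketch does not address.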

Another approach I have been thinking of is extending the StoredField to
also store additional metadata. So basically, when a stored field is
retrieved, or a fragment is returned, I also have additional information
associated with the stored field. Can someone tell me whether this is a
terrible idea that I should not be pursuing?

Is there something else I can try?

Thanks a lot,
Srijan
