[ https://issues.apache.org/jira/browse/SOLR-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795943#action_12795943 ]
Lance Norskog commented on SOLR-380:
------------------------------------
Please ask this on solr-user. Issues are for discussing implementations.
Lucene payloads are supported by Solr, and a rectangle per term can be stored
as a payload. This allows the text to be indexed as a text field, and all
queries, including phrase queries, will work as normal.
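To make the payload idea concrete, here is a minimal Python sketch of how one term's rectangle might be packed into the bytes a payload carries. The layout (four big-endian signed 32-bit integers: x, y, width, height) and the names `encode_rect`/`decode_rect` are hypothetical. In Solr itself the encoding would have to happen in Java at index time, for example in a token filter that sets the payload attribute.

```python
import struct

# Hypothetical payload layout: four big-endian signed 32-bit ints
# (x, y, width, height), i.e. 16 bytes per term.
RECT_FORMAT = ">4i"


def encode_rect(x: int, y: int, width: int, height: int) -> bytes:
    """Pack one term's bounding rectangle into a 16-byte payload."""
    return struct.pack(RECT_FORMAT, x, y, width, height)


def decode_rect(payload: bytes) -> tuple:
    """Unpack a payload produced by encode_rect back into (x, y, w, h)."""
    return struct.unpack(RECT_FORMAT, payload)


if __name__ == "__main__":
    p = encode_rect(120, 48, 64, 18)
    print(len(p), decode_rect(p))  # 16 (120, 48, 64, 18)
```

A fixed-width layout keeps decoding trivial; a variable-length encoding would save space but is not needed to illustrate the idea.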
> There's no way to convert search results into page-level hits of a
> "structured document".
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-380
> URL: https://issues.apache.org/jira/browse/SOLR-380
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Tricia Williams
> Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-380-XmlPayload.patch, SOLR-380-XmlPayload.patch,
> xmlpayload-example.zip, xmlpayload-src.jar, xmlpayload.jar
>
>
> "Paged-Text" FieldType for Solr
> A chance to dig into the guts of Solr. The problem: If we index a monograph
> in Solr, there's no way to convert search results into page-level hits. The
> solution: have a "paged-text" fieldtype which keeps track of page divisions
> as it indexes, and reports page-level hits in the search results.
> The input would contain page milestones: <page id="234"/>. As Solr processed
> the tokens (using its standard tokenizers and filters), it would concurrently
> build a structural map of the item, indicating which term position marked the
> beginning of which page: <page id="234" firstterm="14324"/>. This map would
> be stored in an unindexed field in some efficient format.
> At search time, Solr would retrieve term positions for all hits that are
> returned in the current request, and use the stored map to determine page ids
> for each term position. The results would imitate the results for
> highlighting, something like:
> <lst name="pages">
>   <lst name="doc1">
>     <int name="pageid">234</int>
>     <int name="pageid">236</int>
>   </lst>
>   <lst name="doc2">
>     <int name="pageid">19</int>
>   </lst>
> </lst>
> <lst name="hitpos">
>   <lst name="doc1">
>     <lst name="234">
>       <int name="pos">14325</int>
>     </lst>
>   </lst>
>   ...
> </lst>
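The index-time page map and the search-time lookup proposed in the issue can be modeled in a few lines of Python. This is a toy sketch, not Solr code: it assumes whitespace tokenization (real Solr analyzers may skip positions, e.g. for removed stopwords), 0-based term positions, and a map kept as two parallel sorted lists; the function names are hypothetical. Because first-term positions are sorted, each hit position can be mapped to its page with a binary search.

```python
import bisect
import re
from collections import defaultdict

# Milestone syntax taken from the proposal: <page id="234"/>
MILESTONE = re.compile(r'<page id="([^"]+)"\s*/>')


def build_page_map(text):
    """Return parallel lists (first_terms, page_ids): for each milestone,
    the position of the first term that follows it. Whitespace splitting
    stands in for Solr's tokenizers and filters."""
    first_terms, page_ids = [], []
    position = 0
    # With one capture group, re.split alternates text, page id, text, ...
    for i, part in enumerate(MILESTONE.split(text)):
        if i % 2 == 1:  # a captured page id
            first_terms.append(position)
            page_ids.append(part)
        else:  # ordinary text between milestones
            position += len(part.split())
    return first_terms, page_ids


def page_for_position(first_terms, page_ids, pos):
    """Id of the page containing term position pos, or None if pos
    precedes the first milestone. Empty pages share a first_terms value;
    bisect_right then picks the last of them, the page the term is on."""
    i = bisect.bisect_right(first_terms, pos) - 1
    return page_ids[i] if i >= 0 else None


def page_hits(first_terms, page_ids, hit_positions):
    """Group hit positions by page id, in the spirit of "pages"/"hitpos"."""
    grouped = defaultdict(list)
    for pos in sorted(hit_positions):
        page = page_for_position(first_terms, page_ids, pos)
        if page is not None:
            grouped[page].append(pos)
    return dict(grouped)


if __name__ == "__main__":
    doc = ('<page id="1"/> call me ishmael <page id="2"/> some years ago '
           '<page id="3"/> never mind')
    ft, ids = build_page_map(doc)
    print(page_hits(ft, ids, [4, 1, 7]))  # {'1': [1], '2': [4], '3': [7]}
```

Storing only one (first-term, page id) pair per page keeps the stored map small, and each lookup costs O(log P) for P pages rather than a scan of the whole document.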
--
This message is automatically generated by JIRA.