There is an old Jira issue that is still open, SOLR-380 ("There's no way to
convert search results into page-level hits of a "structured document"."),
but it has seen no recent activity. It does have a lot of interesting
commentary, though. I wouldn't get my hopes up that it will be resolved soon.
See:
https://issues.apache.org/jira/browse/SOLR-380
The short answer is that you would have to re-parse the document yourself,
since Tika/POI, as called from SolrCell, simply parses the document into a
linear, unstructured stream of text with no markers for pages. The SOLR-380
Jira issue may give you some clues.
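
To make the re-parsing idea a bit more concrete, here is a rough Python
sketch. It assumes you have already extracted the text of each page yourself
(how you do that depends on the file format and is not shown), and that you
search the concatenated text outside of Solr. The function names are
hypothetical, not part of Solr or Tika.

```python
from bisect import bisect_right


def build_page_index(pages):
    """Join per-page text and record the offset where each page starts."""
    starts = []
    pos = 0
    for text in pages:
        starts.append(pos)
        pos += len(text) + 1  # +1 for the newline that joins the pages
    return "\n".join(pages), starts


def page_for_offset(starts, offset):
    """Return the 1-based ordinal page that contains a character offset."""
    return bisect_right(starts, offset)


def pages_with_match(pages, term):
    """Return the ordinal pages on which a (case-sensitive) term starts."""
    full_text, starts = build_page_index(pages)
    found = []
    pos = full_text.find(term)
    while pos != -1:
        page = page_for_offset(starts, pos)
        if page not in found:
            found.append(page)
        pos = full_text.find(term, pos + 1)
    return found


if __name__ == "__main__":
    # Hypothetical three-page document.
    print(pages_with_match(["alpha beta", "gamma", "beta delta"], "beta"))
```

A match that straddles a page break is attributed to the page where it
starts. If you index the per-page text alongside the start offsets, the same
lookup could in principle be applied to highlighter offsets, but that part is
an assumption on my side and untested.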
I do have a related question: Would you want strictly integer page numbers,
where the first page of any front matter is "1", or the actual literal page
numbers (e.g. "iii" or "A-1")? The former is simpler, but misleading if the
user expects to find that page number printed in the document.
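
To illustrate the difference, here is a small Python sketch that maps an
ordinal page position to a literal label. It assumes one particular scheme
that I am making up for the example: the front matter is numbered in
lowercase roman numerals and the body restarts at "1". Labels such as "A-1"
do not follow any formula and would need an explicit per-document table.

```python
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    if n < 1:
        raise ValueError("roman numerals start at 1")
    out = []
    for value, numeral in ROMAN:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def page_label(ordinal, front_matter_pages):
    """Map a 1-based ordinal page to its literal label.

    Assumes (hypothetically) that the first `front_matter_pages` pages are
    numbered i, ii, iii, ... and that the body then restarts at 1.
    """
    if ordinal <= front_matter_pages:
        return to_roman(ordinal)
    return str(ordinal - front_matter_pages)


if __name__ == "__main__":
    # With four pages of front matter, ordinal page 3 is "iii"
    # and ordinal page 5 is the first body page.
    print(page_label(3, 4), page_label(5, 4))
```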
-- Jack Krupansky
-----Original Message-----
From: debdoot
Sent: Monday, August 06, 2012 9:13 AM
To: solr-user@lucene.apache.org
Subject: Returning page numbers where match occurs
Suppose we are providing search over large text documents (e.g., Word,
PPT). It would be nice to have the highlighter component return the page
numbers where the matches are found, so that they can be included in the
search result summaries. What is the most efficient way to accomplish this?
Thanks
Debdoot
--
View this message in context:
http://lucene.472066.n3.nabble.com/Returning-page-numbers-where-match-occurs-tp3999370.html