There is an old, open Jira issue, SOLR-380 ("There's no way to convert search results into page-level hits of a 'structured document'"), but it has seen no recent activity, so I wouldn't get my hopes up. It does have a lot of interesting commentary, though.

See:
https://issues.apache.org/jira/browse/SOLR-380

The short answer is that you would have to re-parse the document yourself, since Tika/POI, as called from SolrCell, simply parses the document into a linear, unstructured stream of text with no markers for pages. The SOLR-380 Jira issue may give you some clues.
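To make the re-parsing idea a little more concrete, here is a minimal sketch in Python. It assumes you have already obtained the text of each page separately from your own page-aware parsing step (not shown, and not something SolrCell gives you). The function names `build_page_index`, `page_of`, and `pages_matching` are hypothetical, made up for illustration. The idea is just to remember the character offset at which each page starts, so that a match offset can be mapped back to a page number.

```python
import bisect
import re


def build_page_index(pages, separator="\n"):
    """Concatenate per-page texts and record the offset where each page starts.

    `pages` is assumed to be a list of strings, one per page, produced by
    your own page-aware parser (an assumption; not provided by SolrCell).
    """
    starts = []
    offset = 0
    for text in pages:
        starts.append(offset)
        offset += len(text) + len(separator)
    return separator.join(pages), starts


def page_of(offset, starts):
    """Return the 1-based page number containing the given character offset."""
    return bisect.bisect_right(starts, offset)


def pages_matching(term, full_text, starts):
    """Return the sorted 1-based page numbers on which `term` occurs.

    A match that straddles a page break is attributed to the page on
    which it starts.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sorted({page_of(m.start(), starts) for m in pattern.finditer(full_text)})


pages = ["alpha beta", "gamma", "beta delta"]
full_text, starts = build_page_index(pages)
print(pages_matching("beta", full_text, starts))
```

The same offset table could be kept alongside the indexed text so that highlighter offsets can be translated to pages at query time, but how you would store and look it up in Solr is left open here.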

I do have a related question: Would you want strictly integer page numbers, where the first page of any front matter is "1", or the actual literal page numbers (e.g. "iii" or "A-1")? The former is simpler, but it is misleading if the user expects to find that page number printed in the document.
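To illustrate the difference between the two choices, here is a small Python sketch. It assumes a very simple numbering scheme: a known count of front-matter pages labeled with lowercase roman numerals, followed by body pages numbered from 1. The names `to_roman` and `page_label` are hypothetical. Real documents can use arbitrary labels such as "A-1", which would have to be read from the document itself rather than computed like this.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def page_label(physical_page, front_matter_pages):
    """Map a 1-based physical page number to the label printed on the page.

    Assumes (for illustration only) that the first `front_matter_pages`
    pages are labeled i, ii, iii, ... and the rest are labeled 1, 2, 3, ...
    """
    if physical_page <= front_matter_pages:
        return to_roman(physical_page)
    return str(physical_page - front_matter_pages)


# With 4 pages of front matter, physical page 3 is labeled "iii"
# and physical page 5 is labeled "1".
print(page_label(3, 4), page_label(5, 4))
```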

-- Jack Krupansky

-----Original Message-----
From: debdoot
Sent: Monday, August 06, 2012 9:13 AM
To: solr-user@lucene.apache.org
Subject: Returning page numbers where match occurs

Suppose we are provisioning search over large text documents (e.g., Word,
PPT). It would be nice to have the highlighter component return the page
numbers where the matches are found so that they can be included in the
search result summaries. What is the most efficient way to accomplish this?

Thanks
Debdoot



--
View this message in context: http://lucene.472066.n3.nabble.com/Returning-page-numbers-where-match-occurs-tp3999370.html
Sent from the Solr - User mailing list archive at Nabble.com.
