Good pointer. Thank you, that is exactly what I had in mind. To the second point, yes, sort of.
I've managed to take apart a sample of the ePub documents (there are a finite number). Inside each ePub are individual HTML documents, each of which is a single page of the overall book. It would be super to be able to parse the title (originally formed from the page number) to set up a dynamically generated document and include that as part of the results.

Combing the wiki now, since that's where every answer seems to be! Pointers welcome though.

Thanks!

--
Dan McGinn-Combs

On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hi Dan,
>
> 1) Are you looking for
> http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?
>
> 2) Hundreds of words in a field should not be a problem for highlighting.
> But it sounds like this long field may contain content that corresponds to N
> different pages in a publication and you would like to inform the searcher
> which page the match was on, and not just that a match was somewhere in that
> big piece of text. One way to deal with that is to break your document into
> N smaller documents - one document for each page.
>
> Otis
> ----
> Performance Monitoring SaaS for Solr -
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>> ________________________________
>> From: Dan McGinn-Combs <dgco...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, December 16, 2011 4:33 PM
>> Subject: Retrieving Documents
>>
>> I've been doing a fair amount of reading and experimenting with Solr
>> lately. I find that it does a good job of indexing very structured
>> documents. However, the application I have in mind is built around
>> long EPUB documents.
>>
>> Of course, I found the Extract components useful for indexing the
>> EPUBs. However, I would like to be able to
>>
>> * Size the "highlight" portion of text around the query parameters
>> (i.e. show 20 or 30 words) and
>>
>> * Retrieve a location within the document so I can display that "page"
>> from the EPUB.
>>
>> What is common practice for these? I notice that if I have a list of
>> (short) text segments in fields, they are stored without too much fuss
>> and are retrievable. However, I'm talking about a field of potentially
>> hundreds of words.
>>
>> Thanks for any pointers,
>> Dan
>>
>> --
>> Dan McGinn-Combs
>> dgco...@gmail.com
>> Peachtree City, Georgia USA
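
A rough sketch of the one-document-per-page idea from this thread, in Python. It is only an illustration under some assumptions: each page inside the ePub is a standalone HTML/XHTML file whose <title> carries the page number, and the field names (id, book_id, page_title, page_number, text) and the helper names are made up here, not taken from any existing Solr schema. The last helper only builds the highlighting query string; note that hl.fragsize is measured in characters, so "20 or 30 words" would be roughly 200.

```python
import re
import zipfile
from html.parser import HTMLParser
from urllib.parse import urlencode


class _TitleAndText(HTMLParser):
    """Collects the <title> text and all remaining text of one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        else:
            self.text.append(data)


def page_to_doc(book_id, name, markup):
    """Turn one page into a dict shaped like a Solr document.

    Field names are hypothetical; adapt them to your own schema.
    """
    parser = _TitleAndText()
    parser.feed(markup)
    parser.close()
    title = parser.title.strip()
    match = re.search(r"\d+", title)  # first run of digits in the title
    return {
        "id": f"{book_id}/{name}",
        "book_id": book_id,
        "page_title": title,
        "page_number": int(match.group()) if match else None,
        "text": " ".join(" ".join(parser.text).split()),  # collapse whitespace
    }


def epub_to_docs(path, book_id):
    """An ePub is a zip archive; emit one document per HTML page inside it."""
    with zipfile.ZipFile(path) as archive:
        return [
            page_to_doc(book_id, name, archive.read(name).decode("utf-8", "replace"))
            for name in archive.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        ]


def highlight_params(query, fragsize=200):
    """Query string asking Solr to highlight the (hypothetical) text field."""
    return urlencode(
        {"q": query, "hl": "true", "hl.fl": "text", "hl.fragsize": fragsize}
    )
```

With documents shaped like this, each hit already identifies its page through page_number, so the page to display comes back with the result and the highlight snippet stays short.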