Hi Dan,

I don't follow the second paragraph. Not sure what you are trying to do, what you've tried, what didn't work and how...
Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html


>________________________________
>From: Dan McGinn-Combs <dgco...@gmail.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>Sent: Saturday, December 17, 2011 9:30 AM
>Subject: Re: Retrieving Documents
>
>Good pointer. Thank you, that is exactly what I had in mind. To the
>second point, yes, sort of.
>
>I've managed to take apart a sample of the ePub documents (there are a
>finite number). Inside each ePub are individual HTML documents, each one
>a single page of the overall book. It would be super to be able to parse
>the title (originally formed from the page number) to set up a
>dynamically generated document and include that as part of the
>results. Combing the wiki now, since that's where every answer seems
>to be! Pointers welcome though.
>Thanks!
>--
>Dan McGinn-Combs
>
>On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic
><otis_gospodne...@yahoo.com> wrote:
>
>> Hi Dan,
>>
>> 1) Are you looking for
>> http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?
>>
>> 2) Hundreds of words in a field should not be a problem for highlighting.
>> But it sounds like this long field may contain content that corresponds to N
>> different pages in a publication and you would like to inform the searcher
>> which page the match was on, and not just that a match was somewhere in that
>> big piece of text. One way to deal with that is to break your document into
>> N smaller documents, one document for each page.
>>
>> Otis
>>
>>> ________________________________
>>> From: Dan McGinn-Combs <dgco...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, December 16, 2011 4:33 PM
>>> Subject: Retrieving Documents
>>>
>>> I've been doing a fair amount of reading and experimenting with Solr
>>> lately.
>>> I find that it does a good job of indexing very structured
>>> documents. However, the application I have in mind is built around
>>> long EPUB documents.
>>>
>>> Of course, I found the Extract components useful for indexing the
>>> EPUBs. However, I would like to be able to
>>>
>>> * Size the "highlight" portion of text around the matched query terms
>>> (i.e. show 20 or 30 words) and
>>>
>>> * Retrieve a location within the document so I can display that "page"
>>> from the EPUB.
>>>
>>> What is common practice for these? I notice that if I have a list of
>>> (short) text segments in fields, they are stored without too much fuss
>>> and are retrievable. However, I'm talking about a field of potentially
>>> hundreds of words.
>>>
>>> Thanks for any pointers,
>>> Dan
>>>
>>> --
>>> Dan McGinn-Combs
>>> dgco...@gmail.com
>>> Peachtree City, Georgia USA
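On sizing the highlight (point 1 in the thread): hl.fragsize is measured in characters, not words (the default is 100), so "20 or 30 words" can only be approximated. Below is a minimal Python sketch that builds such a highlighting request. It assumes a Solr instance at a hypothetical http://localhost:8983/solr/select and a stored field hypothetically named text; the six-characters-per-word ratio is a rough assumption of mine, not something Solr defines.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Rough assumption: an average English word plus its trailing space is ~6 characters.
AVG_CHARS_PER_WORD = 6


def words_to_fragsize(words: int) -> int:
    """Approximate an hl.fragsize value (characters) for a snippet of `words` words."""
    return words * AVG_CHARS_PER_WORD


def build_highlight_query(query: str, field: str, words: int = 25, snippets: int = 3) -> str:
    """Build a select URL asking Solr for highlighted snippets of roughly `words` words."""
    params = {
        "q": query,
        "hl": "true",                             # turn highlighting on
        "hl.fl": field,                           # field to highlight; it must be stored
        "hl.fragsize": words_to_fragsize(words),  # snippet size in characters
        "hl.snippets": snippets,                  # max snippets returned per field
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

For example, build_highlight_query("white whale", "text") asks for up to three snippets of about 150 characters each; the snippets come back in the "highlighting" section of the response. Since the highlighter also breaks fragments on its own boundaries, treat the word count as a target rather than a guarantee.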
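On locating the page (point 2, and the follow-up about parsing page titles): an EPUB is a ZIP archive of (X)HTML files. One way to get one Solr document per page, as Otis suggests, is to walk the archive, read each page's <title>, pull the page number out of it, and emit a separate document per page. Below is a minimal Python sketch under these assumptions: each page is its own .html/.htm/.xhtml file encoded as UTF-8, and the first integer in its <title> is the page number. The field names id, book_id, page, source_file and text are hypothetical and would need matching fields in your schema.xml. The sketch uses archive order; a real implementation would likely follow the reading order in the EPUB's OPF spine.

```python
import re
import zipfile
from html.parser import HTMLParser

PAGE_RE = re.compile(r"\d+")


class _PageParser(HTMLParser):
    """Collects the <title> text and the visible body text of one (X)HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.chunks.append(data)


def page_number(title):
    """Return the first integer in a page title (assumed to be the page number), else None."""
    match = PAGE_RE.search(title)
    return int(match.group()) if match else None


def epub_to_page_docs(epub_file, book_id):
    """Turn one EPUB (path or file-like object) into a list of per-page Solr documents."""
    docs = []
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".htm", ".xhtml")):
                continue  # skip mimetype, OPF, CSS, images, ...
            parser = _PageParser()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            page = page_number(parser.title)
            # Join text chunks with spaces, then collapse runs of whitespace.
            text = " ".join(" ".join(parser.chunks).split())
            docs.append({
                "id": f"{book_id}-{page if page is not None else name}",
                "book_id": book_id,
                "page": page,
                "source_file": name,  # lets the UI open that exact page
                "text": text,
            })
    return docs
```

Each dict could then be posted to Solr's update handler. A search with hl=true on text and fl=book_id,page would then report which page each hit came from, and source_file points back at the HTML page to display. The trade-off is a larger index (one document per page instead of per book) in exchange for hits that point at a specific page.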