Good pointer, thank you. That is exactly what I had in mind. As to the
second point: yes, sort of.
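For the record, hl.fragsize is measured in characters, not words (the default is 100). A 20 to 30 word snippet therefore works out to very roughly 150 to 200 characters. Below is a minimal sketch that builds such a query URL. It assumes a Solr instance at a hypothetical http://localhost:8983/solr and a hypothetical field named "text". hl, hl.fl, hl.fragsize and hl.snippets are standard highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, field="text", fragsize=180, snippets=3):
    """Return a Solr /select URL that requests highlighted snippets.

    hl.fragsize is measured in characters, not words, so 20-30 words of
    English prose is very roughly 150-200 characters.
    """
    params = {
        "q": query,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to build snippets from (hypothetical name)
        "hl.fragsize": fragsize,  # approximate snippet length in characters
        "hl.snippets": snippets,  # max snippets per field per document
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical local Solr URL; adjust to your own core.
print(build_highlight_query("http://localhost:8983/solr", "whale"))
```

The right fragsize depends on the text, so treat 180 as a starting point and tune it against real results.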

I've managed to take apart a sample of the ePub documents (there are a
finite number). Inside each ePub are individual HTML documents, each of
which is a single page of the overall book. It would be super to be able
to parse the page title (originally formed from the page number), use it
to set up a dynamically generated document, and include that as part of
the results. Combing the wiki now, since that's where every answer seems
to be! Pointers welcome, though.
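Otis's one-document-per-page suggestion might look like the sketch below. It opens the ePub, which is a ZIP archive, reads each HTML/XHTML page, pulls out the <title> and body text, and builds one Solr document per page. Several things here are assumptions. The field names (id, book_id, page_title, text) are hypothetical and would need to exist in your schema. Pages are ordered by file name, whereas the true reading order is defined by the spine in the ePub's OPF file. The resulting dicts could then be sent to Solr's update handler, for example as JSON.

```python
import zipfile
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the <title> text and the visible body text of one XHTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip:
            self.text_parts.append(data)


def epub_to_page_docs(epub_path, book_id):
    """Return one dict per HTML page in the ePub (hypothetical field names)."""
    docs = []
    with zipfile.ZipFile(epub_path) as zf:
        # Assumption: file-name order approximates page order (see the OPF spine).
        names = sorted(
            n for n in zf.namelist()
            if n.lower().endswith((".html", ".htm", ".xhtml"))
        )
        for name in names:
            parser = PageParser()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            title = " ".join("".join(parser.title_parts).split())
            text = " ".join(" ".join(parser.text_parts).split())
            docs.append({
                "id": f"{book_id}:{name}",  # unique per page
                "book_id": book_id,         # lets results be tied back to the book
                "page_title": title,        # e.g. the page number from <title>
                "text": text,
            })
    return docs
```

With the book_id stored on every page document, a match can be reported as "book X, page Y". Highlighting on the text field then only has to work within a single page.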
Thanks!
--
Dan McGinn-Combs

On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:

> Hi Dan,
>
> 1) Are you looking for 
> http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?
>
> 2) Hundreds of words in a field should not be a problem for highlighting.  
> But it sounds like this long field may contain content that corresponds to N 
> different pages in a publication and you would like to inform the searcher 
> which page the match was on, and not just that a match was somewhere in that 
> big piece of text.  One way to deal with that is to break your document into 
> N smaller documents - one document for each page.
>
> Otis
> ----
>
> Performance Monitoring SaaS for Solr - 
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>
>
>> ________________________________
>> From: Dan McGinn-Combs <dgco...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, December 16, 2011 4:33 PM
>> Subject: Retrieving Documents
>>
>> I've been doing a fair amount of reading and experimenting with Solr
>> lately. I find that it does a good job of indexing very structured
>> documents. However, the application I have in mind is built around
>> long EPUB documents.
>>
>> Of course, I found the Extract components useful for indexing the
>> EPUBs. However, I would like to be able to
>>
>> * Size the "highlight" portion of text around the query terms
>> (i.e. show 20 or 30 words) and
>>
>> * Retrieve a location within the document so I can display that "page"
>> from the EPUB.
>>
>> What is the common practice for these? I notice that if I have a list of
>> (short) text segments in fields, they are stored without too much fuss
>> and are retrievable. However, I'm talking about a field of potentially
>> hundreds of words.
>>
>> Thanks for any pointers,
>> Dan
>>
>> --
>> Dan McGinn-Combs
>> dgco...@gmail.com
>> Peachtree City, Georgia USA
>>
>>
