Re: Retrieving Documents

Otis Gospodnetic Sat, 17 Dec 2011 11:59:35 -0800

Hi Dan,

I don't follow the second paragraph. I'm not sure what you are trying to do, what
you've tried, or what didn't work and how...


Otis
----

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



>________________________________
> From: Dan McGinn-Combs <dgco...@gmail.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> 
>Sent: Saturday, December 17, 2011 9:30 AM
>Subject: Re: Retrieving Documents
> 
>Good pointer. Thank you, that is exactly what I had in mind. To the
>second point, yes, sort of.
>
>I've managed to take apart a sample of the ePub documents (there are a
>finite number of them). Inside each ePub are individual HTML files, each
>holding a single page of the overall book. It would be super to be able
>to parse each page's title (originally formed from the page number) to
>set up a dynamically generated document and include that as part of the
>results. I'm combing the wiki now, since that's where all the answers
>seem to be! Pointers welcome, though.
>Thanks!
>--
>Dan McGinn-Combs
>
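A minimal Python sketch of the title parsing Dan describes, using only the standard library. It assumes each page file's `<title>` contains its page number as digits (for example "Page 12"), and it walks the archive in filename order rather than the reading order declared in the ePub's OPF spine. Both are simplifying assumptions, and the helper names (`PageParser`, `page_number_from_title`, `iter_epub_pages`) are hypothetical.

```python
import re
import zipfile
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the <title> text and the visible text of one (X)HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.body_parts.append(data)


def page_number_from_title(title):
    """Return the first integer in the title ('Page 12' -> 12), or None."""
    match = re.search(r"\d+", title)
    return int(match.group()) if match else None


def iter_epub_pages(epub):
    """Yield (member_name, page_number, text) for each HTML file in an ePub.

    `epub` may be a filesystem path or a binary file-like object. Files are
    visited in sorted filename order, NOT the OPF spine's reading order.
    """
    with zipfile.ZipFile(epub) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue  # skip the OPF, NCX, CSS, images, etc.
            parser = PageParser()
            parser.feed(archive.read(name).decode("utf-8", errors="replace"))
            parser.close()
            # Collapse runs of whitespace into single spaces for indexing.
            text = " ".join(" ".join(parser.body_parts).split())
            yield name, page_number_from_title(parser.title), text
```

Each yielded tuple gives the page number alongside that page's plain text, which is the information needed to index pages individually.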
>On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic
><otis_gospodne...@yahoo.com> wrote:
>
>> Hi Dan,
>>
>> 1) Are you looking for 
>> http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?
>>
>> 2) Hundreds of words in a field should not be a problem for highlighting.  
>> But it sounds like this long field may contain content that corresponds to N 
>> different pages in a publication and you would like to inform the searcher 
>> which page the match was on, and not just that a match was somewhere in that 
>> big piece of text.  One way to deal with that is to break your document into 
>> N smaller documents - one document for each page.
>>
>> Otis
>>
>>
>>
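Otis's suggestion of one Solr document per page could look roughly like the following sketch. The field names (`id`, `book_id`, `page`, `text`) and the `id` format are hypothetical and would need matching fields in the schema; with `page` stored, each search hit carries its page number, so the application knows which page to display.

```python
import json


def page_documents(book_id, pages):
    """Build one Solr document per page of a book.

    `pages` is an iterable of (page_number, text) pairs. The field names
    used here (id, book_id, page, text) are hypothetical; they must match
    fields defined in your schema.
    """
    docs = []
    for page_number, text in pages:
        if page_number is None or not text.strip():
            continue  # skip pages without a usable number or without any text
        docs.append({
            "id": f"{book_id}-p{page_number}",  # unique key for each page
            "book_id": book_id,                 # ties the page back to its book
            "page": page_number,                # stored, so every hit reports its page
            "text": text,                       # the field that is searched and highlighted
        })
    return docs


def to_update_body(docs):
    """Serialise the documents as a JSON array for an HTTP update request."""
    return json.dumps(docs)
```

For example, `page_documents("moby-dick", [(1, "Call me Ishmael."), (2, "Some years ago")])` yields two documents with ids `moby-dick-p1` and `moby-dick-p2`. Newer Solr releases accept a JSON array of documents at the update handler; check the update format your Solr version expects. A filter such as `fq=book_id:moby-dick` still restricts results to a single book.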
>>> ________________________________
>>> From: Dan McGinn-Combs <dgco...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, December 16, 2011 4:33 PM
>>> Subject: Retrieving Documents
>>>
>>> I've been doing a fair amount of reading and experimenting with Solr
>>> lately. I find that it does a good job of indexing very structured
>>> documents. However, the application I have in mind is built around
>>> long EPUB documents.
>>>
>>> Of course, I found the Extract components useful for indexing the
>>> EPUBs. However, I would like to be able to
>>>
>>> * Size the "highlight" portion of text around the query terms
>>> (i.e. show 20 or 30 words) and
>>>
>>> * Retrieve a location within the document so I can display that "page"
>>> from the EPUB.
>>>
>>> What is the common practice for these? I notice that if I have a list of
>>> (short) text segments in fields, they are stored without too much fuss
>>> and are retrievable. However, I'm talking about a field of potentially
>>> hundreds of words.
>>>
>>> Thanks for any pointers,
>>> Dan
>>>
>>> --
>>> Dan McGinn-Combs
>>> dgco...@gmail.com
>>> Peachtree City, Georgia USA
>>>
>>>
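For the first question, one way to ask for roughly 20 to 30 words of context is to set `hl.fragsize`, the parameter Otis points to. It is measured in characters, not words, so this sketch converts a word count using an assumed average of about 6 characters per word. The base URL, the `text` and `page` field names, and the `highlight_query_url` helper are hypothetical, and the resulting fragment length is approximate rather than exact.

```python
from urllib.parse import urlencode

# Assumption: an English word plus its trailing space averages ~6 characters.
CHARS_PER_WORD = 6


def highlight_query_url(base_url, query, field="text", words=25, snippets=3):
    """Build a Solr select URL whose highlight fragments hold roughly `words` words.

    hl.fragsize is a character count, so the word count is converted with
    CHARS_PER_WORD. `field` and the `fl` list are hypothetical field names.
    """
    params = {
        "q": query,
        "fl": "id,page",                       # stored fields returned with each hit
        "hl": "true",                          # turn highlighting on
        "hl.fl": field,                        # field(s) to build snippets from
        "hl.fragsize": words * CHARS_PER_WORD, # approximate snippet size in characters
        "hl.snippets": snippets,               # max snippets per field per document
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"
```

With the defaults, 25 words becomes `hl.fragsize=150`. If most snippets come back noticeably shorter or longer than wanted, adjusting `CHARS_PER_WORD` for the actual text is simpler than reasoning about individual fragments.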
>
>
>
