Question on method visibility in Highlighter - WeightedSpanTermExtractor class

2012-10-17 Thread Dawn Zoë Raison
Hi folks, Is there a reason why the setMaxDocCharsToAnalyze() method of WeightedSpanTermExtractor() is protected? The class is a perfect fit for my requirement (enumerating the list of terms present in a document that match the current query for subsequent highlighting in a PDF file) with th

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Dawn Zoë Raison
Did you consider using shingles? It solves the "to be or not to be" problem quite nicely. Dawn On 24/07/2013 12:34, Ankit Murarka wrote: I tried using Phrase Query with slops. Now since I am specifying the slop I also need to specify the 2nd term. In my case the 2nd term is not present. The w

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Dawn Zoë Raison
On 18/01/2011 21:04, Grant Ingersoll wrote: [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them

Re: Highlight Wildcard Queries

2011-01-26 Thread Dawn Zoë Raison
Removing redundant calls to rewrite was the key when I had this issue moving from 2.3.x to 3.0.x... Dawn On 25/01/2011 20:04, Uwe Schindler wrote: And: you don't need to rewrite queries before highlighting, highlighter does this automatically internally if needed. - Uwe Schindler

Re: Fwd: Lucene Problems

2011-02-03 Thread Dawn Zoë Raison
We use the contrib package 'Highlighter' to do exactly that on our PDF newspaper website. Dawn On 03/02/2011 17:31, Gong Li wrote: Hi, I am developing an advanced pdf search engine in java by using pdfbox and lucene. And I must display the context of each keyword in the user interface, but i

Grouping...

2011-03-22 Thread Dawn Zoë Raison
Hi Folks, Before I run off and reinvent the wheel here - has anyone done any form of result grouping with Lucene? My use case looks something like this: Newspaper pages are stored as documents in the Lucene index. I need to list the newspapers that match my criteria in date order, so that I ca

Re: Grouping...

2011-03-25 Thread Dawn Zoë Raison
On 23/03/2011 17:55, Grant Ingersoll wrote: Have you looked at Solr and date faceting capabilities? Also, it has result grouping, but I think you are just describing faceting/filtering. Solr is not an option; we already have the index (>2 million pages, some with 100,000 terms). What I'

Re: PDF Highlighting using PDF Highlight File

2011-05-12 Thread Dawn Zoë Raison
On 12/05/2011 15:47, Wulf Berschin wrote: I think support for highlighting documents would be a very welcome feature. Highlighting HTML documents is already possible with the org.apache.solr.analysis.HTMLStripCharFilter and a NullFragmenter, but there seems to be nothing for highlighting PDF fi

Re: Strange StopFilter and stop words behaviour

2011-07-26 Thread Dawn Zoë Raison
Are you using QueryParser...? If so, remember that NOT is a reserved word. Dawn On 26/07/2011 04:25, SBS wrote: If I enter a query of just the word "not" I get no matches. If I run a query with just the word "included" I get lots of matches. If I run the query "not included" (without surroun

Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison
Hi folks, I'm researching the best options to use for analysing/storing newspaper pages in our online archive, and wondered if anyone has any good hints or tips on good practice for this type of media? I'm currently thinking along the lines of using a customised StandardAnalyzer (no stop wor

Re: Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison
Hi Steve, On 28/11/2011 19:43, Steven A Rowe wrote: I assume that when you refer to "the impact of stop words," you're concerned about query-time performance? You should consider the possibility that performance without removing stop words is good enough that you won't have to take any steps

Highlighter and Shingles...

2012-04-20 Thread Dawn Zoë Raison
Hi, Are there any notes on making the highlighter work consistently with a shingle-generated index? I have a situation where complete matches highlight OK, but partial matches do not - leading to a number of blank previews... Our analyser looks like: TokenStream result =

Re: Highlighter and Shingles...

2012-04-21 Thread Dawn Zoë Raison
Steve, Exactly the right question... Prompted by your question, further investigation reveals that I need to move the "access" part of my lucene query into a filter to prevent non-matching documents getting scored. In that situation of course the highlighter finds nothing to highlight - that'