You could also try splitting the document into paragraphs and running Carrot2's
Lingo algorithm (www.carrot2.org) at the paragraph level to extract clusters.
The labelling routine in Lingo should extract 'key' phrases; this analysis is
heavily frequency-based, but... you know, you may want to try it.
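For the splitting step, here is a minimal sketch in plain Java. It only breaks text into paragraphs on blank lines; it does not call Carrot2, and the names ParagraphSplitter and split are made up for illustration. Each paragraph could then be handed to the clustering step as its own small document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Carrot2 or Lucene.
final class ParagraphSplitter {

    /** Splits text into trimmed, non-empty paragraphs separated by blank lines. */
    static List<String> split(String text) {
        // Normalise CRLF and lone CR line endings to LF first,
        // so one simple regex can detect blank lines.
        String normalised = text.replace("\r\n", "\n").replace('\r', '\n');

        List<String> paragraphs = new ArrayList<>();
        // A paragraph break is a newline, optional whitespace, then another newline.
        for (String chunk : normalised.split("\n\\s*\n")) {
            String paragraph = chunk.trim();
            if (!paragraph.isEmpty()) {
                paragraphs.add(paragraph);
            }
        }
        return paragraphs;
    }
}
```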
How do you specify a cutoff on search results? If I want to sort the
search results on something other than relevancy, I don't want unrelated
stuff showing up at the top. Is there a way to set a cutoff, so that only
results that fall within a certain range are displayed?
Thanks.
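One way to picture such a cutoff, as a rough sketch and not a Lucene API: keep only the hits whose score is at least some fraction of the top hit's score in the same result set, then sort the survivors by the other key. The Hit class, its date field and the method name cutThenSort are all hypothetical, and scores are only compared within a single query's results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; none of these types come from Lucene.
final class ScoreCutoff {

    /** A search hit reduced to the fields this sketch needs. */
    static final class Hit {
        final String id;
        final float score; // relevance score within one result set
        final long date;   // the non-relevancy sort key (assumed here to be a timestamp)

        Hit(String id, float score, long date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    /**
     * Keeps hits scoring at least minFractionOfTop of the best score,
     * then orders them by date, newest first.
     */
    static List<Hit> cutThenSort(List<Hit> hits, float minFractionOfTop) {
        float top = 0f;
        for (Hit h : hits) {
            top = Math.max(top, h.score);
        }
        List<Hit> kept = new ArrayList<>();
        if (top <= 0f) {
            return kept; // nothing scored above zero, so nothing passes the cutoff
        }
        for (Hit h : hits) {
            if (h.score / top >= minFractionOfTop) {
                kept.add(h);
            }
        }
        kept.sort(Comparator.comparingLong((Hit h) -> h.date).reversed());
        return kept;
    }
}
```

The threshold is relative to the best hit of the same query, so it sidesteps comparing raw scores across different queries; the right fraction would still have to be tuned by hand.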
Looking over the implementation of SpanNearQuery I came upon what looked
like a bug. Below is a test which fails due to it. SpanNearQuery doesn't
return all matching spans; once it's found a span it always increments the
span of the clause appearing first in that span (ie. in the example below
Well, "falls within a certain range" is problematic. There's
nothing hard and fast about scoring. That is, scores from, say,
two different queries are not comparable.
But I really don't understand the question. You won't get
unrelated stuff in your result set as far as I know. Everything
has
Moti,
I tried your test and it fails in the way you describe; however, I don't think
the test shows a bug.
Below is the javadoc comment for the package private class NearSpansOrdered.
Would that be sufficient documentation for the ordered case?
/** A Spans that is formed from the ordered
Here's what is going wrong for me:
I have 10 documents, each with 10 fields with parameterName and
parameterValue. Now, when I search for some term and I get 5 hits, how do I
find out which paramName-Value pair matched? A very simple problem, but I
could find no information on the forum for
Hello everyone,
Whenever I search for a word in my web application, I search in some default
fields. E.g. if I search for the word hello, I generate these queries:
title:hello
headlines:hello
summary:hello
content:hello
which I add to a BooleanQuery (BooleanClause.Occur.SHOULD).
What I want to achieve
On 5/6/07, Erick Erickson [EMAIL PROTECTED] wrote:
On 5/5/07, Daniel Einspanjer [EMAIL PROTECTED] wrote:
The query syntax reference page talks about the NOT and the - operators,
but it wasn't clear to me what exactly the difference is between
them. Could someone tell me briefly what that
On Monday 07 May 2007 06:19:47 makkhar wrote:
Here's what is going wrong for me:
I have 10 documents, each with 10 fields with parameterName and
parameterValue. Now, when I search for some term and I get 5 hits, how do
I find out which paramName-Value pair matched? A very simple problem,
Hi Mark,
Do you know of a good paid product that does this?
Thanks,
Arsen
- Original Message
From: Mark Miller [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, May 2, 2007 7:52:36 AM
Subject: Re: Keyphrase Extraction
From what I know you generally have to pay if
Erick,
Thanks for the advice. I will take a look at
PerFieldAnalyzerWrapper to see if I want to take this on. For my
case, I have to use mixed case for a couple of fields since case
really does matter for them (i.e. apple is not the same as Apple), and I
actually don't want users to find the
Arsen,
I already mentioned it (see below) - LingPipe - http://alias-i.com .
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
- Original Message
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To:
Well, the approach you suggested is what we use now. We use regex pattern
matching to pick out the search term. However, because of this we cannot use
some of the very sophisticated queries which Lucene supports (like boolean
queries etc.). We sure can use highlighting to find out this information. But
13 matches