Case B -- I believe the more inbound anchor text, the better the match.
Right now I'm also boosting the documents by calling
setBoost( log( numInboundLinks+1 ) + 1 )
which seems to be quite effective; is there some sort of guidebook for this?
I'm also interested in figuring out how to rank the
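
A minimal sketch of the boost expression quoted above, in plain Java with no Lucene dependency. The class and method names (LinkBoost, boostFor) are hypothetical, and it assumes "log" means the natural logarithm; the returned value is what would be handed to setBoost().

```java
// Hypothetical helper; the names are illustrative and not part of Lucene.
final class LinkBoost {
    private LinkBoost() {}

    // Computes log(numInboundLinks + 1) + 1, assuming a natural logarithm.
    // A document with zero inbound links gets a neutral boost of 1.0.
    static float boostFor(int numInboundLinks) {
        if (numInboundLinks < 0) {
            throw new IllegalArgumentException("numInboundLinks must be >= 0");
        }
        return (float) (Math.log(numInboundLinks + 1.0) + 1.0);
    }

    public static void main(String[] args) {
        for (int links : new int[] {0, 1, 10, 100, 1000}) {
            System.out.println(links + " inbound links -> boost " + boostFor(links));
        }
    }
}
```

The logarithm damps the effect of link counts: under these assumptions a document with 1000 inbound links gets a boost of roughly 7.9, not 1000 times that of an unlinked document.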
Could you get what you need combining the BoostingTermQuery with a
SpanNearQuery to produce a score? Just guessing here..
At some point, I would like to see more Query classes around the
payload stuff, so please submit patches/feedback if and when you get
a solution
On Jun 27, 2007, at 1
You cannot do it because TermPositions is read in the
PhraseWeight.scorer(IndexReader) method (or MultiPhraseWeight) and
loaded into an array which is passed to PhraseScorer. Extending the Weight
as well and passing the payload to the Scorer is a possibility.
- Mark
Peter Keegan wrote:
I'm
Well, to quote the great wise one, "that depends". The reason I'm
being flippant here is because what it depends on is what you want
the result to be.
I'm asking for a use-case scenario here. Something like
"I want the docs to score equally no matter how many
links with 'United States' exist in t
I am in need of some help with the following problem. I have a single
index that I am currently searching against, but it has the property
that a small set of the documents get updated frequently while a large
majority of them are very static and are rarely updated. Documents can
move from be
Hi,
I'm trying to index some fairly standard html documents. For each of the
documents, there is a unique title (which I believe is generally of
high quality), some content, and some anchor text from the
linking documents (which is of good but more variable quality).
I'm indexing them in "title"
: (AFAICT, however, their approach does not address multi-word synonyms).
: Although a query-time analyzer is not directly discussed, they do say
Solr has a SynonymFilter that does handle multi-word synonyms, and it
can handle query-time synonyms, but there are some caveats to both of
those use
I have not looked at any highlighting code yet. Is there already an extension
of PhraseQuery that has getSpans() ?
Currently I am using this code originally by M. Harwood:
Term[] phraseQueryTerms = ((PhraseQuery) query).getTerms();
int i;
SpanQuery[] clauses
On Jun 27, 2007, at 8:51 AM, Samuel LEMOINE wrote:
Hi everyone !
I'm doing bibliographical research on Lucene as an intern at
Lingway (which uses Lucene in its main product), and I'm currently
studying Lucene's file system.
There are several things I don't understand in Lucene's file sys
Please take the time, before asking others "what's going on" to at
least format your mail so we can tell what's what. For instance,
what's a field and what's a value in what you sent? I sure can't
tell because there are so many colons. Remember that you're
asking people to contribute time to solve
On Wednesday 27 June 2007 17:17, mark harwood wrote:
> >>you would still have the major problem of which matches do you keep
> >>information for
>
> Yes, doing this efficiently is the main issue. Some vague thoughts I had:
>...
> 3) For each call to scorer.next() on the top level query, the
Highligh
Hi,
>Have you used Luke to examine your index and try queries? This will tell you a
>LOT about what's *really* happening.
>Google 'lucene' 'luke' and try it.
I've tried Luke but still have no clue what is going on:
I have the following entry:
2007-06-26T10:56:20-05:00 globus-gatekeeper:
>>you would still have the major problem of which matches do you keep
>>information for
Yes, doing this efficiently is the main issue. Some vague thoughts I had:
1) A special HighlightObserverQuery could wrap any query and use its rewrite
method to further wrap child component queries if necess
Hi Aliaksandr,
Aliaksandr Radzivanovich wrote:
> What if I need to search for synonyms, but synonyms can be expanded to
> phrases of several words?
> For example, user enters query "tcp", then my application should also
> find documents containing phrase "Transmission Control Protocol". And
> conv
The synonym analyzer shown in Lucene In Action is a good place
to start. You need to change *all* occurrences of one form into
another, both at index and search time, to get consistent results.
There are some "interesting" implications for this, though, but they
only really need to be considered i
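
A rough sketch of the "change all occurrences of one form into another" idea, in plain Java. This is not the analyzer from Lucene in Action; the class and method names are hypothetical, and it uses naive substring replacement on lowercased text, which ignores token boundaries. The point is only that the same rewrite runs on document text at index time and on the query at search time.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not a Lucene Analyzer or TokenFilter.
final class SynonymCanonicalizer {
    // Maps each lowercased variant phrase to its lowercased canonical form.
    private final Map<String, String> variants = new LinkedHashMap<>();

    void addVariant(String phrase, String canonical) {
        variants.put(phrase.toLowerCase(Locale.ROOT), canonical.toLowerCase(Locale.ROOT));
    }

    // Lowercases the text and rewrites every known variant to its canonical form.
    // Naive: plain substring replacement, so it can also match inside longer words.
    String canonicalize(String text) {
        String result = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : variants.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SynonymCanonicalizer c = new SynonymCanonicalizer();
        c.addVariant("Transmission Control Protocol", "tcp");
        // Index time and search time go through the same method.
        System.out.println(c.canonicalize("Transmission Control Protocol basics"));
        System.out.println(c.canonicalize("TCP"));
    }
}
```

Because both sides are reduced to the same form, a query for "tcp" and a query for the full phrase end up looking for the same term.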
I'm looking at the new Payload api and would like to use it in the following
manner. Meta-data is indexed as a special phrase (all terms at same
position) and a payload is stored with the first term of each phrase. I
would like to create a custom query class that extends PhraseQuery and uses
its P
What if I need to search for synonyms, but synonyms can be expanded to
phrases of several words?
For example, user enters query "tcp", then my application should also
find documents containing phrase "Transmission Control Protocol". And
conversely, user enters "Transmission Control Protocol", then
Hi,
I don't know how to access the CA certificate for the web server at
javacc.dev.java.net - my browser automatically does this for me.
Here's an alternate route - I found another javacc-4.0.zip at another
location, and the file I downloaded from there yesterday matched exactly
the version I got
Depending on what these guys are doing, here is another possibility if
TermOffsets and Ronnie's highlighter are not an option.
If you are highlighting whole documents (NullFragmenter) or are not very
concerned about the fragments you get back, you can change the line in
the Highlighter at abou
Hi everyone !
I'm doing bibliographical research on Lucene as an intern at
Lingway (which uses Lucene in its main product), and I'm currently
studying Lucene's file system.
There are several things I don't understand in Lucene's file system, and I
thought here was the right place to ask abou
markharw00d wrote:
I was thinking along the lines of wrapping some core classes such as
IndexReader to somehow observe the query matching process and deduce
from that what to highlight (avoiding the need for MemoryIndex) but
I'm not sure that is viable. It would be nice to get some more ma
Hi karl,
We did something like Hibernate to map an object (Entity) to Lucene by
defining a bunch of annotations, just like the Limax project (as far as I
know it is led by you).
The only problem we had was how to create relationships between two or more
separate indexes. I managed to resolve it but
In effect, IndexWriter's updateDocument() first deletes the document
containing the specified term, then adds the new document. It just wraps
delete and add in a thread-safe method.
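
To illustrate the delete-then-add behaviour described here, a toy in-memory stand-in (not Lucene's IndexWriter; every name below is hypothetical). Documents are plain field maps, and the update holds one lock across both steps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in, not Lucene's IndexWriter; all names are hypothetical.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    synchronized void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Removes every document whose given field equals the given value.
    synchronized void deleteDocuments(String field, String value) {
        docs.removeIf(d -> value.equals(d.get(field)));
    }

    // Delete, then add, under one lock so no other thread observes the gap.
    synchronized void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    synchronized int size() {
        return docs.size();
    }

    // Returns copies of all documents whose given field equals the given value.
    synchronized List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                hits.add(new HashMap<>(d));
            }
        }
        return hits;
    }
}
```

Note that the whole document is replaced: any field not present in the new document is gone afterwards, which is why a single field cannot be changed in isolation this way.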
Andy
-Original Message-
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 27, 2007 3:58 PM
To: java-u
WATHELET Thomas wrote:
> Is it possible to update a document's field without deleting the
> document and adding it again to the index?
Not really... see the FAQ, especially "How do I update a document or a set
of documents that are already indexed?", and also see javadocs for
IndexWriter's updateD
Perhaps it is not possible if you have written the document to the index.
Andy
-Original Message-
From: WATHELET Thomas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 27, 2007 3:46 PM
To: java-user@lucene.apache.org
Subject: Update documents
Hi,
Is it possible to update a document's fiel
Hi,
Is it possible to update a document's field without deleting the
document and adding it again to the index?
How can I access the certificate of this site?
Steven Rowe wrote:
>
> I don't think you need to register - I am not registered and I can
> download from there.
>
> My guess is that Mahdi Rahimi's browser doesn't know how to speak the
> HTTPS protocol.
>
> Here's an invocation of wget (I have