Re: indexing anchor text

2007-06-27 Thread Tim Sturge
Case B -- I believe the more inbound anchor text, the better the match. Right now I'm also boosting the documents by calling setBoost( log( numInboundLinks+1 ) + 1 ) which seems to be quite effective; is there some sort of guidebook for this? I'm also interested in figuring out how to rank the
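A minimal sketch of the boost formula quoted above, computed outside Lucene so it runs on its own. The class and method names are hypothetical. It assumes "log" means the natural logarithm (`Math.log`), which the message does not state.

```java
// Sketch of the inbound-link boost: log(numInboundLinks + 1) + 1.
// Assumption: "log" is the natural logarithm; the message does not say which base.
public class InboundLinkBoost {

    /** Returns log(numInboundLinks + 1) + 1, so a page with no inbound links gets a neutral 1.0. */
    static float boostFor(int numInboundLinks) {
        if (numInboundLinks < 0) {
            throw new IllegalArgumentException("link count must be non-negative");
        }
        return (float) (Math.log(numInboundLinks + 1.0) + 1.0);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 100, 1000}) {
            // With Lucene 2.x this value would be passed to Document.setBoost(float) before addDocument().
            System.out.printf("%4d inbound links -> boost %.3f%n", n, boostFor(n));
        }
    }
}
```

The log damps the effect of link count: each tenfold increase in links adds only about ln 10 ≈ 2.3 to the boost. Note that in Lucene 2.x the document boost is folded into the field norms, which are stored in a single byte, so nearby boost values may end up encoded identically.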

Re: Payloads and PhraseQuery

2007-06-27 Thread Grant Ingersoll
Could you get what you need by combining the BoostingTermQuery with a SpanNearQuery to produce a score? Just guessing here. At some point, I would like to see more Query classes around the payload stuff, so please submit patches/feedback if and when you get a solution. On Jun 27, 2007, at 1

Re: Payloads and PhraseQuery

2007-06-27 Thread Mark Miller
You cannot do it because TermPositions is read in the PhraseWeight.scorer(IndexReader) method (or MultiPhraseWeight) and loaded into an array, which is passed to PhraseScorer. One possibility is to extend the Weight as well and pass the payload through to the Scorer. - Mark Peter Keegan wrote: I'm

Re: indexing anchor text

2007-06-27 Thread Erick Erickson
Well, to quote the great wise one, "that depends". The reason I'm being flippant here is because what it depends on is what you want the result to be. I'm asking for a use-case scenario here. Something like "I want the docs to score equally no matter how many links with 'United States' exist in t
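The "that depends" choice can be made concrete at indexing time. The hypothetical helper below shows two ways to build an anchor-text field: keep every inbound anchor (Tim's Case B, where more links using a phrase raise its term frequency) or keep each distinct anchor once (so ten "United States" links count the same as one). It is a plain-Java sketch, not a Lucene API. For context, Lucene 2.x's DefaultSimilarity scores term frequency as sqrt(freq) and applies a length norm of 1/sqrt(numTerms), so repeated anchors raise tf sublinearly while also lengthening the field.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for building an "anchor" field from the texts of inbound links.
public class AnchorTextBuilder {

    /** Case B: keep every anchor, so a phrase used by many links raises its term frequency. */
    static String keepAll(List<String> anchors) {
        return String.join(" ", anchors);
    }

    /** Alternative: keep each distinct anchor once (compared case-insensitively), so repeats add nothing. */
    static String distinctOnly(List<String> anchors) {
        LinkedHashSet<String> distinct = new LinkedHashSet<>();
        for (String a : anchors) {
            distinct.add(a.trim().toLowerCase(Locale.ROOT));
        }
        return String.join(" ", distinct);
    }

    /** Counts whitespace-separated occurrences of a single term, ignoring case. */
    static int termFreq(String fieldText, String term) {
        int count = 0;
        for (String tok : fieldText.split("\\s+")) {
            if (tok.equalsIgnoreCase(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("United States", "United States", "USA", "United States");
        String all = keepAll(anchors);
        String distinct = distinctOnly(anchors);
        System.out.println(all + " | tf(states)=" + termFreq(all, "states"));
        System.out.println(distinct + " | tf(states)=" + termFreq(distinct, "states"));
    }
}
```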

breaking a single index in to two indexes

2007-06-27 Thread Les Fletcher
I am in need of some help with the following problem. I have a single index that I am currently searching against, but it has the property that a small set of the documents get updated frequently while a large majority of them are very static and are rarely updated. Documents can move from be

indexing anchor text

2007-06-27 Thread Tim Sturge
Hi, I'm trying to index some fairly standard html documents. For each of the documents, there is a unique (which I believe is generally of high quality), some content, and some anchor text from the linking documents (which is of good but more variable quality). I'm indexing them in "title"

Re: Rewrite one phrase to another in search query

2007-06-27 Thread Chris Hostetter
: (AFAICT, however, their approach does not address multi-word synonyms). : Although a query-time analyzer is not directly discussed, they do say Solr has a SynonymFilter that does handle multi-word synonyms, and it can handle query-time synonyms, but there are some caveats to both of those use

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread Mark Miller
I have not looked at any highlighting code yet. Is there already an extension of PhraseQuery that has getSpans()? Currently I am using this code originally by M. Harwood: Term[] phraseQueryTerms = ((PhraseQuery) query).getTerms(); int i; SpanQuery[] clauses

Re: several existential issues about Lucene's filesystem

2007-06-27 Thread Grant Ingersoll
On Jun 27, 2007, at 8:51 AM, Samuel LEMOINE wrote: Hi everyone! I'm working on bibliographical research on Lucene as an intern at Lingway (which uses Lucene in its main product), and I'm currently studying Lucene's file system. There are several things I don't understand about Lucene's file sys

Re: Question about search

2007-06-27 Thread Erick Erickson
Please take the time, before asking others "what's going on", to at least format your mail so we can tell what's what. For instance, what's a field and what's a value in what you sent? I sure can't tell because there are so many colons. Remember that you're asking people to contribute time to solve

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread Paul Elschot
On Wednesday 27 June 2007 17:17, mark harwood wrote: > >>you would still have the major problem of which matches do you keep information for > > Yes, doing this efficiently is the main issue. Some vague thoughts I had: >... > 3) For each call to scorer.next() on the top level query, the Highligh

Re: Question about search

2007-06-27 Thread tanya
Hi, >Have you used Luke to examine your index and try queries? This will tell you a >LOT about what's *really* happening. >Google 'lucene' 'luke' and try it. I've tried Luke but still have no clue what is going on: I have the following entry: 2007-06-26T10:56:20-05:00 globus-gatekeeper:
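Erick's point about "so many colons" also matters for the query itself: in QueryParser syntax a colon separates a field name from its value, so text copied from a log entry like the one above (the timestamp, "globus-gatekeeper:") may be parsed as field references rather than as plain terms. Lucene's QueryParser provides a static escape(String) method for this; the standalone sketch below re-implements the idea so it runs without Lucene. The exact set of special characters is an assumption based on 2.x-era releases, and whether the escaped query matches still depends on how the analyzer tokenized the text at index time.

```java
// Standalone sketch modeled on QueryParser.escape(String): prefixes query-syntax characters with '\'.
// Assumption: SPECIAL approximates the characters QueryParser treats as operators in 2.x releases.
public class QueryEscaper {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the operator character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("globus-gatekeeper:"));
    }
}
```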

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread mark harwood
>>you would still have the major problem of which matches do you keep >>information for Yes, doing this efficiently is the main issue. Some vague thoughts I had: 1) A special HighlightObserverQuery could wrap any query and use its rewrite method to further wrap child component queries if necess

Re: Rewrite one phrase to another in search query

2007-06-27 Thread Steven Rowe
Hi Aliaksandr, Aliaksandr Radzivanovich wrote: > What if I need to search for synonyms, but synonyms can be expanded to > phrases of several words? > For example, user enters query "tcp", then my application should also > find documents containing phrase "Transmission Control Protocol". And > conv

Re: Rewrite one phrase to another in search query

2007-06-27 Thread Erick Erickson
The synonym analyzer shown in Lucene in Action is a good place to start. You need to change *all* occurrences of one form into the other, at both index and search time, to get consistent results. There are some "interesting" implications of this, though, but they only really need to be considered i
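One way to read "change *all* occurrences of one form into the other" is to collapse the multi-word form into a single canonical token and run the same normalization on both documents and queries. The sketch below does that on plain token lists; the class name and the choice of "tcp" as the canonical form are hypothetical. In Lucene the equivalent logic would live in a TokenFilter inside the Analyzer used at both index and query time; this is not the synonym analyzer from Lucene in Action.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy normalizer: rewrites a multi-word phrase into one canonical token (hypothetical, not a Lucene class).
public class PhraseNormalizer {
    private final List<String> phrase;  // e.g. [transmission, control, protocol]
    private final String canonical;     // e.g. tcp

    PhraseNormalizer(String phrase, String canonical) {
        this.phrase = Arrays.asList(phrase.toLowerCase(Locale.ROOT).split("\\s+"));
        this.canonical = canonical.toLowerCase(Locale.ROOT);
    }

    /** Lowercases tokens and replaces each occurrence of the phrase with the canonical token. */
    List<String> normalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (matchesAt(tokens, i)) {
                out.add(canonical);
                i += phrase.size();
            } else {
                out.add(tokens.get(i).toLowerCase(Locale.ROOT));
                i++;
            }
        }
        return out;
    }

    private boolean matchesAt(List<String> tokens, int start) {
        if (start + phrase.size() > tokens.size()) {
            return false;
        }
        for (int j = 0; j < phrase.size(); j++) {
            if (!tokens.get(start + j).equalsIgnoreCase(phrase.get(j))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhraseNormalizer n = new PhraseNormalizer("Transmission Control Protocol", "tcp");
        System.out.println(n.normalize(Arrays.asList("the", "Transmission", "Control", "Protocol", "stack")));
        System.out.println(n.normalize(Arrays.asList("TCP", "stack")));
    }
}
```

Because the same normalization runs on both sides, a document mentioning "Transmission Control Protocol" and a query for "tcp" end up with the same token.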

Payloads and PhraseQuery

2007-06-27 Thread Peter Keegan
I'm looking at the new Payload api and would like to use it in the following manner. Meta-data is indexed as a special phrase (all terms at same position) and a payload is stored with the first term of each phrase. I would like to create a custom query class that extends PhraseQuery and uses its P
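To picture the layout Peter describes (each meta-data phrase with all terms at one position, payload on the first term), here is a toy model in plain Java. It does not use Lucene classes; in Lucene 2.x the increment would be set with Token.setPositionIncrement(int) and, in the 2.2 payload API, the payload with Token.setPayload(Payload). The term names and payload bytes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the token layout: not Lucene classes.
public class MetaPhraseLayout {

    /** A token with a position increment and an optional payload (null when absent). */
    static final class Tok {
        final String term;
        final int increment;
        final byte[] payload;

        Tok(String term, int increment, byte[] payload) {
            this.term = term;
            this.increment = increment;
            this.payload = payload;
        }
    }

    /** One meta-data phrase: the first term advances the position and carries the payload; the rest stay put. */
    static List<Tok> metaPhrase(String[] terms, byte[] payload) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            boolean first = (i == 0);
            out.add(new Tok(terms[i], first ? 1 : 0, first ? payload : null));
        }
        return out;
    }

    /** Absolute positions in this model: start at -1 so a first token with increment 1 lands on 0. */
    static int[] positions(List<Tok> tokens) {
        int[] pos = new int[tokens.size()];
        int p = -1;
        for (int i = 0; i < tokens.size(); i++) {
            p += tokens.get(i).increment;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        // Hypothetical meta-data terms and one-byte payloads.
        List<Tok> tokens = new ArrayList<>(metaPhrase(new String[] {"genre_jazz", "era_1950s"}, new byte[] {7}));
        tokens.addAll(metaPhrase(new String[] {"genre_rock", "era_1970s"}, new byte[] {3}));
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            System.out.println(pos[i] + " " + t.term + (t.payload != null ? " payload=" + t.payload[0] : ""));
        }
    }
}
```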

Rewrite one phrase to another in search query

2007-06-27 Thread Aliaksandr Radzivanovich
What if I need to search for synonyms, but synonyms can be expanded to phrases of several words? For example, user enters query "tcp", then my application should also find documents containing phrase "Transmission Control Protocol". And conversely, user enters "Transmission Control Protocol", then

Re: JavaCC Download

2007-06-27 Thread Steven Rowe
Hi, I don't know how to access the CA certificate for the web server at javacc.dev.java.net - my browser automatically does this for me. Here's an alternate route - I found another javacc-4.0.zip at another location, and the file I downloaded from there yesterday matched exactly the version I got

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread Mark Miller
Depending on what these guys are doing, here is another possibility if TermOffsets and Ronnie's highlighter are not an option. If you are highlighting whole documents (NullFragmenter) or are not very concerned about the fragments you get back, you can change the line in the Highlighter at abou

several existential issues about Lucene's filesystem

2007-06-27 Thread Samuel LEMOINE
Hi everyone! I'm working on bibliographical research on Lucene as an intern at Lingway (which uses Lucene in its main product), and I'm currently studying Lucene's file system. There are several things I don't understand about Lucene's file system, and I thought this was the right place to ask abou

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread Mark Miller
markharw00d wrote: I was thinking along the lines of wrapping some core classes such as IndexReader to somehow observe the query matching process and deduce from that what to highlight (avoiding the need for MemoryIndex) but I'm not sure that is viable. It would be nice to get some more ma

Re: Lucene as primary object storage

2007-06-27 Thread Mohammad Norouzi
Hi Karl, we did something like Hibernate to map an object (Entity) to Lucene by defining a bunch of annotations, just like the Limax project (which, as far as I know, you lead). The only problem we had was how to establish relationships between two or more separate indexes. I managed to resolve it but

RE: Update documents

2007-06-27 Thread Liu_Andy2
In effect, IndexWriter's updateDocument() first deletes the document(s) containing the specified term, then adds the new document. It just wraps the delete and add in a single thread-safe method. Andy -Original Message- From: Doron Cohen [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 27, 2007 3:58 PM To: java-u
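The semantics Andy describes (delete by term, then add, as one thread-safe step) can be illustrated with a toy in-memory model. This is not Lucene's implementation; the real method is IndexWriter.updateDocument(Term, Document), and the class and method names below are hypothetical. A "document" here is just a field-to-value map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of "update = delete by term, then add"; not Lucene's implementation.
public class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    synchronized void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Deletes every document whose field equals value, then adds doc, all under one lock. */
    synchronized void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(new HashMap<>(doc));
    }

    synchronized int size() {
        return docs.size();
    }

    /** Returns a field of the first document whose idField equals id, or null if none matches. */
    synchronized String get(String idField, String id, String field) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get(idField))) {
                return d.get(field);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> v1 = new HashMap<>();
        v1.put("id", "42");
        v1.put("title", "old title");
        index.addDocument(v1);

        // To change one field, the caller supplies the whole new document.
        Map<String, String> v2 = new HashMap<>(v1);
        v2.put("title", "new title");
        index.updateDocument("id", "42", v2);

        System.out.println(index.size() + " doc, title=" + index.get("id", "42", "title"));
    }
}
```

Since a single field cannot be changed in place, the caller has to rebuild the complete document. If it is rebuilt from stored fields, any fields that were indexed but not stored would be lost, so rebuilding from the original source data is usually safer.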

Re: Update documents

2007-06-27 Thread Doron Cohen
WATHELET Thomas wrote: > Is it possible to update a document's field without deleting the > document and adding it again to the index? Not really... see the FAQ, especially "How do I update a document or a set of documents that are already indexed?", and also see the javadocs for IndexWriter's updateD

RE: Update documents

2007-06-27 Thread Liu_Andy2
Perhaps it is not possible once you have written the document to the index. Andy -Original Message- From: WATHELET Thomas [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 27, 2007 3:46 PM To: java-user@lucene.apache.org Subject: Update documents Hi, is it possible to update a document's fiel

Update documents

2007-06-27 Thread WATHELET Thomas
Hi, is it possible to update a document's field without deleting the document and adding it again to the index?

Re: JavaCC Download

2007-06-27 Thread Mahdi Rahimi
How can I access the certificate for this site? Steven Rowe wrote: > > I don't think you need to register - I am not registered and I can > download from there. > > My guess is that Mahdi Rahimi's browser doesn't know how to speak the > HTTPS protocol. > > Here's an invocation of wget (I have