Re: Term Weights and Clustering

2005-02-24 Thread Dawid Weiss
Hi Owen, I'm from the Carrot2 project, so I feel called to the blackboard: One source for how to do this is the thesis of Stanislaw Osinski and others like it: http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm And the Carrot2 project which uses similar techniques. http://www.cs

Re: Term Weights and Clustering

2005-02-23 Thread David Spencer
is I found that carrot2 tends to not scale beyond 200 or so docs, though this probably depends on length of docs & the # of different tokens. I was able to use the above to integ w/ a lucene search results page in just an hour or so. Owen Densmore wrote: I'm building a TDM (Term D

Term Weights and Clustering

2005-02-23 Thread Owen Densmore
I'm building a TDM (Term Document Matrix) from my lucene index. As part of this, it would be useful to have the document term weights (the TF*IDF-weight) if they are already available. Naturally I can compute them, but I suspect they are lurking behind an API I've not discovered
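For readers who end up computing the weights themselves, the following is a minimal sketch of a term-document matrix with textbook TF*IDF weights. It makes no Lucene calls: the class name TfIdfSketch, the method weigh, and the choice idf = ln(N / df) are assumptions made for illustration, and Lucene's own Similarity scoring may weigh terms differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Toy term-document matrix with TF*IDF weights; hypothetical helper, not a Lucene API. */
final class TfIdfSketch {

    /** Returns one row per document: term -> tf * idf, with idf = ln(N / df). */
    static List<Map<String, Double>> weigh(List<List<String>> docs) {
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> matrix = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term frequency within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                row.put(e.getKey(), e.getValue() * idf);
            }
            matrix.add(row);
        }
        return matrix;
    }
}
```

Each document is passed in as an already tokenized list of terms. A term that occurs in every document gets weight 0 under this formula, which is the usual reason for picking a smoothed idf variant in practice.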

Re: AW: *term

2005-02-02 Thread Brisbart Franck
Try searching the mail archives for "*term search". It works for me with the double quotes. Franck Gast, Thorsten IZ/HZA-IOL wrote: Hi, same for me. I was not able to search the lucene archives. It said: "Text search not available for this list" Any hints?? -- Fran

AW: *term

2005-02-02 Thread Gast, Thorsten IZ/HZA-IOL
st Subject: AW: *term Hi, I was not able to find anything. Does anybody have a link? --Tim > -Original Message- > From: sergiu gordea [mailto:[EMAIL PROTECTED] > Sent: Wednesday, February 2, 2005 15:04 > To: Lucene Users List > Subject: Re: *term > > >

AW: *term

2005-02-02 Thread Tim Lebedkov \(UPK\)
Hi, I was not able to find anything. Does anybody have a link? --Tim > -Original Message- > From: sergiu gordea [mailto:[EMAIL PROTECTED] > Sent: Wednesday, February 2, 2005 15:04 > To: Lucene Users List > Subject: Re: *term > > > Tim Lebedkov (UPK) wr

Re: *term

2005-02-02 Thread sergiu gordea
Tim Lebedkov (UPK) wrote: Hi, is there a way to make QueryParser accept *term? Yes, if you apply a patch to the Lucene sources. Search for "*term search" in the Lucene archive. Best, Sergiu thank you --Tim - To unsubscri

*term

2005-02-02 Thread Tim Lebedkov \(UPK\)
Hi, is there a way to make QueryParser accept *term? thank you --Tim - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: query term frequency

2005-01-28 Thread markharw00d
This from the highlighter package will give you the IDF: WeightedTerm[] QueryTermExtractor.getIdfWeightedTerms(Query query, IndexReader reader, String fieldName)

Re: query term frequency

2005-01-28 Thread Grant Ingersoll
o wrote: > No, the number of occurrences of a term in a Query. Nothing built-in gives you this. You'd have to dissect the Query clause-by-clause and cast each clause to the proper type to pull the terms from them. The Highlighter code does this. If there is a better way, I

Re: query term frequency

2005-01-28 Thread Erik Hatcher
On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote: No, the number of occurrences of a term in a Query. Nothing built-in gives you this. You'd have to dissect the Query clause-by-clause and cast each clause to the proper type to pull the terms from them. The Highlighter code does this

Re: query term frequency

2005-01-27 Thread Jonathan Lasko
No, the number of occurrences of a term in a Query. Jonathan Quoting David Spencer <[EMAIL PROTECTED]>: > Jonathan Lasko wrote: > > > What do I call to get the term frequencies for terms in the Query? I > > can't seem to find it in the Javadoc... > > Do

Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
? thanks atul PS we worked together for Darden project From: Erik Hatcher <[EMAIL PROTECTED]> Date: 2005/01/27 Thu PM 07:46:40 EST To: "Lucene Users List" Subject: Re: LuceneReader.delete (term t) Failure ? How did you index the "uid" field? Field.Keyword? If not, that

Re: Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
> To: "Lucene Users List" > Subject: Re: LuceneReader.delete (term t) Failure ? > > How did you index the "uid" field? Field.Keyword? If not, that may be > the problem in that the field was analyzed. For a key field like this, > it needs to be unanalyzed/untoken

Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
a document from Lucene index using: Term aTerm = new Term( "uid", path ); aReader.delete( aTerm ); aReader.close(); If the variable path="xxx/foo.txt" then I am able to delete the document. However, if path variable has "-" in the string, the dele

LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
Hi, I am trying to delete a document from Lucene index using: Term aTerm = new Term( "uid", path ); aReader.delete( aTerm ); aReader.close(); If the variable path="xxx/foo.txt" then I am able to delete the document. However, if path variable has "-"

Re: query term frequency

2005-01-27 Thread David Spencer
Jonathan Lasko wrote: What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Do you mean the # of docs that have a term? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index

query term frequency

2005-01-27 Thread Jonathan Lasko
What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Thanks. Jonathan

Re: *term search

2004-10-07 Thread sergiu gordea
[EMAIL PROTECTED] wrote: .. and here is the way to do it: (See attached file: SUPPOR~1.RAR) Hi all, I got from iouli the solution to enable prefix queries (*term). In fact you can find the solution in the lucene source: a comment in QueryParser.jj says how to enable prefix queries. I did

Re: accessing Term Vector info

2004-10-04 Thread Grant Ingersoll
See IndexReader#getTermFreqVector() in the javadocs >>> [EMAIL PROTECTED] 10/4/2004 10:29:30 AM >>> hi all i am indexing documents consisting of fields for a database id, and text the text field is created as new Field("FULL_TEXT",text, false,true, true, true)

accessing Term Vector info

2004-10-04 Thread Rupinder Singh Mazara
hi all i am indexing documents consisting of fields for a database id, and text the text field is created as new Field("FULL_TEXT",text, false,true, true, true) in order to store the Term Vector info, how do I access this ? regards

Re: Term highlighting and Term vector patch

2004-09-17 Thread Otis Gospodnetic
Sorry about the 'bad' link (my bookmark index can't be searched by those without Simpy accounts). So you can see 'similar' and Term Vectors in action here: http://www.simpy.com/simpy/Authenticate.do?username=demo&password=demo&rememberme=-1 If you then search

Re: Term highlighting and Term vector patch

2004-09-17 Thread Grant Ingersoll
ww.simpy.com/simpy/Search.do?op=user&username=otis&q=lucene You will see 'similar' link next to each search result item. Finding similar web pages can be implemented using term vectors. Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote: > Christoph, > > Just

Re: Term highlighting and Term vector patch

2004-09-16 Thread Otis Gospodnetic
If you look at this: http://www.simpy.com/simpy/Search.do?op=user&username=otis&q=lucene You will see 'similar' link next to each search result item. Finding similar web pages can be implemented using term vectors. Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote:

Re: Term highlighting and Term vector patch

2004-09-16 Thread Terry Steichen
Christoph, Just curious - how are you currently using Term Vectors? They seem to be a neat feature with lots of future promise, but I'm not sure how to best use them now. Regards, Terry - Original Message - From: Christoph Goller To: Lucene Developers List Sent: Thu

Re: *term search

2004-09-08 Thread iouli . golovatyi
olovatyi/X/GP/Novartis) 08.09.2004 12:46 Subject: Re: *term search Please respond to "Lucen

Re: *term search

2004-09-08 Thread Morus Walter
sergiu gordea writes: > > > Hi all, > > I want to discuss a little problem, lucene doesn't support *Term like > queries. > I know that this can bring a lot of results in the memory and therefore > it is restricted. > That's not the reason for the rest

Re: *term search

2004-09-08 Thread Erik Hatcher
On Sep 8, 2004, at 6:26 AM, sergiu gordea wrote: I want to discuss a little problem, lucene doesn't support *Term like queries. First of all, this is untrue. WildcardQuery itself most definitely supports wildcards at the beginning. I would like to use "*schreiben". The

*term search

2004-09-08 Thread sergiu gordea
Hi all, I want to discuss a little problem, lucene doesn't support *Term like queries. I know that this can bring a lot of results in the memory and therefore it is restricted. I think that allowing this kind of search and limiting the amount of returned results would be a more us

Re: term frequency data of terms of all documents

2004-08-24 Thread Bernhard Messer
Serkan, it's easier using the IndexReader class to get the information you need. If you just need the doc frequency of each term you could use the sample. IndexReader ir = null; try { if (!IndexReader.indexExists("tmp/index")) return

term frequency data of terms of all documents

2004-08-24 Thread Serkan Oktar
I want to build a list of terms of all documents and their frequency data. It seems the information I need is in "tis" and "tii" files. However I haven't found a way to handle them so far. How can I get the term frequency data? Thanks , Serkan

Limiting Term Queries

2004-07-20 Thread Shawn Konopinsky
Is it possible to limit a term query? For example: I am indexing documents with (amongst other things) a string in one field and with a number in another field. All combinations of strings and numbers are allowed and neither field is unique. I would like a way to query Lucene to pull out all

Is it possible to delete a term?

2004-07-11 Thread clibois
Hello, I am still working on my categorizing tools and I am implementing a dimensional reduction. I would like to reduce my index to a subset of its terms. So I was wondering whether it's possible to delete not a document but a term? Or is there any other solution to reduce my number of

Re: Searching for asterisk in a term

2004-07-07 Thread Erik Hatcher
On Jul 7, 2004, at 3:41 PM, [EMAIL PROTECTED] wrote: Can you recommend an analyzer that doesn't discard '*' or '/'? WhitespaceAnalyzer :) Check the wiki AnalysisParalysis page also. Erik
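To see why the choice of analyzer matters here, the sketch below contrasts two toy tokenizers in plain Java. It does not call Lucene: whitespaceTokens only imitates the idea of splitting on whitespace, and letterTokens imitates the idea of keeping lowercased runs of letters. Both method names and the class TokenizeSketch are hypothetical, and real analyzers may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tokenizers for comparison; hypothetical helpers, not Lucene analyzers. */
final class TokenizeSketch {

    /** Splits on whitespace only, so characters such as '*' and '/' survive. */
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    /** Keeps only runs of letters, lowercased; every other character is a separator. */
    static List<String> letterTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }
}
```

For the text "Hello *foo bar" from this thread, whitespaceTokens yields Hello, *foo, bar, while letterTokens yields hello, foo, bar. An exact term lookup for "*foo" can only succeed against the first token stream.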

Re: Searching for asterisk in a term

2004-07-07 Thread yahootintin . 1247688
dAnalyzer, for > instance, will discard it. Check one of Erik Hatcher's articles that > includes a tool that helps you see what your Analyzer does with the any > given text input. You can also use Luke to see what your index > contains. > > Otis > > --- [

Re: Searching for asterisk in a term

2004-07-07 Thread Otis Gospodnetic
h the any given text input. You can also use Luke to see what your index contains. Otis --- [EMAIL PROTECTED] wrote: > Hi, > > I'm trying to search for a term that contains an asterisk. > > This > is the field that I indexed: > - new Field("testField", "

Searching for asterisk in a term

2004-07-07 Thread yahootintin . 1247688
Hi, I'm trying to search for a term that contains an asterisk. This is the field that I indexed: - new Field("testField", "Hello *foo bar", true, true, true); I'm trying to find this document by matching '*foo': - new TermQuery(new Term("

unlimited wildcard term expansion

2004-06-30 Thread John Z
file that I created earlier. I read out each term from the file and create a TermQuery, then get the scorer object from this TermQuery and collect the score for it. Then the bucketTable will do collectHits of everything. I have tested out my changes with small indexes with about 2 terms in

[bug?] term frequency and empty content

2004-06-26 Thread Stefan Groschupf
Hi, I notice something strange: (1.4-rc4) Until I add an empty text to my index: where text is "" or null; IndexWriter indexWriter = getIndexWriter(); document.add(Field.Text(Corpus.TEXT, text, true)); indexWriter.addDocument(document); I see this in std.out: "No tvx file" Furthermore IndexReade

term frequence in hits

2004-06-26 Thread Stefan Groschupf
Hi, another question, but first many thanks for the last hint, the new term frequency functionality of lucene is just GREAT! ;) I have indexed a set of documents with different meta data, Language = DE or Language = EN. Now I wish to get term frequencies for DE and EN. The easiest solution would

Re: term vector

2004-06-23 Thread Erik Hatcher
On Jun 23, 2004, at 7:57 PM, Stefan Groschupf wrote: Is there a best practice to get the term vector of a document? You get term vectors by "field", not document. As for best practice, there is really only one way to go about getting the term vectors: TermFreqVector term

term vector

2004-06-23 Thread Stefan Groschupf
Hi, sorry, a stupid question. Is there a best practice to get the term vector of a document? Does anyone have experience with feature selection for dimension reduction, such as Zipf's law, or with getting the tf/idf of a term for the complete corpus? Thanks for any hints. Stefan

Re: Q: how to parse query and add boost a term

2004-06-15 Thread Erik Hatcher
On Jun 15, 2004, at 5:04 AM, Zilverline wrote: Hi, How can I use Lucene's API best to parse a query and then boost the query with the 'title' given the relevant parts of the query (the default Field)? To give you an example: I want to change: 'java "method invocation" +type:HTML' into

Q: how to parse query and add boost a term

2004-06-15 Thread Zilverline
Hi, How can I use Lucene's API best to parse a query and then boost the query with the 'title' given the relevant parts of the query (the default Field)? To give you an example: I want to change: 'java "method invocation" +type:HTML' into '+contents:java +contents:"method invocation" +

RE: a list of matching search term

2004-06-02 Thread Anson Lau
Thanks Erik I'll give that a try. Anson -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 02, 2004 7:28 PM To: Lucene Users List Subject: Re: a list of matching search term On Jun 1, 2004, at 9:19 PM, Anson Lau wrote: > Further to my previo

Re: a list of matching search term

2004-06-02 Thread Erik Hatcher
On Jun 1, 2004, at 9:19 PM, Anson Lau wrote: Further to my previous email: The highlighter package should be able to pick up the matching search terms. Can some experienced highlighter package users tell me if I should look down that line? Yes, Highlighter (available in the sandbox) picks out mat

RE: a list of matching search term

2004-06-01 Thread Anson Lau
: Tuesday, June 01, 2004 5:20 PM To: 'Lucene Users List' Subject: a list of matching search term Hi All, E.g., let's say someone does a search on the terms 'apple orange banana'. In the search results, is it possible to find out for each hit, which of those terms did match? I.e. T

a list of matching search term

2004-06-01 Thread Anson Lau
Hi All, E.g., let's say someone does a search on the terms 'apple orange banana'. In the search results, is it possible to find out for each hit, which of those terms did match? I.e. the document with the highest score has all three words so the matching terms are all of those words. A lesser documen
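The behaviour asked for here can be sketched without Lucene by re-tokenizing each hit's text and intersecting it with the query terms. The class MatchedTermsSketch, the method matchedTerms, and the simple split on non-word characters are all assumptions made for illustration. The replies in this thread point to the Highlighter package for the real thing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Reports which query terms occur in a hit's text; hypothetical helper, not a Lucene API. */
final class MatchedTermsSketch {

    /** Returns the query terms found in docText, in query order, ignoring case. */
    static Set<String> matchedTerms(List<String> queryTerms, String docText) {
        // Crude tokenization: lowercase, then split on runs of non-word characters.
        Set<String> docTokens = new HashSet<>(
                Arrays.asList(docText.toLowerCase(Locale.ROOT).split("\\W+")));
        Set<String> matched = new LinkedHashSet<>();
        for (String term : queryTerms) {
            if (docTokens.contains(term.toLowerCase(Locale.ROOT))) {
                matched.add(term);
            }
        }
        return matched;
    }
}
```

With the query terms apple, orange, banana and a hit whose text is "Apple pie with banana", the result contains apple and banana. The tokenization used for this check should mirror whatever analyzer built the index, otherwise the reported matches can disagree with the actual hits.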

Re: Native Hits or Term Highlighting

2004-05-16 Thread Otis Gospodnetic
/highlighter/ Otis --- Alfred Ostermeier <[EMAIL PROTECTED]> wrote: > Hello, > > the change-log of lucene says: "Also added Term Highlighting to Misc > Section. (carlson)" > > Sounds like native term highlighting or am i wrong? How can i > implement > th

Native Hits or Term Highlighting

2004-05-16 Thread Alfred Ostermeier
Hello, the change-log of lucene says: "Also added Term Highlighting to Misc Section. (carlson)" Sounds like native term highlighting or am i wrong? How can i implement this feature? (Where is the "Misc" section?

Re: "phrase search" AND term

2004-04-27 Thread Ioan Miftode
Thank you Doug, the latest CVS works fine. ioan At 12:23 PM 4/27/2004, you wrote: Ioan Miftode wrote: I recently upgraded to lucene 1.4 RC2 because I needed some sorting capabilities. However some phrase searches don't work anymore (the hits don't even have the term's I'm searching on). Try the

Re: "phrase search" AND term

2004-04-27 Thread Doug Cutting
Ioan Miftode wrote: I recently upgraded to lucene 1.4 RC2 because I needed some sorting capabilities. However some phrase searches don't work anymore (the hits don't even have the term's I'm searching on). Try the latest CVS. There were some bugs in 1.4RC2 that have been fixed. (We'll probably do

Re: "phrase search" AND term

2004-04-27 Thread Erik Hatcher
't even have the term's I'm searching on). They were fine when using 1.3final. I noticed it happens when I combine a phrase search with a simple term like this: field1:"some phrase search" AND field2:term Has anyone experienced anything simila

"phrase search" AND term

2004-04-27 Thread Ioan Miftode
I recently upgraded to lucene 1.4 RC2 because I needed some sorting capabilities. However some phrase searches don't work anymore (the hits don't even have the term's I'm searching on). They were fine when using 1.3final. I noticed it happens when I combine a phrase searc

Index Level Term Frequency

2004-04-06 Thread Alan Smith
Hi, Is there a way to get frequency of a term within the full index. There are a few methods in IndexReader to get the frequency of a term within a document and it seems reasonable that way because the index is per document. My application, however, needs to get the frequency of all terms in a

Re: Dimitry's Term Vector Patch for lucene 1.2

2004-04-06 Thread Hannah c
I am working with Java 1.1, so the earlier version of lucene is more compatible. Hannah From: Otis Gospodnetic <[EMAIL PROTECTED]> Reply-To: "Lucene Users List" <[EMAIL PROTECTED]> To: Lucene Users List <[EMAIL PROTECTED]> Subject: Re: Dimitry's Term Vector Patc

Re: Dimitry's Term Vector Patch for lucene 1.2

2004-04-06 Thread Otis Gospodnetic
Get the latest Lucene version, it includes that patch. Otis --- Hannah c <[EMAIL PROTECTED]> wrote: > Hi, > > I am working with lucene 1.2 and want to try the Term Vector Patch. I > have > not been able to install the diff patch I downloaded from the mail > archives &g

Dimitry's Term Vector Patch for lucene 1.2

2004-04-06 Thread Hannah c
Hi, I am working with lucene 1.2 and want to try the Term Vector Patch. I have not been able to install the diff patch I downloaded from the mail archives as I do not have CVS. Is there anywhere I could get the full source of all the files changed or would someone be able to send me a zip of

Re: phrase search with slop seem to ignore term order

2004-04-02 Thread Ioan Miftode
Answering this myself: I just realized that SpanQuery does everything I need. ioan At 11:17 AM 4/2/2004, you wrote: Hi everybody I'm trying to do some phrase searches with slop > 0. I noticed that if you set the slop to anything higher than 0 the order of the terms does not matter anymore. EG.

phrase search with slop seem to ignore term order

2004-04-02 Thread Ioan Miftode
Hi everybody I'm trying to do some phrase searches with slop > 0. I noticed that if you set the slop to anything higher than 0 the order of the terms does not matter anymore. EG. The field is: The quick brown fox jumps over the lazy dog if I search on "fox brown" with slop = 0 the document is
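An order-sensitive proximity check, which is the behaviour the poster wants, can be sketched in plain Java as follows. This is a toy under stated assumptions, not Lucene's PhraseQuery or SpanQuery implementation: the class OrderedNearSketch, the method orderedNear, and the definition of slop as the number of extra token positions allowed between the first and last matched term are all chosen for illustration.

```java
import java.util.List;

/** Toy in-order proximity matcher; hypothetical helper, not a Lucene query. */
final class OrderedNearSketch {

    /**
     * Returns true if the terms occur in the given order in tokens, with at most
     * slop extra positions between the first and the last matched term.
     */
    static boolean orderedNear(List<String> tokens, List<String> terms, int slop) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            // Greedily take the earliest later occurrence of each following term;
            // for a fixed start this gives the tightest possible in-order match.
            int pos = start;
            boolean found = true;
            for (int i = 1; i < terms.size() && found; i++) {
                int next = -1;
                for (int j = pos + 1; j < tokens.size(); j++) {
                    if (tokens.get(j).equals(terms.get(i))) {
                        next = j;
                        break;
                    }
                }
                if (next < 0) {
                    found = false;
                } else {
                    pos = next;
                }
            }
            if (found && (pos - start) - (terms.size() - 1) <= slop) {
                return true;
            }
        }
        return false;
    }
}
```

On the tokens of "the quick brown fox jumps over the lazy dog", the terms brown, fox match with slop 0, while fox, brown do not match at any slop, because the order is enforced regardless of how much distance is allowed.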

Re: Performance of hit highlighting and finding term positions for

2004-04-01 Thread Kevin A. Burton
cost of JUST StandardTokenizer (no highlighting) StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It tries to keep certain text eg email addresses as one term. I can live without it and I suspect most apps can too. I haven't looked into why it's slow but I notice it

Re: Performance of hit highlighting and finding term positions for

2004-04-01 Thread markharw00d
StandardTokenizer (no highlighting) StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It tries to keep certain text eg email addresses as one term. I can live without it and I suspect most apps can too. I haven't looked into why it's slow but I notice it does make use of Vecto

Re: Performance of hit highlighting and finding term positions for

2004-03-31 Thread Doug Cutting
Kevin A. Burton wrote: Doug Cutting wrote: http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1413989 According to these, if your documents average 16k, then a 10-hit result page would require just 66ms to generate highlights using SimpleAnalyzer. The whole search takes only 300ms... t

Re: RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Kevin A. Burton
hives to get a background on some of the ideas we've tossed around ('Dmitry's Term Vector stuff, plus some' and 'Demoting results' come to mind as threads that touch this topic). It would be nice if CachingRewrittenQueryWrapper.java that I sent to lucene-

Re: Performance of hit highlighting and finding term positions for

2004-03-31 Thread Kevin A. Burton
Doug Cutting wrote: http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1413989 According to these, if your documents average 16k, then a 10-hit result page would require just 66ms to generate highlights using SimpleAnalyzer. The whole search takes only 300ms... this means that if I hig

Re: RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Kevin A. Burton
Rasik Pandey wrote: Kevin, http://home.clara.net/markharwood/lucene/highlight.htm Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency

Re: Performance of hit highlighting and finding term positions for

2004-03-31 Thread Doug Cutting
Doug Cutting wrote: According to these, if your documents average 16k, then a 10-hit result page would require just 66ms to generate highlights using SimpleAnalyzer. Oops. That should be 110ms. Doug

Re: Performance of hit highlighting and finding term positions for

2004-03-31 Thread Doug Cutting
[EMAIL PROTECTED] wrote: As a note of warning: I did find StandardTokenizer to be the major culprit in my tokenizing benchmarks (avg 75ms for 16k sized docs). I have found I can live without StandardTokenizer in my apps. FYI, the message with Mark's timings can be found at: http://nagoya.apache.o

RE: Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Jochen Frey
ady knows the > > frequency and position of given terms in the index. > > Lucene indexes record that a term is the nth term, not that it occurs at > the nth character in the text. The latter is needed for highlighting, > but storing this would make indexes much larger and slower t
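The distinction drawn here, between a term's ordinal position and its character offsets, can be sketched in plain Java. The toy tokenizer below records both for each token, and the highlighter re-tokenizes the text to recover the offsets, which is the approach the thread describes. The class OffsetSketch, its Token type, and the whitespace-only tokenization are assumptions for illustration, not Lucene's Token or Highlighter classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tokenizer that tracks positions and character offsets; not a Lucene class. */
final class OffsetSketch {

    /** A token with its ordinal position and its [start, end) character offsets. */
    static final class Token {
        final String text;
        final int position;
        final int start;
        final int end;

        Token(String text, int position, int start, int end) {
            this.text = text;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace, recording the nth-term position and the offsets. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i), tokens.size(), start, i));
        }
        return tokens;
    }

    /** Re-tokenizes the text and wraps each occurrence of term in <b>...</b>. */
    static String highlight(String text, String term) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokenize(text)) {
            if (t.text.equalsIgnoreCase(term)) {
                out.append(text, last, t.start)
                   .append("<b>").append(text, t.start, t.end).append("</b>");
                last = t.end;
            }
        }
        return out.append(text.substring(last)).toString();
    }
}
```

An index that stores only the position field could report that "brown" is the third term of "the quick brown fox", but not where to insert the markup. The offsets are what make highlight produce "the quick <b>brown</b> fox".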

Re: Performance of hit highlighting and finding term positions for

2004-03-31 Thread markharw00d
>>Folks have benchmarked this, and, for documents less than 10k characters or so, >>re-tokenizing is fast enough. As a note of warning: I did find StandardTokenizer to be the major culprit in my tokenizing benchmarks (avg 75ms for 16k sized docs). I have found I can live without StandardTokenize

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Doug Cutting
lest is to not scan past the first 10k or so for snippets unless nothing relevant is found in the first 10k. I don't think Mark's highlighter yet does this, but I might be mistaken. since lucene already knows the frequency and position of given terms in the index. Lucene indexes recor

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread mark harwood
highlighter already has an abstraction from the list of terms that are needed to be highlighted - see TextHighlighter. The only change I plan here is to introduce the notion of a WeightedTerm that associates a weight with each term to be highlighted in order to influence selection of the best

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Rasik Pandey
Kevin, > http://home.clara.net/markharwood/lucene/highlight.htm > > Trying to do hit highlighting. This implementation uses > another > Analyzer to find the positions for the result terms. > > This seems that it's very inefficient since lucene already > knows the > frequency and position of giv

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Rasik Pandey
to get a background on some of the > ideas we've tossed around ('Dmitry's Term Vector stuff, plus > some' and 'Demoting results' come to > mind as threads that touch this topic). It would be nice if CachingRewrittenQueryWrapper.java that I sent to lucene-

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Bruce Ritchie
eady knows the frequency and position of given terms in the index. My question is whether it's hard to find a TermPosition for a given term in a given document rather than the whole index. IndexReader.termPositions( Term term ) is term specific not term and document specific. As far as

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Kevin A. Burton
Erik Hatcher wrote: On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote: Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency and position of given term

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Stephane James Vaucher
terms. > > This seems that it's very inefficient since lucene already knows the > frequency and position of given terms in the index. > > My question is whether it's hard to find a TermPosition for a given term > in a given document rather than the whole index.

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Erik Hatcher
On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote: Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency and position of given terms in the index. What i

Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Kevin A. Burton
quency and position of given terms in the index. My question is whether it's hard to find a TermPosition for a given term in a given document rather than the whole index. IndexReader.termPositions( Term term ) is term specific not term and document specific. Also it seems that after all this t

Re: Term Vector support

2004-03-02 Thread Grant Ingersoll
>>> [EMAIL PROTECTED] 02/27/04 12:09PM >>> Hi folks, I'm trying to get a better understanding of term vector support. Looking at lucene-dev I'm understanding that with each document you store the list of terms and their frequencies. Is this correct? What uses are ther

Term Vector support

2004-02-27 Thread Dror Matalon
Hi folks, I'm trying to get a better understanding of term vector support. Looking at lucene-dev I'm understanding that with each document you store the list of terms and their frequencies. Is this correct? What uses are there for term vector other than "more like this"? On

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 4:21 PM, Terry Steichen wrote: PS: Is this in the docs? If not, maybe it should be mentioned. Depends on what you consider "the docs". I looked at QueryParser.jj to see what it parses. Also, on it has an example

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
nesday, January 21, 2004 2:04 PM Subject: Re: Query Term Questions > On Jan 21, 2004, at 1:07 PM, Terry Steichen wrote: > > Unfortunately, using positive boost factors less than 1 causes the > > parser to > > barf the same as do negative boost factors. > > Are you sur

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 1:07 PM, Terry Steichen wrote: Unfortunately, using positive boost factors less than 1 causes the parser to barf the same as do negative boost factors. Are you sure about that? Works for me. QueryParser just isn't set up to deal with a minus sign, but "term^0.5"

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
: Wednesday, January 21, 2004 10:54 AM Subject: Re: Query Term Questions > Erik Hatcher writes: > > > > > > TS==>I've not been able to get negative boosting to work at all. Maybe > > > there's a problem with my syntax. > > > If, for example, I do

Re: Query Term Questions

2004-01-21 Thread Morus Walter
" followed by numeric > characters). So QueryParser has the problem with negative boosts, but > not Query itself. He said he wants to have one term less important than others (at least that's what I understood). That's done by positive boost factors smaller than 1.0 (e.g. 0.5 or

Re: Query Term Questions

2004-01-21 Thread Doug Cutting
Terry Steichen wrote: 1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not boost the re

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote: But doesn't the query itself take this into account? If there are multiple matching terms then the overlap (coord) factor kicks in. TS==>Except that I'd like to be able to choose to do this on a query-by-query basis. In other words, it's desirab

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
OTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 21, 2004 9:31 AM Subject: Re: Query Term Questions > On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote: > > 1) Is there a way to set the query boost factor depending not on the > >

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote: 1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "cl

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 7:44 AM, Terry Steichen wrote: By the silence, I gather that the answers to my questions are "no", "no" and "no". Silence should not be interpreted this way. Perhaps folks don't know, or are busy, or any number of possibilities. I'll reply to your original message in a sec.

Re: Query Term Questions

2004-01-21 Thread Morus Walter
want to reduce > the relevance of a matching document that also included the term "iowa". ( > The idea is for an easier and more discriminating way than simply increasing > the relevance of all other terms besides "iowa"). >

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
By the silence, I gather that the answers to my questions are "no", "no" and "no". Regards, Terry - Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users Group" <[EMAIL PROTECTED]> Sent: Tuesday, J

Query Term Questions

2004-01-20 Thread Terry Steichen
1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not boost the relevance of documen

Re: Term weighting and Term boost

2004-01-16 Thread Andrzej Bialecki
Karl Koch wrote: Hello Andrzej, sorry. I mistakenly ran it under Java 1.2.2 which cannot work :-) Then you get Thread Exceptions... Anyway, solved now. Thank you, Karl Thanks for the report - it's my bad, too, because the JNLP file mistakenly says . I'll correct it. -- Best regards, Andrzej Bial

Re: Term weighting and Term boost

2004-01-16 Thread Karl Koch
Hello Andrzej, sorry. I mistakenly ran it under Java 1.2.2 which cannot work :-) Then you get Thread Exceptions... Anyway, solved now. Thank you, Karl > Karl Koch wrote: > > > Hello and thank you for this link. I think this is a very useful tool > to > > analyse Lucene internals. > > > > > >

Re: Term weighting and Term boost

2004-01-16 Thread Andrzej Bialecki
Karl Koch wrote: Hello and thank you for this link. I think this is a very useful tool to analyse Lucene internals. I realize this is not exactly the answer, but you may want to try one of the new features of Luke (http://www.getopt.org/luke), namely the query result explanation. When I star

Re: Term weighting and Term boost

2004-01-16 Thread Morus Walter
Karl Koch writes: > > If this is not the case, how is the term weight of the query calculated > then? Formula? Are there parts in it which I cannot influence? Does this formula > depend on the type of Query or is it independent? Maybe somebody can provide > a small code example?

Re: Term weighting and Term boost

2004-01-16 Thread Karl Koch
Hello and thank you for this link. I think this is a very useful tool to analyse Lucene internals. > I realize this is not exactly the answer, but you may want to try one of > the new features of Luke (http://www.getopt.org/luke), namely the query > result explanation. When I start it accordi

Re: Term weighting and Term boost

2004-01-16 Thread Andrzej Bialecki
Karl Koch wrote: Hello all, I am new to the Lucene scene and have a few questions regarding the term boost philosophy: Is the term boost equal to a term weight? Example: If I boost a term with 0.2 does this mean the term has a weight of 0.2 then? If this is not the case, how is the term weight
