Re: Relative cpu cost of fetching term frequency during scoring

2023-06-26 Thread Adrien Grand
ast 5x compared to old code. > Is there any thoughts on why term frequency calls on PostingsEnum are that > slow ? > > > > *Thanks and Regards,* > *Vimal Jain* > > > On Wed, Jun 21, 2023 at 1:43 PM Adrien Grand wrote: > > > As far as your performance problem i

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Vimal Jain
an you please provide more details on what do you mean by dynamic > > > pruning > > > > in context of custom term query ? > > > > > > > > On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, > wrote: > > > > > > > > > Int

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Adrien Grand
> > pruning > > > in context of custom term query ? > > > > > > On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, wrote: > > > > > > > Intuitively replacing a disjunction across multiple fields with a > > single > > > > term query should

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
20 Jun, 2023, 9:45 pm Adrien Grand, wrote: > > > > > Intuitively replacing a disjunction across multiple fields with a > single > > > term query should always be faster. > > > > > > You're saying that you're storing the type of token as part of th

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
le > > term query should always be faster. > > > > You're saying that you're storing the type of token as part of the term > > frequency. This doesn't sound like something that would play well with > > dynamic pruning, so I wonder if this is the reason why you

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
: > Intuitively replacing a disjunction across multiple fields with a single > term query should always be faster. > > You're saying that you're storing the type of token as part of the term > frequency. This doesn't sound like something that would play well with >

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
Intuitively replacing a disjunction across multiple fields with a single term query should always be faster. You're saying that you're storing the type of token as part of the term frequency. This doesn't sound like something that would play well with dynamic pruning, so I wonder

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
instead of creating multiple term queries , we create only 1 term query for the merged field and the scorer of this term query ( on merged field ) makes use of custom term frequency info to deduce type of token ( during indexing we store this info ) and hence the score that we were using earlier. So

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
i, > > I want to understand if fetching the term frequency of a term during > > scoring is relatively cpu bound operation ? > > Context - I am storing custom term frequency during indexing and later > > using it for scoring during query execution time ( in Scorer'

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-19 Thread Vimal Jain
Note - i am using lucene 7.7.3 *Thanks and Regards,* *Vimal Jain* On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain wrote: > Hi, > I want to understand if fetching the term frequency of a term during > scoring is relatively cpu bound operation ? > Context - I am storing custom term freq

Relative cpu cost of fetching term frequency during scoring

2023-06-19 Thread Vimal Jain
Hi, I want to understand if fetching the term frequency of a term during scoring is relatively cpu bound operation ? Context - I am storing custom term frequency during indexing and later using it for scoring during query execution time ( in Scorer's score() method ). I noticed a performance

Re: term frequency in solr

2017-01-05 Thread Ahmet Arslan
et Arslan wrote: > Hi, > > I think you are missing the main query parameter? q=*:* > > By the way you may get more response in the solr-user mailing list. > > Ahmet > > > On Wednesday, January 4, 2017 4:59 PM, huda barakat < > eng.huda.bara...@gmail.

Re: term frequency in solr

2017-01-05 Thread huda barakat
uary 4, 2017 4:59 PM, huda barakat < > eng.huda.bara...@gmail.com> wrote: > Please help me with this: > > > I have this code which return term frequency from techproducts example: > > ///

Re: term frequency in solr

2017-01-05 Thread Ahmet Arslan
Hi, I think you are missing the main query parameter? q=*:* By the way, you may get more responses on the solr-user mailing list. Ahmet On Wednesday, January 4, 2017 4:59 PM, huda barakat wrote: Please help me with this: I have this code which return term frequency from techproducts example

term frequency in solr

2017-01-04 Thread huda barakat
Please help me with this: I have this code which return term frequency from techproducts example: / import java.util.List; import org.apache.solr.client.solrj.SolrClient; import

Re: Altering Term Frequency in Similarity

2016-12-15 Thread Robert Muir
same class. > > Now, I want to handle the term frequency. As far as I can tell, raw TF is > given to the similarity class by score(int doc, float freq). Which class > does provide that freq? Or what can I change to provide a different freq > value, practically changing the document re

Altering Term Frequency in Similarity

2016-12-14 Thread Mossaab Bagdouri
Hi, I'm using Lucene 6.3.0, and trying to handle synonyms at query time. I think I've handled DF correctly with BlendedTermQuery (by returning the max DF of the synonyms). TTF is also handled by the same class. Now, I want to handle the term frequency. As far as I can tell, raw TF i

Re: term frequency

2016-11-28 Thread huda barakat
ery query = new SolrQuery(); query.setQuery("*:*"); SolrRequest req = new QueryRequest(query); QueryResponse rsp = req.process(solr); System.out.println("numFound: " + rsp.getResults().getNumFound()); I get results but the problem I want to get term frequen

Re: term frequency

2016-11-24 Thread Jason Wee
the exception line does not match the code you pasted, but do make sure your object actually not null before accessing its method. On Thu, Nov 24, 2016 at 5:42 PM, huda barakat wrote: > I'm using SOLRJ to find term frequency for each term in a field, I wrote > this code but it is

term frequency

2016-11-24 Thread huda barakat
I'm using SOLRJ to find term frequency for each term in a field, I wrote this code but it is not working: 1. String urlString = "http://localhost:8983/solr/huda";; 2. SolrClient solr = new HttpSolrClient.Builder(urlString).build(); 3. 4. SolrQu

Re: Calculate Term Frequency

2014-08-22 Thread Bianca Pereira
ermFreqValueSource... > > Maybe not helpful at all, but... > Erick > > On Tue, Aug 19, 2014 at 7:04 AM, Bianca Pereira > wrote: > > Hi everybody, >> >> I would like to know your suggestions to calculate Term Frequency > in a >

Re: Calculate Term Frequency

2014-08-19 Thread Tri Cao
     > Hi everybody,        >        > I would like to know your suggestions to calculate Term Frequency in a        > Lucene document. Currently I am using MultiFields.getTermDocsEnum,        > iterating through the DocsEnum 'de' returned and getting the frequency wit

Re: Calculate Term Frequency

2014-08-19 Thread Michael Sokolov
like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency with de.freq() for the desired document. My solution gives me the result I want

Re: Calculate Term Frequency

2014-08-19 Thread Erick Erickson
rick On Tue, Aug 19, 2014 at 7:04 AM, Bianca Pereira wrote: > Hi everybody, > > I would like to know your suggestions to calculate Term Frequency in a > Lucene document. Currently I am using MultiFields.getTermDocsEnum, > iterating through the DocsEnum 'de' returned and g

Calculate Term Frequency

2014-08-19 Thread Bianca Pereira
Hi everybody, I would like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency with de.freq() for the desired document. My solution gi

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
he aalyzer > yourself. The stemming is very likely the culprit here. > > -- Jack Krupansky > > -Original Message- From: Uwe Schindler > Sent: Thursday, August 7, 2014 9:00 AM > To: java-user@lucene.apache.org > Subject: RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
rsday, August 7, 2014 9:00 AM To: java-user@lucene.apache.org Subject: RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi, if you create the term yourself, it is not going through the analyzer: public int getTermFrequency(String term, String id) (you create a BytesRef out of it).

RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Uwe Schindler
thout also stemming the term before you StandardAnalyzer does not do stemming, so terms (mostly) stay as they are. But even for this analyzer, you theoretically have to pass the term through the analyzer before you can do a term frequency lookup. Just consider the case where the term was not lowercased, in

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
-- Jack Krupansky > > -Original Message- From: Bianca Pereira > Sent: Thursday, August 7, 2014 7:28 AM > To: java-user@lucene.apache.org > Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency > > > Hi, > > I am new in the list and I have b

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
need to manually filter your query terms. Sounds like maybe a term got stemmed. -- Jack Krupansky -Original Message- From: Bianca Pereira Sent: Thursday, August 7, 2014 7:28 AM To: java-user@lucene.apache.org Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi

EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Hi, I am new in the list and I have been working on a problem for some time already. I would like to know if someone has any idea of how I can solve it. Given a term, I want to get the term frequency in a lucene document. When I use the WhiteSpaceAnalyzer my code works properly but when I use

Re: Building term frequency matrix over 6 million documents...

2014-01-24 Thread Marcio Napoli
ve over 6 million documents in our index, and would like to construct > a term frequency matrix over all 6 million documents as quickly as > possible. Each document has a numeric date field, so we would like to > build a time series which contains values which are the sum of all > freq

Building term frequency matrix over 6 million documents...

2014-01-24 Thread Witdouck, Xavier
Hi all, We have over 6 million documents in our index, and would like to construct a term frequency matrix over all 6 million documents as quickly as possible. Each document has a numeric date field, so we would like to build a time series which contains values which are the sum of all

supply term frequency directly

2013-07-02 Thread Michael Sokolov
Is there a way to add a document to the index by supplying terms and term frequencies directly, rather than via Analysis and/or TokenStream? I ask because I want to model some data where I know the term frequencies, but there is no underlying text document to be analyzed. I could create one b

How is the term frequency calculated if I have to add a user-generated document.

2013-04-18 Thread Gaurav Ranjan
I am a student and studying the functionality of Lucene for my project work. If I have to add a new user-generated document in lucene with a term having a particular frequency just like any text file, how do I do it? For eg, say I have to add the following documents analyzed from an image doc1 =

Re: Indexing Term Frequency Vectors

2013-04-09 Thread Adrien Grand
Hi, On Tue, Apr 9, 2013 at 5:24 PM, Sharon Tam wrote: > I tried following following this payloads tutorial to attach the term > frequencies as payloads: > http://searchhub.org/2009/08/05/getting-started-with-payloads/ > > But I'm confused as to where I need to override the te

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 4:10 PM, Sharon W Tam wrote: > Are there any other ideas? Since scoring seems to be what you are interested in, you could have a look to payloads: there can store arbitrary data and can be used to score matches. -- Adrien

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Sharon W Tam
ounts for a > > term by counting how many times the term appears in a particular > document. > > Instead of having Lucene do the counting, I want to do my own counting > and > > feed a term-frequency vector representation of a document directly into > the > > in

Re: Indexing Term Frequency Vectors

2013-03-28 Thread Adrien Grand
Hi, On Thu, Mar 28, 2013 at 8:25 PM, Sharon Tam wrote: > I believe that when Lucene indexes documents, it generates counts for a > term by counting how many times the term appears in a particular document. > Instead of having Lucene do the counting, I want to do my own counting and >

Indexing Term Frequency Vectors

2013-03-28 Thread Sharon Tam
I believe that when Lucene indexes documents, it generates counts for a term by counting how many times the term appears in a particular document. Instead of having Lucene do the counting, I want to do my own counting and feed a term-frequency vector representation of a document directly into the

Re: Querying with Term Frequency Vectors

2013-03-04 Thread lukai
Store the term value as payload, and score with it. On Mon, Mar 4, 2013 at 10:10 AM, Sharon Tam wrote: > Hi, > > I have generated my own term-frequency vector representations of documents > and would like to be able to query these with term-frequency vector queries > instead o

Querying with Term Frequency Vectors

2013-03-04 Thread Sharon Tam
Hi, I have generated my own term-frequency vector representations of documents and would like to be able to query these with term-frequency vector queries instead of a text-string query. Is there anyway to bypass the Lucene preprocessing that occurs in the indexing of documents and queryparsing

Re: filter by term frequency

2012-06-17 Thread Mike Sokolov
inal Message- From: Mike Sokolov Sent: Saturday, June 16, 2012 2:33 PM To: java-user@lucene.apache.org Subject: filter by term frequency I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whe

Re: filter by term frequency

2012-06-16 Thread Jack Krupansky
2012 2:33 PM To: java-user@lucene.apache.org Subject: filter by term frequency I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whether there is some type of Lucene query that filters by term fr

filter by term frequency

2012-06-16 Thread Mike Sokolov
I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whether there is some type of Lucene query that filters by term frequency. For example, suppose I want to find all documents that have

Change IndexFiles to record term frequency as well?

2011-11-09 Thread Daniel Quach
I am currently using Lucene to index a dump of Wikipedia. I'm using the demo's IndexFiles function for the most part, but I also want to store the term frequency of a document in the index as well, is this possible? Right now, the index just stores the (term -> document pathna

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
Thanks, this is helpful. Is the effect (in ranking) going to be the same as passing multiple terms? I will definitely try it out. On Tue, Oct 25, 2011 at 3:21 PM, Rene Hackl-Sommer wrote: > Use term boosts? "solr^3 rocks^2 apache" > > http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Bo

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Rene Hackl-Sommer
Use term boosts? "solr^3 rocks^2 apache" http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Boosting%20a%20Term Am 25.10.2011 11:19, schrieb prasenjit mukherjee: During search time I get the following input ( only for 1 field ) = "solr:3 rocks:2 apache:1" . For this I have to create the
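
For illustration only, the rewrite from the "term:count" input quoted above into the boosted query string Rene suggests could be sketched in plain Java as below. The class and method names are hypothetical and nothing here is a Lucene API; the sketch assumes whitespace-separated pairs with integer counts and does no query-syntax escaping. Whether boosts affect ranking the same way as repeated terms is the open question in this thread, and the sketch does not settle it.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of Lucene): turns "solr:3 rocks:2 apache:1"
// into the classic query parser boost syntax "solr^3 rocks^2 apache".
class BoostQueryBuilder {
    static String toBoostQuery(String input) {
        StringJoiner query = new StringJoiner(" ");
        for (String pair : input.trim().split("\\s+")) {
            if (pair.isEmpty()) continue;          // empty input
            int sep = pair.lastIndexOf(':');
            if (sep < 0) {                         // no count given: keep the bare term
                query.add(pair);
                continue;
            }
            String term = pair.substring(0, sep);
            int weight = Integer.parseInt(pair.substring(sep + 1));
            // A weight of 1 is the default boost, so it is left implicit.
            query.add(weight == 1 ? term : term + "^" + weight);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBoostQuery("solr:3 rocks:2 apache:1"));
    }
}
```

Terms containing query-parser special characters would need escaping before being passed to a real parser.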

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
t search time. > > hu? I don't understand, if you provide the terms at indexing time > lucene keeps track of the term frequency etc. why would you want to do > this at search time? During search time I get the following input ( only for 1 field ) = "solr:3 rocks:2 apache:1" . Fo

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Simon Willnauer
ucene keeps track of the term frequency etc. why would you want to do this at search time? simon > > On Mon, Oct 24, 2011 at 1:05 PM, Simon Willnauer > wrote: >> so you are saying you got (uniqueTerm, freq) tuples and you want to >> make lucene use this directly? I think t

Re: reusing the term-frequency count while indexing

2011-10-24 Thread prasenjit mukherjee
e this directly? I think the easiest way is to write a > simple tokenFilter that emit the term X times where X is the term > frequency. There is no easy way to pass these tuples to lucene > directly. > > simon > > On Mon, Oct 24, 2011 at 3:28 AM, prasenjit mukherjee > wrote: >

Re: reusing the term-frequency count while indexing

2011-10-24 Thread Simon Willnauer
so you are saying you got (uniqueTerm, freq) tuples and you want to make lucene use this directly? I think the easiest way is to write a simple tokenFilter that emit the term X times where X is the term frequency. There is no easy way to pass these tuples to lucene directly. simon On Mon, Oct 24
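
A minimal sketch of the idea in this reply (emit each term as many times as its frequency), written in plain Java so it runs without Lucene. It assumes the tuples arrive as "term:count" text, as in the example elsewhere in this thread; the class and method names are hypothetical, and a real solution would put the same loop inside a custom TokenFilter rather than a standalone helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): expands (term, freq) tuples into
// a flat token list, so that ordinary indexing would count each term freq times.
class TermExpander {
    // Input format assumed for illustration: "solr:3 rocks:2 apache:1".
    static List<String> expand(String input) {
        List<String> tokens = new ArrayList<>();
        for (String pair : input.trim().split("\\s+")) {
            if (pair.isEmpty()) continue;          // empty input
            int sep = pair.lastIndexOf(':');
            String term = sep < 0 ? pair : pair.substring(0, sep);
            // A missing count is treated as a frequency of 1.
            int freq = sep < 0 ? 1 : Integer.parseInt(pair.substring(sep + 1));
            for (int i = 0; i < freq; i++) {
                tokens.add(term);                  // emit the term freq times
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", expand("solr:3 rocks:2 apache:1")));
    }
}
```

The cost of this approach grows with the sum of the frequencies, which is one reason it is a workaround rather than a way to pass the tuples to the index directly.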

Re: reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
Can you tell me how I can feed the Lucene index using the term frequency directly? Actually, I am getting the documents along with their term frequencies and don't want to write any additional code to expand them. On 10/23/11, ppp c wrote: > Of course, it can be reused. > But from

Re: reusing the term-frequency count while indexing

2011-10-23 Thread ppp c
Of course, it can be reused. But from my point of view, it's meaningless, since the analysis process has to be performed to collect data such as prox, offsets, synonyms, payloads and so on. On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee wrote: > I already have the term-frequency-count for

reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
I already have the term-frequency-count for all the terms in a document. Is there a way I can re-use that info while indexing. I would like to use solr for this. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org

Re: term frequency on a particular query

2011-06-07 Thread Ian Lea
ord: great } > doc { question 1, response: 2, word: bad } > doc { question 1, response: 2, word: excellent} > doc { question 2, response: 1, word: car} > doc { question 2, response: 2, word: bike} > etc. > > I would like to get the word which is the most used for question 1. >

term frequency on a particular query

2011-06-07 Thread G.Long
, response: 2, word: bike} etc. I would like to get the word which is the most used for question 1. I learned something about term frequency but all the code samples I found on the internet deals about the entire index (with indexReader.terms). Any idea ? Thank you

Re: Applying term frequency thresholds on indexing time

2010-05-25 Thread Michael McCandless
quencies of the terms > before they are stored, but i guess there could be some way to work it > around??? > > All hellp appreciated! > > Thank you! > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Applying-term-frequency-thresholds-on-indexing-time

Re: Applying term frequency thresholds on indexing time

2010-05-24 Thread Erick Erickson
??? > > All hellp appreciated! > > Thank you! > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > >

Applying term frequency thresholds on indexing time

2010-05-24 Thread Xaida
.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional

Re: Term Frequency for phrases

2010-01-08 Thread Erick Erickson
you detect that they are phrases? During indexing or during > > search? > > > > On Jan 8, 2010, at 5:16 AM, hrishim wrote: > > > >> > >> Hi . > >> I have phrases like brain natriuretic peptide indexed as a single token > >> using

Re: Term Frequency for phrases

2010-01-08 Thread Jason Rutherglen
); > double tf = termDocs.freq(); > > Regards, > Hrishi > > > Grant Ingersoll-6 wrote: >> >> When do you detect that they are phrases?  During indexing or during >> search? >> >> On Jan 8, 2010, at 5:16 AM, hrishim wrote: >> >>>

Re: Term Frequency for phrases

2010-01-08 Thread hrishim
t; > On Jan 8, 2010, at 5:16 AM, hrishim wrote: > >> >> Hi . >> I have phrases like brain natriuretic peptide indexed as a single token >> using Lucene. >> When I calculate the term frequency for the same the count is 0 since >> the >> tokens from th

Re: Term Frequency for phrases

2010-01-08 Thread Grant Ingersoll
When do you detect that they are phrases? During indexing or during search? On Jan 8, 2010, at 5:16 AM, hrishim wrote: > > Hi . > I have phrases like brain natriuretic peptide indexed as a single token > using Lucene. > When I calculate the term frequency for the same the count

Re: Term Frequency for phrases

2010-01-08 Thread Erick Erickson
Fri, Jan 8, 2010 at 5:16 AM, hrishim wrote: > > Hi . > I have phrases like brain natriuretic peptide indexed as a single token > using Lucene. > When I calculate the term frequency for the same the count is 0 since the > tokens from the text are indexed separately i.e. brain , n

Re: Term Frequency for phrases

2010-01-08 Thread Michael McCandless
a single token > using Lucene. > When I calculate the term frequency for the same  the count is 0 since the > tokens from the text are indexed separately i.e. brain , natriuretic , > peptide. > Is there a way to solve this problem and get the term frequency for the > entire phr

Term Frequency for phrases

2010-01-08 Thread hrishim
Hi . I have phrases like brain natriuretic peptide indexed as a single token using Lucene. When I calculate the term frequency for the same the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Is there a way to solve this problem and get the

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Thomas D'Silva
ed by >> running a query using a TermVectorMapper. >> I was wondering if anyone knew if there was a faster way to do this rather >> than using a HashMap with a TermVectorMapper to store the counts of the >> terms and calling getTermFreqVector(). >> I do not require the term freq

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Karl Wettin
ather than using a HashMap with a TermVectorMapper to store the counts of the terms and calling getTermFreqVector(). I do not require the term frequency within a document. I think that is as fast as its going to get unless you have some other restrictions that would allow you to use a Field

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-14 Thread Grant Ingersoll
apper to store the counts of the terms and calling getTermFreqVector(). I do not require the term frequency within a document. I think that is as fast as its going to get unless you have some other restrictions that would allow you to use a FieldCache.Can you describe the bigger proble

Using TermVectorMapper to compute term frequency across documents

2009-10-12 Thread Thomas D'Silva
getTermFreqVector(). I do not require the term frequency within a document. Thanks, Thomas HashMap termDocCount = new HashMap(); TermQuery tagQuery = new TermQuery(tagTerm); TopDocs docs = searcher.search(tagQuery, numDocs); for (int i=0 ; i public void map(String term, int frequency

Re: Term Frequency vector consumes memory

2009-07-02 Thread Grant Ingersoll
ant Ingersoll" To: Sent: Tuesday, June 30, 2009 9:48 PM Subject: Re: Term Frequency vector consumes memory In Lucene, a Term Vector is a specific thing that is stored on disk when creating a Document and Field. It is optional and off by default. It is separate from being able to get th

Re: Term Frequency vector consumes memory

2009-06-30 Thread Ganesh
er to load term vector. I want to switch off this feature? Is that possible without re-indexing? Regards Ganesh - Original Message - From: "Grant Ingersoll" To: Sent: Tuesday, June 30, 2009 9:48 PM Subject: Re: Term Frequency vector consumes memory > In Lucene, a Term Ve

Re: Term Frequency vector consumes memory

2009-06-30 Thread Grant Ingersoll
I am not clear on your question. Cheers, Grant On Jun 30, 2009, at 3:37 AM, Ganesh wrote: At the end of the day, I used to build the stats of top indexed terms. I enabled term frequency for the single field. It is working fine. I could able to get the top terms and its frequencies. It con

Term Frequency vector consumes memory

2009-06-30 Thread Ganesh
At the end of the day, I build the stats of the top indexed terms. I enabled term frequency for a single field. It is working fine: I am able to get the top terms and their frequencies. However, it consumes a huge amount of RAM. My index size is 5 GB and has 8 million records. If I didn't e

Re: term frequency normalization

2009-02-12 Thread Chris Hostetter
: The easiest way to change the tf calculation would be overwriting : tf in an own implementation of Similarity like it's done in : SweetSpotSimilarity. But the average term frequency of the : document is missing. Is there a simple way to get or calc this : number? there was quite a b

term frequency normalization

2009-02-03 Thread Jochen Wersdörfer
Hi, I'd like to use the term frequency normalization described in http://wiki.apache.org/lucene-java/TREC%202007%20Million%20Queries%20Track%20-%20IBM%20Haifa%20Team so that the term frequency tf becomes tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)) The easiest way to change t
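
As a rough worked example of the formula in this message, the computation can be written in plain Java as follows. This is only a sketch: the class is hypothetical and is not Lucene's Similarity API, and it assumes avgFreq(d) means the total number of term occurrences in the document divided by the number of distinct terms, which the snippet itself does not spell out.

```java
// Hypothetical helper (not Lucene's Similarity API) for the normalization
// log(1 + freq) / log(1 + avgFreq) quoted in the message.
class NormalizedTf {
    // freq: occurrences of the term in the document.
    // avgFreq: average term frequency within the same document.
    static double tf(double freq, double avgFreq) {
        if (freq <= 0 || avgFreq <= 0) return 0.0; // avoid dividing by log(1) = 0
        return Math.log(1 + freq) / Math.log(1 + avgFreq);
    }

    // Assumed definition: total occurrences divided by number of distinct terms.
    static double avgFreq(int[] termFreqs) {
        if (termFreqs.length == 0) return 0.0;
        long total = 0;
        for (int f : termFreqs) total += f;
        return (double) total / termFreqs.length;
    }

    public static void main(String[] args) {
        int[] freqs = {3, 2, 1};            // three distinct terms in one document
        double avg = avgFreq(freqs);        // (3 + 2 + 1) / 3
        for (int f : freqs) {
            System.out.println(f + " -> " + tf(f, avg));
        }
    }
}
```

A term whose frequency equals the document average gets a normalized value of 1, and the open question in the message remains how to make avgFreq(d) available at scoring time.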

Re: Term Frequency and IndexSearcher

2009-01-16 Thread Chris Hostetter
: References: : : <1998.130.159.185.12.1232021837.squir...@webmail.cis.strath.ac.uk> : Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST) : Subject: Term Frequency and IndexSearcher http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting

Re: Term Frequency and IndexSearcher

2009-01-15 Thread Murat Yakici
Hi Paul, I am tempted to suggest the following ( I am assuming here that the document and the particular fields are TFVed when indexing): For every doc in the result set: - get the doc id - using the doc id, get the TermFreqVector of this document from the index reader (tfv=ireader.getTermFr

Term Frequency and IndexSearcher

2009-01-15 Thread Paul Lynch
Hi, I know it is very easy to get the frequency of a given term using the indexReader but I am looking to perform an index search and would like to get the frequency of the given term in the result set. Is this possible? Thanks in advance, Paul

RE: Term Frequency for more complex terms

2008-07-03 Thread John Griffin
ok for explain in the API docs by clicking on Index at the top of the docs. They're all there. -Original Message- From: Matthew Hall [mailto:[EMAIL PROTECTED] Sent: Thursday, July 03, 2008 10:20 AM To: lucene Subject: Term Frequency for more complex terms I have a quick question,

Term Frequency for more complex terms

2008-07-03 Thread Matthew Hall
I have a quick question, could someone point me towards where in the API I'll have to investigate in order to figure out the term frequencies of more complex terms? For example I want to know the tf of "kit ligand" treated as a phrase. I see that luke has access to this information in its exp

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Grant Ingersoll
I know you have a solution already that I agree with, but I do think the DisjunctionMaxQuery could serve as the start for writing your own Query that did what you want. Why would you want to? Well, maybe you have other ways you want to search as well and don't want to mess with custom Sim

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Yonik Seeley
On 5/25/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote: In reading the math for scoring at the bottom of: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html It appears that if I can make tf() and idf(), term frequency and i

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Walt Stoneburner
In reading the math for scoring at the bottom of: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html It appears that if I can make tf() and idf(), term frequency and inverse document frequency respectively, both return 1, then coord

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Walt Stoneburner
Grant writes: Have a look at the DisjunctionMaxQuery, I think it might help, although I am not sure it will fully cover your case. The definition for DisjunctionMaxQuery is provided at this URL: http://incubator.apache.org/lucene.net/docs/2.1/Lucene.Net.Search.DisjunctionMaxQuery.html, Grossly

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-24 Thread Grant Ingersoll
umber of unique search terms that are hit, rather than term frequency counts. A quick example. If I'm searching for "BIRD CAT DOG" (all should clauses), then I want ...a document with BIRD, CAT, and DOG terms, each only appearing once, in it to score higher than ...a document

Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-24 Thread Walt Stoneburner
Hi, I'm trying to figure what I need to do with Lucene to score a document higher when it has a larger number of unique search terms that are hit, rather than term frequency counts. A quick example. If I'm searching for "BIRD CAT DOG" (all should clauses), then I want

Term Frequency for Partial Index

2007-05-15 Thread Saravana
field values. We are forming a RangeQuery for time and normal query for other field values. Now I am able to find Term Frequency per index i.e for the whole 24 hours. But I want to find the Term Frequency for 1 hour i.e between 01:00:00 to 02:00:00. Will it be possible? Is there any API to find

Re: term frequency calculation in Lucene

2007-04-30 Thread karl wettin
29 apr 2007 kl. 18.33 skrev saikrishna venkata pendyala: Where does the lucene compute term frequency vector ? {filename,function name} DocumentWriter.java private final void invertDocument(Document doc) Actually the task is to replace the all term frequencies with some constant number

Re : term frequency calculation in Lucene

2007-04-29 Thread saikrishna venkata pendyala
Hi, where does Lucene compute the term frequency vector (file name, function name)? The task is actually to replace all the term frequencies with some constant integer; how can this be done? Any kind of help is appreciated. Thanks in advance.

How to get term frequency of multi terms and TimeRange?

2007-04-24 Thread SK R
Hi, How to get term frequency of multi terms in particular document? Any API method other than using TermVector may help? Also How to calculate termfreq. of time range. i.e : If my index have a field "TIME" with values in millis (like 1176281188000)., and I want to calculate ter

Re: Term frequency

2007-04-12 Thread Doron Cohen
karl wettin <[EMAIL PROTECTED]> wrote on 12/04/2007 00:25:47: > > 12 apr 2007 kl. 09.12 skrev sai hariharan: > > > Thanx for replying. In my scenario i'm not going to index any of my > > docs. > > So is there a way to find out term frequencies of the terms in a doc > > without doing the indexing p

Re: Term frequency

2007-04-12 Thread karl wettin
12 apr 2007 kl. 09.12 skrev sai hariharan: Thanx for replying. In my scenario i'm not going to index any of my docs. So is there a way to find out term frequencies of the terms in a doc without doing the indexing part? Using an analyzer (Tokenstream) and a Map? while ((t = ts.next)!=null)
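
The "tokenize and count in a Map" approach suggested here can be sketched without an index at all. In the plain-Java sketch below, lowercasing and splitting on non-alphanumeric characters is only a stand-in for a real Lucene analyzer, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts term frequencies for a single text without
// building an index. The tokenization is a crude stand-in for an analyzer.
class TermCounter {
    static Map<String, Integer> count(String text) {
        // LinkedHashMap keeps terms in order of first occurrence.
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue;        // leading separator or empty text
            freqs.merge(token, 1, Integer::sum);  // increment the count for this term
        }
        return freqs;
    }

    public static void main(String[] args) {
        System.out.println(count("The cat and the dog"));
    }
}
```

With a real analyzer the loop body stays the same; only the source of the tokens changes.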

Re: Term frequency

2007-04-12 Thread sai hariharan
Hi, Thanks for replying. In my scenario I'm not going to index any of my docs. So is there a way to find out the term frequencies of the terms in a doc without doing the indexing part? Thanks in advance, Hari On 4/12/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Add Term Vectors to your Field durin

Re: Term frequency

2007-04-11 Thread Grant Ingersoll
Add Term Vectors to your Field during indexing. See the Field constructors. To get a Term Vector out, see IndexReader.getTermFreqVector method. -Grant On Apr 11, 2007, at 3:23 PM, sai hariharan wrote: Hi, I've just started using Lucene. Can anybody assist me in calculating the term frequ

Term frequency

2007-04-11 Thread sai hariharan
Hi, I've just started using Lucene. Can anybody assist me in calculating the term frequencies of the terms (words) that occur in a document (*.txt) when a particular doc is submitted? Say, when I submit sample.txt, I should first analyze the document with a standard analyzer, then the term frequenc

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-11 Thread Grant Ingersoll
On Apr 11, 2007, at 9:07 AM, karl wettin wrote: 11 apr 2007 kl. 04.21 skrev Grant Ingersoll: Would some sort of caching strategy work? How big is your overall collection? Also, lately there have been a few threads on TV (term vector) performance. I don't recall anyone having actively

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-11 Thread karl wettin
11 apr 2007 kl. 04.21 skrev Grant Ingersoll: Would some sort of caching strategy work? How big is your overall collection? Also, lately there have been a few threads on TV (term vector) performance. I don't recall anyone having actively profiled or examined it for improvements, so perh
