Re: Relative cpu cost of fetching term frequency during scoring

2023-06-26 Thread Adrien Grand
least 5x compared to old code. > Is there any thoughts on why term frequency calls on PostingsEnum are that > slow ? > > > > *Thanks and Regards,* > *Vimal Jain* > > > On Wed, Jun 21, 2023 at 1:43 PM Adrien Grand wrote: > > > As far as your performance problem i

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Vimal Jain
rovide more details on what do you mean by dynamic > > > pruning > > > > in context of custom term query ? > > > > > > > > On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, > wrote: > > > > > > > > > Intuitively replac

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Adrien Grand
ng > > > in context of custom term query ? > > > > > > On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, wrote: > > > > > > > Intuitively replacing a disjunction across multiple fields with a > > single > > > > term query should alw

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-21 Thread Vimal Jain
n, 2023, 9:45 pm Adrien Grand, wrote: > > > > > Intuitively replacing a disjunction across multiple fields with a > single > > > term query should always be faster. > > > > > > You're saying that you're storing the type of token as part of the term > >

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
> term query should always be faster. > > > > You're saying that you're storing the type of token as part of the term > > frequency. This doesn't sound like something that would play well with > > dynamic pruning, so I wonder if this is the reason why you are seeing > >

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
: > Intuitively replacing a disjunction across multiple fields with a single > term query should always be faster. > > You're saying that you're storing the type of token as part of the term > frequency. This doesn't sound like something that would play well with > dynamic pr

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
Intuitively replacing a disjunction across multiple fields with a single term query should always be faster. You're saying that you're storing the type of token as part of the term frequency. This doesn't sound like something that would play well with dynamic pruning, so I wonder

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
and instead of creating multiple term queries , we create only 1 term query for the merged field and the scorer of this term query ( on merged field ) makes use of custom term frequency info to deduce type of token ( during indexing we store this info ) and hence the score that we were using earlier. So

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Adrien Grand
Hi, > > I want to understand if fetching the term frequency of a term during > > scoring is relatively cpu bound operation ? > > Context - I am storing custom term frequency during indexing and later > > using it for scoring during query execution time ( in Scorer's sc

Re: Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
Note - i am using lucene 7.7.3 *Thanks and Regards,* *Vimal Jain* On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain wrote: > Hi, > I want to understand if fetching the term frequency of a term during > scoring is relatively cpu bound operation ? > Context - I am storing custom term freq

Relative cpu cost of fetching term frequency during scoring

2023-06-20 Thread Vimal Jain
Hi, I want to understand if fetching the term frequency of a term during scoring is relatively cpu bound operation ? Context - I am storing custom term frequency during indexing and later using it for scoring during query execution time ( in Scorer's score() method ). I noticed a performance drop

Re: term frequency in solr

2017-01-05 Thread Ahmet Arslan
nuary 2017 at 18:25, Ahmet Arslan <iori...@yahoo.com.invalid> wrote: > Hi, > > I think you are missing the main query parameter? q=*:* > > By the way you may get more response in the solr-user mailing list. > > Ahmet > > > On Wednesday, January 4, 2017 4:59 PM, hud

Re: term frequency in solr

2017-01-05 Thread huda barakat
> > On Wednesday, January 4, 2017 4:59 PM, huda barakat < > eng.huda.bara...@gmail.com> wrote: > Please help me with this: > > > I have this code which return term frequency from techproducts example: > > //

Re: term frequency in solr

2017-01-05 Thread Ahmet Arslan
Hi, I think you are missing the main query parameter? q=*:* By the way you may get more response in the solr-user mailing list. Ahmet On Wednesday, January 4, 2017 4:59 PM, huda barakat <eng.huda.bara...@gmail.com> wrote: Please help me with this: I have this code which retur

term frequency in solr

2017-01-04 Thread huda barakat
Please help me with this: I have this code which return term frequency from techproducts example: / import java.util.List; import org.apache.solr.client.solrj.SolrClient; import

Re: Altering Term Frequency in Similarity

2016-12-15 Thread Robert Muir
onyms). TTF is also handled by the same class. > > Now, I want to handle the term frequency. As far as I can tell, raw TF is > given to the similarity class by score(int doc, float freq). Which class > does provide that freq? Or what can I change to provide a different freq > value, practi

Altering Term Frequency in Similarity

2016-12-14 Thread Mossaab Bagdouri
Hi, I'm using Lucene 6.3.0, and trying to handle synonyms at query time. I think I've handled DF correctly with BlendedTermQuery (by returning the max DF of the synonyms). TTF is also handled by the same class. Now, I want to handle the term frequency. As far as I can tell, raw TF is given

Re: term frequency

2016-11-28 Thread huda barakat
uery = new SolrQuery(); query.setQuery("*:*"); SolrRequest req = new QueryRequest(query); QueryResponse rsp = req.process(solr); System.out.println("numFound: " + rsp.getResults().getNumFound()); I get results but the problem I want to get term frequency in

Re: term frequency

2016-11-24 Thread Jason Wee
the exception line does not match the code you pasted, but do make sure your object actually not null before accessing its method. On Thu, Nov 24, 2016 at 5:42 PM, huda barakat <eng.huda.bara...@gmail.com> wrote: > I'm using SOLRJ to find term frequency for each term in a field

term frequency

2016-11-24 Thread huda barakat
I'm using SOLRJ to find term frequency for each term in a field, I wrote this code but it is not working: 1. String urlString = "http://localhost:8983/solr/huda"; 2. SolrClient solr = new HttpSolrClient.Builder(urlString).build(); 3. 4. SolrQuery query

Re: Calculate Term Frequency

2014-08-22 Thread Bianca Pereira
On Tue, Aug 19, 2014 at 7:04 AM, Bianca Pereira aivykar...@gmail.com wrote: Hi everybody, I would like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through

Calculate Term Frequency

2014-08-19 Thread Bianca Pereira
Hi everybody, I would like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency with de.freq() for the desired document. My solution gives me

Re: Calculate Term Frequency

2014-08-19 Thread Erick Erickson
, 2014 at 7:04 AM, Bianca Pereira aivykar...@gmail.com wrote: Hi everybody, I would like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency with de.freq

Re: Calculate Term Frequency

2014-08-19 Thread Michael Sokolov
to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency with de.freq() for the desired document. My solution gives me the result I want but I am having

Re: Calculate Term Frequency

2014-08-19 Thread Tri Cao
...@gmail.com wrote: Hi everybody, I would like to know your suggestions to calculate Term Frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the DocsEnum 'de' returned and getting the frequency

EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Hi, I am new in the list and I have been working on a problem for some time already. I would like to know if someone has any idea of how I can solve it. Given a term, I want to get the term frequency in a lucene document. When I use the WhiteSpaceAnalyzer my code works properly but when I use

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
need to manually filter your query terms. Sounds like maybe a term got stemmed. -- Jack Krupansky -Original Message- From: Bianca Pereira Sent: Thursday, August 7, 2014 7:28 AM To: java-user@lucene.apache.org Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Message- From: Bianca Pereira Sent: Thursday, August 7, 2014 7:28 AM To: java-user@lucene.apache.org Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi, I am new in the list and I have been working on a problem for some time already. I would like to know

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
9:00 AM To: java-user@lucene.apache.org Subject: RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi, if you create the term yourself, it is not going through the analyzer: public int getTermFrequency(String term, String id) (you create a BytesRef out of it). So you have

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
the analyzer yourself. The stemming is very likely the culprit here. -- Jack Krupansky -Original Message- From: Uwe Schindler Sent: Thursday, August 7, 2014 9:00 AM To: java-user@lucene.apache.org Subject: RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency Hi

Building term frequency matrix over 6 million documents...

2014-01-24 Thread Witdouck, Xavier
Hi all, We have over 6 million documents in our index, and would like to construct a term frequency matrix over all 6 million documents as quickly as possible. Each document has a numeric date field, so we would like to build a time series which contains values which are the sum of all

supply term frequency directly

2013-07-02 Thread Michael Sokolov
Is there a way to add a document to the index by supplying terms and term frequencies directly, rather than via Analysis and/or TokenStream? I ask because I want to model some data where I know the term frequencies, but there is no underlying text document to be analyzed. I could create one

How is the term frequency calculated if I have to add a user-generated document.

2013-04-19 Thread Gaurav Ranjan
I am a student and studying the functionality of Lucene for my project work. If I have to add a new user-generated document in lucene with a term having a particular frequency just like any text file, how do I do it? For eg, say I have to add the following documents analyzed from an image doc1 =

Re: Indexing Term Frequency Vectors

2013-04-09 Thread Adrien Grand
frequency counter so that it uses my term frequencies. I think term frequency counts are calculated during indexing, so I don't think I can just write my own Similarity class? This is correct, frequencies are computed at indexing time. I just wanted to mention that you can influence scores based

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Sharon W Tam
, it generates counts for a term by counting how many times the term appears in a particular document. Instead of having Lucene do the counting, I want to do my own counting and feed a term-frequency vector representation of a document directly into the indexer which will take my counts

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 4:10 PM, Sharon W Tam s...@mit.edu wrote: Are there any other ideas? Since scoring seems to be what you are interested in, you could have a look at payloads: they can store arbitrary data and can be used to score matches. -- Adrien

Indexing Term Frequency Vectors

2013-03-28 Thread Sharon Tam
I believe that when Lucene indexes documents, it generates counts for a term by counting how many times the term appears in a particular document. Instead of having Lucene do the counting, I want to do my own counting and feed a term-frequency vector representation of a document directly

Re: Indexing Term Frequency Vectors

2013-03-28 Thread Adrien Grand
and feed a term-frequency vector representation of a document directly into the indexer which will take my counts and proceed to do the other processing such as generating inverse document frequency. My term-frequencies may not all be integers. Is there a way to do this? You could provide

Querying with Term Frequency Vectors

2013-03-04 Thread Sharon Tam
Hi, I have generated my own term-frequency vector representations of documents and would like to be able to query these with term-frequency vector queries instead of a text-string query. Is there any way to bypass the Lucene preprocessing that occurs in the indexing of documents and queryparsing

Re: Querying with Term Frequency Vectors

2013-03-04 Thread lukai
Store the term value as payload, and score with it. On Mon, Mar 4, 2013 at 10:10 AM, Sharon Tam sharon...@gmail.com wrote: Hi, I have generated my own term-frequency vector representations of documents and would like to be able to query these with term-frequency vector queries instead

Re: filter by term frequency

2012-06-17 Thread Mike Sokolov
:33 PM To: java-user@lucene.apache.org Subject: filter by term frequency I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whether there is some type of Lucene query that filters by term frequency

filter by term frequency

2012-06-16 Thread Mike Sokolov
I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whether there is some type of Lucene query that filters by term frequency. For example, suppose I want to find all documents that have exactly 2

Re: filter by term frequency

2012-06-16 Thread Jack Krupansky
frequency I imagine this is a question that comes up from time to time, but I haven't been able to find a definitive answer anywhere, so... I'm wondering whether there is some type of Lucene query that filters by term frequency. For example, suppose I want to find all documents that have exactly

Change IndexFiles to record term frequency as well?

2011-11-09 Thread Daniel Quach
I am currently using Lucene to index a dump of Wikipedia. I'm using the demo's IndexFiles function for the most part, but I also want to store the term frequency of a document in the index as well. Is this possible? Right now, the index just stores the (term -> document pathname) mappings

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Simon Willnauer
lucene keeps track of the term frequency etc. why would you want to do this at search time? simon On Mon, Oct 24, 2011 at 1:05 PM, Simon Willnauer simon.willna...@googlemail.com wrote: so you are saying you got (uniqueTerm, freq) tuples and you want to make lucene use this directly? I think

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
that at search time. hu? I don't understand, if you provide the terms at indexing time lucene keeps track of the term frequency etc. why would you want to do this at search time? During search time I get the following input ( only for 1 field ) = solr:3 rocks:2 apache:1 . For this I have to create

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Rene Hackl-Sommer
Use term boosts? solr^3 rocks^2 apache http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Boosting%20a%20Term Am 25.10.2011 11:19, schrieb prasenjit mukherjee: During search time I get the following input ( only for 1 field ) = solr:3 rocks:2 apache:1 . For this I have to create the

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
Thanks, this is helpful. Is the effect ( in ranking ) gonna be the same as passing multiple terms ? I will try it out definitely. On Tue, Oct 25, 2011 at 3:21 PM, Rene Hackl-Sommer rene.a.ha...@gmx.de wrote: Use term boosts? solr^3 rocks^2 apache

Re: reusing the term-frequency count while indexing

2011-10-24 Thread Simon Willnauer
so you are saying you got (uniqueTerm, freq) tuples and you want to make lucene use this directly? I think the easiest way is to write a simple tokenFilter that emits the term X times where X is the term frequency. There is no easy way to pass these tuples to lucene directly. simon On Mon, Oct 24
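
The suggestion in the message above (emit each term X times, where X is its frequency) can be sketched without any Lucene dependency. This is a minimal illustration under stated assumptions, not a Lucene TokenFilter: the class name `TermRepeater` and method `expand` are hypothetical, and the sample terms are made up. It turns (term, frequency) pairs into plain text in which each term occurs the requested number of times, so that an ordinary whitespace-based analyzer would presumably see the same counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TermRepeater {

    /**
     * Expands (term, frequency) pairs into whitespace-separated text in which
     * each term is repeated "frequency" times. Hypothetical helper: terms are
     * assumed to contain no whitespace and frequencies to be non-negative.
     */
    public static String expand(Map<String, Integer> termFreqs) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Insertion-ordered map so the output order is predictable.
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("solr", 3);
        tf.put("rocks", 2);
        tf.put("apache", 1);
        System.out.println(expand(tf)); // prints "solr solr solr rocks rocks apache"
    }
}
```

Repeating tokens only works for integer counts and grows linearly with the frequencies, which is one reason a real token filter (emitting the repeats lazily) may be preferable for large counts.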

Re: reusing the term-frequency count while indexing

2011-10-24 Thread prasenjit mukherjee
to make lucene use this directly? I think the easiest way is to write a simple tokenFilter that emits the term X times where X is the term frequency. There is no easy way to pass these tuples to lucene directly. simon On Mon, Oct 24, 2011 at 3:28 AM, prasenjit mukherjee prasen@gmail.com

reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
I already have the term-frequency-count for all the terms in a document. Is there a way I can re-use that info while indexing. I would like to use solr for this. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org

Re: reusing the term-frequency count while indexing

2011-10-23 Thread ppp c
Of course, it can be reused. But from my point of view, it's meaningless, since the analysis process has to be performed to collect such as prox, offset, or syno, payload and so on. On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee prasen@gmail.com wrote: I already have the term-frequency

Re: reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
Can you tell me how I can feed the lucene index by using the term frequency directly ? Actually I am getting the documents along with their term-frequency and don't want to write any additional code to expand them. On 10/23/11, ppp c peter.c.e...@gmail.com wrote: Of course, it can be reused

term frequency on a particular query

2011-06-07 Thread G.Long
, response: 2, word: bike} etc. I would like to get the word which is the most used for question 1. I learned something about term frequency but all the code samples I found on the internet deals about the entire index (with indexReader.terms). Any idea ? Thank you

Re: Applying term frequency thresholds on indexing time

2010-05-25 Thread Michael McCandless
before they are stored, but i guess there could be some way to work it around??? All help appreciated! Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html Sent from the Lucene - Java Users

Applying term frequency thresholds on indexing time

2010-05-24 Thread Xaida
.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional

Re: Applying term frequency thresholds on indexing time

2010-05-24 Thread Erick Erickson
! -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e

Term Frequency for phrases

2010-01-08 Thread hrishim
Hi . I have phrases like brain natriuretic peptide indexed as a single token using Lucene. When I calculate the term frequency for the same the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Is there a way to solve this problem and get

Re: Term Frequency for phrases

2010-01-08 Thread Michael McCandless
indexed as a single token using Lucene. When I calculate the term frequency for the same  the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Is there a way to solve this problem and get the term frequency for the entire phrase ? Regards

Re: Term Frequency for phrases

2010-01-08 Thread Erick Erickson
On a quick read, your statements are contradictory I have phrases like brain natriuretic peptide indexed as a single token When I calculate the term frequency for the same the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Either brain

Re: Term Frequency for phrases

2010-01-08 Thread Grant Ingersoll
When do you detect that they are phrases? During indexing or during search? On Jan 8, 2010, at 5:16 AM, hrishim wrote: Hi . I have phrases like brain natriuretic peptide indexed as a single token using Lucene. When I calculate the term frequency for the same the count is 0 since

Re: Term Frequency for phrases

2010-01-08 Thread hrishim
, hrishim wrote: Hi . I have phrases like brain natriuretic peptide indexed as a single token using Lucene. When I calculate the term frequency for the same the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Is there a way to solve

Re: Term Frequency for phrases

2010-01-08 Thread Jason Rutherglen
that they are phrases?  During indexing or during search? On Jan 8, 2010, at 5:16 AM, hrishim wrote: Hi . I have phrases like brain natriuretic peptide indexed as a single token using Lucene. When I calculate the term frequency for the same  the count is 0 since the tokens from the text are indexed

Re: Term Frequency for phrases

2010-01-08 Thread Erick Erickson
the term frequency for the same the count is 0 since the tokens from the text are indexed separately i.e. brain , natriuretic , peptide. Is there a way to solve this problem and get the term frequency for the entire phrase ? Regards, Hrishi -- View this message in context

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Karl Wettin
than using a HashMap with a TermVectorMapper to store the counts of the terms and calling getTermFreqVector(). I do not require the term frequency within a document. I think that is as fast as its going to get unless you have some other restrictions that would allow you to use a FieldCache

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Thomas D'Silva
a TermVectorMapper. I was wondering if anyone knew if there was a faster way to do this rather than using a HashMap with a TermVectorMapper to store the counts of the terms and calling getTermFreqVector(). I do not require the term frequency within a document. I think that is as fast as its

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-14 Thread Grant Ingersoll
to store the counts of the terms and calling getTermFreqVector(). I do not require the term frequency within a document. I think that is as fast as its going to get unless you have some other restrictions that would allow you to use a FieldCache. Can you describe the bigger problem you

Using TermVectorMapper to compute term frequency across documents

2009-10-12 Thread Thomas D'Silva
getTermFreqVector(). I do not require the term frequency within a document. Thanks, Thomas HashMap termDocCount = new HashMap(); TermQuery tagQuery = new TermQuery(tagTerm); TopDocs docs = searcher.search(tagQuery, numDocs); for (int i=0 ; i < docs.scoreDocs.length; ++i) { ScoreDoc sdoc

Re: Term Frequency vector consumes memory

2009-07-02 Thread Grant Ingersoll
...@apache.org To: java-user@lucene.apache.org Sent: Tuesday, June 30, 2009 9:48 PM Subject: Re: Term Frequency vector consumes memory In Lucene, a Term Vector is a specific thing that is stored on disk when creating a Document and Field. It is optional and off by default. It is separate from being

Term Frequency vector consumes memory

2009-06-30 Thread Ganesh
At the end of the day, I used to build the stats of top indexed terms. I enabled term frequency for the single field. It is working fine. I was able to get the top terms and their frequencies. It consumes a huge amount of RAM. My index size is 5 GB and has 8 million records. If i didn't enable

Re: Term Frequency vector consumes memory

2009-06-30 Thread Grant Ingersoll
not clear on your question. Cheers, Grant On Jun 30, 2009, at 3:37 AM, Ganesh wrote: At the end of the day, I used to build the stats of top indexed terms. I enabled term frequency for the single field. It is working fine. I was able to get the top terms and their frequencies. It consumes

Re: Term Frequency vector consumes memory

2009-06-30 Thread Ganesh
to load term vector. I want to switch off this feature? Is that possible without re-indexing? Regards Ganesh - Original Message - From: Grant Ingersoll gsing...@apache.org To: java-user@lucene.apache.org Sent: Tuesday, June 30, 2009 9:48 PM Subject: Re: Term Frequency vector consumes memory

Re: term frequency normalization

2009-02-12 Thread Chris Hostetter
: The easiest way to change the tf calculation would be overwriting : tf in an own implementation of Similarity like it's done in : SweetSpotSimilarity. But the average term frequency of the : document is missing. Is there a simple way to get or calc this : number? there was quite a bit

term frequency normalization

2009-02-03 Thread Jochen Wersdörfer
Hi, I'd like to use the term frequency normalization described in http://wiki.apache.org/lucene-java/TREC%202007%20Million%20Queries%20Track%20-%20IBM%20Haifa%20Team so that the term frequency tf becomes tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)). The easiest way to change the tf
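
The normalization quoted in the message above can be worked through in a few lines of plain Java. This is only a sketch: it reads the formula as tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)), treating "tf(f, d)" and "feq" in the original as typos (an assumption), and it takes avgFreq(d) to be the mean of the per-term frequencies within document d. The class and method names (`NormalizedTf`, `tf`, `avgFreq`) are hypothetical and are not part of Lucene's Similarity API.

```java
public class NormalizedTf {

    /** Average term frequency of a document: mean of its per-term counts. */
    public static double avgFreq(int[] termFreqs) {
        if (termFreqs.length == 0) {
            throw new IllegalArgumentException("document has no terms");
        }
        long sum = 0;
        for (int f : termFreqs) {
            sum += f;
        }
        return (double) sum / termFreqs.length;
    }

    /** tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)). */
    public static double tf(double freq, double avgFreq) {
        if (freq < 0 || avgFreq <= 0) {
            throw new IllegalArgumentException("freq must be >= 0 and avgFreq > 0");
        }
        return Math.log(1 + freq) / Math.log(1 + avgFreq);
    }

    public static void main(String[] args) {
        // Made-up document with three distinct terms occurring 1, 2 and 3 times.
        int[] freqs = {1, 2, 3};
        double avg = avgFreq(freqs);
        for (int f : freqs) {
            System.out.println("freq=" + f + " tf=" + tf(f, avg));
        }
    }
}
```

With this definition a term whose frequency equals the document average gets tf = 1, terms above the average get more than 1, and an absent term gets 0.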

Re: Term Frequency and IndexSearcher

2009-01-16 Thread Chris Hostetter
: References: : offfa5f4d3.751e9148-on8525753f.003e1216-8525753f.003e6...@us.ibm.com : 1998.130.159.185.12.1232021837.squir...@webmail.cis.strath.ac.uk : Date: Thu, 15 Jan 2009 04:49:49 -0800 (PST) : Subject: Term Frequency and IndexSearcher http://people.apache.org/~hossman

Term Frequency and IndexSearcher

2009-01-15 Thread Paul Lynch
Hi, I know it is very easy to get the frequency of a given term using the indexReader but I am looking to perform an index search and would like to get the frequency of the given term in the result set. Is this possible? Thanks in advance, Paul

Re: Term Frequency and IndexSearcher

2009-01-15 Thread Murat Yakici
Hi Paul, I am tempted to suggest the following ( I am assuming here that the document and the particular fields are TFVed when indexing): For every doc in the result set: - get the doc id - using the doc id, get the TermFreqVector of this document from the index reader

Term Frequency for more complex terms

2008-07-03 Thread Matthew Hall
I have a quick question, could someone point me towards where in the API I'll have to investigate in order to figure out the term frequencies of more complex terms? For example I want to know the tf of kit ligand treated as a phrase. I see that luke has access to this information in its

RE: Term Frequency for more complex terms

2008-07-03 Thread John Griffin
docs by clicking on Index at the top of the docs. They're all there. -Original Message- From: Matthew Hall [mailto:[EMAIL PROTECTED] Sent: Thursday, July 03, 2008 10:20 AM To: lucene Subject: Term Frequency for more complex terms I have a quick question, could someone point me towards

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Yonik Seeley
On 5/25/07, Walt Stoneburner [EMAIL PROTECTED] wrote: In reading the math for scoring at the bottom of: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html It appears that if I can make tf() and idf(), term frequency and inverse

Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-24 Thread Walt Stoneburner
Hi, I'm trying to figure what I need to do with Lucene to score a document higher when it has a larger number of unique search terms that are hit, rather than term frequency counts. A quick example. If I'm searching for BIRD CAT DOG (all should clauses), then I want ...a document

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-24 Thread Grant Ingersoll
of unique search terms that are hit, rather than term frequency counts. A quick example. If I'm searching for BIRD CAT DOG (all should clauses), then I want ...a document with BIRD, CAT, and DOG terms, each only appearing once, in it to score higher than ...a document with BIRD, CAT, CAT

Term Frequency for Partial Index

2007-05-16 Thread Saravana
values. We are forming a RangeQuery for time and normal query for other field values. Now I am able to find Term Frequency per index i.e for the whole 24 hours. But I want to find the Term Frequency for 1 hour i.e between 01:00:00 to 02:00:00. Will it be possible? Is there any API to find Term

Re: term frequency calculation in Lucene

2007-04-30 Thread karl wettin
29 apr 2007 kl. 18.33 skrev saikrishna venkata pendyala: Where does the lucene compute term frequency vector ? {filename,function name} DocumentWriter.java private final void invertDocument(Document doc) Actually the task is to replace the all term frequencies with some constant number

Re : term frequency calculation in Lucene

2007-04-29 Thread saikrishna venkata pendyala
Hai , Where does the lucene compute term frequency vector ? {filename,function name} Actually the task is to replace the all term frequencies with some constant number(integer), how to do this ? Any kind of help is appreciated . Thanks in advance.

How to get term frequency of multi terms and TimeRange?

2007-04-24 Thread SK R
Hi, How to get term frequency of multi terms in particular document? Any API method other than using TermVector may help? Also How to calculate termfreq. of time range. i.e : If my index have a field TIME with values in millis (like 1176281188000)., and I want to calculate term freq

Re: Term frequency

2007-04-12 Thread sai hariharan
Hi, Thanx for replying. In my scenario i'm not going to index any of my docs. So is there a way to find out term frequencies of the terms in a doc without doing the indexing part? Thanx in advance, Hari On 4/12/07, Grant Ingersoll [EMAIL PROTECTED] wrote: Add Term Vectors to your Field during

Re: Term frequency

2007-04-12 Thread karl wettin
12 apr 2007 kl. 09.12 skrev sai hariharan: Thanx for replying. In my scenario i'm not going to index any of my docs. So is there a way to find out term frequencies of the terms in a doc without doing the indexing part? Using an analyzer (Tokenstream) and a Map<String, Integer>? while ((t =
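
The idea in the message above (run the text through a tokenizer and tally tokens in a map, with no index involved) can be sketched in plain Java. This is an illustration under stated assumptions: lowercasing and splitting on non-word characters stands in for a real Lucene analyzer, which would tokenize differently (stemming, stop words and so on), and the class name `TermCounter` and method `count` are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermCounter {

    /**
     * Counts how often each token occurs in the given text.
     * Tokenization here is a simple stand-in for an analyzer:
     * lowercase, then split on runs of non-word characters.
     */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text.
        Map<String, Integer> tf = count("Lucene indexes text; Lucene scores text.");
        tf.forEach((term, freq) -> System.out.println(term + " " + freq));
    }
}
```

A sorted map keeps the output stable for inspection; a HashMap would do equally well if ordering does not matter.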

Re: Term frequency

2007-04-12 Thread Doron Cohen
karl wettin [EMAIL PROTECTED] wrote on 12/04/2007 00:25:47: 12 apr 2007 kl. 09.12 skrev sai hariharan: Thanx for replying. In my scenario i'm not going to index any of my docs. So is there a way to find out term frequencies of the terms in a doc without doing the indexing part? Using

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-11 Thread karl wettin
11 apr 2007 kl. 04.21 skrev Grant Ingersoll: Would some sort of caching strategy work? How big is your overall collection? Also, lately there have been a few threads on TV (term vector) performance. I don't recall anyone having actively profiled or examined it for improvements, so

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-11 Thread Grant Ingersoll
On Apr 11, 2007, at 9:07 AM, karl wettin wrote: 11 apr 2007 kl. 04.21 skrev Grant Ingersoll: Would some sort of caching strategy work? How big is your overall collection? Also, lately there have been a few threads on TV (term vector) performance. I don't recall anyone having actively

Term frequency

2007-04-11 Thread sai hariharan
Hi, I've just started using Lucene. Can anybody assist me in calculating the term frequencies of the terms(words) that occur in a document(*.txt), when a particular doc is submitted. Say when i submit sample.txt , i should first analyze the document with a standard analyzer, then the term

Re: Term frequency

2007-04-11 Thread Grant Ingersoll
Add Term Vectors to your Field during indexing. See the Field constructors. To get a Term Vector out, see IndexReader.getTermFreqVector method. -Grant On Apr 11, 2007, at 3:23 PM, sai hariharan wrote: Hi, I've just started using Lucene. Can anybody assist me in calculating the term

Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Hello all, I would like to extract the term freq vector from the hit results as a total vector not by document. I have searched the mailing and I found many have talked about this issue but I still could not find the right solution to this matter. Everyone just suggested to look at

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread thomas arni
. Here is an example: for (int i = 0; i < 10; i++) { int docNumber = hits.id(i); TermFreqVector[] termsV = ir.getTermFreqVectors(docNumber); //return an array of term frequency vectors for the specified document. for (int xy = 0; xy

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread karl wettin
the document vector space model is not available in any other fashion than the term frequency vectors, or building them from scratch by enumerating the whole index. The latter of course being horribly slow in most cases. -- karl

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
the hits object you can iterate over the first results. Here is an example: for (int i = 0; i < 10; i++) { int docNumber = hits.id(i); TermFreqVector[] termsV = ir.getTermFreqVectors(docNumber); //return an array of term frequency vectors for the specified

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Dear Karl, Thank you for taking your time in my problem. We don't really know what your problem is. Explaining that rather than the solution you have thought of might render a couple of alternate solutions. Perhaps something could be precalculated and stored in the documents. Perhaps

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread karl wettin
10 apr 2007 kl. 17.48 skrev Sengly Heng: We don't really know what your problem is. Explaining that rather than the solution you have thought of might render a couple of alternate solutions. Perhaps something could be precalculated and stored in the documents. Perhaps feature selection
