RE: Giving boost to a more recent item while searching

2007-12-20 Thread Brian Grimal
I would love to revisit this one. I implemented pseudo date boosting in an overly simplistic manner in my app, which I know can be improved upon. Might it be useful to reopen a thread on the topic? Brian -----Original Message----- From: prabin meitei [EMAIL PROTECTED] Sent: Wednesday,

Re: Giving boost to a more recent item while searching

2007-12-20 Thread Zhou Qi
Brian, could you briefly describe the method you tried? I am very interested in it. Jackson 2007/12/20, Brian Grimal [EMAIL PROTECTED]: I would love to revisit this one. I implemented pseudo date boosting in an overly simplistic manner in my app, which I know can be improved upon. Might

Re: document deletion problem

2007-12-20 Thread Doron Cohen
On Dec 20, 2007 8:31 AM, Tushar B [EMAIL PROTECTED] wrote: Hi Doron, I just filed an issue in JIRA. Thanks! Here are the requested stats: Index size: around 11 million documents. Query: fieldname:[009 TO 999] (using CSRQ). ConstantScoreRangeQuery, right? Result: 11475

Re: Giving boost to a more recent item while searching

2007-12-20 Thread Grant Ingersoll
Have a look at the FunctionQuery capabilities in Lucene, whereby you can use the value of a Field as a scoring factor. So, your FunctionQuery would just do a simple calculation between the current time and whatever date is in the document. -Grant On Dec 20, 2007, at 8:03 AM, prabin meitei
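As a rough illustration of the kind of calculation Grant describes, here is a plain-Java sketch (no Lucene classes) that turns a document's age into a multiplier and applies it to a text score. The class name `RecencyBoost`, the reciprocal decay `1 / (1 + age/scale)`, and the 30-day scale are assumptions for illustration only; in a real index the factor would be computed inside a function-based query from a stored date field.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, NOT a Lucene class: it only illustrates the
// "current time vs. document date" calculation a function-based query could apply.
class RecencyBoost {

    // Returns 1.0 for a brand-new document and decays toward 0 as the document ages.
    // scaleDays is an assumed tuning knob: a document scaleDays old gets a factor of 0.5.
    static double boost(double ageDays, double scaleDays) {
        double age = Math.max(0.0, ageDays); // treat future-dated documents as brand new
        return 1.0 / (1.0 + age / scaleDays);
    }

    // Fractional number of days between the document date and "now".
    static double daysBetween(Instant docDate, Instant now) {
        return Duration.between(docDate, now).toMillis() / 86_400_000.0;
    }

    // Final score = text relevance score multiplied by the recency factor.
    static double combinedScore(double textScore, Instant docDate, Instant now, double scaleDays) {
        return textScore * boost(daysBetween(docDate, now), scaleDays);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2007-12-20T00:00:00Z");
        // Same text score, different ages: the newer document ends up with the higher score.
        System.out.println(combinedScore(1.0, Instant.parse("2007-12-19T00:00:00Z"), now, 30));
        System.out.println(combinedScore(1.0, Instant.parse("2007-06-20T00:00:00Z"), now, 30));
    }
}
```

The reciprocal curve is just one choice; an exponential half-life decay would work the same way, differing only in how quickly older documents fall behind.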

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
Thanks Karl, I would rather modify the lexer grammar. But where exactly is it defined? After a quick look, it seems that StandardTokenizerTokenManager.java may be where this happens. Since the ampersand has a decimal value of 38, I assumed that the following step is taken when
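A minimal, hypothetical sketch of the underlying idea: whether a punctuation character such as `&` (char code 38) counts as part of a token decides whether "AT&T" stays one term or is split in two. This is a toy character-class tokenizer in plain Java, not Lucene's `StandardTokenizer` or its JavaCC grammar; the class name `PunctTokenizer` and its `keep` parameter are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene's StandardTokenizer: a tiny tokenizer that
// treats letters, digits, and one extra punctuation character as token characters.
class PunctTokenizer {

    static List<String> tokenize(String text, char keep) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == keep) {
                current.append(c); // still inside a token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // any other character ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println((int) '&');                         // 38
        System.out.println(tokenize("AT&T and B&Q", '&'));     // [AT&T, and, B&Q]
        System.out.println(tokenize("AT&T and B&Q", '\0'));    // [AT, T, and, B, Q]
    }
}
```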

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin
On 20 Dec 2007, at 20:21, [EMAIL PROTECTED] wrote: I would rather like to modify the lexer grammar. But exactly where it is defined. After having a quick look, seems like StandardTokenizerTokenManager.java may be where it is being done.

Hit Count per Document

2007-12-20 Thread Jeff
I don't care about the score, but I do care about the number of times a query matched within a document. How can I get that? Example: the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the slow

Re: Hit Count per Document

2007-12-20 Thread Mark Miller
You can override the scoring system and score only by term frequency (use a 1, or whatever creates a no-op, for the other factors). If you indexed with norms, then you will have to use a Reader that ignores them to do this. - Mark Jeff wrote: I don't care about score, but I do care about
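To make "score by term frequency only" concrete, the sketch below simply counts how often each term occurs in one document's text, which is the quantity a tf-only score would expose once the other factors are neutralized. It uses plain Java and a naive split on non-word characters as a stand-in for a real Lucene analyzer; `TermCount` is a hypothetical name. In Lucene itself this would be done by overriding the scoring factors as Mark describes, not with this code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration (no Lucene): raw per-document term frequencies,
// i.e. what a tf-only score would report if every other factor were 1.
class TermCount {

    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        // Naive "analyzer": lowercase, then split on runs of non-word characters.
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!tok.isEmpty()) {
                freq.merge(tok, 1, Integer::sum); // increment the count for this term
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumped over the lazy dog "
                   + "the quick brown fox jumped over the lazy dog "
                   + "the slow brown fox";
        Map<String, Integer> tf = termFrequencies(doc);
        System.out.println(tf.get("brown")); // 3
        System.out.println(tf.get("quick")); // 2
    }
}
```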

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
Karl, I should have mentioned earlier that I am using Lucene 1.9.1. In fact, I had previously located the grammar in StandardTokenizer.jj (I just wasn't sure if that was the one you were talking about) and had commented out the EMAIL entries in all of the following files: StandardTokenizer.java

RE: Problem with Escape characters in lucene demo search

2007-12-20 Thread Baljeet Dhaliwal
Hi Erick, thanks. I found something interesting: I was indexing huge text files (2 GB) and the search was not returning escape characters. However, when I moved the line to a smaller file (20 MB), it worked fine. Is there a limit on the file size Lucene can search, or do you know how to escape

Re: Problem with Escape characters in lucene demo search

2007-12-20 Thread Erick Erickson
By default, Lucene indexes only the first 10,000 tokens of a field and throws the rest away. You can change this via IndexWriter.setMaxFieldLength. 2 GB is a huge file. Are you indexing all of it, or only portions? Erick On Dec 20, 2007 5:20 PM, Baljeet Dhaliwal [EMAIL PROTECTED] wrote: Hi
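The effect of the token limit can be simulated in a few lines of plain Java: only the first `maxFieldLength` whitespace-separated tokens become searchable, so a term that appears only near the end of a very large file is silently lost. This is a hypothetical simulation (`MaxFieldLengthDemo` is not a Lucene class); in Lucene 2.x the real limit is configured with `IndexWriter.setMaxFieldLength(int)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simulation only (not Lucene code): mimics a per-field token limit such as the
// 10,000-token default. Tokens past the limit are dropped without any error.
class MaxFieldLengthDemo {

    static Set<String> indexedTerms(String text, int maxFieldLength) {
        String[] tokens = text.trim().split("\\s+");
        int n = Math.min(tokens.length, maxFieldLength); // keep only the first n tokens
        return new HashSet<>(Arrays.asList(tokens).subList(0, n));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("filler ");
        }
        sb.append("needle"); // token number 10,001
        String doc = sb.toString();
        System.out.println(indexedTerms(doc, 10_000).contains("needle"));            // false
        System.out.println(indexedTerms(doc, Integer.MAX_VALUE).contains("needle")); // true
    }
}
```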

RE: Problem with Escape characters in lucene demo search

2007-12-20 Thread Baljeet Dhaliwal
Interesting. I am trying to make our logs searchable and thought of trying Lucene. I am talking about several (around 50-60) 2 GB files to index. Would it scale? How can I index a portion of a document? Also, as with any log, there is a pattern and most of the content is redundant. Can I

Re: Hit Count per Document

2007-12-20 Thread Jeff
If I am not mistaken, that works for a single term. Is it possible for a query? In the example below, I don't want to know how many times "brown" appears in the document; I want to know how many times "quick brown" appears. Thanks, Jeff On Dec 20, 2007 3:03 PM, Mark Miller [EMAIL PROTECTED] wrote: You

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin
On 20 Dec 2007, at 22:32, [EMAIL PROTECTED] wrote: In fact I had previously located the grammar in StandardTokenizer.jj (just wasn't sure if that was the one u were talking about) and had commented out EMAIL entries from all the following files: StandardTokenizer.java StandardTokenizer.jj

Re: Problem with Escape characters in lucene demo search

2007-12-20 Thread Erick Erickson
I think you need to back up and think about what you're trying to accomplish. Just throwing the file into a single document in your index doesn't seem very useful. Of course you can pre-process the input and index only what you want. The examples in the Lucene demo just show you how to index
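One possible shape for the pre-processing Erick suggests, sketched in plain Java under stated assumptions: each log line becomes its own small record (which would then be indexed as a separate Lucene `Document`), and lines judged redundant are dropped before indexing. The noise pattern (`DEBUG` lines and heartbeats), the exact-duplicate rule, and the names `LogPreprocessor` and `Entry` are all hypothetical; a real log format would need its own rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical pre-processing step (not part of the Lucene demo): split a log into
// one small record per line instead of one giant document per file, dropping noise.
class LogPreprocessor {

    // Assumed definition of "redundant" lines; adjust to the real log format.
    private static final Pattern NOISE = Pattern.compile("\\bDEBUG\\b|heartbeat");

    static final class Entry {
        final int lineNumber; // 1-based, so a hit can point back into the original file
        final String text;

        Entry(int lineNumber, String text) {
            this.lineNumber = lineNumber;
            this.text = text;
        }

        @Override
        public String toString() {
            return lineNumber + ": " + text;
        }
    }

    static List<Entry> toEntries(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || NOISE.matcher(line).find()) {
                continue; // skip blank and noise lines
            }
            if (!seen.add(line)) {
                continue; // skip exact repeats of a line already kept
            }
            out.add(new Entry(i + 1, line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "ERROR disk full on /var",
                "DEBUG cache tick",
                "ERROR disk full on /var",
                "WARN slow query: 2300 ms");
        toEntries(log).forEach(System.out::println);
    }
}
```

For 2 GB inputs, the lines would need to be streamed (for example with a `BufferedReader`) rather than loaded into a single `List`, and the duplicate set grows with the number of distinct lines, so it may need to be bounded.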

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
I hadn't actually implemented the TokenFilter solution before deciding against it, so I don't have any benchmarks. In the end, I took care of this problem with a variation of your quick-and-dirty solution. I captured the character '@' in

Re: Hit Count per Document

2007-12-20 Thread Mark Miller
Gotcha. Well, if you want to check one document at a time, you could use getSpans for a SpanNearQuery and just count how many spans you get. Nothing comes to mind off the top of my head if you want the count returned like a score, i.e., for each hit in a search over the whole corpus. - Mark Jeff wrote: If I am not
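A plain-Java sketch of the counting Jeff is after: it counts each position where the phrase terms occur adjacent and in order within one document, a rough analogue of counting the spans returned for an exact, in-order `SpanNearQuery` (slop 0) on that document. `PhraseCount` is a hypothetical name, the naive split on non-word characters stands in for a real analyzer, and overlapping matches are all counted; whether Lucene's spans treat overlaps the same way is not something this sketch establishes.

```java
// Plain-Java illustration (no Lucene): count every starting position where the
// phrase terms appear adjacent and in the given order.
class PhraseCount {

    static int countPhrase(String text, String... phrase) {
        if (phrase.length == 0) {
            return 0; // an empty phrase matches nothing
        }
        String[] tokens = text.split("\\W+");
        int count = 0;
        outer:
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equalsIgnoreCase(phrase[j])) {
                    continue outer; // mismatch: try the next starting position
                }
            }
            count++; // all phrase terms matched starting at position i
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumped over the lazy dog "
                   + "the quick brown fox jumped over the lazy dog "
                   + "the slow brown fox";
        System.out.println(countPhrase(doc, "brown"));          // 3
        System.out.println(countPhrase(doc, "quick", "brown")); // 2
    }
}
```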

Boosting Vs Sorting

2007-12-20 Thread Rakesh Shete
Hi all, I am using Hibernate Search (http://www.hibernate.org/410.html), a wrapper around Lucene for searching information stored in the database. I have some questions about Lucene boosting vs. sorting: Is index-time boosting of documents and fields better than specifying sorting
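To illustrate the distinction behind the question, here is a simplified plain-Java model. It assumes an index-time boost acts as a plain multiplier on the query's text score (real Lucene scoring has more factors and stores boosts with reduced precision), whereas sorting on a field orders hits by the field value regardless of score. The `Hit` fields, the class name `BoostVsSort`, and the sample values are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simplified model (not Lucene's real scoring formula): boost multiplies the text
// score, while a field sort ignores the score entirely.
class BoostVsSort {

    static final class Hit {
        final String id;
        final double textScore; // relevance for the current query
        final double boost;     // hypothetical index-time boost
        final int date;         // stored value, e.g. 20071220

        Hit(String id, double textScore, double boost, int date) {
            this.id = id;
            this.textScore = textScore;
            this.boost = boost;
            this.date = date;
        }
    }

    // Relevance ranking: the boost shifts the order, but text relevance still matters.
    static List<String> byBoostedScore(List<Hit> hits) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.textScore * h.boost).reversed())
                .map(h -> h.id)
                .collect(Collectors.toList());
    }

    // Field sort: newest first, regardless of how well each hit matched.
    static List<String> byDateDescending(List<Hit> hits) {
        return hits.stream()
                .sorted(Comparator.comparingInt((Hit h) -> h.date).reversed())
                .map(h -> h.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", 0.9, 1.0, 20071201),
                new Hit("B", 0.5, 2.0, 20071220),
                new Hit("C", 0.8, 1.0, 20071215));
        System.out.println(byBoostedScore(hits));   // [B, A, C]
        System.out.println(byDateDescending(hits)); // [B, C, A]
    }
}
```

With these sample hits, B overtakes A under boosting only because its boost outweighs its weaker text match, while the date sort puts B first purely because it is the newest.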

Which file in the Lucene package is used to manipulate results?

2007-12-20 Thread sumittyagi
Hi, I am using Lucene for the very first time and want to manipulate the results by adding some more factors to them. Which file should I edit to manipulate the search results? Thanks, Sumit Tyagi