date:20071220

Which file in the lucene package is used to manipulate results..

2007-12-20 Thread sumittyagi

Hi, I am using Lucene for the very first time and want to manipulate the results by adding some more factors to them. Which file should I edit to manipulate the search results? Thanks, Sumit Tyagi -- View this message in context: http://www.nabble.com/Which-file-in-the-lucene-package-is-used-

Boosting Vs Sorting

2007-12-20 Thread Rakesh Shete

Hi all, I am using Hibernate Search (http://www.hibernate.org/410.html) which is a wrapper around Lucene for performing search over info stored in the DB. I have questions related to Lucene boosting Vs sorting: Is index time boosting of documents and fields better than specifying sorting para

Re: Hit Count per Document

2007-12-20 Thread Mark Miller

Gotcha. Well, if you want to check one doc at a time, you could call getSpans on a SpanNearQuery and just count how many spans you get. No ideas off the top of my head if you want the result like a score, so that you get it for each hit in a search of a whole corpus. - Mark Jeff wrote: If I am not m

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque

I hadn't actually implemented the TokenFilter solution before deciding against it, so I don't have any benchmarks. Eventually I took care of the problem with a variation of your quick-and-dirty solution: I captured the character '@' in FastCharStream.java,

Re: Problem with Escape characters in lucene demo search

2007-12-20 Thread Erick Erickson

I think you need to back up and think about what you're trying to accomplish. Just throwing the file into a single document in your index doesn't seem very useful. Of course you can pre-process the input and index only what you want. The examples in the Lucene demo just show you how to index entir

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin

On 20 Dec 2007 at 22:32, [EMAIL PROTECTED] wrote: In fact I had previously located the grammar in StandardTokenizer.jj (I just wasn't sure if that was the one you were talking about) and had commented out the EMAIL entries from all of the following files: StandardTokenizer.java StandardTokenizer.jj Stand

Re: Hit Count per Document

2007-12-20 Thread Jeff

If I am not mistaken, that is for a term. Is it possible for a query? In the example below, I don't want to know how many times brown is in the document; I want to know how many times "quick brown" is in the document. Thanks, Jeff On Dec 20, 2007 3:03 PM, Mark Miller <[EMAIL PROTECTED]> wrote: >
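To make the request concrete, here is a minimal sketch in plain Java of the number Jeff is asking for: how often an exact phrase occurs in one document's text. It does not use Lucene at all. The class `PhraseHitCounter`, its methods, and the crude lower-casing tokenizer are hypothetical stand-ins for illustration, and the sample text only approximates the example in the thread. Counting the spans of an exact, in-order phrase match should give roughly the same figure, assuming the analyzer tokenizes the text the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of Lucene: counts exact phrase occurrences in one document. */
final class PhraseHitCounter {

    /** Lower-cases the text and splits on non-alphanumeric characters (a rough analyzer stand-in). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns how many times the phrase occurs as consecutive tokens in the document text. */
    static int countPhrase(String documentText, String phrase) {
        List<String> doc = tokenize(documentText);
        List<String> target = tokenize(phrase);
        if (target.isEmpty()) {
            return 0;
        }
        int hits = 0;
        for (int i = 0; i + target.size() <= doc.size(); i++) {
            if (doc.subList(i, i + target.size()).equals(target)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample text: four identical sentences plus one sentence without "quick".
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            doc.append("the quick brown fox jumped over the lazy dog ");
        }
        doc.append("the brown dog slept");

        System.out.println(countPhrase(doc.toString(), "quick brown")); // prints 4
        System.out.println(countPhrase(doc.toString(), "brown"));       // prints 5
    }
}
```

The two printed values show the distinction Jeff draws: a term count for "brown" also picks up the last sentence, while the phrase count for "quick brown" does not.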

RE: Problem with Escape characters in lucene demo search

2007-12-20 Thread Baljeet Dhaliwal

Interesting. I am trying to make our logs searchable and thought of trying Lucene. I am talking about several (around 50-60) 2GB files to index. Would it scale? How can I index a portion of a document? Also, as with any log, there is a pattern and most of the stuff in there is redundant. Can I discar

Re: Problem with Escape characters in lucene demo search

2007-12-20 Thread Erick Erickson

Lucene, by default, only indexes the first 10,000 tokens of a field and throws the rest away. You can change this via IndexWriter.setMaxFieldLength. 2GB is a huge file. Are you indexing all of that or only portions? Erick On Dec 20, 2007 5:20 PM, Baljeet Dhaliwal <[EMAIL PROTECTED]> wrote: >
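The effect of that limit can be illustrated with a small simulation in plain Java. This is not Lucene code: `FieldLengthLimitDemo` and `indexedTerms` are hypothetical names, and the only figure taken from the thread is the 10,000-token default. The sketch simply shows why a term that first appears late in a very large file can be missing from the index while the same term in a small file is found.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical simulation, not Lucene code: only the first maxFieldLength tokens are indexed. */
final class FieldLengthLimitDemo {

    /** Collects the distinct terms among the first maxFieldLength tokens; later tokens are dropped. */
    static Set<String> indexedTerms(String[] tokens, int maxFieldLength) {
        Set<String> terms = new HashSet<>();
        int limit = Math.min(tokens.length, maxFieldLength);
        for (int i = 0; i < limit; i++) {
            terms.add(tokens[i]);
        }
        return terms;
    }

    public static void main(String[] args) {
        // 10,000 filler tokens followed by one interesting token at position 10,001.
        String[] tokens = new String[10_001];
        Arrays.fill(tokens, "filler");
        tokens[10_000] = "needle";

        // With the default-sized limit the late token never reaches the index.
        System.out.println(indexedTerms(tokens, 10_000).contains("needle"));            // prints false
        // With the limit raised, it does.
        System.out.println(indexedTerms(tokens, Integer.MAX_VALUE).contains("needle")); // prints true
    }
}
```

Raising the limit makes the whole file searchable at the cost of a larger index, so indexing only the useful portions of each log may be the better trade-off.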

RE: Problem with Escape characters in lucene demo search

2007-12-20 Thread Baljeet Dhaliwal

Hi Erick, thanks. I found something interesting. I was indexing huge text files (>2GB) and the search was not returning escape characters. However, when I moved the line to a smaller file (20MB), it worked fine. Is there a limit on the file size Lucene can search, or would you know how do escape character

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque

Karl, I should have mentioned before that I have Lucene 1.9.1. In fact I had previously located the grammar in StandardTokenizer.jj (I just wasn't sure if that was the one you were talking about) and had commented out the EMAIL entries from all of the following files: StandardTokenizer.java StandardTokenizer.j

Re: Hit Count per Document

2007-12-20 Thread Mark Miller

You can override the scoring system and only score by term frequency (use 1, or whatever creates a no-op, for the other factors). If you have indexed with norms, then you will have to use a Reader that ignores them to do this. - Mark Jeff wrote: I don't care about score, but I do care about t

Hit Count per Document

2007-12-20 Thread Jeff

I don't care about score, but I do care about the number of times a query was hit within a document. Example: the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the quick brown fox jumped over the lazy dog the slow b

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin

On 20 Dec 2007 at 20:21, [EMAIL PROTECTED] wrote: I would rather modify the lexer grammar. But where exactly is it defined? After a quick look, it seems like StandardTokenizerTokenManager.java may be where it is being done. http://svn.apache.org/repos/asf/lucene/java/trunk/src/java

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque

Thanks Karl, I would rather modify the lexer grammar. But where exactly is it defined? After a quick look, it seems like StandardTokenizerTokenManager.java may be where it is being done. Since the ampersand has a decimal value of 38, I was assuming that the following step is taken when face

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin

On 20 Dec 2007 at 18:43, [EMAIL PROTECTED] wrote: I am using StandardAnalyzer for my indexes. Now I don't want to be able to search whole email addresses, and want to treat '@' as punctuation too, because my users would rather be able to search for the user id and/or the host name to ret

Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque

I am using StandardAnalyzer for my indexes. Now I don't want to be able to search whole email addresses, and want to treat '@' as punctuation too, because my users would rather be able to search for the user id and/or the host name to return all the email addresses than search by the whole a
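The desired behaviour can be sketched in plain Java as a post-processing step over tokens, which is roughly what a custom token filter placed after the tokenizer would do. This is an illustration only: `EmailTokenSplitter` and `splitOnAt` are hypothetical names, no Lucene classes are used, and the sample address is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Lucene code: treats '@' as punctuation inside already-produced tokens. */
final class EmailTokenSplitter {

    /** Splits every token on '@' so that the user id and the host name become separate tokens. */
    static List<String> splitOnAt(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            for (String part : token.split("@")) {
                // Skip empty pieces produced by a leading '@'.
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("mail", "jdoe@example.com", "today");
        System.out.println(splitOnAt(tokens)); // prints [mail, jdoe, example.com, today]
    }
}
```

With tokens split this way, a search for either the user id or the host name can match the document, while the whole address is no longer a single term. The same splitting would have to be applied at query time for the two sides to agree.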

Re: Problem with Escape characters in lucene demo search

2007-12-20 Thread Erick Erickson

Use Luke. Google "Lucene Luke" and you'll find it. You can use it to examine the contents of your index in many different ways. It's invaluable when exploring the different analyzers and making sure that your index has what you *think* it has. Erick On Dec 19, 2007 10:48 PM, Baljeet Dhaliwal <

Re: Giving boost to a more recent item while searching

2007-12-20 Thread Grant Ingersoll

Have a look at the FunctionQuery capabilities in Lucene, whereby you can use the value of a Field as a scoring factor. So, your FunctionQuery would just do a simple calculation between the current time and whatever date is in the document. -Grant On Dec 20, 2007, at 8:03 AM, prabin meitei
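The "simple calculation" Grant mentions could look something like the following plain Java sketch. It is not Lucene's function query API: `RecencyBoost`, the half-life parameter, and the exponential decay formula are all assumptions chosen for illustration. Because the age is computed against the current time at query time, the multiplier stays between 0 and 1 and does not keep growing the way an index-time boost measured from a fixed base time would.

```java
/** Hypothetical sketch, not Lucene code: a query-time recency multiplier with exponential decay. */
final class RecencyBoost {

    private static final double MILLIS_PER_DAY = 24 * 60 * 60 * 1000.0;

    /**
     * Returns a multiplier in (0, 1]: 1.0 for a document dated now (or in the future),
     * 0.5 for a document that is halfLifeDays old, 0.25 at twice that age, and so on.
     */
    static double boost(long docTimeMillis, long nowMillis, double halfLifeDays) {
        double ageDays = Math.max(0, nowMillis - docTimeMillis) / MILLIS_PER_DAY;
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    /** Combines a relevance score with the recency multiplier. */
    static double boostedScore(double relevance, long docTimeMillis, long nowMillis, double halfLifeDays) {
        return relevance * boost(docTimeMillis, nowMillis, halfLifeDays);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long thirtyDaysAgo = now - 30L * 24 * 60 * 60 * 1000;

        // Same relevance, different ages, assumed half-life of 30 days.
        System.out.println(boostedScore(2.0, now, now, 30.0));
        System.out.println(boostedScore(2.0, thirtyDaysAgo, now, 30.0));
    }
}
```

The half-life controls how strongly recency competes with relevance: a short half-life pushes new items to the top, while a long one leaves the original ranking mostly intact.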

Re: Giving boost to a more recent item while searching

2007-12-20 Thread prabin meitei

Hi, looking through older threads and doing some googling, I came across some code where boosting is done at indexing time using the time gap from the 'epoch' or a *base time*. What I am afraid of with this approach is that over a period of time the boosting factor may go up and I may lose the relevance f

Re: document deletion problem

2007-12-20 Thread Doron Cohen

On Dec 20, 2007 8:31 AM, Tushar B <[EMAIL PROTECTED]> wrote: > Hi Doron, > > Just filed an issue in JIRA. Thanks! > > > Here are the requested stats: > Index size-> around 11 million documents > Query -> fieldname:[009 TO 999] (using CSRQ) ConstantScoreRangeQuery, right? > > Result

Re: Giving boost to a more recent item while searching

2007-12-20 Thread Zhou Qi

Brian, can you briefly describe the method you tried? I am very interested in that. Jackson 2007/12/20, Brian Grimal <[EMAIL PROTECTED]>: > > I would love to revisit this one. I implemented pseudo date boosting in > an overly simplistic manner in my app, which I know can be improved > upon.

RE: Giving boost to a more recent item while searching

2007-12-20 Thread Brian Grimal

I would love to revisit this one. I implemented pseudo date boosting in an overly simplistic manner in my app, which I know can be improved upon. Might it be useful to reopen a thread on the topic? Brian -Original Message- From: prabin meitei <[EMAIL PROTECTED]> Sent: Wednesday, Dece


23 matches
