I would love to revisit this one. I implemented pseudo date boosting in an
overly simplistic manner in my app, which I know can be improved upon. Might
it be useful to reopen a thread on the topic?
Brian
-----Original Message-----
From: prabin meitei [EMAIL PROTECTED]
Sent: Wednesday,
Brian,
Could you briefly describe the method you tried? I am very interested in
it.
Jackson
2007/12/20, Brian Grimal [EMAIL PROTECTED]:
I would love to revisit this one. I implemented pseudo date boosting in
an overly simplistic manner in my app, which I know can be improved
upon.
On Dec 20, 2007 8:31 AM, Tushar B [EMAIL PROTECTED] wrote:
Hi Doron,
Just filed an issue in JIRA.
Thanks!
Here are the requested stats:
Index size- around 11 million documents
Query - fieldname:[009 TO 999] (using CSRQ)
ConstantScoreRangeQuery, right?
Result - 11475
Have a look at the FunctionQuery capabilities in Lucene, whereby you
can use the value of a Field as a scoring factor. So, your
FunctionQuery would just do a simple calculation between the current
time and whatever date is in the document.
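As an illustration of the kind of factor such a FunctionQuery could compute, here is a minimal standalone sketch of a recency boost, assuming the document's date is stored as epoch milliseconds; the class name and half-life parameter are made up for this example:

```java
// Hypothetical recency factor: 1.0 for a brand-new document, 0.5 at one
// half-life, tending toward 0 as the document ages. A FunctionQuery-style
// scorer could multiply the base score by this value.
class RecencyBoost {
    static final double MS_PER_DAY = 86400000.0;

    static double boost(long nowMillis, long docMillis, double halfLifeDays) {
        double ageDays = Math.max(0L, nowMillis - docMillis) / MS_PER_DAY;
        return halfLifeDays / (halfLifeDays + ageDays);
    }
}
```

A smooth decay like this avoids the hard cutoffs of bucketed "pseudo date boosting"; an exponential decay would serve equally well.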
-Grant
On Dec 20, 2007, at 8:03 AM, prabin meitei wrote:
Thanks Karl,
I would rather like to modify the lexer grammar, but where exactly is it
defined? After a quick look, it seems StandardTokenizerTokenManager.java
may be where it is done.
The ampersand having a decimal value of 38, I was assuming that the
following step is taken when
On 20 Dec 2007, at 20:21, [EMAIL PROTECTED] wrote:
I would rather like to modify the lexer grammar. But exactly where it is
defined. After having a quick look, seems like
StandardTokenizerTokenManager.java may be where it is being done.
I don't care about score, but I do care about the number of times a query
was hit within a document. Example:
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the quick brown fox jumped over the lazy dog
the slow
You can override the scoring system and only score by term frequency
(use a 1 or whatever creates a no-op for the other factors). If you have
indexed with norms then you will have to use a Reader that ignores them
to do this.
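A sketch of what those no-op overrides might look like. In real code this would extend Lucene's DefaultSimilarity and be installed via setSimilarity on the writer and searcher; it is shown here as a plain class so the arithmetic stands alone:

```java
// Term-frequency-only scoring: tf passes the raw frequency through and
// every other factor is neutralized by returning 1.0.
// (In Lucene these would override the matching DefaultSimilarity methods.)
class TfOnlySimilarity {
    float tf(float freq) { return freq; }
    float idf(int docFreq, int numDocs) { return 1.0f; }
    float coord(int overlap, int maxOverlap) { return 1.0f; }
    float queryNorm(float sumOfSquaredWeights) { return 1.0f; }
    float lengthNorm(String fieldName, int numTokens) { return 1.0f; }
}
```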
- Mark
Jeff wrote:
I don't care about score, but I do care about the # of times a query was
hit within a document.
Karl,
I should have mentioned before, I have Lucene 1.9.1.
In fact I had previously located the grammar in StandardTokenizer.jj (just
wasn't sure if that was the one you were talking about) and had commented
out EMAIL entries from all the following files:
StandardTokenizer.java
StandardTokenizer.jj
Hi Erick
Thanks. I found something interesting. I was indexing huge text files (2GB)
and the search was not returning escape characters. However, when I moved
the line to a smaller file (20MB), it works fine. Is there a limit on file
size searched by Lucene, or would you know how to escape
Lucene, by default, only indexes the first 10,000 tokens and throws
the rest away. You can change this via IndexWriter.setMaxFieldLength.
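Raising the cap is a one-line configuration change on the writer (a sketch assuming the Lucene 2.x API; the directory and analyzer setup are elided):

```java
// Allow more than the default 10,000 tokens per field before truncation.
IndexWriter writer = new IndexWriter(directory, analyzer, true);
writer.setMaxFieldLength(Integer.MAX_VALUE); // or a concrete higher limit
```

Bear in mind that raising the limit increases memory use during indexing in proportion to the largest document.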
2G is a huge file. Are you indexing all that or are you indexing only
portions?
Erick
On Dec 20, 2007 5:20 PM, Baljeet Dhaliwal [EMAIL PROTECTED] wrote:
Hi
Interesting. I am trying to make our logs searchable and thought of trying
Lucene. I am talking about several (around 50-60) 2GB files to index.
Would it scale? How can I index a portion of a document? Also, as with any
log, there is a pattern and most of the stuff in there is redundant. Can I
If I am not mistaken, that is for a term. Is it possible for a query? In
the example below, I don't want to know how many times "brown" is in the
document; I want to know how many times "quick brown" is in the document.
Thanks,
Jeff
On Dec 20, 2007 3:03 PM, Mark Miller [EMAIL PROTECTED] wrote:
You can override the scoring system and only score by term frequency.
On 20 Dec 2007, at 22:32, [EMAIL PROTECTED] wrote:
In fact I had previously located the grammar in StandardTokenizer.jj
(just wasn't sure if that was the one you were talking about) and had
commented out EMAIL entries from all the following files:
StandardTokenizer.java
StandardTokenizer.jj
I think you need to back up and think about what you're trying to
accomplish. Just throwing the file into a single document in
your index doesn't seem very useful.
Of course you can pre-process the input and index only what
you want. The examples in the Lucene demo just show
you how to index
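For the log case discussed above, pre-processing could mean splitting each file into one record per log entry and indexing each record as its own document instead of one 2GB document. A standalone sketch of the splitting step, assuming each entry begins with an ISO-style date (that pattern is an assumption; adjust it to the real log format):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Split raw log text into entries, where a new entry starts at any line
// beginning with "YYYY-MM-DD". Each returned string would then become a
// separate Lucene Document.
class LogSplitter {
    static final Pattern ENTRY_START = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}");

    static List<String> split(String logText) {
        List<String> entries = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (String line : logText.split("\n")) {
            // A timestamped line closes the previous entry and opens a new one.
            if (ENTRY_START.matcher(line).find() && current.length() > 0) {
                entries.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) entries.add(current.toString());
        return entries;
    }
}
```

For 50-60 files of this size you would stream each file rather than read it whole, but the per-entry grouping is the same.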
I actually hadn't implemented the TokenFilter solution before deciding not
to go with it, so I didn't have any benchmark.
But I eventually took care of this problem with a different variation of
your quick-and-dirty solution. I captured the character '@' in
Gotchya. Well, if you want to check a doc at a time you could use
getSpans for a SpanNearQuery and just count how many matches you get. No
ideas off the top of my head if you want the result as a score, so that
you get it for each hit in a search over the whole corpus.
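The per-document counting itself is straightforward once you have the matches. Here is a standalone illustration of what counting in-order, zero-slop span matches amounts to, operating on an already-tokenized document (Lucene's getSpans would supply the positions; this only shows the counting logic):

```java
// Count exact occurrences of a token phrase in a token sequence, i.e. what
// counting SpanNearQuery.getSpans() hits (slop 0, in order) yields per doc.
class PhraseCounter {
    static int count(String[] tokens, String[] phrase) {
        int hits = 0;
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) { match = false; break; }
            }
            if (match) hits++;
        }
        return hits;
    }
}
```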
- Mark
Jeff wrote:
If I am not mistaken, that is for a term. Is it possible for a query?
Hi all,
I am using Hibernate Search (http://www.hibernate.org/410.html), which is a
wrapper around Lucene, for performing search over info stored in the DB. I have
questions related to Lucene boosting vs. sorting:
Is index-time boosting of documents and fields better than specifying sorting
Hi, I am using Lucene for the very first time and want to manipulate the
results by adding some more factors to them. Which file should I edit to
manipulate the search results?
Thanks
Sumit Tyagi