Bill Janssen wrote:
> I'm not sure this solution is very robust
Thanks, but I'm pretty sure it *is* robust. Can you please offer a
specific critique? Always happy to learn and improve :-).
Try to see the behavior when you have a single-term query,
just something like: "robust
Hi guys,
Apologies.
Using Lucene search, if hits are to be acquired
for the following:
Search word = 'kids watches'
Hits on returned docs should include: kid's, kid watch, junior watches
Solutions please.
Thx in advance
WITH WARM REGARDS
WRT my blog post:
It seems the problem is that the distribution for lengthNorm() starts at
1 and moves down from there. 1.0f would work, but then HUGE documents
would no longer be normalized and so would distort the results.
What would you think of using this implementation for lengthNorm:
public float
Hello
I wrote the following test programs:
I index 150,000 documents in Lucene and I build each document using
this method.
private Document buildDocument(String documentID, String body)
{
    Document document = new Document();
    document.add(Field.Keyword("docID", documentID));
    document.add(Field.Text("body", body));
    return document;
}
On Wednesday 27 October 2004 22:47, Kevin A. Burton wrote:
> If the current behavior is all that happens this is fine... this way I
> can just get this behavior for new documents that are added.
You'll have to try it out, I'm not sure what exactly will happen.
> Also... why isn't this the default?
Hi,
I'm getting:
java.io.IOException: Lock obtain timed out
I have
a writer service that opens the index to delete and add docs. I have a reader
service that opens the index for searching only.
This error occurs when
the reader service opens the index (this takes about 500ms). Meanwhile
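One common workaround while the underlying contention is being sorted out is to catch the timeout and retry with a short pause. A minimal sketch, with a made-up IndexOp callback standing in for the real reader/writer open call (the interface and method names are assumptions, not Lucene API):

```java
import java.io.IOException;

public class RetryOpen {
    // Hypothetical shape of the operation to retry; in the real app this
    // would be "open the IndexReader/IndexWriter", which can fail with
    // "Lock obtain timed out" as an IOException.
    public interface IndexOp {
        void run() throws IOException;
    }

    // Retry op up to maxAttempts times, sleeping between attempts.
    // Returns the attempt number that succeeded, or -1 if all failed.
    public static int runWithRetry(IndexOp op, int maxAttempts, long sleepMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return attempt;
            } catch (IOException e) {
                // lock still held; fall through and retry
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```

This papers over short overlaps between the writer and reader services; it does not fix a lock that is never released.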
Can I give weights on different indexes when I search against multiple
indexes. The final score of a document should be a linear combination of
the weights on each index and the individual score for that index. Is
this possible in Lucene?
Thanks
Ravi.
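As far as I know, Lucene 1.4's MultiSearcher takes no per-index weights, so one workaround is to run the search against each index separately and apply the linear combination yourself. A minimal sketch of just the combining step (the surrounding per-index search code is omitted):

```java
public class WeightedCombine {
    // Hypothetical post-processing step: given one document's score from
    // each index and a weight per index, compute the final score as the
    // linear combination sum_i weights[i] * scores[i].
    public static float combine(float[] scores, float[] weights) {
        if (scores.length != weights.length) {
            throw new IllegalArgumentException("scores/weights length mismatch");
        }
        float total = 0f;
        for (int i = 0; i < scores.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }
}
```

Note that raw Lucene scores are not directly comparable across indexes, so the weights would have to absorb any normalization you need.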
Suggestions
[a]
Try invoking the VM w/ an option like "-XX:CompileThreshold=100" or even
a smaller number. This encourages the hotspot VM to compile methods
sooner, thus the app will take less time to "warm up".
http://java.sun.com/docs/hotspot/VMOptions.html#additional
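For example (the jar and main class names below are placeholders, not from the original mail):

```shell
# Lower the HotSpot compile threshold so methods are JIT-compiled after
# ~100 invocations instead of the default, shortening warm-up time.
java -XX:CompileThreshold=100 -cp app.jar com.example.SearchApp
```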
You might want to sea
> I'm not sure this solution is very robust
Thanks, but I'm pretty sure it *is* robust. Can you please offer a
specific critique? Always happy to learn and improve :-).
> I think I already sent an email with a better code...
Pretty vague. Can you send a URL for that message in the archives?
Daniel Naber wrote:
(Kevin complains about shorter documents ranked higher)
This is something that can easily be fixed. Just use a Similarity
implementation that extends DefaultSimilarity and that overwrites
lengthNorm: just return 1.0f there. You need to use that Similarity for
indexing and searching.
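To make the effect concrete, here are the two lengthNorm variants as plain functions. The 1/sqrt(numTerms) default is my understanding of Lucene 1.4's DefaultSimilarity; check the source for your version. In real code the flat variant would override lengthNorm() in a subclass of DefaultSimilarity, set on both the IndexWriter and the Searcher:

```java
public class LengthNormDemo {
    // Default behavior (assumed): fewer terms => larger norm => higher
    // score, which is why short documents rank above long ones.
    public static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The suggested override: a constant, so field length stops
    // influencing the ranking entirely.
    public static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }
}
```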
Hello,
I'm trying to use the highlighter from the sandbox, and I've got a
problem with some of the results I get from it.
Normally, when I search my index for e.g. "motor" I get
circa 150 results --> these results are OK.
But when I use the highlighter I get some results as
"null" values from the
your analyzer will have removed the stopword when you indexed your documents, so
lucene won't be able to do this for you.
You will need to implement a second pass over the results returned by lucene and
check to see if the stopword is included, perhaps with String.indexOf()
On Wed, 27 Oct 2004 1
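A minimal sketch of that second pass, assuming the hit text is available as plain strings (the class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseFilter {
    // Keep only hits whose text really contains the exact phrase,
    // stopwords included, using String.indexOf as suggested above.
    // Matching is case-insensitive via toLowerCase.
    public static List<String> exactPhraseOnly(List<String> hitTexts, String phrase) {
        List<String> kept = new ArrayList<String>();
        String needle = phrase.toLowerCase();
        for (String text : hitTexts) {
            if (text.toLowerCase().indexOf(needle) >= 0) {
                kept.add(text);
            }
        }
        return kept;
    }
}
```

This only works if the stored field contains the original, un-analyzed text of the document.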
On Oct 27, 2004, at 3:36 PM, Ravi wrote:
Is there way to include stopwords in an exact phrase search? For
example, when I search on "Melbourne IT", Lucene only searches for
Melbourne ignoring "IT".
But you want stop words removed for general term queries?
Have a look at how Nutch does its thing -
Is there way to include stopwords in an exact phrase search? For
example, when I search on "Melbourne IT", Lucene only searches for
Melbourne ignoring "IT".
Thanks,
Ravi.
On Wednesday 27 October 2004 20:20, Kevin A. Burton wrote:
> http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/
(Kevin complains about shorter documents ranked higher)
This is something that can easily be fixed. Just use a Similarity
implementation that extends DefaultSimilarity and that overwrites
lengthNorm: just return 1.0f there.
http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/
--
Use Rojo (RSS/Atom aggregator). Visit http://rojo.com. Ask me for an
invite! Also see irc.freenode.net #rojo if you want to chat.
Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested
You could always modify your own local copy if you want to change the
behavior of the parameter.
or just do:
IndexWriter w = new IndexWriter(indexDirectory,
                                new StandardAnalyzer(),
                                !IndexReader.indexExists(indexDirectory));
Wouldn't it make more sense if the IndexWriter constructor always created an
index when one doesn't exist, and the boolean parameter were named clear
(instead of create)?
So instead of this (from the javadoc):
IndexWriter
public IndexWriter(Directory d,
                   Analyzer a,
                   boolean create)
So, are you creating the indexes from inside the tomcat runtime, or are you creating
them on the command line (which would be in a different runtime than tomcat)?
What happens to tomcat? Does it hang - still running but not responsive? Or does it
crash?
If it hangs, maybe you are running ou
I would suggest that you create a lock file for your index writing
process, if the lock file is encountered close the IndexWriter until
the lock file is removed. After you create the lockfile, wait a few
seconds to make sure the writer process has quiesced, then create a
snapshot of the filesystem
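A minimal sketch of such a lock file, using plain java.io.File (the file name and the protocol details are assumptions, not an established convention):

```java
import java.io.File;
import java.io.IOException;

public class BackupLock {
    // Hypothetical lock-file protocol for the backup scheme above: the
    // writer process checks for this file and closes its IndexWriter
    // while it exists; the backup process creates it, waits for the
    // writer to quiesce, snapshots the files, then removes it.
    private final File lockFile;

    public BackupLock(File dir) {
        this.lockFile = new File(dir, "backup.lock"); // name is an assumption
    }

    // Returns true only if we created the lock, i.e. no backup was
    // already in progress. createNewFile is atomic on most filesystems.
    public boolean acquire() {
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            return false; // treat I/O failure as "could not lock"
        }
    }

    public boolean isLocked() {
        return lockFile.exists();
    }

    public void release() {
        lockFile.delete();
    }
}
```

Note this coordinates only between cooperating processes; it does not interact with Lucene's own write.lock.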
Aad,
D'oh, forgot to mention that mildly important info. Rather than
re-index, I am just creating a new index each time; this makes things easier
to roll back etc. (which is what my boss wants). The command line is
something like
I have wondered about whether sessions could be a problem, but
James,
How do you kick off your reindex? Could it be a session timeout?
cheers,
Aad
Hello,
I am working on Lucene and tried to understand the calculation of the score
value. As far as I understand it works as follows:
(1) idf = ln(numDocs/(docFreq+1))
(2) queryWeight = idf * boost
(3) sumOfSquaredWeights = queryWeight * queryWeight
(4) norm = 1/sqrt(sumOfSquaredWeights)
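Steps (1)-(4) can be written out in plain Java, following the formulas exactly as listed (whether your Lucene version computes idf exactly this way should be checked against its DefaultSimilarity source):

```java
public class ScoreMath {
    // (1) idf = ln(numDocs / (docFreq + 1))
    public static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1));
    }

    // (2)-(4) combined: queryWeight, sumOfSquaredWeights, norm.
    public static double norm(double idf, double boost) {
        double queryWeight = idf * boost;                        // (2)
        double sumOfSquaredWeights = queryWeight * queryWeight;  // (3)
        return 1.0 / Math.sqrt(sumOfSquaredWeights);             // (4)
    }
}
```

One observation that falls out of the formulas: for a single-term query, norm * queryWeight is exactly 1, so steps (2)-(4) only change the ranking once more than one query term is involved.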
Christiaan Fluit wrote:
I have no practical experience with backing up an online index, but I
would try to find out the details of the write lock mechanism used by
Lucene at the file level. You can then create a backup component that
write-locks the index and does a regular file copy of the index
Christoph Kiehl wrote:
I'm curious about your strategy to backup indexes based on FSDirectory.
If I do a file based copy I suspect I will get corrupted data because of
concurrent write access.
My current favorite is to create an empty index and use
IndexWriter.addIndexes() to copy the current index state.
Hi,
I'm curious about your strategy to backup indexes based on FSDirectory.
If I do a file based copy I suspect I will get corrupted data because of
concurrent write access.
My current favorite is to create an empty index and use
IndexWriter.addIndexes() to copy the current index state. But I'm
Hello,
I am a Java/Lucene/Tomcat newbie. I know that does not bode well as a start
to a post, but I really am in dire straits as far as Lucene goes, so bear with
me. I am working on indexing and replacing search functionality for a
website (about 10 gig in size, although only about 7 gig is indexed