Hi, please help me.
It's been a month since I started trying Lucene.
My requirements are huge: I have to index and search TBs of data.
I have questions regarding three topics:
1. Problem in indexing
As I need to index TBs of data, by googling and visiting different forums
I deployed the following fash
Might be nice to add a line of documentation to the highlighter on the
possible performance hit if one uses StandardAnalyzer, which is probably a
common case.
Thanks for the speedy response.
-M
On 7/18/07, Mark Miller <[EMAIL PROTECTED]> wrote:
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
limited by JavaCC speed. You cannot shave much more performance out of
the grammar as it is already about as simple as it gets. You should
first see if you can get away without it and use a different Analyzer,
or if you can re-
Hi all,
I was tracking down slowness in the contrib highlighter code and it seems
the seemingly simple tokenStream.next() is the culprit.
I've seen multiple posts about this being a possible cause. Has anyone
looked into how to speed up StandardTokenizer? For my
documents it's taking about 70ms p
Right , I was making a silly mistake there. I have it working now.
Thanks for the reply.
You can put lucene-queries-2.2.0.jar on your class path or your Eclipse
project build path. That's all you need.
Jay
I am using Lucene 2.1.0 and want to use MoreLikeThis for querying
documents. I understand that the jar file for the same is in contrib. I
have the contrib folder extracted, but am not sure what to do from this
point on. What jar file am I looking for, and where should I put it? I am
using Eclipse
Hi All,
I searched this forum for anybody who needed a previous() method in
TermEnum. I found only this link:
http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html#a189225
Would it be possible to implement a previous() method? I know I am asking for a
quick solution here
Hi,
I am trying to model a dictionary-type search in Lucene. My approach was
this:
- Load the dictionary file (words and their meanings) and index each
dictionary term and its associated meaning as a Lucene Document.
- Use IndexReader's terms() method to peek at the index and get the TermEnum.
TermEnum'
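The ordered, peek-from-a-term walk described above can be illustrated without Lucene at all. The sketch below (plain Java; the class and method names are my own, not Lucene API) keeps dictionary terms in a sorted map, where `tailMap` plays a role analogous to positioning a TermEnum at or after a given term:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: dictionary entries kept in sorted order, with a
// "terms at or after X" view, mimicking how IndexReader.terms(Term)
// positions a TermEnum at (or after) a given term.
public class DictionarySketch {
    private final TreeMap<String, String> entries = new TreeMap<>();

    public void add(String term, String meaning) {
        entries.put(term, meaning);
    }

    // All terms >= 'from', in sorted (index) order.
    public SortedMap<String, String> termsFrom(String from) {
        return entries.tailMap(from);
    }

    public static void main(String[] args) {
        DictionarySketch d = new DictionarySketch();
        d.add("lucene", "a search library");
        d.add("index", "a searchable structure");
        d.add("term", "a unit of indexed text");
        System.out.println(d.termsFrom("k").firstKey()); // prints "lucene"
    }
}
```

In real Lucene code the sorted-term iteration would of course come from the index itself rather than an in-memory map; this only shows the lookup pattern.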
Hey Guys,
I just checked my Lucene results. It shows a document containing the word
"change" when I am searching for "Chan", and it considers that a hit. Is
there a way to stop this and show only exact word matches?
I started using Lucene yesterday, so I am fairly new !
thanks
AZ
On 7/18/0
I don't think this is stored in the index.
I think the closest you can get is the "format" of the segments_N file
which changes every time the index file format changes. That at least
lets you narrow it down possibly to a single release if the file
format is changing frequently (eg it has in the
Are you sure that the hit wasn't on "w" or "kim"? The
default for searching is OR...
I recommend that you get a copy of Luke (google lucene luke)
which allows you to examine your index as well as see how
queries parse using various analyzers. It's an invaluable tool...
Best
Erick
On 7/18/07, As
Is there a way to tell which version of Lucene was used to build
an index?
-Akanksha
Hey folks,
I am a new Lucene user. I used the following after indexing:
search(searcher, "W. Chan Kim");
Lucene showed me hits on documents where the word "channel" existed. Notice
that "Chan" is a part of "Channel". How do I stop this?
I am keen to find the exact word.
I used the following, b
On Wednesday 18 July 2007 12:30, Cedric Ho wrote:
> Thanks for the quick response Paul =)
>
> However I am lost while looking at the surround package.
That is not really surprising, the code is factored to the bone, and it
is hardly documented.
You could have a look at the test code to start.
Al
Is there a way to know how big to make the array beforehand (how many terms
are in the topic in total)? I'm worried about the efficiency of this, since
I'd have to rebuild every document that is a "hit" on the fly to make a
snippet for each "hit" on the page (say 10 a page).
Now I have to wonder
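If the total term count isn't known beforehand, one way around pre-sizing the array (plain Java, independent of any Lucene API; the class and method names here are my own) is to collect into a growable list and convert to an array only at the end:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: collecting terms without knowing the count up front.
// ArrayList grows as needed (amortized O(1) per append), so no size
// guess is required; convert once, at the end, if an array is needed.
public class GrowableTerms {
    static String[] collect(Iterable<String> source) {
        List<String> terms = new ArrayList<>();
        for (String t : source) {
            terms.add(t);
        }
        return terms.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] terms = collect(java.util.Arrays.asList("w", "chan", "kim"));
        System.out.println(terms.length); // prints 3
    }
}
```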
When in doubt, WhitespaceAnalyzer is the most predictable. Note that
it doesn't lower-case the tokens though. Depending upon your
requirements, you can always pre-process your query and indexing
streams and do your own lowercasing and/or character stripping.
You can always create your own analyze
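To make the pre-processing idea above concrete, here is a minimal sketch (plain Java, not a Lucene Analyzer subclass; the regex and names are my own illustrative choices) that whitespace-splits like WhitespaceAnalyzer and does its own lowercasing and character stripping. The same function would have to be applied to both the indexing and the query streams:

```java
import java.util.Arrays;
import java.util.List;

public class PreProcessSketch {
    // Lower-case, strip non-alphanumeric characters, then split on
    // whitespace -- the "do your own lowercasing and/or character
    // stripping" step, applied identically at index and query time.
    static List<String> tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9\\s]", " ");
        return Arrays.asList(cleaned.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("W. Chan Kim")); // prints [w, chan, kim]
    }
}
```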
Which analyzer do I have to use to find text like this ''?
You could give this a shot (From my Qsol query parser):
package com.mhs.qsol.spans;
/**
* Copyright 2006 Mark Miller ([EMAIL PROTECTED])
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy
Thanks for the quick response Paul =)
However I am lost while looking at the surround package. Are you
suggesting I can solve my problem at hand using the surround package?
On 7/18/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Wednesday 18 July 2007 05:58, Cedric Ho wrote:
> Hi everybody,
>
>