Have you considered using bi-grams and tri-grams? It might be useful
indexing with NgramFilter and then searching for N-grams through the text.
You could also count the number of times a particular document contains
"Car Insurance Rate", for term frequency etc.
-Hemant
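A rough sketch of the word n-gram idea above, in plain Java with no Lucene-specific classes (the token list is made up for illustration): build bi-grams and tri-grams from a tokenized field and count how often each occurs, so a phrase like "car insurance rate" becomes a single countable unit.

    import java.util.*;

    List<String> tokens = Arrays.asList("cheap", "car", "insurance", "rate", "quote");
    Map<String, Integer> grams = new HashMap<>();
    for (int n = 2; n <= 3; n++) {                       // bi-grams and tri-grams
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            grams.merge(gram, 1, Integer::sum);          // term frequency of each n-gram
        }
    }
    // grams now maps e.g. "car insurance" -> 1 and "car insurance rate" -> 1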
I am deploying a web application serving searches on a Lucene index,
and am deciding between distributing searches across several machines
and searching on a single machine, and was hoping that someone could tell me from
their experiences:
+ Is there anything particular to watch out for when using distributed
search
Pradeep Sharma wrote:
> Still in the design phase, and I see that we need to manage several
> user / application specific configurations and I am exploring the idea
> of storing the configuration information also in the Index, maybe
> create a separate index just for the configuration, because
Peter,
CharTokenizer may be the cause of the problem.
It is the parent Tokenizer of WhitespaceTokenizer,
which is used by WhitespaceAnalyzer, and it
has a 255-byte buffer.
How about using KeywordAnalyzer instead of WhitespaceAnalyzer?
Thanks,
Koji
> -Original Message-
> From: [EMAIL PROTECTED]
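For reference, a minimal sketch of the suggested switch (package names as in recent Lucene releases, not necessarily the version in this thread; dir is an already-opened Directory): KeywordAnalyzer keeps the entire field value as one token instead of splitting it on whitespace.

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;

    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new KeywordAnalyzer()));
    // every field value indexed through this writer is kept as a single untokenized term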
I have just joined this user group, but I will probably be asking questions /
contributing for a while now, as I am starting to work on a product which will
use Lucene exclusively.
Still in the design phase, and I see that we need to manage several user /
application specific configurations
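As a hedged sketch of the separate-configuration-index idea (recent Lucene API, classes from org.apache.lucene.document; the field names and the configWriter opened on that separate index are made up for illustration):

    Document cfg = new Document();
    cfg.add(new StringField("configKey", "app-A/maxResults", Field.Store.YES)); // exact-match lookup key
    cfg.add(new StoredField("configValue", "100"));                             // opaque stored payload
    configWriter.addDocument(cfg);
    // later, look it up with new TermQuery(new Term("configKey", "app-A/maxResults"))
    // against the config index, keeping it separate from the main content index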
Hi Rajesh,
Thanks for the reply. I'll go ahead with the new method as you suggest.
- Original Message -
From: "Rajesh Munavalli" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, January 31, 2006 10:06 PM
Subject: Re: indexing whole harddrive
You have to recursively traverse the directories using something like... (in Java)
A word of caution about using synonyms alone:
(1) It would not be able to suggest terms like "home", "cheap", "company",
which are not synonyms of either of the terms "car" or "insurance".
(2) It would probably suggest terms like "machine" and "indemnity" (actual
synonyms for "car" and "insurance" retrieved from WordNet).
Hi Leon,
have you tried the WordNet add-on? You can easily expand the query with
synonyms.
-Original Message-
From: xing jiang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 31, 2006 19:03
To: java-user@lucene.apache.org
Subject: Re: Related searches
I think you should build
Jonathan,
What can I say, I'm feeling like an idiot now. Of course you're
right. This actually solves the issue ;)
Thanks, and sorry for wasting your time,
- Markus
Jonathan O'Connor wrote:
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g.
"er sucht etwas". Sadly, y
: Thanks for the information Chris, but I don't see a reference to
: ConstantScoreQuery or ConstantScoreRangeQuery in the 1.4.3 Lucene jar.
: Perhaps I'm not looking in the right place?
they didn't make it into the 1.4.3 release ... i'm not even 100% sure they
have been committed to the trunk yet
Thanks for the information Chris, but I don't see a reference to
ConstantScoreQuery or ConstantScoreRangeQuery in the 1.4.3 Lucene jar.
Perhaps I'm not looking in the right place?
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.ConstantScoreRangeQuery;
Tom
-
I have some really long chemical names that I am storing in an index and
it looks like they are being split into two terms. Is there a way to
increase the max term length?
Here is an example:
DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing
;hormonereleasing factor pig679010N
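If the split comes from the analyzer's maximum token length (StandardAnalyzer defaults to 255 characters in recent releases, which matches the buffer limit mentioned elsewhere in this digest), one hedged option is to raise it; the setter below exists in recent Lucene versions, not necessarily the one in use here:

    StandardAnalyzer analyzer = new StandardAnalyzer();
    analyzer.setMaxTokenLength(1000); // let very long chemical names survive as single terms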
I would suggest you look at papers on local/global document analysis. One
approach is to get a set of terms which co-occur with the query term,
say "insurance". From the initial query they select the top 'N' documents
and compute the co-occurrence of other terms (usually those having high
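A hedged sketch of that co-occurrence step with a Lucene 8.x-style API (the "body" field name, the searcher and reader variables, and the requirement that term vectors were enabled at index time are all assumptions):

    TopDocs top = searcher.search(new TermQuery(new Term("body", "insurance")), 20);
    Map<String, Integer> cooc = new HashMap<>();
    for (ScoreDoc sd : top.scoreDocs) {
        Terms tv = reader.getTermVector(sd.doc, "body"); // null unless term vectors were stored
        if (tv == null) continue;
        TermsEnum te = tv.iterator();
        for (BytesRef t = te.next(); t != null; t = te.next()) {
            cooc.merge(t.utf8ToString(), 1, Integer::sum);
        }
    }
    // rank cooc by count (ignoring stop words and "insurance" itself) to get suggestion candidates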
Jonathan O'Connor wrote:
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g.
"er sucht etwas". Sadly, you may be able to fix this one problem, but
there will be hundreds of other problems too. Stemmers are never
perfect. You just have to live with it.
Most users won't have a problem with that
I think you should build a domain-specific dictionary first. In it you
would say, for instance, "automobile = car". This approach can satisfy your
requirement; a small sketch follows the quoted message below.
On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote:
>
> Hi,
> Does anyone know if it is possible to show related searches with lucene,
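A minimal sketch of wiring such a hand-built dictionary into the analysis chain, using the synonym filter from recent Lucene versions (much newer than the release discussed here); the single entry mirrors the "automobile = car" example:

    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef("automobile"), new CharsRef("car"), true); // keep the original term too
    final SynonymMap map = builder.build();

    Analyzer analyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream result = new SynonymGraphFilter(source, map, true); // true = ignore case
            return new TokenStreamComponents(source, result);
        }
    };
    // queries analyzed with this analyzer treat "automobile" and "car" as equivalent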
You have to recursively traverse the directories using something like this (in
Java):

void indexDocs(File file) {
    if (file.isDirectory()) {                 // if a directory
        for (File child : file.listFiles())   // list its files
            indexDocs(child);                 // recursively index them
    } else {
        // otherwise index this single file with your IndexWriter
    }
}
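Called on the drive root, e.g. indexDocs(new File("C:/")), this walks every subdirectory and file beneath it.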
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it.
Most users won't have a problem with that
Hi,
I'm currently using the GermanStemmer and it works well. However, today
I've found two words which get stemmed to the same stem word.
"Suche" and "Sucht" both get stemmed to the same "such", it seems;
however, they have completely different meanings in German (Suche = the
search, Sucht = the addiction).
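A small hedged sketch for inspecting what the German analysis chain actually produces (recent Lucene API, org.apache.lucene.analysis.de.GermanAnalyzer; the field name "f" is arbitrary):

    try (Analyzer a = new GermanAnalyzer();
         TokenStream ts = a.tokenStream("f", "Suche Sucht")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints each stemmed token, so the collision is visible
        }
        ts.end();
    }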
Actually, the relevance is the primary sort, and the date is the secondary
sort. Still the same sort problem. Any help will be greatly appreciated.
~
Daniel Clark, Senior Consultant
Sybase Federal Professional Services
6550 Rock Spring Drive, Suite 800
Bet
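A hedged sketch of that ordering with a recent Lucene API (the "date" field name, its numeric doc-values type, and the searcher/query variables are assumptions): relevance as the primary key, the date as the tie-breaker.

    Sort sort = new Sort(
        SortField.FIELD_SCORE,                             // primary: relevance
        new SortField("date", SortField.Type.LONG, true)); // secondary: newest first
    TopDocs hits = searcher.search(query, 50, sort);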
My primary sort is by date and my secondary sort is by relevance score.
The Hits.getScore() method returns the score with 7 digits to the right of
the decimal point. Therefore, if I round to only 2 decimal places in the
display, the underlying 7-digit score used for sorting will differ from what is displayed.
Example:
Actually, I get the same result with CJKAnalyzer as with StandardAnalyzer.
Zsolt
>-Original Message-
>From: Ray Tsang [mailto:[EMAIL PROTECTED]
>Sent: Sunday, January 29, 2006 10:26 AM
>To: java-user@lucene.apache.org
>Subject: Re: Chinese support
>
>Zsolt,
>
>It's in the lucene trunk un
> When using the TermEnum method won't the terms be
> analyzed
Typically this doesn't matter, because "group fields"
tend to be things other than free text, e.g.
* Articles totalled by Year/Month
* Products totalled by category code
* Emails totalled by sender
If a group field's values aren't a st
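A hedged sketch of that kind of grouping count with a Lucene 8.x-style API ("category" is a made-up untokenized group field; reader is an open IndexReader): walk the field's terms and read the document frequency of each value.

    Terms terms = MultiTerms.getTerms(reader, "category");
    if (terms != null) {
        TermsEnum te = terms.iterator();
        for (BytesRef t = te.next(); t != null; t = te.next()) {
            System.out.println(t.utf8ToString() + " -> " + te.docFreq()); // documents per group value
        }
    }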
How can I index the whole hard drive? I tried using "c:/" but it didn't
work.
The results only return the c:/ directory, whereas I want it to index all
the subfolders as well as the other directories.
Azlan
When using the TermEnum method, won't the terms be analyzed, i.e. split
into single words and lowercased? Will this be a problem if your grouping
name is 2+ words, mixed case, etc.?
Mike
www.ardentia.com the home of NetSearch
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]