Re: Related searches

2006-01-31 Thread Hemant Joshi
Have you considered using bi-grams and tri-grams? It might be useful to index with NgramFilter and then search for N-grams in the text. You could also count the number of times a particular document contains "Car Insurance Rate" for term frequency, etc. -Hemant
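To make the bi-gram/tri-gram idea concrete, here is a minimal plain-Java sketch of word n-grams and phrase counting. It does not use Lucene or the NgramFilter mentioned above; the class and method names (WordNgrams, ngrams, phraseFrequency) are hypothetical, and splitting on whitespace is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: builds word n-grams from raw text.
class WordNgrams {

    // Returns every contiguous run of n words, lowercased and joined by single spaces.
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        String cleaned = text.trim().toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty() || n <= 0) {
            return out;
        }
        String[] words = cleaned.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return out;
    }

    // Counts how many times the phrase occurs as an n-gram of the text.
    static int phraseFrequency(String text, String phrase) {
        String needle = phrase.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        if (needle.isEmpty()) {
            return 0;
        }
        int n = needle.split(" ").length;
        int count = 0;
        for (String gram : ngrams(text, n)) {
            if (gram.equals(needle)) {
                count++;
            }
        }
        return count;
    }
}
```

Calling WordNgrams.ngrams(text, 2) or WordNgrams.ngrams(text, 3) gives the bi-grams or tri-grams of a document, and phraseFrequency gives a raw per-document count for a phrase such as "Car Insurance Rate".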

Distributed vs Merged Searching

2006-01-31 Thread Chun Wei Ho
I am deploying a web application serving searches on a Lucene index, and am deciding between distributing search across several machines and single searching. I was hoping that someone could tell me from their experience: + Is there anything particular to watch out for if using distributed sear

Re: Greetings and my first question - Is it a good practise to store application configuration in Lucene

2006-01-31 Thread Daniel Noll
Pradeep Sharma wrote: > Still in the designing phase, and I see that we need to manage several user / application specific configurations and I am exploring the idea of storing the configuration information also in the Index, maybe create a separate index just for the configuration, because

RE: maximum string length in index field

2006-01-31 Thread Koji Sekiguchi
Peter, CharTokenizer may be the cause of the problem. It is the parent Tokenizer of WhitespaceTokenizer, which is used by WhitespaceAnalyzer, and it has a 255-character buffer. How about using KeywordAnalyzer instead of WhitespaceAnalyzer? Thanks, Koji > -----Original Message----- > From: [EMAIL PROTE

Greetings and my first question - Is it a good practise to store application configuration in Lucene

2006-01-31 Thread Pradeep Sharma
I have just joined this user group, but I probably will be asking questions / contributing for a while now as I am starting to work on a product which will use Lucene exclusively. Still in the designing phase, and I see that we need to manage several user / application specific configurations

Re: indexing whole harddrive

2006-01-31 Thread Azlan Abdul Latiff
Hi Rajesh, thanks for the reply. I'll go ahead with the new method as you suggest. ----- Original Message ----- From: "Rajesh Munavalli" <[EMAIL PROTECTED]> To: Sent: Tuesday, January 31, 2006 10:06 PM Subject: Re: indexing whole harddrive You have to recursively traverse the directories usi

Re: Related searches

2006-01-31 Thread Rajesh Munavalli
A word of caution about using synonyms alone: (1) it would not be able to suggest terms like "home", "cheap", "company", which are not synonyms of either of the terms "car", "insurance"; (2) it would probably suggest terms like "machine" and "indemnity" (actual synonyms for "car" and "insurance" retrieved fro

Re: Related searches

2006-01-31 Thread Klaus
Hi Leon, have you tried the WordNet add-on? You can easily expand the query with synonyms. -----Original Message----- From: xing jiang [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 31, 2006 19:03 To: java-user@lucene.apache.org Subject: Re: Related searches I think you should build

Re: Stemming german words

2006-01-31 Thread Markus Fischer
Jonathan, what should I say, I'm feeling like an idiot now. Of course you're right. This actually solves the issue ;) thanks and sorry for wasting time, - Markus Jonathan O'Connor wrote: Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, y

RE: Number Searches vs Character

2006-01-31 Thread Chris Hostetter
: Thanks for the information Chris, but I don't see a reference to : ConstantScoreQuery or ConstantScoreRangeQuery in the 1.4.3 Lucene jar. : Perhaps I'm not looking in the right place? they didn't make it into the 1.4.3 release ... i'm not even 100% sure they have been committed to the trunk yet

RE: Number Searches vs Character

2006-01-31 Thread Aigner, Thomas
Thanks for the information Chris, but I don't see a reference to ConstantScoreQuery or ConstantScoreRangeQuery in the 1.4.3 Lucene jar. Perhaps I'm not looking in the right place? import org.apache.lucene.search.ConstantScoreQuery; import org.apache.lucene.search.ConstantScoreRangeQuery; Tom -

maximum string length in index field

2006-01-31 Thread Peter.Kipping
I have some really long chemical names that I am storing in an index and it looks like they are being split into two terms. Is there a way to increase the max term length? Here is an example: DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing hormonereleasing factor pig679010N

Re: Related searches

2006-01-31 Thread Rajesh Munavalli
I would suggest looking at papers on local/global document analysis. One approach is to get a set of terms which co-occur with the query term, say "insurance". From the initial query they select the top 'N' documents and compute the co-occurrence of other terms (usually those having high
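A rough plain-Java sketch of the co-occurrence idea, under stated assumptions: the top 'N' documents are passed in as plain strings, terms are split on non-word characters, and each term is scored by the number of those documents in which it appears alongside the query term. The term-selection criterion in the message is cut off in the archive, so this raw document count is only a stand-in. The names CoOccurrence, coOccurring and topRelated are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: suggests terms that co-occur with a query term.
class CoOccurrence {

    // For each other term, counts the documents that contain both it and queryTerm.
    static Map<String, Integer> coOccurring(List<String> topDocs, String queryTerm) {
        String query = queryTerm.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : topDocs) {
            Set<String> terms = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\W+")));
            terms.remove(""); // split can yield an empty leading token
            if (!terms.contains(query)) {
                continue; // only documents that mention the query term count
            }
            for (String term : terms) {
                if (!term.equals(query)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the k terms with the highest counts; ties are broken alphabetically.
    static List<String> topRelated(Map<String, Integer> counts, int k) {
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return terms.subList(0, Math.min(k, terms.size()));
    }
}
```

In practice the input list would be the text of the top-ranked hits for the original query, and the returned terms would be candidates for "related searches".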

Re: Stemming german words

2006-01-31 Thread Stefan Gusenbauer
Jonathan O'Connor wrote: Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it. Most users wo

Re: Related searches

2006-01-31 Thread xing jiang
I think you should build a type of domain specific dictionary first. You should say, for instance, "automobile = car". This approach can satisfy your requirement. On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote: > > Hi, > Does anyone know if it is possible to show related searches with lucene,

Re: indexing whole harddrive

2006-01-31 Thread Rajesh Munavalli
You have to recursively traverse the directories using something like... (in Java)
    void indexDocs(File file) {
      if (file.isDirectory()) {                  // if a directory
        String[] files = file.list();            // list its files
        for (int i = 0; i < files.length; i++)   // recursively index them
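The snippet above is cut off in the archive. A self-contained sketch of the same recursive traversal, using only java.io.File, might look like the following. It just collects the files it finds; the indexing step itself is left as a comment, and the names DirectoryWalker and collectFiles are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks a directory tree and gathers every regular file.
class DirectoryWalker {

    // Returns all regular files under root, descending into subdirectories.
    static List<File> collectFiles(File root) {
        List<File> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    private static void walk(File file, List<File> found) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // unreadable directory (e.g. no permission): skip it
            }
            for (File child : children) {
                walk(child, found); // recurse into each entry
            }
        } else if (file.isFile()) {
            found.add(file); // an indexer would add a document for this file here
        }
    }
}
```

The null check matters when starting from a drive root such as "c:/", since listFiles() returns null for directories that cannot be read.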

Re: Stemming german words

2006-01-31 Thread Jonathan O'Connor
Markus, As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er sucht etwas". Sadly, you may be able to fix this one problem, but there will be hundreds of other problems too. Stemmers are never perfect. You just have to live with it. Most users won't have a problem with tha

Stemming german words

2006-01-31 Thread Markus Fischer
Hi, I'm currently using the GermanStemmer and it works well. However, today I found two words which get stemmed to the same stem. "Suche" and "Sucht" both get stemmed to the same "such" it seems, however they have completely different meanings in German (Suche = the Search, Sucht => ad

Re: Sorting

2006-01-31 Thread Daniel . Clark
Actually, the relevance is the primary sort, and the date is the secondary sort. Still the same sort problem. Any help will be greatly appreciated. ~ Daniel Clark, Senior Consultant Sybase Federal Professional Services 6550 Rock Spring Drive, Suite 800 Bet

Sorting

2006-01-31 Thread Daniel . Clark
My primary sort is by date and my secondary sort is by relevance score. The Hits.getScore() method returns the score with 7 digits to the right of the decimal point. Therefore, if I round to only 2 decimal places in the display, the underlying 7-digit score will be different in the sort. Example:
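One possible way around the mismatch between displayed and sorted scores, sketched in plain Java rather than with Lucene's own sorting classes: round the score to two decimals before comparing, so that hits which look tied in the display really are tied, and then fall back to the date. This follows the ordering given in the follow-up message (relevance first, date second). The Result class, its fields, and the newest-first date order are assumptions made for illustration.

```java
import java.util.Comparator;

// Hypothetical sketch: sort by the score as displayed (2 decimals), then by date.
class RoundedScoreSort {

    // Minimal stand-in for a search hit; not a Lucene class.
    static final class Result {
        final String id;
        final float score;
        final long dateMillis;

        Result(String id, float score, long dateMillis) {
            this.id = id;
            this.score = score;
            this.dateMillis = dateMillis;
        }
    }

    // Score rounded to two decimals, held as an integer number of hundredths.
    static int roundedScore(float score) {
        return Math.round(score * 100f);
    }

    // Highest rounded score first; among equal rounded scores, newest date first.
    static final Comparator<Result> BY_ROUNDED_SCORE_THEN_DATE =
            Comparator.comparingInt((Result r) -> roundedScore(r.score)).reversed()
                    .thenComparing(
                            Comparator.comparingLong((Result r) -> r.dateMillis).reversed());
}
```

Sorting a list with results.sort(RoundedScoreSort.BY_ROUNDED_SCORE_THEN_DATE) then orders hits by the same two-decimal value that is shown to the user.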

RE: Chinese support

2006-01-31 Thread Zsolt
Actually I get the same result with CJKAnalyzer as with StandardAnalyzer. Zsolt >-----Original Message----- >From: Ray Tsang [mailto:[EMAIL PROTECTED] >Sent: Sunday, January 29, 2006 10:26 AM >To: java-user@lucene.apache.org >Subject: Re: Chinese support > >Zsolt, > >It's in the lucene trunk un

RE: grouping results by fields

2006-01-31 Thread mark harwood
> When using the TermEnum method won't the terms be analyzed
Typically this doesn't matter because "group fields" tend to be things other than free text, e.g.
* Articles totalled by Year/Month
* Products totalled by category code
* Emails totalled by sender
If a group field's values aren't a st

indexing whole harddrive

2006-01-31 Thread Azlan Abdul Latiff
How can I index the whole hard drive? I tried using "c:/" but it didn't work. The results only return the c:/ directory, whereas I want it to index all the subfolders as well as the other directories. Azlan

RE: grouping results by fields

2006-01-31 Thread Mike Streeton
When using the TermEnum method won't the terms be analyzed, i.e. split into single words and lowercased? Will this be a problem if your grouping name is 2+ words, mixed case, etc.? Mike www.ardentia.com the home of NetSearch -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTE