Re: NGram Language Categorization Source

2005-08-22 Thread Andrzej Bialecki
Kevin Burton wrote: A lot depends on the reference profiles (which in turn depend on the quality of your training corpus; in this case, your corpus is not the best choice, because each text contains a lot of foreign words). I realize that my corpus isn't the best. That's one of the reasons

Store index in database

2005-08-22 Thread Ivan Frade Ortea
Hello, I'm working on a big J2EE application with some documents (approx. 5,000) in a database. Right now I'm indexing them without any problem using Lucene with FSDirectory. But using files in a J2EE app is not a "good-design" choice, so I was thinking of storing the index in the database (using JDBCDirectory

Re: Store index in database

2005-08-22 Thread Chris Lu
1. Storing the index in a database will definitely slow down searching if you use JDBCDirectory alone. 2. You can try to read the index from the database into a RAMDirectory. What do you mean by "good-design"? You dislike the filesystem? :) I guess you don't want to have several copies of the index as your system grows.

setPhraseSlop return the same results irrespective of int parameter

2005-08-22 Thread Anil Kumar E D
In reference to bug no. 36296 http://issues.apache.org/bugzilla/show_bug.cgi?id=36296 Hi Erik, Thanks for the reply. The query's toString println output was toString : data:dhotre data:anil Then I changed my query text to include escaped (") quotes, something like this: search("\"Dhotre Anil\""); Now the re

Re: setPhraseSlop return the same results irrespective of int parameter

2005-08-22 Thread Erik Hatcher
How have you indexed the "data" field, and what is DEFAULT_ANALYZER? Erik On Aug 22, 2005, at 7:44 AM, Anil Kumar E D wrote: In reference to bug no. 36296 http://issues.apache.org/bugzilla/show_bug.cgi?id=36296 Hi Erik, Thanks for the reply. The query's toString println output was toString : data:d

UpdateIndex

2005-08-22 Thread dozean
Hi, I wrote an index update in which the IndexReader first deletes all changed files from the index and then adds the documents that are not yet in the index. The deletion alone takes very long, because I have two "for" loops! file = array with all files in a directory for (int i = 0; i < Integer.parseInt(r

RE: UpdateIndex

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
In your approach, you are reading all the documents in your index. You should query the index for the file name instead of reading the entire index for each file. HTH Aviran http://www.aviransplace.com
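
The nested-loop versus keyed-lookup difference can be sketched without Lucene. In this minimal sketch the index is modeled as a hypothetical map from full path to the modification time recorded at indexing time (full path rather than file name, since names can repeat across directories). `IndexDiff` and `compute` are illustrative names, not Lucene API. Each file on disk costs one lookup instead of a scan over every indexed document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the index: full path -> last-modified time stored at
// indexing time. An illustration of keyed lookup, not Lucene API.
class IndexDiff {
    final List<String> changed = new ArrayList<>(); // delete, then re-add
    final List<String> added = new ArrayList<>();   // not in the index yet

    // One pass over the files on disk with one keyed lookup each, instead of
    // scanning every indexed document for every file (two nested loops).
    static IndexDiff compute(Map<String, Long> onDisk, Map<String, Long> indexed) {
        IndexDiff diff = new IndexDiff();
        for (Map.Entry<String, Long> file : onDisk.entrySet()) {
            Long indexedTime = indexed.get(file.getKey());
            if (indexedTime == null) {
                diff.added.add(file.getKey());
            } else if (!indexedTime.equals(file.getValue())) {
                diff.changed.add(file.getKey());
            }
        }
        return diff;
    }
}
```

In Lucene 1.4 the deletion step for each changed path would then be something like `reader.delete(new Term("path", path))` against an untokenized Keyword field; the field name "path" is an assumption.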

RE: UpdateIndex

2005-08-22 Thread dozean
Yes, that is a good idea, but I have the following problem with doing the update that way: I cannot query the index for the file name, because I may have many files with the same name in different directories. So I have to query the index for the path! I store the path in a Keyword fi

RE: UpdateIndex

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
You might be doing something wrong; you shouldn't have any problem searching on keywords. Note that Keyword fields are case-sensitive, so you need to write your search term EXACTLY as it was indexed (case, spaces and all). HTH Aviran http://www.aviransplace.com

Re: Case-sensitive search

2005-08-22 Thread tareque
On Aug 18, 2005, at 6:22 PM, [EMAIL PROTECTED] wrote: On Thu, 2005-08-18 at 17:16, [EMAIL PROTECTED] wrote: Thanks again! The analyzer is working now. But it seems like the QueryParser I am using is actually converting the queries to lowercase

RE: Case-sensitive search

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
You'll need two fields in your index: one case-sensitive and one case-insensitive. HTH Aviran http://www.aviransplace.com Is there any way to index as case-sensitive and then, at search time, make the search case-sensitive or case-insensitive using the same index as needed?
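
A toy inverted index shows the two-field idea: every token is stored once as written and once lowercased, and the caller picks the field per query. The class, its two maps, and the whitespace tokenizer are assumptions for illustration, not Lucene classes; in Lucene the two maps would correspond to two fields indexed with different analyzers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index with two hypothetical "fields" per document:
// "exact" keeps the original case, "lower" stores lowercased tokens.
class TwoFieldIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> lower = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue; // leading whitespace yields ""
            exact.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
            lower.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(docId);
        }
    }

    // The caller toggles case sensitivity per query by choosing the field.
    Set<Integer> search(String term, boolean caseSensitive) {
        Set<Integer> hits = caseSensitive
                ? exact.get(term)
                : lower.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new HashSet<>() : new HashSet<>(hits);
    }
}
```

After adding document 1 "Java Virtual Machine" and document 2 "java virtual machine", a case-sensitive search for "Java" returns only document 1, while the case-insensitive search returns both. The cost is storing every term twice.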

Re: Case-sensitive search

2005-08-22 Thread Erik Hatcher
On Aug 22, 2005, at 10:40 AM, [EMAIL PROTECTED] wrote: Is there any way to index as case-sensitive and then, at search time, make the search case-sensitive or case-insensitive using the same index as needed? Not really. Terms in the index are ordered lexicographically, including case
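
Erik's point about ordering can be seen with Java's natural `String` ordering, which compares UTF-16 code units, so every ASCII uppercase letter sorts before every lowercase one. A minimal sketch, assuming the term dictionary is sorted the same way: "Java" and "java" end up separated by unrelated terms, so a case-insensitive lookup cannot simply walk one contiguous range. `TermOrder` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TermOrder {
    // Sorts with String's natural ordering (UTF-16 code unit comparison),
    // under which 'A'..'Z' (65..90) come before 'a'..'z' (97..122).
    static List<String> sorted(String... terms) {
        List<String> list = new ArrayList<>(Arrays.asList(terms));
        Collections.sort(list);
        return list;
    }
}
```

For example, `TermOrder.sorted("java", "Java", "apple", "JVM")` returns `[JVM, Java, apple, java]`: the two spellings of "java" are split by "apple".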

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
You could also treat the case-sensitive and case-insensitive forms as synonyms and index them at the same position. This would be helpful for phrase queries. Rajesh Munavalli
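
Rajesh's synonym idea amounts to emitting the lowercased form of each word with a position increment of 0, so it shares a position with the original. The sketch below models the token stream as plain objects; `CaseSynonyms` and its `Token` class are hypothetical, and in Lucene the same effect would presumably come from a custom TokenFilter that sets the increment on the injected token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical token-stream sketch: each word is emitted with position
// increment 1, and its lowercased form (when different) with increment 0,
// so both variants occupy the same position like synonyms.
class CaseSynonyms {
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/+" + positionIncrement;
        }
    }

    static List<Token> expand(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            out.add(new Token(word, 1));
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.equals(word)) {
                out.add(new Token(lower, 0)); // same position as the original
            }
        }
        return out;
    }
}
```

`CaseSynonyms.expand("Java virtual Machine")` prints as `[Java/+1, java/+0, virtual/+1, Machine/+1, machine/+0]`; "virtual" yields a single token because it is already lowercase.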

Re: Case-sensitive search

2005-08-22 Thread Erik Hatcher
On Aug 22, 2005, at 11:10 AM, Rajesh Munavalli wrote: You could also treat the case-sensitive and case-insensitive as Synonyms and index them at the same position. This would be helpful in phrase queries. You wouldn't be able to selectively toggle between case-sensitive and -insensitive se

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
At query time I was thinking of two queries ORed together: one with the user-entered query and the other a case-insensitive query. For example, the user query "Java Virtual machine" would be translated into "Java Virtual machine" OR "java virtual machine". Even though the user mistyped the case ("

Re: Case-sensitive search

2005-08-22 Thread tareque
Yeah, since I will need to toggle selectively, it may not work. Also, as for writing customized query subclasses, the only way I can think of is to create several queries with all possible uppercase/lowercase combinations of the main query, which may really degrade performance. Creating two different i

Re: UpdateIndex

2005-08-22 Thread Otis Gospodnetic
Yes, this is not how you should do it. Use the reader.delete(Term) method to delete documents: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#delete(org.apache.lucene.index.Term) Otis

Query Parser custom analyzer question

2005-08-22 Thread Dan Armbrust
I have a custom Analyzer which performs normalization on all of the terms as they pass through. It does normalization like the following: trees -> tree Sometimes my normalizer returns multiple words for a normalization - for example: leaves -> leaf leave The second and all subsequent terms

Re: Query Parser custom analyzer question

2005-08-22 Thread Daniel Naber
On Monday 22 August 2005 21:54, Dan Armbrust wrote: > The problem I am having now is that the QueryParser seems to ignore the positionIncrement values. Correct handling of multiple terms per position was only added to SVN; it's not part of Lucene 1.4.3. Regards Daniel -- http://www.danieln
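
What "honoring position increments" means for a parser can be sketched with a simplified model. The hypothetical normalization table below reuses the post's examples, gives the first normalized term an increment of 1 and any extra terms an increment of 0, and then groups tokens into phrase positions. This is not Dan's analyzer or Lucene's QueryParser, and it assumes increments are only ever 0 or 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PositionGrouping {
    // A token and its position increment relative to the previous token.
    static final class Token {
        final String term;
        final int increment;

        Token(String term, int increment) {
            this.term = term;
            this.increment = increment;
        }
    }

    // Hypothetical normalization table using the examples from the post.
    static final Map<String, List<String>> NORMALIZE = Map.of(
            "trees", List.of("tree"),
            "leaves", List.of("leaf", "leave"));

    // First normalized term of a word gets increment 1; extra terms get 0.
    static List<Token> tokens(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            List<String> terms = NORMALIZE.getOrDefault(word, List.of(word));
            for (int i = 0; i < terms.size(); i++) {
                out.add(new Token(terms.get(i), i == 0 ? 1 : 0));
            }
        }
        return out;
    }

    // Groups tokens into positions (assumes increments of only 0 or 1).
    // honorIncrements=false mimics a parser that treats every token as a
    // new position, turning "leaf leave" into two consecutive phrase terms.
    static List<List<String>> positions(List<Token> tokens, boolean honorIncrements) {
        List<List<String>> slots = new ArrayList<>();
        for (Token t : tokens) {
            if (slots.isEmpty() || !honorIncrements || t.increment > 0) {
                slots.add(new ArrayList<>());
            }
            slots.get(slots.size() - 1).add(t.term);
        }
        return slots;
    }
}
```

For "fallen leaves", honoring increments gives `[[fallen], [leaf, leave]]`, while ignoring them gives `[[fallen], [leaf], [leave]]`, a three-term phrase that text indexed with "leaf" and "leave" at one position would not match.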

Re: Case-sensitive search

2005-08-22 Thread Erik Hatcher
On Aug 22, 2005, at 11:43 AM, Rajesh Munavalli wrote: At query time I was thinking of two queries ORed together: one with the user-entered query and the other a case-insensitive query. For example, the user query "Java Virtual machine" would be translated into "Java Virtual machine" OR "java

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
If the original document contains the case-sensitive form, that document will be retrieved with a higher score. For example: Document 1: "Java Virtual Machine is " Document 2: "java virtual machine is " Index contents of Document 1: Java java Virtual virtual Machine machine Is Index con
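
Rajesh's scoring argument can be checked with a toy model: each indexed position holds the word as written plus its lowercased form, and the query is the phrase as typed OR its lowercased version. The sketch only counts how many OR clauses a document matches. Lucene's real score also involves term statistics and normalization, so treat the count as the intuition behind "higher score" (matching more clauses of a boolean query raises its coordination factor), not as a computed score. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model: each position holds the original word and its lowercase form.
class OrCaseQuery {
    static List<Set<String>> indexPositions(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            Set<String> slot = new LinkedHashSet<>();
            slot.add(word);
            slot.add(word.toLowerCase(Locale.ROOT)); // no-op if already lowercase
            positions.add(slot);
        }
        return positions;
    }

    // Exact phrase match: phrase[k] must appear at position start + k.
    static boolean phraseMatches(List<Set<String>> doc, String[] phrase) {
        for (int start = 0; start + phrase.length <= doc.size(); start++) {
            boolean all = true;
            for (int k = 0; k < phrase.length && all; k++) {
                all = doc.get(start + k).contains(phrase[k]);
            }
            if (all) return true;
        }
        return false;
    }

    // Number of OR clauses ("as typed", "lowercased") the document matches.
    static int matchedClauses(String docText, String userQuery) {
        List<Set<String>> doc = indexPositions(docText);
        String[] asTyped = userQuery.trim().split("\\s+");
        String[] lowered = userQuery.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int count = 0;
        if (phraseMatches(doc, asTyped)) count++;
        if (phraseMatches(doc, lowered)) count++;
        return count;
    }
}
```

With the query "Java Virtual machine", Document 1 ("Java Virtual Machine is") matches both clauses, while Document 2 ("java virtual machine is") matches only the lowercased one.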

1.9 official betas WAS: Query Parser custom analyzer question

2005-08-22 Thread Dan Armbrust
Daniel Naber wrote: Correct handling of multiple terms per position was only added to SVN; it's not part of Lucene 1.4.3. Regards Daniel Cool, is there a daily build somewhere, or do I have to roll my own? I couldn't find a daily build or a 1.9 alpha, beta, etc. on the site. Any idea whe

Re: Indexing document instances and retrieving instance attributes

2005-08-22 Thread Chris D
On 8/18/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > Chris D wrote: > > Well in my case field order is important, but the order of the > > individual fields isn't. So I can speed up getFields to roughly O(1) > > by implementing Document as follows. > > Have you actually found getFields to be a pe
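
One way to get roughly O(1) field lookup while keeping the order of field names is a map from field name to its values, with `LinkedHashMap` preserving first-insertion order. This is a hypothetical stand-in, not Lucene's `Document` and not Chris's implementation. It drops the interleaving of same-named fields (added as title, body, title, the names come back as title, body), which is acceptable only if the order of the individual fields really doesn't matter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document whose fields are looked up by name. A list-backed
// document scans every field on each getFields(name) call; here a
// LinkedHashMap gives average O(1) lookup by name while still iterating
// field names in the order they were first added.
class MapDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> getFields(String name) {
        List<String> values = fields.get(name);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }

    List<String> fieldNames() {
        return new ArrayList<>(fields.keySet());
    }
}
```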

Re: 1.9 official betas WAS: Query Parser custom analyzer question

2005-08-22 Thread Daniel Naber
On Monday 22 August 2005 22:46, Dan Armbrust wrote: > Cool - is there a daily build somewhere, or do I have to roll my own? I couldn't find a daily build or a 1.9 alpha, beta, etc. on the site. You need to get it from SVN and then build it yourself. > Any idea when 1.9 might be released, even

Re: AW: How does Lucene to compute score ?

2005-08-22 Thread Will (sent by Nabble.com)
Hey guys, here is exactly what you want. Check out this searchable archive hosted by Nabble at http://www.nabble.com/Lucene-f44.html; it archives all Lucene mailing lists into a forum, and you can search across all of them or drill down and search a single list. You can also narrow the search by author, sort

Re: UpdateIndex

2005-08-22 Thread Ray Tsang
This may be off-topic, but I made something that updates indices and works like the following; I wonder if anybody has had the same idea. I found something like IndexAccessControl on the mailing list before. An implementation of the following uses IAC. ManagedIndex index = ManagedIndex.getInstanc