RE: how to enhance speed of sorted search

2006-09-25 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK when you sort Lucene does not calculate the relevance score. Aviran http://www.aviransplace.com -Original Message- From: Yura Smolsky [mailto:[EMAIL PROTECTED] Sent: Monday, September 25, 2006 4:39 AM To: java-user@lucene.apache.org Subject: how to enhance speed of sorted search

RE: Performance in having Multiple Index files

2007-03-01 Thread Mordo, Aviran (EXP N-NANNATEK)
Yes, it will affect the search performance because you need to merge the results from the different indexes. The best performance is from a single index. The more indexes you have the more time it takes to search. Aviran http://www.aviransplace.com -Original Message- From: Raaj [mailto:[

RE: How can I use SortComparator in my case?

2007-03-02 Thread Mordo, Aviran (EXP N-NANNATEK)
You'll need to do it manually and not with Lucene. Just grab all the results from Lucene and process them yourself. Aviran http://aviransplace.com -Original Message- From: Ramana Jelda [mailto:[EMAIL PROTECTED] Sent: Friday, March 02, 2007 5:45 AM To: java-user@lucene.apache.org Subjec

Language detection library

2007-05-03 Thread Mordo, Aviran (EXP N-NANNATEK)
Does anyone know of a good language detection library that can detect what language a document (text) is in? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

RE: Language detection library

2007-05-04 Thread Mordo, Aviran (EXP N-NANNATEK)
/service is free. Nutch language id plugin. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]> To: java-user@lucene

RE: Implement a tokenizer

2007-05-21 Thread Mordo, Aviran (EXP N-NANNATEK)
What you need to do is to create your own tokenizer. Just copy the code from the StandardTokenizer to your XYZTokenizer and make your changes. Then you need to create your own Analyzer class (again, copy the code from the StandardAnalyzer) and use your XYZTokenizer in the new XYZAnalyzer you create

RE: Searching on a Rapidly changing Index

2007-05-24 Thread Mordo, Aviran (EXP N-NANNATEK)
You can create two indexes. One will be for new documents, let's say the last 24 hours, and another one for older documents. This way you will only update a small portion of your index while the large index will remain relatively constant so you don't have to get a new searcher for it. HTH Aviran ht

RE: Lucene code injection?

2007-05-24 Thread Mordo, Aviran (EXP N-NANNATEK)
This sounds good. As for the code injection it is up to you to sanitize the request before it goes to lucene, probably by filling the email field yourself and not rely on the user input for the email address. HTH Aviran http://www.aviransplace.com http://shaveh.co.il -Original Message-

RE: adding/searching documents during optimize

2007-05-29 Thread Mordo, Aviran (EXP N-NANNATEK)
1. Yes, it is safe to search while optimizing and adding documents to an index. 2. No, you cannot add documents to an index while it is being optimized. You can only have one instance of IndexWriter working on an index. HTH Aviran http://www.aviransplace.com http://shaveh.co.il -Original Message--

RE: searching for empty field

2007-06-11 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK there is no straightforward way of doing that; however, you can create another field (field4) that indicates whether field2 exists. HTH Aviran http://www.aviransplace.com http://shaveh.co.il -Original Message- From: Dino [mailto:[EMAIL PROTECTED] Sent: Monday, June 11, 2007 9:54 AM To: java

Content Summarization

2007-06-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Does anyone know of a content summarization library? I need to display a summarized version of the document: not snippets of text like the highlighter, but an actual summary of the document. Thanks Aviran

RE: Search shortly after adding a doc

2005-08-05 Thread Mordo, Aviran (EXP N-NANNATEK)
You can try working with two indexes one for all of today's messages which will be pretty small, and another for past messages. Then once a day merge the small index to the big one and start fresh. This way you need only to open an IndexReader for the small index while the big one does not change.

RE: Split Search Word

2005-08-05 Thread Mordo, Aviran (EXP N-NANNATEK)
The StandardAnalyzer should work just fine with it. It will break the search string into 5 search terms. HTH Aviran http://www.aviransplace.com _ From: Karthik N S [mailto:[EMAIL PROTECTED] Sent: Friday, August 05, 2005 1:57 AM To: LUCENE Subject: Split Search Word Hi Luceners Apol

RE: merging indexes together

2005-08-08 Thread Mordo, Aviran (EXP N-NANNATEK)
Why don't you just add the new information directly to the main index ? As long as you don't get a new IndexReader you should be able to access the old information. Once your indexing and deletion is done just get a new IndexReader instance to access the new documents. Aviran http://www.aviranspla

RE: [ANN] Lucene "Did You Mean" article on java.net

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Thanks, Very nice article :) Aviran http://www.aviransplace.com -Original Message- From: Joseph B. Ottinger [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 7:22 AM To: java-user@lucene.apache.org Subject: Re: [ANN] Lucene "Did You Mean" article on java.net TSS referred to it,

RE: w.fnm (System can not find file.)

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Try to decrease the merge factor, and I would also check the Max number of files allowed to be opened in the OS. HTH Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 7:34 AM To: java-user@lucene.apa

RE: OutOfMemoryError on addIndexes()

2005-08-18 Thread Mordo, Aviran (EXP N-NANNATEK)
You can still have the complete date as a separate field, and sort or filter by it, just don't use this field in your query. Aviran http://www.aviransplace.com -Original Message- From: Tony Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 8:32 AM To: java-user@lucene.ap

RE: UpdateIndex

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
In your approach, you are reading all the documents in your index. You should instead query the index for the file name instead of reading the entire index for each file. HTH Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent:

RE: UpdateIndex

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
achricht --- > From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: RE: UpdateIndex > Date: Mon, 22 Aug 2005 09:43:05 -0400 > > In your approach, you are reading all the documents in your index. You > should inste

RE: Case-sensitive search

2005-08-22 Thread Mordo, Aviran (EXP N-NANNATEK)
You'll need to have two fields in your index, one for case sensitive and one for case insensitive HTH Aviran http://www.aviransplace.com Is there any way to index as case-sensitive and then, while searching, making the search case-sensitive and case-insensitive using the same index as needed?

RE: Does order of BooleanQuery clauses affect search performance?

2005-08-26 Thread Mordo, Aviran (EXP N-NANNATEK)
As far as I remember the order of Queries in a BooleanQuery does not affect performance. (but I may be wrong) Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, August 26, 2005 11:59 AM To: java-user@lucene.apache.org Sub

RE: UpdateIndex

2005-08-29 Thread Mordo, Aviran (EXP N-NANNATEK)
After you delete / add documents, you need to get a new IndexReader instance to reflect the changes. HTH Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, August 29, 2005 7:32 AM To: java-user@lucene.apache.org Subje

RE: UpdateIndex

2005-08-29 Thread Mordo, Aviran (EXP N-NANNATEK)
--- Original Message --- > From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: RE: UpdateIndex > Date: Mon, 29 Aug 2005 09:28:59 -0400 > > After you delete / add documents, you need to get a new IndexReader

RE: custom sort

2005-08-30 Thread Mordo, Aviran (EXP N-NANNATEK)
When using sort there is no meaning for weight. Aviran http://www.aviransplace.com -Original Message- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 30, 2005 12:35 AM To: java-user@lucene.apache.org; raymondcreel Subject: Re: custom sort You can just assign the field B

RE: Sorting results by both score and date

2005-09-16 Thread Mordo, Aviran (EXP N-NANNATEK)
You can write a query and add a date range to it, giving the date field a boost. For instance, you can do "+content:foo date:[{Today's Date} TO null]^5 date:[{Yesterday's Date} TO {Today's Date}]^4 date:[{Last Week's Date} TO {Yesterday's Date}]^3" and so on. Aviran http://www.aviransplace.com -O
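
A minimal sketch of how such a tiered, boosted date query string could be assembled in Java. It assumes the date field is indexed as a plain yyyyMMdd string and uses a far-future upper bound (99991231) in place of the open-ended "null" bound; the class name DateBoostQuery, the field name date and the boost values are illustrative only, and the resulting string would still have to be handed to a query parser.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the boosted date-range clauses as a query string.
final class DateBoostQuery {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20050916.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;
    // Assumed stand-in for an open upper bound.
    private static final String FAR_FUTURE = "99991231";

    static String build(String baseClause, LocalDate today) {
        String t = today.format(FMT);
        String yesterday = today.minusDays(1).format(FMT);
        String lastWeek = today.minusDays(7).format(FMT);
        // The base clause is required (+); the date clauses are optional and
        // only raise the score of newer documents.
        return "+" + baseClause
                + " date:[" + t + " TO " + FAR_FUTURE + "]^5"
                + " date:[" + yesterday + " TO " + t + "]^4"
                + " date:[" + lastWeek + " TO " + yesterday + "]^3";
    }
}
```

Because the date clauses are optional, older documents still match; they simply receive no date boost, so the ordering is a blend of relevance and recency rather than a strict date sort.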

RE: Deleting documents

2005-09-16 Thread Mordo, Aviran (EXP N-NANNATEK)
Because when you add a document, the ID goes through an Analyzer, which in your case uses a lowercase filter, but when you create a Term object the term is not lowercased by an Analyzer. If you use Keyword instead of Field.Text for your ID, then the Analyzer will not lowercase the ID

RE: date keyword

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
Lucene only uses strings to store and search, so you should convert any objects to strings. For dates there is a special Date field that you should use, which converts dates to searchable strings. Aviran http://www.aviransplace.com -Original Message- From: haipeng du [mailto:[EMAIL PROTECTED

RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
You can store the values as a comma-separated string (which you'll then need to parse manually back into a HashMap) -Original Message- From: Tricia Williams [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 20, 2005 3:14 PM To: java-user@lucene.apache.org Subject: Storing HashMap as an UnI
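
A minimal sketch of the round trip, assuming string keys and values that contain neither ',' nor '=' (anything else would need an escaping scheme). The class name MapFieldCodec and the key=value layout are illustrative choices, not something taken from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical codec: flattens a map to "k1=v1,k2=v2" and parses it back.
final class MapFieldCodec {

    // Turns the map into a single string suitable for a stored field.
    static String encode(Map<String, String> map) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : map.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Parses the stored string back into a map, preserving entry order.
    static Map<String, String> decode(String stored) {
        Map<String, String> map = new LinkedHashMap<>();
        if (stored.isEmpty()) {
            return map;
        }
        for (String pair : stored.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Malformed entry: " + pair);
            }
            map.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return map;
    }
}
```

The encoded string would go into the unindexed stored field at indexing time, and decode would be applied to the field value read back from a hit.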

RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
o you think there is any way that I could use the serialization already built into the HashMap data structure? On Tue, 20 Sep 2005, Mordo, Aviran (EXP N-NANNATEK) wrote: > You can store the values as a comma-separated string (which then you'll > need to parse manually b

RE: How to sort results

2005-09-21 Thread Mordo, Aviran (EXP N-NANNATEK)
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Sort.html -Original Message- From: Wi [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 21, 2005 11:20 AM To: java-user@lucene.apache.org Subject: How to sort results I want to sort results by some field. Of course I ca

RE: Some error while searching the index

2005-09-28 Thread Mordo, Aviran (EXP N-NANNATEK)
You can increase the maxClauseCount (default is 1024), or use filters. HTH Aviran http://www.aviransplace.com -Original Message- From: tirupathi reddy [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 28, 2005 6:50 AM To: java-user@lucene.apache.org Subject: Some error while searchin

RE: indexing documents from 1857

2005-09-28 Thread Mordo, Aviran (EXP N-NANNATEK)
Since lucene works only with strings, you can simply write your own string representation of the date (simple mmdd would work just fine) HTH Aviran http://www.aviransplace.com -Original Message- From: Renaud Richardet [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 28, 2005 10:
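
A minimal sketch of such a string representation, assuming a zero-padded yyyyMMdd layout (the exact pattern is an assumption here). With that layout, plain string comparison orders the keys chronologically, which is what range queries and sorting on a string field rely on. The class name DateKey is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: converts dates to sortable yyyyMMdd strings and back.
final class DateKey {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 18570304.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    // String to put into the index for the given date.
    static String toKey(LocalDate date) {
        return date.format(FMT);
    }

    // Recovers the date from a stored key.
    static LocalDate fromKey(String key) {
        return LocalDate.parse(key, FMT);
    }
}
```

For example, DateKey.toKey(LocalDate.of(1857, 3, 4)) yields "18570304", which sorts before "20050928" as a string, so dates long before 1970 need no special handling.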

RE: IndexSearcher in servlet containers

2005-10-05 Thread Mordo, Aviran (EXP N-NANNATEK)
There were no problems for me. Do you use the same IndexReader for all your searchers? Aviran http://www.aviransplace.com -Original Message- From: Cyril Barlow [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 05, 2005 9:15 AM To: java-user@lucene.apache.org Subject: IndexSearcher in

RE: IndexWriter.optimize() need to much time.

2005-10-05 Thread Mordo, Aviran (EXP N-NANNATEK)
The index is available for search even during optimization, you should not have any problem with that. Aviran http://www.aviransplace.com -Original Message- From: Eric Louvard [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 05, 2005 10:10 AM To: java-user@lucene.apache.org Subject: I

RE: IndexWriter.optimize() need to much time.

2005-10-05 Thread Mordo, Aviran (EXP N-NANNATEK)
'temp-index' I need to merge it with the optimized index how can I do it ? Thanks. Éric Mordo, Aviran (EXP N-NANNATEK) wrote: >The index is available for search even during optimization, you should not >have any problem with that. > >Aviran >http://www.aviransplac

RE: Lucene Security Advice

2005-10-05 Thread Mordo, Aviran (EXP N-NANNATEK)
The simple solution is to put each section in a separate field and query the appropriate fields according to the user group. Aviran http://www.aviransplace.com -Original Message- From: Steven Thompson [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 05, 2005 2:04 PM To: java-user@luce

RE: What is a Hits object?

2005-10-05 Thread Mordo, Aviran (EXP N-NANNATEK)
Hits is a list of references to Documents; it does not contain the entire documents. Only when you ask for a document does it go and read the document from the index. Aviran http://www.aviransplace.com -Original Message- From: Cyril Barlow [mailto:[EMAIL PROTECTED] Sent: Wednesday, Octo

RE: Can't find record when I'm sure I should

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
You might want to check your analyzer; it might trim or ignore these names. Aviran http://www.aviransplace.com -Original Message- From: Dan Quaroni [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 11, 2005 2:22 PM To: java-user@lucene.apache.org Subject: Can't find record when I'm sure

RE: One index or 2 indices

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
Well, there isn't really much difference. If you have a large amount of data then I would suggest 2 indexes, but if not then one index will work too. HTH Aviran http://www.aviransplace.com -Original Message- From: Sharma, Siddharth [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 11, 2005 2:

RE: Hits sorted

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
Just use the Sort option in the searcher http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.Sort) Aviran http://www.aviransplace.com -Original Message- From: Daniel Cortes [mailto:[EMAIL PROTECT

RE: Non-scoring fields

2005-10-24 Thread Mordo, Aviran (EXP N-NANNATEK)
You can also use a filter to filter your results. As far as I know, Filter does not affect the score. HTH Aviran http://www.aviransplace.com -Original Message- From: Maik Schreiber [mailto:[EMAIL PROTECTED] Sent: Monday, October 24, 2005 2:24 PM To: java-user@lucene.apache.org Subject: R

RE: Searching Special Characters

2005-11-15 Thread Mordo, Aviran (EXP N-NANNATEK)
You can use your own Analyzer to support special characters. Just process the special characters in your analyzer Aviran http://www.aviransplace.com -Original Message- From: Lucene User [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 15, 2005 11:00 AM To: java-user@lucene.apache.org S

RE: Basic lucene usage

2005-12-06 Thread Mordo, Aviran (EXP N-NANNATEK)
Lucene is thread safe, it is recommended that you only have one IndexSearcher instance. No problems with multiple searches on the same IndexSearcher. You can index while searching, as soon as you want the new entries to be found by the IndexSearcher, just get a new instance of IndexSearcher Avira

RE: delete and optimize

2005-12-08 Thread Mordo, Aviran (EXP N-NANNATEK)
Well the best way in my opinion is to: 1) open the IndexReader and delete some documents from the same index 2) close the IndexReader 3) open IndexWriter and index documents 4) optimize the indexWriter and close the indexWriter For best performance you want the optimization to be

RE: delete and optimize

2005-12-08 Thread Mordo, Aviran (EXP N-NANNATEK)
earch time" The approach1 does deletion on an optimized index. So, the number of index files are the same as before the deletion. -Original Message- From: Mordo, Aviran (EXP N-NANNATEK) [mailto:[EMAIL PROTECTED] Sent: Thursday, December 08, 2005 1:16 PM To: java-user@lucene.apache.org Sub

RE: searching portions of an index

2005-12-21 Thread Mordo, Aviran (EXP N-NANNATEK)
Your approach is correct, but you should use groups instead of users. Just give a group permission and add users to groups; this way you don't have to worry about reindexing when adding more users, you just add the user to the group. Aviran http://www.aviransplace.com -Original Message- Fro
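
A minimal sketch of the group-based idea, assuming each document was indexed with a field (here called group, a hypothetical name) listing the groups allowed to see it. User-to-group membership lives outside the index, so adding a user only changes the in-memory map. The class GroupAcl and its methods are illustrative, and the output is only a query string, not a parsed query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical ACL helper: restricts a user's query to the groups they belong to.
final class GroupAcl {
    private final Map<String, Set<String>> groupsByUser = new HashMap<>();

    // Adding a user to a group touches only this map, never the index.
    void addUserToGroup(String user, String group) {
        groupsByUser.computeIfAbsent(user, u -> new TreeSet<>()).add(group);
    }

    // Wraps the user's query so it only matches documents tagged with one of
    // the user's groups. Empty result means the user may not search at all.
    Optional<String> restrict(String userQuery, String user) {
        Set<String> groups = groupsByUser.getOrDefault(user, Collections.emptySet());
        if (groups.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of("+(" + userQuery + ") +group:(" + String.join(" OR ", groups) + ")");
    }
}
```

The sketch assumes group names are single tokens that survive analysis unchanged; names with spaces or special characters would need quoting or escaping.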

RE: Basic lucene usage

2005-12-22 Thread Mordo, Aviran (EXP N-NANNATEK)
splace.com -Original Message- From: Mordo, Aviran (EXP N-NANNATEK) [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 06, 2005 12:19 PM To: java-user@lucene.apache.org Subject: RE: Basic lucene usage Lucene is thread safe, it is recommended that you only have one IndexSearcher instance. No probl

RE: Basic lucene usage

2005-12-22 Thread Mordo, Aviran (EXP N-NANNATEK)
sage I am unable to delete those files while an indexsearcher has them open. Are you sure that's possible (question/response #1)? I sorta have to stick to a single directory. I guess I'll have to work around that. -Original Message- From: Mordo, Aviran (EXP N-NANNATEK) [mai

RE: Does anybody here do some efforts about RSS/Blog search?

2006-02-07 Thread Mordo, Aviran (EXP N-NANNATEK)
Technorati is based on lucene -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 11:40 AM To: java-user@lucene.apache.org Subject: Does anybody here do some efforts about RSS/Blog search? I'm interested in this topic. See if we can exchange some ide

RE: Speedup indexing process

2006-02-20 Thread Mordo, Aviran (EXP N-NANNATEK)
After indexing is done, you can copy the index files and merge them to one large index. Or you can maintain several small indexes and search across indexes. Aviran http://www.aviransplace.com -Original Message- From: Java Programmer [mailto:[EMAIL PROTECTED] Sent: Friday, February 17, 2

RE: Searching in paths

2006-03-14 Thread Mordo, Aviran (EXP N-NANNATEK)
You need to index the field as a keyword, or use an analyzer that will not strip the / from the string Aviran http://www.aviransplace.com -Original Message- From: Java Programmer [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 14, 2006 11:28 AM To: java-user@lucene.apache.org Subject: Se

What is the largest index(s) size lucene can support

2006-03-29 Thread Mordo, Aviran (EXP N-NANNATEK)
I know Lucene can have multiple indexes and can run a parallel search across indexes. The question I have is what is the largest number of documents Lucene can support with multiple distributed indexes. Or, to be more specific, can Lucene support BILLIONS of documents (across multiple indexes), and

RE: adding new fields to index

2006-05-17 Thread Mordo, Aviran (EXP N-NANNATEK)
No, Lucene does not have an update index option, you need to reindex Aviran http://www.aviransplace.com -Original Message- From: Harini Raghavan [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 17, 2006 12:59 PM To: java-user@lucene.apache.org Subject: adding new fields to index Hi All,

RE: Changing the scoring (newest doc date first)

2006-05-17 Thread Mordo, Aviran (EXP N-NANNATEK)
When you write your query, you can add a date range with a boost factor for this field, i.e. boost by a factor x the documents that have a date of today, boost by x-1 the documents from the past week, boost by x-2 the documents from the past two weeks, etc. This will not be a perfect sort on the dat

RE: Avoiding ParseExceptions

2006-06-06 Thread Mordo, Aviran (EXP N-NANNATEK)
Basically you need to pre-process the query and rewrite it in a way you think it should be. Then catch the parse exception if you failed to rewrite the query and display an error message on the screen (something like - This kind of query is not supported, please rephrase your query). HTH Aviran h
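
One possible form of pre-processing is to escape the characters that carry special meaning in the query syntax, so that raw user input is treated as literal text. The sketch below is a hand-rolled illustration under that assumption; the class name QueryEscaper and the exact character set are my own choices, and this is only one way to "rewrite" a query. Parsing the escaped string should still be wrapped in a try/catch for the parse exception, as described above.

```java
// Hypothetical helper: backslash-escapes query-syntax characters in user input.
final class QueryEscaper {
    // Characters assumed to be significant to the query parser.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix every special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Escaping everything disables operators such as wildcards and field prefixes for the end user, so it fits a plain search box better than an advanced-query form.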

RE: Property comparison possible??

2006-06-08 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK it is not possible to perform this kind of query with Lucene. Aviran http://www.aviransplace.com -Original Message- From: Robert Haycock [mailto:[EMAIL PROTECTED] Sent: Thursday, June 08, 2006 12:59 PM To: java-user@lucene.apache.org Subject: Property comparison possible?? Is it po

RE: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
What you are asking is not possible. The whole purpose of the analyzer is to tokenize the fields, so if you want them to be tokenized don't use the Keyword fields. If you want to use both tokenized and untokenized, just create another field that will be tokenized. Aviran http://www.aviransplace.co

RE: Related documents ...

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
You'll need to run two queries. One for the user's query. Then if you need to get the related books, collect all the related books from the results, build a second query that will query the BookId field for all the related books (create an OR query for all the related bookIDs). Then merge the resu
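
A minimal sketch of the second step: collecting the related book IDs found in the first result set and turning them into one OR clause on the BookId field. The class name RelatedBooksQuery is hypothetical, and the IDs are assumed to be simple tokens that need no escaping. The output is a query string that would still have to be parsed and executed as the second query.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: builds "BookId:(id1 OR id2 ...)" from related IDs.
final class RelatedBooksQuery {

    static String build(Collection<String> relatedIds) {
        // Drop duplicates but keep first-seen order, so the clause stays small.
        Set<String> unique = new LinkedHashSet<>(relatedIds);
        if (unique.isEmpty()) {
            return ""; // nothing related, so no second query is needed
        }
        return "BookId:(" + String.join(" OR ", unique) + ")";
    }
}
```

With many related IDs the clause count can grow quickly; the maxClauseCount default of 1024 mentioned elsewhere in this archive is the limit to keep in mind.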

RE: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
-user@lucene.apache.org Subject: Re: How can I tell Lucene to also use analyzer for Keyword fields Mordo, Aviran (EXP N-NANNATEK) wrote: > What you are asking is not possible. The whole purpose of the analyzer > is to tokenize the fields, so if you want them to be tokenized don't &

RE: about PrefixQuery Matching

2006-06-13 Thread Mordo, Aviran (EXP N-NANNATEK)
The query should be test* The brackets will be eliminated by the analyzer Aviran http://www.aviransplace.com -Original Message- From: Flik Shen [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 13, 2006 6:07 AM To: java-user@lucene.apache.org Subject: about PrefixQuery Matching When I s

RE: Document Get question

2006-08-24 Thread Mordo, Aviran (EXP N-NANNATEK)
It is up to you. Whatever you put in the document during indexing you'll get back. If you add a field with just the document name you can retrieve that, or just parse the file name from the path. Aviran http://www.aviransplace.com -Original Message- From: Mag Gam [mailto:[EMAIL PROTEC

RE: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK, the field has to be indexed, but I don't think it has to be stored (but then again maybe I'm wrong) Aviran http://www.aviransplace.com -Original Message- From: Alan Boshier [mailto:[EMAIL PROTECTED] Sent: Thursday, September 14, 2006 11:39 AM To: java-user@lucene.apache.org Subjec