Re: Lucene Developer

2007-05-11 Thread xin huang
I want to get this job; you can see my resume. Thank you. 2007/5/11, [EMAIL PROTECTED] <[EMAIL PROTECTED]>: We are a startup company based in the city of Sheffield, UK actively seeking experienced Java programmers to develop intelligent web mining systems using Apache Lucene/Nutch. Experience

Re: Lucene Developer

2007-05-11 Thread Erik Hatcher
On May 11, 2007, at 7:49 PM, Chris Hostetter wrote: ...personally, i can barely keep up with all of my lucene email as is, if i wanted to wade through a bunch of email from companies looking to hire and people looking for work i'd go read monster.com :) or tune into the Lucene-powered goodne

RE: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
Chris, I have optimized our index directories using the compound index format. For testing purposes I have also moved the index directories local to the search process (previously they were accessed over the network on a shared NTFS file system). Now the time for getting the isCurrent information is negligible, i.e.

Re: Lucene Developer

2007-05-11 Thread Chris Hostetter
: > http://wiki.apache.org/lucene-java/Support : : That's more a list of people/companies providing Lucene-related services, : not so much for people looking for a new job. I think it's okay if : these job offers are posted here. Hmmm .. not sure if i would agree, i know that it's ge

Re: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Chris Hostetter
: I am experiencing the same problem with some 40 segments. Chris, Do you have do you have 40 segments, or do you have 40 files matching the glob segment* .. there is a difference (the "segments" files record the number of segments, as of 2.1 they are versioned so they have names like "segments_7"

Re: Numerical fields

2007-05-11 Thread Chris Hostetter
: I know that usually one has to index such fields as text with : the property a > b => lex(text(a)) > lex(text(b)) and devise : the text(n) transformation appropriately. : : What I'm looking for is an enhancement which would eliminate : the a -> text(a) transformation or simplify it. Is it nece

Re: Indexing the ORACLE using lucene

2007-05-11 Thread Steven Rowe
Krishna Prasad Mekala wrote: > I have to create the index from my Oracle database. Can anybody tell me > how to create the index from Oracle using lucene? Check out Marcelo Ochoa's Oracle/Lucene integration: http://issues.apache.org/jira/browse/LUCENE-724 Steve --

Re: Mixing Case and Case-Insensitive Searching

2007-05-11 Thread Yonik Seeley
On 5/11/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote: In this tutorial he stresses not once, not twice, but three times that the same Analyzer that is used to build an index -must- also be used when performing a Query. There is great detail explaining why this is so. However, in order to get

Mixing Case and Case-Insensitive Searching

2007-05-11 Thread Walt Stoneburner
Time to give a little something back to the Lucene community, even if it's just a little knowledge for the maintainers... Back on 17-Apr-2007 (for those searching the archives), I expressed a need to match on queries using an intermix of case-sensitive with case-insensitive terms. The example th

RE: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
We have everything on Windows NTFS. Our index folders are on a server and accessed via a shared drive. I haven't optimized the folders yet, but after optimizing a test folder I noticed that we have very few files left. That might help. I am going to optimize all folders now and then

Re: Numerical fields

2007-05-11 Thread Doron Cohen
karl wettin wrote: > > 11 maj 2007 kl. 18.16 skrev Stadler Hans-Christian: > > > Is there an enhancement/plugin to Lucene which would allow > > queries like > > > > myNumericalField > 100 > > FunctionQuery might be what you are looking for. > > http://issues.apache.org/jira/browse/LUCENE-446 LUCEN

Re: Indexing the ORACLE using lucene

2007-05-11 Thread Chris Lu
Just to let you know, DBSight already handles index synchronization with the database, including incremental indexing, with either soft-deleted or hard-deleted records. And it's free without any size limit if you just need to create an index from the database. You don't need to do any Java coding

Re: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread jafarim
I am experiencing the same problem with some 40 segments. Chris, do you have any recommendation on the file system to use? On 5/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Are there a large number of files in your index directory? and is there any correlation between the number of files

Re: Lucene Developer

2007-05-11 Thread Daniel Naber
On Friday 11 May 2007 19:21, Chris Hostetter wrote: > Please do not send resume requests to any of the @lucene email lists. > There is a wiki page listing parties available for hire who are > knowledgeable in Lucene for this explicit purpose... > > http://wiki.apache.org/lucene-java/Support

Re: Simple, always do wildcard or fuzzy query

2007-05-11 Thread Daniel Naber
On Thursday 10 May 2007 23:09, bbrown wrote: > I think this is a simple question; or I don't know. Is there a way to > automatically convert all tokens to wildcard queries for any given input. Either just append the "*" before you pass your terms, or extend QueryParser and override getFieldQuery()
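
The first suggestion in the reply (append "*" to each term before handing the text to the parser) could look roughly like the sketch below. The class name WildcardInput and the whitespace-based splitting are assumptions made for illustration; the sketch does not escape query-syntax characters and is not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: rewrites raw user input so that
// every whitespace-separated term ends in "*".
final class WildcardInput {
    static String toWildcard(String input) {
        List<String> out = new ArrayList<>();
        for (String term : input.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            // Leave terms alone if the user already typed a trailing "*".
            out.add(term.endsWith("*") ? term : term + "*");
        }
        return String.join(" ", out);
    }
}
```

The returned string would then be passed to the query parser in place of the original input.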

Re: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Chris Hostetter
: Are there a large number of files in your index directory? and is there any correlation between the number of files matching segment* and the time isCurrent takes? it would also be handy to know what filesystem you use as well ... directory listings may be more expensive on some filesystems than

Re: add in an existing document

2007-05-11 Thread jafarim
How about this idea: - a special Identifier field. - A DocumentHash class which calculates a hash value from a Document. - A query on Identifier before inserting new Documents to check if it already exists. --jaf On 5/10/07, STEFANOS STEFANOS <[EMAIL PROTECTED]> wrote: Hello, I would li
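
A minimal sketch of the idea in this message, under stated assumptions: a document is modelled as a plain map of field names to values, the hash is SHA-256, and an in-memory set stands in for the "query on Identifier" step. The names DocumentHash and SeenIdentifiers follow the message's wording but are otherwise hypothetical; against a real index, the lookup would be a term query on the identifier field.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical: derives a stable identifier from a document's fields.
final class DocumentHash {
    static String of(Map<String, String> fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Sort by field name so insertion order does not change the hash.
            for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between name and value
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between fields
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }
}

// Hypothetical stand-in for the identifier lookup against the index.
final class SeenIdentifiers {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the document is new (and records it), false if it is a duplicate.
    boolean addIfAbsent(Map<String, String> fields) {
        return seen.add(DocumentHash.of(fields));
    }
}
```

Whether every field or only a chosen subset should feed the hash depends on what counts as "the same document" for the application.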

Re: How to re-open the IndexSearcher's IndexReader

2007-05-11 Thread Chris Hostetter
: Opening a new IndexReader is trivial. But then how do I set the : IndexSearcher's reader to the new one without getting a new instance? just reopen a new IndexSearcher ... IndexSearchers are extremely lightweight and cheap, the meat they have is the IndexReader itself, so there wouldn't be

Re: IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Michael McCandless
"Andreas Guther" <[EMAIL PROTECTED]> wrote: > I moved today from Lucene 2.0 to 2.1 and I noticed that the > IndexReader.isCurrent() call is very expensive. What took 20 > milliseconds in 2.0 now takes seconds in 2.1. > > I have the following scenario: > > - 7 index directories of different size

Re: Lucene Developer

2007-05-11 Thread Chris Hostetter
Please do not send resume requests to any of the @lucene email lists. There is a wiki page listing parties available for hire who are knowledgeable in Lucene for this explicit purpose... http://wiki.apache.org/lucene-java/Support : We are a startup company based in the city of Sheffield,

Re: Indexing the ORACLE using lucene

2007-05-11 Thread bbrown
On Fri, 11 May 2007 09:02:04 -0400, Erick Erickson wrote > Search the mail archive for Oracle, and there's lengthy discussion. The > short form is that you query your database, taking selected > data from it and add it to a Lucene document, then write the > document to your Lucene index. Repeat thi

IndexReader.isCurrent very slow in 2.1

2007-05-11 Thread Andreas Guther
I moved today from Lucene 2.0 to 2.1 and I noticed that the IndexReader.isCurrent() call is very expensive. What took 20 milliseconds in 2.0 now takes seconds in 2.1. I have the following scenario: - 7 index directories of different sizes, ranging from some MB to 5 GB - Some indexes are upgraded

Re: Numerical fields

2007-05-11 Thread karl wettin
11 maj 2007 kl. 18.16 skrev Stadler Hans-Christian: Is there an enhancement/plugin to Lucene which would allow queries like myNumericalField > 100 FunctionQuery might be what you are looking for. http://issues.apache.org/jira/browse/LUCENE-446 -- karl --

Re: optimization behaviour

2007-05-11 Thread karl wettin
10 maj 2007 kl. 21.29 skrev Yonik Seeley: On 5/10/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: Deleted documents are removed on segment merges (for documents marked as deleted in those segments). Of course, that doesn't have to be the case. It would be a trivial change to merge segments and

Numerical fields

2007-05-11 Thread Stadler Hans-Christian
Is there an enhancement/plugin to Lucene which would allow queries like myNumericalField > 100 I know that usually one has to index such fields as text with the property a > b => lex(text(a)) > lex(text(b)) and devise the text(n) transformation appropriately. What I'm looking for is an enhance
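
For readers unfamiliar with the text(n) transformation mentioned here, one common choice is fixed-width zero padding. The sketch below assumes non-negative int values and a width of 10 digits; the class name SortableNumber is hypothetical, and negative or fractional values would need a different encoding.

```java
import java.util.Locale;

// Hypothetical helper: encodes a number so that string order matches numeric order.
final class SortableNumber {
    private static final int WIDTH = 10; // Integer.MAX_VALUE has 10 digits

    static String text(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", n);
    }
}
```

With this encoding, 999 becomes "0000000999" and 1000 becomes "0000001000", so the padded strings compare in numeric order even though the unpadded "1000" sorts before "999". A query such as myNumericalField > 100 can then be expressed as an exclusive range whose lower bound is text(100).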

Re: optimization behaviour

2007-05-11 Thread Yonik Seeley
On 5/10/07, karl wettin <[EMAIL PROTECTED]> wrote: > Deleted documents are removed on segment merges (for documents marked > as deleted in those segments). > Due to the nature of an inverted index, it's impossible w/o going over > the complete index looking for all references to that docid. Wha

Re: Indexing the ORACLE using lucene

2007-05-11 Thread Erick Erickson
Search the mail archive for Oracle, and there's lengthy discussion. The short form is that you query your database, taking selected data from it and adding it to a Lucene document, then write the document to your Lucene index. Repeat this for as many "documents" as you need. There are a large number

Re: How to re-open the IndexSearcher's IndexReader

2007-05-11 Thread Erick Erickson
As far as I know, you have to open a new instance. I don't think this is solved in Lucene 2.1 at all. I do remember discussions a while ago where the general idea was to open a new reader and, perhaps, fire a few "primer" queries at it on some background thread. Then, your main thread can switch
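
The "open a new one, prime it, then switch" approach described in this message can be sketched independently of Lucene. Everything below is an assumption for illustration: SwappableSearcher is a hypothetical holder, the type parameter S stands in for a searcher, and the opener and primer callbacks are supplied by the caller.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder: search threads call get(); a background thread calls refresh().
final class SwappableSearcher<S> {
    private final AtomicReference<S> current;

    SwappableSearcher(S initial) {
        current = new AtomicReference<>(initial);
    }

    S get() {
        return current.get();
    }

    // Opens a fresh searcher, runs the primer queries against it, then publishes it.
    // Returns the previous searcher so the caller can close it once in-flight
    // searches have finished.
    S refresh(Supplier<S> opener, Consumer<S> primer) {
        S fresh = opener.get();
        primer.accept(fresh);
        return current.getAndSet(fresh);
    }
}
```

Because the swap is a single atomic reference update, search threads see either the old or the new searcher and never wait on the warm-up.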

Re: Stop words (how to create ideal set of stop words?)

2007-05-11 Thread mark harwood
>For some reason, the application of Zipf's law comes to mind, whereby >you could look at the most commonly occurring words and >mathematically deduce which ones are "too" common, but where your >cutoff is may still be difficult to choose. I recently gave Andrzej a "Zipf visualisation" plug-

Help IndexWriter,Multi-threaded index access

2007-05-11 Thread legrand thomas
Hello, I work on a web application deployed on a Tomcat 5 server. Many JSP front pages (thanks to controllers) query a single manager (retrieved by a factory as an instance). This manager deals with a Lucene index, stored by using an FSDirectory, to create several kinds of documents, append or rem

Re: Stop words (how to create ideal set of stop words?)

2007-05-11 Thread Grant Ingersoll
We Luceners tend to be more practically oriented! :-) For some reason, the application of Zipf's law comes to mind, whereby you could look at the most commonly occurring words and mathematically deduce which ones are "too" common, but where your cutoff is may still be difficult to choose
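
As a rough illustration of looking at the most commonly occurring words, the sketch below uses a plain frequency cutoff rather than any formal Zipf fit: a term becomes a stop-word candidate when it accounts for more than a chosen share of all tokens. The class name StopWordCandidates and the maxShare parameter are assumptions, and choosing the cutoff remains the hard part, as the message notes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flags terms that make up too large a share of the token stream.
final class StopWordCandidates {
    static List<String> find(List<String> tokens, double maxShare) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > maxShare * tokens.size()) {
                result.add(e.getKey());
            }
        }
        // Most frequent first; ties broken alphabetically for a stable order.
        result.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }
}
```

In practice the token list would come from running the index's analyzer over a sample of the corpus, and the candidates would be reviewed by hand before being used as a stop list.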

Lucene Developer

2007-05-11 Thread recruitment
We are a startup company based in the city of Sheffield, UK actively seeking experienced Java programmers to develop intelligent web mining systems using Apache Lucene/Nutch. Experience of genetic programming would be an advantage but is not essential. If you are interested in this fasci

Indexing the ORACLE using lucene

2007-05-11 Thread Krishna Prasad Mekala
Hi all, I am new to Lucene. I am developing a small search utility using Lucene. I have to create the index from my Oracle database. Can anybody tell me how to create the index from Oracle using Lucene? Please send me code snippets if possible. Your valuable help is highly appreciated.

Is the Similarity Algorithm of Lucene Better Than Standard VSM?

2007-05-11 Thread 胡宝顺
Hi all: Is the similarity algorithm of Lucene better than the standard VSM (vector space model)? Are there any papers that have proved this? Could you send me such a paper? Thanks a lot. [EMAIL PROTECTED]

RE: Help IndexWriter,Multi-threaded index access

2007-05-11 Thread Fang_Li
Hi, You cannot create more than one IndexWriter for one index instance. But you can share the IndexWriter across multiple servlets or threads. Don't open a new IndexWriter in different threads, reuse the old one. Regards, -Original Message- From: legrand thomas [mailto:[EMAIL PRO

Help IndexWriter,Multi-threaded index access

2007-05-11 Thread legrand thomas
Hello, I work on a web application deployed on a Tomcat 5 server. Many JSP front pages (thanks to controllers) query a single manager (retrieved by a factory as an instance). This manager deals with a Lucene index, stored by using an FSDirectory, to create several kinds of documents, append or rem