RE: Oracle and Lucene Integration

2006-11-23 Thread Vladimir Olenin
I don't think you can define rowid on 'insert' operations (ie, when a new entry in the table is created) - it's a 'hidden'/automatic field Oracle maintains itself... Vlad -Original Message- From: Marcelo Ochoa [mailto:[EMAIL PROTECTED] Sent: Thursday, November 23, 2006 7:23 AM To: java-u

RE: Oracle and Lucene Integration

2006-11-22 Thread Vladimir Olenin
Hi, Marcelo, Yes, putting it in the public space would be great. I personally would be very interested to have a look. Can it be posted on the 'lucene' website? Vlad -Original Message- From: Marcelo Ochoa [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 22, 2006 8:10 AM To: java-us

RE: Google Coop - Lucene style

2006-11-09 Thread Vladimir Olenin
I think it's pretty straightforward: the 'custom search engine' is essentially a 'filter' that can also modify score weights of found documents. I'd say 'coop engine' + 'your query' should be relatively easily reduced to 'your extended query', once you substitute 'coop engine' with 'query p

RE: Update an existing index

2006-11-08 Thread Vladimir Olenin
From what I remember reading in the docs, you need to delete the current document and create a new one with the updated fields. Vlad -Original Message- From: WATHELET Thomas [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 08, 2006 6:01 AM To: java-user@lucene.apache.org Subject: Update an
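
A minimal sketch of the delete-then-re-add pattern described above, using a toy in-memory inverted index. This is not the Lucene API; the class `ToyIndex` and its methods are hypothetical names made up for illustration.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory inverted index (illustration only, NOT Lucene)."""

    def __init__(self):
        self.docs = {}                     # doc_id -> original text
        self.postings = defaultdict(set)   # term -> set of doc_ids

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return  # nothing to delete
        for term in text.lower().split():
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]    # drop empty posting lists

    def update(self, doc_id, new_text):
        # "Update" is modelled as delete followed by add, as in the message.
        self.delete(doc_id)
        self.add(doc_id, new_text)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))
```

With this toy index, calling `update(1, "new title")` on a document originally added as "old title" removes it from the postings of "old" and adds it under "new".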

RE: Help on search

2006-11-07 Thread Vladimir Olenin
You might actually try to look for some 'names database' (similar to Wordnet). Someone has probably already compiled a list of English 'names' and their common short forms (eg, 'Vlad' for 'Vladimir', 'Fred' for 'Frederich', etc). Alternatively, compile such a DB yourself (and don't forget to publis
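
One way such a names table could be used at query time is sketched below. The table `SHORT_FORMS` is hypothetical and contains only the two examples from the message; `expand_name` and `to_or_query` are made-up helper names, not part of Lucene.

```python
# Hypothetical lookup table; entries taken only from the examples above.
SHORT_FORMS = {"vladimir": ["vlad"], "frederich": ["fred"]}


def expand_name(name, table=SHORT_FORMS):
    """Return the name plus any known full/short variants, lowercased."""
    key = name.lower()
    variants = {key}
    variants.update(table.get(key, []))      # full name -> short forms
    for full, shorts in table.items():       # short form -> full name
        if key in shorts:
            variants.add(full)
            variants.update(shorts)
    return sorted(variants)


def to_or_query(name):
    """Build a plain 'a OR b' query string from the variants."""
    return " OR ".join(expand_name(name))
```

The expanded string could then be handed to whatever query parser the application already uses.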

RE: Help on search

2006-11-07 Thread Vladimir Olenin
Search for 'Fred*' if I'm not mistaken... -Original Message- From: Alice [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 07, 2006 11:51 AM To: java-user@lucene.apache.org Subject: Help on search Hello! I am totally new to Lucene and I'm trying to use it with my web application. Wh
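
To illustrate what a trailing-wildcard query such as 'Fred*' amounts to, here is a small sketch of a prefix scan over a sorted term list. It assumes terms are already lowercased and sorted; `prefix_search` is a hypothetical helper, not how Lucene implements wildcard queries.

```python
from bisect import bisect_left


def prefix_search(sorted_terms, prefix):
    """Return all terms starting with `prefix` from a sorted list."""
    lo = bisect_left(sorted_terms, prefix)   # first term >= prefix
    hi = lo
    while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
        hi += 1
    return sorted_terms[lo:hi]
```

Because terms sharing a prefix sit next to each other in sorted order, the scan only touches the matching range.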

RE: Intermittent search performance problem

2006-11-06 Thread Vladimir Olenin
Any profiler can add its own overhead. You might try the "-verbose:gc" JVM flag (if you haven't tried it yet). It's the fastest way to check whether your problems are GC related. Check the JVM docs (or simply 'java -help') for more flags. There are some '-X' flags for more detailed info, as well as flags to du

RE: injecting fields looked up from DB at the runtime - Solr/Lucene question

2006-11-06 Thread Vladimir Olenin
:36 AM To: java-user@lucene.apache.org Subject: Re: injecting fields looked up from DB at the runtime - Solr/Lucene question On 11/5/06, Vladimir Olenin <[EMAIL PROTECTED]> wrote: > - when the Hits objects are returned from IndexSearcher (as a result of some search), 'inject'

injecting fields looked up from DB at the runtime - Solr/Lucene question

2006-11-04 Thread Vladimir Olenin
Hi, I wonder if the below is the correct way of doing things... - when the Hits objects are returned from IndexSearcher (as a result of some search), 'inject' 'info' fields into the 'Hit' objects at runtime by looking the values up in the DB. The main purpose is to avoid storing 'info' fields
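
A rough sketch of the 'inject at runtime' idea, assuming hits are plain dicts carrying an `id` and the extra fields live in a hypothetical `docs(id, info)` table. SQLite is used here only so the example is self-contained; `inject_info` is a made-up name.

```python
import sqlite3


def inject_info(hits, conn):
    """Attach an 'info' field to each hit using a single batched DB lookup.

    hits: list of dicts with at least an 'id' key (stand-ins for Hit objects).
    conn: DB connection with a hypothetical table docs(id, info).
    """
    if not hits:
        return hits
    ids = [h["id"] for h in hits]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, info FROM docs WHERE id IN ({placeholders})", ids
    ).fetchall()
    info_by_id = dict(rows)
    for h in hits:
        # Hits with no matching DB row get None rather than raising.
        h["info"] = info_by_id.get(h["id"])
    return hits
```

Fetching all ids in one query, rather than one query per hit, keeps the number of DB round trips per result page constant.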

RE: Re: lucene and web services?

2006-11-04 Thread Vladimir Olenin
You might want to check out: - Solr (WS & RESTish access to Lucene engine, both search & index) - DWR (AJAX remote access library. Not really a WS, since communication protocol is not generic at this point, but works very well if all you need is access to POJOs from JavaScript; it's more or less

RE: Any experience with spring's lucene support?

2006-11-03 Thread Vladimir Olenin
Haven't used them, but had a look at them some time ago. Seems like a nice set of helper factory classes to manage Lucene engine through Spring IoC. Can't do much wrong in here I guess... If you'd be using Spring in your app, you'd have to come up with similar factories either way, so probably it'd

RE: experiences with lingpipe

2006-11-03 Thread Vladimir Olenin
> You need to increase the memory for java. I think 32-bit Java is limited to a 1.3 gig heap but > could be wrong. No heuristics at the tip of my fingers. 32-bit JVM under Linux/Windows. Solaris runs OK. Limit on the heap is ~1.7 - 1.8GB. -Original Message- From: Breck Baldwin [mailto:[EM

RE: Wildcard Search and "Note: You cannot use a * or ? symbol as the first character of a search"

2006-10-20 Thread Vladimir Olenin
Don't know Lucene internals, but I'd say you'd have to create your own 'reverse' B-Tree of some kind (Lucene gurus will probably advise you on the place where this can be changed in Lucene). Even if this functionality can't be redefined in Lucene itself, you can easily implement it by yourself
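
The 'reverse' structure suggested above can be sketched as follows: index each term reversed, so that a leading-wildcard query like '*ing' becomes an ordinary prefix scan for 'gni'. A sorted list stands in for the B-Tree here; `ReversedTermIndex` is a hypothetical class, not a Lucene component.

```python
from bisect import bisect_left


class ReversedTermIndex:
    """Stores terms reversed so suffix queries become prefix scans."""

    def __init__(self, terms):
        # A sorted list plays the role of the 'reverse' B-Tree.
        self.reversed_terms = sorted(t[::-1] for t in set(terms))

    def suffix_search(self, suffix):
        """Answer a '*suffix' query; returns matching original terms."""
        rev = suffix[::-1]
        i = bisect_left(self.reversed_terms, rev)
        out = []
        while i < len(self.reversed_terms) and self.reversed_terms[i].startswith(rev):
            out.append(self.reversed_terms[i][::-1])  # un-reverse the match
            i += 1
        return sorted(out)
```

The cost is a second copy of the term dictionary, which is the trade-off for avoiding a full scan on leading wildcards.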

RE: Advantage of putting lucene index in RDBMS

2006-10-05 Thread Vladimir Olenin
As one of the people who asked about placing indexes into RDBMS, I was primarily interested in just storing the index in the RDBMS (basically, storing the structures described on this page http://lucene.apache.org/java/docs/fileformats.html in the relational DB). The main reason is NOT to be able to pe

native Java DB (eg, Derby) to store the index: performance comparision?..

2006-10-03 Thread Vladimir Olenin
sync with data, since Lucene will reuse PKs and indexes from the DB So, I think the main question is whether Lucene's custom way of maintaining _and accessing_ the index is (much?) more efficient than that of the available open source native Java DBs (Derby, etc) Thanks! Vladimir Olenin Software

'categorized-term' web index

2006-09-28 Thread Vladimir Olenin
Hi, I wonder if anyone knows. - is there a place I can get already crawled internet web pages in an archive (10 - 100GB of data) - is there a place I can get an already created Lucene index for these pages - is there such a thing as a 'categorized-terms' index, meaning each page is processed by an

RE: how to get results without getting total number of found documents?

2006-09-26 Thread Vladimir Olenin
lts. Some of these large distributed architectures will divide content into popular/recent content and older/less popular content. Approximations for total number of matching docs are calculated based on queries executed solely on the subset of popular stuff. Only queries with insufficient

how to get results without getting total number of found documents?

2006-09-26 Thread Vladimir Olenin
Hi. I couldn't find the answer to this question in the mailing list archive. In case I missed it, please let me know the keyword phrase I should be looking for, if not a direct link. All the 'Lucene' powered implementations I saw (well, primarily those utilizing Solr) return the exact count of the

term OR term OR term OR .... query question

2006-09-26 Thread Vladimir Olenin
Hi. I have a question regarding the Lucene scoring algorithm. Provided I have a query "a OR b OR c OR d OR e OR f", and two documents: doc1 "a b c d" and doc2 "d e", will doc1 score higher than doc2? In other words, does Lucene take into account the number of terms matched in the document in case o
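
One way to reason about the question is a toy scorer that rewards documents for matching more of the OR clauses. The sketch below is an assumption-laden simplification: it ignores term frequency, inverse document frequency and length normalization, and the functions `coord` and `toy_or_score` are hypothetical. Lucene's classic scoring is documented as including a coordination factor of roughly this overlap/maxOverlap shape, but treat that correspondence as an assumption rather than a statement of what any particular Lucene version does.

```python
def coord(overlap, max_overlap):
    """Fraction of query clauses that the document matched."""
    return overlap / max_overlap


def toy_or_score(query_terms, doc_text):
    """Toy score for an OR query: one point per matched term,
    scaled by the fraction of query terms matched."""
    doc_terms = set(doc_text.split())
    matched = [t for t in query_terms if t in doc_terms]
    return len(matched) * coord(len(matched), len(query_terms))
```

Under this toy model, doc1 ("a b c d") matches four of the six terms and doc2 ("d e") matches two, so doc1 scores higher.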

does anyone know of a 'smart' categorizing text pattern finder?

2006-09-25 Thread Vladimir Olenin
'template decorations' parts to a set of templates, to 'guess' the nature of each of the 'page specific' blocks (eg, 'Vladimir Olenin' in the left side column will be marked as 'name', while whatever is adjacent to this column is the post body). So,

lucene based frameworks/servers: solr, nutch, compass - which one is for what????

2006-09-19 Thread Vladimir Olenin
Hi, A couple of people here mentioned SOLR as a 'new' Lucene based search server. But NUTCH is also Lucene based. Also, there is an OpenSymphony initiative called 'Compass', which is more an integration framework than a server. I wonder if anyone can come up with a small summary of what are scope

RE: Lucene-In-Action book - any details?..

2006-07-10 Thread Vladimir Olenin
l 10, 2006, at 10:29 AM, Vladimir Olenin wrote: > Hi, > > Can anyone, pls, advise, based on which version of Lucene the 'Lucene > in Action' book is written? I've looked at various releases > (http://gulus.usherbrooke.ca/pub/appl/apache/lucene/java/archive/), > and

Lucene-In-Action book - any details?..

2006-07-10 Thread Vladimir Olenin
Hi, Can anyone, pls, advise, based on which version of Lucene the 'Lucene in Action' book is written? I've looked at various releases (http://gulus.usherbrooke.ca/pub/appl/apache/lucene/java/archive/), and it seems like there was a big gap between 1.4 and 1.9 release (over a year), with 1.4 relea

What is a good book on Lucene?

2006-06-28 Thread Vladimir Olenin
I wonder what the best book is that can be recommended both as an introduction and as 'in-depth' coverage of the latest version of Lucene? There are a few on the Internet, but I was wondering which has the most comprehensive coverage of all features, etc. Thanks! Vlad

RE: search performance benchmarks

2006-06-27 Thread Vladimir Olenin
erm queries (oh yeah, the query makeup also matters a lot) in under a second on an Intel dual processor (each is 3.6GHz I think). I frankly haven't tested out scalability yet. Jeff Emptoris, Inc. -Original Message- From: Vladimir Olenin [mailto:[EMAIL PROTECTED] Sent: Monday, June 26, 20

search performance benchmarks

2006-06-26 Thread Vladimir Olenin
Hi, I'm evaluating Lucene right now to use as a base for an open source project. I found some _indexing_ benchmarks on the lucene website (http://lucene.apache.org/java/docs/benchmarks.html), but, after a short browse, couldn't find any 'runtime' performance benchmarks (Query speed). Only one