NO_NORM and TOKENIZED

2008-03-04 Thread Tobias Hill
Hi, I am quite new to the Lucene API. I find the Field-constructor unintuitive. Maybe I have misunderstood it. Let's find out... It can be used either as: new Field("field", "data", Store.NO, TOKENIZED) or: new Field("field", "data", Store.NO, NO_NORM) As I understand it NO_NORM and TOKENIZED

Re: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Erick Erickson
And one other point. You probably *don't* need a search engine for your database *if* you don't have much textual data. That is, if your database consists of "classical" tables with columns like "firstname", "lastname", etc. But if your database has columns in it containing, say, a page of text th

Re: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Chris Lu
Hi, Nick, a Lucene index is, in a sense, just another kind of database index, because it's inverted, etc. If we ask why we need many database indexes, the answer is: different query execution paths. The same goes for a Lucene index, which is faster for term matching. Lucene index actually can do mo
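To make the "inverted" point concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's implementation or API; the class name `InvertedIndexSketch` and its methods are hypothetical, and the whitespace/punctuation splitting is an assumption chosen for brevity. It only shows why term matching is a direct map lookup rather than a scan over rows.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy class, not part of Lucene.
public class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split the text on non-word characters and record docId under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Term matching is a single map lookup, independent of document count.
    public SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                     Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene builds an inverted index");
        index.add(2, "A database index is usually a B-tree");
        System.out.println(index.lookup("index")); // prints [1, 2]
    }
}
```

A relational index is typically keyed on whole column values, whereas this structure is keyed on individual terms inside the text, which is the different "execution path" the message refers to.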

Re: C++ as token in StandardAnalyzer?

2008-03-04 Thread Erick Erickson
Almost by definition, you have to write your own analyzer. This may be as simple as chaining another filter into one of the regular analyzers or as complex as defining your own grammar. As far as I know, there's no "keep word" list. But that would be an interesting addition. That is, a variety of

Looking for an example of Using Position Increment Gap

2008-03-04 Thread Matthew Hall
Fellows, I'm working on a project here where we are trying to use our Lucene indexes to return concrete objects. One of the things we want to be able to match on is the vocabulary terms annotated to that object, as well as all of the child vocabulary terms of that annotated term. So, what I

Re: More IndexDeletionPolicy questions

2008-03-04 Thread Tim Brennan
The bigger picture here is NFS-safety. When I run a search, I hand off the search results to another thread so that they can be processed as necessary -- in particular so that they can be JOINed with a SQL DB -- but I don't want to completely lock the index from writes while doing a bunch of SQ

RE: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Will Johnson
Not necessarily. Many of the high-traffic search sites on the market today, for everything from yellow pages to job boards to ecommerce sites, use search engines exclusively to search *and* retrieve/serve content. The key is that they don't have to return all matching rows, only the 'best' which are

RE: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Duan, Nick
Hmm, I guess that's because a database query returns a list of records, whereas a search engine returns only the links, not the actual content. So a search engine works only in the index space, whereas a database query engine would have to work in both index and content space... ND -Original M

C++ as token in StandardAnalyzer?

2008-03-04 Thread Donna L Gresh
I saw some discussion in the archives some time ago about the fact that C++ is tokenized as C in the StandardAnalyzer; this seems to still be the case; I was wondering if there is a simple way for me to get the behavior I want for C++ (that it is tokenized as C++) in particular, and perhaps for
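As a rough illustration of the behavior being asked for, the sketch below tokenizes text in plain Java so that "C++" survives as one token. It does not use or reproduce StandardAnalyzer or any Lucene class; the class name `KeepPlusPlusTokenizer` and the regular expression are assumptions made for this example, and a real solution would still need to be wired into a custom analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy tokenizer, not a Lucene Analyzer.
public class KeepPlusPlusTokenizer {
    // A run of letters or digits, optionally followed by "++" or "#",
    // so that "C++" and "C#" are kept whole instead of collapsing to "C".
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+(?:\\+\\+|#)?");

    // Return the lowercased tokens found in the text, in order.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [experience, with, c++, and, java]
        System.out.println(tokenize("Experience with C++ and Java"));
    }
}
```

The same pattern must be applied at both index time and query time; otherwise a query for "C++" would be reduced to "c" and match unrelated documents.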

RE: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Will Johnson
Don't forget the number 1 reason: speed. For certain types of queries a search engine can return results orders of magnitude faster than a database. I've seen search engines return hits in hundreds of milliseconds when the same database query took hours or even days. That's not to say that a sear

RE: Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Darren Hartford
Indexing with lucene/nutch on top of/instead of DB indexing for: 1) relevance scoring 2) alias searching (i.e. a large number of aliases, like first names) 3) highlighting 4) cross-datasource searching (multi DB, DB + XML files, etc). As for best approach to externally index, I do not have any d

Why indexing database is necessary? (RE: indexing database)

2008-03-04 Thread Duan, Nick
Could anyone provide any insight on why someone would use nutch/lucene or any other search engine to index relational databases? With use cases if possible? Shouldn't the database's own indexing mechanism be used since it is more efficient? If there is such a need to index the database conten

Incorrect Token Offset when using multiple fieldable instance

2008-03-04 Thread Renaud Delbru
Hi, I currently use multiple fieldable instances for indexing sentences of a document. When there is only a single fieldable instance, the token offset generation performed in DocumentWriter is correct. The problem appears when there are two or more fieldable instances. In DocumentWriter$Fiel