Hi,
I am quite new to the Lucene API. I find the Field constructor
unintuitive. Maybe I have misunderstood it. Let's find out...
It can be used either as:
new Field("field", "data", Store.NO, TOKENIZED)
or:
new Field("field", "data", Store.NO, NO_NORMS)
As I understand it NO_NORMS and TOKENIZED
And one other point. You probably *don't* need a search engine for your
database *if* you don't have much textual data. That is, if your database
consists of "classical" tables with columns like "firstname", "lastname",
etc.
But if your database has columns in it containing, say, a page of text th
Hi, Nick,
A Lucene index is, in a sense, just another kind of database index, an
inverted one.
If we ask why we need many kinds of database indexes, the answer is:
they enable different query execution paths.
The same holds for a Lucene index, which is faster for term matching.
Lucene index actually can do mo
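The "inverted" structure mentioned above can be sketched in a few lines of plain Java. This is just the idea, not Lucene's actual implementation: terms map to the documents containing them, so term matching becomes a hash lookup plus a posting-list intersection instead of a table scan.

```java
import java.util.*;

public class TinyInvertedIndex {
    // term -> sorted set of document ids containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: intersect the posting lists of all query terms
    public SortedSet<Integer> search(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs =
                postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add(1, "Lucene is a search library");
        idx.add(2, "databases use B-tree indexes");
        idx.add(3, "Lucene indexes are inverted");
        System.out.println(idx.search("lucene", "indexes")); // [3]
    }
}
```

A B-tree index, by contrast, is organized by row key, which is why the two support such different query plans.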
Almost by definition, you have to write your own analyzer. This may be as
simple as chaining another filter into one of the regular analyzers or as
complex as defining your own grammar.
As far as I know, there's no "keep word" list. But that would be an
interesting addition. That is, a variety of
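To make the "keep word" idea concrete: it is simply the inverse of a stop-word filter, discarding every token *not* in a given set. A toy version in plain Java, outside Lucene's Analyzer/TokenFilter machinery:

```java
import java.util.*;
import java.util.stream.*;

public class KeepWordDemo {
    // Keep only tokens that appear in the "keep" set -- the inverse of a
    // stop-word filter, which discards tokens that appear in a set.
    public static List<String> keepWords(List<String> tokens, Set<String> keep) {
        return tokens.stream()
                     .filter(keep::contains)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> vocabulary = new HashSet<>(Arrays.asList("lucene", "index"));
        List<String> tokens =
            Arrays.asList("the", "lucene", "index", "is", "inverted");
        System.out.println(keepWords(tokens, vocabulary)); // [lucene, index]
    }
}
```

Wrapped as a TokenFilter and chained after a regular analyzer, the same logic would restrict an index to a controlled vocabulary.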
Fellows,
I'm working on a project here where we are trying to use our lucene
indexes to return concrete objects. One of the things we want to be
able to match on is vocabulary terms annotated to that object, as
well as all of the child vocabulary terms of that annotated term.
So, what I
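One common way to handle the parent-plus-children case is to expand the query term into its transitive descendants before searching. A sketch, assuming the vocabulary's parent-to-children relation is available as a map (the terms below are hypothetical):

```java
import java.util.*;

public class VocabularyExpander {
    // parent term -> direct child terms (hypothetical hand-built hierarchy)
    private final Map<String, List<String>> children = new HashMap<>();

    public void addChild(String parent, String child) {
        children.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
    }

    // Collect the term itself plus all transitive descendants (BFS).
    public Set<String> expand(String term) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(term);
        while (!queue.isEmpty()) {
            String t = queue.remove();
            if (result.add(t)) {
                queue.addAll(children.getOrDefault(t, Collections.emptyList()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VocabularyExpander v = new VocabularyExpander();
        v.addChild("enzyme", "kinase");
        v.addChild("kinase", "tyrosine kinase");
        System.out.println(v.expand("enzyme"));
        // [enzyme, kinase, tyrosine kinase]
    }
}
```

Each expanded term can then be OR-ed into the query against the annotation field.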
The bigger picture here is NFS-safety. When I run a search, I hand off the
search results to another thread so that they can be processed as necessary --
in particular so that they can be JOINed with a SQL DB -- but I don't want to
completely lock the index from writes while doing a bunch of SQ
Not necessarily; many of the high-traffic search sites on the market today,
for everything from yellow pages to job boards to e-commerce sites, use search
engines exclusively to search *and* retrieve/serve content. The key is that
they don't have to return all matching rows, only the 'best', which are
Hmm, I guess that's because a database query returns a list of records,
whereas a search engine returns only the links, not the actual content.
So a search engine works only in the index space, whereas a database
query engine would have to work in both index and content space...
ND
I saw some discussion in the archives some time ago about the fact that
C++ is tokenized as C in the StandardAnalyzer; this seems to still be the
case; I was wondering if there is a simple way for me to get the behavior
I want for C++ (that it is tokenized as C++) in particular, and perhaps
for
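One workaround, without touching StandardAnalyzer's grammar, is a tokenizer that treats trailing '+' (and '#') characters as part of the token. A hedged sketch in plain Java, using a regex rather than Lucene's TokenStream API:

```java
import java.util.*;
import java.util.regex.*;

public class CppAwareTokenizer {
    // Word characters optionally followed by runs of '+' or '#', so that
    // "C++" and "C#" survive as single tokens instead of being cut to "C".
    private static final Pattern TOKEN = Pattern.compile("\\w+[+#]*");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase());
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I code in C++ and C, not C#"));
        // [i, code, in, c++, and, c, not, c#]
    }
}
```

The same pattern change, applied to whichever tokenizer feeds your analyzer chain, keeps queries for "C++" from matching plain "C". The analyzer used at query time must of course match the one used at index time.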
Don't forget the number 1 reason: speed. For certain types of queries a
search engine can return results orders of magnitude faster than a database.
I've seen search engines return hits in hundreds of milliseconds when the
same database query took hours or even days. That's not to say that a
sear
Indexing with lucene/nutch on top of/instead of DB indexing for:
1) relevancy scoring
2) alias searching (i.e., a large number of aliases, like first names)
3) highlighting
4) cross-datasource searching (multi DB, DB + XML files, etc).
As for best approach to externally index, I do not have any d
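Point (2), alias searching, amounts to expanding a name into all of its known aliases at index or query time. A toy sketch with a hand-built alias table (the names below are hypothetical; a real deployment would index every alias into one searchable field):

```java
import java.util.*;

public class AliasSearch {
    // canonical name -> known aliases (hypothetical hand-built table)
    private static final Map<String, List<String>> ALIASES = new HashMap<>();
    static {
        ALIASES.put("robert", Arrays.asList("bob", "rob", "bobby"));
        ALIASES.put("william", Arrays.asList("bill", "will", "billy"));
    }

    // Expand a query name to the canonical name plus all of its aliases.
    public static Set<String> expand(String name) {
        String q = name.toLowerCase();
        for (Map.Entry<String, List<String>> e : ALIASES.entrySet()) {
            if (e.getKey().equals(q) || e.getValue().contains(q)) {
                Set<String> all = new LinkedHashSet<>();
                all.add(e.getKey());
                all.addAll(e.getValue());
                return all;
            }
        }
        // Unknown name: no aliases, just the name itself.
        return new LinkedHashSet<>(Collections.singleton(q));
    }

    public static void main(String[] args) {
        System.out.println(expand("Bob")); // [robert, bob, rob, bobby]
    }
}
```

Doing this expansion in SQL across millions of rows is painful; with all aliases indexed into a single field, it is a single-term query.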
Could anyone provide any insight on why someone would use nutch/lucene
or any other search engines to index relational databases? With use
cases if possible? Shouldn't the database's own indexing mechanism be
used since it is more efficient?
If there is such a need of indexing the database conten
Hi,
I currently use multiple Fieldable instances for indexing sentences of a
document.
When there is only one single fieldable instance, the token offset
generation performed in DocumentWriter is correct.
The problem appears when there are two or more Fieldable instances. In
DocumentWriter$Fiel
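For context on why multiple instances are tricky: when one field has several values, the writer has to carry the character offset forward from the end of each previous instance (plus a gap); if it restarts at zero for every instance, the offsets of later instances overlap the earlier ones. A toy sketch of the cumulative bookkeeping, not DocumentWriter's actual code:

```java
import java.util.*;

public class OffsetDemo {
    // [start, end) character offsets for each token, accumulated across
    // several values of the same field, as a highlighter would need them.
    public static List<int[]> offsets(List<String> fieldValues) {
        List<int[]> result = new ArrayList<>();
        int base = 0; // carried forward across field instances
        for (String value : fieldValues) {
            int pos = 0;
            for (String token : value.split(" ")) {
                int start = value.indexOf(token, pos);
                result.add(new int[] { base + start, base + start + token.length() });
                pos = start + token.length();
            }
            base += value.length() + 1; // +1: assumed gap between instances
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] o : offsets(Arrays.asList("first sentence", "second one")))
            System.out.println(o[0] + "-" + o[1]);
    }
}
```

The bug symptom described above corresponds to `base` never being advanced, so every instance's tokens report offsets relative to that instance alone.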