One thing to mention
is that I am using a
MultiSearcher to rewrite the queries. I tried...
Ah. I remember this got a little ugly. The highlighter
has a JUnit test that demonstrates highlighting fuzzy
queries when using a MultiSearcher. Take a look at
that.
I can't remember the ins and outs of the
See the highlighter's package.html for a description
of how query.rewrite should be used to solve this.
Cheers,
Mark
--- lucuser4851 [EMAIL PROTECTED] wrote:
Dear All,
We have been using the highlighter from the lucene
sandbox, which works
very nicely most of the time. However when we
A GUI plugin for Squirrel SQL (
http://squirrel-sql.sourceforge.net/) would make a
great way of configuring the mapping.
It already does all the heavy lifting for connecting
to different types of database and poking around the
internals.
I've got the bare bones of a plugin sorted (Connect to
any
Here's a rough example using a database:
Hits hits=searcher.search(q);
int numDocs=Math.min(10, hits.length());
Analyzer analyzer=new WhitespaceAnalyzer();
PreparedStatement ps=conn.prepareStatement("select docText from myTable where pk=?");
for(int i=0;i<numDocs;i++)
{
Also need http://jcifs.samba.org/ so you can spider
windows file shares.
That project also has a very nice servlet filter that
is used to provide automatic authentication of Windows
clients using the NTLM protocol.
I've added some user-defined lucene functions to
HSQLDB and I've been able to run queries like the
following one:
select top 10 lucene_highlight(adText) from ads where
pricePounds < 200 and lucene_query('bass guitar
drums',id) > 0 order by lucene_score(id) DESC
I've had similar success with Derby
sometimes the returned String is empty.
Is the code analyzer-dependent?
When highlighter.getBestFragments returns nothing,
it is because no match was found for the query
terms in the TokenStream supplied.
This is nearly always because of Analyzer issues.
Check the post-analysis tokens
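To make the analyzer point concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: the class and method names are hypothetical stand-ins, and the two "analyzers" are simplified to a whitespace split with and without lower-casing. It only shows why query terms produced by one analysis can fail to match tokens produced by another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustration only: hypothetical stand-ins for two analyzers, not Lucene classes. */
final class AnalyzerMismatchSketch {

    /** Whitespace-only analysis: splits on whitespace and keeps the original case. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Same split, then lower-cased: a stand-in for an analyzer that normalises case. */
    static List<String> lowerCaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** True if any query term appears verbatim among the content tokens. */
    static boolean anyTermMatches(List<String> queryTerms, List<String> contentTokens) {
        for (String term : queryTerms) {
            if (contentTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "Bass Guitar for sale";
        List<String> queryTerms = lowerCaseTokens("Guitar");

        // Print the post-analysis tokens to see what a highlighter would compare against.
        System.out.println(whitespaceTokens(content));
        // Case was kept in the content tokens, so "guitar" does not match "Guitar".
        System.out.println(anyTermMatches(queryTerms, whitespaceTokens(content))); // false
        // Analysing the content the same way as the query restores the match.
        System.out.println(anyTermMatches(queryTerms, lowerCaseTokens(content)));  // true
    }
}
```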
As Erik says, if your content is in the database, surely
all you need stored in Lucene is the primary key
anyway? (Obviously all other fields are indexed in
Lucene - just not stored)
I've been playing around with this approach using
HSQLDB and Derby (Cloudscape). This relies on having a
key map for
1 - I'm a bit concerned that reasonable stemming
(Porter/Snowball)
apparently produces non-word stems .. i.e. not
really human readable.
It is possible to derive the human-readable form of a
stemmed term using either re-analysis of indexed
content or TermPositionVector. Either of these
There is no API for this, but I recall somebody
talking about adding support for this a few months
back
See
http://marc.theaimsgroup.com/?l=lucene-dev&m=109485996612177&w=2
This implementation was working on a version of Lucene
before compression was introduced so things may have
changed a
It still reads the data for every field in the
document
No, not if your fields are positioned in the right
order. It stops reading fields after it has got what
is needed.
If your doc has fields in the order:
smallFrequentlyReadField, largeRarelyReadField
then the patch will not read
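The effect of field order can be sketched in plain Java. This is not the patch itself and uses no Lucene classes: the class and method names are hypothetical, and stored fields are modelled as a list read strictly front to back. It simply counts how many fields a sequential reader has to visit before it finds the one it wants.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: models stored fields as an ordered list read front to back. */
final class FieldOrderSketch {

    /**
     * Number of stored fields a sequential reader visits before it can stop.
     * If the wanted field is absent, every field is visited.
     */
    static int fieldsVisited(List<String> storedOrder, String wanted) {
        int index = storedOrder.indexOf(wanted);
        return index < 0 ? storedOrder.size() : index + 1;
    }

    public static void main(String[] args) {
        List<String> smallFirst =
                Arrays.asList("smallFrequentlyReadField", "largeRarelyReadField");
        List<String> largeFirst =
                Arrays.asList("largeRarelyReadField", "smallFrequentlyReadField");

        // Small field first: the reader stops after one field and skips the large one.
        System.out.println(fieldsVisited(smallFirst, "smallFrequentlyReadField")); // 1
        // Large field first: the large field is read on the way to the small one.
        System.out.println(fieldsVisited(largeFirst, "smallFrequentlyReadField")); // 2
    }
}
```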
Hi Steve,
Possibly the easiest way to handle this is to tag the
documents with a field listing the permitted
roles/groups (not the individual users).
I would be tempted to keep the information that
associates users to groups outside of the Lucene index
eg in a relational DB.
This way you do not
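A rough sketch of that arrangement, in plain Java with no Lucene calls. The names here are assumptions for illustration: the field name "roles", the class GroupFilterSketch, and an in-memory map standing in for the relational DB. It also assumes group names are simple tokens that need no query-syntax escaping. The idea is to look up the user's groups outside the index and then restrict the user's query to documents tagged with one of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: the map stands in for a user-to-group table in a relational DB. */
final class GroupFilterSketch {

    private final Map<String, List<String>> groupsByUser = new HashMap<>();

    /** Records which groups a user belongs to (kept outside the index). */
    void addUser(String user, String... groups) {
        groupsByUser.put(user, Arrays.asList(groups));
    }

    /**
     * Wraps the user's query so that it can only match documents whose
     * hypothetical "roles" field lists one of the user's groups.
     */
    String restrict(String user, String userQuery) {
        List<String> groups = groupsByUser.get(user);
        if (groups == null || groups.isEmpty()) {
            throw new IllegalArgumentException("no groups known for user: " + user);
        }
        List<String> clauses = new ArrayList<>();
        for (String group : groups) {
            clauses.add("roles:" + group);
        }
        return "(" + userQuery + ") AND (" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        GroupFilterSketch filter = new GroupFilterSketch();
        filter.addUser("steve", "sales", "managers");
        // prints: (bass guitar) AND (roles:sales OR roles:managers)
        System.out.println(filter.restrict("steve", "bass guitar"));
    }
}
```

Because group membership lives outside the index in this sketch, moving a user between groups changes only the lookup, not the indexed documents.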
thank you, while I've seen the query.rewrite API, I
failed to see the application.
Lucene internally uses rewrite() to turn a
multi-term query into a simpler OR query.
Kenne* is rewritten as Kennedy OR Kennel OR Kenneth
Of course the exact terms used for expansion depend
on the contents of
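The expansion can be mimicked in plain Java. This is only an illustration of the idea, not Lucene's rewrite() implementation: the class name and the sorted set standing in for the index's term dictionary are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Illustration only: a sorted set stands in for the index's term dictionary. */
final class PrefixRewriteSketch {

    /** Returns every indexed term that starts with the prefix, in sorted order. */
    static List<String> expand(SortedSet<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; stop at the first non-match.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    /** Joins the expanded terms into the equivalent OR query string. */
    static String rewrite(SortedSet<String> indexedTerms, String prefix) {
        return String.join(" OR ", expand(indexedTerms, prefix));
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("Kennedy", "Kennel", "Kenneth", "Kenya", "Lucene"));
        // prints: Kennedy OR Kennel OR Kenneth
        System.out.println(rewrite(terms, "Kenne"));
    }
}
```

A different set of indexed terms gives a different expansion for the same prefix. That is why a highlighter needs the rewritten query: only the rewritten form names the concrete terms to look for in the text.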
Thanks, Max.
Another schoolboy error in TokenSources.java :)
More haste, less speed required on my part.
I have updated my code and will post to website
tonight. This change doesn't appear to have made a
noticeable difference in performance but the code is
cleaner.
Cheers
Mark
Hi Aviran,
The code you are calling assumes that you have indexed with TermVector support for
offsets (and optionally positions) ie code like this:
doc.add(new Field("contents", content,
Field.Store.COMPRESS, Field.Index.TOKENIZED,
I intend to release a new version of the highlighter soon that should (hopefully)
address some of the issues under discussion.
The re-design will be based on the following principles:
* A TokenStream will be passed to the highlighter to provide the source of tokens. The
token stream could be
Thanks for the info on write.lock Otis,
If that is so, should there not be 'N' at delete/delete intersection?
I'm using the same IndexReader so, no:
public synchronized final void delete(int docNum)
The same goes for write/write intersection. That should then be an 'N'
as well, no?
Again,
I think in many respects the table may be an over-simplification of
lower-level detail eg it does not show whether the concurrent threads are
actually using the same IndexReader objects, IndexWriter objects or are even
operating in the same process (I think I read that write.lock file
Is it possible to configure your app server to have just one message-driven
bean instance in the pool? Obviously this is not a general solution to
concurrent access to Lucene but would remove the need for multiple
IndexWriters in your particular case and give you the same overall throughput