RE: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread mark harwood
"One thing to mention is that I am using a MultiSearcher to rewrite the queries. I tried..." Ah. I remember this got a little ugly. The highlighter has a JUnit test that demonstrates highlighting fuzzy queries when using a MultiSearcher. Take a look at that. I can't remember the ins and outs of the

Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread mark harwood
See the highlighter's package.html for a description of how query.rewrite should be used to solve this. Cheers, Mark --- lucuser4851 [EMAIL PROTECTED] wrote: Dear All, We have been using the highlighter from the lucene sandbox, which works very nicely most of the time. However when we

Re: Configurable indexing of an RDBMS, has it been done before?

2005-02-09 Thread mark harwood
A GUI plugin for Squirrel SQL ( http://squirrel-sql.sourceforge.net/) would make a great way of configuring the mapping. It already does all the heavy lifting for connecting to different types of database and poking around the internals. I've got the bare bones of a plugin sorted (Connect to any

Re: Highlighter: how to specify text from external source?

2005-02-08 Thread mark harwood
Here's a rough example using a database: Hits hits=searcher.search(q); int numDocs=Math.min(10, hits.length()); Analyzer analyzer=new WhitespaceAnalyzer(); PreparedStatement ps=conn.prepareStatement("select docText from myTable where pk=?"); for(int i=0;i<numDocs;i++) {

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-28 Thread mark harwood
Also need http://jcifs.samba.org/ so you can spider windows file shares. That project also has a very nice servlet filter that is used to provide automatic authentication of Windows clients using the NTLM protocol.

Re: lucene query (sql kind)

2005-01-28 Thread mark harwood
I've added some user-defined lucene functions to HSQLDB and I've been able to run queries like the following one: select top 10 lucene_highlight(adText) from ads where pricePounds 200 and lucene_query('bass guitar drums',id)0 order by lucene_score(id) DESC I've had similar success with Derby

Re: text highlighting

2005-01-27 Thread mark harwood
"Sometimes the returned String is none. Is the code analyzer-dependent?" When highlighter.getBestFragments returns nothing, this is because no match was found for the query terms in the TokenStream supplied. This is nearly always because of Analyzer issues. Check the post-analysis tokens

Re: reading fields selectively /

2005-01-25 Thread mark harwood
As Erik says, if your content is in the database, surely all you need stored in Lucene is the primary key anyway? (Obviously all other fields are indexed in Lucene, just not stored.) I've been playing around with this approach using HSQLDB and Derby (Cloudscape). This relies on having a key map for

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-21 Thread mark harwood
1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) apparently produces non-word stems .. i.e. not really human readable. It is possible to derive the human-readable form of a stemmed term using either re-analysis of indexed content or TermPositionVector. Either of these

Re: reading fields selectively

2005-01-07 Thread mark harwood
There is no API for this, but I recall somebody talking about adding support for this a few months back. See http://marc.theaimsgroup.com/?l=lucene-dev&m=109485996612177&w=2 This implementation was working on a version of Lucene before compression was introduced so things may have changed a

Re: reading fields selectively

2005-01-07 Thread mark harwood
"It still reads the data for every field in the document." No, not if your fields are positioned in the right order. It stops reading fields after it has got what is needed. If your doc has fields in the order: smallFrequentlyReadField, largeRarelyReadField then the patch will not read
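The ordering argument above can be illustrated with a small plain-Java sketch. It does not use the Lucene API or the patch under discussion; the class name SelectiveFieldReadSketch and the method fieldsRead are hypothetical, and stored fields are modelled simply as a list of names read front to back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of "stop reading once the wanted fields are found".
// Assumption: stored fields are read sequentially in the order they were added.
class SelectiveFieldReadSketch {

    // Returns the names of the stored fields that had to be read, in order,
    // before every wanted field had been seen.
    static List<String> fieldsRead(List<String> storedOrder, Set<String> wanted) {
        Set<String> remaining = new HashSet<>(wanted);
        List<String> read = new ArrayList<>();
        for (String name : storedOrder) {
            if (remaining.isEmpty()) {
                break; // everything needed has been read; skip the rest
            }
            read.add(name);
            remaining.remove(name);
        }
        return read;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("smallFrequentlyReadField"));
        // Small field first: the large field is never touched.
        System.out.println(fieldsRead(
                Arrays.asList("smallFrequentlyReadField", "largeRarelyReadField"), wanted));
        // Large field first: it has to be read to reach the small one.
        System.out.println(fieldsRead(
                Arrays.asList("largeRarelyReadField", "smallFrequentlyReadField"), wanted));
    }
}
```

With the small field stored first, only that field is read; with the order reversed, the large field is read as well, which is the cost the ordering advice is meant to avoid.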

Re: Permissioning Documents

2004-12-10 Thread mark harwood
Hi Steve, Possibly the easiest way to handle this is to tag the documents with a field listing the permitted roles/groups (not the individual users). I would be tempted to keep the information that associates users to groups outside of the Lucene index eg in a relational DB. This way you do not
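A minimal plain-Java sketch of the role-tagging idea described above, under stated assumptions: it does not use the Lucene API, the class RolePermissionSketch and method visibleDocs are hypothetical names, and an in-memory map stands in for the relational table that associates users with groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Documents carry a set of permitted roles/groups; user-to-group membership
// lives outside the index (here a Map stands in for a relational DB lookup).
class RolePermissionSketch {

    // Returns the ids of documents whose permitted roles share at least
    // one group with the given user. Unknown users see nothing.
    static List<String> visibleDocs(Map<String, Set<String>> docRoles,
                                    Map<String, Set<String>> userGroups,
                                    String user) {
        Set<String> groups = userGroups.getOrDefault(user, Collections.<String>emptySet());
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : docRoles.entrySet()) {
            if (!Collections.disjoint(doc.getValue(), groups)) {
                visible.add(doc.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docRoles = new LinkedHashMap<>();
        docRoles.put("doc1", new HashSet<>(Arrays.asList("sales")));
        docRoles.put("doc2", new HashSet<>(Arrays.asList("engineering", "managers")));

        Map<String, Set<String>> userGroups = new HashMap<>();
        userGroups.put("steve", new HashSet<>(Arrays.asList("engineering")));

        System.out.println(visibleDocs(docRoles, userGroups, "steve"));
    }
}
```

Because membership is resolved outside the index, moving a user between groups changes only the external mapping and no documents need re-tagging.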

RE: API suggestion

2004-12-07 Thread mark harwood
"Thank you, while I've seen the query.rewrite API, I failed to see the application." Lucene internally uses rewrite() to turn a multi-term query into a simpler OR query. Kenne* is rewritten as Kennedy OR Kennel OR Kenneth. Of course the exact terms used for expansion depend on the contents of
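The expansion described above can be sketched in plain Java. This is an illustration of the idea only, not Lucene's rewrite() implementation: the class PrefixRewriteSketch and its methods are hypothetical, and a sorted set of strings stands in for the index's term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java sketch of prefix expansion; it does not use the Lucene API.
class PrefixRewriteSketch {

    // Expands a prefix such as "Kenne" into the indexed terms that start with it.
    static List<String> expand(SortedSet<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix. Terms are sorted, so the
        // matches are contiguous and the loop can stop at the first miss.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Renders the expansion as a simple OR query string.
    static String rewrite(SortedSet<String> indexedTerms, String prefix) {
        return String.join(" OR ", expand(indexedTerms, prefix));
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("Kent", "Kenneth", "Lucene", "Kennel", "Kennedy"));
        System.out.println(rewrite(terms, "Kenne"));
    }
}
```

The same prefix gives a different expansion against a different set of terms, which is why the rewritten query depends on the index it is rewritten against.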

Re[2]: Faster highlighting with TermPositionVectors (update)

2004-11-11 Thread mark harwood
Thanks, Max. Another schoolboy error in TokenSources.java :) More haste, less speed required on my part. I have updated my code and will post to website tonight. This change doesn't appear to have made a noticeable difference in performance but the code is cleaner. Cheers Mark

RE: Faster highlighting with TermPositionVectors

2004-11-04 Thread mark harwood
Hi Aviran, The code you are calling assumes that you have indexed with TermVector support for offsets (and optionally positions), i.e. code like this: doc.add(new Field("contents", content, Field.Store.COMPRESS, Field.Index.TOKENIZED,

RE: Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread mark harwood
I intend to release a new version of the highlighter soon that should (hopefully) address some of the issues under discussion. The re-design will be based on the following principles: * A TokenStream will be passed to the highlighter to provide the source of tokens. The token stream could be

Re: Thread safety

2002-06-21 Thread Mark Harwood
Thanks for the info on write.lock Otis, If that is so, should there not be 'N' at delete/delete intersection? I'm using the same IndexReader so, no: public synchronized final void delete(int docNum) The same goes for write/write intersection. That should then be an 'N' as well, no? Again,

Re: Thread safety

2002-06-14 Thread Mark Harwood
I think in many respects the table may be an over-simplification of lower-level detail eg it does not show if each of the concurrent threads are actually using the same IndexReader objects, IndexWriter objects or are even operating in the same process (I think I read that write.lock file

Re: Locking with IndexWriter

2002-05-23 Thread Mark Harwood
Is it possible to configure your app server to have just one message driven bean instance in the pool? Obviously this is not a solution in general to concurrent access to Lucene but would remove the need for multiple IndexWriters in your particular case and give you the same overall throughput