Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread Wolfgang Hoschek
On Nov 21, 2006, at 12:38 PM, jm wrote: Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough I will explore the other options then. To get started you can use something like this: for each document D: MemoryIndex index = createMemoryIndex(D, ...) for each query Q:

Re: a fair similarity

2006-11-21 Thread Bob Carpenter
Michael D. Curtin wrote: Daniel Naber wrote: Hi, as some of you may have noticed, Lucene prefers shorter documents over longer ones, i.e. shorter documents get a higher ranking, even if the ratio matched terms / total terms in document is the same. There's even more interesting kinds of

Re: is there any way to find unique records ?

2006-11-21 Thread Steven Rowe
Bhavin, Mark Harwood gives a solution that looks almost exactly like what you want: http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html Steve Chris Hostetter wrote: search the archives for faceted searching and category counts and you should find lots of discussions on

Re: Sorting on distance from a long/lat

2006-11-21 Thread Chris Hostetter
I'm not really sure what an approach like this gains you ... it provides a mechanism for ensuring that the lat/lon of all results are within a bounding box around your start location -- but those bounding boxes are fixed when building your index. couldn't you achieve the same thing using a lat

RE: Q: Highlighter + Search symbols *, ?, ~

2006-11-21 Thread Storey, Jeff
Thanks for the quick reply. I'll be implementing this in the next couple of days. Appreciate it! Jeff -Original Message- From: Stephan Spat [mailto:[EMAIL PROTECTED] Sent: Monday, November 20, 2006 8:43 AM To: java-user@lucene.apache.org Subject: Re: Q: Highlighter + Search symbols *,

Re: Limiting QueryParser

2006-11-21 Thread Steven Rowe
static String QueryParser.escape(String) should do the trick: http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String) Look at the bottom of the below-linked page for the list of characters that the above method will escape:

Re: Limiting QueryParser

2006-11-21 Thread Chris Hostetter
: important:conference agenda : I want to end up with : : +subject:important +subject:conference +subject:agenda : : I've written something to do this, but I know it is not as clever as QP as : currently it can only create BooleanQueries with TermQueries and cannot handle : PhraseQuery, so would

is there any way to find unique records ?

2006-11-21 Thread Bhavin Pandya
Hi, In Lucene, is there any way to find only the unique records from a single field? Otherwise I have to iterate through Hits unnecessarily and find the unique ones myself. Please help. - Bhavin pandya

Re: NOT queries

2006-11-21 Thread Daniel Noll
Antony Bowesman wrote: Hi, I'm writing a mapping mechanism between an existing search interface and Lucene and wondered how to support a single NOT/- query. Given the query -attribute, then from an earlier comment by Chris Hostetter where he says you can't have a negative clause in isolation

Re: Analysis/tokenization of compound words (German, Chinese, etc.)

2006-11-21 Thread Bob Carpenter
eks dev wrote: Depends what you need to do with it, if you need this to be only used as kind of stemming for searching documents, solution is not all that complex. If you need linguistically correct splitting then it gets complicated. This is a very good point. Stemming for high recall is

Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Hi, I have a search UI that allows search criteria to be input against specific fields, e.g. Subject. In order to create a suitable Lucene Query, I must analyze that String so that it becomes a set of Tokens which I can then turn into Terms. QueryParser seems to fit the bill for that,

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi guys, I have this problem searching all fields (metadata) using SpanFirstQuery. My scenario is that searching on just one thing using SpanFirstQuery is not a problem. However, if I have to search everything then I do not get any results returned. For example, I search

Re: is there any way to find unique records ?

2006-11-21 Thread Erick Erickson
Ok, I think I get it now. You're right that you probably don't want to iterate the Hits object since that has performance issues once you get beyond 100 docs or so. Although, I don't know how big your result sets are. If they are guaranteed to be small, this may not matter. I'm guessing you want

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-11-21 Thread Bob Carpenter
Vladimir Olenin wrote: Hi, I wonder if anyone here knows if there is a 'smart' text pattern finder, ideally written in Java. The library I'm looking for should be able to 'guess' the category of the particular text on the page, most probably by finding similarities between the bulk of the

Re: Fwd: Hibernate Lucene trademark issues

2006-11-21 Thread adasal
Thanks for the link and your write-up. On 19/11/06, Shay Banon [EMAIL PROTECTED] wrote: Since I do not want to invade Lucene user list regarding a discussion about Compass and Hibernate Search, but I still think that it is something that needs to get answered, here is a link to my blog post

Re: How to do a starts with search

2006-11-21 Thread Antony Bowesman
Martin Braun wrote: Please refer to the answers to my question on this list: http://www.nabble.com/forum/ViewPost.jtp?post=7337585framed=y Shortly spoken: SpanFirstQuery works like a charm :) Thanks Martin, that looks just right. I'll try it. Antony

Re: Sorting on distance from a long/lat

2006-11-21 Thread Dennis Watson
Hi, I apologize if this is slightly off topic. I have not implemented this, but the idea came to me after reading another post about measuring distance in lucene. It may be completely impractical, however it seems it COULD work at least if the area to be indexed could be constrained. What

Re: how to search string with words

2006-11-21 Thread Martin Braun
spinergywmy wrote: Hi Erick, I did take a look at the link that u provided me, and I have tried it myself but I get no result returned. My search string is third party license readme hhm with a quick look I would suggest that you have to split the string into individual terms, and

Re: is there any n-gram analyzer available??

2006-11-21 Thread Bob Carpenter
heritrix.lucene wrote: Thanks for your reply. This analyzer creates combination of words. I am looking for analyzer where you can break up the words into their n-grams. For example: 2-grams of google - go, oo, og, gl, le like that. This is also easy. You can check out our sample in
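The character 2-gram example in the message above (google - go, oo, og, gl, le) can be sketched in plain Java without any Lucene classes. This is only an illustration of the splitting itself, not the sample Bob refers to; the class and method names (CharNGramSketch, charNGrams) are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: slides a window of
// length n over a word and collects every substring it covers.
class CharNGramSketch {

    static List<String> charNGrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // The last window starts at word.length() - n; shorter words yield no grams.
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [go, oo, og, gl, le], matching the example in the message.
        System.out.println(charNGrams("google", 2));
    }
}
```

To use this at index time, the same loop would have to live inside a tokenizer or token filter so that each gram becomes its own token.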

Re: NOT queries

2006-11-21 Thread Daniel Naber
On Tuesday 21 November 2006 23:14, Antony Bowesman wrote: I assume that you first have to create a BooleanClause that finds everything and then another Clause that removes the attribute. Is this right or is there another way to do it? That's correct. For the find everything part you can

Re: is there any way to find unique records ?

2006-11-21 Thread Erick Erickson
I don't think I understand what only unique records from a single field means. If it's a unique value in a field, there'll only be one document in the hits object and there's no cost to iterating, so I doubt that's what you mean. If you're asking for a list of all the unique values for a

Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
Hi guys, We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It is steadily demonstrated no matter if the concurrent querying threads are 1, 2, 4 or 8 (or even more) - If N queries are executed against 1.9.1 for a given time, then 1.4.3

Re: Combining scores

2006-11-21 Thread Erick Erickson
This is a *really* simplistic approach, but why not just submit all 4 or 5 queries at once in a BooleanQuery and let Lucene do all the work for you? Or are the 4 or 5 queries such that they don't combine easily with MUST, MUST_NOT or SHOULD in a BooleanQuery? Best Erick On 11/21/06, Luis Rodrigo

Re: NOT queries

2006-11-21 Thread Antony Bowesman
Daniel Naber wrote: That's correct. For the find everything part you can use MatchAllDocsQuery. Thanks - I hadn't noticed that Query. Antony

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi, Thanks Martin. I have one question: what does the slop do within a span near query? What is the difference between 0 and 1? I have seen the source from Lucene, where one of the examples sets the slop to 4. Could u pls explain that to me. Thanks. regards, Wooi Meng

Sorting on distance from a long/lat

2006-11-21 Thread spamsucks
I am successfully able to search for nearbys given a longitude and a latitude. The basic summary of how I do this is that I add 1000 to the long/lat values and use a RangeFilter in my query. In my display results, I display the results ordered by distance from the original long/lat. What I
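Ordering results by distance from the original long/lat, as described above, can be sketched outside Lucene once the matching coordinates are in hand. The sketch below assumes great-circle (haversine) distance and a mean Earth radius of 6371 km, neither of which the message specifies; the class and method names are hypothetical, and each point is a plain {lat, lon} array in degrees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing step: sorts already-retrieved
// {lat, lon} pairs by great-circle distance from an origin.
class DistanceSortSketch {

    // Assumed mean Earth radius in kilometres.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding just above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Returns a new list ordered nearest-first; the input list is left unchanged.
    static List<double[]> sortByDistance(List<double[]> points, double lat, double lon) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(
                (double[] p) -> haversineKm(lat, lon, p[0], p[1])));
        return sorted;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {10.0, 10.0});
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {5.0, 5.0});
        for (double[] p : sortByDistance(points, 0.0, 0.0)) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

Sorting after the search keeps the index unchanged, at the cost of computing the distance for every hit that is displayed.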

Re: Use case for term vector's token position/offset?

2006-11-21 Thread Grant Ingersoll
Hi Jong, I think these are useful for things like highlighting (I think contrib/highlighter can use them); other post processing algorithms such as: question answering, calculating co-occurrences (find the 6 terms to the left and right of the term at position 16). Perhaps you want to

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread karl wettin
On 21 Nov 2006, at 16:43, jm wrote: Any thoughts? You can also try InstantiatedIndex, similar in speed and design to a MemoryIndex, but can handle multiple documents, IndexReader, IndexWriter, IndexModifier etc. just like any Directory implementation. It requires a minor patch to the

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi Erick, I did take a look at the link that u provided me, and I have tried it myself but I get no result returned. My search string is third party license readme Below r the codes that I wrote, please point me out where I have done wrong. readerA =

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Paul Elschot
Stanislav, Could you also try a nightly build to test the later performance improvement on BooleanScorer2? The nightly builds are here: http://people.apache.org/builds/lucene/java/nightly/ The jar is called lucene-core-nightly.jar in the .tar.gz build. It's not likely that this is faster than

Re: Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Chris Hostetter wrote: : important:conference agenda : I want to end up with : : +subject:important +subject:conference +subject:agenda : : I've written something to do this, but I know it is not as clever as QP as : currently it can only create BooleanQueries with TermQueries and cannot handle

Federated search (lucene custom and nutch)?

2006-11-21 Thread spamsucks
Hi Everybody, I am successfully using lucene to index/display results for a hugely successful tourism site... We even for nearby's of attractions of different categories. Love it. The next step is to start indexing all the legacy content, which numbers around 3000 or so JSP's that will

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread Wolfgang Hoschek
On Nov 21, 2006, at 7:43 AM, jm wrote: Hi, I have to decide between using a RAMDirectory and MemoryIndex, but not sure what approach will work better... I have to run many items (tens of thousands) against some queries (100 at most), but I have to do it one item at a time. And I already have

Re: is there any way to find unique records ?

2006-11-21 Thread Chris Hostetter
search the archives for faceted searching and category counts and you should find lots of discussions on this topic. : Date: Tue, 21 Nov 2006 20:30:22 +0530 : From: Bhavin Pandya [EMAIL PROTECTED] : Reply-To: java-user@lucene.apache.org, Bhavin Pandya [EMAIL PROTECTED] : To:

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread jm
Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough I will explore the other options then. On 11/21/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 21, 2006, at 7:43 AM, jm wrote: Hi, I have to decide between using a RAMDirectory and MemoryIndex, but not sure what

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Chris Hostetter
: Could you also try a nightly build to test the later performance improvement : on BooleanScorer2? The nightly builds are here: : http://people.apache.org/builds/lucene/java/nightly/ : The jar is called lucene-core-nightly.jar in the .tar.gz build. : : It's not likely that this is faster than

Combining scores

2006-11-21 Thread Luis Rodrigo Aguado
Hi all, I am working on a project that, for each query from the user, builds four or five different queries and tries to combine the results. The first part is already working, but, as I have read that the scores from different queries are not comparable at all among them, I am a bit stuck

Re: Implementing scoring in Lucene

2006-11-21 Thread Chris Hostetter
: Is there a step by step guide on how to implement the scoring function : for Apache Lucene? : The help given on the website is not easy to follow. : : How do I integrate the search function into my website? First off, what help did you look at? ... did you start with the tutorial?

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Yonik Seeley
On 11/21/06, Stanislav Jordanov [EMAIL PROTECTED] wrote: We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It is steadily demonstrated no matter if the concurrent querying threads are 1, 2, 4 or 8 (or even more) - If N queries are executed

Delete contents from index

2006-11-21 Thread spinergywmy
Hi, How can I delete the contents from Index file? Is there any example that I can refer to? Thanks. regards, Wooi Meng

Re: Fw: Urgent : Specific search problem with whitespace analyzer

2006-11-21 Thread Chris Hostetter
: I have modified the tokenizer class by making it return characters in : lower case. there is really no reason to do this ... have your analyzer use the WhitespaceTokenizer, wrapped in a LowerCaseFilter ... that will eliminate some of your custom code, and perhaps some of your problems as

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
Switching to the old scorer (via BooleanQuery.setUseScorer14(true)) solved the performance issue - now Lucene 1.9.1 and 2.0.0 perform on the same load test just as 1.4.3 does. Thanks a lot, Yonik! Any chance there exists a non-professional explanation what's the difference between old and new

Re: Combining scores

2006-11-21 Thread José Ramón Pérez Agüera
I have some code to do that, but it is not really friendly yet :-( Anyway, it is quite simple. You need to merge the postings that you obtain for the different queries using TermDocs. With TermDocs you obtain the internal ids for the docs related to terms. If you merge the TermDocs for each word that

Re: Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Mark Miller wrote: if you scan the query and escape all colons (ie \:) then you should be good (I have not verified). Of course you will not be able to do a field search, but that seems to be what you're after. Thanks for that suggestion. However, a standard un-escaped parse gives Input -

Re: is there any way to find unique records ?

2006-11-21 Thread Bhavin Pandya
Hi Erick, If you're asking for a list of all the unique values for a particular field, see TermDocs and/or TermEnum which will allow you to look at, say, all the values stored for some field. A trick here is to seek(new Term(field, ""));. By putting nothing in the value, you effectively enumerate

Ordered Proximity searching, does it exist?

2006-11-21 Thread Adam
Dear Lucene Users, Is there a way, or has someone been able, to implement an ordered proximity search? Lucene currently uses the "word1 word2"~5 query to find tokens that are within 5 words of each other in any order. What I've been asked to do is find only the results that are for instance
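To make the idea of ordered proximity concrete, here is a small plain-Java sketch over a list of tokens. It uses its own simplified definition (the second word must appear after the first with at most maxGap tokens in between), which is an assumption of this sketch and not Lucene's exact slop semantics; the class and method names are hypothetical. Within Lucene itself, SpanNearQuery takes an inOrder flag that is the usual starting point for this kind of query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an ordered proximity check on a token list.
class OrderedProximitySketch {

    // True if `second` occurs after some occurrence of `first`
    // with at most maxGap tokens between the two.
    static boolean orderedWithin(List<String> tokens, String first, String second, int maxGap) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Furthest position still allowed for `second`, capped at the last token.
            int limit = Math.min(tokens.size() - 1, i + 1 + maxGap);
            for (int j = i + 1; j <= limit; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the quick brown fox".split(" "));
        System.out.println(orderedWithin(tokens, "quick", "fox", 1)); // true
        System.out.println(orderedWithin(tokens, "fox", "quick", 5)); // false: wrong order
    }
}
```

An unordered check would simply run the same test in both directions, which is the behaviour the message describes for the ~5 query.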

Multi-Index Spellchecker

2006-11-21 Thread Mark Miller
Does anyone have any interest in making the spellchecker work across more than one index? Does the coder of the spellchecker have any advice/"don't do that, moron" info etc.? - Mark

Re: Multi-Index Spellchecker

2006-11-21 Thread Mark Miller
Thanks Hoss, I hadn't looked at the indexDictionary method yet. It does not appear to be what I am looking for though...I should have been more explicit - I am using the spellchecker for a 'did you mean search', so I am not using a dedicated spell check index. Instead I am passing the index

Re: Q: Highlighter + Search symbols *, ?, ~

2006-11-21 Thread Stephan Spat
Hey Jeff! Storey, Jeff schrieb: Could you explain what you did for your solution? This is a problem I'm currently facing as well. But, for example, if the user searches for head~ would you also be able to highlight read and dead if they are returned or just head without the ~. It is