Re: IndexSearcher and RAMDirectory

2008-01-17 Thread Karl Wettin
On 17 Jan 2008, at 16:42, Cam Bazz wrote: Hello, Hi, I understand after writing some documents in an index with an indexwriter, the IndexSearcher object has to be reinstantiated for it to find newly instantiated objects. That is correct, given you instantiate the new IndexSearcher(directory

Re: Inverted search / Search on profilenet

2008-01-17 Thread markharw00d
There is a trick to indexing queries in this way... you need only index the rarest term in queries which have one or more mandatory terms. As an example, for the phrase query "XYZ Group limited" you need only index the rarest term "XYZ" and thus avoid selecting the query for execution with
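A minimal sketch of that trick in plain Java rather than Lucene: each stored query is reduced to its set of mandatory terms and filed under only the rarest one, so an incoming document pulls in only the queries keyed under a term it contains, and each candidate is then checked in full. The class name, the `docFreq` map (standing in for document frequencies taken from the corpus) and the term-presence check are assumptions for illustration; a real phrase query such as "XYZ Group limited" would also need its term positions verified.

```java
import java.util.*;

/**
 * Sketch of "index only the rarest mandatory term" for stored queries.
 * Hypothetical: docFreq stands in for corpus document frequencies, and
 * matching checks term presence only (no phrase/position check).
 */
class RarestTermQueryIndex {
    private final Map<String, Integer> docFreq;
    // rarest term -> ids of the stored queries filed under it
    private final Map<String, List<String>> index = new HashMap<>();
    // query id -> its full set of mandatory terms
    private final Map<String, Set<String>> queries = new HashMap<>();

    RarestTermQueryIndex(Map<String, Integer> docFreq) {
        this.docFreq = docFreq;
    }

    void addQuery(String id, Set<String> mandatoryTerms) {
        if (mandatoryTerms.isEmpty()) {
            throw new IllegalArgumentException("trick needs at least one mandatory term");
        }
        // Pick the term with the lowest document frequency; unknown terms count as 0.
        String rarest = Collections.min(mandatoryTerms,
            Comparator.comparingInt((String t) -> docFreq.getOrDefault(t, 0)));
        index.computeIfAbsent(rarest, k -> new ArrayList<>()).add(id);
        queries.put(id, new HashSet<>(mandatoryTerms));
    }

    /** Returns ids (sorted) of stored queries whose mandatory terms all occur in the document. */
    List<String> match(Set<String> docTerms) {
        List<String> hits = new ArrayList<>();
        for (String term : docTerms) {
            // Only queries filed under a term present in the document are candidates.
            for (String id : index.getOrDefault(term, Collections.emptyList())) {
                // Full check on the candidate: every mandatory term must be present.
                if (docTerms.containsAll(queries.get(id))) {
                    hits.add(id);
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

For example, with frequencies xyz=3, group=5000 and limited=8000, a query on all three terms is filed only under "xyz", so documents that never mention XYZ never cause that query to be evaluated.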

Highlighting marked up documents

2008-01-17 Thread John Byrne
Hi, has anyone found a way to use search term highlighting in a marked-up document, such as HTML or .DOC? My problem is that the Lucene highlighter works on plain text; the limitation is that you have to use the text you indexed for highlighting, so your tags are gone by then. Although it's po

Re: Inverted search / Search on profilenet

2008-01-17 Thread Vieri
I believe that a mixed 1+3 approach should mimic quite well what Verity does. In fact, what I would do is to index "profile net" queries in a dedicated index, using exclusively exact terms (i.e.: removing Boolean operators and wildcards). This gives you an approximate profile index you can use to
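A rough sketch of reducing a profile query to exact terms, as Vieri suggests. It assumes Lucene-style query syntax (AND/OR/NOT, `&&`/`||`, `+`/`-` prefixes, `*` and `?` wildcards) separated by whitespace, parentheses or quotes. Dropping negated terms is an additional assumption, on the grounds that a negated term can never be what makes a query match; `ExactTermExtractor` is a hypothetical name.

```java
import java.util.*;

/**
 * Sketch: keep only the exact terms of a query string, dropping Boolean
 * operators, wildcard terms and negated terms. Hypothetical helper;
 * phrases are simply split into their individual terms.
 */
class ExactTermExtractor {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "&&", "||");

    static List<String> exactTerms(String query) {
        List<String> terms = new ArrayList<>();
        boolean negateNext = false;
        // Parentheses and double quotes are treated as separators, like whitespace.
        for (String raw : query.split("[\\s()\"]+")) {
            if (raw.isEmpty()) continue;              // leading separator yields ""
            if (raw.equals("NOT") || raw.equals("!")) {
                negateNext = true;                    // the next term is negated
                continue;
            }
            if (OPERATORS.contains(raw)) continue;
            boolean negated = negateNext || raw.startsWith("-");
            negateNext = false;
            String term = raw.replaceFirst("^[+-]", "").toLowerCase(Locale.ROOT);
            // Wildcard terms are not exact, and negated terms cannot cause a match.
            if (negated || term.isEmpty() || term.contains("*") || term.contains("?")) continue;
            terms.add(term);
        }
        return terms;
    }
}
```

For instance, `(xyz AND group*) OR "limited company" -spam NOT junk` reduces to the terms xyz, limited and company, which could then be indexed as the approximate profile index.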

Re: Unable to match strings with Underscore / . etc...

2008-01-17 Thread Erick Erickson
You have to look at your analyzers. StandardAnalyzer tries to respect things like e-mail addresses. Various other analyzers you could use do things like break on punctuation. I'd suggest you get a copy of Luke and examine what your index actually holds and you can look at the parsed form of a quer
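To see the effect Erick describes without Lucene, the two toy tokenizers below (hypothetical, not Lucene's analyzer classes) turn the same text into different tokens: splitting on punctuation turns "a.b.c" into "a", "b", "c", while the second keeps an inner '.' or '_' so the string survives as a single token. Whichever behavior is chosen, the same analysis has to be applied both when indexing and when parsing queries; otherwise the query terms will not line up with what the index holds, which is exactly what Luke lets you inspect.

```java
import java.util.*;
import java.util.regex.*;

/** Toy tokenizers for illustration only; not Lucene's analyzers. */
class ToyTokenizers {
    // Breaks on anything that is not a letter or digit, as punctuation-splitting analyzers do.
    static List<String> splitOnPunctuation(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // A run of letters/digits, optionally joined by single '.' or '_' characters.
    private static final Pattern KEEP_INNER =
        Pattern.compile("[\\p{L}\\p{N}]+(?:[._][\\p{L}\\p{N}]+)*");

    // Keeps '.' and '_' only when they sit between letters/digits, so "a.b.c" stays whole.
    static List<String> keepInnerPunctuation(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = KEEP_INNER.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) out.add(m.group());
        return out;
    }
}
```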

Unable to match strings with Underscore / . etc...

2008-01-17 Thread DURGA DEEP
Hi folks, we are having trouble finding the following strings, for example: a.b.c a_b_c Are we doing something wrong? It seems like the ( . / Period Character ) ( _ / Underscore Character ) are being tossed away. And if we are trying to search for a.b.c we are unable

Re: constructing query from string

2008-01-17 Thread prabin meitei
For the query I am using, newQ.toString() equals queryFrom.toString(). Well, what I am trying to accomplish is that I need to search an index quite often (say at intervals of around 15 minutes), and the query depends on other activities done by the user up to that point in time. But I don't want the query

Re: Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Marcus Falk
I heard from a friend that this behavior (AddWithoutMerge) has been added in Lucene 2.1 or 2.2. /M -Original message- From: Marcus Falk [mailto:[EMAIL PROTECTED] Sent: 17 January 2008 16:34 To: java-user@lucene.apache.org Subject: Re: Re: Re: Integrating dynamic data i

Re: Inverted search / Search on profilenet

2008-01-17 Thread Marcus Falk
Yes, a profilenet is what Mark describes. In our Verity profilenet we have ~50,000 profiles (queries), and the performance is fine: around 20-25 documents/second. From what we can tell the matches are accurate. Unfortunately I don't have any ideas on how Verity does this under the hood, so I don't k

IndexSearcher and RAMDirectory

2008-01-17 Thread Cam Bazz
Hello, I understand after writing some documents in an index with an indexwriter, the IndexSearcher object has to be reinstantiated for it to find newly instantiated objects. And this reinstantiation of IndexSearcher is costly from what I understand. I am working on a caching scheme that will allo
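One way such a caching scheme could look, sketched generically: all searches share one cached searcher, the writer side bumps a generation counter after changing the index, and the next request rebuilds the searcher only if the counter has moved. `CachedSearcher`, `markChanged()` and the `Supplier` factory (standing in for opening a new IndexSearcher on the directory) are hypothetical; closing the old searcher once in-flight searches finish is left out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Sketch: reuse one searcher until the writer signals a change, then
 * build a fresh one lazily on the next request. Hypothetical names;
 * the factory stands in for "new IndexSearcher(directory)".
 */
class CachedSearcher<S> {
    private final Supplier<S> factory;
    private final AtomicLong generation = new AtomicLong();
    private long openedAt = -1;   // generation the cached searcher reflects
    private S current;

    CachedSearcher(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Called by the writer side after it has added or deleted documents. */
    void markChanged() {
        generation.incrementAndGet();
    }

    /** Returns the cached searcher, rebuilding it only if the index changed since it was opened. */
    synchronized S get() {
        long gen = generation.get();
        if (current == null || gen != openedAt) {
            current = factory.get();   // the costly step, done at most once per change
            openedAt = gen;
        }
        return current;
    }
}
```

The design choice here is lazy reopening: a burst of writes costs only one reopen, paid by the first search that follows it.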

Re: constructing query from string

2008-01-17 Thread Erick Erickson
I just thought of an interesting test for whether toString() is reasonable. You could log/flag when the reloaded query differs, i.e. String queryFromToString; // your stored form Query newQ = parser.parse(queryFromToString); if (!newQ.toString().equals(queryFromToString)) { log some stuff or throw an ex
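Erick's check written out as a small helper. Strings have to be compared with `equals()`: `!=` only compares references, and since `toString()` builds a new String it would flag almost every reload. The `parse` function is a stand-in for something like `QueryParser.parse(...)`, which throws a checked `ParseException` and would need wrapping in a real caller; `QueryRoundTrip` and `roundTrips` are hypothetical names.

```java
import java.util.function.Function;
import java.util.logging.Logger;

/**
 * Sketch: re-parse a stored query string and flag it when the parsed
 * query's toString() differs from what was stored. "parse" is a
 * hypothetical stand-in for QueryParser.parse(...).
 */
class QueryRoundTrip {
    private static final Logger LOG = Logger.getLogger(QueryRoundTrip.class.getName());

    static <Q> boolean roundTrips(String stored, Function<String, Q> parse) {
        Q reloaded = parse.apply(stored);
        // Compare contents with equals(); != would compare object references.
        boolean same = stored.equals(reloaded.toString());
        if (!same) {
            LOG.warning("Query changed on reload: '" + stored + "' -> '" + reloaded + "'");
        }
        return same;
    }
}
```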

Re: Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Marcus Falk
I think that would work, but I'm not 100% sure of what you are trying to achieve. Just a note: sorting results performs poorly if you have a large index. We ran into severe performance problems with just a couple of million articles, which led us to modify the ranking instead. Code

Re: constructing query from string

2008-01-17 Thread Erick Erickson
I believe, but I'm not sure, that query and newQuery are not guaranteed to be equivalent. So I'd be cautious about this approach. But if it works for you I'm assuming that you're somehow programmatically constructing the query and therefore can't just store the original string. I'd *always* st

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-17 Thread Erick Erickson
There's a section on the Lucene Wiki for real world experiences etc. After you are satisfied with your tests, it'd be great if you could add your measurements to the Wiki! Best Erick On Jan 17, 2008 5:31 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote: > On Fri, 2008-01-11 at 11:34 +0100, Toke Esk

Re: Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Tobias Lohr
Thanks for your hint. If it's possible I would take a look at the code, but the approach is interesting. What would you say to this approach I have in mind: - Having an additional, much smaller index, where only the dynamic data resides and is incorporated every N seconds with increment

Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Marcus Falk
In our solution we used a RAMDir for the newest incoming articles and an FSDir for older ones. Then we had a limit for the RAMDir of about 10,000 documents; when that limit was hit we used mergesegments to move the content from RAMDir -> FSDir. Actually, we had to do some modification in the mergeseg
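The RAMDir/FSDir layout Marcus describes, reduced to a plain-Java sketch: new documents land in a small in-memory tier, and once it reaches a limit (about 10,000 documents in their setup) its contents are moved into the larger on-disk tier, while searches cover both tiers. Lists stand in for the two indexes and `flush()` for the segment merge; in Lucene, searching both at once would typically go through something like a MultiSearcher or MultiReader. All names here are hypothetical.

```java
import java.util.*;

/**
 * Sketch of a two-tier index: a small in-memory tier for new documents
 * that is flushed into a larger on-disk tier at a size limit.
 * Hypothetical: lists stand in for RAMDirectory/FSDirectory indexes.
 */
class TwoTierIndex {
    private final int ramLimit;
    private final List<String> ramTier = new ArrayList<>();
    private final List<String> diskTier = new ArrayList<>();

    TwoTierIndex(int ramLimit) {
        this.ramLimit = ramLimit;
    }

    void add(String doc) {
        ramTier.add(doc);
        if (ramTier.size() >= ramLimit) {
            flush();   // limit hit: move the in-memory tier to disk
        }
    }

    /** Moves everything from the in-memory tier into the on-disk tier. */
    private void flush() {
        diskTier.addAll(ramTier);
        ramTier.clear();
    }

    /** Searches both tiers, newest (in-memory) documents first. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String d : ramTier) if (d.contains(term)) hits.add(d);
        for (String d : diskTier) if (d.contains(term)) hits.add(d);
        return hits;
    }

    int ramSize() { return ramTier.size(); }
    int diskSize() { return diskTier.size(); }
}
```

Keeping the fresh tier small is what makes it cheap to reopen frequently; the large tier only changes at flush time.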

Re: Inverted search / Search on profilenet

2008-01-17 Thread Mark Miller
Verity, Autonomy, whatever, has what they call a reverse query system called profilenet. A profile is just a query (or I guess more than one query?) and you can set up a bunch of them. Then you supply the document and you get the matching queries as well as a score. They say it's the oppos

Re: constructing query from string

2008-01-17 Thread prabin meitei
Hi Erick, Thanks for your response. I have tried the following way and it seems to be working. Tell me if there is any problem with the approach. String str = query.toString(); QueryParser parser = new QueryParser("", new StandardAnalyzer()); Query newQuery = parser.parse(str); Now use newQuery fo

Re: Inverted search / Search on profilenet

2008-01-17 Thread Endre Stølsvik
Mark Miller wrote: In any case, it shouldn't be that difficult to rig something. Is the profilenet system even that valuable? Sounds a bit hokey to me, but then I'm just a kid who has never used it. May I ask: What IS a profilenet? I ask since this obviously is something that you two hit off o

Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-17 Thread Toke Eskildsen
On Fri, 2008-01-11 at 11:34 +0100, Toke Eskildsen wrote: > As for shared searcher vs. individual searchers, there was just a > slight penalty for using individual searchers. Whoops! Seems like I need better QA for my test-code. I didn't use individual searchers for each thread when I thought I was

RE: How?

2008-01-17 Thread spring
> A non-clustered and clustered index has resolved the problem, > but Lucene cannot do the same thing? Well, I bet the database solution is the best, as long as you do not search in big text fields or need special full-text features like fuzzy search etc. Synchronizing a lucene in

Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Andrzej Bialecki
Tobias Lohr wrote: I'm not really sure if this approach is possible for working in changes every, let's say, 30 seconds!? The conventional wisdom is to use RAMDirectory in such scenarios, i.e. you commit frequent updates to a RAMDirectory and frequently reopen its Searcher (which should b

Re: Re: Integrating dynamic data into Lucene search/ranking

2008-01-17 Thread Tobias Lohr
I'm not really sure if this approach is possible for working in changes every, let's say, 30 seconds!? Original message > Date: Thu, 17 Jan 2008 05:35:13 +0100 > From: "Marcus Falk" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org, java-user@lucene.apache.org > Subject