On 17 Jan 2008, at 16:42, Cam Bazz wrote:
Hello,
Hi,
I understand after writing some documents in an index with an
indexwriter,
the IndexSearcher object has to be reinstantiated for it to find newly
instantiated objects.
That is correct, given you instantiate the new IndexSearcher(directory
There is a trick to indexing queries in this way... you need only index
the rarest term in queries which have one or more mandatory terms.
As an example - for the phrase query "XYZ Group limited" you need only
index the rarest term "XYZ" and thus avoid selecting the query for
execution with
Hi,
Has anyone found a way to use search term highlighting in a marked up
document, such as HTML or .DOC? My problem is, the Lucene highlighter
works on plain text, the limitation being that you have to use the text
you indexed for highlighting, so your tags are gone by then. Although
it's po
I believe that a mixed 1+3 approach should mimic quite well what Verity does.
In fact, what I would do is to index "profile net" queries in a
dedicated index, using exclusively exact terms (i.e.: removing Boolean
operators and wildcards).
This gives you an approximate profile index you can use to
You have to look at your analyzers. StandardAnalyzer
tries to respect things like e-mail addresses. Various
other analyzers you could use do things like break
on punctuation.
I'd suggest you get a copy of Luke and examine what
your index actually holds and you can look at the
parsed form of a quer
Hi folks,
We are having a problem: we are unable to find the following strings.
For Example:
a.b.c
a_b_c
Are we doing something wrong? It seems that the period ( . ) and
underscore ( _ ) characters are being tossed away.
And if we are trying to search for a.b.c we are unable
For the query I am using, *newQ.toString* equals *queryFrom.ToString*.
Well, what I am trying to accomplish is that I need to search an index
(quite often, say at intervals of around 15 minutes), and the query depends on other
activities done by the user up to that point in time. But I don't want the
query
I heard from a friend that this behavior (AddWithoutMerge) was added in
Lucene 2.1 or 2.2.
/M
-Original message-
From: Marcus Falk [mailto:[EMAIL PROTECTED]
Sent: 17 January 2008 16:34
To: java-user@lucene.apache.org
Subject: SV: SV: SV: Integrating dynamic data i
Yes, a profilenet is what Mark describes.
In our Verity profilenet we have ~50,000 profiles (queries); performance is
fine, around 20-25 documents per second.
From what we can tell the matches are accurate. Unfortunately I don't have any
ideas on how Verity does this under the hood so I don't k
Hello,
I understand after writing some documents in an index with an indexwriter,
the IndexSearcher object has to be reinstantiated for it to find newly
instantiated objects. And this reinstantiation of IndexSearcher is costly
from what I understand.
I am working on a caching scheme that will allo
I just thought of an interesting test for whether
toString() is reasonable. You could log/flag
when the reloaded query differs. I.e.
String queryFromToString; // your stored form
Query newQ = parser.parse(queryFromToString);
if (!newQ.toString().equals(queryFromToString)) {
log some stuff or throw an ex
I think that would work. But I'm not 100% sure of what you are trying to
achieve.
Just a note:
Sorting results performs poorly if you have a large index; we ran into
severe performance problems with just a couple of million articles, which led us
to modify the ranking instead.
Code
I believe, but I'm not sure, that query and newQuery
are not guaranteed to be equivalent. So I'd be cautious
about this approach. But if it works for you
I'm assuming that you're somehow programmatically
constructing the query and therefore can't just
store the original string. I'd *always* st
There's a section on the Lucene Wiki for real world
experiences etc. After you are satisfied with your
tests, it'd be great if you could add your measurements
to the Wiki!
Best
Erick
On Jan 17, 2008 5:31 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote:
> On Fri, 2008-01-11 at 11:34 +0100, Toke Esk
Thanks for your hint. If it's possible I would like to take a look at the code;
the approach is interesting.
What would you say to this approach I came up with:
- Having an additional, considerably smaller index, where only the dynamic data resides
and is incorporated every N seconds with increment
In our solution we used a RAMDir for the newest incoming articles and an FSDir
for older ones. We had a limit for the RAMDir of about 10,000 documents; when
that limit was hit we used mergesegments to move the content from RAMDir ->
FSDir. Actually, we had to do some modification in the mergeseg
Verity, Autonomy, whatever, has what they call a reverse query system
called profilenet. A profile is just a query (or I guess more than one
query?) and you can set up a bunch of them. Then you supply the document
and you will get the matching queries as well as a score. They say it's
the oppos
Hi Erick,
Thanks for your response. I have tried the following way and it seems to be
working. Tell me if there is any problem with the approach.
String str = query.toString();
QueryParser parser = new QueryParser("", new StandardAnalyzer());
Query newQuery = parser.parse(str);
now use *newQuery* fo
Mark Miller wrote:
In any case, it shouldn't be that difficult to rig something. Is the
profilenet system even that valuable? Sounds a bit hokey to me, but then I'm
just a kid that has never used it
May I ask: What IS a profilenet? I ask since this obviously is something
that you two hit off o
On Fri, 2008-01-11 at 11:34 +0100, Toke Eskildsen wrote:
> As for shared searcher vs. individual searchers, there was just a
> slight penalty for using individual searchers.
Whoops! Seems like I need better QA for my test-code. I didn't use
individual searchers for each thread when I thought I was
> A non-clustered and clustered index has resolved the problem,
> but Lucene can
> not do the same thing like that?
Well, I bet the database solution is the best, as long as you do not search
in big text fields or you need special fulltext features like fuzzy search
etc.
Synchronizing a lucene in
Tobias Lohr wrote:
I'm not really sure, if this approach is possible for working in changes every
- let's say - 30 seconds!?
The conventional wisdom is to use RAMDirectory in such scenarios. I.e.
you commit frequent updates to a RAMDirectory and frequently reopen its
Searcher (which should b
I'm not really sure, if this approach is possible for working in changes every
- let's say - 30 seconds!?
Original Message
> Date: Thu, 17 Jan 2008 05:35:13 +0100
> From: "Marcus Falk" <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org, java-user@lucene.apache.org
> Subject
23 matches