On Nov 21, 2006, at 12:38 PM, jm wrote:
Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
I will explore the other options then.
To get started you can use something like this:
for each document D:
    MemoryIndex index = createMemoryIndex(D, ...)
    for each query Q:
Michael D. Curtin wrote:
Daniel Naber wrote:
Hi,
as some of you may have noticed, Lucene prefers shorter documents over
longer ones, i.e. shorter documents get a higher ranking, even if the
ratio of matched terms to total terms in the document is the same.
There are even more interesting kinds of
Bhavin,
Mark Harwood gives a solution that looks almost exactly like what you want:
http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html
Steve
Chris Hostetter wrote:
search the archives for faceted searching and category counts and you
should find lots of discussions on
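A minimal sketch of what category counting usually amounts to, in plain Java with hypothetical names (`FacetCounts`, `docCategory`). It assumes each document's category has already been loaded into an array indexed by document id; you then tally the categories of the matching documents instead of loading every hit. This is an illustration under those assumptions, not a Lucene API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical category ("facet") counter. docCategory[docId] holds the
 * single category value of each document; matchingDocs are the ids a
 * query returned.
 */
class FacetCounts {
    static Map<String, Integer> count(String[] docCategory, int[] matchingDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            // one increment per matching document, keyed by its category
            counts.merge(docCategory[doc], 1, Integer::sum);
        }
        return counts;
    }
}
```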
I'm not really sure what an approach like this gains you ... it provides
a mechanism for ensuring that the lat/lon of all results are within a
bounding box around your start location, but those bounding boxes
are fixed when building your index.
couldn't you achieve the same thing using a lat
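To illustrate computing the box per query rather than fixing it at index time, here is a plain-Java sketch (hypothetical `BoundingBox.around`). It derives the min/max latitude and longitude for a radius around a start point, and those four numbers would then bound range filters on separate lat and lon fields. It assumes a spherical earth and ignores the poles and the 180th meridian.

```java
/**
 * Approximate lat/lon bounding box around a centre point, computed at
 * query time. Assumes about 111.2 km per degree of latitude.
 */
class BoundingBox {
    static final double KM_PER_DEGREE = 111.2;

    /** Returns {minLat, maxLat, minLon, maxLon}. */
    static double[] around(double lat, double lon, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE;
        // a degree of longitude shrinks with cos(latitude)
        double dLon = radiusKm / (KM_PER_DEGREE * Math.cos(Math.toRadians(lat)));
        return new double[] {lat - dLat, lat + dLat, lon - dLon, lon + dLon};
    }
}
```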
Thanks for the quick reply. I'll be implementing this in the next couple
of days. Appreciate it!
Jeff
-----Original Message-----
From: Stephan Spat [mailto:[EMAIL PROTECTED]
Sent: Monday, November 20, 2006 8:43 AM
To: java-user@lucene.apache.org
Subject: Re: Q: Highlighter + Search symbols *,
static String QueryParser.escape(String) should do the trick:
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String)
Look at the bottom of the below-linked page for the list of characters
that the above method will escape:
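For illustration only, here is a plain-Java sketch of what backslash-escaping does. The character set is taken from the special characters listed in the Lucene query syntax documentation. `EscapeSketch` is a hypothetical name, and real code should simply call `QueryParser.escape(String)`.

```java
/**
 * Hypothetical stand-in that prefixes each query-syntax special
 * character with a backslash so the parser treats it as plain text.
 */
class EscapeSketch {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the special character
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```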
: important:conference agenda
: I want to end up with
:
: +subject:important +subject:conference +subject:agenda
:
: I've written something to do this, but I know it is not as clever as QP as
: currently it can only create BooleanQueries with TermQueries and cannot handle
: PhraseQuery, so would
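The following plain-Java sketch shows the desired output only, using a hypothetical `RequiredTerms.build` helper. It has the same TermQuery-only limitation described above. Splitting on every character that is not a letter or digit is an assumption, and it also consumes the ':' so "important:conference" is not read as a field name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: every word becomes a required clause on one field. */
class RequiredTerms {
    static String build(String field, String userText) {
        List<String> clauses = new ArrayList<>();
        // split on runs of anything that is not a letter or a digit
        for (String word : userText.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) { // a leading separator yields an empty first piece
                clauses.add("+" + field + ":" + word.toLowerCase(Locale.ROOT));
            }
        }
        return String.join(" ", clauses);
    }
}
```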
Hi,
In Lucene, is there any way to find only unique records from a single field?
Otherwise I have to iterate through Hits unnecessarily to find the unique ones...
Please help.
- Bhavin pandya
Antony Bowesman wrote:
Hi,
I'm writing a mapping mechanism between an existing search interface and
Lucene and wondered how to support a single NOT/- query.
Given the query -attribute, then from an earlier comment by Chris
Hostetter where he says you can't have a negative clause in isolation
eks dev wrote:
Depends what you need to do with it. If you only need this as a kind of
stemming for searching documents, the solution is not all that complex. If you need
linguistically correct splitting, then it gets complicated.
This is a very good point. Stemming for
high recall is
Hi,
I have a search UI that allows search criteria to be input against specific
fields, e.g. Subject.
In order to create a suitable Lucene Query, I must analyze that String so that
it becomes a set of Tokens which I can then turn into Terms. QueryParser seems
to fit the bill for that,
Hi guys,
I have this problem searching all fields (metadata) using SpanFirstQuery.
My scenario: if I am just searching on one thing using SpanFirstQuery,
there is no problem. However, if I have to search everything, then I do
not get any results.
For example, I search
Ok, I think I get it now. You're right that you probably don't want to
iterate the Hits object since that has performance issues once you get
beyond 100 docs or so. Although, I don't know how big your result sets are.
If they are guaranteed to be small, this may not matter.
I'm guessing you want
Vladimir Olenin wrote:
Hi,
I wonder if anyone here knows if there is a 'smart' text pattern finder,
ideally written in Java. The library I'm looking for should be able to 'guess'
the category of the particular text on the page, most probably by finding
similarities between the bulk of the
Thanks for the link and your write-up.
On 19/11/06, Shay Banon [EMAIL PROTECTED] wrote:
I do not want to invade the Lucene user list with a discussion about
Compass and Hibernate Search, but I still think it is something that
needs to be answered, so here is a link to my blog post
Martin Braun wrote:
Please refer to the answers to my question on this list:
http://www.nabble.com/forum/ViewPost.jtp?post=7337585framed=y
In short: SpanFirstQuery works like a charm :)
Thanks Martin, that looks just right. I'll try it.
Antony
Hi,
I apologize if this is slightly off topic. I have not implemented this, but
the idea came to me after reading another post about measuring distance in
lucene. It may be completely impractical, however it seems it COULD work at
least if the area to be indexed could be constrained.
What
spinergywmy wrote:
Hi Erick,
I did take a look at the link that you provided, and I have tried it myself,
but I get no results.
My search string is "third party license readme"
Hmm, at a quick look I would suggest that you have to split the string
into individual terms, and
heritrix.lucene wrote:
Thanks for your reply.
This analyzer creates combinations of words. I am looking for an analyzer that
breaks the words up into their n-grams. For example, the 2-grams of
"google" are: go, oo, og, gl, le.
This is also easy. You can check out our
sample in
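Here is a plain-Java sketch of the n-gram splitting asked about, using a hypothetical `NGrams.of` (not the analyzer referred to in the reply). A Lucene TokenFilter built on this idea would emit each gram as a token.

```java
import java.util.ArrayList;
import java.util.List;

/** Character n-grams of a single word, e.g. the 2-grams of "google". */
class NGrams {
    static List<String> of(String word, int n) {
        List<String> grams = new ArrayList<>();
        // slide a window of width n one character at a time
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }
}
```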
On Tuesday 21 November 2006 23:14, Antony Bowesman wrote:
I assume that you first have to create a BooleanClause that finds
everything and then another Clause that removes the attribute.
Is this right or is there another way to do it?
That's correct. For the find everything part you can
I don't think I understand what "only unique records from a single field"
means. If it's a unique value in a field, there'll only be one document in
the hits object and there's no cost to iterating, so I doubt that's what you
mean.
If you're asking for a list of all the unique values for a
Hi guys,
We've identified a significant querying performance decrease after
switching from Lucene 1.4.3 to 1.9.1.
It is consistently reproducible no matter whether the concurrent querying threads
are 1, 2, 4 or 8 (or even more) -
If N queries are executed against 1.9.1 for a given time, then 1.4.3
This is a *really* simplistic approach, but why not just submit all 4 or 5
queries at once in a BooleanQuery and let Lucene do all the work for you? Or
are the 4 or 5 queries such that they don't combine easily with MUST,
MUST_NOT or SHOULD in a BooleanQuery?
Best
Erick
On 11/21/06, Luis Rodrigo
Daniel Naber wrote:
That's correct. For the find everything part you can use
MatchAllDocsQuery.
Thanks - I hadn't noticed that Query.
Antony
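This set-based sketch, with hypothetical names, shows why a lone prohibited clause needs a "find everything" clause. A MUST_NOT clause only removes documents, so something like MatchAllDocsQuery has to supply the documents to remove them from. Here `allDocs` stands in for the documents MatchAllDocsQuery would match.

```java
import java.util.Set;
import java.util.TreeSet;

/** Set model of "-attribute": everything except the prohibited documents. */
class NegativeOnly {
    static Set<Integer> matchAllExcept(Set<Integer> allDocs, Set<Integer> prohibited) {
        Set<Integer> result = new TreeSet<>(allDocs); // the positive ("match all") part
        result.removeAll(prohibited);                 // the MUST_NOT part
        return result;
    }
}
```

With an empty positive part the result is always empty, which is why a purely negative query matches nothing.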
Hi,
Thanks Martin. I have one question: what does the slop do in a span
near query? What is the difference between 0 and 1? In the Lucene source
I have seen an example that sets the slop to 4. Could you please explain that
to me? Thanks.
regards,
Wooi Meng
--
View this message in
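This is a minimal model of slop for two single-term clauses, using a hypothetical `SlopSketch.near`. It assumes token positions counted from 0 with no position gaps, and it reflects our reading of SpanNearQuery, not its code. Slop counts the positions allowed between the two words, so 0 means adjacent and 1 allows one word in between. The `inOrder` flag is the switch an ordered proximity search would need.

```java
/** Hypothetical check of two single-word spans at positions p1 and p2. */
class SlopSketch {
    static boolean near(int p1, int p2, int slop, boolean inOrder) {
        if (inOrder && p2 <= p1) {
            return false; // the second word must come after the first
        }
        int wordsBetween = Math.abs(p2 - p1) - 1;
        return wordsBetween <= slop;
    }
}
```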
I am successfully able to search for nearbys given a longitude and a
latitude. The basic summary of how I do this is that I add 1000 to the
long/lat values and use a RangeFilter in my query.
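This plain-Java sketch, with assumed width and precision, shows why the +1000 offset helps. A range filter over text terms compares strings, so values must also be zero-padded to a fixed width for string order to agree with numeric order (otherwise "999.5" sorts after "1000.0"). `CoordEncoding.encode` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical encoding of a coordinate as a sortable fixed-width string. */
class CoordEncoding {
    static String encode(double degrees) {
        // -180..180 becomes 820..1180; pad to 4 integer digits, 6 decimals
        return String.format(Locale.ROOT, "%011.6f", degrees + 1000);
    }
}
```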
In my display results, I display the results ordered by distance from the
original long/lat. What I
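For ordering results by distance, a common choice is the haversine great-circle formula. That choice is assumed here and is not necessarily what the poster uses. Below is a plain-Java sketch with hypothetical names, taking the mean earth radius as 6371 km.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Haversine distance and a nearest-first sort of {lat, lon} points. */
class DistanceSort {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, an approximation

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns a copy of points ordered by distance from (lat, lon), nearest first. */
    static List<double[]> byDistance(List<double[]> points, double lat, double lon) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(p -> haversineKm(lat, lon, p[0], p[1])));
        return sorted;
    }
}
```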
Hi Jong,
I think these are useful for things like highlighting (I think
contrib/highlighter can use them); other post processing algorithms
such as: question answering, calculating co-occurrences (find the 6
terms to the left and right of the term at position 16). Perhaps you
want to
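Here is a sketch of the co-occurrence example above. It assumes the document's tokens are already available in position order, for instance rebuilt from stored positions. `PositionWindow.around` is a hypothetical helper; `around(tokens, 16, 6)` would return up to 6 terms on each side of position 16.

```java
import java.util.ArrayList;
import java.util.List;

/** Terms within `width` positions left and right of the term at `pos`. */
class PositionWindow {
    static List<String> around(String[] tokens, int pos, int width) {
        List<String> window = new ArrayList<>();
        int from = Math.max(0, pos - width);
        int to = Math.min(tokens.length - 1, pos + width);
        for (int i = from; i <= to; i++) {
            if (i != pos) {
                window.add(tokens[i]); // skip the centre term itself
            }
        }
        return window;
    }
}
```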
On 21 Nov 2006, at 16:43, jm wrote:
Any thoughts?
You can also try InstantiatedIndex, similar in speed and design to
MemoryIndex, but it can handle multiple documents, IndexReader,
IndexWriter, IndexModifier, etc., just like any Directory
implementation. It requires a minor patch to the
Hi Erick,
I did take a look at the link that you provided, and I have tried it myself,
but I get no results.
My search string is "third party license readme"
Below is the code that I wrote; please point out where I went
wrong.
readerA =
Stanislav,
Could you also try a nightly build to test the later performance improvement
on BooleanScorer2? The nightly builds are here:
http://people.apache.org/builds/lucene/java/nightly/
The jar is called lucene-core-nightly.jar in the .tar.gz build.
It's not likely that this is faster than
Chris Hostetter wrote:
: important:conference agenda
: I want to end up with
:
: +subject:important +subject:conference +subject:agenda
:
: I've written something to do this, but I know it is not as clever as QP as
: currently it can only create BooleanQueries with TermQueries and cannot handle
Hi Everybody,
I am successfully using Lucene to index/display results for a hugely
successful tourism site... We even search for nearbys of attractions of different
categories. Love it.
The next step is to start indexing all the legacy content, which numbers
around 3000 or so JSPs that will
On Nov 21, 2006, at 7:43 AM, jm wrote:
Hi,
I have to decide between using a RAMDirectory and MemoryIndex, but I'm
not sure which approach will work better...
I have to run many items (tens of thousands) against some queries (100
at most), but I have to do it one item at a time. And I already have
search the archives for faceted searching and category counts and you
should find lots of discussions on this topic.
: Date: Tue, 21 Nov 2006 20:30:22 +0530
: From: Bhavin Pandya [EMAIL PROTECTED]
: Reply-To: java-user@lucene.apache.org, Bhavin Pandya [EMAIL PROTECTED]
: To:
Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
I will explore the other options then.
On 11/21/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote:
On Nov 21, 2006, at 7:43 AM, jm wrote:
Hi,
I have to decide between using a RAMDirectory and MemoryIndex, but
not sure what
: Could you also try a nightly build to test the later performance improvement
: on BooleanScorer2? The nightly builds are here:
: http://people.apache.org/builds/lucene/java/nightly/
: The jar is called lucene-core-nightly.jar in the .tar.gz build.
:
: It's not likely that this is faster than
Hi all,
I am working on a project that, for each query from the user, builds
four or five different queries and tries to combine the results. The
first part is already working, but, as I have read that the scores from
different queries are not comparable with each other, I am a bit stuck
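One common heuristic for this, often called CombSUM, scales each query's scores by that query's best score and then sums per document. It is our assumption here, not something Lucene does for you. Below is a plain-Java sketch with hypothetical names, where each map goes from document id to raw score.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical fusion: max-normalize each result list, then sum per document. */
class ScoreFusion {
    static Map<String, Double> combine(List<Map<String, Double>> resultLists) {
        Map<String, Double> fused = new HashMap<>();
        for (Map<String, Double> results : resultLists) {
            double max = 0;
            for (double s : results.values()) {
                max = Math.max(max, s);
            }
            if (max <= 0) {
                continue; // empty list or no positive scores: nothing to add
            }
            for (Map.Entry<String, Double> e : results.entrySet()) {
                // each list's best hit contributes 1.0
                fused.merge(e.getKey(), e.getValue() / max, Double::sum);
            }
        }
        return fused;
    }
}
```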
: Is there a step by step guide on how to implement the scoring function
: for Apache Lucene?
: The help given on the website is not easy to follow.
:
: How do I integrate the search function into my website?
First off, what help did you look at? ... did you start with the tutorial?
On 11/21/06, Stanislav Jordanov [EMAIL PROTECTED] wrote:
We've identified a significant querying performance decrease after
switching from Lucene 1.4.3 to 1.9.1.
It is consistently reproducible no matter whether the concurrent querying threads
are 1, 2, 4 or 8 (or even more) -
If N queries are executed
Hi,
How can I delete contents from an index? Is there any example that
I can refer to?
Thanks.
regards,
Wooi Meng
: I have modified the tokenizer class by making it return characters in
: lower case.
there is really no reason to do this ... have your analyzer use the
WhitespaceTokenizer, wrapped in a LowerCaseFilter ... that will eliminate
some of your custom code, and perhaps some of your problems as
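This plain-Java model shows the tokens that a WhitespaceTokenizer wrapped in a LowerCaseFilter should produce: split on whitespace, then lowercase each token. It is not Lucene code, and using `Locale.ROOT` for lowercasing is our choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of a whitespace + lowercase token stream. */
class WhitespaceLowerCase {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                flush(current, out); // whitespace ends the current token
            } else {
                current.append(c);
            }
        }
        flush(current, out);
        return out;
    }

    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString().toLowerCase(Locale.ROOT));
            current.setLength(0);
        }
    }
}
```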
Switching to the old scorer (via BooleanQuery.setUseScorer14(true))
solved the performance issue: now Lucene 1.9.1 and 2.0.0 perform on the
same load test just as 1.4.3 does.
Thanks a lot Yonik!
Is there any chance of a non-expert explanation of the
difference between the old and new
I have some code to do that, but it is not really friendly yet :-(
Anyway, it is quite simple. You need to merge the postings that you obtain for the
different queries using TermDocs. With TermDocs you obtain the internal ids
for the docs related to terms. If you merge the TermDocs for each word that
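The merge step can be sketched in plain Java, assuming each term's postings are its document ids in increasing order, which is the order TermDocs iterates in. `PostingsMerge` is hypothetical; one linear pass gives the OR (union) or AND (intersection) of two lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical linear merges of two ascending doc-id lists. */
class PostingsMerge {
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else { // same doc id in both lists: emit it once
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    static List<Integer> intersection(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else { // doc contains both terms
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }
}
```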
Mark Miller wrote:
if you scan the query and escape all colons (i.e. \:) then you should be
good (I have not verified). Of course you will not be able to do a field
search, but that seems to be what you're after.
Thanks for that suggestion. However, a standard un-escaped parse gives
Input -
Hi Erick,
If you're asking for a list of all the unique values for a particular field,
see TermDocs and/or TermEnum, which will allow you to look at, say, all the
values stored for some field. A trick here is to seek(new Term(field,
"")). By putting nothing in the value, you effectively enumerate
Dear Lucene Users,
Is there a way, or has someone been able to implement an ordered proximity
search? Lucene currently uses the "word1 word2"~5 query to find tokens that are
within 5 words of each other in any order. What I've been asked to do is find
only the results that are for instance
Does anyone have any interest in making the spellchecker work across more
than one index? Does the coder of the spellchecker have any advice/"don't do
that, moron" info, etc.?
- Mark
Thanks Hoss,
I hadn't looked at the indexDictionary method yet. It does not appear to
be what I am looking for though...I should have been more explicit -
I am using the spellchecker for a 'did you mean search', so I am not
using a dedicated spell check index. Instead I am passing the index
Hey Jeff!
Storey, Jeff wrote:
Could you explain what you did for your solution? This is a problem I'm currently facing as well. But, for example, if
the user searches for head~, would you also be able to highlight read and dead if
they are returned, or just head without the ~?
It is
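This plain-Java sketch shows why head~ can match read and dead. As far as we understand, FuzzyQuery (with no prefix) keeps terms whose similarity, 1 - editDistance / min(termLengths), reaches its minimum similarity, 0.5 by default. Both read and dead are one edit from head, which gives 0.75. `FuzzySketch` is hypothetical, and the handling of empty strings is our own choice.

```java
/** Hypothetical edit distance and fuzzy similarity. */
class FuzzySketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0f : 0.0f; // our choice for empty input
        }
        return 1.0f - (float) levenshtein(a, b) / shorter;
    }
}
```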