Karl Wettin wrote:
>
>
>> Or just set the boost to zero on the individual filter fields, or on
>> the whole filter expression.
>>
>> +(my query) +(filter1 OR filter2 AND filter3)^0
>
>
That sounds perfect! I thought that boosts would be multiplied together to
give 0 for the whole expression
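To make the zero-boost idea concrete, here is a toy model in plain Java rather than Lucene classes. It assumes that the score of an all-required BooleanQuery is simply the sum of boost times raw score for each clause (real Lucene scoring also applies coord() and query normalization). It shows that a clause boosted by 0 still has to match but contributes nothing. The `Clause` record and the `score` method are hypothetical.

```java
import java.util.List;
import java.util.OptionalDouble;

/**
 * Toy model of a BooleanQuery whose clauses are all required (+).
 * Assumption: the score is the plain sum of boost * raw clause score;
 * real Lucene scoring also applies coord() and query normalization.
 */
final class ZeroBoostDemo {

    /** One required clause: whether it matched, its unboosted score, its boost. */
    record Clause(boolean matched, double rawScore, double boost) {}

    /** Empty if any required clause fails (document filtered out), else the boosted sum. */
    static OptionalDouble score(List<Clause> requiredClauses) {
        double total = 0.0;
        for (Clause c : requiredClauses) {
            if (!c.matched()) {
                return OptionalDouble.empty();
            }
            // A clause boosted by 0 must still match, but adds nothing to the total.
            total += c.boost() * c.rawScore();
        }
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        Clause text = new Clause(true, 2.5, 1.0);   // stands in for +(my query)
        Clause filter = new Clause(true, 7.0, 0.0); // stands in for +(filter1 OR ...)^0
        System.out.println(score(List.of(text, filter)));
    }
}
```

The filter clause's raw score (7.0 here) does not affect the result, so the rarity of the filter terms cannot change the ranking. If the filter clause does not match, the document is dropped.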
martinoleary wrote:
Hi there...
I'm trying to get MoreLikeThis documents from my Lucene index given a
sentence... just one line of text, let's say... but I also want the
returned results to include only documents where a field has a specific value.
So, for example, if I have my index and it contains a categ
Hi there
I am trying to use YANFS (see https://yanfs.dev.java.net/) to allow
administrators to configure a Lucene index that is accessible via NFS on a
remote drive. Is there a way to easily modify Lucene so that when it
reads from or writes to the index it uses the XFile object instead of File?
Thanks Chris (or if you prefer, Hoss) - I will definitely try that for
matching no docs, but one of the problems I'm having is that I'm
indexing multiple terms for one field and I need ALL the terms to
match it.
Maybe this is easier ... suppose what I'm indexing is a phone number,
and the
On 15 Jul 2008, at 18:44, Yonik Seeley wrote:
On Tue, Jul 15, 2008 at 10:24 AM, Karl Wettin
<[EMAIL PROTECTED]> wrote:
I have a number of fields that are used to filter documents from a search.
They should not contribute to the score of the document but merely decide
which documents are valid.
Assuming I understand your question: the fact that your first clause is a
wildcard query is irrelevant, to generalize your request you want a way to
query for all docs which either match some sub query, or have no terms in
the field at all. to find all docs with no terms for a given field, you
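The reply above is cut off before it explains how to find the documents without terms. One way to express the combined request is sketched here, with `java.util.BitSet` standing in for Lucene's document sets: take every document, remove the ones that have at least one term in the field, and OR the result with the sub-query's hits. How `docsWithField` is collected (for example, by walking every term of the field and its postings) is an assumption. The method name is hypothetical, and deleted documents are ignored.

```java
import java.util.BitSet;

final class MissingFieldDemo {
    /**
     * Docs that match the sub-query OR have no terms at all in the field.
     * docsWithField must have a bit set for every doc holding at least one
     * term in the field. Deleted documents are not handled here.
     */
    static BitSet matchOrMissing(BitSet subqueryHits, BitSet docsWithField, int maxDoc) {
        BitSet missing = new BitSet(maxDoc);
        missing.set(0, maxDoc);        // start from every document id
        missing.andNot(docsWithField); // drop docs that have any term in the field
        BitSet result = (BitSet) subqueryHits.clone();
        result.or(missing);
        return result;
    }
}
```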
Hi Erik,
I'm seeing the same problem - here's an excerpt from the headers of a bounce I
just got (note the address "[EMAIL PROTECTED]" in the last couple of
"Received:" headers):
Received: from spwiki.spsoftware.com (static61.17.14-87.vsnl.eth.net
[61.17.14.87] (may be forged))
for <[E
Karl Wettin wrote:
>
>
> Feel free to post it as an issue in the Jira when it's implemented.
>
>
Thanks a lot! Will do
John
--
View this message in context:
http://www.nabble.com/Mixing-non-scored-an-scored-queries-tp18460018p18470916.html
Sent from the Lucene - Java Users mailing list a
Thanks Steve.
Steven A Rowe wrote:
Hi Chris,
The PhraseQuery class does no parsing; tokenization is expected to happen before you feed
anything to it. So unless you have an index-time analyzer that outputs terms that look
like "aaa ddd" -- that is, terms with embedded spaces -- then attempti
Hi Chris,
The PhraseQuery class does no parsing; tokenization is expected to happen
before you feed anything to it. So unless you have an index-time analyzer that
outputs terms that look like "aaa ddd" -- that is, terms with embedded spaces
-- then attempting to use PhraseQuery or any other qu
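Because PhraseQuery expects terms that have already been tokenized, the query text has to be split the same way the index-time analyzer split the documents. The sketch below uses a deliberately simple stand-in: lowercase the text, then split on whitespace. It assumes this matches your analyzer; real analyzers may also drop stop words or punctuation. Each resulting term would then be added to the PhraseQuery in order, e.g. with `pq.add(new Term("contents", term))`, where the field name `contents` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PhraseTermsDemo {
    /**
     * Simple stand-in for an analyzer: lowercase, then split on runs of whitespace.
     * It must mirror what the index-time analyzer did, or the phrase won't match.
     */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields one empty string
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "aaa ddd" becomes two terms, not one term containing a space.
        System.out.println(tokenize("AAA  ddd"));
    }
}
```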
Erik:
I'm having the same problem; I expect this e-mail to get a bounce-back.
Could I ask you to take a glance at it?
It's no big deal, I just have to delete the bounce-back.
Thanks
[EMAIL PROTECTED]
On Tue, Jul 15, 2008 at 11:57 AM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:
> I've finally succe
On 15 Jul 2008, at 18:11, John Patterson wrote:
So it seems that creating a constant scoring TermQuery is the best
suggestion so far.
It would be really great if I could call
BooleanQuery.setConstantScore(1.0f) or something.
You might be looking at implementing something like
public class N
On Tue, Jul 15, 2008 at 10:24 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>> I have a number of fields that are used to filter documents from a search.
>> They should not contribute to the score of the document but merely decide
>> which documents are valid. i.e. it doesn't matter how rare they are
Karl Wettin wrote:
>
>
> I think all you need to do is to create a custom query (sounds like
> you want a clone of TermQuery) that uses a Scorer that always returns 1f.
>
>
Actually, I just thought that it would probably be better to create an
adapter Query that always returns a constant s
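The adapter idea can be sketched without Lucene's own classes. The `SimpleScorer` interface below is a hypothetical, cut-down stand-in for a scorer; Lucene's real Scorer has more methods and a different contract. The adapter passes matching through unchanged and replaces every score with a constant. For whole-filter cases, the ConstantScoreQuery mentioned elsewhere in this thread may already cover the need.

```java
/** Hypothetical, cut-down scorer interface (not Lucene's Scorer). */
interface SimpleScorer {
    boolean next(); // advance to the next matching doc; false when exhausted
    int doc();      // current doc id
    float score();  // score of the current doc
}

/** Adapter: keeps the wrapped scorer's matches, replaces every score with a constant. */
final class ConstantScoreAdapter implements SimpleScorer {
    private final SimpleScorer inner;
    private final float constant;

    ConstantScoreAdapter(SimpleScorer inner, float constant) {
        this.inner = inner;
        this.constant = constant;
    }

    @Override public boolean next() { return inner.next(); }
    @Override public int doc() { return inner.doc(); }
    @Override public float score() { return constant; } // inner.score() is never called
}

/** Tiny in-memory scorer over fixed (doc, score) pairs, for demonstration only. */
final class ArrayScorer implements SimpleScorer {
    private final int[] docs;
    private final float[] scores;
    private int i = -1;

    ArrayScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    @Override public boolean next() { return ++i < docs.length; }
    @Override public int doc() { return docs[i]; }
    @Override public float score() { return scores[i]; }
}
```

Because `score()` never consults the wrapped scorer, the rarity of the filter terms cannot influence ranking. This is the behaviour the thread asks for.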
John Patterson wrote:
>
>
> I don't think filters are the way to go here because I need to use boolean
> style logic e.g.
>
> Search for free text "open fire" restricted to "London" OR "Brighton" in
> category "Pubs and bars" OR "Restaurants"
>
> which means I need to construct and run a Boo
eks dev wrote:
>
> Do not forget that Filter does not have to be loaded in memory, not any
> more since the LUCENE-584 commit! Now a skipping iterator is all you
> need.
>
>
> translated, you could use:
> ConstantScoreQuery created with Filter made from TermDocs (you need to
> implement on
I've finally successfully removed the offending address from the
list. I had tried earlier, but somehow it failed to take, but this
time I think it has worked. Let me know off the list if you continue
to get this bounce (something I've never seen personally, for the
record).
E
In other words, for my first question, what I want to know is how I might
consistently and correctly get the same max score for any two pairs of
identical documents without having to rewrite major parts of lucene. I
could find ALL the scores and divide them by the max, but that seems
somehow wron
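For reference, the divide-by-max step the poster mentions looks like this in plain Java; the class name is hypothetical. It only rescales one result list so that its top hit becomes 1.0. On its own, it does not make scores comparable across different queries, which is the harder part of the question.

```java
final class ScoreNormalizer {
    /** Divides every score by the largest one, so the top hit becomes 1.0. */
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        if (max <= 0f) {
            return scores.clone(); // empty or all-zero list: nothing to rescale
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }
}
```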
Do not forget that Filter does not have to be loaded in memory, not any more
since the LUCENE-584 commit! Now a skipping iterator is all you need.
translated, you could use:
ConstantScoreQuery created with Filter made from TermDocs (you need to
implement only DocIdSet / DocIdSetIterator, thi
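To illustrate why a skipping iterator is enough, here is a simplified sorted-array iterator with `next()` and `skipTo()`, together with a leapfrog intersection of two such iterators that never materializes a bit set. These classes are hypothetical stand-ins written for this example. They are not Lucene's DocIdSet / DocIdSetIterator.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified skipping iterator over sorted doc ids (not Lucene's DocIdSetIterator). */
final class SortedDocIterator {
    private final int[] docs; // ascending, no duplicates
    private int pos = -1;     // -1 = not started yet

    SortedDocIterator(int[] docs) { this.docs = docs; }

    int doc() { return docs[pos]; }

    boolean next() { return ++pos < docs.length; }

    /** Moves forward to the first doc >= target; returns false if there is none. */
    boolean skipTo(int target) {
        int lo = Math.max(pos, 0), hi = docs.length - 1, found = docs.length;
        while (lo <= hi) { // binary search in the unvisited tail
            int mid = (lo + hi) >>> 1;
            if (docs[mid] >= target) { found = mid; hi = mid - 1; } else { lo = mid + 1; }
        }
        pos = found;
        return pos < docs.length;
    }
}

final class LeapfrogIntersection {
    /** Docs present in both iterators, found by letting each one skip ahead to the other. */
    static List<Integer> intersect(SortedDocIterator a, SortedDocIterator b) {
        List<Integer> out = new ArrayList<>();
        if (!a.next() || !b.next()) return out;
        while (true) {
            if (a.doc() == b.doc()) {
                out.add(a.doc());
                if (!a.next() || !b.next()) return out;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return out;
            } else if (!b.skipTo(a.doc())) {
                return out;
            }
        }
    }
}
```

In the thread's setting, one side would be the query's matches and the other the filter's iterator, so only the documents that are actually visited cost anything.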
Karl Wettin wrote:
>
> I think all you need to do is to create a custom query (sounds like
> you want a clone of TermQuery) that uses a Scorer that always returns 1f.
>
That sounds exactly like what is required. I imagine that would be quite
useful to have in the core project?
On 15 Jul 2008, at 10:07, John Patterson wrote:
I have a number of fields that are used to filter documents from a search.
They should not contribute to the score of the document but merely decide
which documents are valid. i.e. it doesn't matter how rare they are in the
index. I also have a
Erick Erickson wrote:
>
> No, you create the filter via TermDocs/TermEnum. You can also cache
> them. Creating filters is *much* faster than you think.
>
But I can have many terms in the query. With over 10 million documents and
many concurrent searches, creating a filter for every search w
Erick Erickson wrote:
>
> One way would be to create Filters and add them in with
>
I could possibly wrap the standard BooleanQuery in an adapter which also
wraps its Weight and Scorer to return a constant value.
But that seems like a hell of a lot of internal jiggery-pokery for something
th
No, you create the filter via TermDocs/TermEnum. You can also cache
them. Creating filters is *much* faster than you think.
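The point about caching can be sketched as an LRU map from a term key to its filter bits, so that a term appearing in many searches is turned into a filter only once. The `field:term` key format and the builder function are assumptions for illustration; per the thread, the builder would walk TermDocs for the term. `java.util.BitSet` stands in for Lucene's filter bits.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Small LRU cache of per-term filter bitsets. The "field:term" key format and
 * the builder function are illustrative assumptions. Methods are synchronized
 * so concurrent searches can share one instance (coarse, but simple).
 */
final class FilterCache {
    private final Function<String, BitSet> builder;
    private final Map<String, BitSet> cache;
    private int builds = 0; // how many times the builder actually ran

    FilterCache(int capacity, Function<String, BitSet> builder) {
        this.builder = builder;
        // accessOrder = true keeps entries in least-recently-used-first order
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized BitSet get(String key) {
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = builder.apply(key);
            builds++;
            cache.put(key, bits);
        }
        return bits; // shared instance: callers must treat it as read-only
    }

    synchronized int builds() { return builds; }
}
```

With many concurrent searches, the single lock may eventually need to be replaced by a finer-grained scheme, but the hit rate on repeated terms is what saves the work.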
Alternatively, you could boost everything *else* by some large factor
and then the unimportant fields would add relatively little to the final
score.
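This alternative can be checked with a little arithmetic. Treat the final score as a plain weighted sum of a free-text part $s_t$ and a filter-field part $s_f$. This is a simplification, since Lucene also applies coordination and normalization factors. Boosting only the free-text clauses by $b$ then gives

$$
\text{score} = b\,s_t + s_f,
\qquad
\frac{s_f}{b\,s_t + s_f}\Big|_{s_t = s_f = 1,\ b = 100} = \frac{1}{101} \approx 0.0099 .
$$

Consider two hypothetical documents: A with $s_t = 0.8, s_f = 0.1$ and B with $s_t = 0.7, s_f = 0.9$. Their unboosted scores are $0.9$ and $1.6$, so B wins on its filter fields alone. With $b = 100$ the scores become $80.1$ and $70.9$, and A ranks first again. The filter fields can still reorder documents whose free-text scores differ by less than $\Delta s_f / b = 0.8 / 100 = 0.008$, so the effect is reduced rather than removed.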
Best
Erick
On Tue, Ju
Erick Erickson wrote:
>
> One way would be to create Filters and add them in with
> ConstantScoreRangeQuery
>
Would that mean running the query twice? i.e. once to create the filter and
once to rank the results?
I guess I don't understand this. Somewhere, you have to be
opening the text file to feed its contents to Lucene. Why can't
you just print things then? If you're using the demo, you need
to look into the code and you'll see something like this.
Best
Erick
On Tue, Jul 15, 2008 at 4:55 AM, starz10d
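Following Erick's suggestion, the printing can happen at the point where the file is read, before its text is handed to the indexing code. This plain-Java sketch assumes UTF-8 text files. It reads each file once, prints it, and passes the same text on. The `indexer` callback is hypothetical and stands in for whatever builds the Lucene Document; the demo's actual code may read the file through a Reader instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

final class PrintBeforeIndexing {
    /**
     * Reads a text file once, prints it, then hands the same text to the indexing step.
     * The indexer callback is hypothetical: in real code it would build a Lucene
     * Document from the text and add it with an IndexWriter.
     */
    static String printThenIndex(Path file, BiConsumer<Path, String> indexer) throws IOException {
        String text = Files.readString(file, StandardCharsets.UTF_8);
        System.out.println("=== " + file + " ===");
        System.out.println(text);
        indexer.accept(file, text);
        return text;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            printThenIndex(Path.of(name), (path, text) -> {
                // hypothetical: build and add the Lucene Document here
            });
        }
    }
}
```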
One way would be to create Filters and add them in with
ConstantScoreRangeQuery
Best
Erick
On Tue, Jul 15, 2008 at 4:07 AM, John Patterson <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a number of fields that are used to filter documents from a search.
> They should not contribute to the sco
Hi all:
For people who are using the Lucene Oracle integration project:
http://marceloochoa.blogspot.com/2008/07/lucene-ojvm-native-rest-ws.html
Best regards, Marcelo.
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
Hi John
Thanks for your continued interest in my travails!
==I'm not sure I understand. You want a phrase query so they should be
==passed as a phrase in quotes.
Ok... well I must be missing something then :-(
This fails to return any hits for me:
PhraseQuery pq = new PhraseQuery();
Hi Steve,
It would be simpler if I have a query called SubPhraseQuery in which case I
do not have to either generate extra terms during ingestion or generate
extra queries during querying. As a user, the best I would hope for is to
ingest the data from some feed into different fields, run the use
[EMAIL PROTECTED] wrote:
This isn't quite true. If you open IndexWriter with autoCommit=false,
then none of the changes you do with it will be visible to an
IndexReader, even one reopened while IndexWriter is doing its work,
until you close the IndexWriter.
Where are the docs for this tran
The list subscriber [EMAIL PROTECTED] is not a known email
address and the MX server (spsoftindia.com) sends the bounce back to
you. I'm not sure if this is because some header is missing or if
spsoftindia.com does not follow protocol. My guess is the latter.
A list moderator should remove
On 15 Jul 2008, at 10:20, Sascha Fahl wrote:
There is a big difference between GeoSearch and GeoSort. GeoSearch
means you are looking for data within a certain range. To implement
this, index structures like R-trees help, because they make it a lot
easier to think in "boxes". GeoSort is just to
> This isn't quite true. If you open IndexWriter with autoCommit=false,
> then none of the changes you do with it will be visible to an
> IndexReader, even one reopened while IndexWriter is doing its work,
> until you close the IndexWriter.
Where are the docs for this transaction buffered?
> How about just copying and performing your indexing (or index write related)
> operations on the copy and then performing a rename operation followed by
> reopening of the index readers.
This is how we did it until now. But the indexes are getting bigger and bigger
(50 GB and more) and so we are
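The copy-and-rename approach quoted above can be sketched with `java.nio.file` alone. The `IndexStep` hook is hypothetical and stands in for opening an IndexWriter on the copy, applying the changes, and closing it. As the reply notes, copying 50 GB or more on every update is expensive, so this only shows the mechanics. Note that the two renames leave a short window with no live directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Copy-then-swap update of an index directory; IndexStep is a hypothetical hook. */
final class CopyAndSwap {

    @FunctionalInterface
    interface IndexStep {
        void apply(Path indexDir) throws IOException;
    }

    static void updateViaCopy(Path live, IndexStep step) throws IOException {
        Path parent = live.toAbsolutePath().getParent();
        // Staging dir sits next to the live one so the renames stay on one filesystem.
        Path staging = Files.createTempDirectory(parent, "index-staging");
        copyTree(live, staging);
        step.apply(staging);

        Path backup = parent.resolve(live.getFileName() + ".old");
        deleteTree(backup); // leftover from an earlier interrupted run, if any
        // Two renames: a reader opening the index between them finds nothing,
        // so this is not atomic as a whole. Reopen readers once it returns.
        Files.move(live, backup, StandardCopyOption.ATOMIC_MOVE);
        Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE);
        deleteTree(backup);
    }

    static void copyTree(Path src, Path dst) throws IOException {
        List<Path> all;
        try (Stream<Path> paths = Files.walk(src)) {
            all = paths.toList(); // parents come before their children
        }
        for (Path p : all) {
            Path target = dst.resolve(src.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);
            } else {
                Files.copy(p, target, StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
    }

    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) return;
        List<Path> all;
        try (Stream<Path> paths = Files.walk(root)) {
            all = paths.sorted(Comparator.reverseOrder()).toList(); // children first
        }
        for (Path p : all) Files.delete(p);
    }
}
```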
Hi All,
It might be easy question, but for new one as me in lucene it is not that
easy. I want to print the text files before indexing them in lucene , I did
try to do it , but i could just print the index content where we see the
kewowrds and document nr and frequency. I need beside that to pr
Hi,
Every time I send a mail to this list, I get the below error.
Any idea where the problem is?
It also appears that my mails are actually reaching the list.
Any help in rectifying this is appreciated.
Thanks
Preetam
-- Forwarded message --
From: Mail Delivery Subsystem <[EMAIL
Anshum wrote:
But the downside to this would be, in case your daemon crashes in the
meantime or you need to restart the daemon, the index would not be usable
until you have completed your indexing process.
This isn't quite true. If you open IndexWriter with autoCommit=false,
then none
Hi there...
I'm trying to get MoreLikeThis documents from my Lucene index given a
sentence... just one line of text, let's say... but I also want the
returned results to include only documents where a field has a specific value.
So, for example, if I have my index and it contains a categoryId and
content... I
That is very good performance.
But if I take, on average, 6 terms per user query, and look at
shingles of size 2, I will have a boolean OR of 5 shingle phrase queries.
How much better is this than a single sub-phrase query, which would
internally be just like another phrase query with som
There is a big difference between GeoSearch and GeoSort. GeoSearch
means you are looking for data within a certain range. To implement
this, index structures like R-trees help, because they make it a lot
easier to think in "boxes". GeoSort is just to sort the data in
relation to a given poin
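The two steps can be sketched in plain Java. A cheap bounding-box test selects the candidates (GeoSearch), and great-circle distances are computed only for those hits before sorting (GeoSort). This also addresses the concern about computing distances for every document in the index. A linear scan stands in for an R-tree here, and the haversine formula with a mean Earth radius of 6371 km is a swapped-in choice. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class GeoBoxThenSort {

    record Place(String id, double lat, double lon) {}

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in km using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // clamp guards against rounding pushing the argument slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** "GeoSearch" step: cheap box test, no trigonometry. Ignores boxes crossing the 180th meridian. */
    static List<Place> inBox(List<Place> all, double minLat, double maxLat,
                             double minLon, double maxLon) {
        List<Place> hits = new ArrayList<>();
        for (Place p : all) {
            if (p.lat() >= minLat && p.lat() <= maxLat && p.lon() >= minLon && p.lon() <= maxLon) {
                hits.add(p);
            }
        }
        return hits;
    }

    /** "GeoSort" step: distances are computed only for the hits, nearest first. */
    static List<Place> sortByDistance(List<Place> hits, double lat, double lon) {
        List<Place> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(p -> haversineKm(lat, lon, p.lat(), p.lon())));
        return sorted;
    }
}
```

The comparator recomputes distances during the sort. With many hits, computing each distance once into an array first would be cheaper.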
Hi,
I have a number of fields that are used to filter documents from a search.
They should not contribute to the score of the document but merely decide
which documents are valid. i.e. it doesn't matter how rare they are in the
index. I also have a single "combined" field that is used for free
On 15 Jul 2008, at 09:50, Sascha Fahl wrote:
I read the chapter about custom sort methods and hacked around with
the GeoSort. Are there ways to improve the algorithm? Especially
calculating the distance for ALL documents in the index is a bad
idea because only the distance for matching documents
I'm not sure what it is you say you want to do. If what you want to do
is to measure the distance between two documents, then the easiest way is to
extract the feature vectors (document TermFreqVector) from those two
documents and measure the distance using something like the Tanimoto
coefficient
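Here is a minimal sketch of the Tanimoto coefficient over sparse term-frequency vectors, T(A, B) = A.B / (|A|^2 + |B|^2 - A.B). It assumes each document's terms and frequencies have already been copied from its TermFreqVector into a `Map<String, Integer>`, which requires term vectors to be stored at index time. The class name is hypothetical.

```java
import java.util.Map;

final class Tanimoto {
    /** T(A, B) = A.B / (|A|^2 + |B|^2 - A.B) over sparse term-frequency vectors. */
    static double coefficient(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb; // only shared terms contribute to the dot product
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        double denom = normA + normB - dot;
        // Two empty vectors: the coefficient is undefined; 1.0 is an arbitrary choice here.
        return denom == 0 ? 1.0 : dot / denom;
    }
}
```

The result is 1.0 for identical vectors and 0.0 when the documents share no terms.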
Hi,
I read the chapter about custom sort methods and hacked around with
the GeoSort. Are there ways to improve the algorithm? Especially
calculating the distance for ALL documents in the index is a bad idea
because only the distances for matching documents are of interest. That
could save lo
Couldn't you create multiple "shingle phrase queries" from the user
query and add them all to a BooleanQuery?
"example input query"^10 OR
"example input"^5 OR
"input query"^5
SpanNear and PhraseQueries are rather expensive though. Not too long
ago I replaced phrase queries with a shingles in
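Generating those shingle clauses from the user's tokens is mechanical. This plain-Java sketch builds the query string in the same form as the example, with the boosts 10 and 5 copied from it. Emitting query-parser syntax, rather than constructing PhraseQuery objects directly, is an assumption, and the class name is hypothetical. With 6 tokens there are 6 - 2 + 1 = 5 two-word shingles, which matches the count in the earlier message.

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleQueries {

    /** All contiguous word n-grams ("shingles") of length n, left to right. */
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    /**
     * Query string in the style of the example: the whole input as one phrase
     * with the highest boost, OR'ed with every 2-word shingle at a lower boost.
     */
    static String booleanOfShingles(List<String> tokens, int fullBoost, int shingleBoost) {
        List<String> clauses = new ArrayList<>();
        clauses.add("\"" + String.join(" ", tokens) + "\"^" + fullBoost);
        if (tokens.size() > 2) { // with 2 tokens the only shingle is the full phrase
            for (String s : shingles(tokens, 2)) {
                clauses.add("\"" + s + "\"^" + shingleBoost);
            }
        }
        return String.join(" OR ", clauses);
    }
}
```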
On 13 Jul 2008, at 16:58, miztaken wrote:
What sort of operations do you use the matrix for? How large can it
grow? Can you give an example of what the matrix might contain?
What was the reason to solve your problem using Lucene? Is there some
specific feature that made something easier or faster