: If I do a query.toString(), both queries give different results, which
: is probably a clue (additional parens with the BooleanQuery)
:
: Query.toString the old way using queryParser:
: +(id:1^2.0 id:2 ... ) +type:CORE
:
: Query.toString the new way using BooleanQuery:
: +((id:1^2.0)
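For reference, a minimal sketch (Lucene 2.x API; field names taken from the
toString() output above) of building the first form without the query parser:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.TermQuery;

  // Builds +(id:1^2.0 id:2) +type:CORE with one nested BooleanQuery.
  BooleanQuery ids = new BooleanQuery();
  TermQuery id1 = new TermQuery(new Term("id", "1"));
  id1.setBoost(2.0f);
  ids.add(id1, BooleanClause.Occur.SHOULD);
  ids.add(new TermQuery(new Term("id", "2")), BooleanClause.Occur.SHOULD);

  BooleanQuery query = new BooleanQuery();
  query.add(ids, BooleanClause.Occur.MUST);
  query.add(new TermQuery(new Term("type", "CORE")), BooleanClause.Occur.MUST);

The doubled parentheses in the second toString() suggest each term ended up
wrapped in its own nested BooleanQuery; flattening the clauses into a single
nested query, as above, might also account for part of the search slowdown.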
First off Karl, thanks for your reply and your time.
karl wettin-3 wrote:
>
> One could also say you are classifying your data based on keywords in
> the text?
>
I probably didn't explain myself very well, or more specifically, didn't
provide a good example. In my case, there really isn't any relation...
Beard, Brian wrote:
I'm using lucene 2.2.0.
I'm in the process of re-writing some queries to build BooleanQueries
instead of using query parser.
Bypassing query parser provides almost an order of magnitude improvement
for very large queries, but then the search performance takes 20-30%
longer.
1world1love wrote:
Greetings all. I am indexing a set of documents where I am extracting terms
and mapping them to a controlled vocabulary and then placing the matched
vocabulary in a keyword field.
One could also say you are classifying your data based on keywords in
the text?
What I want to know is if there is a way to store the original term
location with the keyword field?
Sumit,
The class you'll end up subclassing from would be:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html or
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/DefaultSimilarity.html
On an IndexSearcher
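The cut-off sentence presumably ends with setSimilarity. A rough sketch (the
subclass and its tf() weighting are invented for illustration, and the index
path is a placeholder):

  import org.apache.lucene.search.DefaultSimilarity;
  import org.apache.lucene.search.IndexSearcher;

  // Invented subclass: cap term frequency's contribution to the score.
  class FlatTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
      return freq > 0 ? 1.0f : 0.0f; // count each term at most once
    }
  }

  // Install it on the searcher before running queries:
  IndexSearcher searcher = new IndexSearcher("/path/to/index");
  searcher.setSimilarity(new FlatTfSimilarity());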
Ha, you know it never occurred to me that the driver might do this for
me...I'll test it out. Thanks,
--tim
- "Michael McCandless" <[EMAIL PROTECTED]> wrote:
> Oh then I don't think you need a custom deletion policy.
>
> A single NFS client emulates delete-on-last-close semantics. I.e., if...
Is there any way to change the score of the documents?
Actually, I want to modify the scores of the documents dynamically: every time,
for a given query, the results will be sorted according to "the Lucene scoring
formula + an equation".
How can I do that? I saw the Lucene scoring page but I am not getting...
I'm using lucene 2.2.0.
I'm in the process of re-writing some queries to build BooleanQueries
instead of using query parser.
Bypassing query parser provides almost an order of magnitude improvement
for very large queries, but then the search performance takes 20-30%
longer. I'm adding boost values...
Greetings all. I am indexing a set of documents where I am extracting terms
and mapping them to a controlled vocabulary and then placing the matched
vocabulary in a keyword field. What I want to know is if there is a way to
store the original term location with the keyword field?
Example Text: "T
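One way to carry the original location along, sketched against Lucene 2.3-era
API (the term and offsets below are invented): emit the mapped vocabulary term
as a single pre-built token whose offsets point into the original text, and
enable term vectors so the offsets are stored.

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  // One-token stream: the mapped vocabulary term, carrying the ORIGINAL
  // text's offsets (17 and 33 are invented values).
  final Token term = new Token("controlled-term", 17, 33);
  TokenStream one = new TokenStream() {
    private boolean done = false;
    public Token next() {
      if (done) return null;
      done = true;
      return term;
    }
  };

  Document doc = new Document();
  // WITH_OFFSETS stores each token's start/end offsets in the term vector.
  doc.add(new Field("keyword", one, Field.TermVector.WITH_OFFSETS));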
Thanks Mark-
I'm very much a newbie in all this patching stuff, but I don't think I'm
using anything other than built-in Eclipse functionality using team->apply
patch. And it's clearly not working well.
I took a look at TortoiseSVN but I think it's way overkill for me-- oh
well; maybe I'll just
Hi,
Thanks for your reply.
I can't think of any way to ensure fair file descriptor usage when
there are many active instances of IndexSearcher (all containing
IndexReader) running. Our project installations tend to run on heavily
loaded sites, where a lot of information is read and written at the
Actually you do need to make a new IndexSearcher every time you
reopen a new IndexReader.
However, that should not lead to leaking file descriptors. All open
files are held by IndexReader (not IndexSearcher), so as long as you
are properly closing your IndexReaders you shouldn't use up file descriptors...
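The reuse pattern looks roughly like this (variable names assumed;
IndexReader.reopen is available from Lucene 2.3):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.IndexSearcher;

  // When the index has changed: reopen() returns a new reader (sharing
  // unchanged segments); close the old one and wrap a fresh searcher.
  IndexReader newReader = reader.reopen();
  if (newReader != reader) {
    reader.close();                       // releases the old reader's files
    reader = newReader;
    searcher = new IndexSearcher(reader); // cheap; holds no files itself
  }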
Hi,
Another newbie here... using Lucene 2.3.1 on Linux. Hopefully someone
can advise me on the subject.
Both the IndexSearcher Javadoc and the Lucene FAQ say the IndexSearcher
should be reused, as it's thread safe. That's OK.
Now if the index has changed, I need to reopen the IndexReader that is
associated wit...
Are you using Subclipse to apply the patch? It's not very good at it. I
use TortoiseSVN for patching, as it's much smarter about these things.
With TortoiseSVN, you just patch from the root dir and it knows you are
referring to the contrib folder that's under the root directory (the
directory you...
I have downloaded the Lucene (core, 2.3.1) code and created a project
using Eclipse (pointing to src/java) to use it. That works fine, along
with the contrib highlighter jar file from the standard distribution.
I have also successfully added an additional Eclipse project for the
(standard) Highlighter...
Nutch has a ontology plugin based on Jena.
http://wiki.apache.org/nutch/OntologyPlugin
I haven't used it. Just by looking at the source code, it seems it is
just an OWL parser. So apparently it only works with sources defined in
OWL format, not others such as RDF. I think you need to extend the
source...
Borgman, Lennart wrote:
Is there any possibility to use a thesaurus or an ontology when
indexing/searching with Lucene?
Yes, the WordNet contrib does that. And with a token filter, it's easy
to use your own.
What do you want to do?
M.
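For the token-filter route, a rough sketch against the Lucene 2.x TokenStream
API (the class name and the shape of the synonym map are invented):

  import java.io.IOException;
  import java.util.LinkedList;
  import java.util.Map;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;

  // Sketch (not the WordNet contrib): inject synonyms from a user-supplied
  // map at the same position as the original token.
  public class SynonymFilter extends TokenFilter {
    private final Map synonyms;           // String -> String[] (assumed shape)
    private final LinkedList pending = new LinkedList();

    public SynonymFilter(TokenStream in, Map synonyms) {
      super(in);
      this.synonyms = synonyms;
    }

    public Token next() throws IOException {
      if (!pending.isEmpty())
        return (Token) pending.removeFirst();
      Token token = input.next();
      if (token == null)
        return null;
      String[] syns = (String[]) synonyms.get(token.termText());
      if (syns != null) {
        for (int i = 0; i < syns.length; i++) {
          Token syn = new Token(syns[i], token.startOffset(), token.endOffset());
          syn.setPositionIncrement(0); // stack on the original token's position
          pending.add(syn);
        }
      }
      return token;
    }
  }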
---
Is there any possibility to use a thesaurus or an ontology when
indexing/searching with Lucene?
Hi Donna - see the previous post below, which may help. Tom
Hi,
In case this is of help to others:
Crux of problem:
I wanted numbers and characters such as # and + to be considered.
Solution:
implement a LowercaseWhitespaceAnalyzer and a
Lowerc
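The excerpt cuts off before the code, but the analyzer is presumably along
these lines (a sketch): tokenize on whitespace only, so tokens like "c++" and
"c#" survive, then lowercase.

  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceTokenizer;

  // Split only on whitespace (keeping "#" and "+"), then lowercase.
  public class LowercaseWhitespaceAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
      return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }
  }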
Hi,
We have built a data import tool which can read from databases and add
the content to Solr. We found that making content available for full-text
search and faceted search was a common use case, and usually everyone
ends up writing a custom ETL-based tool for this task. Therefore we're
contributing this...
Oh then I don't think you need a custom deletion policy.
A single NFS client emulates delete-on-last-close semantics. I.e., if
the deletion of the file, and the held-open file handles, happen
through a single NFS client, then it emulates delete-on-last-close
semantics for you, by creating t
Well, first off, sometimes the thing being indexed isn't a string, so
you have no stringValue to get its length. It could be a Reader or a
TokenStream.
Second off, it's conceivable that an analyzer computes its own
"interesting" offsets that are not in fact simple indices into the
stri
No, I have multiple readers in the same VM. I track open readers within my VM
and save those readers' commit points until the readers are gone.
--tim
- "Michael McCandless" <[EMAIL PROTECTED]> wrote:
> OK got it. Yes, sharing an index over NFS requires your own
> DeletionPolicy.
>
>
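Such a policy might look roughly like this (a sketch; the bookkeeping that
knows which commits open readers still hold is assumed, and the newest commit
is always kept):

  import java.util.List;
  import org.apache.lucene.index.IndexCommitPoint;
  import org.apache.lucene.index.IndexDeletionPolicy;

  // Sketch: never delete the newest commit; keep any older commit that an
  // open reader in this VM still holds. inUse() is assumed bookkeeping.
  public class ReaderAwareDeletionPolicy implements IndexDeletionPolicy {

    public void onInit(List commits) {
      onCommit(commits);
    }

    public void onCommit(List commits) {
      // The last element is the most recent commit; always keep it.
      for (int i = 0; i < commits.size() - 1; i++) {
        IndexCommitPoint commit = (IndexCommitPoint) commits.get(i);
        if (!inUse(commit.getSegmentsFileName())) {
          commit.delete();
        }
      }
    }

    private boolean inUse(String segmentsFileName) {
      return false; // assumed: consult the VM's open-reader tracking here
    }
  }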
NO_NORMS means "do index the field as a single token (i.e., do not
tokenize the field), and do not store norms for it".
Mike
On Mar 5, 2008, at 5:20 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
Hm, what exactly does NO_NORMS mean?
Thank you
--
Hm, what exactly does NO_NORMS mean?
Thank you
Jake Mannix wrote:
Gabriel,
You can make this search much more efficient as follows: say that you have
a method
public BooleanQuery createQuery(Collection allowedUUIDs);
that works as you describe. Then you can easily build a reusable
filter like this:
Filter filter = new Ca
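The line is cut off; a plausible continuation (a guess, not from the original
mail) caches the UUID restriction so it is computed once per reader:

  import org.apache.lucene.search.CachingWrapperFilter;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.search.QueryWrapperFilter;

  // Guess at the cut-off line: cache the bit set for the allowed UUIDs
  // so it can be reused across searches on the same reader.
  Filter filter = new CachingWrapperFilter(
      new QueryWrapperFilter(createQuery(allowedUUIDs)));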
Do you know if there will be side effects if, in
DocumentWriter$FieldData#invertField, we replace
offset = offsetEnd+1;
with
offset = stringValue.length();
I still don't understand the reason for this choice of increment for
the start offset.
Regards.
Michael McCandless wrote:
This is how Lucene has worked for quite some time (since 1.9)...
OK got it. Yes, sharing an index over NFS requires your own
DeletionPolicy.
So presumably you have readers on different machines accessing the
index over NFS, and then one machine that does the writing to that
index? How do you plan to get which commit point each of these
readers is c...
This is how Lucene has worked for quite some time (since 1.9).
When there are multiple fields with the same name in one Document,
each field's offset starts from the last offset (offset of the last
token) seen in the previous field. If tokens are skipped at the end
there's no way IndexWri
Correct, they are logically orthogonal, and I agree the API is
somewhat confusing since "NO_NORMS" is mixing up two things.
To get a tokenized field without norms you can create the field with
Index.TOKENIZED, and then call setOmitNorms(true).
Note that norms "spread" during merges, so, if even one document has norms
for a field, merging will spread norms to every document in that field...
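A minimal sketch of that suggestion (field name and text are placeholders):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  // Tokenized field with norms omitted: analyze as usual, skip the norms byte.
  Document doc = new Document();
  Field body = new Field("body", "some text to analyze",
                         Field.Store.NO, Field.Index.TOKENIZED);
  body.setOmitNorms(true);
  doc.add(body);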