: No, as far as I know you can't combine wildcards in phrases. This would
The QueryParser doesn't support it, and there is no native query type
for it, but if you are willing to do the query expansion yourself, you can
build a MultiPhraseQuery (where you generate the terms using a
WildcardTermEnum).
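To make that concrete, here is a rough sketch of the expansion, assuming the
Lucene 2.x MultiPhraseQuery and WildcardTermEnum APIs and an already-open
IndexReader named "reader"; the field name "body" and the pattern "appl*"
are only illustrative:

// imports: org.apache.lucene.index.Term, org.apache.lucene.search.MultiPhraseQuery,
//          org.apache.lucene.search.WildcardTermEnum, java.util.*
MultiPhraseQuery query = new MultiPhraseQuery();
query.add(new Term("body", "eat"));                       // exact term at the first position

// Expand the wildcard into concrete terms for the next position.
List<Term> expanded = new ArrayList<Term>();
WildcardTermEnum termEnum = new WildcardTermEnum(reader, new Term("body", "appl*"));
try {
    while (termEnum.term() != null) {
        expanded.add(termEnum.term());
        if (!termEnum.next()) break;
    }
} finally {
    termEnum.close();
}
query.add(expanded.toArray(new Term[expanded.size()]));   // any of these at the second position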
Aaron Schon wrote:
...I was wondering if taking a bag of words approach might work. For example, chunking the
sentences to be analyzed and running a Lucene query against an index storing sentiment
polarity. Has anyone had success with this approach? I do not need a super accurate
system, someth
Hi Erick,
Thanks for the response. I think I'm starting to get the hang of
this. That's a really good insight, but I'm wondering how to handle that
if a document can have multiple instances of the same field. So, instead
of Author, say, City names that are mentioned. But, as you said, I co
Our mails are crossing
Not that I know of. But why don't you just index (or maybe just store)
a separate field containing your offset information? Something like
title_offset with, say, a comma-separated pair denoting char position
and length that you then read in at search time and parse.
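As a rough sketch of that idea (assuming Lucene 2.x Field flags; the field
name "title_offset" and the "start,length" encoding are just the convention
suggested above, and fullText/title/hitDoc are placeholder variables):

// Index time: store the title's character position and length in a
// separate, stored-only field.
int titleStart = fullText.indexOf(title);
Document doc = new Document();
doc.add(new Field("body", fullText, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("title_offset", titleStart + "," + title.length(),
                  Field.Store.YES, Field.Index.NO));

// Search time: read the stored value back and parse it.
String[] parts = hitDoc.get("title_offset").split(",");
int start  = Integer.parseInt(parts[0]);
int length = Integer.parseInt(parts[1]);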
What is your analyzer doing? Let's assume you're trying
to index the title and that your entire text is
"this is a book and HERE IS THE TITLE."
I *think* your underlying analyzer should be returning
4 tokens with starts of 19 for HERE, 24 for IS,
27 for THE and 31 for TITLE, with appropriate end offsets.
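A quick way to check is to print the tokens and offsets yourself; this is a
sketch assuming the Lucene 2.x TokenStream API (where next() returns a Token),
with WhitespaceAnalyzer standing in for whatever analyzer you actually use:

// imports: org.apache.lucene.analysis.*, java.io.StringReader
Analyzer analyzer = new WhitespaceAnalyzer();
TokenStream ts = analyzer.tokenStream("title",
        new StringReader("this is a book and HERE IS THE TITLE."));
Token token;
while ((token = ts.next()) != null) {
    System.out.println(token.termText()
            + " start=" + token.startOffset()
            + " end="   + token.endOffset());
}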
: > additional parens normally indicates that you are actually creating an
: > extra layer of BooleanQueries (ie: a BooleanQuery with only one clause for
: look here,
: parens will also be added if each term has a boost value larger than 1.0.
i think you are misreading that code. the "needParens
Have you considered indexing that field UN_TOKENIZED? Make sure
you build your queries that way too. I'm not at *all* clear about
how this works with wildcards, so you'll have to test that.
This assumes you never want to just be able to search on LA
and get a hit.
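Roughly, the indexing and querying for that would look like this (a sketch
assuming Lucene 2.x APIs; the signature value is just the example from the
other mail):

// Index the whole value as a single token so "LA A 100" is one term.
doc.add(new Field("signature", "LA A 100",
                  Field.Store.YES, Field.Index.UN_TOKENIZED));

// Query it with a TermQuery, which bypasses the analyzer, so the value
// must match what was indexed character for character.
Query q = new TermQuery(new Term("signature", "LA A 100"));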
Best
Erick
On Fri, Mar 7, 2008
OK, I think I understand what's going on - it looks like I am able to set
the token for the full author name (Say, "Steve Suppe") with the correct
offsets, but the analyzer takes it one step further and tokenizes 'Steve'
and 'Suppe' which is giving me a lot more generated offsets and is
confus
Hi all,
I'm trying to index documents so that a) I have all the documents indexed
'normally' (in that I can search for documents that match certain words),
and b) parts of the document that I consider important, such as author and
title are ALSO stored in their own indexed fields.
I have (a)
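For what it's worth, a minimal sketch of that kind of setup (assuming Lucene
2.x field flags; the field names and the fullText/author/title/writer
variables are placeholders):

Document doc = new Document();
// (a) the whole text, searchable "normally"
doc.add(new Field("contents", fullText, Field.Store.NO, Field.Index.TOKENIZED));
// (b) the important parts in their own indexed fields
doc.add(new Field("author", author, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);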
From the command line, assuming you have the patch command:
patch -p 0 -i
is how I apply patches that were generated using svn diff.
-Grant
On Mar 5, 2008, at 1:30 PM, Donna L Gresh wrote:
Thanks Mark-
I'm very much a newbie in all this patching stuff, but I don't think
I'm
using anything
hi again,
referring to my second issue, I've got another question. I mean, this field
thing works pretty well but:
My fields look like:
signature: LA A 100
signature: LA A 201
signature: LA A 202
signature: LA B 200
signature: LC B 300
Now I use getFields and search them.
Let's assume I'm searchi
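In case it helps, this is roughly what a multi-valued field looks like from
the API side (a sketch assuming Lucene 2.x, where Document.getFields(String)
returns one entry per added value):

Document doc = new Document();
doc.add(new Field("signature", "LA A 100", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("signature", "LA A 201", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("signature", "LA B 200", Field.Store.YES, Field.Index.UN_TOKENIZED));

Field[] signatures = doc.getFields("signature");   // all values of the field
for (Field f : signatures) {
    System.out.println(f.stringValue());
}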
> With a commit after every add: (286 sec / 10,000 docs) 28.6 ms.
> With a commit after every 100 add: (12 sec / 10,000 docs) 1.2 ms.
> Only one commit: (8 sec / 10,000 docs) 0.8 ms.
Of course. If you need much less time to create a document than for a commit,
which may take, let's say, 10 - 500 ms, will s
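A rough sketch of the batching idea behind those numbers (assuming an
IndexWriter with commit(), available in Lucene 2.4 and later; docs and
batchSize are hypothetical names):

int batchSize = 100;
int count = 0;
for (Document doc : docs) {
    writer.addDocument(doc);
    if (++count % batchSize == 0) {
        writer.commit();            // pay the commit cost once per 100 adds
    }
}
writer.commit();                    // final commit for the remainder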
On Thu, 2008-03-06 at 18:40 +0100, [EMAIL PROTECTED] wrote:
> > > With a commit after every add: 30 min.
> > > With a commit after 100 add: 23 min.
> > > Only one commit: 20 min.
[...]
> I think it is a real-world scenario because one always has to read the docs
> from somewhere and often has
On Fri, 2008-03-07 at 00:03 +0100, Ray wrote:
> I am currently running a small random text indexer with 400 docs/second.
> It will reach 2 billion in around 45 days.
If you are just doing it to test large indexes (in terms of document
count), then you need to look into your index-generation code.
You sure can. Or you can use the SetBasedFieldSelector that already
exists in o.a.lucene.document.
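A minimal sketch of using it (assuming the Lucene 2.x FieldSelector API;
"small_field" is a placeholder name and reader is an open IndexReader):

Set<String> fieldsToLoad = new HashSet<String>();
fieldsToLoad.add("small_field");
FieldSelector selector = new SetBasedFieldSelector(fieldsToLoad,
        Collections.<String>emptySet());         // nothing loaded lazily

// Only "small_field" is read off disk; the large fields are skipped.
Document doc = reader.document(docId, selector);
String value = doc.get("small_field");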
-Grant
On Mar 7, 2008, at 5:26 AM, Sergey Kabashnyuk wrote:
Hi.
I have a question about retrieving information.
Let's say I have an index which contains millions of documents with
2-3 small
Hi.
I have a question about retrieving information.
Let's say I have an index which contains millions of documents with 2-3
small fields and 10 large fields.
Then I run a query which returns 1000 hits, but I am interested in
only one small field and I don't want to load the other fields.