Re: combine wildcard and phrase query

2008-03-07 Thread Chris Hostetter
: No, as far as I know you can't combine wildcards in phrases. This would [...] The QueryParser doesn't support it, and there is no native query type for it, but if you are willing to do the query expansion yourself, you can build a MultiPhraseQuery (where you generate the terms using a WildcardTerm
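
The "do the query expansion yourself" step can be sketched in plain Java, under some loud assumptions: the sorted set of strings stands in for the terms of the indexed field, only trailing-`*` (prefix) wildcards are handled, and the class and method names are hypothetical. Each inner list would then supply the alternative terms for one phrase position of a MultiPhraseQuery; the Lucene calls themselves are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper, not part of Lucene: expands a phrase containing
// trailing-'*' wildcards into one list of candidate terms per position.
final class PhraseWildcardExpansion {

    // Returns every known term that starts with the given prefix.
    // Terms sharing a prefix are contiguous in a naturally sorted set,
    // so we can stop at the first non-matching term.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Splits the phrase on whitespace; a word ending in '*' is expanded
    // against the term set, any other word is kept as a single alternative.
    static List<List<String>> expandPhrase(SortedSet<String> terms, String phrase) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (word.endsWith("*")) {
                String prefix = word.substring(0, word.length() - 1);
                positions.add(expandPrefix(terms, prefix));
            } else {
                positions.add(Collections.singletonList(word));
            }
        }
        return positions;
    }
}
```

A position whose expansion comes back empty means the phrase cannot match at all, so a caller would probably want to check for that before building any query.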

Re: Lucene for Sentiment Analysis

2008-03-07 Thread Bob Carpenter
Aaron Schon wrote: ...I was wondering if taking a bag of words approach might work. For example chunking the sentences to be analyzed and running a Lucene query against an index storing sentiment polarity. Has anyone had success with this approach? I do not need a super accurate system, someth

Re: Offset Questions

2008-03-07 Thread Steve Suppe
Hi Erick, Thanks for the response. I think I'm starting to get the hang of this. That's a really good insight, but I'm wondering how to handle that if a document can have multiple instances of the same field. So, instead of Author, say, City names that are mentioned. But, as you said, I co

Re: Offset Questions (Follow-Up)

2008-03-07 Thread Erick Erickson
Our mails are crossing Not that I know of. But why don't you just index (or maybe just store) a separate field containing your offset information? Something like title_offset with, say, a comma-separated pair denoting char position and length that you then read in at search time and parse.
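
A minimal sketch of the "comma-separated pair" idea, assuming the stored value has the form `start,length` with a zero-based character position. Only the field name title_offset comes from the message; the class and method names below are hypothetical, and storing and retrieving the value through Lucene is not shown.

```java
// Hypothetical value type for a stored offset field such as "title_offset".
final class OffsetField {
    final int start;   // zero-based character position in the original text
    final int length;  // number of characters covered

    OffsetField(int start, int length) {
        if (start < 0 || length < 0) {
            throw new IllegalArgumentException("start and length must be non-negative");
        }
        this.start = start;
        this.length = length;
    }

    // Produces the string to store at index time, e.g. "7,16".
    static String encode(int start, int length) {
        return start + "," + length;
    }

    // Parses the stored string read back at search time.
    static OffsetField parse(String stored) {
        String[] parts = stored.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'start,length' but got: " + stored);
        }
        return new OffsetField(Integer.parseInt(parts[0].trim()),
                               Integer.parseInt(parts[1].trim()));
    }

    // Cuts the referenced span out of the original document text.
    String slice(String text) {
        return text.substring(start, start + length);
    }
}
```

For example, with the text `Title: Lucene in Action`, the stored value `7,16` parses back to a span that slices out `Lucene in Action`. A field that can occur several times per document would need one such pair per occurrence.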

Re: Offset Questions

2008-03-07 Thread Erick Erickson
What is your analyzer doing? Let's assume you're trying to index the title and that your entire text is "this is a book and HERE IS THE TITLE." I *think* your underlying analyzer should be returning 4 tokens with starts of 20 for HERE, 25 for IS, 28 for THE and 32 for TITLE, with appropriate en

Re: Boolean Query search performance

2008-03-07 Thread Chris Hostetter
: > additional parens normally indicates that you are actually creating an : > extra layer of BooleanQueries (ie: a BooleanQuery with only one clause for : look here, : parens will also be added if each term has a boost value larger than 1.0. I think you are misreading that code. the "needParens

Re: combine wildcard and phrase query

2008-03-07 Thread Erick Erickson
Have you considered indexing that field UN_TOKENIZED? Make sure you build your queries that way too. I'm not at *all* clear about how this works with wildcards, so you'll have to test that. This assumes you never want to just be able to search on LA and get a hit. Best Erick On Fri, Mar 7, 2008

Re: Offset Questions (Follow-Up)

2008-03-07 Thread Steve Suppe
OK, I think I understand what's going on - it looks like I am able to set the token for the full author name (say, "Steve Suppe") with the correct offsets, but the analyzer takes it one step further and tokenizes 'Steve' and 'Suppe' which is giving me a lot more generated offsets and is confus

Offset Questions

2008-03-07 Thread Steve Suppe
Hi all, I'm trying to index documents so that a) I have all the documents indexed 'normally' (in that I can search for documents that match certain words), and b) parts of the document that I consider important, such as author and title, are ALSO stored in their own indexed fields. I have (a)

Re: applying patch in Eclipse to get SpanHighlighter functionality

2008-03-07 Thread Grant Ingersoll
From the command line, assuming you have the patch command: patch -p 0 -i Is how I apply patches that were generated using svn diff. -Grant On Mar 5, 2008, at 1:30 PM, Donna L Gresh wrote: Thanks Mark- I'm very much a newbie in all this patching stuff, but I don't think I'm using anything

Re: combine wildcard and phrase query

2008-03-07 Thread JensBurkhardt
hi again, referring to my second issue, i've got another question. I mean, this field thing works pretty well but: My fields look like: signature: LA A 100 signature: LA A 201 signature: LA A 202 signature: LA B 200 signature: LC B 300 Now i use getFields and search them. Let's assume i'm searchi

RE: Swapping between indexes

2008-03-07 Thread spring
> With a commit after every add: (286 sec / 10,000 docs) 28.6 ms. > With a commit after every 100 adds: (12 sec / 10,000 docs) 1.2 ms. > Only one commit: (8 sec / 10,000 docs) 0.8 ms. Of course. If you need so much less time to create a document than a commit, which may take, let's say, 10 - 500 ms, will s

RE: Swapping between indexes

2008-03-07 Thread Toke Eskildsen
On Thu, 2008-03-06 at 18:40 +0100, [EMAIL PROTECTED] wrote: > > > With a commit after every add: 30 min. > > > With a commit after 100 add: 23 min. > > > Only one commit: 20 min. [...] > I think it is a real world scenario because one always has to read the docs > from somewhere and often has

Re: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-07 Thread Toke Eskildsen
On Fri, 2008-03-07 at 00:03 +0100, Ray wrote: > I am currently running a small random text indexer with 400 docs/second. > It will reach 2 billion in around 45 days. If you are just doing it to test large indexes (in terms of document count), then you need to look into your index-generation code.

Re: Hits.

2008-03-07 Thread Grant Ingersoll
You sure can. Or you can use the SetBasedFieldSelector that already exists in o.a.lucene.document. -Grant On Mar 7, 2008, at 5:26 AM, Sergey Kabashnyuk wrote: Hi. I have a question about retrieving information. Let's say I have an index which contains millions of documents with 2-3 small

Hits.

2008-03-07 Thread Sergey Kabashnyuk
Hi. I have a question about retrieving information. Let's say I have an index which contains millions of documents with 2-3 small fields and 10 large fields. Then I run a query which returns 1000 hits. But I am interested in only one small field, and I don't want to load the other fields.