Re: index enforcing query terms to appear within the same sentence

2011-03-11 Thread Ian Lea
The example code in http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html reads custom standard analyzer: public class MyStandardAnalyzer extends StandardAnalyzer implements IndexFields { public MyStandardAnalyzer(Version matchVersion) {

Re: index enforcing query terms to appear within the same sentence

2011-03-10 Thread Michael Wiegand
Conceptually, I think I know what to do. Unfortunately, with the given interfaces of Lucene I have some difficulty. If I add the content of a document sentence by sentence, i.e. line by line, (using a multi-valued field), there are only two constructors possible: Field(String name, String val

Re: index enforcing query terms to appear within the same sentence

2011-03-04 Thread Ian Lea
Another index, or a different field in the same index but without the modified gaps. Maybe PerFieldAnalyzerWrapper would help - one Analyzer for field x with modified gaps and a different one for field y with standard gaps. -- Ian. On Fri, Mar 4, 2011 at 2:40 PM, Michael Wiegand wrote: > Than

Re: index enforcing query terms to appear within the same sentence

2011-03-04 Thread Michael Wiegand
Thank you for all these useful hints! If I use the multi-valued fields in combination with "modified" position increments, I would actually distort the shape of a document. For instance, if I would like to compare a retrieval enforcing query term co-occurrence within the same sentence with a co

Re: index enforcing query terms to appear within the same sentence

2011-03-04 Thread Ian Lea
You can use multi-valued fields if you play with the position increment gap. See e.g. http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html A Google search for "lucene indexing sentences" or similar finds that, and more. Different docs can have different field
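
The position-gap idea above can be sketched without Lucene at all. The following is a minimal plain-Java illustration, not the code from the linked post: the class `SentenceGapSketch` and its methods `index` and `within` are hypothetical names invented for this example. It assumes each sentence is added as one value of a multi-valued field, that a fixed number of positions is skipped between values (in Lucene, the value returned by `Analyzer.getPositionIncrementGap(String fieldName)`), and that the query allows a maximum distance smaller than that gap (in Lucene, roughly what a sloppy `PhraseQuery` or a `SpanNearQuery` would express).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it simulates token positions and does not use Lucene.
class SentenceGapSketch {

    // Assign a position to every token. Tokens inside a sentence get consecutive
    // positions; `gap` extra positions are skipped after each non-empty sentence.
    static Map<String, List<Integer>> index(String[] sentences, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int next = 0;
        for (String sentence : sentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing to index, so no gap is added either
            }
            for (String token : trimmed.toLowerCase().split("\\s+")) {
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(next++);
            }
            next += gap;
        }
        return positions;
    }

    // True if some occurrence of `a` and some occurrence of `b` lie at most
    // `maxDistance` positions apart.
    static boolean within(Map<String, List<Integer>> positions,
                          String a, String b, int maxDistance) {
        List<Integer> none = Collections.emptyList();
        for (int pa : positions.getOrDefault(a, none)) {
            for (int pb : positions.getOrDefault(b, none)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] doc = {"lucene builds an inverted index", "the query parser reads terms"};
        Map<String, List<Integer>> gapped = index(doc, 100);
        System.out.println(within(gapped, "lucene", "index", 50)); // same sentence
        System.out.println(within(gapped, "index", "query", 50));  // adjacent sentences
    }
}
```

For this to behave as a same-sentence test, the gap has to exceed the allowed distance, and the allowed distance has to be at least as large as the longest sentence; the values 100 and 50 here are arbitrary choices for the example.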

index enforcing query terms to appear within the same sentence

2011-03-03 Thread Michael Wiegand
Hi, I would like to create a Lucene index for a collection of text files. The index should be built in such a way that, at search time, I can enforce that query term A and query term B are contained within the same sentence. How should I implement the index? Should I have for e