Hi Stephen,
We precompute a variant of P(z,d) during indexing, and do the first 3
steps. The resulting documents are ordered by payload score, which is
basically z in our case. We don't currently care about P(t,z) but it
seems like a good thing to have for disambiguation purposes.
So anyway, I ha
Hi,
Be sure to use the same Solr version as your Lucene version (if >= 3.1); here
is example code from a test case:
WordDelimiterFilterFactory fact = new WordDelimiterFilterFactory();
// we don't need this if we don't load external exclusion files:
// ResourceLoader loader = new Solr
How do you use the WordDelimiterFilterFactory()? I tried the following code:
TokenStream out = new LowerCaseTokenizer(reader);
WordDelimiterFilterFactory wdf = new WordDelimiterFilterFactory();
out = wdf.create(out);
...
But I am getting a runtime error:
Exception in thread "main" java.lang.Ab
Hi,
There is a WordDelimiterFilter in Solr that was also ported to the Lucene
Analysis module in Lucene trunk (4.0). In 3.x you can still add solr.jar to
your classpath and use WordDelimiterFilterFactory to produce one
(WordDelimiterFilter itself is package-private).
-
Uwe Schindler
H.-H.-Meier-Allee 63
List,
I have written my own CustomAnalyzer, as follows:
public TokenStream tokenStream(String fieldName, Reader reader) {
// TODO: add calls to RemovePunctuation and SplitIdentifiers here
// First, convert to lower case
TokenStream
Again, there is nothing wrong with the quotes: it's instead how you are
configuring the analysis for this field.
If you put stuff in quotes and your analyzer breaks it into multiple
tokens, then queryparser forms a phrase query. You must index
positions to support phrase queries.
Normally DOCS_ONLY
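To see why positions matter here, this is a minimal plain-Java sketch (no Lucene on the classpath; the tokenization and phrase check are simplified stand-ins for what the analyzer and phrase query actually do): a quoted string is broken into tokens, each token gets a position, and a phrase match means the positions are consecutive. With DOCS_ONLY no positions are indexed, so this check cannot be made.

```java
import java.util.*;

public class PhraseDemo {
    public static void main(String[] args) {
        // Analyzer-style tokenization: each token carries a position
        String doc = "the quick john doe story";
        String[] tokens = doc.toLowerCase().split("\\s+");
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
        // A phrase query "john doe" matches only if the two terms appear
        // at consecutive positions; without indexed positions this is impossible.
        boolean phraseMatch = false;
        for (int p : positions.getOrDefault("john", Collections.<Integer>emptyList())) {
            if (positions.getOrDefault("doe", Collections.<Integer>emptyList()).contains(p + 1)) {
                phraseMatch = true;
            }
        }
        System.out.println(phraseMatch);
    }
}
```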
Still no difference; it may be because of some other hidden bug. Anyway,
adding freqs and positions will be a no-no because of space :) so
bye bye quotes.
Thank you
Sujit,
Thanks for your reply, and the link to your blog post, which was
helpful and got me thinking about Payloads.
I still have one more question. I need to be able to compute the
Sim(query q, doc d) similarity function, which is defined below:
Sim (query q, doc d) = sum_{t in q} sum_{z} P(t, z
If you use StandardAnalyzer it will break "john doe" into 2 tokens and
form a phrase query.
If you want to do phrase queries, don't set the index options to
DOCS_ONLY; otherwise they won't work.
if what you want is for "john doe" to only be 1 term without
positions, then use KeywordAnalyzer, and DO
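The difference between the two analyzers can be sketched in plain Java (no Lucene here; this only mimics the token output, not the real analyzer chain): a KeywordAnalyzer-style analysis keeps the whole field value as one token, while a StandardAnalyzer-style analysis splits it into separate tokens, which is what triggers the phrase query.

```java
import java.util.*;

public class TokenizeDemo {
    public static void main(String[] args) {
        String author = "John Doe";
        // KeywordAnalyzer-style: the whole value becomes a single token
        List<String> keywordStyle = Collections.singletonList(author.toLowerCase());
        // StandardAnalyzer-style (roughly): split into separate tokens
        List<String> standardStyle = Arrays.asList(author.toLowerCase().split("\\s+"));
        System.out.println(keywordStyle.size() + " " + standardStyle.size());
    }
}
```

With one token there is nothing for the query parser to turn into a phrase, so DOCS_ONLY is safe; with two tokens a quoted query needs positions.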
field = new Field("author",(author).toLowerCase(),Field.Store.NO,
Field.Index.NOT_ANALYZED);
field.setIndexOptions(FieldInfo.IndexOptions.DOCS_ONLY);
field.setOmitNorms(true);
When, in the above configuration, I switched from NOT_ANALYZED to ANALYZED,
luke's results for autho
Close the first index writer?
http://lmgtfy.com/?q=lucene+Cannot+overwrite+%22_0.fdt%22+file
If you can't find the answer and need to post again, include as a
minimum details of the OS and lucene version that you are using.
--
Ian.
On Tue, Nov 29, 2011 at 12:15 PM, Rohan A Ambasta
wrote:
>
>
Hi,
I get the error - "Cannot Overwrite 0.fdt" when I start indexing.
Detail TestCase -
1) Performing indexing for the first time works fine.
2) Then I do a search and get the search results.
3) After the search, if I start indexing again I get the error - "Cannot
overwrite 0.fdt"
Has anybody faced
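Ian's suggestion above ("Close the first index writer?") boils down to a close-before-reopen discipline. A minimal plain-Java sketch of that pattern (no Lucene involved; the temp file merely stands in for an index file such as _0.fdt):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ReopenDemo {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("seg", ".fdt");
        // First indexing pass: write and, crucially, close the writer.
        BufferedWriter w1 = Files.newBufferedWriter(f, StandardCharsets.UTF_8);
        w1.write("first pass");
        w1.close(); // a writer left open is the usual cause of "cannot overwrite" errors
        // The second pass can now reopen the same file cleanly.
        BufferedWriter w2 = Files.newBufferedWriter(f, StandardCharsets.UTF_8);
        w2.write("second pass");
        w2.close();
        System.out.println(new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
        Files.delete(f);
    }
}
```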
A google search of "lucene stemming wildcards" finds some hits
implying these don't work well together.
http://lucene.472066.n3.nabble.com/Conflicts-with-Stemming-and-Wildcard-Prefix-Queries-td540479.html
may be a solution.
--
Ian.
On Tue, Nov 29, 2011 at 10:39 AM, SBS wrote:
> This is very hard to follow. I for one don't recall what you
> described or what you are looking for.
Sorry about that, I am using the web interface where the context of my post
is visible to all.
To sum up, my original post was:
> It seems that when I use a PorterStemFilter in my custom an
This is very hard to follow. I for one don't recall what you
described or what you are looking for.
Have you worked through
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F?
--
Ian.
On Tue, Nov 29, 2011 at 7:25 AM, SBS wrote:
> I am applying the P