As you may have seen in the example code for PartOfSpeechTaggingFilter at
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/package-summary.html
you can use a custom analyzer to inject metadata tokens into the index at
the same position as the source tokens.
For example, given
Hi Mike,
thanks for the info.
As far as I know, a write.lock is created by an IndexWriter.
So I have to dig into why an IndexWriter is created just
on starting Solr with an optimized index.
The problem occurs only with a huge index.
Also, old parts of the index are not cleaned up.
May
On 02/05/2011 23:36, Paul Taylor wrote:
Hi
I am nearing completion on a new version of a Lucene search component for
the http://www.musicbrainz.org music database and am having a problem
with performance. There are a number of indexes, each built from data
in a database; there is one index for
Well, it is not only with a huge index.
It is only if ReplicationHandler is in use on a master.
If ReplicationHandler is configured to replicateAfter startup it first
sends a commit via IndexWriter to have a stable index. The leftover
of this operation is the write.lock.
So removing
Sorry for coming back to my issue. Can anybody explain why my simple unit
test below fails? Any hint/help appreciated.
Directory directory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter( directory, new StandardAnalyzer(
Version.LUCENE_31 ), IndexWriter.MaxFieldLength.UNLIMITED
Mer != mer. The latter will be what is indexed because
StandardAnalyzer calls LowerCaseFilter.
--
Ian.
On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss clemens...@mysign.ch wrote:
Sorry for coming back to my issue. Can anybody explain why my simple unit
test below fails? Any hint/help
Unfortunately lowercasing doesn't help.
Also, doesn't the FuzzyQuery ignore casing?
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Tuesday, May 3, 2011 11:06
To: java-user@lucene.apache.org
Subject: Re: fuzzy prefix search
Mer != mer. The latter
I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
What would be the edit distance between mer and merlot? Would it
be less than 1.5 which I reckon would be the value of length(term)*0.5
as detailed in the javadocs? Seems unlikely, but I don't really know
anything about the
PrefixQuery
I'd like the combination of prefix and fuzzy ;-) because people could also type
menlo or märl and in any of these cases I'd like to get a hit on Merlot
(for suggesting Merlot)
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Tuesday, May 3
Hi,
I have been experimenting with using an int payload as a unique identifier, one
per Document. I have successfully loaded them in using the TermPositions API
with something like:
public static void loadPayloadIntArray(IndexReader reader, Term term, int[]
intArray, int from, int to)
Have you tried
Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.499f);
Sven
-Original Message-
From: Clemens Wyss [mailto:clemens...@mysign.ch]
Sent: Tuesday, May 3, 2011 10:57
To: java-user@lucene.apache.org
Subject: Re: fuzzy prefix search
Sorry for coming back to my
Then why not do that? Add a PrefixQuery and a FuzzyQuery to a
BooleanQuery and use that.
--
Ian.
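A rough illustration of the "prefix OR fuzzy" idea, written in plain Java rather than against the Lucene API: the class PrefixOrFuzzy, its methods, and the maxEdits parameter are hypothetical names chosen for this sketch, and the fuzzy part uses a plain Levenshtein edit budget instead of FuzzyQuery's similarity score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch (not Lucene): a term is suggested when the input is a
// prefix of it OR when it lies within a small edit distance of the input.
final class PrefixOrFuzzy {

    // Levenshtein edit distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: both sides are lowercased first, mirroring what
    // an analyzer with a lowercase filter would do at index time.
    static boolean matches(String input, String term, int maxEdits) {
        String in = input.toLowerCase(Locale.ROOT);
        String t = term.toLowerCase(Locale.ROOT);
        return t.startsWith(in) || editDistance(in, t) <= maxEdits;
    }

    // Returns every candidate term that passes either test.
    static List<String> suggest(String input, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (matches(input, term, maxEdits)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With an edit budget of 2, "Mer" matches Merlot through the prefix test and "menlo" through the fuzzy test. "märl" is three edits away from "merlot", so it would need a larger budget or a different technique.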
On Tue, May 3, 2011 at 10:25 AM, Clemens Wyss clemens...@mysign.ch wrote:
PrefixQuery
I'd like the combination of prefix and fuzzy ;-) because people could also
type menlo or märl and in any of
I feel like we are back to Basic ;)
If you keep running line 40 over and over on the same memory index, do
you see a slowdown?
Mike
http://blog.mikemccandless.com
On Mon, May 2, 2011 at 1:19 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hi,
I think this describes what's going on:
I had a look into the 3.0 implementation
The calculation of the similarity is
1 - (edit distance / min(string 1 length, string 2 length))
As opposed to the Levenshtein distance in spellchecker
1 - (edit distance / max(string 1 length, string 2 length))
So, the similarity is 1 - ( 3 /
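To make the difference between the two normalizations concrete, here is a minimal plain-Java sketch. It is not the Lucene source: the class SimilarityComparison and its method names are invented for illustration, and it ignores details such as FuzzyQuery's prefix length.

```java
// Plain-Java sketch comparing the two normalizations described above.
final class SimilarityComparison {

    // Standard Levenshtein edit distance using a full DP matrix.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // 1 - (edit distance / min(length a, length b))
    static float similarityByMin(String a, String b) {
        int denom = Math.min(a.length(), b.length());
        if (denom == 0) {
            return a.equals(b) ? 1.0f : 0.0f; // avoid division by zero
        }
        return 1.0f - (float) editDistance(a, b) / denom;
    }

    // 1 - (edit distance / max(length a, length b))
    static float similarityByMax(String a, String b) {
        int denom = Math.max(a.length(), b.length());
        if (denom == 0) {
            return 1.0f; // both strings are empty
        }
        return 1.0f - (float) editDistance(a, b) / denom;
    }
}
```

For "mer" and "merlot" the edit distance is 3, so similarityByMin gives 0.0 while similarityByMax gives 0.5. Dividing by the shorter length penalizes a short query term against a longer indexed term much more heavily.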
Is this calculation intended or a bug?
-Original Message-
From: Biedermann,S.,Fa. Post Direkt [mailto:s.biederm...@postdirekt.de]
Sent: Tuesday, May 3, 2011 12:00
To: java-user@lucene.apache.org
Subject: Re: fuzzy prefix search
I had a look into the 3.0 implementation
(11/03/01 21:16), Amel Fraisse wrote:
Hello,
Can the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
<requestHandler name="/mlt"
class="org.apache.solr.handler.MoreLikeThisHandler">
<lst name="defaults">
<bool
On Tue, May 3, 2011 at 5:35 AM, Chris Bamford
chris.bamf...@talktalk.net wrote:
Hi,
I have been experimenting with using an int payload as a unique identifier,
one per Document. I have successfully loaded them in using the TermPositions
API with something like:
public static void
I don't know.
But changing it now would cause trouble in many applications...
For our applications we reimplemented fuzzy query so that we can pass along an
org.apache.lucene.search.spell.StringDistance instance that holds the
similarity algorithm of choice.
--
Sven
-Original
Hi,
I didn't read this thread closely, but just in case:
* Is this something you can handle with synonyms?
* If this is for English and you are trying to handle typos, there is a list of
common English misspellings out there that you could use for this perhaps.
* Have you considered n-gramming
Hi,
2011/5/3 Michael McCandless luc...@mikemccandless.com:
I feel like we are back to Basic ;)
If you keep running line 40 over and over on the same memory index, do
you see a slowdown?
Yes. I've tested running the same query list (~3.5k queries) on the same
MemoryIndex instance and after a
I'm receiving a number of searches with many ORs so that the total number
of matches is huge (1 million) although only the first 20 results are
required. Analysis shows most time is spent scoring the results. Now it
seems to me if you are sending a query with 10 OR components, documents that
Hi All,
I want to know whether there is any built-in method in Lucene that can help me
limit the number of searched terms for a given field, e.g.
suppose I have given content:(text1 text2 text3 text4 text5) to search and
want to limit it to 3 words only, i.e. content:(text1 text2 text3)
Please help.
Thanks,
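One possible way to do this outside Lucene is to trim the term list before the query string reaches the query parser. The sketch below is plain Java; TermLimiter and limitTerms are hypothetical names, not a Lucene API, and it assumes simple whitespace-separated terms without phrases or operators.

```java
import java.util.Arrays;

// Plain-Java sketch: keep only the first maxTerms whitespace-separated
// terms and rebuild a field:(...) query string from them.
final class TermLimiter {

    static String limitTerms(String field, String text, int maxTerms) {
        String[] terms = text.trim().split("\\s+");
        // Clamp to the valid range so copyOfRange never sees a negative end.
        int keep = Math.max(0, Math.min(maxTerms, terms.length));
        String joined = String.join(" ", Arrays.copyOfRange(terms, 0, keep));
        return field + ":(" + joined + ")";
    }
}
```

Calling limitTerms("content", "text1 text2 text3 text4 text5", 3) yields content:(text1 text2 text3), which can then be passed to the query parser as usual.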
Why do you want to do this? I'm wondering if this is an XY problem...
See: http://people.apache.org/~hossman/#xyproblem
Best
Erick
On Tue, May 3, 2011 at 7:55 AM, harsh srivastava harshc...@gmail.com wrote:
Hi All,
I want to know whether there is any built-in method in Lucene that can help me
That seems to work. Thank you!
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
ch...@mainsequence.net
P: 440.946.5214 ext 5458
F: 440.856.0312
This email and any files transmitted with it may contain confidential
information
How can I convert this Similarity method to use 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm() and
FieldInvertState does not contain the field name either. I need the field
because I
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor paul_t...@fastmail.fm wrote:
How can I convert this Similarity method to use 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided parameter in computeNorm() and
On 03/05/2011 15:06, Robert Muir wrote:
On Tue, May 3, 2011 at 9:57 AM, Paul Taylor paul_t...@fastmail.fm wrote:
How can I convert this Similarity method to use 3.1 (currently using
3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
but fieldName is not a provided
On Tue, May 3, 2011 at 10:29 AM, Paul Taylor paul_t...@fastmail.fm wrote:
I assume this would be the correct way to fix the code for 3.1.0
Yes, that's correct.
public float computeNorm(String field, FieldInvertState state) {
//This will match both artist and label aliases and is
What does a simple Analyzer look like that just n-grams the docs/fields?
class SimpleNGramAnalyzer extends Analyzer
{
@Override
public TokenStream tokenStream ( String fieldName, Reader reader )
{
EdgeNGramTokenFilter... ???
}
}
-Original Message-
From: Otis Gospodnetic
Clemens,
Something a la:
public TokenStream tokenStream (String fieldName, Reader r) {
return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
EdgeNGramTokenFilter.Side.FRONT, 1, 4);
}
Check out page 265 of Lucene in Action 2.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene -
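To show what front edge n-grams with a minimum size of 1 and a maximum size of 4 look like, here is a small plain-Java sketch. It does not use or reproduce Lucene's EdgeNGramTokenFilter; the class EdgeNGrams and its method are hypothetical, and it assumes minGram is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: generate the leading substrings ("front" edge n-grams)
// of a single token, from minGram up to maxGram characters.
final class EdgeNGrams {

    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // Never ask for a gram longer than the token itself.
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

For the token "merlot" with sizes 1 to 4 this produces m, me, mer and merl, so an input such as "mer" can match one of the indexed grams exactly.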
But doesn't the KeywordTokenizer extract single words out of the stream? I
would like to create n-grams on the stream (field content) as it is...
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, May 3, 2011 21:31
To:
Clemens - that's just an example. Stick another tokenizer in there, like
WhitespaceTokenizer, for example.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Clemens Wyss