Hi Chitra,
I don't know the language, but could you solve the problem not at the TokenFilter level
but at the CharFilter level, by setting your own mapping definition using MappingCharFilter?
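For example, a mapping definition can be set up like this (a minimal sketch,
assuming the Lucene 4.x+ API in org.apache.lucene.analysis.charfilter; the
mapping pair is a placeholder for your language-specific rules):

  NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
  builder.add("source", "target");   // hypothetical mapping
  NormalizeCharMap map = builder.build();
  // wrap the Reader before tokenization so the mapping is applied to raw chars
  Reader filtered = new MappingCharFilter(map, new StringReader(text));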
Koji
On 2017/09/27 21:39, Chitra wrote:
Hi Ahmet,
Thank you so much
Hello everyone!
I've developed KEA-lucene [1]. It is an Apache Lucene implementation of KEA [2].
KEA is a program developed by the University of Waikato in New Zealand that automatically extracts
key phrases (keywords) from natural language documents. KEA stands for Keyphrase Extraction
Algo
Hi Taher,
Solr has the function of result grouping.
I think it works in two steps. First, it finds how many groups there are in
the result
and chooses the top groups (say 10 groups) using a priority queue. Second, it
provides 10 priority
queues, one for each group, and searches again to collect second or a
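For reference, the grouping module wraps both passes behind one class; a
minimal sketch, assuming Lucene 4.x+ and a groupable field named "category"
(the field name and limits are placeholders):

  GroupingSearch grouping = new GroupingSearch("category");
  grouping.setGroupSort(Sort.RELEVANCE);  // pass 1: choose the top groups
  grouping.setGroupDocsLimit(5);          // pass 2: docs to collect per group
  TopGroups<BytesRef> result = grouping.search(searcher, query, 0, 10);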
Hi ajinkya,
Last week, I gave a technical talk about NLP4L at a Lucene/Solr meetup:
http://www.meetup.com/Downtown-SF-Apache-Lucene-Solr-Meetup/events/223899054/
In my talk, I described an implementation idea for Learning to Rank using
Lucene.
Please take a look at pages 48 to 50 of the follow
Hi Clemens,
NLP4L, which stands for Natural Language Processing for Lucene, has a function
for browsing a Lucene index aside from its NLP tools. It supports the 5.x index format.
https://github.com/NLP4L/nlp4l#using-lucene-index-browser
Thanks,
Koji
On 2015/04/24 15:10, Clemens Wyss DEV wrote:
From ti
Hi Prateek,
Using Luke, which is a GUI-based browser tool for Lucene indexes, may be a good
start for you
to see the structure of a Lucene index.
https://github.com/DmitryKey/luke/
NLP4L also provides a CUI-based index browser for Lucene users aside from its NLP
functions.
https://github.com/NLP4L/nlp
hub.com/INL/BlackLab/wiki/Blacklab-query-tool
-- Jack Krupansky
On Tue, Feb 24, 2015 at 1:40 AM, Koji Sekiguchi wrote:
Hello,
Doesn't Lucene have a Tokenizer/Analyzer for Brown Corpus?
There doesn't seem to be such tokenizers/analyzers in Lucene.
As I didn't want re-inventing th
Hello,
Doesn't Lucene have a Tokenizer/Analyzer for Brown Corpus?
There doesn't seem to be such tokenizers/analyzers in Lucene.
As I didn't want to reinvent the wheel, I googled, and I got
a list of snippets that include "the quick brown fox..." :)
Koji
---
Hi Tomoko,
Please don't hesitate to open a JIRA issue and attach your patch to fix
the error you found.
Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
(2014/12/14 11:11), Tomoko Uchida wrote:
Sorry again,
I checked the o.a.l.u.fst.TestFSTs.j
ays rather pleasant for the LSI/LSA-like approach,
but precisely this is mathematically opaque.
Maybe it's more a question of presentation.
Paul
On 20 nov. 2014, at 16:24, Koji Sekiguchi wrote:
Hi Paul,
I cannot compare it to SemanticVectors as I don't know SemanticVectors.
But w
At least I see more transparent math in the web-page.
> Maybe this helps a bit?
>
> SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but
> precisely this is mathematically opaque.
> Maybe it's more a question of presentation.
>
> Paul
>
>
Rome'), and vector('king') - vector('man') + vector('woman')
is close to
vector('queen')
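The arithmetic itself is simple; a sketch, assuming dense float[] vectors of
equal dimension:

  float[] q = new float[dim];
  for (int i = 0; i < dim; i++)
      q[i] = king[i] - man[i] + woman[i];
  // then scan the vocabulary for the word whose vector has the highest
  // cosine similarity to q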
Thanks,
Koji
(2014/11/20 20:01), Paul Libbrecht wrote:
> Hello Koji,
>
> how would you compare that to SemanticVectors?
>
> paul
>
> On
Hello,
It's my pleasure to share that I have an interesting tool "word2vec for Lucene"
available at https://github.com/kojisekig/word2vec-lucene .
As you can imagine, you can use "word2vec for Lucene" to extract word vectors
from Lucene index.
Thank you,
Koji
--
http://soleami.com/blog/compar
Hi Michael,
I haven't executed this yet, but can you try this:
SpanNotQuery(SpanNearQuery("George Washington"), SpanNearQuery("George Washington
Carver"))
Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
(2014/07/11 23:20), Michael Ryan wro
Hi Priyanka,
> How can I add a Machine Learning part to Apache Lucene?
I think your question is too broad to answer because machine learning
covers a lot of things...
Lucene has already got a text categorization function, which is a well-known
task of NLP, and NLP is a part of machine learning. I'v
ili wrote:
cool Koji, thanks a lot for sharing.
Some useful points / suggestions come out of it, let's see if we can follow
up :)
Regards,
Tommaso
2014-03-07 3:30 GMT+01:00 Koji Sekiguchi :
Hello,
I just posted an article on Comparing Document Classification Functions
of Lucene and Mahout.
Hello,
I just posted an article on Comparing Document Classification Functions
of Lucene and Mahout.
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Comments are welcome. :)
Thanks!
koji
--
http://soleami.com/blog/comparing-document-classification
Hi Russell,
Seems that the error messages says that the implementing class for
OffsetAttribute
cannot be found in your classpath on the (Pig?) environment.
There seems to be implementing classes OffsetAttributeImpl and Token, according
to Javadoc:
http://lucene.apache.org/core/4_6_0/core/org/a
(13/11/27 9:19), Scott Smith wrote:
I'm doing some highlighting with the following code fragment:
formatter = new SimpleHTMLFormatter("<b>", "</b>");
Scorer score = new QueryScorer(myQuery);
ht = new Highlighter(formatter, score);
ht.
x for only English..
I need to create a dictionary index for all languages. I want to know whether
there is anything like WordNet which I can readily plug into my application.
Please kindly guide me.
Thanks and Regards
Vignesh Srinivasan.
On Wed, Oct 9, 2013 at 5:56 PM, Koji Sekiguchi wrote:
Hi VIGNESH,
wikipedia is giving for all
languages.
Please kindly help.
On Mon, Oct 7, 2013 at 8:06 PM, Koji Sekiguchi wrote:
(13/10/07 18:33), VIGNESH S wrote:
Hi,
How do I implement synonym search for all languages?
As far as I know, WordNet has only English support. Is there anything else we
can use to get
(13/10/07 18:33), VIGNESH S wrote:
Hi,
How do I implement synonym search for all languages?
As far as I know, WordNet has only English support. Is there anything else we
can use to get support for all languages?
I think most people make synonym data manually...
I've never explored WordNet, but I t
(13/09/04 2:33), David Miranda wrote:
Is there any way to check the similarity of texts with Lucene?
I have DBpedia indexed and want to find the texts most similar
to another text using the abstract field. If I do a search in the
abstract field with a particular text, the result is not
(13/08/02 17:16), Ankit Murarka wrote:
Hello All,
Just like the spellcheck feature, which was implemented after a lot of trouble, is it
possible to implement
a complete phrase-suggest feature in Lucene 4.3? So if I enter an incorrect
phrase, it can suggest
a few possible valid phrases.
One way could
(13/07/11 22:56), gtkesh wrote:
Hi everyone! I have two questions:
1. What are the cases where Lucene's default tf-idf outperforms BM25? What
are the best use cases where I should use tf-idf or BM25?
2. Is there any user-friendly guide or something about how I can use the BM25
algorithm instead
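On question 2, switching similarities is a two-line change (a minimal sketch,
assuming Lucene 5.x+ where BM25Similarity is in
org.apache.lucene.search.similarities; it must be set consistently at index
and search time):

  Similarity bm25 = new BM25Similarity();  // k1 and b have sensible defaults
  IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
  iwc.setSimilarity(bm25);
  // and on the search side:
  searcher.setSimilarity(bm25);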
Hi Oliver,
> My questions are:
>
> 1. Why are the overridden lengthNorm() (under Lucene410) or
> computeNorm() (under Lucene350) methods not called during a searching
> process?
Regardless of whether you override the method or not, the Lucene framework
calls the method during index time only be
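For example (a sketch against the Lucene 4.x DefaultSimilarity API):

  Similarity sim = new DefaultSimilarity() {
      @Override
      public float lengthNorm(FieldInvertState state) {
          return 1.0f;  // disable length normalization
      }
  };
  // norms are baked into the index at index time, so set this on the
  // IndexWriterConfig and reindex; overriding only at search time has no effect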
e you shared source code / jar for the same so that it could be used?
Thanks,
Rajesh
On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi wrote:
Hello,
Sorry for the cross-post. I just wanted to announce that I've written a blog
post on
how to create a synonyms.txt file automatically from Wikiped
Hello,
Sorry for the cross-post. I just wanted to announce that I've written a blog post on
how to create a synonyms.txt file automatically from Wikipedia:
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Hope that the article gives someone a good experience!
koji
(12/04/06 2:34), okayndc wrote:
Hello,
I currently use Lucene version 3.0...probably need to upgrade to a more
current version soon.
The problem that I have is when I test a search for an HTML tag (ex.
), Lucene returns
the highlighted HTML tag, which is what I DO NOT want. Is there a way to
"
(12/03/13 2:38), Hassane Cabir wrote:
Hi guys,
I'm using Lucene for my project and I need to calculate how similar two (or
more) documents are, using TF-IDF. How do I get TF-IDF with Lucene?
Any insights on this?
Solr has TermVectorComponent which can return tf, df and tf-idf of each term
in a docu
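In plain Lucene you can compute it yourself from term vectors; a minimal
sketch, assuming Lucene 5.x+ and a field "body" indexed with term vectors
(the idf variant shown is one common choice):

  Terms tv = reader.getTermVector(docId, "body");
  TermsEnum te = tv.iterator();
  int numDocs = reader.numDocs();
  BytesRef term;
  while ((term = te.next()) != null) {
      long tf = te.totalTermFreq();  // frequency within this document
      int df = reader.docFreq(new Term("body", BytesRef.deepCopyOf(term)));
      double tfidf = tf * Math.log((double) numDocs / (df + 1));
      System.out.println(term.utf8ToString() + " " + tfidf);
  }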
Hi Thushara,
Please use lucene-gosen mailing list for lucene-gosen questions:
http://groups.google.com/group/lucene-gosen
Thanks,
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
(12/03/03 6:41), Thushara Wijeratna wrote:
> I'm testing lucene-gosen for Japanese tokenization an
re they running? We added this check
originally as a workaround for a JRE bug... but usually when that bug
strikes the file size is very close (like off by just 1 byte or 8
bytes or something).
Mike McCandless
http://blog.mikemccandless.com
2011/9/9 Koji Sekiguchi:
A user here hit the exception th
Also: what java version are they running? We added this check
originally as a workaround for a JRE bug... but usually when that bug
strikes the file size is very close (like off by just 1 byte or 8
bytes or something).
I think they are using 1.6, but I should ask the minor number.
Could you show
e is very close (like off by just 1 byte or 8
bytes or something).
I think they are using 1.6, but I should ask the minor number.
Could you show me a pointer to the JRE bug you mentioned?
Thank you very much!
koji
Mike McCandless
http://blog.mikemccandless.com
2011/9/9 Koji Sekiguchi:
A user here hit the exception the title says when optimizing. They're using
Solr 1.4
(Lucene 2.9) running on a server that mounts NFS for index.
I think I know the famous "Stale NFS File Handle IOException" problem, but I
think it causes
FileNotFoundException. Is there any chance to hit the exc
(11/06/22 2:03), Anupam Tangri wrote:
Hi,
We are using lucene 3.2 for our project where I needed to highlight search
matches. I earlier used default highlighter which did not work correctly
all the time.
So, I started using FVH which worked beautifully till I started
searching multiple t
Mike,
FVH used to be faster for large docs. I wrote the FVH section for Lucene in Action,
and it said:
In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the difference
between two highlighters in processing time. As of
(11/05/27 19:57), Joel Halbert wrote:
Hi,
I'm using Lucene 3.0.3. I'm extracting snippets using
FastVectorHighlighter, for some snippets (I think always when searching
for exact matches, quoted) the fragment is null.
Code looks like:
query = QueryParser.escape(query);
(11/05/27 20:56), Pierre GOSSE wrote:
Hi,
Maybe is it related to :
https://issues.apache.org/jira/browse/LUCENE-3087
No, because Joel's problem is FastVectorHighlighter, but LUCENE-3087
is for Highlighter.
koji
--
http://www.rondhuit.com/en/
--
(11/05/24 3:28), Sujit Pal wrote:
> Hello,
>
> My version: Lucene 3.1.0
>
> I've had to customize the snippet for highlighting based on our
> application requirements. Specifically, instead of the snippet being a
> set of relevant fragments in the text, I need it to be the first
> sentence where
(11/05/23 14:36), Weiwei Wang wrote:
> 1. source string: 7
> 2. WhitespaceTokenizer + EGramTokenFilter
> 3. FastVectorHighlighter,
> 4. debug info: subInfos=(777((8,11))777((5,8))777((2,5)))/3.0(2,102),
> srcIndex is not correctly computed for the second loop of the outer for-loop
>
How
(11/03/01 21:16), Amel Fraisse wrote:
Hello,
Could the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
true
contenu
Thank you for your help.
Amel.
Amel,
1. I think you shou
(11/04/06 14:01), shrinath.m wrote:
If there is a phrase in search, the highlighter highlights every word
separately..
Like this :
I love Lucene
Instead what I want is like this :
I love Lucene
Not sure whether it's my mailer's problem or not, but I don't see the difference between the above two.
But reading t
(11/04/01 21:32), shrinath.m wrote:
I was wondering what's the difference between Lucene's two implementations of
highlighters...
I saw the javadoc of FVH, but it only says "another implementation of Lucene
Highlighter" ...
The Description section in the javadoc shows the features of FVH:
https://
(11/03/19 6:16), madhuri_1...@yahoo.com wrote:
Hi,
I am new to lucene ... I have a question while implementing similarity search
using a MoreLikeThis query. I have written a small program but it is not giving
any results. In my index file I have both stored and unstored (analyzed) fields.
Sampl
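For what it's worth, the usual cause of zero results from MoreLikeThis is its
conservative defaults (minTermFreq=2, minDocFreq=5); a minimal sketch,
assuming the queries module and a field named "body":

  MoreLikeThis mlt = new MoreLikeThis(reader);
  mlt.setAnalyzer(analyzer);                // needed when fields lack term vectors
  mlt.setFieldNames(new String[] {"body"});
  mlt.setMinTermFreq(1);                    // default 2 often filters everything out
  mlt.setMinDocFreq(1);                     // default 5 likewise
  Query like = mlt.like(docId);
  TopDocs similar = searcher.search(like, 10);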
Does IndexWriter (or somewhere else) have a method that
gets the number of updated documents before commit?
You have maxDoc() which gives you the max doc id - 1, but this might not be
super accurate since there might have been merges going on in the
background. I am not sure if this number yo
Hello,
Does IndexWriter (or somewhere else) have a method that
gets the number of updated documents before commit?
I have an optimized index and I'm using iw.updateDocument(Term,Document)
with the index, and before commit, I'd like to know the number of updated
documents from IndexWrite
(11/03/07 1:16), Joel Halbert wrote:
Hi,
I'm using FastVectorHighlighter for highlighting, 3.0.3.
At the moment this is highlighting a field which is stored, but not
compressed. It all works perfectly.
I'd like to compress the field that is being highlighted, but it seems
like the new way to c
(11/01/25 2:14), Paul Taylor wrote:
On 22/01/2011 15:43, Koji Sekiguchi wrote:
(11/01/20 22:19), Paul Taylor wrote:
Trying to extend MappingCharFilter so that it only changes a token if the
length of the token
matches the length of singleMatch in NormalizeCharMap (currently the
singleMatch
(11/01/20 22:19), Paul Taylor wrote:
Trying to extend MappingCharFilter so that it only changes a token if the
length of the token
matches the length of singleMatch in NormalizeCharMap (currently the
singleMatch just has to be
found in the token; I want it to match the whole token). Can this be
Hi Mike,
Hmm are you only gathering the MUST_NOT TermScorers? (In which case
I'd expect that the .docID() would not match the docID being
collected). Or do you also see .docID() not matching for SHOULD and
MUST sub queries?
The snippet I copied and pasted in the previous mail was not appropriate.
Sor
Hello,
I'd like to know which field got hit in each doc in the hit results.
To implement it, I thought I could use Scorer.freq() which
was introduced in 3.1/4.0:
https://issues.apache.org/jira/browse/LUCENE-2590
But I haven't been successful so far. What I did is:
- in each visit method in MockS
(10/09/22 3:24), Devshree Sane wrote:
I am using the FastVectorHighlighter for retrieving snippets from the index.
I am a bit confused about the parameters that are passed to the
FastVectorHighlighter.getBestFragments() method. One parameter is a document
id and another is the maximum number o
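For reference, the call shape is roughly this (a sketch; it assumes the field
was indexed with term vectors with positions and offsets, which FVH requires):

  FastVectorHighlighter fvh = new FastVectorHighlighter();
  FieldQuery fieldQuery = fvh.getFieldQuery(query);
  String[] fragments = fvh.getBestFragments(
      fieldQuery, reader, docId,  // docId is the Lucene document number
      "body",                     // name of the field to highlight
      100,                        // fragCharSize: target fragment length in chars
      3);                         // maxNumFragments: max fragments to return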
(10/07/20 7:31), Joe Hansen wrote:
Hey All,
I am using Apache Lucene (2.9.1) and it's fast and it works great! I
have a question in connection with Apache PDFBox.
The following command creates a Lucene Document from a PDF file:
Document document =
org.apache.pdfbox.searchengine.lucene.LucenePDFD
(10/07/09 19:30), manjula wijewickrema wrote:
Uwe, thanks for your comments. Following is the code I used in this case.
Could you please let me know where I have to insert UNLIMITED field length,
and how?
Thanks again!
Manjula
Manjula,
You can pass UNLIMITED field length to the IndexWriter constructor:
http
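Something like this (a sketch against the 2.9/3.0-era constructor):

  IndexWriter writer = new IndexWriter(dir, analyzer, true,
      IndexWriter.MaxFieldLength.UNLIMITED);  // no per-field term truncation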
(10/05/19 13:58), Li Li wrote:
hi all,
I read lucene in action 2nd Ed. It says SimpleSpanFragmenter will
"make fragments that always include the spans matching each document".
And also a SpanScorer existed for this use. But I can't find any class
named SpanScorer in lucene 3.0.1. And the res
(10/05/12 20:32), Midhat Ali wrote:
Is it possible to return the entire field contents instead of a fixed-size
fragment? In Highlighter, there is a NullFragmenter. What's its
counterpart in FastVectorHighlighter?
Currently, FVH doesn't have such a function. I've opened a JIRA issue:
https://iss
Yonik Seeley wrote:
On Sat, May 1, 2010 at 8:23 PM, Koji Sekiguchi wrote:
Yonik Seeley wrote:
Values are not interned, but in a single field cache entry (String[])
the same String object is used for all docs with that same value.
Yeah, you are right. Because I could see the
Yonik Seeley wrote:
2010/4/30 Koji Sekiguchi :
Are the Strings obtained via FieldCache.DEFAULT.getStrings( reader,
field ) interned?
Since I have a requirement for having FieldCaches of some
fields in 250M docs index, I'd like to estimate memory
consumed by FieldCache.
By looki
Hello,
Are the Strings obtained via FieldCache.DEFAULT.getStrings( reader,
field ) interned?
Since I have a requirement for having FieldCaches of some
fields in 250M docs index, I'd like to estimate memory
consumed by FieldCache.
By looking at FieldCacheImpl source code, it seems that
field name
Stephen Greene wrote:
Hi Koji,
Thank you. I implemented a solution based on the FieldTermStackTest.java
and if I do a search like "iron ore" it matches iron or ore. The same is
true if I specify iron AND ore.
The termSetMap[0].value[0] = ore and termSetMap[0].value[1] = iron.
What am I missing
Hi Steve,
> is there a way to access a TermVector containing only matched terms,
> or is my previous approach still the
So you want to access FieldTermStack, I understand.
I wrote how to access it in my previous mail:
You cannot access FieldTermStack from FVH, but I think you
can create i
Stephen Greene wrote:
Hi Koji,
An additional question. Is it possible to access the FieldTermStack from
the FastVectorHighlighter after it has been populated with matching
terms from the field?
I think this would provide an ideal solution for this problem, as
ultimately I am only concerned
Stephen Greene wrote:
Hi Koji,
Thank you for your reply. I did try the QueryScorer without success, but
I was using Lucene 2.4.x
Hi Steve,
I thought you were using 2.9 or later because you mentioned
FastVectorHighlighter in your previous mail (FVH was first
introduced in 2.9). If I remembe
Stephen Greene wrote:
Hello,
I am trying to determine begin and end offsets for terms and phrases
matching a query.
Is there a way using either the highlighter or fast vector highlighter
in contrib?
I have already attempted extending the highlighter which would match
terms but would not
Paul Taylor wrote:
I'm trying to create a CharFilter which works like MappingCharFilter
but only changes the matchString if the match String matches the whole
field rather than a portion in the field (this is to handle some
exceptions without affecting other data). Trouble is the code in
Mappin
-Arne- wrote:
Hi Koji,
thanks for your answer. Can you help me once again? What exactly am I
supposed to do?
The concrete program in my mind here:
public class TestHighlightTruncatedSearchQuery {
static Directory dir = new RAMDirectory();
static Analyzer analyzer = new BiGramAnalyzer();
-Arne- wrote:
Hi,
I'm using Lucene 3.0.0 and have large documents to search (logfiles
0.5-20MB). For better search results the query tokens are truncated left and
right: a search for "user" is turned into "*user*". The performance of searching
even complex queries with more than one search term is qu
halbtuerderschwarze wrote:
query.rewrite() didn't help, for queries like ipod* or *ipod I still didn't
get fragments.
Arne
You're right. This is still an open issue:
https://issues.apache.org/jira/browse/LUCENE-1889
Koji
--
http://www.rondhuit.com/en/
--
Marc Sturlese wrote:
I have FastVectorHighlighter working with a query like:
title:Ipod OR title:IPad
but it's not working when (0 snippets are returned):
title:Ipod OR content:IPad
This is true when you are going to highlight IPad in the title field and
have set fieldMatch to true in the FVH constr
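The relevant knob is the FVH constructor (a sketch):

  // FastVectorHighlighter(phraseHighlight, fieldMatch)
  FastVectorHighlighter fvh = new FastVectorHighlighter(true, false);
  // with fieldMatch=false, terms from any field of the query are highlighted,
  // so content:IPad can produce snippets in the title field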
Paul Taylor wrote:
CharStream. Found it at
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/PatternReplaceFilter.java?revision=804726&view=markup,
BTW, why not add this to the Lucene codebase rather than the Solr codebase?
Unfortunately it doesn't address my problem be
Weiwei Wang wrote:
Hi, all
I currently need a TokenFilter to break token season07 into two tokens
season 07
I'd recommend you to look at WordDelimiterFilter in Solr.
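For example (a sketch against the Lucene 4.x API, after WordDelimiterFilter
moved from Solr into the analysis module; the constructor shape varies
slightly across releases):

  int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
            | WordDelimiterFilter.GENERATE_NUMBER_PARTS
            | WordDelimiterFilter.SPLIT_ON_NUMERICS;
  TokenStream ts = new WordDelimiterFilter(tokenizer, flags, null);
  // "season07" -> "season", "07"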
Koji
--
http://www.rondhuit.com/en/
Weiwei Wang wrote:
The offset is incorrect for PatternReplaceCharFilter so the highlighting
result is wrong.
How to fix it?
As I noted in the comment of the source, if you produce a phrase from a term
and try to highlight a term in the produced phrase, the highlighted snippet
will be undesira
Koji Sekiguchi wrote:
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our
context it's one token. I want 'No. 1' to become 'No.1'; I need to do
this before tokenizing because the tokenizer would split one value
into
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in our
context it's one token. I want 'No. 1' to become 'No.1'; I need to do
this before tokenizing because the tokenizer would split one value
into two terms and the other into just one term. I already use a
NormalizeM
MappingCharFilter can be used to convert c++ to cplusplus.
Koji
--
http://www.rondhuit.com/en/
Anshum wrote:
How about getting the original token stream and then converting c++ to
cplusplus or any other such transform. Or perhaps you might look at
using/extending (in the non-Java sense) some ot
Or you can use MappingCharFilter if you are using Lucene 2.9.
You can convert "c++" into "cplusplus" prior to running Tokenizer.
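For example (a sketch against the 2.9-era API, where MappingCharFilter takes
a CharStream):

  NormalizeCharMap normMap = new NormalizeCharMap();
  normMap.add("c++", "cplusplus");
  CharStream cs = new MappingCharFilter(normMap, CharReader.get(reader));
  TokenStream ts = new WhitespaceTokenizer(cs);  // mapping runs before tokenizing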
Koji
--
http://www.rondhuit.com/en/
Ian Lea wrote:
You need to make sure that these terms are getting indexed, by using
an analyzer that won't drop them and using
Hi Paul,
CharFilter should work for this case. How about this?
public class MappingAnd {
static final String[] DOCS = {
"R&B", "H&M", "Hennes & Mauritz", "cheeseburger and french fries"
};
static final String F = "f";
static Directory dir = new RAMDirectory();
static Analyzer analyzer =
Hi Ryan,
I looked for it when I implemented the SOLR-64 patch, but it isn't there.
So I implemented HierarchicalTokenFilterFactory.
I've not looked into your patch yet, but my impression is that we can
probably share such a TokenFilter.
Thanks,
Koji
Ryan McKinley wrote:
Hello-
I'm looking for a way
. Thanks a lot for
> the test case - made this one fun.
>
> - Mark
>
> Koji Sekiguchi wrote:
>
>> Hello,
>>
>> This problem was reported by my customer. They are using Solr 1.3
>> and uni-gram, but it can be reproduced with Lucene 2.9 and
>> White
Hello,
This problem was reported by my customer. They are using Solr 1.3
and uni-gram, but it can be reproduced with Lucene 2.9 and
WhitespaceAnalyzer.
The program for reproducing it is at the end of this mail.
Query:
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
The snippet we expe
tsuraan wrote:
Make that "Collector" (new as of 2.9).
HitCollector is the old (deprecated as of 2.9) way, which always
pre-computed the score of each hit and passed the score to the collect
method.
Where can I find docs for 2.9? Do I just have to check out the lucene
trunk and run javado
CHANGES.txt said that we can use HitCollectorWrapper:
12. LUCENE-1575: HitCollector is now deprecated in favor of a new
Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector.
But it looks package private?
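In the meantime, implementing the new Collector directly is only a few lines
(a sketch against the 2.9 API):

  searcher.search(query, new Collector() {
      private int docBase;
      public void setScorer(Scorer scorer) {}
      public void collect(int doc) {
          int globalDoc = docBase + doc;  // doc is relative to the current reader
          // ... handle the hit ...
      }
      public void setNextReader(IndexReader reader, int docBase) {
          this.docBase = docBase;
      }
      public boolean acceptsDocsOutOfOrder() { return true; }
  });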
Thank you,
I'm not sure this is the same case, but there is a report and patch for
CJKTokenizer in JIRA:
https://issues.apache.org/jira/browse/LUCENE-973
Koji
Zhang, Lisheng wrote:
Hi,
When I use lucene 2.4.1 QueryParser with CJKAnalyzer, somehow
it always generates an extra space, for example, if the
Another possible factor: if you are using the omitTf feature, phrase
queries don't work.
Koji
Ian Lea wrote:
What does query.toString() say? Are you using standard analyzers with
standard lowercasing, stop words etc?
Knocking up a very simple program/index that demonstrates the problem
John Seer wrote:
Hello,
Is there any way that a single document's fields can have different analyzers
for different fields?
I think one way of doing it is to create a custom analyzer which will do
field-specific analysis.
Any other suggestions?
There is PerFieldAnalyzerWrapper
http://hudson.z
Dan OConnor wrote:
Thanks for the feedback Chris.
Can you (or someone else on the list) tell me about the IndexMerge tool?
Please see:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/misc/IndexMergeTool.html
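Its usage is a one-liner from the command line (paths are placeholders; the
first index is created from the merge of the rest):

  java org.apache.lucene.misc.IndexMergeTool /path/to/merged /path/to/index1 /path/to/index2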
Koji
John Seer wrote:
Koji Sekiguchi-2 wrote:
If you omit norms when indexing the name field, you'll get the same score
back.
Koji
During building I set omitNorms, but the result doesn't change at all. I am
still getting the same score.
I meant if you set nameField.setOmitN
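That is (a sketch against the 2.x/3.x Field API):

  Field nameField = new Field("name", value, Field.Store.YES,
      Field.Index.ANALYZED);
  nameField.setOmitNorms(true);  // must be set before the document is indexed
  // then doc.add(nameField) and reindex for the change to take effect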
Steven A Rowe wrote:
Hi Ariel,
As Koji mentioned, https://issues.apache.org/jira/browse/SOLR-448 contains a NumberFilter. It
filters out tokens that successfully parse as Doubles. I'm not sure, since the examples you gave
seem to use "," as the decimal character, how this interacts with the
If you omit norms when indexing the name field, you'll get the same score back.
Koji
The Seer wrote:
Hello,
I have 5 lucene documents
name: Apple
name: Apple martini
name: Apple drink
name: Apple sweet drink
I am using Lucene's default similarity and standard analyzer.
When I am searching for
Ariel wrote:
Hi everybody:
I would like to know how I can make an analyzer that ignores the numbers in
the texts, like stop words are ignored. For example, so that the terms
3.8, 100, 4.15, 4,33 are not added to the index.
How can I do that?
Regards
Ariel
There is a patch for filter
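Such a filter is also only a few lines to write yourself; a minimal sketch of
a hypothetical NumberDropFilter, assuming the Lucene 5.x+ FilteringTokenFilter
API:

  final class NumberDropFilter extends FilteringTokenFilter {
      private final CharTermAttribute termAtt =
          addAttribute(CharTermAttribute.class);
      NumberDropFilter(TokenStream in) { super(in); }
      @Override
      protected boolean accept() {
          try {
              Double.parseDouble(termAtt.toString());
              return false;  // the token parses as a number: drop it
          } catch (NumberFormatException e) {
              return true;   // keep everything else
          }
      }
  }

Note that Double.parseDouble won't treat "4,33" as numeric, so comma decimals
would need extra handling.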
This problem is filed at:
https://issues.apache.org/jira/browse/LUCENE-1489
You may want to take a look at LUCENE-1522 for highlighting N-gram tokens:
https://issues.apache.org/jira/browse/LUCENE-1522
Koji
ito hayato wrote:
> Hi All,
> My name is Hayato.
>
> I have a question for Highlighter
. :)
Program snippets are there regarding Payload/BoostTermQuery/scorePayload().
Koji
On 3/24/09, Koji Sekiguchi wrote:
Seid Mohammed wrote:
Hi All
I want my Lucene to index documents making some terms have a higher
boost value.
So, if I index the document "The quick fox jumps ove
Seid Mohammed wrote:
Hi All
I want my Lucene to index documents making some terms have a higher
boost value.
So, if I index the document "The quick fox jumps over the lazy dog"
and I want the terms fox and dog to have a greater boost value,
how can I do that?
Thanks a lot
seid M
How about
> first, I rewrote the Similarity (including lengthNorm), but it did not
work..., so I modified the Lucene source by setting the norm_table =
1.0 (all). It works.
If you override lengthNorm(), reindexing is needed for it to take effect.
Koji
There is no additional setting for me...
Koji
Seid Mohammed wrote:
I have tried Amharic fonts; it displays square-like characters. Maybe
there is a kind of setting for it?
Seid M
On 2/19/09, Koji Sekiguchi wrote:
Seid Mohammed wrote:
great,
I have got it.
Does Luke support Unicode
Seid Mohammed wrote:
great,
I have got it.
Does Luke support Unicode? I am trying Lucene in a non-English language.
Of course. I can see Japanese terms without problems.
Koji
o investigate the stemmers would that work? I
confess that I've never examined the output in detail, but
they might help.
I don't know of any synonym lists offhand, but then again I haven't
looked.
Best
er...@miminallyhelpful.com
On Mon, Jan 26, 2009 at 8:51 AM, Koji Sekiguchi wrot
Hello,
I have a requirement to search English words taking into account
conjugation of verbs and comparatives and superlatives of adjectives.
I googled but couldn't find a solution so far. Do I have to have a synonym
table
to solve this problem, or is there someone who has a good solution in this
l
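A stemming analysis chain covers most verb conjugations (a sketch, assuming a
recent Lucene analyzers-common):

  Tokenizer tok = new StandardTokenizer();
  TokenStream ts = new PorterStemFilter(new LowerCaseFilter(tok));
  // "jumps", "jumped", "jumping" -> "jump"; irregular forms such as
  // "better"/"best" -> "good" are not handled, so a synonym table may
  // still be needed for comparatives and superlatives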