A weighted OR, of course.
On 6 May 2024, at 12:43, Paul Libbrecht wrote:
Am I mistaken, or does a plain space (" ") make an OR when no other operator is given?
On 6 May 2024, at 12:41, Saha, Rajib wrote:
Hi Experts,
As per the definition in
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
'-' and 'NOT' in a query string theoretically stand for the same thing.
Isn’t that what Semantic-Vectors is doing?
E.g. https://github.com/Ontotext-AD/semanticvectors
Paul
On 30 Jan 2024, at 20:50, William Zhou wrote:
> Is there a way of directly executing an exact nearest neighbor search? It
> seems like the API provides some general functionality, and we can
Explain is a heavyweight thing. Maybe it helps you, maybe you need
something high-performance.
I was asking a similar question ~10 years ago and got a very interesting
answer on this list. If you want, I can try to dig it up. In
the end, and with some limitation in the number of
Hello Philip,
I’ll answer with a possibility that might be outdated and predates the
existence of payloads (which I think are non-analysed parts so not
appropriate).
Lucene has fields and you can include the metadata within fields in form
of particular tokens. Then you can enrich every
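A minimal sketch of that idea, packing metadata into the same field as artificial prefixed tokens (the `meta_` prefix and helper name are my own invention, not from the original mail); the resulting string would then be fed to the analyzed field:

```java
import java.util.List;

class MetaTokens {
    // Prepend metadata as artificial tokens so they end up in the same
    // field's term dictionary as the text itself. A whitespace-friendly
    // "meta_key_value" scheme keeps them from colliding with real words.
    static String withMetadata(String text, List<String[]> metadata) {
        StringBuilder sb = new StringBuilder();
        for (String[] kv : metadata) {
            sb.append("meta_").append(kv[0]).append('_').append(kv[1]).append(' ');
        }
        return sb.append(text).toString();
    }
}
```

A query for `meta_lang_en` would then match only documents tagged that way.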
Hello Trevor,
I don’t know of an analyzer for mixes of code and text but I know of
an analyser for mixes of code and formulæ.
Clearly, you could build a custom analyzer that would tokenize
differently depending on whether you're in code or in text. That's
not super hard.
However, where
Hello Koji,
how would you compare that to SemanticVectors?
paul
On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote:
Hello,
It's my pleasure to share that I have an interesting tool word2vec for
Lucene
available at https://github.com/kojisekig/word2vec-lucene .
As you may know, vector('Paris') - vector('France') + vector('Italy') results in a
vector that is very
close to vector('Rome'), and vector('king') - vector('man') + vector('woman')
is close to
vector('queen')
Thanks,
Koji
(2014/11/20 20:01), Paul Libbrecht wrote:
Hello Koji,
how would you compare that to SemanticVectors
The project semanticvectors might be doing what you are looking for.
paul
On 11 nov. 2014, at 22:37, parnab kumar parnab.2...@gmail.com wrote:
hi,
While indexing the documents, store the term vectors for the content
field. Then for each document you will have an array of terms and their
My trick would be to replace .net with dotNet (or use some funky Unicode letter
to replace the dot).
If you consistently use the same analyzer chain, then it will match cleanly.
paul
On 6 nov. 2014, at 12:42, Rajendra Rao rajendra@launchship.com wrote:
I have some word which contain
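The replacement trick could look like this minimal sketch (the helper name and regex are mine, purely illustrative); the key point is applying the very same rewrite before both indexing and querying:

```java
class DotNetGuard {
    // Rewrite ".net" into a single analyzer-safe token before the text
    // reaches the tokenizer, so analyzers that split on '.' no longer
    // break the term. Must run identically at index and query time.
    static String protect(String text) {
        return text.replaceAll("(?i)\\.net\\b", "dotNet");
    }
}
```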
two fields?
paul
On 19 sept. 2014, at 15:07, John Cecere john.cec...@oracle.com wrote:
Is there a way to set up Lucene so that both case-sensitive and
case-insensitive searches can be done without having to generate two indexes?
--
John Cecere
Principal Engineer - Oracle Corporation
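The "two fields" answer could be sketched like this (Lucene 3.x-era API from memory; field names are illustrative, and the analyzers would be chosen per field via a PerFieldAnalyzerWrapper at index and query time alike):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class DualCaseFields {
    // One index, two fields over the same text: "content" goes through a
    // lowercasing analyzer (case-insensitive search); "content_cs" through
    // a case-preserving one (case-sensitive search).
    static Document build(String text) {
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("content_cs", text, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```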
Ashok,
I would look at Solr, which has many more field types to support more
queries.
E.g. there you have a nice query syntax for time spans and fantastic caching.
I think there are very few initiatives for indexing logs and I would be
interested to see the results of your endeavour.
paul
Le 27 oct. 2012 à 11:43, Tom a écrit :
Aha! Exactly the problem! And just because the user-agent indicates one language
doesn't mean all search terms will be in it!
For example, someone might type in the name of an English event (such as
Halloween) first, and then type in the name of their home town
My experience in the Lucene 1.x times was a factor of at least four when writing
to NFS and about two when reading from it. I'd discourage this as much as
possible!
(rsync is much more your friend for transport, and replication à la Solr
should also be considered)
paul
Le 2 oct. 2012 à
anyone run into such trouble? Or is it strictly
just a performance issue?
/Jong
On Tue, Oct 2, 2012 at 5:17 AM, Paul Libbrecht p...@hoplahup.net wrote:
My experience in the Lucene 1.x times were a factor of at least four in
writing to NFS and about two when reading from there. I'd discourage
most code snippets around Lucene that I searched out don't compile correctly.
Wondering if we should build our own local mailing list...
Which language?
paul
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For
Nilesh,
the StandardAnalyzer is full of generally useful special cases, including
email and number detection.
I suppose you met one such special case, which has a justification of some
sort.
I can't tell you why, but I can tell you it's really hard to change because others
rely on this
I would use a different field per language and use PerFieldAnalyzer indeed.
This is also important for queries whose language is not always clear.
paul
Le 9 févr. 2012 à 13:01, Vinaya Kumar Thimmappa a écrit :
Hello All,
I have a requirement of using different analyzer per document. How
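The per-field, per-language setup could be sketched like this (Lucene 3.x-era API from memory, so treat it as a sketch; field names and analyzer choices are illustrative):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

class PerLanguageAnalyzers {
    // One field per language, each with its own stemmer; use the same
    // wrapper for indexing and for the QueryParser, so queries whose
    // language is unclear can still be expanded over all fields.
    static Analyzer build() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new WhitespaceAnalyzer());
        wrapper.addAnalyzer("text_en", new SnowballAnalyzer(Version.LUCENE_30, "English"));
        wrapper.addAnalyzer("text_de", new SnowballAnalyzer(Version.LUCENE_30, "German"));
        return wrapper;
    }
}
```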
Le 3 janv. 2012 à 13:56, heikki a écrit :
In our case, it is known in which language the user is searching (because
he tells us, and if he doesn't, we use the current GUI language).
On the web it is often hard to trust such (e.g. because of people working in
multiple languages, internet
Heikki,
it does solve your main concern: a term in lucene is a pair of a token and
field name.
The term frequency is, thus, the frequency of a token in a field.
So the term-frequency of text-stemmed-de:firewall is independent of the
term-frequency of text-stemmed-en:firewall (for example).
indexes, the relevance scoring is
more accurate.
Kind regards,
Heikki Doeleman
On Tue, Jan 3, 2012 at 3:29 PM, Paul Libbrecht p...@hoplahup.net wrote:
Heikki,
it does solve your main concern: a term in lucene is a pair of a token and
field name.
The term frequency is, thus
Michael,
from a physical point of view, it would seem the order in which the
documents are read is very significant for reading speed (the random-access
jumps are likely the issue).
You could:
- move to a RAM disk or SSD to see if it makes a difference
- use something other than a searcher
hao,
this is a java question not a lucene question.
Here's a short answer:
those options are to be fed to the java command.
If you run on the command line, that is where you put them.
When running in an IDE there is generally such a feature ready, or the possibility to
connect to the socket address.
We've been using
http://www.tangentum.biz/en/products/phonetix/
which does double-metaphone.
Maybe that helps.
paul
Le 9 nov. 2011 à 11:29, Felipe Carvalho a écrit :
Using PerFieldAnalyzerWrapper seems to be working for what I need!
On indexing:
PerFieldAnalyzerWrapper
That uses Lucene 2.9.2 indeed.
paul
Le 9 nov. 2011 à 11:43, Felipe Carvalho a écrit :
Which version of Lucene are you using? I had tried it with Lucene 3.3 and
had some problems, did you have to do any customizations?
On Wed, Nov 9, 2011 at 8:38 AM, Paul Libbrecht p...@hoplahup.net wrote
Felipe,
I do not have a tutorial but what you are describing is what I have been doing
in ActiveMath.
I have a little paper for you if you want that explains how it goes there
(http://www.hoplahup.net/paul_pubs/AccessRetrievalAM.html) and the software is
open-source
Felipe,
in Lucene in Action there's a little bit on that.
Basically it's just about using the right analyzer.
paul
Le 8 nov. 2011 à 01:45, Felipe Carvalho a écrit :
Hello,
I'm using Lucene 3.2 on a phone book app and phonetic search is a
requirement. I've googled up lucene phonetic search
Raf,
I always do this: query expansion.
Take the Lucene QueryParser (default field "default", default analyzer the
whitespace analyzer)... feed the query in.
You typically get a BooleanQuery which you can now process to perform the query
expansion.
For example, I replace all TermQueries by a Boolean
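A sketch of that expansion step, against the mutable pre-4.x BooleanQuery API and with made-up language-field names, could be:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class QueryExpander {
    static final String[] FIELDS = {"text_en", "text_de", "text_fr"};

    // Walk the parsed tree; each TermQuery becomes an OR over the
    // per-language fields, nested BooleanQueries are rebuilt recursively.
    static Query expand(Query q) {
        if (q instanceof TermQuery) {
            String token = ((TermQuery) q).getTerm().text();
            BooleanQuery or = new BooleanQuery();
            for (String field : FIELDS) {
                or.add(new TermQuery(new Term(field, token)), BooleanClause.Occur.SHOULD);
            }
            return or;
        }
        if (q instanceof BooleanQuery) {
            BooleanQuery out = new BooleanQuery();
            for (BooleanClause c : ((BooleanQuery) q).clauses()) {
                out.add(expand(c.getQuery()), c.getOccur());
            }
            return out;
        }
        return q; // phrase, prefix, etc. left untouched in this sketch
    }
}
```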
Zarrinkalam,
have a look at semanticvectors.
paul
Le 29 août 2011 à 15:55, zarrinkalam a écrit :
hi,
I want to use LSI for clustering documents indexed with Lucene; I don't know
how, please help me
thanks,
I think we're getting off topic about Lucene usage for SSDs, but I fully
agree with the mail below: SSDs are faster than normal disks for development.
Actually, one of the things that got really faster with the SSD is IntelliJ's
indexing and reboot; I could not tell whether it uses Lucene, sadly.
Funnily, I had such an experience: an SSD in my laptop of the brand SanDisk,
guaranteed for 80 TB of writes.
Well, I had it replaced twice under warranty. Then the shop provided me an OCZ.
Maybe that one lasts longer... I'm still under warranty.
paul
Le 23 août 2011 à 17:11, Toke Eskildsen a écrit
Sorry Toke, I do not know.
The service shop replaced it fairly blindly.
paul
Le 23 août 2011 à 20:46, Toke Eskildsen a écrit :
On Tue, 2011-08-23 at 17:20 +0200, Paul Libbrecht wrote:
Funnily, I had such an experience: an SSD on the laptop of the brand
SanDisk, guaranteed for 80 TB
Diego,
The semanticvectors project has a mailing list and its author, Dominic Widdows,
is responding actively there.
paul
Le 24 mai 2011 à 02:34, Diego Cavalcanti a écrit :
Sorry, I thought the blog was yours! I will read the post and see if it
helps me. Thank you!
About the Semantic
Richard,
in SOLR at least there's an analyzer that avoids duplicates.
I think that would solve it.
There's also somewhere the option to ignore IDF (in similarity? in solrconfig?).
paul
Le 18 mai 2011 à 21:30, Rich Heimann a écrit :
Hello all,
This is my first time on the list and my first
Le 6 mai 2011 à 00:20, Otis Gospodnetic a écrit :
thus far, only search-testing has provided some analytics measures for us
(precision and recall ones). We, of course, construct the test-suites from
the
logs.
Interesting. It sounds like you don't currently utilize any sort of
Patrick, if the question is about the code snippet at the page you mention,
which I copy below, I believe the answer is no, and the author is aware of it
since he adds a comment about non-normalized scores in the second example.
ScoreDocs and TopDocs do not return normalized scores.
Normalized
java -Dfile.encoding=utf-8
should do the trick.
Or... which java app are you using?
paul
Le 28 mars 2011 à 09:03, Patrick Diviacco a écrit :
When I run my Lucene app and parse an XML file I get the following error
due to some characters such as é written in the text file.
If I save the text
Stephane,
I think you have the freedom to put what you want in the stored value of a
field.
The simplest would even be to make the fields you want to use for
display stored preformatted (XML-ish, OWL-ified, or JSON-ized), to be
separate from the indexed fields (where
Erm,
google DIH SOLR
or
http://wiki.apache.org/solr/DataImportHandler
paul
Le 10 mars 2011 à 14:37, karl.wri...@nokia.com a écrit :
Karl,
can you give, in one paragraph, the difference between ManifoldCF and DIH?
thanks in advance
paul
I am unfamiliar with DIH as an acronym
David,
I'm sure that if you request something more precise you might get enthusiasts
over here easily. I heard several committers of Lucene have gone into
LucidImagination and they offer paid services specialized for Lucene.
hope it helps.
paul
Le 3 mars 2011 à 21:13, Jarrin, David a écrit
, then that defaults
to IW.close(true) which means wait for all BG merges to finish.
So normally IW.close() reserves the right to take a long time.
But IW.close(false) should finish relatively quickly...
Mike
On Fri, Jan 21, 2011 at 9:20 AM, Paul Libbrecht p...@hoplahup.net wrote:
Would that happen
Would that happen automagically at finalization?
paul
Le 21 janv. 2011 à 15:13, Michael McCandless a écrit :
If you call optimize(false), that'll return immediately but run the
optimize in the background (assuming you are using the default
ConcurrentMergeScheduler).
Later, when it's time
Hello list,
I am hitting a stupid bug where a unit test shows me that QueryParser fiercely
analyzes anything it finds, hence... I have to tune the analyzer so it does not
decompose the terms for fields that should be non-analyzed.
For indexing, you can choose to have something NOT_ANALYZED.
For
Isn't this approach somewhat bad for term frequency?
Words that appear in several languages would be a lot more frequent
(hence less significant).
I still prefer the split-field method with a proper query expansion.
This way, the term frequency is evaluated on the corpus of one
Grant Ingersoll gsing...@apache.org wrote:
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an
exceeds 3-4 languages, I know of some
that handle 10. If you're careful enough, it just works.
Hope this helps.
Shai
On Wed, Jan 19, 2011 at 9:44 AM, Paul Libbrecht p...@hoplahup.net wrote:
But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query
So you are only indexing analyzed and querying analyzed. Is that correct?
Wouldn't it be better to prefer precise matches (a field that is analyzed with
StandardAnalyzer, for example) but also allow matches that are stemmed?
paul
Le 19 janv. 2011 à 19:21, Bill Janssen a écrit :
Clemens Wyss
I think you should use a TermQuery.
paul
Le 19 janv. 2011 à 20:03, Yuhan Zhang a écrit :
Hi all,
I am trying to use IndexSearcher
(http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher%28org.apache.lucene.store.Directory%29)
to retrieve a
Le 19 janv. 2011 à 20:56, Bill Janssen a écrit :
Paul Libbrecht p...@hoplahup.net wrote:
So you are only indexing analyzed and querying analyzed. Is that correct?
Yes, that's correct. I fall back to StandardAnalyzer if no
language-specific analyzer is available.
Wouldn't
But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query expansion
In one of my projects, we do not split languages by fields and it's a pain...
I'm having recurring issues in one direction or the other.
- the 'die' example that Otis mentioned is a good one:
Hello list,
has anyone built a log analyzer based on Lucene?
Our logs are so big that grep takes hours to do what I want it to do.
I'm sure Lucene would solve it.
Thanks in advance
paul
Somehow, I had the impression that the TrebleCLEF and EuroMatrix european
projects are meant to gather this kind of information sources.
But honestly, it's not as homogeneous as in OpenOffice.
Mozilla also has dictionaries.
Wiktionary can also be helpful.
paul
Le 7 janv. 2011 à 22:26, Robert
Hello list,
is it a good or bad thing to open two index-searchers on FSDirectories of the
same path?
(namely, one short-lived, one long-lived)
thanks in advance
paul
I'm more and more involved into preparing dedicated pages that list resources
of our servers according to an elaborate query I received in a human
description and implement as a query-parser query. Doing this I regularly use
indexed-doc views.
The implementation is thus a query that could be
I also note that this is a fundamental characteristic of the great performance
of Lucene and its related products, since it allows cleanly managed resources.
This is generally called paging.
paul
Le 28 déc. 2010 à 10:32, Uwe Schindler a écrit :
The TopDocs returning methods are not intended
Allow me to recommend a little trick to track the origin of a class, which often
works:
org.apache.lucene.analysis.WhitespaceAnalyzer.class.getResource("WhitespaceAnalyzer.class")
will give you a URL that should be the URL of the jar, followed by an
exclamation mark, followed by the
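Spelled out (the resource name must be a string; the helper below is my own wrapping of the trick, and only works as-is for top-level classes):

```java
class ClassOrigin {
    // A class can locate its own .class file; the URL's prefix tells you
    // where it was loaded from: "jar:file:/...!/..." for a jar,
    // "file:/..." for a directory, "jrt:/..." for JDK 9+ platform classes.
    static String origin(Class<?> c) {
        java.net.URL url = c.getResource(c.getSimpleName() + ".class");
        return url == null ? null : url.toString();
    }
}
```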
hello list,
more and more I seem to encounter situations where the delivery of a prebuilt
lucene index is desirable.
The binary format probably works (experience hints would be welcome) but I fear
it would be fragile with versioning (it certainly fails at version-downgrading).
Did anyone work
Mahmoud,
Lucene's documents' fields can be, when stored, compressed on disk. I think
that answers your question.
paul
On 17 oct. 2010, at 09:16, Mahmoud Abdelkader wrote:
Hello,
We're currently evaluating utilizing Lucene to index a large English corpus
and we are optimizing for
ping!
Any hope for help here?
I'm a bit stuck before deploying a release.
thanks in advance
paul
On 3 sept. 2010, at 14:05, Paul Libbrecht wrote:
Hello list,
I'm struggling again with the highlighter. I don't understand why I obtain
sporadically InvalidTokenOffsetsException
Hello list,
I'm struggling again with the highlighter. I don't understand why I obtain
sporadically InvalidTokenOffsetsException.
The mission: given a query, detect which field was matched, among the names of
the concepts: there can be several names for a given concept, also in one
language.
Le 26-juil.-10 à 16:01, Michael McCandless a écrit :
You can make a custom Collector? Ie, it'd just increment a counter
for each hit.
As long as it does not call the Scorer.score() method then no
scoring is done.
I've done that.
Code below.
It feels a bit stupid to have to do that
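For reference, the counting collector Michael describes, in its Lucene 3.x-era shape (from memory, so treat as a sketch):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

class CountingCollector extends Collector {
    private int count = 0;

    @Override public void setScorer(Scorer scorer) {} // never call scorer.score()
    @Override public void collect(int doc) { count++; } // one hit, one increment
    @Override public void setNextReader(IndexReader reader, int docBase) {}
    @Override public boolean acceptsDocsOutOfOrder() { return true; } // order irrelevant for counting
    public int getCount() { return count; }
}
```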
Le 13-juil.-10 à 23:49, Christopher Condit a écrit :
* are there performance optimizations that I haven't thought of?
The first and most important one I'd think of is to get rid of NFS.
You can happily do a local copy which might, even for 10 Gb take less
than 30 seconds at server start.
Le 12-mai-10 à 10:55, mark harwood a écrit :
two terminology questions:
- is multiplier in the mail mentioned there the same as boost?
This factor controls how many decimal places of precision are retained
in the adjusted scores. Pick too low a multiplier and scores that are
only
don't know what to do for b).
thanks for hints.
paul
Le 31-mars-10 à 23:00, Paul Libbrecht a écrit :
I've been wandering around but I see no solution yet: I would like
to intersect two query results: going through the list of one query
and indicating which ones actually match the other query
intended to use prefix and fuzzy queries. I believe this
contradicts that, no?
paul
Le 11-mai-10 à 12:02, mark harwood a écrit :
See https://issues.apache.org/jira/browse/LUCENE-1999
- Original Message
From: Paul Libbrecht p...@activemath.org
To: java-user@lucene.apache.org
Hello Luceners,
I am sure I'm not the only one having such a snippet in my dedicated
analyzer:
m.put("en", new SnowballAnalyzer("English"));
m.put("es", new SnowballAnalyzer("Spanish"));
m.put("de", new SnowballAnalyzer("German"));
m.put("dk", new
Le 01-avr.-10 à 16:29, henrib a écrit :
By issuing multiple queries, one against each localized index,
results being
clustered by locale.
You can further refine by translating the end-user input query terms
for
each locale and issue translated queries against the respective
indices.
I've
How?
paul
Le 01-avr.-10 à 14:19, henrib a écrit :
Finally, query expansion can also be used in the multiple indices
case and
might even use automated/guided translation.
-
To unsubscribe, e-mail:
David,
I'm doing exactly that.
And I think there's one crucial advantage aside: multilingual queries.
If your user requests 'segment', you have no way to know which language
he is searching in; erm, well, you have the user language(s) (through
the browser Accept-Language header, for example)
Hello list,
I've been wandering around but I see no solution yet: I would like to
intersect two query results: going through the list of one query and
indicating which ones actually match the other query or, even better,
indicating that passed this, nothing matches that query anymore.
I would wish a highlighting feature that's fully integrated.
paul
On 24-févr.-10, at 14:42, Grant Ingersoll wrote:
What would it be?
On 16-févr.-10, at 17:40, luciusvorenus wrote:
how can I build a web interface for my application? I read
something about
HTML tables and PHP but I have no idea.
Can anybody help me?
Lucius,
try solr.
paul
Hello luceners,
In our project, we are building queries from long lists of possible
terms (expanded through ontology deduction). I would like, however,
the rank to be unaffected by the number of matches: one or thirty
occurrences of one of the many words should give the same score.
Did
Hello list,
for some strange reason I wish to cache very frequent (and big, ~3000
terms) queries.
Now, this might mean that a query is searched for in several threads
on the same index. Do I run a risk?
thanks in advance
paul
Zhou,
Lucene is a back-end library; it's very useful for developers but it is
not a complete site search engine.
A Lucene-based site search engine is Nutch, which does crawl.
Solr also provides functions close to these, with a large amount of
thought on flexible integration; crawling methods
Because I like to have Luke always sitting at hand, I have packed this
release as a MacOSX disk image and application.
http://www.activemath.org/~paul/tmp/Luke-0.9.9.dmg
The icon could be better (I need a hi-res version of Lucene's icon, haven't
found it yet).
Potentially the packaging
Can the dictionary have weights?
'überwachungsgesetz' alone probably needs a higher rank than 'überwachung'
and 'gesetz', no?
paul
Le 21-oct.-09 à 21:09, Benjamin Douglas a écrit :
OK, that makes sense. So I just need to add all of the sub-compounds
that are real words at posIncr=0, even if
, maxDocs=1)
1.6294457 = queryNorm
0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
  1.0 = tf(termFreq(field:gesetz)=1)
  0.30685282 = idf(docFreq=1, maxDocs=1)
  0.5 = fieldNorm(field=field, doc=0)
On Wed, Oct 21, 2009 at 3:16 PM, Paul Libbrecht
p
Not something for the very soon future, but I'd be interested to base
on such an infrastructure for a mathematical-formulæ search corpus
(both semantic and presentation math).
I believe the OpenRelevance infrastructure might present a best
practice or infrastructure to be based on for
Mehdi,
your requirements sound to be fulfilled mostly by Apache Solr which is
a web-based packaging of Lucene.
paul.
Le 08-oct.-09 à 10:11, Mehdi Ben Hamida a écrit :
Hello,
I'm reviewing and doing some researches on Lucene Java 2.9.0, to
check if it
meets our needs.
Hello list,
I will need to use phonetic analyzers to do phonetic search. I know
of the Metaphone analyzers and use them but they're really only known
to work for English.
Does anyone have pointers to projects that encode phonetically words
of other languages?
I'm interested to French,
Le 23-août-09 à 17:05, Petite Abeille a écrit :
I will need to use phonetic analyzers to do phonetic search. I
know of the Metaphone analyzers and use them but they're really
only known to work for English.
Double Metaphone?
http://en.wikipedia.org/wiki/Double_Metaphone
thanks,
I hadn't
Le 25-juin-09 à 01:28, Mark Miller a écrit :
I'm figuring out the following problem: in my index I can't find
the word 'be', but it exists in two documents. I'm using Lucene 2.4
with the StandardAnalyzer.
Other queries with words like 'de', 'et' or 'de la' work well. Any ideas?
'be' is a stopword. Do
Le 08-juin-09 à 23:55, Ian Vink a écrit :
Is there a Mac port of the Lucene engine?
I don't get it: are you asking whether Lucene (Java) works on MacOSX? The
answer is yes.
Are you asking for a Cocoa and Objective-C port? (I don't know.)
paul
Kumar,
you'll have to make your own documents after parsing the
HTML yourself (e.g. with NekoHTML to DOM).
As for the weights of tokens, in addition to IDF, you can set them
per field, i.e. when you add a field to the document.
paul
Le 28-mai-09 à 12:22, Gaurav Kumar a écrit :
Various servlets or various webapps?
Various servlets is trivial, indeed using ServletContext.getAttribute().
Various webapps is more difficult:
- you need to set cross-context so that context.getContext("/otherpath")
is accessible (a context configuration in Tomcat)
- you need classes to be shared
daniel,
have a look at solr DIH, it has prebuilt tools to do just that.
http://wiki.apache.org/solr/DataImportHandler
This is based on Solr, which is a web application based on Lucene.
It does not imperatively need to be run as a web application though;
it can be embedded.
paul
Le
I am sorry Nitin, I may have planted this doubt in you...
semantic-vectors is a project based on Lucene:
http://code.google.com/p/semanticvectors/
you probably want to look there and ask questions on the forum there.
paul
Le 06-avr.-09 à 22:45, Richard Marr a écrit :
Hi Nitin,
there's TextFragment(stringbuffer) and the
pass through the tokenizers, but removing any of them breaks my unit
test. I guess this is the whole idea behind LUCENE-1522, which I would
take up later.
paul
Le 23-mars-09 à 11:35, Paul Libbrecht a écrit :
Thanks Erick,
I browsed but no full answer
...
On Sun, Mar 22, 2009 at 4:30 PM, Paul Libbrecht
p...@activemath.org wrote:
in an auto-completion task, I would like to show to the user the
field
that's been matched against the query in the found document.
Typically, my documents have multiple fields for each field-name
and I
would like
searcher.explain definitely seems to do the trick, going through the
sub-queries.
paul
Le 23-mars-09 à 13:12, Wouter Heijke a écrit :
I want to know for each term in a query if it matched the result or
not.
What is the best way to implement this?
Highlighter seems to be able to do the
Hello list,
in an auto-completion task, I would like to show to the user the field
that's been matched against the query in the found document.
Typically, my documents have multiple fields for each field-name and I
would like the index's findings to give me the field used. How can I
do
Hello luceners,
query.toString() does a fair job at being reparsed by QueryParser but
is there a safe way to do so?
I have a Lucene query object and want a string that QueryParser will
reparse fairly exactly.
thanks in advance
paul
Nitin,
LSI is patented, so there has not been a flurry of implementation attempts.
However, SemanticVectors is a library that implements similar approaches to
LSA/LSI for indexing and is based on Lucene's term vectors.
paul
Le 18-mars-09 à 07:09, nitin gopi a écrit :
hi all , has any body tried to
Hello Luceners,
what is the official pom.xml fragment to be used for the contribs
package of lucene?
It seems to be only of type pom inside the maven repository... does it
mean that I have to fetch sub-contribs ?
paul
-09 à 00:03, Daniel Noll a écrit :
Paul Libbrecht wrote:
Hello fellows of Lucene,
I just discovered that the _ character is a word separator in the
StandardAnalyzer.
Can it be?
It broke our usage of a field that stores a comma-separated list of
uri-fragments
If I were analysing a URI, I
Hello fellows of Lucene,
I just discovered that the _ character is a word separator in the
StandardAnalyzer.
Can it be?
It broke our usage of a field that stores a comma-separated list of
URI fragments which, of course, contain _: the StandardAnalyzer
splits these as separate terms, which
We have a suggestion engine and we only auto-complete from 3
characters (or a number).
http://draft.i2geo.net/SearchI2G/skills-text-box-editor.jsp?language=en
What would be nice for your case and maybe for ours is that this
expansion done in PrefixQuery is made more explicit so that one
(sorry to respond to myself)
Le 15-janv.-09 à 08:13, Paul Libbrecht a écrit :
We have a suggestion engine and we only auto-complete from 3
characters (or a number).
http://draft.i2geo.net/SearchI2G/skills-text-box-editor.jsp?language=en
What would be nice for your case and maybe for ours
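The gating rule we use could be sketched as follows (the helper name is mine, and the "or a number" case is how I read the rule above):

```java
class SuggestGate {
    // Only expand a prefix query when the prefix is at least three
    // characters long, or consists entirely of digits.
    static boolean shouldExpand(String prefix) {
        if (prefix.length() >= 3) return true;
        return !prefix.isEmpty() && prefix.chars().allMatch(Character::isDigit);
    }
}
```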
Shouldn't your analyzer also convert Rochelle Rochelle to Rochelle ?
paul
Le 25-déc.-08 à 14:20, Israel Tsadok a écrit :
A recurring problem I have with Lucene results is when a document
contains
the same word over and over again. If for some reason I have a
document
containing badger