from:"Paul Libbrecht"

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht

A weighted OR, of course. On 6 May 2024, at 12:43, Paul Libbrecht wrote: Do I mistake or “ “ makes an OR if there’s no other? On 6 May 2024, at 12:41, Saha, Rajib wrote: Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' an

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht

Do I mistake or “ “ makes an OR if there’s no other? On 6 May 2024, at 12:41, Saha, Rajib wrote: Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' and 'NOT' in query string stands for same reason theoretically.

Re: Exact KNN

2024-01-30 Thread Paul Libbrecht

Isn’t that what Semantic-Vectors is doing? E.g. https://github.com/Ontotext-AD/semanticvectors Paul On 30 Jan 2024, at 20:50, William Zhou wrote: > Is there a way of directly executing an exact nearest neighbor search? It > seems like the API provides some general functionality, and we can force

Re: Search results/criteria validation

2021-03-17 Thread Paul Libbrecht

Explain is a heavyweight thing. Maybe it helps you, maybe you need something high-performance. I was asking a similar question ~10 years ago and got a very interesting answer on this list. If you want I can try to dig this to find it. At the end, and with some limitation in the number of queri

Re: Document metadata in ranking?

2021-02-25 Thread Paul Libbrecht

Hello Philip, I’ll answer with a possibility that might be outdated and predates the existence of payloads (which I think are non-analysed parts so not appropriate). Lucene has fields and you can include the metadata within fields in form of particular tokens. Then you can enrich every query

Re: Using Lucene for technical documentation

2020-11-23 Thread Paul Libbrecht

Hello Trevor, I don’t know of an analyzer for mixes of code and text but I know of an analyzer for mixes of code and formulæ. Clearly, you could build a custom analyzer that would tokenize differently depending on whether you’re in code or in text. That’s not super hard. However, where thin

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht

> operations vector('Paris') - vector('France') + vector('Italy') results in a > vector that is very > close to vector('Rome'), and vector('king') - vector('man') + vector('woman') > is close to > vector('

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht

Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi wrote: > Hello, > > It's my pleasure to share that I have an interesting tool "word2vec for > Lucene" > available at https://github.com/kojisekig/word2vec-lucene . > > As you can imagin

Re: Document Term matrix

2014-11-11 Thread Paul Libbrecht

The project semanticvectors might be doing what you are looking for. paul On 11 nov. 2014, at 22:37, parnab kumar wrote: > hi, > > While indexing the documents , store the Term Vectors for the content > field. Now for each document you will have an array of terms and their > corresponding fre

Re: how to ignore full stop for specific word

2014-11-06 Thread Paul Libbrecht

My trick would be to replace .net with dotNet (or use some funky Unicode-letter to replace the dot). If you use consistently the same analyzer-chain, then it will match cleanly. paul On 6 nov. 2014, at 12:42, Rajendra Rao wrote: > I have some word which contain full stop (.) itself ,examp
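The replacement trick above can be sketched in plain Java as a pre-processing step applied to text before it reaches the analyzer. This is only an illustration under assumptions: the class name `DotTermNormalizer` and the exact pattern are hypothetical and not taken from the thread, and the same step would have to run at both indexing and query time for the tokens to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites a standalone ".net" so that an analyzer
// which drops punctuation still keeps a distinctive token.
final class DotTermNormalizer {
    // ".net" not preceded by a word character or a dot, and ending at a
    // word boundary, matched case-insensitively. "asp.net" is left alone.
    private static final Pattern DOT_NET =
            Pattern.compile("(?i)(?<![\\w.])\\.net\\b");

    static String normalize(String text) {
        return DOT_NET.matcher(text).replaceAll("dotNet");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Looking for a .NET developer."));
    }
}
```

Leaving compound names such as "asp.net" untouched is a design choice of this sketch; a real deployment would decide that case explicitly.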

Re: Case sensitivity

2014-09-19 Thread Paul Libbrecht

two fields? paul On 19 sept. 2014, at 15:07, John Cecere wrote: > Is there a way to set up Lucene so that both case-sensitive and > case-insensitive searches can be done without having to generate two indexes? > > -- > John Cecere > Principal Engineer - Oracle Corporation > 732-987-4317 / j

Re: Lucene for Log file indexing and search

2013-09-19 Thread Paul Libbrecht

Ashok, I would look at Solr, which has many more field types to support more queries. E.g. there you have a nice query syntax for time spans and fantastic caching. I think there are very few initiatives for indexing logs and I would be interested to see the results of your enterprise. paul

Re: international stop set?

2012-10-27 Thread Paul Libbrecht

On 27 Oct 2012, at 11:43, Tom wrote: > Aha! Exactly the problem! And only because the user-agent is one language, > doesn't mean all search terms will be! > For example, someone might type in the name of an English event (such as > Halloween) first, and then type in the name of their home town

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht

delete semantics, > etc.) when using NFS? Has anyone run into such trouble? Or is it strictly > just a performance issue? > > /Jong > On Tue, Oct 2, 2012 at 5:17 AM, Paul Libbrecht wrote: > >> My experience in the Lucene 1.x times were a factor of at least four in >>

Re: Lucene index on NFS

2012-10-02 Thread Paul Libbrecht

My experience in the Lucene 1.x times was a factor of at least four in writing to NFS and about two when reading from there. I'd discourage this as much as possible! (rsync is way more your friend for transporting and replication à la solr should also be considered) paul On 2 Oct 2012, at 11

Re: let's use our native language

2012-09-14 Thread Paul Libbrecht

> most sentences around Lucene what I searched out aren't compiled correctly. > wondering if we build our local mailing list... Which language? paul - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For addition

Re: Lucene tokenization

2012-03-27 Thread Paul Libbrecht

Nilesh, the StandardAnalyzer is full of generally useful special cases, including emails and numbers detection. I am supposing you met one such special case which has a justification of some sort. I can't tell you why but I can tell it's really hard to change because others rely on this somehow

Re: analyzer per document

2012-02-09 Thread Paul Libbrecht

I would use a different field per language and use PerFieldAnalyzerWrapper indeed. This is also important for queries whose language is not always clear. paul On 9 Feb 2012, at 13:01, Vinaya Kumar Thimmappa wrote: > Hello All, > > I have a requirement of using different analyzer per document. How

Re: Designing a multilingual index

2012-01-03 Thread Paul Libbrecht

that if different domains use separate indexes, the relevance scoring is > more accurate. > > > Kind regards, > Heikki Doeleman > > > > > On Tue, Jan 3, 2012 at 3:29 PM, Paul Libbrecht wrote: > >> Heikki, >> >> it does solve your main concern: a

Re: Designing a multilingual index

2012-01-03 Thread Paul Libbrecht

Heikki, it does solve your main concern: a term in lucene is a pair of a token and field name. The term frequency is, thus, the frequency of a token in a field. So the term-frequency of text-stemmed-de:firewall is independent of the term-frequency of text-stemmed-en:firewall (for example). But
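The point that a term is a (field, token) pair can be illustrated with a toy counter. This is not Lucene's API, only a small model assuming counts are kept per field-and-token key; the class name `FieldTermCounts` is made up for the illustration, and the field names are the ones used in the message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: counts are keyed by "field:token", so the same token in two
// fields is counted as two unrelated terms.
final class FieldTermCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String field, String token) {
        counts.merge(field + ":" + token, 1, Integer::sum);
    }

    int frequency(String field, String token) {
        return counts.getOrDefault(field + ":" + token, 0);
    }

    public static void main(String[] args) {
        FieldTermCounts index = new FieldTermCounts();
        index.add("text-stemmed-de", "firewall");
        index.add("text-stemmed-de", "firewall");
        index.add("text-stemmed-en", "firewall");
        System.out.println(index.frequency("text-stemmed-de", "firewall"));
        System.out.println(index.frequency("text-stemmed-en", "firewall"));
    }
}
```

Because the key includes the field name, statistics for the German field never leak into the English one, which is the property the message relies on.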

Re: Designing a multilingual index

2012-01-03 Thread Paul Libbrecht

On 3 Jan 2012, at 13:56, heikki wrote: > In our case, it is "known" in which language the user is searching (because > he tells us, and if he doesn't, we use the current GUI language). On the web it is often hard to trust such (e.g. because of people working in multiple languages, internet c

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-21 Thread Paul Libbrecht

Michael, from a physical point of view, it would seem like the order in which the documents are read is very significant for the reading speed (feel the random access jump as being the issue). You could: - move to ram-disk or ssd to make a difference? - use something different than a searcher w

Re: how to do remote debug on benchmark test or whatever test?

2011-12-09 Thread Paul Libbrecht

hao, this is a java question not a lucene question. Here's a short answer: Those options are to be fed to the java command. Running on the command-line is where you could put them. Running in IDEs there is generally such a feature ready, or the possibility to connect to the socket address. pau

Re: Phonetic search with Lucene 3.2

2011-11-09 Thread Paul Libbrecht

That uses Lucene 2.9.2 indeed. paul On 9 Nov 2011, at 11:43, Felipe Carvalho wrote: > Which version of Lucene are you using? I had tried it with Lucene 3.3 and > had some problems, did you have to do any customizations? > > On Wed, Nov 9, 2011 at 8:38 AM, Paul Libbrecht wrote:

Re: Phonetic search with Lucene 3.2

2011-11-09 Thread Paul Libbrecht

We've been using http://www.tangentum.biz/en/products/phonetix/ which does double-metaphone. Maybe that helps. paul On 9 Nov 2011, at 11:29, Felipe Carvalho wrote: > Using PerFieldAnalyzerWrapper seems to be working for what I need! > > On indexing: > >PerFieldAnalyzerWrappe

Re: Phonetic search with Lucene 3.2

2011-11-08 Thread Paul Libbrecht

Felipe, I do not have a tutorial but what you are describing is what I have been doing in ActiveMath. I have a little paper for you if you want that explains how it goes there (http://www.hoplahup.net/paul_pubs/AccessRetrievalAM.html) and the software is open-source (http://www.activemath.org

Re: Phonetic search with Lucene 3.2

2011-11-07 Thread Paul Libbrecht

Felipe, in Lucene in Action there's a little bit on that. Basically it's just about using the right analyzer. paul On 8 Nov 2011, at 01:45, Felipe Carvalho wrote: > Hello, > I'm using Lucene 3.2 on a phone book app and phonetic search is a > requirement. I've googled up "lucene phonetic sea

Re: What's the best way to translate a query in multiple languages?

2011-11-02 Thread Paul Libbrecht

Raf, I always do this: query expansion. Take the Lucene QueryParser, default field "default", default analyzer whitespace analyzer... feed the query in. You typically get a BooleanQuery which you can now process to perform the query expansion. For example I replace all termQueries by a boolean
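The expansion described above works on the parsed query object; as a rough string-level approximation of the same idea, each user term can be rewritten into a group of per-field clauses in query-parser syntax. Everything named here is an assumption for illustration: the class `QueryExpander`, the field names and the boost values are hypothetical, and the sketch does not escape query-parser special characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expands every whitespace-separated term into one
// parenthesised group with a boosted clause per target field.
final class QueryExpander {
    static String expand(String userQuery, Map<String, Float> fieldBoosts) {
        List<String> groups = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                clauses.add(e.getKey() + ":" + term + "^" + e.getValue());
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0f); // assumed field name and boost
        boosts.put("text", 1.0f);  // assumed field name and boost
        System.out.println(expand("picasso guernica", boosts));
    }
}
```

Working on the parsed BooleanQuery, as the message suggests, avoids the escaping problem and keeps operators intact; the string form is shown only because it needs no library on the classpath.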

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Paul Libbrecht

Grant, for years the ActiveMath learning environment has been using Lucene as storage engine. At the time (~2004), it was by far the best storage engine ever doable in a pure java-world. Now it still is perfect in terms of performance. We had an issue with the separate versions where the stored-fields w

Re: LSI

2011-08-29 Thread Paul Libbrecht

Zarrinkalam, have a look at semanticvectors. paul On 29 Aug 2011, at 15:55, zarrinkalam wrote: > hi, > > I want to use LSI for clustering documents indexed with lucene, I don't know > how, please help me > > thanks,

Re: SSD Experience (on developer machine)

2011-08-23 Thread Paul Libbrecht

Sorry Toke, I do not know. The service shop replaced it fairly blindly. paul On 23 Aug 2011, at 20:46, Toke Eskildsen wrote: > On Tue, 2011-08-23 at 17:20 +0200, Paul Libbrecht wrote: >> Funnily, I had such an experience: an SSD on the laptop of the brand >> SanDisk, guarante

Re: SSD Experience (on developer machine)

2011-08-23 Thread Paul Libbrecht

Funnily, I had such an experience: an SSD on the laptop of the brand SanDisk, guaranteed for 80 TB of writes. Well, I had it twice changed under guarantee. Then the shop provided me an OCZ. Maybe that lasts longer... I'm still under guarantee. paul On 23 Aug 2011, at 17:11, Toke Eskildsen wrote:

Re: SSD Experience

2011-08-23 Thread Paul Libbrecht

I think we're getting out of topic about Lucene usage for SSDs but I fully acknowledge that below mail: SSDs are faster than normal disk for development. Actually, one of the things that got real faster with the SSD is IntelliJ indexing and reboot; I could not tell if it is using Lucene sadly. I

Re: Semantic indexing in Lucene

2011-05-23 Thread Paul Libbrecht

Diego, The semanticvectors project has a mailing list and its author, Dominic Widdows, is responding actively there. paul On 24 May 2011, at 02:34, Diego Cavalcanti wrote: > Sorry, I thought the blog was yours! I will read the post and see if it > helps me. Thank you! > > About the Semantic

Re: Please help me with a basic question...

2011-05-18 Thread Paul Libbrecht

Richard, in SOLR at least there's an analyzer that avoids duplicates. I think that would solve it. There's also somewhere the option to ignore IDF (in similarity? in solrconfig?). paul On 18 May 2011, at 21:30, Rich Heimann wrote: > Hello all, > > This is my first time on the list and my fir

Re: Thoughts on Search Analytics?

2011-05-06 Thread Paul Libbrecht

On 6 May 2011, at 00:20, Otis Gospodnetic wrote: >> thus far, only search-testing has provided some analytics measures for us >> (precision and recall ones). We, of course, construct the test-suites from >> the >> logs. > > Interesting. It sounds like you don't currently utilize any sort o

Re: Are Okapi BM25 scores normalized into 0 and 1 ?

2011-04-29 Thread Paul Libbrecht

Patrick, if the question is about the code snippet at the page you mention, which I copy below, I believe the answer is no and the author is aware of it since he is adding a comment about not-normalized in the second example. ScoreDocs and TopDocs are not returning normalized scores. Normalized

Re: file formats: MacRoman and UTF-8...

2011-03-28 Thread Paul Libbrecht

java -Dfile.encoding=utf-8 should do the trick. Or... which java app are you using? paul On 28 Mar 2011, at 09:03, Patrick Diviacco wrote: > When I run my Lucene app and a parse a xml file I get the following error > due to some fonts such as "é" written in the text file. > > If I save the
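The flag above changes the JVM's default charset. When the application code can be changed, an alternative is to name the charset explicitly while decoding, so the result no longer depends on the platform default. A minimal sketch, assuming the input really is UTF-8; the class name `ExplicitCharset` is made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Decodes bytes with an explicitly named charset instead of relying on
// the platform default (MacRoman on older Mac JVMs, for instance).
final class ExplicitCharset {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        // Decoded as UTF-8 the two bytes form one character.
        System.out.println(decodeUtf8(eAcute).length());
        // Decoded with a single-byte charset they become two characters,
        // which is the typical symptom of a wrong default encoding.
        System.out.println(new String(eAcute, StandardCharsets.ISO_8859_1).length());
    }
}
```

The same idea applies to readers: passing a charset to `InputStreamReader` makes file parsing independent of how the JVM was started.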

Re: lucene double metaphone ranking.

2011-03-14 Thread Paul Libbrecht

Merlin, the kind of magic such as "prefer an exact match" still has to be programmed. Searching in a field with double-metaphone analyzer will only compare tokens by their double-metaphone-results. You probably want query expansion: text:picasso to be expanded to: text:picasso^3.0 text.stemm

Re: Indexing of multilingual labels

2011-03-14 Thread Paul Libbrecht

Stephane, I think that you have the freedom to put what you want in the stored value of a field. The simplest would even be to make it that the fields that you want to use for display are stored, preformatted, xml-ished, owl-ified, or json-ized, to be separate from the indexed fields (where yo

Re: ManifoldCF in Action

2011-03-10 Thread Paul Libbrecht

Erm, google DIH SOLR or http://wiki.apache.org/solr/DataImportHandler paul On 10 Mar 2011, at 14:37, karl.wri...@nokia.com wrote: >>> > Karl, > > can you give, in one paragraph, the difference between ManifoldCF and DIH? > > thanks in advance > > paul > << > > I am unfamiliar

Re: Lucene paid support

2011-03-03 Thread Paul Libbrecht

David, I'm sure that if you request something more precise you might get enthusiasts over here easily. I heard several committers of Lucene have gone into LucidImagination and they offer paid services specialized for Lucene. hope it helps. paul On 3 Mar 2011, at 21:13, Jarrin, David wrote:

Re: ManifoldCF in Action

2011-03-01 Thread Paul Libbrecht

Karl, can you give, in one paragraph, the difference between ManifoldCF and DIH? thanks in advance paul On 1 Mar 2011, at 23:23, karl.wri...@nokia.com wrote: > Dear Lucene/Solr user, > > It is possible you may not know of an Apache project called ManifoldCF, whose > purpose is to provide

Re: gracefully interrupting an optimize

2011-01-26 Thread Paul Libbrecht

ucene.apache.org >> >> >> >> To >> java-user@lucene.apache.org >> cc >> >> Subject >> Re: gracefully interrupting an optimize >> >> >> >> >> >> >> No. >> >> If you just do IW.close() &l

Re: gracefully interrupting an optimize

2011-01-21 Thread Paul Libbrecht

Would that happen "automagically" at finalization? paul On 21 Jan 2011, at 15:13, Michael McCandless wrote: > If you call optimize(false), that'll return immediately but run the > optimize "in the background" (assuming you are using the default > ConcurrentMergeScheduler). > > Later, when i

Re: Best practices for multiple languages?

2011-01-20 Thread Paul Libbrecht

ockenizer, filters and stemmer. > > With this solution : > > 1. I only need one field (or two if I want both stemmed and unstemmed > processing) > 2. The user can search in all document regarless to there language > > I hope this help. > > Dominique > www.zo

NOT_ANALYZED... should be an analyzer

2011-01-20 Thread Paul Libbrecht

Hello list, I am hitting a stupid bug where a unit test shows me that QueryParser fiercely analyzes anything it finds, hence... I have to tune the analyzer to not decompose the terms with fields that should be non-analyzed. For indexing, you can choose to have something not_analyzed. For query

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht

On 19 Jan 2011, at 20:56, Bill Janssen wrote: > Paul Libbrecht wrote: > >> So you are only indexing "analyzed" and querying "analyzed". Is that correct? > > Yes, that's correct. I fall back to StandardAnalyzer if no > language-specific analy

Re: search on a field that is NOT_ANALYZED

2011-01-19 Thread Paul Libbrecht

I think you should use a TermQuery. paul On 19 Jan 2011, at 20:03, Yuhan Zhang wrote: > Hi all, > > I am trying to use > *IndexSearcher > * to retri

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht

So you are only indexing "analyzed" and querying "analyzed". Is that correct? Wouldn't it be better to prefer precise matches (a field that is analyzed with StandardAnalyzer for example) but also allow matches that are stemmed? paul On 19 Jan 2011, at 19:21, Bill Janssen wrote: > Clemens Wyss w

Re: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht

sily solvable by indexing >> for each document a "language" field and use it as a Filter during the >> search. You can cache that Filter so that its posting list isn't traversed >> for every query but instead only once. >> >> We use the second approach an

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Paul Libbrecht

Grant Ingersoll wrote: > Where do you get your Lucene/Solr downloads from? > > [x] ASF Mirrors (linked in our release announcements or via the Lucene > website) > > [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [X] I/we build them from source via an SVN/Git checkout.

Re: Best practices for multiple languages?

2011-01-18 Thread Paul Libbrecht

But for this, you need a skillfully designed: - set of fields - multiplexing analyzer - query expansion In one of my projects, we do not split language by fields and it's a pain... I'm having recurring issues in one sense or the other. - the "die" example that Otis mentioned is a good one: stop-

lucene-based log searcher?

2011-01-13 Thread Paul Libbrecht

Hello list, has anyone built a log-analyzer based on Lucene? Our logs are so big that grep takes more hours to do what I want it to do. I'm sure Lucene would solve it. Thanks in advance paul

Re: Where to find non-English dictionaries, thesaurus, synonyms

2011-01-07 Thread Paul Libbrecht

Somehow, I had the impression that the TrebleCLEF and EuroMatrix european projects are meant to gather this kind of information sources. But honestly, it's not as homogeneous as in OpenOffice. Mozilla also has dictionaries. Wiktionary can also be helpful. paul On 7 Jan 2011, at 22:26, Robert M

two IndexSearchers on one dir?

2010-12-31 Thread Paul Libbrecht

Hello list, is it a good or bad thing to open two index-searchers on FSDirectories of the same path? (namely, one short-lived, one long-lived). thanks in advance paul

Comment in query-parser?

2010-12-30 Thread Paul Libbrecht

I'm more and more involved into preparing dedicated pages that list resources of our servers according to an elaborate query I received in a human description and "implement" as a query-parser query. Doing this I regularly use "indexed-doc" views. The implementation is thus a query that could

Re: Outof memory exception on using Integer.MaxValue

2010-12-28 Thread Paul Libbrecht

I also note that this is a fundamental characteristic of the great performance of Lucene and its related products since it allows cleanly managed resources. "this" is generally called paging. paul On 28 Dec 2010, at 10:32, Uwe Schindler wrote: > The TopDocs returning methods are not intended
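Why a bounded page size keeps resources under control can be shown with a small top-n selection. The sketch below is not Lucene code; it is a generic bounded min-heap, with the class name `TopN` made up for the illustration, showing that memory grows with the requested n rather than with the number of hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Keeps only the n best scores seen so far in a min-heap.
final class TopN {
    static List<Float> topScores(float[] scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // smallest on top
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (n > 0 && s > heap.peek()) {
                heap.poll(); // evict the current weakest of the top n
                heap.add(s);
            }
        }
        List<Float> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // best score first
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topScores(new float[] {0.2f, 0.9f, 0.5f, 0.7f}, 2));
    }
}
```

Asking for a very large n defeats the purpose, since the structure is then sized for every possible hit; requesting one page at a time keeps it small.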

Re: java.lang.NoClassDefFoundError: org/apache/lucene/util/CharacterUtils

2010-12-13 Thread Paul Libbrecht

Allow me to recommend a little trick to track the origin of a class which works often: org.apache.lucene.analysis.WhitespaceAnalyzer.class.getResource("WhitespaceAnalyzer.class") will give you a URL that should be the URL of the jar, followed by an exclamation mark, followed by the jar-inter
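The trick above can be wrapped in a small helper that works for any class. This is a sketch using only the JDK; the class name `ClassOrigin` is made up, and the exact form of the printed URL depends on the Java version and on how the class was loaded (a jar, a directory, or the runtime image).

```java
import java.net.URL;

// Reports where a class was loaded from by asking it for its own
// .class file as a resource.
final class ClassOrigin {
    static URL locate(Class<?> type) {
        String name = type.getName();
        // Resource names given to Class.getResource are resolved relative
        // to the class's package, so only the last name segment is needed.
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        return type.getResource(resource);
    }

    public static void main(String[] args) {
        System.out.println(locate(String.class));
    }
}
```

Calling `ClassOrigin.locate(...)` with the class named in a NoClassDefFoundError's neighbours (for instance an analyzer class that does load) shows which jar is actually on the classpath, which usually reveals a version mix.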

Lucene index exchange format?

2010-11-09 Thread Paul Libbrecht

hello list, more and more I seem to encounter situations where the delivery of a prebuilt lucene index is desirable. The binary format probably works (experience hints would be welcome) but I fear it would be fragile with versioning (it certainly fails at version-downgrading). Did anyone work

Re: Does Lucene compress postings (or posting lists) in its inverted index?

2010-10-17 Thread Paul Libbrecht

Mahmoud, Lucene's documents' fields can be, when stored, compressed on disk. I think that answers your question. paul On 17 oct. 2010, at 09:16, Mahmoud Abdelkader wrote: > Hello, > > We're currently evaluating utilizing Lucene to index a large English corpus > and we were are optimizing for

Re: trying to use the highlighter

2010-09-05 Thread Paul Libbrecht

ping! Any hope for help here? I'm a bit stuck before deploying a release. thanks in advance paul On 3 sept. 2010, at 14:05, Paul Libbrecht wrote: > > Hello list, > > I'm struggling again with the highlighter. I don't understand why I obtain > sporadicall

trying to use the highlighter

2010-09-03 Thread Paul Libbrecht

Hello list, I'm struggling again with the highlighter. I don't understand why I obtain sporadically InvalidTokenOffsetsException. The mission: given a query, detect which field was matched, among the names of the concepts: there can be several names for a given concept, also in one language. C

Re: Fastest way to get number of matching documents

2010-07-26 Thread Paul Libbrecht

On 26 Jul 2010, at 16:01, Michael McCandless wrote: You can make a custom Collector? Ie, it'd just increment a counter for each hit. As long as it does not call the Scorer.score() method then no scoring is done. I've done that. Code below. It feels a bit stupid to have to do that thoug

Re: Best practices for searcher memory usage?

2010-07-13 Thread Paul Libbrecht

On 13 Jul 2010, at 23:49, Christopher Condit wrote: * are there performance optimizations that I haven't thought of? The first and most important one I'd think of is get rid of NFS. You can happily do a local copy which might, even for 10 Gb take less than 30 seconds at server start. pa

Re: best way to interest two queries?

2010-05-15 Thread Paul Libbrecht

On 12 May 2010, at 10:55, mark harwood wrote: two terminology questions: - is multiplier in the mail mentioned there the same as boost? This factor controls how many decimal places precision is retained in the adjusted scores. Pick too low a multiplier and scores that are only differe

Re: best way to interest two queries?

2010-05-11 Thread Paul Libbrecht

oost? - I intended to use prefix and fuzzyqueries. I believe this is contradictory to this or? paul On 11 May 2010, at 12:02, mark harwood wrote: See https://issues.apache.org/jira/browse/LUCENE-1999 - Original Message ---- From: Paul Libbrecht To: java-user@lucene.apache.org Sent: T

Re: best way to interest two queries?

2010-05-11 Thread Paul Libbrecht

at could work well for a). I still don't know what to do for b). thanks for hints. paul On 31 Mar 2010, at 23:00, Paul Libbrecht wrote: I've been wandering around but I see no solution yet: I would like to intersect two query results: going through the list of one query and indi

Re: Retrieving indexed field data

2010-05-04 Thread Paul Libbrecht

No way but re-index, that was easy in my case. paul On 4 May 2010, at 14:46, Licinio Fernández Maurelo wrote: Hi all, notice that i come from the solr-user mail list to get an answer ... We need to retrieve the indexed data for a field *indexed but not stored*(yes, i know this could sounds s

an analyzer map at hand?

2010-04-26 Thread Paul Libbrecht

Hello Luceners, I am sure I'm not the only one having such a snippet in my dedicated analyzer: m.put("en", new SnowballAnalyzer("English")); m.put("es", new SnowballAnalyzer("Spanish")); m.put("de", new SnowballAnalyzer("German")); m.put("dk", new SnowballAnal

Re: Designing a multilingual index

2010-04-02 Thread Paul Libbrecht

On 1 Apr 2010, at 16:29, henrib wrote: By issuing multiple queries, one against each localized index, results being clustered by locale. You can further refine by translating the end-user input query terms for each locale and issue "translated" queries against the respective indices. I've

Re: Designing a multilingual index

2010-04-01 Thread Paul Libbrecht

How? paul On 1 Apr 2010, at 14:19, henrib wrote: Finally, query expansion can also be used in the multiple indices case and might even use automated/guided translation.

best way to interest two queries?

2010-03-31 Thread Paul Libbrecht

Hello list, I've been wandering around but I see no solution yet: I would like to intersect two query results: going through the list of one query and indicating which ones actually match the other query or, even better, indicating that "passed this, nothing matches that query anymore".

Re: Designing a multilingual index

2010-03-31 Thread Paul Libbrecht

David, I'm doing exactly that. And I think there's one crucial advantage aside: multilingual queries: if your user requests "segment" you have no way to know which language he is searching for; erm, well, you have the user-language(s) (through the browser Accept-Language header for example)

Re: If you could have one feature in Lucene...

2010-02-24 Thread Paul Libbrecht

I would wish a highlighting feature that's fully integrated. paul On 24 Feb 2010, at 14:42, Grant Ingersoll wrote: What would it be?

Re: lucene webinterface

2010-02-16 Thread Paul Libbrecht

On 16 Feb 2010, at 17:40, luciusvorenus wrote: how can I build a web interface for my application? I read something with HTML table and php but I had no idea? Can anybody help me? Lucius, try solr. paul

"one of the terms"

2010-01-29 Thread Paul Libbrecht

Hello luceners, In our project, we are building queries from long list of possible terms (expanded through ontology deduction). I would like, however, that the rank is unaffected by the number of matches: one or thirty occurrences of one of the many words should give the same score. Did

are Lucene queries thread-safe?

2010-01-23 Thread Paul Libbrecht

Hello list, for some strange reason I wish to cache very frequent (and big, ~3000 terms) queries. Now, this might mean that a query is searched for in several threads on the same index. Do I run a risk? thanks in advance paul --

Re: a complete solution for building a website search with lucene

2010-01-08 Thread Paul Libbrecht

Zhou, Lucene is a back-end library, it's very useful for developers but it is not a complete site-search-engine. A lucene-based site-search-engine is Nutch, it does crawl. Solr also provides functions close to these with a large amount of thoughts on flexible integration; crawling methods are

Re: [ANN] Luke 0.9.9 release

2009-10-23 Thread Paul Libbrecht

Because I like to have Luke always sitting at hand, I have packed this release as a MacOSX disk-image and application. http://www.activemath.org/~paul/tmp/Luke-0.9.9.dmg The icon could be better (I need a hires of Lucene's icon, haven't found it yet). Potentially the packaging should

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht

, maxDocs=1) 1.6294457 = queryNorm 0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of: 1.0 = tf(termFreq(field:gesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.5 = fieldNorm(field=field, doc=0) On Wed, Oct 21, 2009 at 3:16 PM, Paul Libbrecht wrote: Can

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht

Can the dictionary have weights? überwachungsgesetz alone probably needs a higher rank than überwachung and gesetzt or? paul On 21 Oct 2009, at 21:09, Benjamin Douglas wrote: OK, that makes sense. So I just need to add all of the sub-compounds that are real words at posIncr=0, even if th

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht

I'm interested in this analyzer... it had escaped me and solves an old problem! Could you report about its usage: - did you have to feed words in a dictionary? - does anyone have user-measures already? ... and the last question for the research fun: is there any approach towards preferring Üb

Re: OpenRelevance

2009-10-16 Thread Paul Libbrecht

Not something for the very soon future, but I'd be interested to base on such an infrastructure for a mathematical-formulæ search corpus (both semantic and presentation math). I believe the OpenRelevance infrastructure might present a best practice or infrastructure to be based on for such.

Re: Review and questions about Lucene Java 2.9.0

2009-10-08 Thread Paul Libbrecht

Mehdi, your requirements sound to be fulfilled mostly by Apache Solr which is a web-based packaging of Lucene. paul. On 8 Oct 2009, at 10:11, Mehdi Ben Hamida wrote: Hello, I'm reviewing and doing some researches on Lucene Java 2.9.0, to check if it meets our needs. Unfortunat

Re: document diversity

2009-10-06 Thread Paul Libbrecht

Just as you can add a query that will boost better things with a higher quality, you can add a query for a higher revenue. Basically, the default operator "should" in boolean-clauses can be used exactly for that: do not force this query to be matched but raise boost if there's something tha

Re: phonetic encoders for other languages?

2009-08-23 Thread Paul Libbrecht

On 23 Aug 2009, at 17:05, Petite Abeille wrote: I will need to use phonetic analyzers to do "phonetic search". I know of the Metaphone analyzers and use them but they're really only known to work for English. Double Metaphone? http://en.wikipedia.org/wiki/Double_Metaphone thanks, I hadn't

phonetic encoders for other languages?

2009-08-23 Thread Paul Libbrecht

Hello list, I will need to use phonetic analyzers to do "phonetic search". I know of the Metaphone analyzers and use them but they're really only known to work for English. Does anyone have pointers to projects that encode phonetically words of other languages? I'm interested in French,

Re: wheres the word

2009-06-25 Thread Paul Libbrecht

On 25 Jun 2009, at 01:28, Mark Miller wrote: im figgering about the following problem. in my index i cant find the word BE, but it exists in two documents. im using lucene 2.4 with the standardanalyzer. other querys with words like de, et or de la works good. any ideas? be is a stopword. Do

Re: Lucene for the Mac

2009-06-08 Thread Paul Libbrecht

On 8 Jun 2009, at 23:55, Ian Vink wrote: Is there a Mac port of the Lucene engine? I don't get it, are you asking whether Lucene java works on MacOSX? answer is yes. Are you asking for a Cocoa and ObjC port? (don't know) paul

Re: Help Needed...

2009-05-28 Thread Paul Libbrecht

Kumar, you'll have to make your own documents after parsing the HTML yourself (e.g. with Nekohtml to dom). As for the weights of tokens, supplementarily to IDF, you can do that per field, i.e. when you add a field into the document. paul On 28 May 2009, at 12:22, Gaurav Kumar wrote:

Re: Lucene index on iPhone

2009-05-06 Thread Paul Libbrecht

Shashi, the only java I know for iphone is with Cydia on jailbroken iphones. Is this the type of things you're looking at? paul On 6 May 2009, at 12:08, Shashi Kant wrote: I am working on an iPhone application where the Lucene index needs to reside on-device (for multiple reasons). Has anyone

Re: Servlets Sharing Resources

2009-04-21 Thread Paul Libbrecht

Various servlets or various webapps? Various servlets is trivial, indeed using ServletContext.getAttribute(). Various webapps is more difficult: - you need to set cross context so that context.getContext("/otherpath") is accessible (a config of context in tomcat) - you need classes to be shared

Re: Indexing Complex XML

2009-04-18 Thread Paul Libbrecht

daniel, have a look at solr DIH, it has prebuilt tools to do just that. http://wiki.apache.org/solr/DataImportHandler This is based on solr, which is a web-application based on lucene. It does not necessarily need to be run as a web application though, it can be embedded. paul On 18 Apr

Re: semantic vectors

2009-04-06 Thread Paul Libbrecht

I am sorry Nitin, I may have injected you the doubt about this... semantic-vectors is a project based on Lucene: http://code.google.com/p/semanticvectors/ you probably want to look there and ask questions on the forum there. paul On 6 Apr 2009, at 22:45, Richard Marr wrote: Hi Nitin,

Re: How to know the matched field?

2009-03-24 Thread Paul Libbrecht

found = text; } } I still don't grasp why there's TextFragment(stringbuffer) and the pass through the tokenizers but removing any of them breaks my unit-test. I guess this is the whole idea behind LUCENE-1522 which I would up-take later. paul On 23

Re: Matching query terms

2009-03-23 Thread Paul Libbrecht

searcher.explain definitely seems to do the trick, going through the sub-queries. paul On 23 Mar 2009, at 13:12, Wouter Heijke wrote: I want to know for each term in a query if it matched the result or not. What is the best way to implement this? Highlighter seems to be able to do the tri

Re: How to know the matched field?

2009-03-23 Thread Paul Libbrecht

the solutions... On Sun, Mar 22, 2009 at 4:30 PM, Paul Libbrecht wrote: in an auto-completion task, I would like to show to the user the field that's been matched against the query in the found document. Typically, my documents have multiple fields for each field-name and I would like

How to know the matched field?

2009-03-22 Thread Paul Libbrecht

Hello list, in an auto-completion task, I would like to show to the user the field that's been matched against the query in the found document. Typically, my documents have multiple fields for each field-name and I would like the index's findings to give me the field used. How can I do t

robust inverse of query parser?

2009-03-20 Thread Paul Libbrecht

Hello luceners, query.toString() does a fair job at being reparsed by QueryParser but is there a safe way to do so? I have a lucene query object and want a string that QueryParser will reparse fairly exactly. thanks in advance paul


1 - 100 of 161 matches
