[ANNOUNCE] Apache Lucene 8.11.1 released

2021-12-17 Thread Jan Høydahl
The Lucene PMC is pleased to announce the release of Apache Lucene 8.11.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Th

Re: Distributed IDF for Solr using ExactStatsCache issue

2021-03-19 Thread Jan Høydahl
Hi, You may want to ask this question in the Solr Users mailing list instead of this one, which is dedicated to the Lucene Java library - https://solr.apache.org/community.html#mailing-lists-chat Jan > 16 Mar 2021 at 20

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-08 Thread Jan Høydahl
Congrats to Lucene on the new logo!! Jan Høydahl > On 8 Sep 2020, at 18:08, Simon Willnauer wrote: > > Thank you Ryan for pushing on this, being persistent and getting the vote > out. > > > >> On Tue, Sep 8, 2020 at 5:55 PM Ryan Ernst wrote: >> >> T

Re: [VOTE] Lucene logo contest, here we go again

2020-09-01 Thread Jan Høydahl
D (binding) Jan > On 1 Sep 2020, at 02:26, Ryan Ernst wrote: > > Dear Lucene and Solr developers! > > In February a contest was started to design a new logo for Lucene > [jira-issue]. The initial attempt [first-vote] to call a vote resulted in > some confusion on t

[ANNOUNCE] Apache Lucene 7.7.2 released

2019-06-05 Thread Jan Høydahl
4 June 2019, Apache Lucene™ 7.7.2 available The Lucene PMC is pleased to announce the release of Apache Lucene 7.7.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full

Re: Environmental Protection Agency: Stop Deforesting in Sri Lanka

2019-03-22 Thread Jan Høydahl
I'm surprised that msgs with "To: undisclosed-recipients:;" are even delivered to the list? I.e. should it be possible to BCC solr-user? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 21 Mar 2019, at 19:45, Noble Paul wrote: > >

RE: Link Lucene index with Adobe reader

2018-02-06 Thread Jan Tosovsky
ime PDF is > searched using index. Are you sure your index is used? Isn't that just kind of search cache? Acrobat Reader has to understand your index. I doubt it can work out-of-the-box. Jan - To unsubscrib

A new Snowball stemmer

2017-10-01 Thread Jan Tosovsky
class: new Among("ce", -1, 1) vs. new Among ("ce", -1, -1, "", methodObject) * in the find_among_b() method only two params are accepted What is the procedure for producing Lucene-compatible stemmers from SBL file? Is there any automation or should I modify that origi

Re: Research problems on numeric values into text (with. or,)

2016-09-27 Thread Jan Høydahl
Why is that a problem? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 27 Sep 2016, at 10:15, Jérémy GUYENOT wrote: > > Hello, > > we find research problems on numeric values into text (with. or,). Unable to > search 315.86 or 315.86. > >

RE: Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-21 Thread Jan-Willem van den Broek
" [p]calculon". Thanks for the suggestion though. I'd never even considered that we might be using illegal fieldnames. Regards, Jan-Willem -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, September 20, 2016 19:02 To: java-user Subject: Re

Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-20 Thread Jan-Willem van den Broek
unately, while I can reproduce the issue consistently in the full application, I don't yet have a clean test case with just/mostly Lucene code. Any feedback is much appreciated! Jan-Willem v/d Broek - To unsubscribe, e

IllegalStateExceptions after upgrading to 6.1.0

2016-06-23 Thread Jan-Willem van den Broek
tMergeScheduler.java:626) Kind regards, drs. Jan-Willem van den Broek, Developer

Search similar documents using dense vectors (alternative to MORELIKETHIS)

2016-02-24 Thread Jan Rygl
Hello, I would like to ask whether anybody has tried or planned to implement indexing for dense vectors. The default scoring process is suitable only for text documents, but we would like to use, support or develop a plugin that makes it possible to combine the default index with, or replace it by, a dense vector index for non-textual d

Re: Lucene roadmap for language analyzers

2016-02-22 Thread Jan Høydahl
Hi, Moving discussion to Lucene user list. You may want to look at these references: * http://lucene.472066.n3.nabble.com/JLemmaGen-project-td4097466.html * https://github.com/Amice13/ukr_stemmer -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 18. feb. 2016

Re: Concurrent indexing performance problem

2013-03-07 Thread Jan Stette
after each index has been fully populated, or alternatively do searches across multiple indexes. How would you expect such a solution to perform by comparison? Best regards, Jan On 7 March 2013 17:44, Michael McCandless wrote: > This sounds reasonable (500 M docs / 50 GB index), though you

Concurrent indexing performance problem

2013-03-07 Thread Jan Stette
educe the amount of time spent merging segments? - What can I do to improve concurrency of indexing? Any suggestions would be highly appreciated. Regards, Jan

Re: Multiple facets in Lucene searches

2012-11-22 Thread Jan Stette
That works great, thanks! Jan On 21 November 2012 19:52, Shai Erera wrote: > Hi Jan, > > Basically, DrillDown is a helper class for creating such queries. You're > right that its query() methods create AND, because that's normally the > case, but if you requ

Re: Exact Match Query in Lucene with SnowBall Analyzer

2012-05-09 Thread Jan Høydahl
Hi, The behavior is expected with stemming. Have you tried using StandardAnalyzer, which does not do stemming? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9 May 2012, at 08:12, Yogesh patel wrote: > I am using Lucene an

Re: [ANNOUNCE] Apache Lucene 3.3

2011-07-05 Thread Jan Engler
Hi Robert, thanks a lot... found the right one ;-) Thx again, Jan On 05.07.2011 14:34, Robert Muir wrote: > Hi Jan, > > * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the > queryparsers under contrib/misc and contrib/surround into > contrib/querypar

Re: [ANNOUNCE] Apache Lucene 3.3

2011-07-05 Thread Jan Engler
Hi, does anyone know where I could find the class "ChainedFilter" in Lucene 3.3? Before our upgrade (from 3.0.2 to 3.3.3) it was located in lucene-misc... but I cannot find it at that location anymore... Thx for your help, Jan On 01.07.2011 07:56, Robert Muir wrote: > July

Re: WELCOME to java-user@lucene.apache.org

2011-07-03 Thread Jan Rothhaar
; 'ö'), but in a multi-byte charset. I have some modest experience in programming in Java, but am far from being a guru. Any help is appreciated. Thanks in advance, Jan

Re: [Announce] RankingAlgorithm ver 1.0

2011-01-27 Thread Jan Burse
Do you take into account the links between the documents? Nagendra Nagarajayya wrote: Hi! I would like to announce RankingAlgorithm. RankingAlgorithm is a new search algorithm that seems to enable Solr to return results comparable to Google site search results, and much better than Lucene

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Jan Engler
Where do you get your Lucene/Solr downloads from? [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company

SpanOrQuery with unreasonable high scores

2010-12-07 Thread Jan Kurella
this point in time nor does the Spans concept support boosts or norms. so how do I get the SpanOrQuery best normed to be comparable to a BooleanQuery? Jan - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For

Re: DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
On 26.11.2010 14:50, ext Jan Kurella wrote: On 26.11.2010 14:39, ext Jan Kurella wrote: Hi there, I was composing a Query like the Solr.DisMaxQueryHandler would do on my own as I needed a different Tokenizing strategy for non whitespace separated languages and more. The concept I took from

Re: DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
On 26.11.2010 14:39, ext Jan Kurella wrote: Hi there, I was composing a Query like the Solr.DisMaxQueryHandler would do on my own as I needed a different Tokenizing strategy for non whitespace separated languages and more. The concept I took from http://www.lucidimagination.com/blog/2010/05

DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
correct way and I do not see a way to know in DisjunctionMaxQuery.DisjunctionMaxWeight.sumOfSquaredWeights() whether a returned currentWeight.sumOfSquaredWeights() comes from a TermQuery whose only term has a df of 0? How to solve this problem to get a "better" sumOfSquaredWeights() from DisMaxQuery? The curre

Re: custom attributs in tokens

2010-11-25 Thread Jan Kurella
Hi Simon, On 25.11.2010 10:40, ext Simon Willnauer wrote: Hi Jan, On Wed, Nov 24, 2010 at 9:12 AM, wrote: Of course: We are trying to search in documents that contain text in several languages. We are also investigating other approaches*, so this is not about finding other variants. the

Re: uncorrect results

2010-11-18 Thread Jan
't know how ;) Anyway a scoring system should not "invent" token i think. but thanks jan On Thursday, 18.11.2010, at 13:05 -0500, Pulkit Singhal wrote: > Briefly looked at your code and there is no way that I'm right about > this but I'll say it anyway: > Every sing

Re: uncorrect results

2010-11-17 Thread Jan
that's what baffles me. thanks for the very quick reaction jan On Wednesday, 17.11.2010, at 12:57 -0500, Donna L Gresh wrote: > As it is probably more likely that you're doing something incorrect than > that Lucene is reporting incorrect results :), it might help if you > reported t

uncorrect results

2010-11-17 Thread Jan
/TextAnalytics2/tree/master/src/textanalytics2/ Thanks in advance jan PS: I use Lucene 3.0.2 and the OpenJDK Runtime Environment (IcedTea6 1.8.2) on a 64-bit Linux machine.

Query Formalism for Texts with Program Code

2010-10-28 Thread Jan Burse
Dear all, I was setting up a web search with a query language that uses (, !, ), ^, *, ?, {, } and < in its syntax. For example: hot dog: Looks for documents with hot and dog in close vicinity. (hot dog): Looks for documents with hot or dog in it. This all

Case insensitive search

2010-10-08 Thread Jan Engler
o not store the results in lowercase. Is there any possibility to leave the index as it is (mixed lowercase and uppercase) but allow the search to be case-insensitive? I am using Lucene 3.0.2 and (for now) the StandardAnalyzer... Thanks for your help and best wishes from Germany,
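One common way to get this behaviour is to normalise case for matching only, and keep the original text for display. The sketch below is plain Java, not Lucene code: `CaseInsensitiveIndex`, `add` and `search` are hypothetical names used purely for illustration. It shows the idea that the lookup key is lowercased at both index time and query time while the stored value stays untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index, not part of Lucene: terms are lowercased for
// matching only; the original document text is stored unchanged.
class CaseInsensitiveIndex {
    private final Map<String, List<String>> postings = new HashMap<>();

    // Index a document under each whitespace-separated term, lowercased.
    void add(String document) {
        for (String term : document.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            String key = term.toLowerCase(Locale.ROOT);
            List<String> docs = postings.computeIfAbsent(key, k -> new ArrayList<>());
            if (!docs.contains(document)) {
                docs.add(document);
            }
        }
    }

    // The query term goes through the same normalisation as the indexed terms.
    List<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }
}
```

With this sketch, searching for `LUCENE`, `lucene` or `Lucene` returns the same documents, each in its original mixed-case form.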

Re: Unintelligent implementation of IndexWriter locking?

2009-08-30 Thread Jan Peter Stotz
about the "locking file" in the JavaDoc of the IndexWriter class that explains how to specify the LockFactory? I am currently using FSDirectory.getDirectory(File,LockFactory) and then use that Directory to create my IndexWriter. That would make it easier for others to find their way..

Unintelligent implementation of IndexWriter locking?

2009-08-30 Thread Jan Peter Stotz
ucene not use a more intelligent approach, e.g. the one via RandomAccessFile as it is presented here: http://jimlife.wordpress.com/2008/07/21/java-application-make-sure-only-singleone-instance-running-with-file-lock-ampampampampamp-shut
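For readers unfamiliar with the approach being suggested, here is a minimal sketch of an OS-level file lock taken through `RandomAccessFile` and `FileChannel.tryLock()`. It is not Lucene's locking code and not the code from the linked post; `ProcessLock`, `tryAcquire` and `release` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical helper, not Lucene's implementation: holds an exclusive
// lock on a file for as long as the owning process keeps it open.
class ProcessLock {
    private RandomAccessFile file;
    private FileLock lock;

    // Returns true if this object holds an exclusive lock on lockFile.
    boolean tryAcquire(File lockFile) {
        if (lock != null) {
            return true; // already held by this object
        }
        try {
            file = new RandomAccessFile(lockFile, "rw");
            try {
                // null means another process holds the lock
                lock = file.getChannel().tryLock();
            } catch (OverlappingFileLockException e) {
                lock = null; // the lock is already held inside this JVM
            }
            if (lock == null) {
                file.close();
                file = null;
                return false;
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock and closes the underlying file.
    void release() {
        try {
            if (lock != null) {
                lock.release();
                lock = null;
            }
            if (file != null) {
                file.close();
                file = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The operating system drops such a lock when the process exits, so a crash does not leave a stale lock behind. Note that the `FileLock` documentation describes platform-dependent behaviour (for example, on some systems closing any channel on the file releases the JVM's locks on it), so this is a sketch rather than a drop-in replacement.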

lucene score and float precision

2009-04-29 Thread Jan Paetzold
-- Explanation: 21.118654 = (MATCH) sum of: ... The explanation function reports a score as expected, compared to the other results of the search, while the ScoreDoc score is a bit too low or too high. Does anyone have an idea? regards, Jan
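A small difference between two computations of the same score is consistent with single-precision rounding: float addition is not associative, so the same terms summed in a different order can give a different total. Whether that is what happens in this poster's case is an assumption; the snippet does not say. The class and method names below are hypothetical, and the values are chosen only to make the effect obvious.

```java
// Hypothetical demo, unrelated to Lucene's scoring code: the same terms
// summed in float give different totals depending on the order.
class FloatSumOrder {
    // Sums left to right in single precision.
    static float sum(float[] terms) {
        float total = 0f;
        for (float t : terms) {
            total += t;
        }
        return total;
    }

    // Same sum, accumulated in double precision.
    static double sumAsDouble(float[] terms) {
        double total = 0.0;
        for (float t : terms) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        float[] a = {1e8f, 1f, -1e8f}; // the 1 is lost when added to 1e8 in float
        float[] b = {1e8f, -1e8f, 1f}; // the large terms cancel first, so the 1 survives
        System.out.println(sum(a));         // 0.0
        System.out.println(sum(b));         // 1.0
        System.out.println(sumAsDouble(a)); // 1.0
    }
}
```

When comparing scores from two code paths, comparing within a small tolerance is usually more robust than testing floats for exact equality.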

Re: Problem using Lucene on Ubuntu

2008-02-18 Thread Jan Peter Stotz
ncoding. I would print out the extracted text into plain text files and compare if there are differences between the file generated on Windows and Linux/Ubuntu. This lets you determine whether this is a WordExtractor or a Lucene prob

remote stored index

2008-02-17 Thread Jan Pieper
pload index" but I don't think that this is a good solution. Also read about an index server. -- Jan

Re: Using Lucene 2.3.0 with PDFBox

2008-02-13 Thread Jan Peter Stotz
ile.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED)); Jan

Re: Supported File Formats - PDF, MHT

2008-02-12 Thread Jan Peter Stotz
text representation. For details please see the Lucene FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#head-c45f8b25d786f4e384936fa93ce1137a23b7e422 Jan

Re: Using a QueryParser with an untokenized field?

2008-02-01 Thread Jan Peter Stotz
er, you will be unable to find elements that contain uppercase characters. Jan

Re: Creating an alias for a field name?

2008-01-19 Thread Jan Peter Stotz
just ask. I did not want to attach/include them because of their size (most of the code has been generated by Eclipse...) Jan

Creating an alias for a field name?

2008-01-19 Thread Jan Peter Stotz
Hi, I would like to provide multiple field-names that are all mapped to the same field in the background (e.g. a long field-name and a short field-name). Is there any mechanism for creating such field-aliases, maybe in the IndexWriter or a QueryParser?
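One possible approach, independent of whatever the library itself offers, is to rewrite alias prefixes in the query string before it reaches the parser. The sketch below is plain Java and makes no claim about Lucene's API; `FieldAliases`, `alias` and `rewrite` are hypothetical names. It assumes a `field:term` query syntax and does not treat quoted phrases specially.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step: maps alias field names in a query
// string to the real field name before the query is parsed.
class FieldAliases {
    // A field prefix is an identifier immediately followed by a colon.
    private static final Pattern FIELD_PREFIX =
            Pattern.compile("\\b([A-Za-z_][A-Za-z0-9_]*):");

    private final Map<String, String> aliasToField = new HashMap<>();

    void alias(String alias, String field) {
        aliasToField.put(alias, field);
    }

    // Replaces every known alias prefix; unknown field names pass through.
    // Caveat: text inside quoted phrases is rewritten too.
    String rewrite(String query) {
        Matcher m = FIELD_PREFIX.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String field = aliasToField.getOrDefault(name, name);
            m.appendReplacement(out, Matcher.quoteReplacement(field + ":"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, after `alias("t", "title")`, the query `t:lucene AND author:jan` becomes `title:lucene AND author:jan`, so both the short and the long name reach the same indexed field.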

Problems while indexing

2007-10-31 Thread Jan F.
gives me an error message. Sadly very short: "caught a class java.lang.StringIndexOutOfBoundsException with message: String index out of range: -1" Has anyone worked with large websites and Lucene before? Solutions? The error occurs with Lucene 2.2 and 2.0.

How to skip menu structure while parsing HTML sites?

2006-12-21 Thread Jan Francsi
ue() | )* { return t; } } void CommentTag() : {} { ( ( )* ) | ( ( )* ) } void ScriptTag() : {} { ( )* } TOKEN : { < ScriptStart: " : WithinScript | < TagName: "<" ("/")? ["A"-"Z","a"-"z"] ()? > : With

Re: Problem with Field.Text()

2006-10-05 Thread Jan Pieper
Yeah it works :) thanks to all, for help. You have to create a new Field class with "new Field(...", i.e. replace doc.add(Field.Text with doc.add(new Field(... Antony Jan Pieper wrote: No it is not your fault, it is mine, but it also does not function. My compiler giv

Re: Problem with Field.Text()

2006-10-05 Thread Jan Pieper
, org.apache.lucene.document.Field.Store, org.apache.lucene.document.Field.Index ); I found the declaration in the docs but it does not work. I do not know why. I downloaded the current stable version: 2.0.0. -- Jan Blah. Sorry for the

Problem with Field.Text()

2006-10-05 Thread Jan Pieper
is the mistake? -- Jan

Re: how to free memory after index ist build.

2005-08-03 Thread Jan Philipp Seng
have any experience with memory leaks with this connector and how to avoid them ? Bye bye, Jan > However, i see a few problems in your code. > 1) you should take the JDBC code for getting the connection and > creation of an SQL statement out of that method, so it is not called > repe

Re: how to free memory after index ist build.

2005-08-03 Thread Jan Philipp Seng
conn.close(); conn = null; stmt.close(); stmt = null; rs.close(); rs = null; Bye for now. Jan
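The snippet above closes the connection before the statement and the result set. JDBC resources are normally released in the reverse order of acquisition (result set, then statement, then connection). On Java 7 and later, which postdates this thread, try-with-resources does that automatically. The sketch below uses a hypothetical `Tracked` stand-in instead of real `Connection`, `Statement` and `ResultSet` objects so that it runs without a database driver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of close ordering; Tracked stands in for JDBC's
// Connection, Statement and ResultSet.
class CloseOrderDemo {
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name); // record the order in which resources are closed
        }
    }

    // Opens three resources and returns the order in which they were closed.
    static List<String> run() {
        List<String> closed = new ArrayList<>();
        try (Tracked conn = new Tracked("conn", closed);
             Tracked stmt = new Tracked("stmt", closed);
             Tracked rs = new Tracked("rs", closed)) {
            // read from rs and build the index here
        }
        return closed; // [rs, stmt, conn]
    }
}
```

Resources declared in one try-with-resources header are closed in reverse declaration order, and they are closed even if the body throws.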

how to free memory after index ist build.

2005-08-02 Thread Jan Philipp Seng
ex oldIndex.delete(); newIndex.renameTo(oldIndex); // load the new index to RAMDirectory from harddisc SearchTables.reloadIndex(); Thanks for your help, Jan Philipp Seng, Germany, Aachen