fuzzy prefix query?

2015-12-06 Thread Clemens Wyss DEV
Is it possible to do a fuzzy prefix query in Lucene? E.g. the term "foh*~" should match: foo, foobar, fohbar, ... i.e. fuzziness should be applied to the given prefix. Thx - Clemens
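
To make the intended matching semantics concrete, here is a minimal plain-Java sketch. It does not use any Lucene API; the class and method names are hypothetical, and the edit budget of 1 is an assumption. It treats a term as a match when some prefix of the term is within the allowed edit distance of the query prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Lucene class.
final class FuzzyPrefixSketch {

    /** Smallest Levenshtein distance between the query and any prefix of the term. */
    static int prefixDistance(String query, String term) {
        int[] prev = new int[term.length() + 1];
        int[] curr = new int[term.length() + 1];
        // Row 0: the empty query against a term prefix of length j costs j edits.
        for (int j = 0; j <= term.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= query.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= term.length(); j++) {
                int cost = query.charAt(i - 1) == term.charAt(j - 1) ? 0 : 1;
                int substitute = prev[j - 1] + cost;
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // The last row holds the distance to every term prefix; take the minimum.
        int best = Integer.MAX_VALUE;
        for (int d : prev) {
            best = Math.min(best, d);
        }
        return best;
    }

    /** Terms for which some prefix is within maxEdits of the query prefix. */
    static List<String> matches(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (prefixDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With the query prefix "foh" and one allowed edit, this keeps foo, foobar and fohbar from the example and rejects an unrelated term such as "bar". Scanning every term like this would be far too slow on a real index; the sketch only pins down what "fuzzy prefix" is supposed to mean.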

AW: Luke for Lucene 5.x?

2015-04-29 Thread Clemens Wyss DEV
+09:00 Clemens Wyss DEV clemens...@mysign.ch: I'll give it a try. Any plans for Luke to support Lucene 5.x, too? -Original Message- From: Koji Sekiguchi [mailto:koji.sekigu...@rondhuit.com] Sent: Friday, 24 April 2015 08:23 To: java-user@lucene.apache.org Subject: Re

AW: Luke for Lucene 5.x?

2015-04-25 Thread Clemens Wyss DEV
stands for Natural Language Processing for Lucene, has a function for browsing Lucene index aside from NLP tools. It supports 5.x index format. https://github.com/NLP4L/nlp4l#using-lucene-index-browser Thanks, Koji On 2015/04/24 15:10, Clemens Wyss DEV wrote: From time to time I make use of luke

Luke for Lucene 5.x?

2015-04-24 Thread Clemens Wyss DEV
From time to time I make use of luke to inspect lucene indexes. I appreciate this tool very much. Will there be a version of Luke for Lucene 5.x? Or is there one already? Thx Clemens

Lucene 5 : createComponents without reader

2015-02-23 Thread Clemens Wyss DEV
My custom Analyzer had the following (Lucene 4) impl of createComponents: protected TokenStreamComponents createComponents ( final String fieldName, final Reader reader ) { Tokenizer source = new KeywordTokenizer( reader );

AW: Lucene 4.x - 5 : IllegalStateException while sorting

2015-02-23 Thread Clemens Wyss DEV
consuming than FieldCache. [1] https://wiki.apache.org/lucene-java/ReleaseNote50 On Mon, Feb 23, 2015 at 1:24 PM, Clemens Wyss DEV clemens...@mysign.ch wrote: After upgrading to Lucene 5 one of my unit tests, which tests sorting, fails with: unexpected docvalues type NONE for field 'providertestfield

AW: Lucene 5 : createComponents without reader

2015-02-23 Thread Clemens Wyss DEV
Got this one sorted out. I was still referencing the 4.x lucene-analyzers.jar which required the reader ;) Sorry for the noise! -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Monday, 23 February 2015 12:42 To: java-user@lucene.apache.org Subject

Lucene 4.x - 5 : IllegalStateException while sorting

2015-02-23 Thread Clemens Wyss DEV
After upgrading to Lucene 5 one of my unit tests, which tests sorting, fails with: unexpected docvalues type NONE for field 'providertestfield' (expected=SORTED). Use UninvertingReader or index with docvalues. What am I missing?

AW: Lucene 4.x - 5 : IllegalStateException while sorting

2015-02-23 Thread Clemens Wyss DEV
, you should really enable DocValues for fields you want to sort on. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Monday, February 23, 2015 2

Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Clemens Wyss DEV
When I index a Document with an IntField and then find that very Document the former IntField is returned as StoredField. How do I determine the original fieldtype (IntField, LongField, DoubleField ...)? Must I ? Number number = Field.numericValue(); if( number != null ) { if( number

AW: Indexing an IntField but getting StoredField from found Document

2015-02-19 Thread Clemens Wyss DEV
or whatever. But if you know you stored it as an IntField then surely you already know it's an integer? Unless you sometimes store different things in the one field. I wouldn't do that. -- Ian. On Thu, Feb 19, 2015 at 12:22 PM, Clemens Wyss DEV clemens...@mysign.ch wrote: When I index

[tika] ForkParser, Lost connection to a forked server process

2015-02-17 Thread Clemens Wyss DEV
Sorry for cross-posting, but the tika-ml does not seem to be too lively: I am trying to make use of the ForkParser. Unfortunately I am getting „Lost connection to a forked server process“ for an (encrypted) pdf which I can extract „in-process“. Extracting the document in-process takes approx

AW: LowercaseFilter, preserveOriginal?

2015-01-27 Thread Clemens Wyss DEV
only ... -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Tuesday, 27 January 2015 09:08 To: java-user@lucene.apache.org Subject: LowercaseFilter, preserveOriginal? Why does the LowerCaseFilter, as opposed to the ASCIIFoldingFilter, have

LowercaseFilter, preserveOriginal?

2015-01-27 Thread Clemens Wyss DEV
Why does the LowerCaseFilter, as opposed to the ASCIIFoldingFilter, have no preserveOriginal argument? I very much appreciate preserveOriginal=true when applying the ASCIIFoldingFilter for (German) suggestions

AW: fuzzy/case insensitive AnalyzingSuggester )

2015-01-24 Thread Clemens Wyss DEV
- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Friday, June 20, 2014 6:47 AM To: java-user@lucene.apache.org Subject: AW: fuzzy/case insensitive AnalyzingSuggester ) Sorry for re-asking. Has anyone implemented an AnalyzingSuggester which - is fuzzy - is case insensitive (or must

howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
We have documents that are not always visible (visiblefrom-visibleto). In order to not have to query the originating object of the document whether it is currently visible (after the query), we'd like to put metadata into the documents, so that the visibility can be determined at query-time (by

AW: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
in ms]) OR ( visiblefrom:[* TO now in ms] AND visibleto:[ now in ms TO *]) -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Monday, 12 January 2015 09:40 To: java-user@lucene.apache.org Subject: howto: handle temporal visibility of a document? We have

AW: AW: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
. -Mike On 1/12/15 4:23 AM, Clemens Wyss DEV wrote: I'll add/start with my proposal ;) Document-meta fields: + visiblefrom [long] + visibleto [long] Query or query filter: (*:* -visiblefrom:[* TO *] AND -visibleto:[* TO *]) OR (*:* -visiblefrom:[* TO *] AND visibleto:[ now in ms
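
The proposal above amounts to a per-document predicate over two optional long fields. The following plain-Java sketch spells out that predicate; it is not Lucene code, the names are hypothetical, and because the archived snippet truncates the middle of the query, the handling of documents with only one of the two fields set is an assumption (a missing bound is treated as open-ended).

```java
// Hypothetical illustration only; not a Lucene class.
final class VisibilitySketch {

    /**
     * A document is visible when "now" lies inside [visibleFrom, visibleTo].
     * A null bound means the field is absent and the range is open on that side
     * (assumption; the archived query text is truncated).
     * Bounds are inclusive, mirroring the [a TO b] range syntax.
     */
    static boolean isVisible(Long visibleFrom, Long visibleTo, long nowMs) {
        boolean started = visibleFrom == null || visibleFrom <= nowMs;
        boolean notEnded = visibleTo == null || visibleTo >= nowMs;
        return started && notEnded;
    }
}
```

A document with neither field is always visible, one with only visibleto set is visible until that instant, and so on. Each OR-ed clause of the proposed query corresponds to one combination of present and absent fields.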

RE: RE: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
to use this very query as query filter (qf), but I guess it doesn't make sense because 'now in ms' changes at every call ;) -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Monday, 12 January 2015 17:14 To: java-user@lucene.apache.org Subject: AW: AW

AW: Looking for docs that have certain fields empty (and/or not set)

2015-01-07 Thread Clemens Wyss DEV
:[* TO *] (That's asterisk:asterisk -field1:[* TO *] in case the silly list interprets the asterisks as markup) There's some special magic in filter query processing to handle this case, but not in the main query parser. Best, Erick On Wed, Jan 7, 2015 at 8:14 AM, Clemens Wyss DEV clemens...@mysign.ch
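
The reason for the leading match-all clause can be pictured as set subtraction: a prohibited clause only removes documents from a set that some other clause has selected, so there must be a positive clause to start from. A minimal plain-Java sketch of that idea follows (not Lucene code; the names and the document ids are made up for illustration).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not a Lucene class.
final class MissingFieldSketch {

    /**
     * Models "*:* -field1:[* TO *]": start from all documents (the *:* clause)
     * and remove every document that has any value in the field.
     */
    static Set<Integer> docsWithoutField(Set<Integer> allDocs, Set<Integer> docsWithField) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(docsWithField);
        return result;
    }
}
```

Without the match-all clause there is no starting set to subtract from, which is why the purely negative form needs the special handling mentioned above.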

Looking for docs that have certain fields empty (and/or not set)

2015-01-07 Thread Clemens Wyss DEV
Say I wanted to find documents which have no content in field1 (or documents that have no field 'field1'), wouldn't that be the following query? -field1:[* TO *] Thanks for your help Clemens

AW: Looking for docs that have certain fields empty (and/or not set)

2015-01-07 Thread Clemens Wyss DEV
to handle this case, but not in the main query parser. Best, Erick On Wed, Jan 7, 2015 at 8:14 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Say I wanted to find documents which have no content in field1 (or documents that have no field 'field1'), wouldn't that be the following query? -field1

batch-update-pattern, NoMergeScheduler?

2014-12-22 Thread Clemens Wyss DEV
One of our indexes is completely updated quite frequently (batch update or re-index). When that happens, more than 2 million documents are added to or updated in that index. This creates an immense I/O load on our system. Does it make sense to set the merge scheduler to NoMergeScheduler (and/or MergePolicy to

[suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? So my goal is to provide suggestions from a subset of all documents in an index. Note: I have a similar discussion ongoing on the Solr mailing list. But I

AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Subject: Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)? On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote: Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? Yes, it is possible. I do it by feeding

QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load

2014-06-26 Thread Clemens Wyss DEV
The following testcase runs endlessly and produces VERY heavy load. ... String query = Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut + labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et

AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-22 Thread Clemens Wyss DEV
suggestions are merged into the final result list, helps improving the user experience, at least with our use cases. Cheers, Oli -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Friday, June 20, 2014 6:47 AM To: java-user@lucene.apache.org Subject: AW: fuzzy

AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-20 Thread Clemens Wyss DEV
Sorry for re-asking. Has anyone implemented an AnalyzingSuggester which - is fuzzy - is case insensitive (or must/should this be implemented by the analyzer?) - does infix search [- has a small memory footprint] -Original Message- From: Clemens Wyss DEV [mailto:clemens

AW: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
to work; if it doesn't it's a bad bug :) Can you reduce it to a small example? Mike McCandless http://blog.mikemccandless.com On Wed, Jun 18, 2014 at 10:08 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I would like to perform a batch update on an index. In order to omit duplicate entries I am

AW: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
://blog.mikemccandless.com On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV clemens...@mysign.ch wrote: directory = new SimpleFSDirectory( indexLocation ); IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 )); indexWriter = new

IndexWriter#updateDocument(Term, Document)

2014-06-18 Thread Clemens Wyss DEV
I would like to perform a batch update on an index. In order to omit duplicate entries I am making use of IndexWriter#updateDocument(Term, Document) open an IndexWriter; foreach( element in elementsToBeUpdatedWhichHaveDuplicates ) { doc = element.toDoc(); indexWriter.updateDocument(

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-15 Thread Clemens Wyss DEV
Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged On Fri, Jun 13, 2014 at 8:53 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Thanks a lot! large text fields What is a good limit (in characters) to switch from StringField to TextField? Do language analyzers (e.g. GermanAnalyzer

AW: AW: Analyzing suggester for many fields

2014-06-13 Thread Clemens Wyss DEV
something similar, adding weighting as some function of doc freq (and using Scala). Cheers, Neil On 13/06/14 00:19, Clemens Wyss DEV wrote: enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit ) -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-13 Thread Clemens Wyss DEV
, May 21, 2014 at 3:17 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Can you just decrease IW's ramBufferSizeMB to relieve the memory pressure? +1 Is there something alike for IndexReaders? No, although you can take steps during indexing to reduce the RAM required during searching, e.g. limit

fuzzy/case insensitive AnalyzingSuggester )

2014-06-13 Thread Clemens Wyss DEV
Looking for an AnalyzingSuggester which supports - fuzziness - case insensitivity - small (in-memory) footprint (*) (*) Just tried to hand my big IndexReader (see other post [lucene 4.6] NPE when calling IndexReader#openIfChanged) into JaspellLookup. Got an OOM. Is there any (Jaspell)Lookup

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-13 Thread Clemens Wyss DEV
: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, 13 June 2014 13:15 To: Lucene Users Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged On Fri, Jun 13, 2014 at 3:02 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: limit how many fields have norms enabled

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Wednesday, 11 June 2014 12:57 To: java-user@lucene.apache.org Subject: AW: Analyzing suggester for many fields Unfortunately the link provided by Goutham is no longer valid. Anybody still got the code? -Original

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit ) -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Thursday, 12 June 2014 16:01 To: java-user@lucene.apache.org Subject: AW: Analyzing suggester for many fields trying to re-build

AW: Analyzing suggester for many fields

2014-06-11 Thread Clemens Wyss DEV
Unfortunately the link provided by Goutham is no longer valid. Anybody still got the code? -Original Message- From: Goutham Tholpadi [mailto:gtholp...@gmail.com] Sent: Thursday, 29 August 2013 06:21 To: java-user@lucene.apache.org Subject: Re: Analyzing suggester for many

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-21 Thread Clemens Wyss DEV
] NPE when calling IndexReader#openIfChanged On Mon, May 19, 2014 at 6:14 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Mike, first of all thanks for all your input, I really appreciate (as much as I like reading your blog). You're welcome! Hmm, but you swap these files over while

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
directly from the index directory yourself between reopens? Mike McCandless http://blog.mikemccandless.com On Mon, May 19, 2014 at 1:36 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Sorry for being imprecise java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
after deleteAll? -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Monday, 19 May 2014 11:05 To: Lucene Users Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged On Mon, May 19, 2014 at 4:59 AM, Clemens Wyss DEV clemens

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
://blog.mikemccandless.com On Wed, May 14, 2014 at 2:16 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Tackled this down a little bit more: Lucene40LiveDocsFormat#readLiveDocs calls IndexFileNames#fileNameForGeneration If I get this right, param 'gen' seems to be -1. Gen is being gathered from

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
2014 16:51 To: Lucene Users Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged But what is the output of java -fullversion? Mike McCandless http://blog.mikemccandless.com On Sun, May 18, 2014 at 5:24 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: What java version? We

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
a concurrency/timing issue? -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Monday, 19 May 2014 07:37 To: java-user@lucene.apache.org Subject: AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged Sorry for being imprecise java version 1.6.0_26 Java

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-15 Thread Clemens Wyss DEV
Wyss DEV [mailto:clemens...@mysign.ch] Sent: Tuesday, 13 May 2014 18:23 To: java-user@lucene.apache.org Subject: [lucene 4.6] NPE when calling IndexReader#openIfChanged I am facing the following stacktrace: java.lang.NullPointerException: null at java.io.File.<init>(File.java

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
But if I close it, given that it is shared by multiple threads, I will need to check each time before doing the search whether the IndexReader is still open, correct? You can make use of IndexReader#incRef/#decRef, i.e. ir.incRef(); try { Or maybe SearcherManager
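
The incRef/decRef idea is ordinary reference counting: each thread takes a reference before searching and gives it back in a finally block, and the resource is only released when the last reference is gone. The plain-Java sketch below illustrates the pattern with a stand-in class; it is not Lucene's IndexReader, and every name in it is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared, reference-counted reader; not a Lucene class.
final class RefCountedSketch {
    // The owner that created the resource holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Takes a reference unless the resource has already been released. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Gives a reference back; the last one releases the resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would free files and buffers here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }

    /** What a searching thread would do: acquire, use, always release. */
    static String searchWith(RefCountedSketch reader) {
        if (!reader.tryIncRef()) {
            return "reader already closed, reopen needed";
        }
        try {
            return "searched";
        } finally {
            reader.decRef();
        }
    }
}
```

With this pattern a thread never has to ask "is it still open?" as a separate step: acquiring the reference either succeeds, in which case the resource stays alive until the matching release, or fails cleanly.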

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
Not closing an IndexReader most probably (to say the least) results in a memory leak and eventually an OOM. But if I close it, given that it is shared by multiple threads, I will need to check each time before doing the search whether the IndexReader is still open, correct? You can make use of IndexReader#incRef/#decRef ,

[lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-13 Thread Clemens Wyss DEV
I am facing the following stacktrace: java.lang.NullPointerException: null at java.io.File.<init>(File.java:305) ~[na:1.6.0_26] at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80) ~[lucene-core.jar:4.6.0 1543363 - simon - 2013-11-19 11:05:50]

porting a custom Analyzer from 3.6 - 4.0

2012-12-09 Thread Clemens Wyss DEV
I have a CustomAnalyzer which overrides public final TokenStream tokenStream ( String fieldName, Reader reader ): @Override public final TokenStream tokenStream ( String fieldName, Reader reader ) { boolean fieldRequiresExactMatching = IndexManager.getInstance().isExactMatchField( fieldName );

Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
I am (also) running lucene unit tests. In the teardown-method(@After) I (try to) delete the complete directory-folder. Unfortunately this does not always work. If not, the file _0_nrm.cfs (or _0.fdx) is the first to cause problems, i.e. is being locked... I do explicitly close the

AW: Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
@lucene.apache.org Subject: Re: Lucene (4.0), junit, failed to delete _0_nrm.cfs Can you post the source code for your test case? Mike McCandless http://blog.mikemccandless.com On Sun, Dec 9, 2012 at 11:45 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I am (also) running lucene unit tests

AW: Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
, and extends LuceneTestCase, using newDirectory and so on. if you have files still open this will fail the test and give you a stacktrace of where you initially opened the file. On Sun, Dec 9, 2012 at 12:28 PM, Clemens Wyss DEV clemens...@mysign.ch wrote: Hi Mike, unfortunately not. When I run

Alternative for WildcardQuery with leading *

2012-12-07 Thread Clemens Wyss DEV
In order to provide suggestions our query also includes a WildcardQuery with a leading *, which, of course, has a HUGE performance impact :-( E.g. Say we have indexed vacancyplan, then if a user typed plan he should also be offered vacancyplan ... How can this feature be implemented without

AW: Alternative for WildcardQuery with leading *

2012-12-07 Thread Clemens Wyss DEV
to nalp* :). You can also index the suffixes of words, e.g. vacancyplan, acancyplan, cancyplan and so forth, and then convert the query *plan to plan. Note that it increases the size of the lexicon! Shai On Fri, Dec 7, 2012 at 11:16 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: In order to provide
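
Both ideas in this reply turn a leading-wildcard lookup into a cheap prefix lookup by doing extra work at index time. The plain-Java sketch below shows the two transformations on bare strings; it is not Lucene code, the names are hypothetical, and the minimum suffix length parameter is an assumption added to keep the number of extra terms bounded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Lucene class.
final class SuffixIndexSketch {

    /** All suffixes of the term that are at least minLength characters long. */
    static List<String> suffixes(String term, int minLength) {
        int min = Math.max(1, minLength); // never emit the empty suffix
        List<String> out = new ArrayList<>();
        for (int start = 0; start + min <= term.length(); start++) {
            out.add(term.substring(start));
        }
        return out;
    }

    /** Reverses a term, as would be done for a second, reversed field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /** Rewrites a leading-wildcard query such as "*plan" into the reversed prefix query "nalp*". */
    static String toReversedPrefixQuery(String leadingWildcardQuery) {
        if (!leadingWildcardQuery.startsWith("*")) {
            throw new IllegalArgumentException("expected a query starting with '*'");
        }
        return reverse(leadingWildcardQuery.substring(1)) + "*";
    }
}
```

Indexing the suffixes of vacancyplan lets a plain lookup of "plan" find it, at the cost of several extra terms per word. The reversed variant adds only one extra term per word but only helps with queries that have a wildcard at the start.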