Re: Limitations of StempelStemmer

2019-09-24 Thread Martin Grigorov
Hi, On Tue, Sep 10, 2019, 22:31 Maciej Gawinecki wrote: > Hi, > > I have just checked out the latest version of Lucene from Git master > branch. > > I have tried to stem a few words using StempelStemmer for Polish. > However, it looks it cannot handle some words properly, e.g. > > joyce -> ąć >

Re: AlphaNumeric analyzer/tokenizer

2019-08-19 Thread Martin Grigorov
Hi, On Mon, Aug 19, 2019 at 9:31 AM Uwe Schindler wrote: > You already got many responses. Check your inbox. > "many" made me think that I've also missed something. https://markmail.org/message/ohv5qcvxilj3n3fb > > Uwe > > Am August 19, 2019 6:23:20 AM UTC schrieb Abhishek Chauhan < >

Re: How groupingSearch specifies SortedNumericDocValuesField

2019-05-14 Thread Martin Grigorov
Hi, On Tue, May 14, 2019 at 8:28 PM 顿顿 wrote: > When I use groupingSearch specified as SortedNumericDocValuesField, > I got an "unexpected docvalues type NUMERIC for field 'id' > (expected=SORTED)" Exception. > > My code is as follows: > String indexPath = "tmp/grouping"; > Analyzer

Re: Format of Wikipedia Index

2018-01-22 Thread Will Martin
From the javadoc for DocMaker: * *doc.stored* - specifies whether fields should be stored (default *false*). * *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*). So ootb you won't get content stored. Does this help? regards -will On

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Will Martin
https://doi.org/10.3115/981574.981579 On 12/20/2016 12:21 PM, Dwaipayan Roy wrote: Hello, Can anyone help me understand the scoring function in the LMJelinekMercerSimilarity class? The scoring function in LMJelinekMercerSimilarity is shown below:

Re: Multi-field IDF

2016-11-18 Thread Will Martin
In this work, we aim to improve the field weighting for structured document retrieval. We first introduce the notion of field relevance as the generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for

Re: Multi-field IDF

2016-11-17 Thread Will Martin
are you familiar with pivoted normalized document length practice or theory? or croft's recent work on relevance algorithms accounting for structured field presence? On 11/17/2016 5:20 PM, Nicolás Lichtmaier wrote: That depends on what you want. In this case I want to use a discrimination

Re: Searching in a bitMask

2016-08-27 Thread will martin
hi aren’t we waltzing terribly close to the use of a bit vector in your field caches? there’s no reason to not filter longword operations on a cache if alignment is consistent across multiple caches just be sure to abstract your operations away from individual bits….imo -will > On Aug 27,

Lucene 5.4 - scoring divided by number of search terms?

2016-03-13 Thread Martin Krämer
I have a simple setup with IndexSearcher, QueryParser, SimpleAnalyzer. Running some queries I recognised that a query with more than one term returns a different ScoreDoc[i].score than shown in explain query statement. Apparently it is the score shown in explain divided by the number of search

Re: how to backup index files with Replicator

2016-01-23 Thread will martin
Hi Dancer: Found this thread with good info that may be irrelevant to your scenario but, this in particular struck me writer.waitForMerges(); writer.commit(); replicator. replicate(new IndexRevision(writer)); writer.close(); — even though writer.close()

Re: SolrIndexSearcher throws Misleading Error Message When timeAllowed is Specified.

2016-01-08 Thread will martin
Please read the javadoc for System.nanoTime(). I won’t bore you with the details about how computer clocks work. > On Jan 8, 2016, at 4:14 AM, Vishnu Mishra wrote: > > I am using Solr 5.3.1 and we are facing OutOfMemory exception while doing > some complex wildcard and

Re: Any lucene query sorts docs by Hamming distance?

2015-12-24 Thread will martin
s say 3, and sort them from distance 0 to 3. > > 2015-12-22 21:42 GMT+08:00 will martin <wmartin...@gmail.com>: > >> Yonghui: >> >> Do you mean sort, rank or score? >> >> Thanks, >> Will >> >> >> >>> On Dec 22,

Re: range query highlighting

2015-12-23 Thread will martin
Todd: "This trick just converts the multi term queries like PrefixQuery or RangeQuery to boolean query by expanding the terms using index reader." http://stackoverflow.com/questions/7662829/lucene-net-range-queries-highlighting beware cost. (my comment) g’luck will > On Dec 23, 2015, at

Re: Any lucene query sorts docs by Hamming distance?

2015-12-22 Thread will martin
Yonghui: Do you mean sort, rank or score? Thanks, Will > On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote: > > Hi, > > Is there any query that can sort docs by Hamming distance if the field values are the > same length? > > It seems the fuzzy query only works on edit distance.
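For readers unfamiliar with the term, the ordering asked about here can be sketched in plain Java. This is only an illustration of Hamming distance over equal-length values, not a Lucene query or an answer given in the thread; the class and method names (HammingSort, hamming, sortByHamming) and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: orders equal-length values
// by their Hamming distance to a query value.
final class HammingSort {

    /** Number of positions at which two equal-length strings differ. */
    public static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("values must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Keeps values within maxDistance of the query, nearest first (stable for ties). */
    public static List<String> sortByHamming(String query, List<String> values, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (hamming(query, value) <= maxDistance) {
                hits.add(value);
            }
        }
        hits.sort(Comparator.comparingInt(value -> hamming(query, value)));
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = List.of("1011", "0000", "1001", "1111");
        System.out.println(sortByHamming("1001", values, 3));
    }
}
```

A brute-force pass like this touches every value, so it only shows the intended ordering; it says nothing about how to do this efficiently inside an index.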

Re: Jensen–Shannon divergence

2015-12-14 Thread will martin
cool list. Thanks Uwe. Opportunities to gain competitive advantage in selected domains. > On Dec 14, 2015, at 6:02 PM, Uwe Schindler wrote: > > Hi, > > Next to BM25 and TF-IDF, Lucene also provides many more similarity > implementations: > >

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
expand your due diligence beyond wikipedia: i.e. http://ciir.cs.umass.edu/pubfiles/ir-464.pdf > On Dec 13, 2015, at 8:30 AM, Shay Hummel wrote: > > LMDiricletbut its feasibilit

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
g'luck > On Dec 13, 2015, at 10:55 AM, Shay Hummel <shay.hum...@gmail.com> wrote: > > Hi > > I am sorry but I didn't understand your answer. Can you please elaborate? > > Shay > > On Sun, Dec 13, 2015 at 3:41 PM will martin <wmartin...@gmail.com> wrote:

Few questions about updateDocValues methods

2015-11-17 Thread Gonzalo Emanuel San Martin
Hi, I have few questions related to updateDocValues methods and usages, it would be great if I can be helped 1) Is it possible to update a stored numeric field? I saw from the java-doc that updateNumericDocValue is to update NumericDocValues. The fields NumericDocValuesField aren't stored, if I

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-30 Thread will martin
IndexReader.checkIntegrity. Mike McCandless http://blog.mikemccandless.com On Tue, Sep 29, 2015 at 9:00 PM, will martin <wmartin...@gmail.com> wrote: > Ok So I'm a little confused: > > The 4.10 JavaDoc for LiveIndexWriterConfig supports volatile access on > a flag to setChe

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
So, if it's new, it adds to pre-existing time? So it is a cost that needs to be understood, I think. And, I'm really curious, what happens to the result of the post-merge checkIntegrity IFF (if and only if) there was corruption pre-merge: I mean if you let it merge anyway could you get a

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
ted a check step once the index is in its final state to ensure that it is OK. So, since we want to do the check post-merge, is there a way to disable the check during merge so we don't have to do two checks? Thanks! Jim From: will martin <w

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
system. The file system is EMC Isilon via NFS. Jim From: will martin <wmartin...@gmail.com> Sent: 29 September 2015 14:29 To: java-user@lucene.apache.org Subject: RE: Lucene 5 : any merge performance metrics compared to 4.x? This sounds robust. Is the

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread will martin
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/ -Original Message- From: Ajinkya Kale [mailto:kaleajin...@gmail.com] Sent: Monday, September 28, 2015 2:46 PM To: solr-u...@lucene.apache.org; java-user@lucene.apache.org Subject: Solr

RE: hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-22 Thread will martin
Hi: Would you mind doing websearch and cataloging the relevant pages into a primer? Thx, Will -Original Message- From: 王建军 [mailto:jianjun200...@163.com] Sent: Tuesday, September 22, 2015 4:02 AM To: java-user@lucene.apache.org Subject: hello,I have a problem about lucene,please help me

Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

2015-01-11 Thread Martin Wunderlich
: TermPositionVector termVector = (TermPositionVector) reader.getTermFreqVector(...); (reader is of Type IndexReader) I would appreciate any help with these issues. Thanks a lot in advance. Cheers, Martin PS: FYI, I have posted the same question on Stackoverflow: http://stackoverflow.com/questions

Re: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

2015-01-11 Thread Martin Wunderlich
, I guess. Cheers, Martin Am 11.01.2015 um 11:05 schrieb Uwe Schindler u...@thetaphi.de: Hi, First, there is also a migrate guide next to the changes log: http://lucene.apache.org/core/4_10_3/MIGRATE.html 1. If you implement analyzer, you have to override createComponents

ISORC 2015 - Deadline Extension: 28/12/2014

2014-12-12 Thread Martin Schoeberl
: -- Anirudda Gokhale, Vanderbilt University, USA Parthasarathi Roop, University of Auckland, New Zealand Paul Townend, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang

CfP: ISORC 2015 - IEEE International Symposium On Real-Time Computing

2014-12-05 Thread Martin Schoeberl
, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang University, China Workshop Chair: --- Marco Aurelio Wehrmeister, Federal Univ. Technology - Parana, Brazil Program Committee

CfP: ISORC 2015 - IEEE International Symposium On Real-Time Computing

2014-11-19 Thread Martin Schoeberl
, University of Leeds, United Kingdom Program Co-Chairs: -- Martin Schoeberl, Technical University of Denmark, Denmark Chunming Hu, Beihang University, China Workshop Chair: --- Marco Aurelio Wehrmeister, Federal Univ. Technology - Parana, Brazil Program Committee

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea
insensitive (there is a boolean to do this): StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase) Uwe Martin O'Shea. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: 10 Nov 2014 14 06 To: java-user@lucene.apache.org

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea
Ahmet, Yes that is quite true. But as this is only a proof of concept application, I'm prepared for things to be 'imperfect'. Martin O'Shea. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 11 Nov 2014 18 26 To: java-user@lucene.apache.org Subject: Re

How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
I realise that 3.0.2 is an old version of Lucene but if I have Java code as follows: int nGramLength = 3; Set<String> stopWords = new HashSet<String>(); stopWords.add("the"); stopWords.add("and"); ... SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
,...); stopFilter = new StopFilter(standardFilter,...); snowballFilter = new SnowballFilter(stopFilter,...); But ignore LowerCaseFilter. Does this make sense? Thanks Martin O'Shea. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: 10 Nov 2014 14 06 To: java-user

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-10 Thread Martin O'Shea
? stopWords, boolean ignoreCase) Uwe Martin O'Shea. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: 10 Nov 2014 14 06 To: java-user@lucene.apache.org Subject: RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2 Hi, In general, you

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
HI Benson: This is the case with n-gramming (though you have a more complicated start chooser than most I imagine). Does that help get your ideas unblocked? Will -Original Message- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Friday, October 24, 2014 4:43 PM To:

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
comp0-1 PI 0 comp1-1 PI 0 comp0-N compM-N That is, group all the first-components, and all the second-components. But now the bits and pieces of the compounds are interspersed. Maybe that's OK. On Fri, Oct 24, 2014 at 5:44 PM, Will Martin wmartin

Re: SpanTermQuery getSpans

2014-04-02 Thread Martin Líška
Gregory, that was indeed my problem. Thank you very much for your support. Martin This is a reply to http://mail-archives.apache.org/mod_mbox/lucene-java-user/201404.mbox/%3CCAASL1-8jRbEG%3DLi96eDLY-Pr_zwev6vk4vk4BW_ryKF1Dnb4KA%40mail.gmail.com%3E On 1 April 2014 23:52, Martin Líška djmatoli

SpanTermQuery getSpans

2014-04-01 Thread Martin Líška
Dear all, I'm experiencing troubles with the SpanTermQuery.getSpans(AtomicReaderContext context, Bits acceptDocs, Map<Term,TermContext> termContexts) method in version 4.6. I want to use it to retrieve payloads of matched spans. First, I search the index with IndexSearcher.search(query, limit) and I

NPE while decrement ref count

2012-11-12 Thread Martin Sachs
) at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java:394) at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:222) at org.apache.lucene.index.DirectoryReader.doClose(DirectoryReader.java:904) at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:222) Martin -- ** Dipl. Inform

Java HotSpot problem with search and 64-bit JVM

2012-11-12 Thread Martin Sachs
yet. martin -- ** Dipl. Inform. Martin Sachs ** Senior Software-Developer / Software-Architect T +49 (30) 443 50 99 - 33 F +49 (30) 443 50 99 - 99 E martin.sa...@artnology.com Google+: martin.sachs.artnol...@gmail.com skype: ms ** artnology GmbH A Milastraße 4 / D-10437 Berlin T +49

Re: NPE while decrement ref count

2012-11-12 Thread Martin Sachs
oh yes i missed the version: I'm using lucene 3.6.1 Martin Am 12.11.2012 09:40, schrieb Uwe Schindler: Which Lucene version? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Martin Sachs

Re: NPE while decrement ref count

2012-11-12 Thread Martin Sachs
download the newest oracle version and try it. Also I just enabled assertions in JVM. I have to wait for occurrence. martin Am 12.11.2012 09:56, schrieb Uwe Schindler: Hi, I opened the code, the NPE occurs here: if (bytes != null) { assert bytesRef != null

RE: Using stop words with snowball analyzer and shingle filter

2012-09-20 Thread Martin O'Shea
Thanks for the responses. They've given me much food for thought. -Original Message- From: Steven A Rowe [mailto:sar...@syr.edu] Sent: 20 Sep 2012 02 19 To: java-user@lucene.apache.org Subject: RE: Using stop words with snowball analyzer and shingle filter Hi Martin, SnowballAnalyzer

RE: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-06 Thread Martin O'Shea
: 05 Sep 2012 01 53 To: java-user@lucene.apache.org Subject: Re: Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene On Tue, Sep 4, 2012 at 12:37 PM, Martin O'Shea app...@dsl.pipex.com wrote: Does anyone know if this can be used in conjunction with other analyzers to return

Using a Lucene ShingleFilter to extract frequencies of bigrams in Lucene

2012-09-04 Thread Martin O'Shea
If a Lucene ShingleFilter can be used to tokenize a string into shingles, or ngrams, of different sizes, e.g.: please divide this sentence into shingles Becomes: shingles please divide, divide this, this sentence, sentence into, and into shingles Does anyone know if this
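The bigram example in this question can be reproduced in plain Java to make the expected output concrete. This is a minimal sketch that does not use Lucene's ShingleFilter at all; the class and method names (BigramCounter, bigrams, bigramFrequencies) are hypothetical, and splitting on whitespace is an assumption standing in for whatever analyzer the poster uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds word bigrams from a
// whitespace-separated string and counts how often each one occurs.
final class BigramCounter {

    /** Joins each pair of adjacent words with a single space. */
    public static List<String> bigrams(String text) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }

    /** Counts bigram occurrences, keeping first-seen order. */
    public static Map<String, Integer> bigramFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String bigram : bigrams(text)) {
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("please divide this sentence into shingles"));
        System.out.println(bigramFrequencies("to be or not to be"));
    }
}
```

For the sentence quoted in the post, bigrams returns the same five pairs the poster lists.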

JTRES 2012 Call for Paper

2012-02-21 Thread Martin Schoeberl
, 2012 * Camera Ready Paper Due: August 20, 2012 * Workshop: October 24-26, 2012 Program Chair: -- Andy Wellings, University of York Workshop Chair: -- Martin Schoeberl, Technical University of Denmark Program Committee

JTRES 2011 Call for Papers

2011-04-25 Thread Martin Schoeberl
. Leavens, University of Central Florida Doug Locke, LC Systems Services Kelvin Nilsen, Aonix Marek Prochazka, European Space Agency Anders Ravn, Aalborg University Corrado Santoro, University of Catania Martin Schoeberl, Technical University of Denmark Fridtjof Siebert, Aicas

Combining analyzers in Lucene

2011-03-05 Thread Martin O'Shea
Hello I have a situation where I'm using two methods in a Java class to implement a StandardAnalyzer in Lucene to index text strings and return their word frequencies as follows: public void indexText(String suffix, boolean includeStopWords) { StandardAnalyzer analyzer = null;

Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
\Lawton-Browne\ Lucene); And single quotes but without success. Thanks Martin O'Shea.

RE: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
in StandardAnalyzer Hi Martin, StandardTokenizer and -Analyzer have been changed, as of future version 3.1 (the next release) to support the Unicode segmentation rules in UAX#29. My (untested) guess is that your hyphenated word will be kept as a single token if you set the version to 3.1 or higher

FW: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
: Use of hyphens in StandardAnalyzer Hi Martin, StandardTokenizer and -Analyzer have been changed, as of future version 3.1 (the next release) to support the Unicode segmentation rules in UAX#29. My (untested) guess is that your hyphenated word will be kept as a single token if you set

Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
dummies. Thanks Martin O'Shea.

RE: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Martin O'Shea [mailto:app...@dsl.pipex.com] Sent: Wednesday, October 20, 2010 8:23 PM To: java-user@lucene.apache.org Subject: Using a TermFreqVector to get counts of all words in a document Hello I am trying

RE: Using a TermFreqVector to get counts of all words in a document

2010-10-20 Thread Martin O'Shea
PM, Martin O'Shea wrote: Uwe Thanks - I figured that bit out. I'm a Lucene 'newbie'. What I would like to know though is if it is practical to search a single document of one field simply by doing this: IndexReader trd = IndexReader.open(index); TermFreqVector tfv

RE: Use of Lucene to store data from RSS feeds

2010-10-15 Thread Martin O'Shea
to calculate word frequencies. But can I do this in Lucene to this degree of granularity at all? If so, would each feed form a Lucene document or would each 'row' from the database table form one? Can anyone advise? Thanks Martin O'Shea

on-the-fly filters from docID lists

2010-07-21 Thread Martin J
query the main index with content:cars but only allow the docIDs that came back to be part of the response. The list of docIDs can near the hundreds of thousands. What should I be looking at to implement such a feature? Thank you Martin

A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
can't use something like the StandardTokenizer is that ngrams should really include spaces and pretty much every tokenizer gets rid of them. Thank you very much in advance for any suggestions. Regards, Martin - To unsubscribe

Re: A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
Ahh, I knew I saw it somewhere, then I lost it again... :) I guess the name is not quite intuitive, but anyway thanks a lot! and I'm just wondering if there is a tokenizer that would return me the whole text. KeywordTokenizer does this.
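The idea discussed in this thread, treating the whole text as one token and then taking n-grams that may contain spaces, can be sketched in plain Java. This is not the KeywordTokenizer or NGramTokenFilter API, only an illustration of the output one would expect from that combination; the class and method names (CharNGrams, ngrams) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: character n-grams over the
// whole input, so spaces are kept inside the grams.
final class CharNGrams {

    /** Returns every substring of length n, sliding one character at a time. */
    public static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= text.length(); start++) {
            grams.add(text.substring(start, start + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The untokenized text is passed in as-is, spaces included.
        System.out.println(ngrams("a car", 3));
    }
}
```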

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
); //is regenerated from the url value input.removeField(score); server.add(input); } -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 11:15 AM, Simon Willnauer simon.willna...@googlemail.com wrote: Did you change any boost values for URL field or document while reindexing the document by any

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
. It might be that the index was really small the first time the document was added. Could that affect the fieldNorm value? -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 11:39 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Mon, Oct 5, 2009 at 11:31 AM, Ole-Martin Mørk olemar

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
I did not change the url. The length of the title was increased by 1, from 41 to 42 characters. -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin karl.wet...@gmail.com wrote: sorry, I meant title. 5 okt 2009 kl. 11.57 skrev Simon Willnauer: Ole-Martin, did you mention

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
That might be true. The document boost did not change, but maybe the field boost changed. Is it possible to retrieve the field boost from solr? -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 2:01 PM, Simon Willnauer simon.willna...@googlemail.com wrote: I still guess that the document has been

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Thanks. It might be that Nutch sets some values. I am not able to find anything in the config files though. We are using nutch' solrindex. -- Ole-Martin Mørk http://twitter.com/olemartin http://flickr.com/olemartin On Mon, Oct 5, 2009 at 2:28 PM, Simon Willnauer simon.willna...@googlemail.com

Results by unique id's

2008-08-12 Thread Martin vWysiecki
Is this possible? Thank you -- mit freundlichen Grüßen Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Telefon +49 (0) 621 - 71600 33 Telefax +49 (0) 621 - 71600 10 [EMAIL PROTECTED] Geschäftsführung: Steffen Künster, Christoph Goldschmitt Amtsgericht Mannheim

Re: Results by unique id's

2008-08-12 Thread Martin vWysiecki
://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Aug 12, 2008 at 6:05 AM, Martin vWysiecki [EMAIL PROTECTED]wrote: Hello, thanks for help in advance. my example docs

Re: Term Based Meta Data

2008-08-11 Thread Martin Owens
the text files stored on special storage boxes mounted to the webservers and they're using directly and c) It didn't seem worth it. Thoughts? So can I use the TermPositions object without the stored text? Best Regards, Martin Owens

Unique list of keywords

2008-08-08 Thread Martin vWysiecki
Hello, i have very much data, about 20GB of text, and need a unique list of keywords based on my text in all docs from the whole index. Some ideas? THX Martin -- mit freundlichen Grüßen Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Telefon +49

Re: Term Based Meta Data

2008-08-08 Thread Martin Owens
of TermPositions because of that data is available without storing the text in the index. Is it possible to translate code which uses TermPositions to using TermPositionsVector with regards to payloads? Best Regards, Martin Owens On Tue, 2008-08-05 at 11:14 -0600, Tricia Williams wrote: Hi Martin

Term Based Meta Data

2008-08-05 Thread Martin Owens
, is it possible to store the data alongside the terms in lucene and then recall them when doing certain searches? and how much custom code needs to be written to do it? Best Regards, Martin Owens - To unsubscribe, e-mail: [EMAIL

Re: Term Based Meta Data

2008-08-05 Thread Martin Owens
Thank you very much, I'm using Solr so it's very relevant to me. Even though the indexing is being done by a smaller RMI method (since Solr doesn't support streaming of very large files and has term limits) but all the searching is done through solr. Thanks again, Best Regards, Martin Owens

Get BestFrequentKeywords

2008-08-04 Thread Martin vWysiecki
: tyres, dealer tyres 3x dealer 2x How can i do that? THX -- mit freundlichen Grüßen Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Telefon +49 (0) 621 - 71600 33 Telefax +49 (0) 621 - 71600 10 [EMAIL PROTECTED] Geschäftsführung: Steffen Künster, Christoph

Re: Weird operator precedence with default operator AND

2007-10-11 Thread Martin Dietze
out blacklisted facettes and then parse them on to solr using solrj. Maybe I am missing out on something obvious, and there's an entirely simple way to accomplish this? Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Yoda of Borg I am

Re: Weird operator precedence with default operator AND

2007-10-11 Thread Martin Dietze
in a SpanQuery `+spanNear([foo, bar], 0, true)' (I may not understand the concept here). Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Who the fsck is General Failure, and why is he reading my disk

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
QueryParser, and I found it produces the same output, however the search queries are still handled correctly, i.e. the results I get indicate that deep down inside it seems to get it right in the end. Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
this out right now! Thank you! Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= Die Freiheit ist uns ein schoenes Weib. Sie hat einen Ober- und Unterleib. - To unsubscribe, e-mail

Re: Weird operator precedence with default operator AND

2007-10-10 Thread Martin Dietze
Mark, On Wed, October 10, 2007, Martin Dietze wrote: Qsol: myhardshadow.com/qsol (A query parser I wrote that has fully customizable precedence support - don't be fooled by the stale website...I am actually working on version 2 as i have time) That sounds promising, I will check

Weird operator precedence with default operator AND

2007-10-09 Thread Martin Dietze
, but what I get with the default operator set to AND is completely incorrect. I've seen this behaviour with both version 2.1.0 and 2.2.0. Any hints? Cheers, Martin -- --- / http://herbert.the-little-red-haired-girl.org / - =+= I got it good, I got it bad. I got the sweetest

All keys for a field

2007-06-21 Thread Martin Spamer
I need to return all of the keys for a certain field, essentially fieldName:*. This causes a ParseException / lexical error Encountered: * (42), after : I understand why this fails: wildcards are disallowed here to keep the results manageable. In my case the number of results will always be

How to view deleted documents and undelete(int docID)

2007-06-11 Thread Martin Kobele
? (Luke has a reconstruct edit button which does not seem to work on deleted documents, if it ever was designed to undelete a document). Thank you! Regards, Martin

Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
. After I deleted a document, the write.lock file is still there, and directoryOwner is still true. Maybe knowing more about this will help me to find out why I get the exception Lock obtain timed out after a while and after several successful document deletions. Thank you! Regards, Martin

Re: Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
On Wednesday 30 May 2007 11:49:41 Michael McCandless wrote: Martin Kobele [EMAIL PROTECTED] wrote: I was trying to find an answer to this. I call IndexReader.deleteDocument() for the _first_ time. If my index has several segments, my IndexReader is actually a MultiReader. Therefore

Re: Obtain Lock file timeout during deleteDocument()

2007-05-30 Thread Martin Kobele
On Wednesday 30 May 2007 11:53:09 Martin Kobele wrote: On Wednesday 30 May 2007 11:49:41 Michael McCandless wrote: You are only using a single instance of IndexReader, right? If for example you try to make a new instance of IndexReader and then call deleteDocument on that new one, then you

phrases containing escaped quotes

2007-05-15 Thread Martin Kobele
Hi, I tried to parse the following phrase: foo \bar\ I get the following exception: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 18. Encountered: EOF after : \) Am I mistaken that foo \bar\ is a valid phrase? Thanks! Martin

Re: phrases containing escaped quotes

2007-05-15 Thread Martin Kobele
thank you! I was indeed using lucene 2.0 and it works very nicely with 2.1 thanks! Martin On Tuesday 15 May 2007 09:59:42 Michael Busch wrote: Martin Kobele wrote: Hi, I tried to parse the following phrase: foo \bar\ I get the following exception

Re: Spelt, for better spelling correction

2007-03-22 Thread Martin Haye
make sure the contribution includes an index-to-dictionary API, and thank you very much for the input. --Martin On 3/21/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Martin, This sounds like the spellchecker dictionary needs to be built in parallel with the main Lucene index. Is it possible

Re: Spelt, for better spelling correction

2007-03-21 Thread Martin Haye
applications that are continuously adding things to an index. Happily, it's not as important to keep the spelling dictionary absolutely up to date, so it would be fine to queue words over several index runs, and refresh the dictionary less often. --Martin On 3/20/07, Yonik Seeley [EMAIL PROTECTED

Spelt, for better spelling correction

2007-03-20 Thread Martin Haye
... There is already a standalone test program that people can try out, and we're interested in feedback. If you're interested in discussing, testing, or previewing, consider joining the Google group: http://groups.google.com/group/spelt/ --Martin

Re: similar contrib in lucene 2.1.0

2007-03-02 Thread Martin Braun
... hth, martin Cheers Hans Lund - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail

autocomplete with multiple terms

2007-02-22 Thread Martin Braun
way? I am not sure if we get enough queries for a search over an index base on the user-queries. the only thing I have found in the list before concerning this subject is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if it does the things I want. tia, martin

boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Martin Braun
by score AND by year of publication. But for performance reasons I want to avoid this sorting at query-time by boosting at index time. Is that possible? thanks, Martin -- Universitaetsbibliothek Heidelberg Tel: +49 6221 54-2580 Ploeck 107-109, D-69117 Heidelberg Fax: +49 6221 54-2623

spnafirstquery and multiple field instances

2006-12-21 Thread Martin Braun
for the first token in that field of a document. Is there a way to do a SpanFirstQuery for each token? tia, martin

to boost or not to boost

2006-12-20 Thread Martin Braun
= fieldNorm(field=AU, doc=0) Explain für 1043960: 1.6931472 = fieldWeight(AU:palandt in 1), product of: 1.0 = tf(termFreq(AU:palandt)=1) 1.6931472 = idf(docFreq=2) 1.0 = fieldNorm(field=AU, doc=1) so the older doc is better rated or with the same rank as the newer? any ideas? tia, martin

Re: Index XML file

2006-12-14 Thread Martin Braun
/developerworks/java/library/j-lucene/ regards, martin Thanks regards, Wooi Meng

Re: how to search string with words

2006-11-21 Thread Martin Braun
( new SpanNearQuery(spanq_ar,1,true), spanq_ar.length); hth, martin Below r the codes that I wrote, please point me out where I have done wrong. readerA = IndexReader.open(DsConstant.indexDir); readerB = IndexReader.open

Search C++ with Solrs WordDelimiterFilter

2006-11-17 Thread Martin Braun
on doing this?). tia, martin

Re: Best approach for exact Prefix Field Query

2006-11-16 Thread Martin Braun
to merge these two query-classes? tia, martin SpanFirstQuery = org.apache.lucene.search.spans.SpanFirstQuery SpanTermQuery = org.apache.lucene.search.spans.SpanTermQuery SpanQuery = org.apache.lucene.search.spans.SpanQuery SpanNearQuery = org.apache.lucene.search.spans.SpanNearQuery

Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
Lucene function I have overlooked :) With 2) I am worried about performance; does anybody have experience with regex queries? The same goes for 1): has anybody already implemented this and could give some code samples / hints? tia, martin

Re: Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
? tia, martin Erik On Nov 14, 2006, at 8:32 AM, Martin Braun wrote: hi, i would like to provide a exact PrefixField Search, i.e. a search for exactly the first words in a field. I think I can't use a PrefixQuery because it would find also substrings inside the field, e.g

Re: Update an existing index

2006-11-08 Thread Martin Braun
WATHELET Thomas schrieb: how to update a field in lucene? I think you'll have to delete the whole doc and add the doc with the new field to the index... hth, martin

Re: experiences with lingpipe

2006-11-02 Thread Martin Braun
with -Xms1024m -Xmx1024m. How much RAM will I need for the model (I only have 2 GB of physical RAM, and Lucene is also using some memory)? Is there a rule of thumb to calculate the amount of memory the model needs? thanks in advance, martin Tuning params dominate the performance space
