RE: Can an analyzer access other field's data during index time?

2023-05-03 Thread Wang, Guan
! Thank you for the tip! I am reading its SimplePreAnalyzedParser as we speak! Best regards, Guan -Original Message- From: Mikhail Khludnev Sent: Wednesday, April 26, 2023 4:14 PM To: java-user@lucene.apache.org Subject: Re: Can an analyzer access other field's data during in

Re: Can an analyzer access other field's data during index time?

2023-04-26 Thread Mikhail Khludnev
lue() method. > > In a nutshell, I will need two parts to make this work: > > 1. a custom tokenizer/filter; > 2. a custom field; > > Let me know if there is any caveat... > > And thank you so much for guiding me through! > > Guan > > -Original Messag

RE: Can an analyzer access other field's data during index time?

2023-04-25 Thread Wang, Guan
stom field; Let me know if there is any caveat... And thank you so much for guiding me through! Guan -Original Message- From: Mikhail Khludnev Sent: Tuesday, April 25, 2023 4:40 AM To: java-user@lucene.apache.org Subject: Re: Can an analyzer access other field's data during index

Re: Can an analyzer access other field's data during index time?

2023-04-25 Thread Mikhail Khludnev
: Monday, April 24, 2023 4:56 PM > To: java-user@lucene.apache.org > Subject: Re: Can an analyzer access other field's data during index time? > > External Email - Use Caution > > Well.. maybe something like > > https://lucene.apache.org/core/8_5_1/analyzers-common/org

RE: Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
-user@lucene.apache.org Subject: Re: Can an analyzer access other field's data during index time? External Email - Use Caution Well.. maybe something like https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html ? On Mon, Apr 24,
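
A minimal sketch of the ConditionalTokenFilter idea (Lucene 7.4+; note it conditions on the current stream's own attributes, not on another field's data, and the length test is purely illustrative):

    Analyzer a = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        // apply LowerCaseFilter only to tokens for which shouldFilter() returns true
        TokenStream sink = new ConditionalTokenFilter(source, LowerCaseFilter::new) {
          private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
          @Override
          protected boolean shouldFilter() {
            return termAtt.length() > 3; // illustrative condition
          }
        };
        return new TokenStreamComponents(source, sink);
      }
    };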

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
> Sent: Monday, April 24, 2023 4:20 PM > To: java-user@lucene.apache.org > Subject: Re: Can an analyzer access other field's data during index time? > > External Email - Use Caution > > Hello Guan. > It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (timecode). > I'm af

RE: Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
Khludnev Sent: Monday, April 24, 2023 4:20 PM To: java-user@lucene.apache.org Subject: Re: Can an analyzer access other field's data during index time? External Email - Use Caution Hello Guan. It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (timecode). I'm afraid it's quite far from t

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
Hello Guan. It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (timecode). I'm afraid it's quite far from the existing codebase, where the Field has no reference to the enclosing Document. Sigh. On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan wrote: > Hi, > > I understand Lucene analyzer

Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
Hi, I understand the Lucene analyzer is per-field. But I wonder if it's even possible for an analyzer on field A to access data in field B during the index process, at any stage (CharFilter, Tokenizer or TokenFilter)? I'd like to control the behavior of the indexi

TokenStream contract violation: close() call missing due to race condition in custom Analyzer

2021-08-13 Thread Edgar H
alyzer classes, which implement a couple of methods that add TokenFilters to process the tokens. The base class for the rest of the implementations is the following one. public class StandardCustomAnalyzer implements CustomAnalyzer { private final Analyzer analyzer; public StandardCusto

Re: Lucene custom scoring / analyzer

2021-03-17 Thread Charlie Hull
I think you'll need a SpanQuery with the inOrder flag set: https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/search/spans/SpanNearQuery.html Charlie On 17/03/2021 10:30, Vlad Smirnovskiy wrote: Hello! I`d like to do something like that: When I add a document and some text is going wi
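
A hedged sketch of Charlie's suggestion (spans API as of Lucene 8.x; the field and terms are illustrative):

    SpanQuery[] clauses = new SpanQuery[] {
        new SpanTermQuery(new Term("text", "blue")),
        new SpanTermQuery(new Term("text", "apple"))
    };
    // slop = 0 with inOrder = true behaves like the exact phrase "blue apple"
    Query exactPhrase = new SpanNearQuery(clauses, 0, true);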

Lucene custom scoring / analyzer

2021-03-17 Thread Vlad Smirnovskiy
Hello! I'd like to do something like this: when I add a document and some text comes with (e.g.) quotes, it should mean that this text has to appear exactly as given in the query. Better with an example - text: green "blue apple" juice query : blue apple - result: hit. query : blue apple juice - result: h

Re: How can i specify a custom Analyzer for a Field of Document?

2019-12-09 Thread Mikhail Khludnev
You can check how SolrAnalyzer switches chains across fields. On Tue, Dec 10, 2019 at 9:41 AM 小鱼儿 wrote: > Directory indexDataDir = FSDirectory.open(Paths.get("index_data")); > Analyzer analyzer = MyLuceneAnalyzerFactory.newInstance(); > IndexWriterConfig iwc = new IndexWr

How can i specify a custom Analyzer for a Field of Document?

2019-12-09 Thread 小鱼儿
Directory indexDataDir = FSDirectory.open(Paths.get("index_data")); Analyzer analyzer = MyLuceneAnalyzerFactory.newInstance(); IndexWriterConfig iwc = new IndexWriterConfig(analyzer); iwc.setOpenMode(OpenMode.CREATE); iwc.setRAMBufferSizeMB(256.0); IndexWriter indexWriter = new I
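
One standard way to give each field its own analyzer (a sketch; Mikhail's pointer to how Solr switches chains leads to the same idea) is PerFieldAnalyzerWrapper:

    Map<String, Analyzer> perField = new HashMap<>();
    perField.put("title", new KeywordAnalyzer());   // illustrative field/analyzer pairing
    Analyzer wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    IndexWriterConfig iwc = new IndexWriterConfig(wrapper);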

Re: AlphaNumeric analyzer/tokenizer

2019-08-19 Thread Martin Grigorov
> > >> I can use StandardAnalyzer or WhitespaceAnalyzer but I want to > >tokenize on > >> underscores also > >> which these analyzers don't do. I have also looked at > >WordDelimiterFilter > >> which will split "axt1234" into "a

Re: AlphaNumeric analyzer/tokenizer

2019-08-18 Thread Uwe Schindler
zer but I want to >tokenize on >> underscores also >> which these analyzers don't do. I have also looked at >WordDelimiterFilter >> which will split "axt1234" into "axt" and "1234". However, using this >also, >> I cannot search for

Re: AlphaNumeric analyzer/tokenizer

2019-08-18 Thread Abhishek Chauhan
;1234". However, using this also, > I cannot search for "axt12" etc. > > Is there something like an Alphanumeric analyzer which would be very > similar to SimpleAnalzyer but in addition to letters it would also keep > digits in its tokens? I am willing contribute such an analyzer if one is > not available. > > Thanks and Regards, > Abhishek > > >

RE: AlphaNumeric analyzer/tokenizer

2019-08-16 Thread Uwe Schindler
Hi, The easiest is to use PatternTokenizer as part of your analyzer. It uses a regular expression to split words. Just use some regular expression that matches unicode ranges for letters and digits. To build your Analyzer use the class CustomAnalyzer and its builder API to construct your own
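
A sketch of that suggestion, assuming the Lucene 5+ CustomAnalyzer builder API (the regex is illustrative; group=-1 tells PatternTokenizer to split on matches):

    Analyzer a = CustomAnalyzer.builder()
        .withTokenizer(PatternTokenizerFactory.class,
            "pattern", "[^\\p{L}\\p{N}]+",  // split on anything that is not a letter or digit
            "group", "-1")
        .addTokenFilter(LowerCaseFilterFactory.class)
        .build();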

AlphaNumeric analyzer/tokenizer

2019-08-16 Thread Abhishek Chauhan
xt1234" into "axt" and "1234". However, using this also, I cannot search for "axt12" etc. Is there something like an Alphanumeric analyzer which would be very similar to SimpleAnalzyer but in addition to letters it would also keep digits in its tokens? I am willing contribute such an analyzer if one is not available. Thanks and Regards, Abhishek

Use "CommonGramsFilterFactory" and "StopFilterFactory" in the query analyzer chain breaks phrase queries

2019-04-03 Thread JiaJun Zhu
pply "CommonGramsFilterFactory" and "StopFilterFactory" in the query analyzer for our solr environment, while this issue cause some query get empty result. The issue can be reproduce by the steps in LUCENE-7698 and just change the query string to "hello with an accent&quo

Re: analyzer context during search

2018-04-13 Thread Chris Tomlinson
there really should be an extension of the Analyzer API to include a generic argument of abstract class AnalyzerContext that could optionally be used via IndexWriter and IndexSearcher to supply useful context information from the caller. This would require threading

Re: analyzer context during search

2018-04-12 Thread Michael Sokolov
s for different inputs. Doing this requires extra context when choosing the analyzer (or the token streams that it generates), as you say. See http://issues.apache.org/jira/browse/LUCENE-8240 for one idea of how to accomplish this. On Wed, Apr 11, 2018, 9:34 AM Chris Tomlinson wrote: > Hello,

analyzer context during search

2018-04-11 Thread Chris Tomlinson
. For Chinese, for example, we have an analyzer that creates a TokenStream of Pinyin with diacritics for any of the input encodings. Thus it is possible in some situations to retrieve documents originally input as zh-hans and so on. The same applies to the other languages. One objective is to

MultiFieldQueryParser over Analyzer

2018-01-22 Thread Chitra
Hi Team, I have a question about parsing a query with MultiFieldQueryParser over StandardAnalyzer. searchWord: abc.def_...@global-international.com while performing a search using the code, Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40, new > StringRea

Re: Analyzer is not called upon executing addDocument()

2018-01-09 Thread Armins Stepanjans
acets). > > Uwe > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Armins Stepanjans [mailto:armins.bagr...@gmail.com] > > Sent: Tuesday, January 9, 2018 2:52 PM

RE: Analyzer is not called upon executing addDocument()

2018-01-09 Thread Uwe Schindler
nal Message- > From: Armins Stepanjans [mailto:armins.bagr...@gmail.com] > Sent: Tuesday, January 9, 2018 2:52 PM > To: java-user@lucene.apache.org > Subject: Analyzer is not called upon executing addDocument() > > Hi, > > When I create a document with multiple

Analyzer is not called upon executing addDocument()

2018-01-09 Thread Armins Stepanjans
Hi, When I create a document with multiple StringFields and add it to an IndexWriter using addDocument(Document), the StringFields within the Document are neither tokenized nor filtered according to the Analyzer's specifications; however, when I test my Analyzer, while looping through tokens by expli
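
This matches documented behavior: a StringField is indexed as a single un-analyzed token, so the analyzer never runs on it, while a TextField goes through the analysis chain. A minimal sketch:

    Document doc = new Document();
    doc.add(new StringField("id", "Exact-Value-42", Field.Store.YES)); // bypasses the Analyzer
    doc.add(new TextField("body", "text that IS tokenized and filtered", Field.Store.NO));
    writer.addDocument(doc); // 'writer' is an IndexWriter configured with the Analyzer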

RE: Extending Analyzer at runtime

2017-06-23 Thread Allison, Timothy B.
Head meet brick. Thank you, Uwe! -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Friday, June 23, 2017 11:23 AM To: java-user@lucene.apache.org Subject: RE: Extending Analyzer at runtime Hi, Or just use CustomAnalyzer, shipped with Lucene since version 5.0. No

RE: Extending Analyzer at runtime

2017-06-23 Thread Uwe Schindler
re.org] > Sent: Friday, June 23, 2017 3:55 PM > To: java-user@lucene.apache.org; nb...@ebi.ac.uk > Subject: RE: Extending Analyzer at runtime > > I plagiarized Solr's org.apache.solr.analysis.TokenizerChain to read the > configuration from a json file: > > https://github.com/

Re: Extending Analyzer at runtime

2017-06-23 Thread nb...@ebi.ac.uk
Thanks Alan, I will take a look at it. Nicola -- Original message -- From: Alan Woodward Date: Fri, 23 Jun 2017 14:55 To: java-user@lucene.apache.org; nb...@ebi.ac.uk; Cc: Subject: Re: Extending Analyzer at runtime Hi, You should be able to use AnalyzerWrapper for this, adding your TokenFilters

Re: Extending Analyzer at runtime

2017-06-23 Thread Alan Woodward
Hi, You should be able to use AnalyzerWrapper for this, adding your TokenFilters in wrapComponents(). Alan Woodward www.flax.co.uk > On 23 Jun 2017, at 14:33, Nicola Buso wrote: > > Hi, > > maybe it's a known question but I could not find and answer. > I need to base
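
A hedged sketch of that suggestion (recent-API flavor where TokenStreamComponents exposes getSource(); baseAnalyzer and the extra filter are assumptions):

    Analyzer extended = new AnalyzerWrapper(Analyzer.GLOBAL_REUSE_STRATEGY) {
      @Override
      protected Analyzer getWrappedAnalyzer(String fieldName) {
        return baseAnalyzer; // the analyzer being extended at runtime
      }
      @Override
      protected TokenStreamComponents wrapComponents(String fieldName,
                                                     TokenStreamComponents components) {
        TokenStream ts = new LowerCaseFilter(components.getTokenStream()); // added filter
        return new TokenStreamComponents(components.getSource(), ts);
      }
    };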

RE: Extending Analyzer at runtime

2017-06-23 Thread Allison, Timothy B.
gramreaper just yet, but that might give you some ideas. -Original Message- From: Nicola Buso [mailto:nb...@ebi.ac.uk] Sent: Friday, June 23, 2017 9:34 AM To: java-user Subject: Extending Analyzer at runtime Hi, maybe it's a known question but I could not find an answer. I

Extending Analyzer at runtime

2017-06-23 Thread Nicola Buso
Hi, maybe it's a known question but I could not find an answer. I need to base an Analyzer on another Analyzer at runtime. I know that the Analyzer is a factory and I should really look at combining the Filters. I'm looking for a way to get the TokenStreamComponents from an analyzer

Re: email field - analyzed and not analyzed in single field using custom analyzer

2017-06-19 Thread Kumaran Ramasubramanian
imiterGraphFilter.java that passed > for me: > > - > public void testEmail() throws Exception { > final int flags = GENERATE_WORD_PARTS | GENERATE_NUMBER_PARTS | > SPLIT_ON_CASE_CHANGE | SPLIT_ON_NUMERICS | PRESERVE_ORIGINAL; > Analyzer a = new Analyzer() { > @Ove

Re: email field - analyzed and not analyzed in single field using custom analyzer

2017-06-15 Thread Steve Rowe
for me: - public void testEmail() throws Exception { final int flags = GENERATE_WORD_PARTS | GENERATE_NUMBER_PARTS | SPLIT_ON_CASE_CHANGE | SPLIT_ON_NUMERICS | PRESERVE_ORIGINAL; Analyzer a = new Analyzer() { @Override public TokenStreamComponents createComponents(String
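
The cut-off part presumably finishes building the test analyzer; a hedged completion of such a test (the WhitespaceTokenizer choice is an assumption):

    Analyzer a = new Analyzer() {
      @Override
      public TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        // 'flags' is the constant defined above; null means no protected words
        return new TokenStreamComponents(tokenizer,
            new WordDelimiterGraphFilter(tokenizer, flags, null));
      }
    };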

email field - analyzed and not analyzed in single field using custom analyzer

2017-06-15 Thread Kumaran Ramasubramanian
Hi All, I want to index email fields as both analyzed and not analyzed using a custom analyzer. For example, sm...@yahoo.com will.sm...@yahoo.com that is, indexing sm...@yahoo.com as a single token as well as analyzed tokens in the same email field... My existing custom analyzer, public class

Re: any analyzer will keep punctuation?

2017-03-08 Thread Ahmet Arslan
I didn't quite get your idea. I want to get an analyzer like the standard analyzer but with punctuation handling customized. I think one way is customizing an analyzer with a custom tokenizer like StandardTokenizer. In my tokenizer I will re-write StandardTokenizerImpl which seems a little complica

Re: Re: any analyzer will keep punctuation?

2017-03-07 Thread 380382...@qq.com
: java-user@lucene.apache.org Subject: Re: any analyzer will keep punctuation? Hi Ahmet, Thanks for your reply, but I didn't quite get your idea. I want to get an analyzer like the standard analyzer but with punctuation handling customized. I think one way is customizing an analyzer with a custom tokenizer

Re: any analyzer will keep punctuation?

2017-03-07 Thread Yonghui Zhao
Hi Ahmet, Thanks for your reply, but I didn't quite get your idea. I want to get an analyzer like the standard analyzer but with punctuation handling customized. I think one way is customizing an analyzer with a custom tokenizer like StandardTokenizer. In my tokenizer I will re-

Re: any analyzer will keep punctuation?

2017-03-06 Thread Ralph Soika
(" + searchphrase + ")"; in this case the value of the field singername will not be analyzed by the standard analyzer. On 06.03.2017 09:15, Yonghui Zhao wrote: Lucene standard anlyzer will remove almost all punctuation. In some cases, we want to keep some punctuation, for

Re: any analyzer will keep punctuation?

2017-03-06 Thread Ahmet Arslan
Hi Zhao, A WhitespaceTokenizer followed by a customised word delimiter filter factory would be a solution. Please see the types attribute of the word delimiter filter for customising characters. Ahmet On Monday, March 6, 2017 12:22 PM, Yonghui Zhao wrote: Yes whitespace analyzer will keep
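
A sketch of that suggestion in CustomAnalyzer form (the types file name and its mappings are assumptions; entries like "& => ALPHA" tell the word delimiter filter to treat '&' as a letter):

    Analyzer a = CustomAnalyzer.builder(Paths.get("conf"))  // directory holding wdf-types.txt
        .withTokenizer(WhitespaceTokenizerFactory.class)
        .addTokenFilter(WordDelimiterGraphFilterFactory.class, "types", "wdf-types.txt")
        .build();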

Re: any analyzer will keep punctuation?

2017-03-06 Thread Michael McCandless
keep into something that StandardTokenizer would not split on. Mike McCandless http://blog.mikemccandless.com On Mon, Mar 6, 2017 at 5:22 AM, Yonghui Zhao wrote: > Yes whitespace analyzer will keep punctuation, but it only breaks word by > space. > > > I didn’t explain my requ

Re: any analyzer will keep punctuation?

2017-03-06 Thread Yonghui Zhao
Yes, the whitespace analyzer will keep punctuation, but it only breaks words on spaces. I didn't explain my requirement clearly. I want an analyzer like the standard analyzer that can keep some configured punctuation. 2017-03-06 18:03 GMT+08:00 Ahmet Arslan : > Hi, > > Whitespace analyser/

Re: any analyzer will keep punctuation?

2017-03-06 Thread Ahmet Arslan
punctuation. Is there any analyzer where we can customize which punctuation is removed?

any analyzer will keep punctuation?

2017-03-06 Thread Yonghui Zhao
The Lucene standard analyzer removes almost all punctuation. In some cases we want to keep some punctuation; for example in music search, a singer name or album name could contain punctuation. Is there any analyzer where we can customize which punctuation is removed?

Re: Porting Analyzer from ver 4.8.1 to ver 6.4.1

2017-02-17 Thread Vincenzo D'Amore
Thank you, works like a charm.

RE: Porting Analyzer from ver 4.8.1 to ver 6.4.1

2017-02-16 Thread Uwe Schindler
handle the difference between the first call and follow-up calls. No, it's quite cleanly separated: create it in one step without a reader and then reuse it as often as you like by setting a Reader. This is done inside the final Analyzer logic that is hidden from you! Uwe - Uwe Schindler Achterdiek 19

Porting Analyzer from ver 4.8.1 to ver 6.4.1

2017-02-16 Thread Vincenzo D'Amore
Hi all, I'm porting a few classes from 4.8.1 to the newer version of Lucene. I can't understand how to convert this code; I hope you can help me: public Analyzer creaAnalyzer() { return new Analyzer() { @Override protected TokenStreamComponents createComponents(String fieldNa
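
For reference, the usual shape of this conversion (hedged; the filter chain is illustrative): in 4.x, createComponents received a Reader; from 5.x on, the Reader is set later by the framework:

    public Analyzer creaAnalyzer() {
      return new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
          Tokenizer source = new StandardTokenizer();       // 4.x passed a Reader here; 6.x does not
          TokenStream result = new LowerCaseFilter(source); // illustrative filter
          return new TokenStreamComponents(source, result);
        }
      };
    }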

Re: Keyword analyzer will turn query to lowercase

2016-09-22 Thread Cam Bazz
wrote: > Hello, > > I am indexing userAgent fields found in apache logs. Indexing and querying > everything with > KeywordAnalyzer - But I found something strange: > > > IndexSearcher searcher = new IndexSearcher(reader); > Analyzer q_ana

Keyword analyzer will turn query to lowercase

2016-09-22 Thread Cam Bazz
Hello, I am indexing userAgent fields found in Apache logs. Indexing and querying everything with KeywordAnalyzer - but I found something strange: IndexSearcher searcher = new IndexSearcher(reader); Analyzer q_analyzer = new KeywordAnalyzer(); QueryParser

migrating custom analyzer/tokenizer (3.6-> 6.x)

2016-09-08 Thread Dirk Rothe
Hi, I'm trying to migrate some Analyzers from API 3.6 to 6.2 and I'm not sure if I have the right approach. I'm using PyLucene, so let's assume this is pseudo-code. In 3.x (and up to 4), I had access to the StringReader containing the data in the overridden tokenStream(fieldName, reader):

Optimizing Lucene search for whitespace analyzer.

2016-06-09 Thread apoorv gupta
2. [ "ABC", "AB", "XYZ", "Z"] Requests that will not match the doc: 1. [ "ABC", "AB", "XYZ"] 2. [ "ABC", "AB", "XYZ"

Problem with Analyzer Infix Suggester and Suggestions on multiple fields.

2016-04-26 Thread Ankit.Murarka
ava.nio.file.Path path = FileSystems.getDefault().getPath("D:\\", "indexRawData"); FSDirectory phraseIndexdir = FSDirectory.open(path); String fieldContent="Content"; Analyzer analyzerNormal = new StandardAnalyzer(); java.nio.file.Path path2 = FileSystems.getDefault().getP

RE: Language Specific Analyzer

2015-11-14 Thread Uwe Schindler
Hi, you cannot change the behavior of predefined analyzers! But since Lucene 5 there is no need to write your own subclass to define a custom analyzer. Just use CustomAnalyzer and define via its fluent builder API how your analysis should look (see example in javadocs): https

Language Specific Analyzer

2015-11-14 Thread marco turchi
Dear Users, I need to develop my language-specific analyzer that: 1) does not remove punctuation 2) lowercases and stems each term in the text. I have tried some of the pre-implemented language analyzers (e.g. the German and Italian analyzers), but they remove punctuation. I'm not sure, but probably
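
A hedged sketch of the CustomAnalyzer suggestion applied to these two requirements (the stemmer choice is an assumption; WhitespaceTokenizer leaves punctuation attached to tokens):

    Analyzer a = CustomAnalyzer.builder()
        .withTokenizer(WhitespaceTokenizerFactory.class)  // does not strip punctuation
        .addTokenFilter(LowerCaseFilterFactory.class)
        .addTokenFilter(SnowballPorterFilterFactory.class, "language", "Italian")
        .build();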

Re: Loading Solr Analyzer from RuntimeLib Blob

2015-09-10 Thread Noble Paul
components in schema are not loaded from runtimelib yet. Only solrconfig components are loaded from runtimelib, a.k.a. the blob store. On Thu, Sep 10, 2015 at 11:00 PM, Steve Davids wrote: > Hi, > > I am attempting to migrate our deployment process over to using the > recently added "Blob Store API" wh

RE: Loading Solr Analyzer from RuntimeLib Blob

2015-09-10 Thread Uwe Schindler
> Sent: Thursday, September 10, 2015 7:31 PM > To: java-user@lucene.apache.org > Subject: Loading Solr Analyzer from RuntimeLib Blob > > Hi, > > I am attempting to migrate our deployment process over to using the > recently added "Blob Store API" which should simpli

Loading Solr Analyzer from RuntimeLib Blob

2015-09-10 Thread Steve Davids
Hi, I am attempting to migrate our deployment process over to using the recently added "Blob Store API" which should simplify things a bit when it comes to cloud infrastructures for us. Unfortunately, after loading the jar in the .system collection and adding it to our runtimelib config overlay an

Re: PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-10 Thread Bauer, Herbert S. (Scott)
I found the problem here. I had changed some method params and was inadvertently creating the fields I was having issues with as StringFields, which the analyzer fails silently against. From: Scott Bauer <bauer.sc...@mayo.edu> Date: Friday, August 7, 2015 at 1:56 PM To:

PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-07 Thread Bauer, Herbert S. (Scott)
I can’t seem to detect any issues with the final custom analyzer declared in this code snippet (The one that attempts to use a PatternMatchingTokenizer and is initialized as sa), but it doesn’t seem to be hit when I run my indexing code despite being in the map. It is indexed finally but I

Re: Analyzer for supporting hyphenated words

2015-07-23 Thread Diego Socaceti
> userCriteriaProcessed = escape(userCriteria); > > > > if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) { > > userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD; > > } > > } > > > > > > String queryStr = "";

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Alessandro Benedetti
ields) { > String escapedFieldName = escape(fieldName); > queryStr += String.format("%s:%s ", escapedFieldName, > userCriteriaProcessed); > } > > query = new QueryParser("", analyzer).parse(queryStr.trim()); > > ... > > On Wed, Jul 22, 201

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Diego Socaceti
th(MULTIPLE_CHARACTER_WILDCARD)) { userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD; } } String queryStr = ""; for (String fieldName : fields) { String escapedFieldName = escape(fieldName); queryStr += String.format("%s:%s ", escapedFieldName, userCriteriaProcessed);

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Diego Socaceti
{ userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD; } } String queryStr = ""; for (String fieldName : fields) { String escapedFieldName = escape(fieldName); queryStr += String.format("%s:%s ", escapedFieldName, curTokenProcessed); } query = new QueryPars

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Alessandro Benedetti
Diego Socaceti : > > > > > Hi Alessandro, > > > > > > yes, i want the user to be able to surround the query with "" to run > the > > > phrase query with a NOT tokenized phrase. > > > > > > What do i have to do? > > > >

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Diego Socaceti
ave to do? > > > > Thanks and Kind regards > > > > On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti < > > benedetti.ale...@gmail.com> wrote: > > > > > Hey Jack, reading the doc : > > > > > > " Set to true if phrase queries will be

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Alessandro Benedetti
phrase. > > What do i have to do? > > Thanks and Kind regards > > On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti < > benedetti.ale...@gmail.com> wrote: > > > Hey Jack, reading the doc : > > > > " Set to true if phrase queries will be au

Re: Analyzer for supporting hyphenated words

2015-07-22 Thread Diego Socaceti
y Jack, reading the doc : > > " Set to true if phrase queries will be automatically generated when the > analyzer returns more than one term from whitespace delimited text. NOTE: > this behavior may not be suitable for all languages. > > Set to false if phrase queries should onl

Re: Analyzer for supporting hyphenated words

2015-07-21 Thread Alessandro Benedetti
Hey Jack, reading the doc : " Set to true if phrase queries will be automatically generated when the analyzer returns more than one term from whitespace delimited text. NOTE: this behavior may not be suitable for all languages. Set to false if phrase queries should only be generated

Re: Analyzer for supporting hyphenated words

2015-07-21 Thread Jack Krupansky
assic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean) -- Jack Krupansky On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti wrote: > Hi all, > > i'm new to lucene and tried to write my own analyzer to support > hyphenated words like wi-fi, jean-pierre, etc. > For our customer it
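
Usage is one call on the parser (a sketch; the field name and analyzer are assumptions):

    QueryParser parser = new QueryParser("text", analyzer);
    parser.setAutoGeneratePhraseQueries(true); // "wi-fi" analyzed to [wi, fi] becomes a phrase query
    Query q = parser.parse("wi-fi");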

Re: Analyzer for supporting hyphenated words

2015-07-21 Thread Alessandro Benedetti
case I identified your requirement, we can have a think about a solution! Cheers 2015-07-17 9:41 GMT+01:00 Diego Socaceti : > Hi all, > > i'm new to lucene and tried to write my own analyzer to support > hyphenated words like wi-fi, jean-pierre, etc. > For our customer it is imp

Analyzer for supporting hyphenated words

2015-07-17 Thread Diego Socaceti
Hi all, I'm new to Lucene and tried to write my own analyzer to support hyphenated words like wi-fi, jean-pierre, etc. For our customer it is important to find the word - wi-fi by wi, fi, wifi, wi-fi - jean-pierre by jean, pierre, jean-pierre, jean-* The analyzer: public
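
A hedged sketch of one way to get wi / fi / wifi / wi-fi from a single token, using WordDelimiterGraphFilter with word parts, catenation and the original preserved (not necessarily what the thread converged on):

    final int flags = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
                    | WordDelimiterGraphFilter.CATENATE_WORDS
                    | WordDelimiterGraphFilter.PRESERVE_ORIGINAL;
    Analyzer a = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer t = new WhitespaceTokenizer();
        TokenStream ts = new WordDelimiterGraphFilter(t, flags, null); // null = no protected words
        ts = new LowerCaseFilter(ts); // "wi-fi" -> wi, fi, wifi, wi-fi
        return new TokenStreamComponents(t, ts);
      }
    };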

Re: Changing analyzer in an indexwriter

2015-04-21 Thread Michael McCandless
>> > On Sunday, April 19, 2015 1:37 PM, Lisa Ziri wrote: >> > Hi, >> > I'm upgrading to lucene 5.1.0 from lucene 4. >> > In our index we have documents in different languages which are analyzed >> > with the correct analyzer. >> > We used the me

Re: Text dependent analyzer

2015-04-20 Thread Shay Hummel
ywordTokenizer has similar behaviour. It injects a single token. >> > >> http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/analysis/KeywordAnalyzer.html >> > >> > Ahmet >> > >> > >> > On Wednesday, April 15, 2015 3:12 PM,

Re: Changing analyzer in an indexwriter

2015-04-20 Thread Anna Elisabetta Ziri
> On Sunday, April 19, 2015 1:37 PM, Lisa Ziri wrote: > > Hi, > > I'm upgrading to lucene 5.1.0 from lucene 4. > > In our index we have documents in different languages which are analyzed > > with the correct analyzer. > > We used the method addDocume

Re: Changing analyzer in an indexwriter

2015-04-20 Thread Michael McCandless
ferent languages which are analyzed > with the correct analyzer. > We used the method addDocument of IndexWriter giving the correct analyzer > for every different document. > Now I see that I can define the analyzer used by the IndexWriter only in > the creation and I cannot switch ana

Re: Changing analyzer in an indexwriter

2015-04-19 Thread Ahmet Arslan
h are analyzed with the correct analyzer. We used the method addDocument of IndexWriter giving the correct analyzer for every different document. Now I see that I can define the analyzer used by the IndexWriter only in the creation and I cannot switch analyzer on the same IndexWriter. We allow to do

Changing analyzer in an indexwriter

2015-04-19 Thread Lisa Ziri
Hi, I'm upgrading to lucene 5.1.0 from lucene 4. In our index we have documents in different languages which are analyzed with the correct analyzer. We used the method addDocument of IndexWriter giving the correct analyzer for every different document. Now I see that I can define the analyzer
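
The per-document addDocument(Document, Analyzer) overload is gone in 5.x; one hedged workaround (an assumption, not necessarily the thread's resolution) is a DelegatingAnalyzerWrapper that consults state set just before each add:

    final ThreadLocal<Analyzer> current = new ThreadLocal<>();
    Analyzer switching = new DelegatingAnalyzerWrapper(Analyzer.PER_FIELD_REUSE_STRATEGY) {
      @Override
      protected Analyzer getWrappedAnalyzer(String fieldName) {
        return current.get(); // whatever analyzer was chosen for the current document
      }
    };
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(switching));
    current.set(italianAnalyzer); // pick the language analyzer per document
    writer.addDocument(doc);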

Re: Text dependent analyzer

2015-04-17 Thread Rich Cariens
I am doing. At the moment, to index a document, I > break > > it to sentences, and each sentence is analyzed (lemmatizing, stopword > > removal etc.) > > Now, what I am looking for is a way to create an analyzer (a class which > > extends lucene's analyzer). This analyzer

Re: Text dependent analyzer

2015-04-17 Thread Benson Margulies
r the reply, > That's exactly what I am doing. At the moment, to index a document, I break > it to sentences, and each sentence is analyzed (lemmatizing, stopword > removal etc.) > Now, what I am looking for is a way to create an analyzer (a class which > extends lucene'

Re: Text dependent analyzer

2015-04-17 Thread Ahmet Arslan
pword removal etc.) Now, what I am looking for is a way to create an analyzer (a class which extends Lucene's Analyzer). This analyzer will be used for index and query processing. It (like the English analyzer) will receive the text and produce tokens. The API of Analyzer requires implem

Re: Text dependent analyzer

2015-04-15 Thread Jack Krupansky
nk you for the reply, > That's exactly what I am doing. At the moment, to index a document, I break > it to sentences, and each sentence is analyzed (lemmatizing, stopword > removal etc.) > Now, what I am looking for is a way to create an analyzer (a class which > extends lucene&

Re: Text dependent analyzer

2015-04-15 Thread Shay Hummel
Hi Ahmet, Thank you for the reply, That's exactly what I am doing. At the moment, to index a document, I break it into sentences, and each sentence is analyzed (lemmatizing, stopword removal etc.) Now, what I am looking for is a way to create an analyzer (a class which extends lucene'

Re: Text dependent analyzer

2015-04-14 Thread Ahmet Arslan
like to create a text-dependent analyzer. That is, *given a string*, the analyzer will: 1. Read the entire text and break it into sentences. 2. Each sentence will then be tokenized, possessives removed, lowercased, terms marked and stemmed. The second part is essentially what happens in English

Text dependent analyzer

2015-04-14 Thread Shay Hummel
Hi I would like to create a text-dependent analyzer. That is, *given a string*, the analyzer will: 1. Read the entire text and break it into sentences. 2. Each sentence will then be tokenized, possessives removed, lowercased, terms marked and stemmed. The second part is essentially what happens in

Re: Analyzer: Access to document?

2015-02-04 Thread Ahmet Arslan
eam ts = analyzer().tokenStream("field", new StringReader(text))) { final CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class); ts.reset(); // Resets this stream to the beginning. (Required) while (ts.incrementToken()) list.add(termAtt.toString()); ts.end(); // Perfo
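
For readability, the canonical consumption loop that the snippet above is cut from looks roughly like this (hedged reconstruction):

    List<String> list = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", new StringReader(text))) {
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                       // required before the first incrementToken()
      while (ts.incrementToken()) {
        list.add(termAtt.toString());
      }
      ts.end();                         // perform end-of-stream operations
    }                                   // try-with-resources closes the stream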

Analyzer: Access to document?

2015-02-04 Thread Ralf Bierig
Hi all, an Analyzer has access to content on a per-field level by overriding this method: protected TokenStreamComponents createComponents(String fieldName, Reader reader); Is it possible to get to the document? I want to collect the text content from the entire document within my

Getting new token stream from analyzer for legacy projects!

2014-12-12 Thread andi rexha
Hi, I have a legacy problem with the token stream. In my application I create a batch of documents from a single analyzer (due to configuration). I add the field using the tokenStream from the analyzer (for internal reasons). In pseudo-code this translates to: Analyzer analyzer

PositionFilter Deprecation and Questioning the associated Analyzer Invariant

2014-11-20 Thread Doug Turnbull
he user. Does this invariant still matter in this case? I could see adjusting offsets in an analyzer. However, I feel like offsets are a bit sacrosanct -- they refer to a character offset in the original document -- not the result of analysis. Am I wrong in feeling this way? So I question why

Exception while using a custom analyzer in a parallel indexing!

2014-09-15 Thread andi rexha
.java:1537) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1207) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188) When I use the "PerFieldAnalyzerWrapper" only with the analyzer as default analyzer: this.analyzer = new Per

Re: Can't get case insensitive keyword analyzer to work

2014-08-12 Thread Milind
Christoph Kaser < christoph.ka...@iconparc.de> wrote: > Hello Milind, > > if you don't set the field to be tokenized, no analyzer will be used and > the field's contents will be stored "as-is", i.e. case sensitive. > It's the analyzer's job to toke

Re: Can't get case insensitive keyword analyzer to work

2014-08-12 Thread Jack Krupansky
: Re: Can't get case insensitive keyword analyzer to work Hello Milind, if you don't set the field to be tokenized, no analyzer will be used and the field's contents will be stored "as-is", i.e. case sensitive. It's the analyzer's job to tokenize the input, so i

Re: Can't get case insensitive keyword analyzer to work

2014-08-12 Thread Christoph Kaser
Hello Milind, if you don't set the field to be tokenized, no analyzer will be used and the field's contents will be stored "as-is", i.e. case sensitive. It's the analyzer's job to tokenize the input, so if you use an analyzer that does not separate the input
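
The usual fix, as a hedged sketch: make the field tokenized (e.g. a TextField) and use a KeywordTokenizer-plus-LowerCaseFilter analyzer at both index and query time:

    Analyzer a = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer t = new KeywordTokenizer();    // the entire input as one token
        TokenStream ts = new LowerCaseFilter(t); // ...lowercased for case-insensitive matching
        return new TokenStreamComponents(t, ts);
      }
    };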

Re: Can't get case insensitive keyword analyzer to work

2014-08-11 Thread Milind
9, 2014 at 4:39 PM, Milind wrote: > >> I looked at a couple of examples on how to get keyword analyzer to be >> case insensitive but I think I missed something since it's not working for >> me. >> >> In the code below, I'm indexing text in upper case and s

Re: Can't get case insensitive keyword analyzer to work

2014-08-11 Thread Milind
g 9, 2014 at 4:39 PM, Milind wrote: > I looked at a couple of examples on how to get keyword analyzer to be case > insensitive but I think I missed something since it's not working for me. > > In the code below, I'm indexing text in upper case and searching in lower > case. Bu

Can't get case insensitive keyword analyzer to work

2014-08-09 Thread Milind
I looked at a couple of examples on how to get the keyword analyzer to be case insensitive but I think I missed something since it's not working for me. In the code below, I'm indexing text in upper case and searching in lower case. But I get back no hits. Do I need to do something more whil

Re: usage of CollationAttributeFactory StandardTokenizer Analyzer

2014-07-31 Thread craiglang44
Sent from my BlackBerry® smartphone -Original Message- From: Cemo Date: Thu, 31 Jul 2014 11:04:18 To: Reply-To: java-user@lucene.apache.org Subject: usage of CollationAttributeFactory StandardTokenizer Analyzer Hi, I am trying to use CollationAttributeFactory with a custom analyzer

usage of CollationAttributeFactory StandardTokenizer Analyzer

2014-07-31 Thread Cemo
Hi, I am trying to use CollationAttributeFactory with a custom analyzer. I am using StandardTokenizer with CollationAttributeFactory as in org.apache.lucene.collation.CollationKeyAnalyzer. protected TokenStreamComponents createComponents(String fieldName
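
The CollationKeyAnalyzer pattern being copied looks roughly like this (hedged; the Collator is an assumption):

    Collator collator = Collator.getInstance(new Locale("tr", "TR"));
    final CollationAttributeFactory factory = new CollationAttributeFactory(collator);
    Analyzer a = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // the factory makes the term attribute encode collation keys
        Tokenizer t = new StandardTokenizer(factory);
        return new TokenStreamComponents(t);
      }
    };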

Re: Incorrect tokenizing in the UAX29URLEmailAnalyzer analyzer?

2014-07-24 Thread Milind
Thanks again Steve. It was the version number. I hadn't noticed the deprecation warning. Changing to use Version.LUCENE_47 fixed the problem. On Wed, Jul 23, 2014 at 8:20 PM, Steve Rowe wrote: > On Jul 23, 2014, at 7:43 PM, Milind wrote: > > >>> input=esl2.gbr > >>> output=[esl2.gb][r]

Re: Incorrect tokenizing in the UAX29URLEmailAnalyzer analyzer?

2014-07-23 Thread Steve Rowe
On Jul 23, 2014, at 7:43 PM, Milind wrote: >>> input=esl2.gbr >>> output=[esl2.gb][r] >>> >>> This is a bug, which was fixed in Lucene 4.7 - see < > https://issues.apache.org/jira/browse/LUCENE-5391> > > BTW, I changed the POM dependency to 4.7.1, but I'm still seeing the same > output. I
