Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Ahmet Arslan
Hi Arturas,  Here are some things to try : 1) HTMLStripCharFilter stripper = new HTMLStripCharFilter(strReader.markSupported() ? strReader : new BufferedReader(strReader)) 2) Consider using HTML Strip update processor factory.  3) Create a custom Lucene analyzer using html strip char filter a

Re: coord in SolR 7

2018-02-18 Thread Ahmet Arslan
Hi Andreas, Can weak AND (WAND) be used in your use case? https://issues.apache.org/jira/browse/LUCENE-8135 Ahmet On Monday, February 12, 2018, 1:44:38 PM GMT+3, Moll, Dr. Andreas wrote: Hi, I try to upgrade our SolR installation from SolR 5 to 7. We use a customized similarity clas

Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Ahmet Arslan
Hi Zheng, UAX29UET recognizes URLs and e-mails. It does not tokenize them; it keeps each one as a single token. StandardTokenizer produces two or more tokens for such an entity. Please try both on the analysis page and use whichever suits your requirements. Ahmet On Friday, November 24, 2017, 11:46:57 A

Re: get all tokens from TokenStream in my custom filter

2017-11-19 Thread Ahmet Arslan
last token is indexed . Ahmet i could not find peek or advance method :(   Please help me guys .  On Fri, Nov 17, 2017 at 10:10 PM, Ahmet Arslan wrote: Hi Kumar, If I am not wrong, I think there is method named something like peek(2) or advance(2).Some filters access tokens ahead and perform

Re: get all tokens from TokenStream in my custom filter

2017-11-17 Thread Ahmet Arslan
Hi Kumar, If I am not wrong, I think there is a method named something like peek(2) or advance(2). Some filters access tokens ahead and perform some logic. Ahmet On Wednesday, November 15, 2017, 10:50:55 PM GMT+3, kumar gaurav wrote: Hi I need to get full field value from TokenStream in m

Re: Keeping the index naturally ordered by some field

2017-10-01 Thread Ahmet Arslan
Hi Alex, Lucene has this capability (borrowed from Nutch) under the org.apache.lucene.index.sorter package. I think it has been integrated into Solr, but I could not find the Jira issue. Ahmet On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch wrote: Hello, We've got a pretty big

Re: Help with Query/Function for conditional boost

2017-08-16 Thread Ahmet Arslan
Hi Shamik, I believe the 5-argument map function can be used here. Here is a link which may inspire you. http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ Ahmet On Wednesday, August 16, 2017, 11:06:28 PM GMT+3, Shamik Bandopadhyay wrote: Hi, I'm trying to create a fun

Re: QueryParser changes query by itself

2017-08-15 Thread Ahmet Arslan
Hi Bernd, In LUCENE-3758, a new member field was added to the ComplexPhraseQuery class, but we didn't change its hashCode method accordingly. This caused anomalies in Solr, and Yonik found the bug and fixed hashCode. Your e-mail somehow reminded me of this. Could it be the QueryCache and hashCode method

Re: RE: Comparison of Solr with Sharepoint Search

2017-08-14 Thread Ahmet Arslan
Hi, https://manifoldcf.apache.org is used to crawl content from SharePoint and index into Solr. Ahmet On Monday, August 14, 2017, 9:05:20 PM GMT+3, jmahuang wrote: Sir, Can SOLR search existing SharePoint document libraries and lists? Thanks! -- View this message in context: http://lu

Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread Ahmet Arslan
Hi Omer, Your analysis chain does not include a stem filter (lemmatizer). Assuming you are dealing with English text, you can use KStemFilterFactory or SnowballPorterFilterFactory. Ahmet On Thursday, August 10, 2017, 9:33:08 PM GMT+3, OTH wrote: Hi, Regarding 'analysis chain': I'm using Solr 6.4.

Re: Indexing a CSV that contains double quotes

2017-08-07 Thread Ahmet Arslan
Hi Ahmet, I'm afraid I don't understand, do you think you could clarify a little bit? Thanks, Devon O'Shaughnessy Developer/Analyst Upper Lakes Foods p: 800.879.1265 | ext: 4135 w: upperlakesfoods.com   ____ From:

Re: Indexing a CSV that contains double quotes

2017-08-07 Thread Ahmet Arslan
Hi Devon, I think you need to supply encapsulator=" parameter-value pair. Ahmet On Monday, August 7, 2017, 7:57:45 PM GMT+3, O'Shaughnessy, Devon wrote:    Hello all, I'm pretty new at Solr, having only worked with in a couple weeks, and I'm guessing I'm having a newbie problem of

Re: Highlighting words with special characters

2017-07-19 Thread Ahmet Arslan
Hi, Maybe the name of the UAX29URLEmailTokenizer is deceiving you? It does *not* tokenize URLs and e-mails. It actually recognises them and emits each one as a single token. Ahmet On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya wrote: Update, I changed the UAX29URLEmailTokenizerF

Re: Solr Analyzer for Vietnamese

2017-07-13 Thread Ahmet Arslan
Hi Eirik, I believe "icu tokenizer" does a decent job on text written in non-alphabets. Ahmet On Monday, May 22, 2017, 10:32:22 AM GMT+3, Eirik Hungnes wrote: Hi, There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in to Lucene at the moment. Does anyone know if something

Re: How to get field names of dynamic field

2017-04-14 Thread Ahmet Arslan
Hi Midas, LukeRequestHandler shows that information. Ahmet On Friday, April 14, 2017, 1:16:09 PM GMT+3, Midas A wrote: Actually , i am looking for APi On Fri, Apr 14, 2017 at 3:36 PM, Andrea Gazzarini wrote: > I can see those names in the "Schema  browser" of the admin UI, so I guess > usin

Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Ahmet Arslan
I don't understand the first option, what is each value? Keyword tokenizer emits single token, analogous to string type. On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood wrote: Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of v

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi, I cannot find it. However it should be something like  q=hello&fq={!frange l=0.5}query($q) Ahmet On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan wrote: Hi David, A function query named "query" returns the score for the given subquery.  Combined with frange query
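
A minimal sketch of how the request above might be assembled on the client side, assuming plain HTTP parameters are built by hand. The helper name `min_score_params` is hypothetical and not part of Solr or SolrJ; only the `q` and `fq` strings come from the message.

```python
from urllib.parse import urlencode, parse_qs


def min_score_params(user_query, min_score):
    """Build request parameters that keep only documents scoring at least min_score.

    Hypothetical helper for illustration; the fq string mirrors the one
    quoted in the message above.
    """
    return {
        "q": user_query,
        # frange filters on the value of the function query($q),
        # which is the score of the main query q
        "fq": "{!frange l=%s}query($q)" % min_score,
    }


params = min_score_params("hello", 0.5)
# urlencode percent-encodes the braces, '!', '$' and '=' so they survive the URL
query_string = urlencode(params)
print(query_string)
```

The threshold is an absolute score, so a value that works for one query may be meaningless for another; that caveat applies to the approach itself, not to this sketch.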

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi David, A function query named "query" returns the score for the given subquery. Combined with the frange query parser, this is possible. I tried it in the past. I am searching for the original post; I think it was Yonik's. https://cwiki.apache.org/confluence/display/solr/Function+Queries Ahmet

Re: Filtering results by minimum relevancy score

2017-04-10 Thread Ahmet Arslan
Hi, I remember that this is possible via the frange query parser, but I don't have the query string at hand. Ahmet On Monday, April 10, 2017, 9:00:09 PM GMT+3, David Kramer wrote: I’ve done quite a bit of searching on this. Pretty much every page I find says it’s a bad idea and won’t work well, but

Re: How on EARTH do I remove 's in schema file?

2017-03-19 Thread Ahmet Arslan
Hi Donato, How about using ApostropheFilterFactory ? http://lucene.apache.org/core/6_4_2/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html Ahmet On Sunday, March 19, 2017 4:08 PM, donato wrote: Then why is it not working? It doesn't make sense at all? And in the Tag fiel

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Ahmet Arslan
Hi, how about q=code_text:bolt*&fq=code_text:bolt Ahmet On Thursday, March 2, 2017 4:41 PM, Сергей Твердохлеб wrote: Hi, is there way to separate exact match from wildcard match in solr response? e.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When I search for "bolt

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Ahmet Arslan
Hi, The new default similarity is BM25. Maybe explicitly set the similarity to TF-IDF and see how it goes? Ahmet On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi wrote: Hello, Default TF-IDF performs poorly with the indexed 200 millions documents. Query "Michael Jackson" may run 300ms, and "Mich

Re: Stemming and accents

2017-02-10 Thread Ahmet Arslan
Hi, I have experimented before, and found that Snowball is sensitive to accents/diacritics. Please see for more details: http://www.sciencedirect.com/science/article/pii/S0306457315001053 Ahmet On Friday, February 10, 2017 11:27 AM, Dominique Bejean wrote: Hi, Is the SnowballPorterFilter

Re: Dismax query special characters

2017-01-29 Thread Ahmet Arslan
Hi, I don't think dismax recognizes AND or OR. The special characters for dismax are +, - and quotes. In your example, the ampersand may be causing you trouble because of URL encoding. Ahmet On Sunday, January 29, 2017 12:17 AM, Jarosław Grązka wrote: Hi, Reading Solr documentation about dismax query

Re: Empty Highlight Problem - Solr 6.3.0

2016-12-24 Thread Ahmet Arslan
Hi, Did you try increasing hl.maxAnalyzedChars ? Ahmet On Friday, December 23, 2016 10:47 PM, Furkan KAMACI wrote: Hi All, I'm trying highlighter component at Solr 6.3. I have a problem when I index PDF files. I know that given keyword exists at result document (it is returned as result bec

Re: Stemming with SOLR

2016-12-15 Thread Ahmet Arslan
Hi, KStemFilter returns legitimate English words, please use it. Ahmet On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya wrote: Hello devs, I'm trying to develop this indexing and querying flow where it converts the words to its original form (lemmatization). I was doing bit of

Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Ahmet Arslan
lter or adding a SearchComponent to filter out the "bad" results, but obviously a true query-time support would be a lot better. On Wed, Dec 14, 2016 at 10:52 PM, Ahmet Arslan wrote: > Hi, > > Do you have a common list of phrases that you want to prohibit partial > match? &

Re: Searching for a term which isn't a part of an expression

2016-12-14 Thread Ahmet Arslan
Hi, Do you have a common list of phrases that you want to prohibit partial match? You can index those phrases in a special way, for example, This is a new world hello_world hot_dog tap_water etc. ahmet On Wednesday, December 14, 2016 9:20 PM, deansg wrote: We would like to enable queries for

Re: Unicode Character Problem

2016-12-10 Thread Ahmet Arslan
Hi Furkan, I am pretty sure this is a pdf extraction thing. Turkish characters caused us trouble in the past during extracting text from pdf files. You can confirm by performing manual copy-paste from original pdf file. Ahmet On Friday, December 9, 2016 8:44 PM, Furkan KAMACI wrote: Hi, I'm

Re: Wildcard searches with space in TextField/StrField

2016-11-25 Thread Ahmet Arslan
Hi, You could try this: drop wildcard stuff altogether: 1) Employ edgengramfilter at index time. 2) Use plain searches at query time. Ahmet On Friday, November 25, 2016 4:59 PM, Sandeep Khanzode wrote: Hi All, Can someone please assist with this query? My data consists of: 1.] John Doe 2.

Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Ahmet Arslan
Hi Eyal, ICUFoldingFilter uses http://site.icu-project.org under the hood. If you think there is a bug, it is better to ask its mailing list. Ahmet On Sunday, October 30, 2016 3:41 PM, "eyal.naam...@exlibrisgroup.com" wrote: Hi, I was wondering if anyone ran into the following issue, or a s

Re: Solr 5.3.1 - Synonym is not working as expected

2016-10-25 Thread Ahmet Arslan
Hi, If your index is pure Chinese, I would do the expansion on query time only. Simply replace English query term with Chinese translations. Ahmet On Tuesday, October 25, 2016 12:30 PM, soundarya wrote: We are using Solr 5.3.1 version as our search engine. This setup is provided by the Bitn

Re: Lowercase all characters in String

2016-10-11 Thread Ahmet Arslan
Hi, KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add TrimFilter too. Ahmet On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo wrote: Hi, Would like to find out, what is the best way to lowercase all the text, while preserving all the tokens. As I need to p

Re: Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Ahmet Arslan
Hi Andy, WordDelimiterFilter has a "types" option. There is an example file named wdftypes.txt in the source tree that preserves #hashtags and @mentions. If you follow this path, please use the whitespace tokenizer. Ahmet On Monday, October 3, 2016 9:52 PM, "Whelan, Andy" wrote: Hello, I am guess

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
functionality will still be provided no matter the approach... SRK On Thursday, September 8, 2016 5:05 PM, Ahmet Arslan wrote: Hi, EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or starts with search. Lets say, wildcard enumerates the whole inverted index, thus it may

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
Hi, EdgeNGram and wildcard may be used to achieve the same goal: prefix search, or starts-with search. Let's say a wildcard query enumerates the whole inverted index; thus it may get slower for very large databases. With this approach, no index-time manipulation is required. EdgeNGram does its magic at index
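
The trade-off described above can be illustrated with a toy model. This is only a sketch in plain Python, not Lucene code: `edge_ngrams` and the dictionary-based index are hypothetical stand-ins for what an edge n-gram filter does at index time, and the sample terms are invented.

```python
def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the leading substrings of term, from min_gram to max_gram characters."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


# Toy inverted index: every edge n-gram of a term points back to its document.
index = {}
for doc_id, term in [(1, "console"), (2, "consultant")]:
    for gram in edge_ngrams(term, min_gram=2):
        index.setdefault(gram, set()).add(doc_id)

# At query time a prefix search becomes a plain exact lookup,
# with no enumeration of the term dictionary.
print(sorted(index.get("cons", set())))
print(sorted(index.get("consult", set())))
```

The index grows with every extra gram stored, which is the cost paid for the cheaper lookup.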

Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-07 Thread Ahmet Arslan
Hi, The tilde in the former looks interesting. I think it is related to proximity search. What query parser is this? Ahmet On Wednesday, September 7, 2016 10:52 AM, Bernd Fehling wrote: Hi list, while going from SOLR 4.10.4 to 5.5.3 I noticed a change in query parsing. 4.10.4 text:star text:

Re: Blank/Null value search in term filter

2016-09-05 Thread Ahmet Arslan
any configuration change. Please suggest. On 02-Sep-2016 9:37 PM, "Ahmet Arslan" wrote: > > > Hi Kishore, > > You can employ an impossible token value (say XX) for null values. > This can be done via default value update processor factory. > You index some pl

Re: Blank/Null value search in term filter

2016-09-02 Thread Ahmet Arslan
Hi Kishore, You can employ an impossible token value (say XX) for null values. This can be done via the default value update processor factory: you index a placeholder token for null values. Then fq={!terms f='queryField' separator='|'}A|XX would fetch docs with A or null values. Ahmet On Friday, S
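
A small sketch of how such a filter string could be generated, assuming the placeholder token has already been written for missing values at index time. `terms_filter` and `NULL_TOKEN` are hypothetical names; the resulting string matches the one in the message.

```python
NULL_TOKEN = "XX"  # placeholder from the message; must never occur in real data


def terms_filter(field, values, include_missing=False, separator="|"):
    """Build a {!terms} filter query, optionally matching the null placeholder too."""
    wanted = list(values)
    if include_missing:
        wanted.append(NULL_TOKEN)
    return "{!terms f='%s' separator='%s'}%s" % (field, separator, separator.join(wanted))


fq = terms_filter("queryField", ["A"], include_missing=True)
print(fq)
```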

Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
for example, I update JVM patch-version, then already indexed documents whose indexed fields used CollationKeyAnalyzer needs to be re-indexed or else we cannot query them? Thanks, Vasu On Thu, Aug 25, 2016 at 7:59 PM, Ahmet Arslan wrote: > Hi Vasu, > > There is a fi

Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
Hi Vasu, There is a field type or something like that (CollationKeyAnalyzer) for language specific sorting. Ahmet On Thursday, August 25, 2016 12:29 PM, Vasu Y wrote: Hi, I have a text field which can contain values (multiple tokens) in English; to support sorting, I had in schema.xml to co

Re: Wildcard search not working

2016-08-12 Thread Ahmet Arslan
I change to be able to correctly do wildcard searches? Many thanks for your time. Cheers, christian -- Christian Ribeaud Software Engineer (External) NIBR / WSJ-310.5.17 Novartis Campus CH-4056 Basel -Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Donnerstag, 11.

Re: Wildcard search not working

2016-08-11 Thread Ahmet Arslan
Hi Christian, The query r?che may not return at least the same number of matches as roche, depending on your analysis chain. The difference is that roche is analyzed but r?che is not. Wildcard queries are executed on the indexed/analyzed terms. For example, if roche is indexed/analyzed as roch, the qu

Re: Query optimization

2016-07-28 Thread Ahmet Arslan
Oops, I forgot the link: http://yonik.com/solr/paging-and-deep-paging/ On Friday, July 29, 2016 9:51 AM, Ahmet Arslan wrote: Hi Midas, Please search 'deep paging' over the documentation, mailing list, etc. Solr Deep Paging and Sorting Ahmet On Friday, July 29, 2016 9:21 AM, Mida

Re: Query optimization

2016-07-28 Thread Ahmet Arslan
Hi Midas, Please search 'deep paging' over the documentation, mailing list, etc. Solr Deep Paging and Sorting Ahmet On Friday, July 29, 2016 9:21 AM, Midas A wrote: please reply . On Fri, Jul 29, 2016 at 10:26 AM, Midas A wrote: > a) my index size is 10 gb for higher start is query res

Re: No need white space split

2016-07-25 Thread Ahmet Arslan
Hi, Maybe you can simply use the string field type? Or KeywordTokenizerFactory? Ahmet On Monday, July 25, 2016 4:38 PM, Shashi Roushan wrote: Hi All, I am Shashi. I am using Solr 6.1. I want to get result only when the hole word matched. Actually I want to avoid whitespace split. Whenever we

Re: Find part of long query in shorter fields

2016-07-21 Thread Ahmet Arslan
Hi, If you want to disable operators altogether, please use dismax instead of edismax. In dismax, only the + and - unary operators are supported, if I am not wrong. I don't remember how quotation marks for phrase queries are handled. Ahmet On Tuesday, July 19, 2016 8:29 PM, CA wrote: Just for the r

Re: Find part of long query in shorter fields

2016-07-16 Thread Ahmet Arslan
Hi Chantal, Please see https://issues.apache.org/jira/browse/LUCENE-7148 ahmet On Saturday, July 16, 2016 3:48 PM, CA wrote: Hello all, our index contains product offers from online shops. The fields we are indexing have all rather short values: the name of the product, the brand, the pric

Re: Filter Query that matches all values of a field

2016-07-04 Thread Ahmet Arslan
Hi Vasu, This question appears occasionally in the mailing list. Please see https://issues.apache.org/jira/browse/LUCENE-7148 ahmet On Monday, July 4, 2016 9:10 PM, Vasu Y wrote: Hi, I have a single type field that can contain zero or more values (comma separated values). This field stores

Re: Data import handler in techproducts example

2016-07-02 Thread Ahmet Arslan
Hi Jonas, Search for the solr-dataimporthandler-*.jar and place it under a lib directory (same level as the solr.xml file) along with the MySQL JDBC driver (mysql-connector-java-*.jar). Please see: https://cwiki.apache.org/confluence/display/solr/Lib+Directives+in+SolrConfig On Saturday, July 2,

Re: an advice: why not to add a searching model for mailing list

2016-07-02 Thread Ahmet Arslan
Hi Kent, There are already two search systems for the task: http://find.searchhub.org http://search-lucene.com Is this what you mean by saying 'search model'? Ahmet On Saturday, July 2, 2016 6:43 PM, Kent Mu wrote: hi all, I wonder why not do add a searching model for mailing list, so tha

Re: Sorting & searching on the same field

2016-06-23 Thread Ahmet Arslan
Hi Jay, I don't think it can be combined. Mainly because: searching requires a tokenized field. Sorting requires a single value (token) to be meaningful. Ahmet On Thursday, June 23, 2016 7:43 PM, Jay Potharaju wrote: Hi, I would like to have 1 field that can used for both searching and case i

Re: How do we get terms suggestion from SuggestComponent?

2016-06-21 Thread Ahmet Arslan
Hi, With grams parameter of FreeTextLookupFactory, no? Ahmet On Tuesday, June 21, 2016 1:19 PM, solr2020 wrote: Thanks Ahmet. It is working fine. Now i would like to get suggestions for multiple terms. How do i get suggestions for multiple terms?

Re: How do we get terms suggestion from SuggestComponent?

2016-06-20 Thread Ahmet Arslan
Hi, I think : FreeTextLookupFactory DocumentDictionaryFactory 3 content Ahmet On Monday, June 20, 2016 3:51 PM, solr2020 wrote: Hi, I am using solr.SuggestComponent for auto suggestion, it works fine. But the problem is, it returns the whole field value as suggestion instead of terms. But m

Re: Phrase query proximity parameter doe not show up in parsed query string

2016-06-20 Thread Ahmet Arslan
Hi, I think synonym_edismax is not part of solr. Can you re-produce with the stock edismax? On Monday, June 20, 2016 12:34 PM, preeti kumari wrote: Hi All, My query looks like below : q=((_query_:"{!synonym_edismax qf='partnum' v='597871' bq='' mm=100 synonyms=true synonyms.constructPhra

Re: Can someone explain about Sweetspot Similarity ?

2016-06-19 Thread Ahmet Arslan
Hi, Sweet spot is designed to punish too long or too short documents. Did you reindex? Can you see the mention of sweet spot in debugQuery=true response? Ahmet On Sunday, June 19, 2016 2:18 PM, dirmanhafiz wrote: Hi , Im Dirman and im trying experiment solr with sweetspot similarity,, can s

Re: Error when searching with special characters

2016-06-18 Thread Ahmet Arslan
. What could be the reason that it did not work with the default defType=lucene? Regards, Edwin On 18 June 2016 at 01:04, Ahmet Arslan wrote: > Hi, > > May be URL encoding issue? > By the way, I would use back slash to escape special characters. > > Ahmet > > On Friday

Re: Error when searching with special characters

2016-06-17 Thread Ahmet Arslan
Hi, Maybe it is a URL encoding issue? By the way, I would use a backslash to escape special characters. Ahmet On Friday, June 17, 2016 10:08 AM, Zheng Lin Edwin Yeo wrote: Hi, I encountered this error when I tried to search with special characters, like "&" and "#". { "responseHeader":{ "st
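
Both suggestions can be sketched together: escape query-syntax characters with a backslash first, then percent-encode the result for the URL. The character set below is an assumption modeled on the special characters listed in the Lucene query parser documentation, and both helper names are hypothetical.

```python
from urllib.parse import quote, unquote

# Assumed set of query-syntax characters to escape (not an official constant)
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/&|')


def escape_term(text):
    """Prefix each query-syntax character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def encode_param(value):
    """Percent-encode a parameter value so '&' and '#' cannot split the URL."""
    return quote(value, safe="")


escaped = escape_term("R&D")
print(escaped)
print(encode_param(escaped))
print(encode_param("C#"))
```

The two steps solve different problems: the backslash is for the query parser, while percent-encoding keeps `&` from being read as a parameter separator and `#` from being read as a URL fragment.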

Re: Stemming

2016-06-16 Thread Ahmet Arslan
Hi Jamal, Snowball requires a lowercase filter before it in the chain. This is documented in the javadocs; it is a small but important detail. Please add a lowercase filter after the whitespace tokenizer. Ahmet On Thursday, June 16, 2016 10:13 PM, "Jamal, Sarfaraz" wrote: Hi Guys, I have enabled stemmin

Re: wildcard search for string having spaces

2016-06-15 Thread Ahmet Arslan
Hi Roshan, I think there are two options: 1) escape the space q=abc\ p* 2) use prefix query parser q={!prefix f=my_string}abc p Ahmet On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble wrote: Hello, I have below custom field type defined for solr 6.0.0
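
The two options can be written out as query strings. This is a sketch only; the function names are hypothetical, and `my_string` is the field name used in the message.

```python
def escaped_wildcard(prefix):
    """Option 1: backslash-escape spaces so the parser sees one term, then append *."""
    return prefix.replace(" ", "\\ ") + "*"


def prefix_parser(field, prefix):
    """Option 2: the prefix query parser takes the raw prefix, with no escaping or *."""
    return "{!prefix f=%s}%s" % (field, prefix)


print(escaped_wildcard("abc p"))
print(prefix_parser("my_string", "abc p"))
```

Either string still needs URL encoding before it is sent as the `q` parameter.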

Re: Question about multiple fq parameters

2016-06-09 Thread Ahmet Arslan
Hi Mikhail, Can you please explain what this mysterious op parameter is? How is it related to range queries issued on date fields? Thanks, Ahmet On Thursday, June 9, 2016 11:43 AM, Mikhail Khludnev wrote: Shawn, I found "op" at org.apache.solr.schema.DateRangeField.parseSpatialArgs(QParser, S

Re: Scoring changes between 4.10 and 5.5

2016-06-09 Thread Ahmet Arslan
Hi, I wondered the same before and failed to decipher TFIDFSimilarity. Scoring looks like tf*idf*idf to me. I appreciate someone who will shed some light on this. Thanks, Ahmet On Friday, June 10, 2016 12:37 AM, Upayavira wrote: I've just done a very simple, single term query against a 4.10

Re: Question about multiple fq parameters

2016-06-08 Thread Ahmet Arslan
What is the meaning of 'op=Intersects' here? On Thursday, June 9, 2016 12:20 AM, Mikhail Khludnev wrote: oh.. hold on. you might need the space in the later one ?&q=*&q.op=OR&fq= {!field+f=DateB+op=Intersects v=$b} {!field+f=DateA+op=Intersects v=$a}&b=[2000-01-01+TO+2020-01-01]&a=[2020-01-01

Re: carrot2 label understanding(clustering)

2016-06-08 Thread Ahmet Arslan
Hi, This is search result clustering. Carrot2 also assigns labels to clusters. It automatically generates those labels. Ahmet On Wednesday, June 8, 2016 12:36 PM, Mugeesh Husain wrote: Hi, I have a few question regarding clustering , i check out this link https://cwiki.apache.org/confluence

Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
me use the full power of the Solr ecosystem. I'd basically be back to dealing with Lucene directly, which I think is a step backwards. I think the right approach is to write my own SearchComponent, using the highlighter as a starting point. But I wanted to make sure there wasn't a simple

Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
ghter has to do just this in order to create snippets with accurate highlighting. Justin On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan wrote: > Hi, > > May be org.apache.lucene.search.spans.TermSpans ? > > > > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch > wr

Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Hi, May be org.apache.lucene.search.spans.TermSpans ? On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch wrote: It sounds like TermVector component's output: https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component Perhaps with additional flags enabled (e.g. tv.offsets

Re: debugging solr query

2016-05-27 Thread Ahmet Arslan
w columns? >>> >>> >>> Thanks >>> Jay >>> >>> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson >> > wrote: >>> >>>> Try adding debug=timing, that'll give you an idea of what component is >>>> taking all the tim

Re: How can Most Popular Search be implemented in Solr?

2016-05-27 Thread Ahmet Arslan
Hi, Solr does not explicitly save or maintain incoming queries. * Some people save queries at the UI side. * Some folks enable Solr logging and then extract useful query, numFound, QTime, etc. information from logs: http://soleami.com * Others identify searches that return zero documents (missing con

Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
iginal Message- From: Siddhartha Singh Sandhu [mailto:sandhus...@gmail.com] Sent: Thursday, May 26, 2016 6:54 PM To: solr-user@lucene.apache.org; Ahmet Arslan Subject: Re: how can we use multi term search along with stop words Hi Preeti, You can use the analysis tool in the Solr console to see how yo

Re: sort by custom function of similarity score

2016-05-26 Thread Ahmet Arslan
Hi, Probably, using the 'query' function query, which returns the score of a given query. https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-UsingFunctionQuery On Thursday, May 26, 2016 1:59 PM, aanilpala wrote: is it allowed to provide a sort function (sortspe

Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
Hi Bhat, What do you mean by multi term search? In your first e-mail, your example uses quotes, which means phrase/proximity search. ahmet On Thursday, May 26, 2016 11:49 AM, Preeti Bhat wrote: HI All, Sorry for asking the same question again, but could someone please advise me on this.

Re: debugging solr query

2016-05-24 Thread Ahmet Arslan
Hi, Is it QueryComponent taking the time? Or other components? Also make sure there is plenty of RAM for the OS cache. Ahmet On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju wrote: Hi, I am trying to debug solr performance problems on an old version of solr, 4.3.1. The queries are taking really

Re: highlight don't work if df not specified

2016-05-23 Thread Ahmet Arslan
","org.apache.solr.common.SolrException"], "msg":"undefined field text", "code":400}} On Sun, May 22, 2016 at 5:34 PM, Ahmet Arslan wrote: > Hi, > > What happens when you increase hl.maxAnalyzedChars? > > OR > > hl.q=blah blah&

Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi, What happens when you increase hl.maxAnalyzedChars? OR hl.q=blah blah&hl.fl=normal_text,title Ahmet On Sunday, May 22, 2016 5:24 PM, michael solomon wrote: On Sun, May 22, 2016 at 5:18 PM, Ahmet Arslan wrote: > Hi, > > Weird, are your fields stored? > > &g

Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi, Weird, are your fields stored? On Sunday, May 22, 2016 5:14 PM, michael solomon wrote: Thanks Ahmet, It was mistake in the question, sorry, in the quey I wrote it properly. On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan wrote: > Hi, > > q=normal_text:"bla bla&q

Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi, q=normal_text:"bla bla"&title:"bla bla" should be q=+normal_text:"bla bla" +title:"bla bla" On Sunday, May 22, 2016 4:52 PM, michael solomon wrote: Hi, I'm I query multiple fields in solr: q=normal_text:"bla bla"&title:"bla bla" I turn on the highlighting, but it doesn't work even

Re: How to use a regex search within a phrase query?

2016-05-22 Thread Ahmet Arslan
Hi Erez, I don't think it is possible to combine regex with phrase out-of-the-box. However, there is https://issues.apache.org/jira/browse/LUCENE-5205 for the task. Can't you define your query in terms of pure regex? something like /[0-9]{3} .* [0-9]{4}/ ahmet On Sunday, May 22, 2016 1:37 PM,

Re: indexing dovecot mailbox

2016-05-22 Thread Ahmet Arslan
/Postfach/cur # file 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S: SMTP mail, ASCII text I can read them with the Midnight Commeander. Has it something to do with the file-ending not recognized? Andreas Ahmet Arslan schrieb am 22.05.16 um 00:46:32 Uhr

Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan
, 2016 3:46 AM, Ahmet Arslan wrote: Hi Meyer, Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain file types. They (xml,json,...,log) are actually listed in the log msg in your email. Can you describe the format of the files that you want to index? Are they

Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan
Hi Meyer, Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain file types. They (xml,json,...,log) are actually listed in the log msg in your email. Can you describe the format of the files that you want to index? Are they text files? ahmet On Sunday, May 22, 2016

Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Ahmet Arslan
Hi, EmbeddedSolrServer bypasses the servlet container. Please see: http://find.searchhub.org/document/a88f669d38513a76 On Thursday, May 19, 2016 6:23 PM, Roman Slavik wrote: Hi Ahmet, thanks for your response, I appreciate it. I thought that EmbeddedSolrServer is just wrapper around Solr cor

Re: Solrj 4.7.2 - slowing down over time

2016-05-18 Thread Ahmet Arslan
Hi Roman, You said you were using EmbeddedSolrServer, also you mention Tomcat. I don't think it is healthy to use both. Also I wouldn't use EmbeddedSolrServer at all. It is rarely used and there can be hidden things there. Consider using jetty which is actually tested. Since you commit every min

Re: Precision, Recall, ROC in solr

2016-05-18 Thread Ahmet Arslan
Hi Tentri, Evaluation in IR is primarily carried out with the traditional TREC-style (also referred to as the Cranfield paradigm) evaluation methodology. The methodology requires a document collection, a set of information needs (called topics or queries), and a set of query relevance judgments (qr

Re: Filter query (fq) on comma seperated value does not work

2016-05-16 Thread Ahmet Arslan
was able to retrieve the expected results. But Still Can you help me out in achieving the results using the comma as you suggested. Thanks & Regards On Mon, May 16, 2016 at 5:50 PM, Ahmet Arslan wrote: > Hi, > > Its all about how you tokenize the category field. > It looks like

Re: easiest way to search parts of words

2016-05-16 Thread Ahmet Arslan
Hi Gates, There are two approaches: 1) Use a wildcard query with star operator q=consult* 2) Create an index with EdgeNGramFilterFactory and issue a regular search q=consult (2) will be faster at the cost of bigger index size You don't need to change anything for (1) if the execution time is s

Re: Filter query (fq) on comma seperated value does not work

2016-05-16 Thread Ahmet Arslan
Hi, It's all about how you tokenize the category field. It looks like you are using a string type, which does not tokenize at all (i.e. the value is indexed verbatim). Please use a PatternTokenizer and configure it so that it splits on commas. Ahmet On Monday, May 16, 2016 2:11 PM, SRINI SOLR wrote: Hi Team - Can yo
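
The difference between the two field types can be shown with a toy tokenizer in plain Python. This is not Solr code: `keyword_tokens` and `comma_tokens` are hypothetical stand-ins for a string field and a comma-splitting pattern tokenizer, and the sample value is invented.

```python
import re


def keyword_tokens(value):
    """String-type behaviour: the whole value is one verbatim token."""
    return [value]


def comma_tokens(value):
    """Pattern-tokenizer behaviour: split on commas, dropping surrounding spaces."""
    return [tok for tok in re.split(r"\s*,\s*", value.strip()) if tok]


category = "books, music,video"
print(keyword_tokens(category))
print(comma_tokens(category))
# A filter on a single category only matches when that category is its own token.
print("music" in keyword_tokens(category), "music" in comma_tokens(category))
```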

Re: URL parameters combined with text param

2016-05-13 Thread Ahmet Arslan
uery_:"{!q.op=AND v='hospital'}" _query_:"{!q.op=AND v=$a}" (+())/no_coord +() ExtendedDismaxQParser [...] On 12/05/2016 17:06, Erick Erickson wrote: > Try adding &debug=query to your query and look at the parsed results

Re: URL parameters combined with text param

2016-05-12 Thread Ahmet Arslan
ame as http://localhost:8983/solr/my_core/select?q=hospital ) Kind regards, Bastien On 11/05/2016 16:06, Ahmet Arslan wrote: > Hi Bastien, > > Please use magic _query_ field, q=hospital AND _query_:"{!q.op=AND v=$a}" > > ahmet > > > On Wednesday, May 11, 2016 2:

Re: Error

2016-05-11 Thread Ahmet Arslan
Hi Midas, It looks like you are committing too frequently, so cache warming cannot catch up. Either lower your commit rate or disable cache autowarming (autowarmCount=0). You can also remove any queries registered for the newSearcher event, if you have defined some. Ahmet On Wednesday, May 11, 2016 2:51 PM,

Re: URL parameters combined with text param

2016-05-11 Thread Ahmet Arslan
Hi Bastien, Please use magic _query_ field, q=hospital AND _query_:"{!q.op=AND v=$a}" ahmet On Wednesday, May 11, 2016 2:35 PM, Latard - MDPI AG wrote: Hi Everybody, Is there a way to pass only some of the data by reference and some others in the q param? e.g.: q1. http://localhost:8983
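As a rough sketch, the request suggested above could be assembled like this. The host and core name are the ones quoted elsewhere in this thread; the value passed for the referenced parameter a is purely hypothetical, and the helper name is an assumption.

```python
from urllib.parse import urlencode, parse_qs


def build_select_url(base, text, referenced):
    """Combine free text in q with a sub-query whose value is dereferenced from parameter a."""
    params = {
        # $a is resolved by Solr from the separate request parameter named a.
        "q": f'{text} AND _query_:"{{!q.op=AND v=$a}}"',
        "a": referenced,
    }
    return base + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/my_core/select",
    "hospital",
    "emergency care",  # hypothetical value for the referenced parameter
)
print(url)

# Decode the query string again to check that both parameters survive encoding.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```

Letting urlencode handle the quotes, braces and dollar sign avoids hand-escaping mistakes in the q parameter.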

Re: How to search string

2016-05-11 Thread Ahmet Arslan
Hi, You can be explicit about the field that you want to search on, e.g. q=product_name:(Garmin Class A) Or you can use the lucene query parser with the default field (df) parameter, e.g. q={!lucene df=product_name}Garmin Class A It's all about query parsers. Ahmet On Wednesday, May 11, 2016 9:12 AM

Re: How to search in solr for words like %rek Dr%

2016-05-11 Thread Ahmet Arslan
Hi Thrinadh, Why don't you use plain wildcard search? There are two operators, star and question mark, for this purpose. Ahmet On Wednesday, May 11, 2016 4:31 AM, Thrinadh Kuppili wrote: Thank you, Yes i am aware that surround with quotes will result in match for space but i am trying to match

Re: Facet ignoring repeated word

2016-05-10 Thread Ahmet Arslan
+1 to Toke's facet and stats combo! On Tuesday, May 10, 2016 11:21 AM, Toke Eskildsen wrote: On Fri, 2016-04-29 at 08:55 +, G, Rajesh wrote: > I am trying to implement word > cloud

Re: how to find out how many times a word appears in a collection of documents?

2016-05-10 Thread Ahmet Arslan
document is marked for deletion. df values include deleted documents. Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570 From: Ahmet Arslan To: "solr-user@lucene.apache.org" ; "liviuchrist...@yahoo.com" Sent: Tuesday, May 10, 2016 1:42 PM Subject: Re: h

Re: how to find out how many times a word appears in a collection of documents?

2016-05-10 Thread Ahmet Arslan
Hi Christian, Collection-wide term statistics can be accessed via TermsComponent or LukeRequestHandler. Ahmet On Tuesday, May 10, 2016 1:26 PM, "liviuchrist...@yahoo.com.INVALID" wrote: Hi everyone, I need to "read" the solr/lucene index and see how many times does words appear in all docu

Re: Facet ignoring repeated word

2016-05-09 Thread Ahmet Arslan
r than the intended person(s) is prohibited. -Original Message- From: G, Rajesh [mailto:r...@cebglobal.com] Sent: Friday, May 6, 2016 1:08 PM To: Ahmet Arslan ; solr-user@lucene.apache.org Subject: RE: Facet ignoring repeated word Hi Ahmet, Sorry it is Word Cloud https://urldefense.proofpo

Re: Filter queries & caching

2016-05-08 Thread Ahmet Arslan
Hi, As I understand it, this is useful in case you use an OR operator between two restricting clauses. Recall that multiple fq parameters imply AND. ahmet On Monday, May 9, 2016 4:02 AM, Jay Potharaju wrote: As mentioned above adding filter() will add the filter query to the cache. This would mean that

Re: Facet ignoring repeated word

2016-05-06 Thread Ahmet Arslan
er than the intended person(s) is prohibited. -Original Message- From: G, Rajesh [mailto:r...@cebglobal.com] Sent: Thursday, May 5, 2016 4:29 PM To: Ahmet Arslan ; solr-user@lucene.apache.org; erickerick...@gmail.com Subject: RE: Facet ignoring repeated word Hi, TermVectorComponent work

Re: How to get all the docs whose field contain a specialized string?

2016-05-06 Thread Ahmet Arslan
Hi, It looks like brand_s is defined as string, which is not tokenized. Please do one of the following to retrieve "brand_s":"ibm hp" a) use a tokenized field type or b) issue a wildcard query of q=ibm* Ahmet On Friday, May 6, 2016 8:35 AM, 梦在远方 wrote: Hi, all I do a query by solr admi
