How to handle special characters in fuzzy search query

2015-05-08 Thread Madhav Bahuguna
So my Solr query is implemented in two parts: the first query does an exact search, and if no results are found, it falls back to a second query that does a fuzzy search. Everything works fine, but consider this situation: a user enters burg +. The exact search returns no records, so the second
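The two-stage flow described above can be sketched as follows, with query-syntax characters pre-scrubbed from the input (the approach Erick recommends later in the thread). This is a minimal sketch under assumptions: `search` is a hypothetical callable standing in for whatever Solr client call you use, and dropping syntax characters rather than escaping them is one possible policy, not the original poster's code.

```python
from typing import Callable, List

# Hypothetical stand-in for whatever client call runs a Solr query string
# and returns the matching documents.
SearchFn = Callable[[str], List[dict]]

# Characters with meaning to the standard query parser; here they are scrubbed.
SYNTAX_CHARS = set('\\+-!():^[]"{}~*?|&;/')

def scrub(token: str) -> str:
    """Remove query-syntax characters from a single token."""
    return "".join(c for c in token if c not in SYNTAX_CHARS)

def exact_then_fuzzy(user_input: str, search: SearchFn) -> List[dict]:
    """Run an exact search first and fall back to a fuzzy search if it finds nothing."""
    terms = [t for t in (scrub(tok) for tok in user_input.split()) if t]
    if not terms:
        return []  # the input was nothing but operators, e.g. "+"
    hits = search(" ".join(terms))
    if hits:
        return hits
    # A bare ~ uses the default maximum of 2 edits per term.
    return search(" ".join(t + "~" for t in terms))
```

With the input `burg +`, the lone `+` is dropped, so the exact query is `burg` and the fallback is `burg~` instead of a parse error.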

How to get the docs id after commit

2015-05-08 Thread 李文
Hi, Solr developers. I want to get the newest committed docs in the postcommit event and then notify the other server which data can be used, but I cannot find any way to get the newest docs after the commit. Is there any way to do this? Thank you. Wen Li

Re: Proximity searching in percentage

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi Alessandro, I'm using Solr 5.0.0, but it still works. Actually, I found this to be better than query~1 or query~2, as it can automatically detect and allow the 20% error rate that I want. For query~1 or query~2, does it mean that I'll have to manually detect how many characters

Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi, I would like to check something about the SynonymFilterFactory. I have the following in my synonyms.txt:

Titanium Dioxides, titanium oxide, pigment
pigment, colour, colouring material

If I set expand=false and search for q=pigment, I get results that match pigment, Titanium Dioxides and

Re: Proximity searching in percentage

2015-05-08 Thread Alessandro Benedetti
Hi Zheng, actually that version of the fuzzy search syntax is deprecated! Currently the fuzzy search syntax is query~1 or query~2. The ~ (tilde) parameter is the maximum number of edits allowed when generating all the expanded terms for the query. Can I ask which version of Solr you are using? This article from 2011
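Why the older `query~0.8` form seemed to "automatically" allow a 20% error rate: as far as I know, Lucene 4.x/5.x still accepts a similarity below 1 and converts it into an integer edit distance based on the term length, capped at 2 (the most the Levenshtein automata support). The Python below re-expresses that conversion, modeled on `FuzzyQuery.floatToEdits`, purely for illustration; it uses double arithmetic, so a value landing exactly on a boundary could differ from Java's float math.

```python
MAX_EDITS = 2  # Lucene's Levenshtein automata support at most 2 edits

def float_to_edits(min_similarity: float, term_len: int) -> int:
    """Convert a legacy fuzzy similarity (e.g. 0.8) into a max edit distance."""
    if min_similarity >= 1:
        # Values >= 1 (as in ~1 or ~2) are already edit counts.
        return int(min(min_similarity, MAX_EDITS))
    if min_similarity == 0:
        return 0  # 0 means an exact match, not unlimited edits
    # Allowed error fraction times term length, truncated, then capped.
    return min(int((1 - min_similarity) * term_len), MAX_EDITS)

for word in ["cat", "titanium", "titaniumdioxides"]:
    print(word, float_to_edits(0.8, len(word)))
# cat 0, titanium 1, titaniumdioxides 2
```

So `~0.8` lets short terms match only exactly while long terms get up to 2 edits, whereas `~1` and `~2` fix the edit count regardless of term length.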

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Just an update, the tokenizer class which I'm using is StandardTokenizerFactory, and I'm using Solr 5.0. On 8 May 2015 16:24, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Will like to check, for the SynonymFilterFactory, I have the following in my synonyms.txt: Titanium Dioxides,

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Let me explain a little better here. First of all, the SynonymFilter is a token filter, and being a token filter, it can be part of an analysis pipeline at both indexing and query time. As the different types of analysis explicitly define when the filtering happens, let's go to the details of the

Re: Proximity searching in percentage

2015-05-08 Thread Alessandro Benedetti
2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Hi Alessandro, I'm using Solr 5.0.0, but it is still able to work. Actually I found this to be better than query~1 or query~2, as it can automatically detect and allow the 20% error rate that I want. I don't think that the

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thanks for explaining the information. Currently I'm only using the comma-separated list of words and only using the synonym filter at query time. I find that when I set expand=true, quite a number of irrelevant results come back, and this doesn't happen when I set expand=false.
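The effect of `expand` on comma-separated synonym lines can be modeled roughly as below. This is a simplified sketch of the documented Solr semantics (with `expand=true` every term on a line maps to every term on that line; with `expand=false` every term maps to the first term of its line), assuming case-insensitive matching; it ignores explicit `=>` mappings and multi-word tokenization, which the real filter handles differently.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_synonym_map(lines: List[str], expand: bool) -> Dict[str, Set[str]]:
    """Map each lowercased input term to the set of terms it is rewritten to."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip().lower() for t in line.split(",") if t.strip()]
        # expand=true: all terms are equivalent; expand=false: all reduce to the first
        targets = terms if expand else terms[:1]
        for term in terms:
            mapping[term].update(targets)
    return mapping

synonyms = [
    "Titanium Dioxides, titanium oxide, pigment",
    "pigment, colour, colouring material",
]
print(sorted(build_synonym_map(synonyms, expand=False)["pigment"]))
print(sorted(build_synonym_map(synonyms, expand=True)["pigment"]))
```

Because "pigment" appears on both lines, `expand=true` pulls in every term from both lines (including "colour"), which is one plausible source of the irrelevant results; `expand=false` only adds the first term of each line.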

Re: Solr 5.1.0 Cloud and Zookeeper

2015-05-08 Thread Christos Manios
Hello Shacky, I recently performed a manual installation of a ZooKeeper ensemble (3 ZooKeeper instances) on the same machine. I used the upstart init script from the official .deb configuration https://svn.apache.org/repos/asf/zookeeper/trunk/src/packages/deb/init.d/zookeeper and modified it in order to

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
So does this mean that having more than 10 or 20 synonym files locally will still be faster than accessing an external service? I found out that ZooKeeper only allows the synonym.txt file to be a maximum of 1MB, and as my potential synonym file is more than 20MB, I'll need to split the file into more
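If the synonym file has to be split to stay under ZooKeeper's default node size limit (governed by the `jute.maxbuffer` setting, roughly 1 MB), a split that never breaks a rule across files could look like the following. This is a hypothetical helper, not a Solr tool; it assumes one synonym rule per line, and the resulting file names would then be listed comma-separated in the filter's `synonyms` attribute, which I believe SynonymFilterFactory accepts.

```python
from pathlib import Path
from typing import List

def split_synonyms(src: Path, out_dir: Path, max_bytes: int = 1_000_000) -> List[Path]:
    """Split src into numbered files, each at most max_bytes, without breaking a rule."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    chunk: List[bytes] = []
    size = 0

    def flush() -> None:
        nonlocal chunk, size
        if chunk:
            path = out_dir / f"synonyms_{len(parts) + 1:03d}.txt"
            path.write_bytes(b"".join(chunk))
            parts.append(path)
            chunk, size = [], 0

    with src.open("rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                line += b"\n"  # normalise a missing final newline
            if len(line) > max_bytes:
                raise ValueError("a single synonym rule exceeds max_bytes")
            if size + len(line) > max_bytes:
                flush()  # start a new file before this rule would overflow it
            chunk.append(line)
            size += len(line)
    flush()
    return parts
```

`",".join(p.name for p in parts)` then gives the value for the attribute. The default of 1,000,000 bytes is deliberately a little below the limit.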

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
The document seems to point to using AutoPhrasingTokenFilter, joining multi-term synonyms with an underscore, or switching to index-time synonyms. I'm also thinking of putting the synonyms into a database or querying some thesaurus website when the user enters the search key, instead of using the
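The underscore idea amounts to rewriting known multi-word phrases into single tokens before analysis, so that a synonym such as "titanium oxide" is matched as one unit. The sketch below applies it as a plain preprocessing step; it is not the AutoPhrasingTokenFilter itself, and it assumes a hypothetical in-memory phrase list and whitespace-separated text that may be lowercased.

```python
from typing import Iterable, List

def auto_phrase(text: str, phrases: Iterable[str]) -> str:
    """Join known multi-word phrases with underscores, longest match first."""
    phrase_tokens: List[List[str]] = sorted(
        (p.lower().split() for p in phrases), key=len, reverse=True
    )
    tokens = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for p in phrase_tokens:
            if len(p) > 1 and tokens[i:i + len(p)] == p:
                out.append("_".join(p))
                i += len(p)
                break
        else:
            out.append(tokens[i])  # no phrase starts here; keep the word as-is
            i += 1
    return " ".join(out)

print(auto_phrase("white titanium oxide pigment", ["titanium oxide", "titanium dioxides"]))
# white titanium_oxide pigment
```

For this to work, the same rewrite must be applied to documents at index time and to queries, and the synonyms file would list the underscored forms.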

Re: Proximity searching in percentage

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi Alessandro, Thank you so much for the info. Will try that out. Regards, Edwin On 8 May 2015 17:27, Alessandro Benedetti benedetti.ale...@gmail.com wrote: 2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Hi Alessandro, I'm using Solr 5.0.0, but it is still able to

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
I found this very interesting article that I think can help in better understanding the problem : http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ And this :

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Accessing an external service (such as a thesaurus website) for each query can slow down your system a lot. Having the synonyms locally, with the Solr integration, is much better. Cheers 2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: The document seems to point to using

Re: Solr Multilingual Indexing with one field- Guidance

2015-05-08 Thread Alessandro Benedetti
Is it possible to know a little bit more about the nature of that multilingual field? I can see the keywordTokenizer and then a lot of grams calculated from that token. What is that field used for? 2015-05-07 19:23 GMT+01:00 Kuntal Ganguly gangulykuntal1...@gmail.com: Our current production

Re: New core on Solr Cloud

2015-05-08 Thread shacky
Thank you very much Erick. Bye 2015-05-06 17:06 GMT+02:00 Erick Erickson erickerick...@gmail.com: That should have put one replica on each machine, if it did you're fine. Best, Erick On Wed, May 6, 2015 at 3:58 AM, shacky shack...@gmail.com wrote: Ok, I found out that the creation of new

Re: JSON Facet Analytics API in Solr 5.1

2015-05-08 Thread Frank li
Hi Yonik, Any update for the question? Thanks in advance, Frank On Thu, May 7, 2015 at 2:49 PM, Frank li fudon...@gmail.com wrote: Is there any book to read so I won't ask such dummy questions? Thanks. On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote: This one does not

Re: determine big documents in the index?

2015-05-08 Thread Clemens Wyss DEV
One of my fields (the phrase suggestion field) has 30'860'099 terms. Is this too much? Another field (the single-word suggestion field) has 2'156'218 terms. -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Friday, 8 May 2015 15:54 To:

SolrCloud 4.8.0 - Snapshots directory take a lot of space

2015-05-08 Thread Vincenzo D'Amore
Hi all, looking at the data directory in my SolrCloud cluster, I have found a lot of old snapshot directories, like these:

snapshot.20150506003702765
snapshot.20150506003702760
snapshot.20150507002849492
snapshot.20150507002849473
snapshot.20150507002849459

or even a month older. These directories
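Until the cause of the leftover directories is clear, something like the following could list the `snapshot.*` directories and report all but the newest few. This is a hypothetical housekeeping sketch, not a Solr feature: it assumes the suffix is a 17-digit `yyyyMMddHHmmssSSS` timestamp as in the names above, and it only returns paths; deleting them (for example with `shutil.rmtree`) is left to the caller after checking that nothing still uses them.

```python
import re
from pathlib import Path
from typing import List

# Snapshot directories named snapshot.<yyyyMMddHHmmssSSS>, as in the listing above.
SNAPSHOT_RE = re.compile(r"^snapshot\.(\d{17})$")

def stale_snapshots(data_dir: Path, keep: int = 2) -> List[Path]:
    """Return snapshot directories other than the `keep` newest ones (nothing is deleted)."""
    snaps = []
    for p in data_dir.iterdir():
        m = SNAPSHOT_RE.match(p.name)
        if m and p.is_dir():
            snaps.append((m.group(1), p))
    # Fixed-width timestamps sort chronologically as plain strings.
    snaps.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in snaps[keep:]]
```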

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
This is quite a big synonym corpus! If it's not feasible to have only one big synonym file (I haven't checked, so I assume the 1 MB limit is true, even if strange), I would do an experiment: 1) test query time with a classic Solr config; 2) use an ad hoc Solr core to manage synonyms (in this

determine big documents in the index?

2015-05-08 Thread Clemens Wyss DEV
Context: Solr/Lucene 5.1. Is there a way to determine which documents occupy a lot of space in the index? As I don't store any fields that have text, it must be the terms extracted from the documents that occupy the space. So my question is: which documents occupy the most space in the inverted index?

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Erick Erickson
Each of the characters you identified has meaning to the query parser: '+' marks a mandatory clause, '-' is a NOT operator and '*' is a wildcard. To get through the query parser, these (and a bunch of others, see below) must be escaped. Personally, though, I'd pre-scrub the data.
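For the escaping route, SolrJ provides `ClientUtils.escapeQueryChars`. The Python below mirrors the character set I believe that method escapes (query-syntax characters plus whitespace), for clients that do not use SolrJ; treat it as a sketch and check it against the query parser documentation for your Solr version. Operators such as `&&` and `||` are covered because `&` and `|` are escaped individually.

```python
# Characters that are part of the Lucene/Solr standard query syntax.
QUERY_SYNTAX_CHARS = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_chars(s: str) -> str:
    """Prefix every query-syntax character (and whitespace) with a backslash."""
    out = []
    for c in s:
        if c in QUERY_SYNTAX_CHARS or c.isspace():
            out.append("\\")
        out.append(c)
    return "".join(out)

print(escape_query_chars("burg +"))  # burg\ \+
```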

Re: determine big documents in the index?

2015-05-08 Thread Erick Erickson
Oops, this may be a better link: http://lucidworks.com/blog/indexing-with-solrj/ On Fri, May 8, 2015 at 9:55 AM, Erick Erickson erickerick...@gmail.com wrote: bq: has 30'860'099 terms. Is this too much Depends on how you indexed it. If you used shingles, then maybe, maybe not. If you just do

Re: Slow highlighting on Solr 5.0.0

2015-05-08 Thread Matt Hilt
I've been looking into this again. The phrase highlighter is much slower than the default highlighter, so you might be able to add hl.usePhraseHighlighter=false to your query to make it faster. Note that the web interface will NOT help here, because that param is true by default, and the checkbox is
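Adding the parameter to a request might look like this. It is a minimal sketch using only the standard library; the host, core name (`collection1`) and field (`content`) are placeholders, not values from the thread.

```python
from urllib.parse import urlencode

def highlight_url(base: str, query: str, field: str) -> str:
    """Build a /select URL with highlighting on and the phrase highlighter off."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        # Switch off phrase-aware highlighting, reported above to be much slower.
        "hl.usePhraseHighlighter": "false",
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"

print(highlight_url("http://localhost:8983/solr/collection1", "content:solr", "content"))
```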

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thank you for your suggestions. I can't do proper testing on that yet, as I'm currently using a normal PC with 4GB of RAM, and all of this probably requires more RAM than I have. I've tried running the setup with 20 synonym files, and the system ran out of memory before I could test

Re: How to get the docs id after commit

2015-05-08 Thread Erick Erickson
Not that I know of. "Newest doc id" is pretty ambiguous. If I transmit a batch of 100 docs and then commit, they're all committed at once. Which one, then, is newest? And consider what happens if, in SolrCloud mode, I send updates to two separate nodes. The docs are forwarded to the leader for the

Best way to backup and restore an index for a cloud setup in 4.6.1?

2015-05-08 Thread John Smith
All, With a cloud setup for a collection in 4.6.1, what is the most elegant way to backup and restore an index? We are specifically looking into the application of when doing a full reindex, with the idea of building an index on one set of servers, backing up the index, and then restoring that

Re: determine big documents in the index?

2015-05-08 Thread Erick Erickson
bq: has 30'860'099 terms. Is this too much? Depends on how you indexed it. If you used shingles, then maybe, maybe not. If you just do normal text analysis, it's suspicious to say the least. There are about 300K words in the English language and you have 100X that. So either 1) you have a lot of
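To see why shingling changes the picture, the sketch below counts distinct unigrams versus distinct 1-to-3-word shingles over a tiny corpus. It only illustrates how word n-grams multiply the number of unique terms (Lucene's ShingleFilter has more options, such as separators and filler tokens), and the sample sentences are invented.

```python
from typing import Iterable, List, Set

def shingles(tokens: List[str], max_size: int = 3) -> Set[str]:
    """All contiguous word n-grams of length 1..max_size, joined with spaces."""
    out: Set[str] = set()
    for n in range(1, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.add(" ".join(tokens[i:i + n]))
    return out

def distinct_terms(docs: Iterable[str], max_size: int) -> int:
    """Count unique terms across docs when indexing n-grams up to max_size."""
    terms: Set[str] = set()
    for doc in docs:
        terms |= shingles(doc.lower().split(), max_size)
    return len(terms)

docs = ["the quick brown fox", "the lazy brown dog", "a quick brown dog"]
print(distinct_terms(docs, 1), distinct_terms(docs, 3))  # 7 20
```

Even three short sentences nearly triple their term count, so a phrase suggestion field built from shingles can plausibly reach tens of millions of terms.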

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Erick Erickson
Steven: They're listed on the ref guide I posted. Not a concise list, but you'll see || and other interesting bits. On Fri, May 8, 2015 at 9:20 AM, Steven White swhite4...@gmail.com wrote: Hi Erick, Is there a documented list of all operators (AND, OR, NOT, etc.) that also need to be

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Steven White
Hi Erick, Is there a documented list of all operators (AND, OR, NOT, etc.) that also need to be escaped? Are there more besides the 3 I listed? Thanks Steve On Fri, May 8, 2015 at 11:47 AM, Erick Erickson erickerick...@gmail.com wrote: Each of the characters you identified are characters

Re: Limit the documents for each shard in solr cloud

2015-05-08 Thread Jilani Shaik
Hi, actually we are facing a lot of issues with Solr shards in our environment. Our environment is fully loaded with around 150 million documents, where each document has around 50+ stored fields, which have multiple values. We also have a lot of custom components in this environment which are

Re: Not able to Add docValues in Solr

2015-05-08 Thread pras.venkatesh
Never mind.. used the zkcli.sh that comes with solr to accomplish the firewall

indexing java byte code in classes / jars

2015-05-08 Thread Mark
I'm looking to use Solr search over the bytecode in classes and jars. Does anyone know of, or have experience with, Analyzers, Tokenizers, and Token Filters for such a task? Regards Mark

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Tomasz Borek
FWIW you may also want to drop the boolean ops in favour of + and - (OR being default) pozdrawiam, LAFK 2015-05-08 18:59 GMT+02:00 Erick Erickson erickerick...@gmail.com: Steven: They're listed on the ref guide I posted. Not a concise list, but you'll see || and other interesting bits.

Re: and stopword in user query is being change to q.op=AND

2015-05-08 Thread Rajesh Hazari
Thanks Show and Hoss. Just added lowercaseOperators=false to my edismax config and everything seems to be working. *Thanks,* *Rajesh,* *(mobile) : 8328789519.* On Mon, Apr 27, 2015 at 11:53 AM, Rajesh Hazari rajeshhaz...@gmail.com wrote: I did go through the documentation of edismax (solr 5.1

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mike Drob
What do the various Java IDEs use for indexing classes for field/type/variable/method usage search? I imagine it's got to be bytecode. On Fri, May 8, 2015 at 2:40 PM, Tomasz Borek tomasz.bo...@gmail.com wrote: Out of curiosity: why bytecode? pozdrawiam, LAFK 2015-05-08 21:31 GMT+02:00 Mark

Re: indexing java byte code in classes / jars

2015-05-08 Thread Tomasz Borek
Out of curiosity: why bytecode? pozdrawiam, LAFK 2015-05-08 21:31 GMT+02:00 Mark javam...@gmail.com: I'm looking to use Solr search over the bytecode in classes and jars. Does anyone know of, or have experience with, Analyzers, Tokenizers, and Token Filters for such a task? Regards Mark

Re: Solr Exception The remote server returned an error: (400) Bad Request.

2015-05-08 Thread Tomasz Borek
Short answer: wget skips the response body on a 400, assuming you didn't want the error page stored. Long answer: get your error page with additional wget params, like so:

wget -Sd http://10.0.3.113:8080/solr/collection1/vitas\?q\=coreD%3A25

DEBUG output created by Wget 1.15 on linux-gnu. URI encoding = `UTF-8'
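The same applies to HTTP clients in general: many treat a 400 as an exception, and Solr's useful error message is in the response body. A minimal sketch with Python's standard `urllib`, where the URL is whatever Solr endpoint is failing (none is assumed here):

```python
import urllib.error
import urllib.request
from typing import Tuple

def fetch_with_error_body(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status, body), including the body of 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error page is readable.
        return err.code, err.read().decode("utf-8", "replace")
```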

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
To answer why bytecode: mostly, my use case is to index as much detail as possible from jars/classes: extract class names, method names, signatures, packages / imports. I am considering using ASM in order to generate an analysis view of the class. The sort of use cases I have would be

Re: Fuzzy phrases + weighting at query level or do I need to program?

2015-05-08 Thread Tomasz Borek
Best I found so far is: +place:(+word1~ +word2~ +word3~) pozdrawiam, LAFK 2015-04-26 3:20 GMT+02:00 Tomasz Borek tomasz.bo...@gmail.com: Ave! How do I make fuzzy search on lengthy names? As in La Riviera Montana de los Diablos or Unified Mega Corp Super Dwelling? Across all queries? My
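That pattern can be generated from a raw name. A minimal sketch, assuming a `place` field as in the message, a lowercased index (fuzzy terms are not analyzed), and a rule of my own choosing, not from the thread, that words shorter than four characters are matched exactly, since a 2-edit fuzzy match on a very short word matches almost anything:

```python
import re

def fuzzy_all_words(field: str, name: str, min_fuzzy_len: int = 4) -> str:
    """Build +field:(+w1~ +w2~ ...) requiring every word, fuzzily where sensible."""
    words = re.findall(r"\w+", name.lower())  # drops punctuation and query syntax
    clauses = [f"+{w}~" if len(w) >= min_fuzzy_len else f"+{w}" for w in words]
    return f"+{field}:({' '.join(clauses)})"

print(fuzzy_all_words("place", "La Riviera Montana de los Diablos"))
# +place:(+la +riviera~ +montana~ +de +los +diablos~)
```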

Re: indexing java byte code in classes / jars

2015-05-08 Thread Erik Hatcher
Oh, and sorry, I omitted a couple of details:

# creating the “java” core/collection
bin/solr create -c java
# I ran this from my Solr source code checkout, so that SolrLogFormatter.class just happened to be handy

Erik On May 8, 2015, at 4:11 PM, Erik Hatcher

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
Erik, thanks for the pretty much OOTB approach. I think I'm going to just try a range of approaches and see how far I get. The "how does the IDE do this" suggestion would be worth looking into as well. On 8 May 2015 at 22:14, Mark javam...@gmail.com wrote: https://searchcode.com/ looks really

Re: indexing java byte code in classes / jars

2015-05-08 Thread Erik Hatcher
What kinds of searches do you want to run? Are you trying to extract class names, method names, and such and make those searchable? If that’s the case, you need some kind of “parser” to reverse engineer that information from .class and .jar files before feeding it to Solr, which would happen
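As a first, very shallow step before a real bytecode parser such as ASM, class and package names can be read straight from a jar's entry list, since a jar is a zip archive whose `.class` entries follow the package directory layout. A sketch using only Python's `zipfile`; the field names in the returned dicts are hypothetical and would need to match your schema.

```python
import zipfile
from typing import Dict, List

def classes_in_jar(jar_path: str) -> List[Dict[str, str]]:
    """One document per .class entry in the jar, keyed by fully qualified name."""
    docs = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class") or name.startswith("META-INF/"):
                continue
            fqcn = name[: -len(".class")].replace("/", ".")
            package, _, simple = fqcn.rpartition(".")
            docs.append({
                "id": fqcn,
                "package_s": package,       # hypothetical field names
                "class_name_s": simple,     # nested classes keep their $ suffix
                "jar_s": jar_path,
            })
    return docs
```

Method names and signatures are not recoverable this way; they need the class file contents, which is where ASM or a decompiler comes in.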

RE: indexing java byte code in classes / jars

2015-05-08 Thread Reitzel, Charles
There are a number of reverse compilers for Java. Some are quite good and very detailed, so long as the byte code has not been deliberately obfuscated. Of course the original sources would be better for picking up comments. But, then you'd need a java parser (the compiler front end), of

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
https://searchcode.com/ looks really interesting; however, I want to crunch as many searchable aspects as possible out of jars sitting on a classpath or under a project structure... It's really early days, so I'm open to any suggestions. On 8 May 2015 at 22:09, Mark javam...@gmail.com wrote: To answer why

Re: ZooKeeperException: Could not find configName for collection

2015-05-08 Thread shacky
Thank you Erick for your answer! I just tried restarting the first node and now the error is gone! Sorry for my too-early email :-) Bye! 2015-05-06 17:05 GMT+02:00 Erick Erickson erickerick...@gmail.com: Have you looked around at your directories on disk? I'm _not_ talking about the

Re: solr.war built from solr 4.7.2 not working

2015-05-08 Thread Shawn Heisey
On 5/7/2015 11:52 PM, Rahul Singh wrote: ERROR - 2015-05-08 11:15:25.738; org.apache.solr.common.SolrException; null:java.lang.IllegalArgumentException: You cannot set an index-time boost on an unindexed field, or one that omits norms This seems to be the problem. You are trying to set an

Re: SolrCloud indexing

2015-05-08 Thread Vincenzo D'Amore
I have just added a comment to the CWiki. Thanks again for your prompt answer Erick. Best, Vincenzo On Fri, May 8, 2015 at 12:39 AM, Erick Erickson erickerick...@gmail.com wrote: bq: ...forwards the index notation to itself and any replicas... That's just odd phrasing. All that means is