So my Solr query is implemented in two parts: the first query does an exact
search, and if no results are found for the exact search it falls back to a
second query that does a fuzzy search.
Everything works fine, but in situations like this: a user enters "burg +"
So in the exact search no records will come back, so the second
Hi, Solr Developers
I want to get the newest committed docs in the postCommit event, and then
notify the other server which data can be used, but I cannot find any way to
get the newest docs after a commit, so is there any way to do this?
Thank you.
Wen Li
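For reference, a postCommit event listener can at least fire a notification when a commit happens; working out which documents the commit made visible is a separate problem. A minimal solrconfig.xml sketch, assuming Solr 5.x's RunExecutableListener (the script path is a placeholder):

```xml
<!-- In solrconfig.xml, inside <updateHandler>: run an external
     command after every commit. The script path is hypothetical. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/opt/scripts/notify-other-server.sh</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
</listener>
```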
Hi Alessandro,
I'm using Solr 5.0.0, but it still works. Actually I found this
to be better than query~1 or query~2, as it can automatically detect
and allow the 20% error rate that I want.
For query~1 or query~2, does it mean that I'll have to manually
detect how many characters
Hi,
Would like to check: for the SynonymFilterFactory, I have the following in
my synonyms.txt:
Titanium Dioxides, titanium oxide, pigment
pigment, colour, colouring material
If I set expand=false and I search for q=pigment, I will get results that
match pigment, Titanium Dioxides and
Hi Zheng,
actually that version of the fuzzy search syntax is deprecated!
Currently the fuzzy search syntax is:
query~1 or query~2
The ~ (tilde) param is the number of edits we allow when generating the
expanded queries to run.
Can I ask which version of Solr you are using?
This article from 2011
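To make the "number of edits" concrete: query~1 matches terms within Levenshtein edit distance 1 of the query term. A quick standalone sketch of that distance (not Solr code, just an illustration of what the tilde parameter counts):

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# "solr"~1 would match "soler": one insertion away.
print(levenshtein("solr", "soler"))  # 1
```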
Just an update, the tokenizer class which I'm using is
StandardTokenizerFactory, and I'm using Solr 5.0.
On 8 May 2015 16:24, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
Hi,
Will like to check, for the SynonymFilterFactory, I have the following in
my synonyms.txt:
Titanium Dioxides,
Let me explain a little bit better here:
First of all, the SynonymFilter is a Token Filter, and being a Token Filter
it can be part of an analysis pipeline at indexing and query time.
As the type of analysis explicitly determines when the filtering
happens, let's go into the details of the
2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
Hi Alessandro,
I'm using Solr 5.0.0, but it still works. Actually I found this
to be better than query~1 or query~2, as it can automatically detect
and allow the 20% error rate that I want.
I don't think that the
Thanks for the explanation.
Currently I'm only using the comma-separated list of words and only using
the synonym filter at query time. I find that when I set expand=true,
there's quite a number of irrelevant results coming back, and this
didn't happen when I set expand=false.
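For context, a typical query-time-only synonym setup looks roughly like this in the field type definition (a sketch; the field type name is made up):

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```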
Hello Shacky,
I have recently performed a manual installation of a ZooKeeper ensemble (3
ZooKeepers) on the same machine. I used the upstart init script from the
official .deb configuration
https://svn.apache.org/repos/asf/zookeeper/trunk/src/packages/deb/init.d/zookeeper
and modified it in order to
So does it mean that having more than 10 or 20 synonym files locally will
still be faster than accessing an external service?
I found out that ZooKeeper only allows the synonyms.txt file to be a
maximum of 1MB, and as my potential synonym file is more than 20MB, I'll
need to split the file into more
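If splitting is the route taken, one way to chunk a big synonyms file so each piece stays under ZooKeeper's znode size cap (the 1 MB default is governed by ZooKeeper's jute.maxbuffer setting) — a sketch:

```python
def split_synonyms(lines, max_bytes=1_000_000):
    """Group lines into chunks whose total UTF-8 size stays under max_bytes."""
    chunks, cur, size = [], [], 0
    for line in lines:
        n = len(line.encode('utf-8')) + 1  # +1 for the newline
        if cur and size + n > max_bytes:
            chunks.append(cur)
            cur, size = [], 0
        cur.append(line)
        size += n
    if cur:
        chunks.append(cur)
    return chunks

# e.g. cap at 30 bytes just for demonstration
print(split_synonyms(["pigment, colour", "titanium oxide"], max_bytes=30))
```

Each chunk can then be written out as its own synonyms-NN.txt and listed in the filter's synonyms attribute as a comma-separated list.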
The document seems to point to using AutoPhrasingTokenFilter, putting an
underscore in the multi-term synonyms, or changing to index-time synonyms.
I'm also thinking of putting the synonyms into a database or querying some
thesaurus website when the user enters the search key, instead of using the
Hi Alessandro,
Thank you so much for the info. Will try that out.
Regards,
Edwin
On 8 May 2015 17:27, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:
2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
Hi Alessandro,
I'm using Solr 5.0.0, but it is still able to
I found this very interesting article that I think can help in better
understanding the problem :
http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
And this :
Accessing an external service (such as a thesaurus website) for each query
can slow down your system a lot.
Having the synonyms locally, with the Solr integration, is much better.
Cheers
2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
The document seems to point to using
Is it possible to know a little bit more about the nature of that
multi-lingual field?
I can see the KeywordTokenizer and then a lot of n-grams calculated from
that token.
What is that field used for?
2015-05-07 19:23 GMT+01:00 Kuntal Ganguly gangulykuntal1...@gmail.com:
Our current production
Thank you very much Erick.
Bye
2015-05-06 17:06 GMT+02:00 Erick Erickson erickerick...@gmail.com:
That should have put one replica on each machine; if it did, you're fine.
Best,
Erick
On Wed, May 6, 2015 at 3:58 AM, shacky shack...@gmail.com wrote:
Ok, I found out that the creation of new
Hi Yonik,
Any update for the question?
Thanks in advance,
Frank
On Thu, May 7, 2015 at 2:49 PM, Frank li fudon...@gmail.com wrote:
Is there any book to read so I won't ask such dummy questions? Thanks.
On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote:
This one does not
One of my fields (the phrase suggestion field) has 30'860'099 terms. Is
this too much?
Another field (the single word suggestion) has 2'156'218 terms.
-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, 8 May 2015 15:54
To:
Hi All,
Looking at the data directory in my SolrCloud cluster I have found a lot of
old snapshot directories in
Like these:
snapshot.20150506003702765
snapshot.20150506003702760
snapshot.20150507002849492
snapshot.20150507002849473
snapshot.20150507002849459
or even a month older. These directories
This is quite a big synonym corpus!
If it's not feasible to have only 1 big synonym file (I haven't checked,
so I assume the 1 MB limit is true, even if strange),
I would do an experiment:
1) test query time with a Solr classic config
2) use an ad hoc Solr core to manage synonyms (in this
Context: Solr/Lucene 5.1
Is there a way to determine which documents occupy a lot of space in the
index? As I don't store any fields that contain text, it must be the terms
extracted from the documents that occupy the space.
So my question is: which documents occupy the most space in the inverted
index?
Each of the characters you identified has meaning to the query parser:
'+' is a mandatory-clause operator, '-' is a NOT operator, and '*' is a
wildcard. To get through the query parser, these (and a bunch of others,
see below) must be escaped.
Personally, though, I'd pre-scrub the data.
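For reference, SolrJ ships ClientUtils.escapeQueryChars for exactly this; a rough Python equivalent (the character set below is modeled on that helper, so treat it as an assumption and check it against your Solr version):

```python
def escape_query_chars(s):
    """Backslash-escape characters the Solr query parser treats specially
    (modeled on SolrJ's ClientUtils.escapeQueryChars)."""
    special = '\\+-!():^[]"{}~*?|&;/ '
    return ''.join('\\' + c if c in special else c for c in s)

print(escape_query_chars('burg +'))  # burg\ \+
```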
Oops, this may be a better link: http://lucidworks.com/blog/indexing-with-solrj/
On Fri, May 8, 2015 at 9:55 AM, Erick Erickson erickerick...@gmail.com wrote:
bq: has 30'860'099 terms. Is this too much
Depends on how you indexed it. If you used shingles, then maybe, maybe
not. If you just do
I've been looking into this again. The phrase highlighter is much slower
than the default highlighter, so you might be able to add
hl.usePhraseHighlighter=false to your query to make it faster. Note that
the web interface will NOT help here, because that param is true by default,
and the checkbox is
Thank you for your suggestions.
I can't do proper testing on that yet as I'm currently using a normal PC
with 4GB of RAM, and all of this probably requires more RAM than what I
have.
I've tried running the setup with 20 synonyms file, and the system went Out
of Memory before I could test
Not that I know of. "Newest doc id" is pretty ambiguous. If I transmit
a batch of 100 docs and then commit, they're all committed at once. Which
one, then, is newest? And consider what happens if, in SolrCloud
mode, I send updates to two separate nodes. The docs are forwarded to
the leader for the
All,
With a cloud setup for a collection in 4.6.1, what is the most elegant way
to back up and restore an index?
We are specifically looking into the application of when doing a full
reindex, with the idea of building an index on one set of servers, backing
up the index, and then restoring that
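One option worth checking in 4.x is the replication handler's backup command; restoring in that version is typically a manual copy of the snapshot into the data directory. A sketch (host, core name, and path are placeholders):

```text
http://host:8983/solr/collection1/replication?command=backup&location=/backups
http://host:8983/solr/collection1/replication?command=details
```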
bq: has 30'860'099 terms. Is this too much
Depends on how you indexed it. If you used shingles, then maybe, maybe
not. If you just do normal text analysis, it's suspicious to say the
least. There are about 300K words in the English language and you have
100X that. So either
1) you have a lot of
Steven:
They're listed on the ref guide I posted. Not a concise list, but
you'll see || and other interesting bits.
On Fri, May 8, 2015 at 9:20 AM, Steven White swhite4...@gmail.com wrote:
Hi Erick,
Is there a documented list of all operators (AND, OR, NOT, etc.) that also
need to be
Hi Erick,
Is there a documented list of all operators (AND, OR, NOT, etc.) that also
need to be escaped? Are there more besides the 3 I listed?
Thanks
Steve
On Fri, May 8, 2015 at 11:47 AM, Erick Erickson erickerick...@gmail.com
wrote:
Each of the characters you identified are characters
Hi,
Actually we are facing a lot of issues with Solr shards in our environment.
Our environment is fully loaded with around 150 million documents, where
each document has around 50+ stored fields with multiple values.
We also have a lot of custom components in this environment, which are
Never mind... used the zkcli.sh that comes with Solr to get past the
firewall.
I'm looking to use Solr to search over the byte code in classes and jars.
Does anyone know or have experience of Analyzers, Tokenizers, and Token
Filters for such a task?
Regards
Mark
FWIW you may also want to drop the boolean ops in favour of + and - (OR
being default)
pozdrawiam,
LAFK
2015-05-08 18:59 GMT+02:00 Erick Erickson erickerick...@gmail.com:
Steven:
They're listed on the ref guide I posted. Not a concise list, but
you'll see || and other interesting bits.
Thanks Show and Hoss.
Just added lowercaseOperators=false to my edismax config and everything
seems to be working.
*Thanks,*
*Rajesh,*
*(mobile) : 8328789519.*
On Mon, Apr 27, 2015 at 11:53 AM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:
I did go through the documentation of edismax (solr 5.1
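For anyone following along, that setting goes in the edismax request handler defaults; a sketch (handler name and surrounding config are assumptions):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="lowercaseOperators">false</str>
  </lst>
</requestHandler>
```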
What do the various Java IDEs use for indexing classes for
field/type/variable/method usage search? I imagine it's got to be bytecode.
On Fri, May 8, 2015 at 2:40 PM, Tomasz Borek tomasz.bo...@gmail.com wrote:
Out of curiosity: why bytecode?
pozdrawiam,
LAFK
2015-05-08 21:31 GMT+02:00 Mark
Out of curiosity: why bytecode?
pozdrawiam,
LAFK
2015-05-08 21:31 GMT+02:00 Mark javam...@gmail.com:
I'm looking to use Solr to search over the byte code in classes and jars.
Does anyone know or have experience of Analyzers, Tokenizers, and Token
Filters for such a task?
Regards
Mark
Short answer: wget skips the body on a 400, assuming you didn't want the
error page stored.
Long answer: get your error page with additional wget params, like so:
✗ wget -Sd http://10.0.3.113:8080/solr/collection1/vitas\?q\=coreD%3A25
DEBUG output created by Wget 1.15 on linux-gnu.
URI encoding = `UTF-8'
To answer why bytecode: mostly because the use case I have is to index as
much detail as possible from jars/classes:
extract class names,
method names,
signatures,
packages / imports.
I am considering using ASM in order to generate an analysis view of the
class.
The sort of use cases I have would be
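Before reaching for ASM, even a plain zip listing recovers class and package names from a jar; a self-contained sketch (the demo jar entry is made up):

```python
import io
import zipfile

def class_names(jar_bytes):
    """List fully-qualified class names found in a jar (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as zf:
        return [name[:-len('.class')].replace('/', '.')
                for name in zf.namelist() if name.endswith('.class')]

# Build a tiny in-memory "jar" to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('com/example/Foo.class', b'\xca\xfe\xba\xbe')
print(class_names(buf.getvalue()))  # ['com.example.Foo']
```

Method names and signatures do require a real bytecode reader like ASM; this only gets the structural names for free.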
Best I found so far is:
+place:(+word1~ +word2~ +word3~)
pozdrawiam,
LAFK
2015-04-26 3:20 GMT+02:00 Tomasz Borek tomasz.bo...@gmail.com:
Ave!
How do I make fuzzy search on lengthy names? As in La Riviera Montana de
los Diablos or Unified Mega Corp Super Dwelling? Across all queries?
My
Oh, and sorry, I omitted a couple of details:
# creating the “java” core/collection
bin/solr create -c java
# I ran this from my Solr source code checkout, so that SolrLogFormatter.class
just happened to be handy
Erik
On May 8, 2015, at 4:11 PM, Erik Hatcher
Erik,
Thanks for the pretty much OOTB approach.
I think I'm going to just try a range of approaches, and see how far I get.
The "IDEs do this" suggestion would be worth looking into as well.
On 8 May 2015 at 22:14, Mark javam...@gmail.com wrote:
https://searchcode.com/
looks really
What kinds of searches do you want to run? Are you trying to extract class
names, method names, and such and make those searchable? If that’s the case,
you need some kind of “parser” to reverse engineer that information from .class
and .jar files before feeding it to Solr, which would happen
There are a number of reverse compilers for Java. Some are quite good and
very detailed, so long as the byte code has not been deliberately obfuscated.
Of course the original sources would be better for picking up comments. But,
then you'd need a java parser (the compiler front end), of
https://searchcode.com/
looks really interesting; however, I want to crunch as many searchable
aspects as possible out of jars sitting on a classpath or under a project
structure... It's really early days, so I'm open to any suggestions.
On 8 May 2015 at 22:09, Mark javam...@gmail.com wrote:
To answer why
Thank you Erick for your answer!
I just tried to restart the first node, and now the error is gone!
Sorry for my too-hasty email :-)
Bye!
2015-05-06 17:05 GMT+02:00 Erick Erickson erickerick...@gmail.com:
Have you looked arond at your directories on disk? I'm _not_ talking
about the
On 5/7/2015 11:52 PM, Rahul Singh wrote:
ERROR - 2015-05-08 11:15:25.738; org.apache.solr.common.SolrException;
null:java.lang.IllegalArgumentException: You cannot set an index-time boost
on an unindexed field, or one that omits norms
This seems to be the problem. You are trying to set an
I have just added a comment to the CWiki.
Thanks again for your prompt answer Erick.
Best,
Vincenzo
On Fri, May 8, 2015 at 12:39 AM, Erick Erickson erickerick...@gmail.com
wrote:
bq: ...forwards the index notation to itself and any replicas...
That's just odd phrasing.
All that means is