Re: Range faceting on timestamp field

2020-12-24 Thread Erick Erickson
Then you need to form your range start relative to your timezone. What I’d actually recommend is that, if you need to bucket by day, you index the day in a separate field. Of course, if you have to bucket by day in arbitrary timezones, that won’t work… Best, Erick > On Dec 24, 2020, at 4:42 PM, ufu
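A minimal Python sketch of the separate-day-field idea. It assumes one fixed UTC offset is known at index time; fixed offsets ignore daylight saving, so a real deployment might prefer a zone database. The document shape and the field name `day_s` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_day(ts_utc: datetime, utc_offset_hours: int) -> str:
    """Calendar day (YYYY-MM-DD) of an aware timestamp as seen at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return ts_utc.astimezone(tz).strftime("%Y-%m-%d")


# 03:30 UTC on Dec 25 is still Dec 24 at UTC-5, so it lands in the Dec 24 bucket.
ts = datetime(2020, 12, 25, 3, 30, tzinfo=timezone.utc)
doc = {
    "id": "doc1",                                    # hypothetical document
    "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC date string
    "day_s": local_day(ts, -5),                      # hypothetical day field to facet on
}
```

Faceting on such a field gives per-day counts for that one offset only, which is exactly the limitation mentioned: each timezone would need its own precomputed field.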

Re: Data Import Handler (DIH) - Installing and running

2020-12-23 Thread Erick Erickson
Have you done what the message says and looked at your Solr log? If so, what information is there? > On Dec 23, 2020, at 5:13 AM, DINSD | SPAutores > wrote: > > Hi, > > I'm trying to install the package "data-import-handler", since it was > discontinued from core SolR distro. > > https://git

Re: Solr cloud facet query returning incorrect results

2020-12-21 Thread Erick Erickson
This should work as you expect, so the first thing I’d do is add &debug=query and see the parsed query in both cases. If that doesn’t show anything, please post the full debug response in both cases. Best, Erick > On Dec 21, 2020, at 4:31 AM, Alok Bhandari wrote: > > Hello All , > > we are
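A small Python sketch for generating the URL to compare, assuming the request goes over HTTP to a `/select` handler. The host, collection and facet field are hypothetical.

```python
from urllib.parse import urlencode


def with_debug_query(base_url: str, collection: str, params: dict) -> str:
    """Return a /select URL for the same request with debug=query appended."""
    all_params = dict(params, debug="query")  # copy the params, then add the debug flag
    return f"{base_url}/{collection}/select?{urlencode(all_params)}"


url = with_debug_query(
    "http://localhost:8983/solr",  # hypothetical host
    "mycollection",                # hypothetical collection
    {"q": "*:*", "facet": "true", "facet.field": "category"},
)
```

Build one URL for each of the two cases and compare the `parsedquery` entries in the `debug` section of each response.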

Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread Erick Erickson
eferring to this jira > https://issues.apache.org/jira/browse/SOLR-10479 > We have seen that some malicious queries come to system which takes > significant time and these queries propagating to other solr servers choke > the entire cluster. > > Regards, > kshitij > >

Re: Data Import Blocker - Solr

2020-12-18 Thread Erick Erickson
Have you tried escaping that character? > On Dec 18, 2020, at 2:03 AM, basel altameme > wrote: > > Dear, > While trying to Import & Index data from MySQL DB custom view i am facing the > error below: > Data Config problem: The value of attribute "query" associated with an > element type "enti

Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread Erick Erickson
Why do you want to do this? This sounds like an XY problem: you think you’re going to solve some problem X by doing Y. Y in this case is setting the numServersToTry, but you haven’t explained what X, the problem you’re trying to solve, is. Offhand, this seems like a terrible idea. If you’re request

Re: Best example solrconfig.xml?

2020-12-15 Thread Erick Erickson
I’d start with that config set, making sure that “schemaless” is disabled. Do be aware that some of the defaults have changed, although the big change for docValues was there in 6.0. One thing you might want to do is set uninvertible=false in your schema. That’ll cause Solr to barf if you, say,

Re: Solr Collection Reload

2020-12-15 Thread Erick Erickson
Well, there’s no information here to help. The first thing I’d check is what the Solr logs are saying. Especially if you’ve changed any of your configuration files. If that doesn’t show anything, I'd take a thread dump and look at that, perhaps there’s some deadlock. But that said, a reload sho

Re: No numShards attribute exists in 'core.properties' with the newly added replica

2020-12-08 Thread Erick Erickson
I raised this JIRA: https://issues.apache.org/jira/browse/SOLR-15035 What’s not clear to me is whether numShards should even be in core.properties at all, even on the create command. In the state.json file it’s a collection-level property and not reflected in the individual replica’s informatio

Re: optimize boosting parameters

2020-12-08 Thread Erick Erickson
Before worrying about it too much, exactly _how_ much has the performance changed? I’ve just been in too many situations where there’s no objective measure of performance before and after, just someone saying “it seems slower” and had those performance changes disappear when a rigorous test is don

Re: Is there a way to search for "..." (three dots)?

2020-12-08 Thread Erick Erickson
Yes, but… Odds are your analysis configuration for the field is removing the dots. Go to the admin/analysis page, pick your field type, put examples in the “index” and “query” boxes, and you’ll see what I mean. You need something like WhitespaceTokenizer as your tokenizer, and avoid things li
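A rough Python analogy (not Solr's actual tokenizers) of why the dots disappear: a tokenizer that keeps only word characters drops "..." entirely, while splitting on whitespace keeps it as a token that a query can match. Note that "wait..." written without a space would stay glued to the word under whitespace splitting.

```python
import re

text = "to be continued ... maybe"

# Rough stand-in for a tokenizer that keeps only letters/digits: the dots vanish.
word_tokens = re.findall(r"\w+", text)

# Rough stand-in for WhitespaceTokenizer: "..." survives as its own token.
whitespace_tokens = text.split()

print(word_tokens)        # ['to', 'be', 'continued', 'maybe']
print(whitespace_tokens)  # ['to', 'be', 'continued', '...', 'maybe']
```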

Re: is there a way to trigger a notification when a document is deleted in solr

2020-12-07 Thread Erick Erickson
No, it’s marked “unresolved”…. > On Dec 7, 2020, at 9:22 AM, Pushkar Mishra wrote: > > Hi All > https://issues.apache.org/jira/browse/SOLR-13609, was this fixed ever ? > > Regards > > On Mon, Dec 7, 2020 at 6:32 PM Pushkar Mishra wrote: > >> Hi All, >> >> Is there a way to trigger a notific

Re: Collection deleted still in zookeeper

2020-12-07 Thread Erick Erickson
> Anyway, I found that the config where still in the UI/cloud/tree/configs > and they can be removed using the solr zk -r configs/myconfig and this > solve the issue. > > Thanks > > On Fri, 4 Dec 2020 at 15:46, Erick Erickson wrote: >

Re: Migrate Legacy Solr Cores to SolrCloud

2020-12-05 Thread Erick Erickson
First thing I’d do is run one of the examples to ensure you have Zookeeper set up, etc. You can create a collection that uses the default configset. Once that’s done, start with ‘SOLR_HOME/solr/bin/solr zk upconfig’. There’s extensive help if you just type “bin/solr zk -help”. You give it the pat

Re: What's the most efficient way to check if there are any matches for a query?

2020-12-05 Thread Erick Erickson
Have you looked at the Term Query Parser (_not_ the TermS Query Parser) or Raw Query Parser? https://lucene.apache.org/solr/guide/8_4/other-parsers.html NOTE: these perform _no_ analysis, so you have to give them the exact term... These are pretty low level, and if they’re “fast enough” you won

Re: Collection deleted still in zookeeper

2020-12-04 Thread Erick Erickson
This is almost always the result of one of two things: 1> you didn’t upload the config to the correct place in the ZK that Solr uses, or 2> you still have a syntax problem in the config. The solr.log file on the node that’s failing may have a more useful error message about what’s wrong. Also, you can

Re: Solrj supporting term vector component ?

2020-12-04 Thread Erick Erickson
To expand on Shawn’s comment. There are a lot of built-in helper methods in SolrJ, but they all amount to setting a value in the underlying map of params, which you can do yourself for any parameter you could specify on a URL or cURL command. For instance, SolrQuery.setStart(start) is just: thi
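The same idea at the HTTP level, as a Python sketch: every request ultimately becomes a flat set of name/value parameters, so term vector options can be set by name without a dedicated helper. `tv`, `tv.tf` and `tv.fl` are TermVectorComponent parameters. The query, the field `body`, and whether your request handler has that component configured are assumptions. In SolrJ the equivalent is the generic `set(name, value)` on the query object.

```python
from urllib.parse import urlencode

params = {}
params["q"] = "id:doc1"   # hypothetical query
params["start"] = 0       # what a setStart(0)-style helper ends up setting
params["rows"] = 10
params["tv"] = "true"     # TermVectorComponent parameters, set by name
params["tv.tf"] = "true"
params["tv.fl"] = "body"  # hypothetical field with term vectors enabled

query_string = urlencode(params)
```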

Re: Solr8.7 - How to optmize my index ?

2020-12-03 Thread Erick Erickson
ize is about 390 GB for 130M docs (3-5 KB / doc), around 25 fields >> (indexed, stored) >> Every Tuesday I do an update of around 1M docs and every Thursday I add new >> docs (around 50,000). >> >> Many thanks! >> >> Regards, >> Bruno >>

Re: ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

2020-12-03 Thread Erick Erickson
Exactly _how_ are you indexing? In particular, how often are commits happening? If you’re committing too often, Solr can block until some of the background merges are complete. This can happen particularly when you are doing hard commits in rapid succession, either through, say, committing from

Re: chaining charFilter

2020-12-02 Thread Erick Erickson
Images are stripped by the mail server, so we can’t see the result. I looked at master and the admin UI has problems, I just raised a JIRA, see: https://issues.apache.org/jira/browse/SOLR-15024 The _functionality_ is fine. If you go to the analysis page and enter values, you’ll see the transforma

Re: Solr8.7 - How to optmize my index ?

2020-12-02 Thread Erick Erickson
expungeDeletes is unnecessary; optimize is a superset of expungeDeletes. The key difference is commit=true. I suspect if you’d waited until your indexing process added another doc and committed, you’d have seen the index size drop. Just to check, you send the command to my_core but talk about coll

Re: Need help to configure automated deletion of shard in solr

2020-12-02 Thread Erick Erickson
f any shard gets empty , need to delete > the shard as well. > > So lets say , is there a way to know, when solr completes the purging of > deleted documents, then based on that flag we can configure shard deletion > > Thanks > Pushkar > > On Tue, Dec 1, 2020 at 9:0

Re: Can solr index replacement character

2020-12-01 Thread Erick Erickson
Solr handles UTF-8, so it should be able to. The problem you’ll have is getting the UTF-8 characters through all the various transport encodings; e.g., if you try to search from a browser, you need to encode it so the browser passes it through. If you search through SolrJ, it needs to be enco
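To see what "encode it" means for the replacement character U+FFFD, here is a short Python sketch: in UTF-8 it is three bytes, and in a URL those bytes must be percent-encoded. The field name `text` is hypothetical.

```python
from urllib.parse import quote, unquote, urlencode

REPLACEMENT_CHAR = "\ufffd"

utf8_bytes = REPLACEMENT_CHAR.encode("utf-8")            # three bytes: EF BF BD
in_path = quote(REPLACEMENT_CHAR)                        # percent-encoded UTF-8
in_query = urlencode({"q": "text:" + REPLACEMENT_CHAR})  # hypothetical field "text"
round_trip = unquote(in_path)                            # decoding restores the character
```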

Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Erick Erickson
;> empty (or may be lets say nominal documents are left ) , then delete the >>> shard. And I am exploring to do this using configuration . >>> >> 3. Also it will not be in live shard for sure as only those documents are >> deleted which have TTL got over . TTL

Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Erick Erickson
Are you using the implicit router? Otherwise you cannot delete a shard. And you won’t have any shards that have zero documents anyway. It’d be a little convoluted, but you could use the collections COLSTATUS API to find the names of all your replicas. Then query _one_ replica of each shard with so
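A rough Python sketch of the second step. It assumes you have already pulled one replica's core URL per shard out of the COLSTATUS response; that mapping and the host and core names below are hypothetical. Each URL asks a single core for `rows=0` with `distrib=false`, so its `numFound` is that shard's live document count.

```python
from urllib.parse import urlencode


def shard_count_urls(replica_by_shard: dict) -> dict:
    """One non-distributed, zero-row query per shard."""
    qs = urlencode({"q": "*:*", "rows": 0, "distrib": "false"})
    return {shard: f"{core_url}/select?{qs}" for shard, core_url in replica_by_shard.items()}


def empty_shards(num_found_by_shard: dict) -> list:
    """Shards whose sampled replica reported numFound == 0."""
    return sorted(shard for shard, n in num_found_by_shard.items() if n == 0)


urls = shard_count_urls({
    "shard1": "http://host1:8983/solr/coll_shard1_replica_n1",  # hypothetical
    "shard2": "http://host2:8983/solr/coll_shard2_replica_n2",  # hypothetical
})
```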

Re: write.lock file after unloading core

2020-11-30 Thread Erick Erickson
I’m a little confused here. Are you unloading/copying/creating the core on master? I’ll assume so since I can’t really think of how doing this on one of the other cores would make sense….. I’m having a hard time wrapping my head around the use-case. You’re “delivering a new index”, which I take

Re: data import handler deprecated?

2020-11-29 Thread Erick Erickson
If you like Java instead of Python, here’s a skeletal program: https://lucidworks.com/post/indexing-with-solrj/ It’s simple and single-threaded, but could serve as a basis for something along the lines that Walter suggests. And I absolutely agree with Walter that the DB is often where the bottle

Re: Query generation is different for search terms with and without "-"

2020-11-25 Thread Erick Erickson
x this so it > doesn't have to be solved client side? > > On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder > wrote: > >> Is the normal/standard solution here to regex remove the '-'s and >> combine them into a single token? >> >> On Tue, No

Re: Query generation is different for search terms with and without "-"

2020-11-24 Thread Erick Erickson
This is a common point of confusion. There are two phases for creating a query: query _parsing_ first, then the analysis chain for the parsed result. So what e-dismax sees in the two cases is: Name_enUS:“high tech” -> two tokens; since there are two of them, pf2 comes into play. Name_enUS:“high-
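A deliberately simplified Python illustration of the pf2 point: pf2 pairs up adjacent terms that come out of query _parsing_, so a hyphenated word that reaches the analysis chain as one whitespace-delimited term yields no two-word phrase. Real edismax behavior also depends on parameters such as `sow` and on the field's analysis chain, which this ignores.

```python
def pf2_phrases(field: str, user_query: str) -> list:
    """Pair adjacent whitespace-separated terms, as pf2 does for phrase boosting."""
    terms = user_query.split()
    return [f'{field}:"{a} {b}"' for a, b in zip(terms, terms[1:])]


print(pf2_phrases("Name_enUS", "high tech"))  # ['Name_enUS:"high tech"']
print(pf2_phrases("Name_enUS", "high-tech"))  # []  (one term, so no pair)
```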

Re: Atomic update wrongly deletes child documents

2020-11-24 Thread Erick Erickson
Sure, raise a JIRA. Thanks for the update... > On Nov 24, 2020, at 4:12 AM, Andreas Hubold > wrote: > > Hi, > > I was able to work around the issue. I'm now using a custom > UpdateRequestProcessor that removes undefined fields, so that I was able to > remove the catch-all dynamic field "ignore

Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-15 Thread Erick Erickson
I don’t really have any good alternatives. There’s an open JIRA for this, see: SOLR-6399 This would be a pretty big chunk of work, which is one of the reasons this JIRA has languished… Sorry I can’t be more helpful, Erick > On Nov 15, 2020, at 11:00 AM, Gajanan wrote: > > Hi Erick, thanks for

Re: Why am I able to sort on a multiValued field?

2020-11-14 Thread Erick Erickson
From the “Common Query Parameters” (sort) section of the ref guide: "In the case of primitive fields, or SortableTextFields, that are multiValued="true" the representative value used for each doc when sorting depends on the sort direction: The minimum value in each document is used for ascending
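The quoted rule, sketched in Python with made-up documents: ascending sorts compare each document's minimum value, descending sorts compare its maximum, so the same documents can come back in an order that is not simply reversed.

```python
docs = {"a": [3, 10], "b": [5, 6], "c": [1, 4]}  # made-up multiValued values


def sort_multivalued(docs: dict, ascending: bool) -> list:
    """Order doc ids by min value (asc) or max value (desc)."""
    if ascending:
        return sorted(docs, key=lambda d: min(docs[d]))
    return sorted(docs, key=lambda d: max(docs[d]), reverse=True)


asc = sort_multivalued(docs, ascending=True)    # ['c', 'a', 'b'] by minima 1, 3, 5
desc = sort_multivalued(docs, ascending=False)  # ['a', 'b', 'c'] by maxima 10, 6, 4
```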

Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-12 Thread Erick Erickson
As stated in the docs, using the core admin API when using SolrCloud is not recommended, for exactly this kind of reason. While SolrCloud _does_ use the Core Admin API, its usage has to be very precise. You apparently didn’t heed this warning in the UNLOAD command for the collections API: "Unload

Re: Using Multiple collections with streaming expressions

2020-11-10 Thread Erick Erickson
You need to open multiple streams, one to each collection then combine them. For instance, open a significantTerms stream to collection1, another to collection2 and wrap both in a merge stream. Best, Erick > On Nov 9, 2020, at 1:58 PM, ufuk yılmaz wrote: > > For example the streaming expressi

Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-09 Thread Erick Erickson
Depends. *nix systems have delete-on-close semantics; that is, as long as there’s a single file handle open, the file will still be available to the process using it. Only when the last file handle is closed will the file actually be deleted. Solr (Lucene actually) has a file handle open to every
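A quick Python demonstration of delete-on-close, assuming Linux or macOS. On Windows the `unlink` call typically fails while the file is still open, which is one reason behavior differs there.

```python
import os
import tempfile


def read_after_unlink() -> str:
    """Unlink a file while a handle is open, then read it through that handle."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"segment data")
    os.close(fd)
    with open(path, "rb") as f:          # this handle stays open...
        os.unlink(path)                  # ...while the directory entry is removed
        assert not os.path.exists(path)  # the name is gone
        return f.read().decode()         # but the data is still readable
```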

Re: count mismatch with and without sort param

2020-11-05 Thread Erick Erickson
You need to provide examples in order for anyone to try to help you. Include 1> example docs (just the relevant field(s) will do) 2> your actual output 3> your expected output 4> the query you use. Best, Erick > On Nov 5, 2020, at 10:56 AM, Raveendra Yerraguntla > wrote: > > Hello > my date qu

Re: Solr 8.1.1 installation in Azure App service

2020-11-05 Thread Erick Erickson
I _strongly_ recommend you use the collections API CREATE command rather than try what you’re describing. You’re trying to mix manually creating core.properties files, which was the process for stand-alone Solr, with SolrCloud and hoping that it somehow gets propagated to Zookeeper. This has some

Re: Solr migration related issues.

2020-11-05 Thread Erick Erickson
Oh dear. You made me look at the reference guide. Ouch. We have this nice page “Defining core.properties” that talks about defining core.properties. Unfortunately it has _no_ warning about the fact that trying to use this in SolrCloud is a really bad idea. As in “don’t do it”. To make matters

Re: Solr migration related issues.

2020-11-04 Thread Erick Erickson
solr.xml in Zookeeper!). Best, Erick > Best, > Modassar > > On Wed, Nov 4, 2020 at 3:20 AM Erick Erickson > wrote: > >> Do note, though, that the default value for legacyCloud changed from >> true to false so even though you can get it to work by setting >> this cl

Re: Commits (with openSearcher = true) are too slow in solr 8

2020-11-04 Thread Erick Erickson
I completely agree with Shawn. I’d emphasize that your heap is that large probably to accommodate badly mis-configured caches. Why it’s different in 5.4 I don’t quite know, but 10-12 minutes is unacceptable anyway. My guess is that you made your heaps that large as a consequence of having low hit

Re: docValues usage

2020-11-04 Thread Erick Erickson
You don’t need to index the field for function queries, see: https://lucene.apache.org/solr/guide/8_6/docvalues.html. Function queries, as opposed to sorting, faceting and grouping, are evaluated at search time, where the search process is already parked on the document anyway, so answering the

Re: when to use stored over docValues and useDocValuesAsStored

2020-11-04 Thread Erick Erickson
> On Nov 4, 2020, at 6:43 AM, uyilmaz wrote: > > Hi, > > I heavily use streaming expressions and facets, or export large amounts of > data from Solr to Spark to make analyses. > > Please correct me if I know wrong: > > + requesting a non-docValues field in a response causes whole document t

Re: Solr migration related issues.

2020-11-03 Thread Erick Erickson
I second Erick's recommendation, but just for the record legacyCloud was > removed in (upcoming) Solr 9 and is still available in Solr 8.x. Most > likely this explains Modassar why you found it in the documentation. > > Ilan > > > On Tue, Nov 3, 2020 at 5:11 PM Erick Eric

Re: Solr migration related issues.

2020-11-03 Thread Erick Erickson
t dynamically > set? > > I will try to use the APIs to create the collection as well as the cores. > > Best, > Modassar > > On Tue, Nov 3, 2020 at 5:55 PM Erick Erickson > wrote: > >> You’re relying on legacyMode, which is no longer supported. In >

Re: how do you manage your config and schema

2020-11-03 Thread Erick Erickson
The caution I would add is that you should be careful that you don’t enable schemaless mode without understanding the consequences in detail. There is, in fact, some discussion of removing schemaless entirely, see: https://issues.apache.org/jira/browse/SOLR-14701 Otherwise, I usually recommend

Re: Search issue in the SOLR for few words

2020-11-03 Thread Erick Erickson
There is not nearly enough information here to begin to help you. At minimum we need: 1> your field definition 2> the text you index 3> the query you send You might want to review: https://wiki.apache.org/solr/UsingMailingLists Best, Erick > On Nov 3, 2020, at 1:08 AM, Viresh Sasalawad > wro

Re: Solr migration related issues.

2020-11-03 Thread Erick Erickson
You’re relying on legacyMode, which is no longer supported. In older versions of Solr, if a core.properties file was found on disk, Solr attempted to create the replica (and collection) on the fly. This is no longer true. Why are you doing this manually instead of using the collections API? You

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-11-02 Thread Erick Erickson
y thing is that 6.0.0. > handled these requests somehow, but newer version did not. > Anyway, we will observe this and try to improve our code as well. > > Best regards, > Jaan > > -Original Message- > From: Erick Erickson > Sent: 28 October 2020 17:18 >

Re: Simulate facet.exists for json query facets

2020-10-30 Thread Erick Erickson
I don’t think there’s anything to do what you’re asking OOB. If all of those facet queries are _known_ to be a performance hit, you might be able to do something custom. That would require custom code, though, and I wouldn’t go there unless you can demonstrate need. If you issue a debug=timing you’

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-28 Thread Erick Erickson
docValues=true is usually only used for “primitive” types: strings, numerics, booleans and the like, specifically _not_ text-based fields. I say “usually” because there’s a special “SortableTextField” where it does make some sense to have a text-based field have docValues, but that’s intended for rela

Re: Simulate facet.exists for json query facets

2020-10-28 Thread Erick Erickson
This really sounds like an XY problem. The whole point of facets is to count the number of documents that have a value in some number of buckets. So trying to stop your facet query as soon as it matches a hit for the first time seems like an odd thing to do. So what’s the “X”? In other words, what

Re: Solrcloud create collection ignores createNodeSet parameter

2020-10-27 Thread Erick Erickson
You’re confusing replicas and shards a bit. Solr tries its best to put multiple replicas _of the same shard_ on different nodes. You have two shards, though, with _one_ replica each. This is a bit of a nit, but important to keep in mind when your replicationFactor increases. So from an HA perspective, th

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Erick Erickson
Jean: The basic search uses an “inverted index”, which is basically a list of terms and the documents they appear in, e.g. my - 1, 4, 9, 12 dog - 4, 8, 10 So the word “my” appears in docs 1, 4, 9 and 12, and “dog” appears in 4, 8, 10. Makes it easy to search for my AND dog for instance, obviou
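The "my AND dog" lookup from that example, as a Python sketch that intersects two sorted postings lists with a two-pointer walk. Lucene's real implementation is far more elaborate (skipping, compression), so treat this only as an illustration of the idea.

```python
def intersect(a: list, b: list) -> list:
    """Intersect two sorted postings lists (doc ids) in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:      # doc contains both terms
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:     # advance whichever list is behind
            i += 1
        else:
            j += 1
    return out


postings = {"my": [1, 4, 9, 12], "dog": [4, 8, 10]}
print(intersect(postings["my"], postings["dog"]))  # [4]
```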

Re: Performance issues with CursorMark

2020-10-26 Thread Erick Erickson
8.6 still has uninvertible=true, so this should go ahead and create an on-heap docValues structure. That’s going to put 38M ints on the heap. Still, that shouldn’t require 500M of additional space, and this would have been happening in your old system anyway, so I’m at a loss to explain… Unless

Re: TieredMergePolicyFactory question

2020-10-26 Thread Erick Erickson
"Some large segments were merged into 12GB segments and deleted documents were physically removed.” and “So with the current natural merge strategy, I need to update solrconfig.xml and increase the maxMergedSegmentMB often" I strongly recommend you do not continue down this path. You’re making a m

Re: TieredMergePolicyFactory question

2020-10-23 Thread Erick Erickson
of deletes as part of the REGULAR segment > merging cycles. Is my understanding correct? > > > > > On Fri, Oct 23, 2020 at 9:47 AM Erick Erickson > wrote: > >> Just go ahead and optimize/forceMerge, but do _not_ optimize to one >> segment. Or you can expungeDelet

Re: TieredMergePolicyFactory question

2020-10-23 Thread Erick Erickson
Just go ahead and optimize/forceMerge, but do _not_ optimize to one segment. Or you can expungeDeletes, that will rewrite all segments with more than 10% deleted docs. As of Solr 7.5, these operations respect the 5G limit. See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ B
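A simplified Python model of which segments expungeDeletes would pick, using the 10% deleted-docs threshold described above. The segment names and counts are invented, and the real merge policy also applies size limits such as the 5G cap.

```python
def segments_to_rewrite(segments, pct_allowed: float = 10.0) -> list:
    """segments: (name, max_doc, deleted_docs) tuples; pick those over the threshold."""
    return [
        name
        for name, max_doc, deleted in segments
        if max_doc > 0 and 100.0 * deleted / max_doc > pct_allowed
    ]


segments = [("_a", 1000, 50), ("_b", 1000, 150), ("_c", 200, 30)]  # invented
print(segments_to_rewrite(segments))  # ['_b', '_c'] (5% stays, 15% and 15% are rewritten)
```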

Re: When are the score values evaluated?

2020-10-22 Thread Erick Erickson
You’d get a much better idea of what goes on if you added &explain=true and analyzed the output. That’d show you exactly what is calculated when. Best, Erick > On Oct 22, 2020, at 4:05 AM, Taisuke Miyazaki > wrote: > > Hi, > > If you use a high value for the score, the values on the smaller s

Re: Add single or batch document in Solr 6.1.0

2020-10-20 Thread Erick Erickson
Batching is better, see: https://lucidworks.com/post/really-batch-updates-solr-2/ > On Oct 20, 2020, at 9:03 AM, vishal patel > wrote: > > I am using solr 6.1.0. We have 2 shards and each has one replica. > > I want to insert 100 documents in one collection. I am using below code. > > org.ap
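The batching idea as a Python sketch: group documents into fixed-size lists and send each list in one update request instead of one request per document. The batch size here is an arbitrary example. In SolrJ, each batch would typically go to a single `add` call that takes a collection of documents.

```python
def batches(docs, size: int):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover partial batch
        yield batch


docs = [{"id": str(i)} for i in range(100)]
sizes = [len(b) for b in batches(docs, 25)]  # four requests instead of 100
```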

Re: [EXT: NEWSLETTER] SolrDocument difference between String and text_general

2020-10-20 Thread Erick Erickson
Owen: Collection reload is necessary but not sufficient. You’ll still get wonky results even if you re-index everything unless you delete _all_ the documents first or start with a whole new collection. Each Lucene index is a “mini index” with its own picture of the structure of that index (i.e.

Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Erick Erickson
uyilmaz: Hmm, that _is_ confusing. And inaccurate. In this context, it should read something like: The Text field should have indexed="true" docValues="false" if used for searching but not faceting, and the String field should have indexed="false" docValues="true" if used for faceting but not s

Re: converting string to solr.TextField

2020-10-17 Thread Erick Erickson
Did you read the long explanation in this thread already about segment merging? If so, can you ask specific questions about the information in those? Best, Erick > On Oct 17, 2020, at 8:23 AM, Vinay Rajput wrote: > > Sorry to jump into this discussion. I also get confused whenever I see this >

Re: Index Replication Failure

2020-10-17 Thread Erick Erickson
None of your images made it through the mail server. You’ll have to put them somewhere and provide a link. > On Oct 17, 2020, at 5:17 AM, Parshant Kumar > wrote: > > Architecture image: If not visible in previous mail > > > > > On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar > wrote: > Hi

Re: converting string to solr.TextField

2020-10-16 Thread Erick Erickson
pe is still in the index until > merged/optimized as well, wouldnt that cause almost the same conflicts > until then? > > On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson > wrote: > >> Doesn’t re-indexing a document just delete/replace…. >> >> It’s complicated.

Re: Info about legacyMode cluster property

2020-10-16 Thread Erick Erickson
You should not be using the core API to do anything with cores in SolrCloud. True, under the covers the collections API uses the core API to do its tricks, but you have to use it in a very precise manner. As for legacyMode, don’t use it, please. It’s not supported any more, has been completely re

Re: converting string to solr.TextField

2020-10-16 Thread Erick Erickson
Doesn’t re-indexing a document just delete/replace…. It’s complicated. For the individual document, yes. The problem comes because the field is inconsistent _between_ documents, and segment merging blows things up. Consider. I have segment1 with documents indexed with the old schema (String in th

Re: Rotate Solr Logfiles

2020-10-15 Thread Erick Erickson
Possibly browser caches? Try using a private window or purging your browser caches. Shot in the dark… Best, Erick > On Oct 15, 2020, at 5:41 AM, DINSD | SPAutores > wrote: > > Hi, > > I'm currently using Solr-8.5.2 with a default log4j2.xml and trying to do the > following : > > Each time

Re: Migrating data from solr1.3 to solr8.4

2020-10-14 Thread Erick Erickson
Kaya is certainly correct. I’d add that Solr has changed so much in the last 12 years that you should treat it as a new Solr installation. Do not, for instance, just use the configs from Solr 1.3, start with the ones from the version of Solr you install. Best, Erick > On Oct 14, 2020, at 3:41 AM,

Re: Need urgent help -- High cpu on solr

2020-10-14 Thread Erick Erickson
Zisis makes good points. One other thing is I’d look to see if the CPU spikes coincide with commits. But GC is where I’d look first. Continuing on with the theme of caches, yours are far too large at first glance. The default is, indeed, size=512. Every time you open a new searcher, you’ll be exe

Re: Memory line in status output

2020-10-12 Thread Erick Erickson
Solr doesn’t manage this at all; it’s the JVM’s garbage collection that occasionally kicks in. In general, memory creeps up until it reaches a GC threshold (there are about a zillion parameters you can set to control that) and then GC kicks in. Generally, the recommendation is to use the G1GC collector a

Re: Any solr api to force leader on a specified node

2020-10-12 Thread Erick Erickson
First, I totally agree with Walter. See: https://lucidworks.com/post/indexing-with-solrj/ Second, DIH is being deprecated. It is being moved to a package that will be supported if, and only if, there is enough community support for it. “Community support” means people who use it need to step up

Re: Solr Memory

2020-10-10 Thread Erick Erickson
_Have_ they crashed due to OOMs? It’s quite normal for Java to create a sawtooth pattern of memory consumption. If you attach, say, jconsole to the running Solr and hit the GC button, does the memory drop back? To answer your question, though, no there’s no reason memory should creep. That said, t

Re: PositionGap

2020-10-09 Thread Erick Erickson
No. It’s just an integer multiplication. X * 5 is no different than X*1... > On Oct 9, 2020, at 2:48 PM, Jae Joo wrote: > > Does increasing of Position Gap make Search Slow? > > Jae

Re: Folding Repeated Letters

2020-10-09 Thread Erick Erickson
Anything you do will be wrong ;). I suppose you could kick out words that weren’t in some dictionary and accumulate a list of words not in the dictionary and just deal with them “somehow", but that’s labor-intensive since you then have to deal with proper names and the like. Sometimes you can g

Re: Question about solr commits

2020-10-08 Thread Erick Erickson
This is a bit confused. There will be only one timer that starts at time T when the first doc comes in. At T+15 seconds, all docs that have been received since time T will be committed. The first doc to hit Solr _after_ T+15 seconds starts a single new timer and the process repeats. Best, Erick >
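A small Python simulation of the timer behavior just described, with times in seconds and a 15-second maxTime. The arrival times are invented, and a doc arriving exactly at the deadline is treated here as part of the commit in progress (an assumption).

```python
def commit_times(arrivals, max_time: float) -> list:
    """Return when autocommits fire, given document arrival times."""
    commits = []
    deadline = None
    for t in sorted(arrivals):
        if deadline is None or t > deadline:
            # No timer running (or it already fired): this doc starts a new one.
            deadline = t + max_time
            commits.append(deadline)
        # Otherwise the doc rides along with the running timer; nothing is reset.
    return commits


print(commit_times([0, 3, 10, 16, 20, 40], 15))  # [15, 31, 55]
```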

Re: Master/Slave

2020-10-08 Thread Erick Erickson
What Jan said. I wanted to add that the replication API also makes use of it. A little-known fact: you can use the replication API in SolrCloud _without_ configuring anything in solrconfig.xml. You can specify the URL to pull from on the fly in the command…. Best, Erick > On Oct 8, 2020, at 2:

Re: Question about solr commits

2020-10-07 Thread Erick Erickson
Yes. > On Oct 7, 2020, at 6:40 PM, yaswanth kumar wrote: > > I have the below in my solrconfig.xml > > > > ${solr.Data.dir:} > > > ${solr.autoCommit.maxTime:6} > false > > > ${solr.autoSoftCommit.maxTime:5000} > > > > Does this mean even thoug

Re: Java GC issue investigation

2020-10-06 Thread Erick Erickson
12G is not that huge, it’s surprising that you’re seeing this problem. However, there are a couple of things to look at: 1> If you’re saying that you have 16G total physical memory and are allocating 12G to Solr, that’s an anti-pattern. See: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdire

Re: Solr Issue - Logger : UpdateLog Error Message : java.io.EOFException

2020-10-03 Thread Erick Erickson
Very strange things start to happen when GC becomes unstable. The first and simplest thing to do would be to bump up your heap, say to 20g (note, try to stay under 32G or be prepared to jump significantly higher. At 32G long pointers have to be used and you actually have less memory available th

Re: Transaction not closed on ms sql

2020-10-01 Thread Erick Erickson
First of all, I’d just use a stand-alone program to do your processing for a number of reasons, see: https://lucidworks.com/post/indexing-with-solrj/ 1- I suspect your connection will be closed eventually. Since it’s expensive to open one of these, the driver may keep it open for a while. 2

Re: Slow Solr 8 response for long query

2020-09-30 Thread Erick Erickson
Increasing the number of rows should not have this kind of impact in either version of Solr, so I think there’s something fundamentally strange in your setup. Whether returning 10 or 300 documents, every document has to be scored. There are two differences between 10 and 300 rows: 1> when retu

Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-29 Thread Erick Erickson
Let’s see the query. My bet is that you are _searching_ against the field and have indexed=false. Searching against a docValues=true indexed=false field results in the equivalent of a “table scan” in the RDBMS world. You may use the docValues efficiently for _function queries_ to mimic some searc

Re: Solr storage of fields <-> indexed data

2020-09-28 Thread Erick Erickson
Fields are placed in the index totally separately from each other, so it’s no wonder that removing the copyField results in this kind of savings. And they have to be separate. Consider what comes out of the end of the analysis chain. The same input could produce totally different output. As a tri

Re: SOLR Cursor Pagination Issue

2020-09-28 Thread Erick Erickson
TATUS:C^-30&rows=1&cursorMark=* > > DocId does not change during data update.During data updating process in > solrCloud skript returnd incorect Number of requests and Collected series. > > Best, > Vlad > > > > Mon, 28 Sep 2020 08:54:57 -0400,

Re: Unable to upload updated solr config set

2020-09-28 Thread Erick Erickson
Until then, you can use bin/solr zk upconfig…. Best, Erick > On Sep 28, 2020, at 10:06 AM, Houston Putman wrote: > > Until the next Solr minor version is released you will not be able to > overwrite an existing configSet with a new configSet of the same name. > > The ticket for this feature i

Re: Corrupted records after successful commit

2020-09-28 Thread Erick Erickson
> 2. The second entry is really strange; this isn't a valid record at all and > I don't have any record of creating it. > > I've ruled out reindexing items both from my indexing script (I just don't > run it) and an external code snippet updating the record at a lat

Re: SOLR Cursor Pagination Issue

2020-09-28 Thread Erick Erickson
Define “incorrect,” please. Also, showing the exact query you use would be helpful. That said, indexing data at the same time you are paging with CursorMark is not guaranteed to find all documents. Consider a sort with date asc, id asc. doc53 has a date of 2001 and you’ve already returned the doc. Ne
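A toy Python simulation of that hazard, under simplifying assumptions: documents are `{id: year}`, the sort is (date asc, id asc), and the "cursor" is just the last sort key returned. Solr's real cursorMark is an opaque string, but it likewise marks a position in the sort order rather than a snapshot of the index. The document ids and dates are invented.

```python
def next_page(docs: dict, cursor, rows: int):
    """Return up to `rows` (date, id) keys sorting strictly after `cursor`, plus the new cursor."""
    keys = sorted((date, doc_id) for doc_id, date in docs.items())
    page = [k for k in keys if cursor is None or k > cursor][:rows]
    return page, (page[-1] if page else cursor)


docs = {"doc10": 2000, "doc53": 2001, "doc60": 2005, "doc70": 2010}
page1, cursor = next_page(docs, None, 2)    # doc10, doc53
docs["doc53"] = 2020                        # doc53 re-indexed with a later date
docs["doc60"] = 1999                        # doc60 re-indexed with an earlier date
page2, cursor = next_page(docs, cursor, 2)  # doc70 and doc53 (again); doc60 is never seen
```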

Re: Corrupted records after successful commit

2020-09-28 Thread Erick Erickson
There are several possibilities: 1> you simply have some process incorrectly updating documents. 2> you’ve changed your schema sometime without completely deleting your old index and re-indexing all documents from scratch. I recommend in fact indexing into a new collection and using collection

Re: Solr 8.6.2 text_general

2020-09-25 Thread Erick Erickson
Uhhh, this is really dangerous. If you’ve indexed documents since upgrading, some were indexed with multiValued=false. Now you’ve changed the definition at a fundamental Lucene level and Things Can Go Wrong. You’re OK if (and only if) you have indexed _no_ documents since you upgraded. But

Re: TimeAllowed and Partial Results

2020-09-22 Thread Erick Erickson
TimeAllowed stops looking when the timer expires. If it hasn’t found any docs with a non-zero score by then, you’ll get zero hits. It has to be this way, because Solr doesn’t know whether a doc is a hit until Solr scores it. So this is normal behavior, assuming that some part of the processing
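A minimal sketch of how a client might set `timeAllowed` (in milliseconds) and detect that the limit was hit. When the limit is exceeded, Solr marks the response with `partialResults: true` in the `responseHeader`, so a zero-hit response that carries this flag should not be read as "no matches." The helper names here are hypothetical.

```python
def with_time_allowed(params: dict, millis: int) -> dict:
    """Return a copy of the query params with timeAllowed (ms) set."""
    out = dict(params)
    out["timeAllowed"] = str(millis)
    return out


def is_partial(response: dict) -> bool:
    """True if Solr stopped early because timeAllowed expired."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


def describe(response: dict) -> str:
    hits = response.get("response", {}).get("numFound", 0)
    if is_partial(response):
        # Zero hits here may just mean the timer ran out before anything was scored.
        return f"partial: {hits} hits found before timeAllowed expired"
    return f"complete: {hits} hits"
```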

Re: Many small instances, or few large instances?

2020-09-21 Thread Erick Erickson
In a word, yes. G1GC still has spikes, and the larger the heap the more likely you’ll be to encounter them. So having multiple JVMs rather than one large JVM with a ginormous heap is still recommended. I’ve seen some cases that used the Zing zero-pause product with very large heaps, but they we

Re: Pining Solr

2020-09-18 Thread Erick Erickson
Well, this doesn’t look right at all: /solr/cpsearch/select_cpsearch/select It should just be: /solr/cpsearch/select_cpsearch Best, Erick > On Sep 18, 2020, at 3:18 PM, Steven White wrote: > > /solr/cpsearch/select_cpsearch/select

Re: Pining Solr

2020-09-18 Thread Erick Erickson
This looks kind of confused. I’m assuming what you’re after is a way to get to your select_cpsearch request handler to test if Solr is alive and calling that “ping”. The ping request handler is just that, a separate request handler that you hit by going to http://server:port/solr/admin/ping. I
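As a rough illustration, here is a liveness check against the ping handler instead of a search handler. It assumes the per-core or per-collection path `/solr/<collection>/admin/ping`, which is where the ping handler is normally registered. It also assumes that a healthy reply to `wt=json` contains `"status": "OK"`. The base URL and collection name in the usage line are placeholders, and `opener` is a parameter added here so the check can be tested without a running Solr.

```python
import json
from urllib.request import urlopen


def ping_url(base_url: str, collection: str) -> str:
    """e.g. http://localhost:8983/solr + cpsearch -> .../solr/cpsearch/admin/ping?wt=json"""
    return f"{base_url.rstrip('/')}/{collection}/admin/ping?wt=json"


def solr_is_alive(base_url: str, collection: str, opener=urlopen, timeout: float = 5.0) -> bool:
    """Return True only if the ping handler answers with status OK."""
    try:
        with opener(ping_url(base_url, collection), timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection errors, HTTP errors (URLError/HTTPError are OSErrors) or a non-JSON reply.
        return False
    return body.get("status") == "OK"


if __name__ == "__main__":
    print(solr_is_alive("http://localhost:8983/solr", "cpsearch"))
```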

Re: Handling failure when adding docs to Solr using SolrJ

2020-09-17 Thread Erick Erickson
I recommend _against_ issuing explicit commits from the client, let your solrconfig.xml autocommit settings take care of it. Make sure either your soft or hard commits open a new searcher for the docs to be searchable. I’ll bend a little bit if you can _guarantee_ that you only ever have one index
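Along the lines of the advice above, here is a minimal client-side sketch that sends documents in batches to the JSON `/update` endpoint and deliberately leaves out any `commit` parameter. Visibility is left to the `autoCommit`/`autoSoftCommit` settings in solrconfig.xml. The base URL, collection name and batch size are assumptions, and actually POSTing each body (with `Content-Type: application/json`) and handling failures is left to the caller.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size`."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def update_requests(base_url: str, collection: str, docs: Iterable[dict],
                    batch_size: int = 1000) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs for JSON updates; note there is no commit=true."""
    url = f"{base_url.rstrip('/')}/{collection}/update"
    for batch in batched(docs, batch_size):
        yield url, json.dumps(batch).encode("utf-8")
```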

Re: Doing what does using SolrJ API

2020-09-17 Thread Erick Erickson
e of the simpler URPs >> will do the job or a chain of them, that may be a better path to take. >> >> Regards, >> Alex. >> >> >> On Thu, 17 Sep 2020 at 13:13, Steven White wrote: >> >>> Thanks Erick. Where can I learn more about "sta

Re: Doing what does using SolrJ API

2020-09-17 Thread Erick Erickson
s of those fields to the catch-all field and the catch-all field > is my default search field, I don't expect any problem for having 1000 > fields in Solr's schema, or should I? > > Thanks > > Steven > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson > wrote: &

Re: Unable to create core Solr 8.6.2

2020-09-17 Thread Erick Erickson
Look in your Solr log; there’s usually a more detailed message there. > On Sep 17, 2020, at 9:35 AM, Anuj Bhargava wrote: > > Getting the following error message while trying to create core > > # sudo su - solr -c "/opt/solr/bin/solr create_core -c 9lives" > WARNING: Using _default configset with data

Re: Doing what does using SolrJ API

2020-09-17 Thread Erick Erickson
“there over 1000 of them[fields]” This is often a red flag in my experience. Solr will handle that many fields, I’ve seen many more. But this is often a result of “database thinking”, i.e. your mental model of how all this data is from a DB perspective rather than a search perspective. It’s unw

Re: join query limitations

2020-09-14 Thread Erick Erickson
e" docValues="true" > /> > > --- > Yes it is just for functions, sorting, and boosting > > On Mon, Sep 14, 2020 at 4:51 PM Erick Erickson > wrote: >> >> Have you seen “In-place updates”? >> >> See: >> https://lucene.apache.

Re: join query limitations

2020-09-14 Thread Erick Erickson
Have you seen “In-place updates”? See: https://lucene.apache.org/solr/guide/8_1/updating-parts-of-documents.html Then use the field as part of a function query. Since it’s non-indexed, you won’t be searching on it. That said, you can do a lot with function queries to satisfy use-cases. Best. Er
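Here is a minimal sketch of both halves of that suggestion. It builds an atomic `set` update for the JSON `/update` endpoint, which Solr applies in place when the field is single-valued, docValues=true, indexed=false and stored=false. It then uses the field inside a function query through the `{!boost}` query parser. The field name `popularity_dv` and the uniqueKey `id` are assumptions, and a function such as `log(...)` around the field is a common but optional choice.

```python
import json


def in_place_set(doc_id: str, field: str, value) -> str:
    """JSON body for an atomic 'set' on one field; assumes 'id' is the uniqueKey."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def boosted_query(user_query: str, field: str) -> str:
    """Multiply the relevance score by the docValues field via a function query."""
    return f"{{!boost b={field}}}{user_query}"


if __name__ == "__main__":
    print(in_place_set("doc1", "popularity_dv", 42))
    print(boosted_query("text:solr", "popularity_dv"))
```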
