Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Yonik Seeley
Congrats Jan! Go Solr! -Yonik On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta wrote: > Hi everyone, > > I’d like to inform everyone that the newly formed Apache Solr PMC nominated > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice > President. This decision was approved by t

Re: Help using Noggit for streaming JSON data

2020-09-17 Thread Yonik Seeley
See this method: /** Reads a JSON string into the output, decoding any escaped characters. */ public void getString(CharArr output) throws IOException And then the idea is to create a subclass of CharArr to incrementally handle the string that is written to it. You could overload write method

Re: Solr admin interface freezes on Chrome

2019-10-02 Thread Yonik Seeley
Can someone open a JIRA to track this problem? -Yonik On Wed, Oct 2, 2019 at 7:04 PM Solr User wrote: > > Works fine on Firefox, and I > > haven't made any changes to our Solr instance (v8.1.1) in a while. > > Had a co-worker with a similar issue. He had a pop-blocker enabled in > chrome that wa

Re: Optimizing fq query performance

2019-04-13 Thread Yonik Seeley
More constrained but matching the same set of documents just guarantees that there is more information to evaluate per document matched. For your specific case, you can optimize fq = 'field1:* AND field2:value' to &fq=field1:*&fq=field2:value This will at least cause field1:* to be cached and reuse
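
To make the suggestion above concrete, here is a minimal sketch of the two request shapes. It only builds query strings with the Python standard library and does not talk to Solr; the helper name `build_params` is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

def build_params(q, filters):
    """Return a URL query string with one fq parameter per filter clause."""
    params = [("q", q)]
    params.extend(("fq", clause) for clause in filters)
    return urlencode(params)

# One combined filter: a single fq holding the whole boolean expression.
combined = build_params("*:*", ["field1:* AND field2:value"])

# Split filters: each clause is its own fq, so each can be cached separately.
split = build_params("*:*", ["field1:*", "field2:value"])

print(combined)
print(split)
print(parse_qs(split)["fq"])
```

Both forms match the same documents; the split form simply gives Solr two independent filter clauses to work with.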

Re: Problem with white space or special characters in function queries

2019-03-29 Thread Yonik Seeley
On Thu, Mar 28, 2019 at 6:05 PM Jan Høydahl wrote: > Functions can never contain spaces. Spaces work fine in functions in general. The issue is the "bf" parameter as it uses whitespace to delimit multiple functions IIRC. -Yonik > Try to substitute the term with a variable, i.e. a request pa

Re: Solr 7.X negative filter not working

2018-09-20 Thread Yonik Seeley
I just tried the master branch quickly, and I can't reproduce this. "params":{ "q":"*:*", "debug":"true", "fq":"title_t:(NOT Kings)"}}, [...] "QParser":"LuceneQParser", "filter_queries":["title_t:(NOT Kings)"], "parsed_filter_queries":["-title_t:kings"], Knowing

Re: CACHE -> fieldValueCache usage

2018-09-20 Thread Yonik Seeley
On Wed, Sep 19, 2018 at 9:44 AM Vincenzo D'Amore wrote: > Looking at Solr Admin Panel I've found the CACHE -> fieldValueCache tab > where all the values are 0. > > [...] > > what do you think, is that normal? Yep, that's completely normal. That cache is only used by certain operations on multi-

Re: 7.3 appears to leak

2018-06-28 Thread Yonik Seeley
> * SortedIntDocSet instances and ConcurrentLRUCache$CacheEntry instances are > both leaked on commit; If these are actually filterCache entries being leaked, it stands to reason that a whole searcher is being leaked somewhere. -Yonik

Re: Retrieving json.facet from a search

2018-06-28 Thread Yonik Seeley
There isn't typed support, but you can use the generic support like so: .getResponse().get("facets") -Yonik On Thu, Jun 28, 2018 at 2:31 PM, Webster Homer wrote: > I have a fairly large existing code base for querying Solr. It is > architected where common code calls solr and returns a solrj Qu

Re: Solr 7.3, FunctionScoreQuery no longer displays debug output

2018-05-17 Thread Yonik Seeley
If this used to work, I wonder if it's something to do with changes to boost: https://issues.apache.org/jira/browse/LUCENE-8099 -Yonik On Thu, May 17, 2018 at 5:48 PM, Markus Jelsma wrote: > Hello, > > Sorry to disturb. Is there anyone here able to reproduce and verify this > issue? > > Many t

Re: Error using multiple terms in function query

2018-05-15 Thread Yonik Seeley
Problems like this are usually caused by the whole query not even making it to Solr due to bad HTTP param encoding. For example, if you're using curl with request parameters in the URL, you need to manually encode spaces as either "+" or "%20" -Yonik On Tue, May 15, 2018 at 7:41 PM, Shamik Bando
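
As an illustration of the encoding point, the sketch below shows the two accepted encodings of a space using Python's standard library. The query text, host, and collection name are placeholders, not taken from the thread.

```python
from urllib.parse import quote, quote_plus, urlencode

# Hypothetical query containing spaces and quotes.
raw_query = 'title_t:"solr in action"'

# Spaces encoded as "+" (form style).
plus_encoded = quote_plus(raw_query)

# Spaces encoded as "%20" (percent style).
percent_encoded = quote(raw_query, safe="")

# Placeholder host and collection; urlencode applies the "+" style.
url = "http://localhost:8983/solr/mycollection/select?" + urlencode({"q": raw_query})

print(plus_encoded)
print(percent_encoded)
print(url)
```

With curl, passing the parameter through `--data-urlencode` is another way to avoid encoding by hand.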

Re: Solr Json Facet

2018-05-08 Thread Yonik Seeley
; This is the HTTP response: > > response.content > > ' 2.0//EN">\n\n400 Bad > Request\n\nBad Request\nYour browser sent > a request that this server could not understand. />\n\n\nApache/2.2.15 (Oracle) Server at leydenh Port > 80\n\n' > > > Thank you,

Re: Solr Json Facet

2018-05-08 Thread Yonik Seeley
On Tue, May 8, 2018 at 1:36 PM, Kojo wrote: > If I tag the fq query and I query for a simple word it works fine too. But > if query a multi word with space in the middle it breaks: Most likely the full query is not getting to Solr because of an HTTP protocol error (i.e. the request is not encoded

Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
29 2c 70 6f 73 69 74 char=(EOF),posit > 0120: 69 6f 6e 3d 32 34 20 41 46 54 45 52 3d 27 27 22 ion=24 AFTER=''" > 0130: 2c 0a 20 20 20 20 22 63 6f 64 65 22 3a 34 30 30 ,."code":400 > 0140: 7d 7d 0a }}. > { >

Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
It looks like a curl globbing issue from the curl error message you included: "curl: (3) [globbing] bad range specification in column 39" You can try turning off curl globbing with the -g param. That may not be the only issue though, as the command shown shouldn't have triggered curl globbing. Pe

Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-22 Thread Yonik Seeley
I've reproduced the issue and opened https://issues.apache.org/jira/browse/SOLR-12020 -Yonik On Thu, Feb 22, 2018 at 11:03 AM, Yonik Seeley wrote: > Thanks Antelmo, I'm trying to reproduce this now. > -Yonik > > > On Mon, Feb 19, 2018 at 10:13 AM, Antelmo Aguilar

Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-22 Thread Yonik Seeley
://pastebin.com/rsHvKK63 >> >> https://pastebin.com/8amxacAj >> >> I am not using any custom code or plugins with the Solr instance. >> >> Please let me know if you need anything else and thanks for looking into >> this. >> >> -Antelmo >>

Re: facet.method=uif not working in solr cloud?

2018-02-15 Thread Yonik Seeley
we are doing frequent auto commits, fieldvaluecache will be invalidated > and uif will have to pay the upfront cost again after each commit? Right. It's not good for frequently changing indexes. -Yonik > > > On Wed, Feb 14, 2018 at 11:51 AM, Yonik Seeley wrote: > >> On We

Re: facet.method=uif not working in solr cloud?

2018-02-14 Thread Yonik Seeley
that cost has been paid. -Yonik > On Tue, Feb 13, 2018 at 7:41 AM, Yonik Seeley wrote: > >> Great, thanks for tracking that down! >> It's interesting that a mincount of 0 disables uif processing in the >> first place. IIRC, it's only the hash-based method (as opp

Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-14 Thread Yonik Seeley
Could you provide the full stack trace containing "Invalid Date String" and the full request that causes it? Are you using any custom code/plugins in Solr? -Yonik On Mon, Feb 12, 2018 at 4:55 PM, Antelmo Aguilar wrote: > Hi, > > I was using the following part of a query to get facet buckets so

Re: facet.method=uif not working in solr cloud?

2018-02-13 Thread Yonik Seeley
Great, thanks for tracking that down! It's interesting that a mincount of 0 disables uif processing in the first place. IIRC, it's only the hash-based method (as opposed to array-based) that can't return zero counts. -Yonik On Tue, Feb 13, 2018 at 6:17 AM, Alessandro Benedetti wrote: > *Update

Re: facet.method=uif not working in solr cloud?

2018-02-12 Thread Yonik Seeley
Feels like we should open an issue for this (that facet.method=uif is only respected if you specify another esoteric parameter...) -Yonik On Mon, Feb 12, 2018 at 8:34 PM, Wei wrote: > Adding facet.distrib.mco=true did the trick. Thanks Toke and Alessandro! > > Cheers, > Wei > > On Thu, Feb 8,

Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Yonik Seeley
On Sun, Feb 11, 2018 at 8:47 AM, ~$alpha` wrote: > I have upgraded Solr4.0 Beta to Solr6.6. The Cache results look Awesome but > overall the CPU load on solr6.6 is double the load on solr4.0 and hence I am > not able to roll solr6.6 to 100% of my traffic. > > *Some Key Stats In Performance of Sol6

Re: Solr 7.2.1 - cursorMark and elevateIds

2018-01-25 Thread Yonik Seeley
Yes, please open a JIRA issue. The elevate component modifies the sort parameter, and it looks like that doesn't play well with cursorMark, which needs to serialize/deserialize sort values. We can either fix the issue, or at a minimum provide a better error message if cursorMark is limited to sorti

Re: Json Facet Query Stripping Field Name with Hyphen

2018-01-04 Thread Yonik Seeley
The JSON Facet API uses the function query parser for something like sum(week_-91) so you'll probably have problems with any function that uses these fields as well. As Erick says, you're better off renaming the fields. There is a workaround for wonky field names via the "field" function: sum(fiel

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-12 Thread Yonik Seeley
're slower than ES for some reason, it should be very easy to fix. -Yonik > On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley wrote: > >> OK great, so it's definitely not the main query (which is just a >> single term query in this example!) >> >> > Also I hav

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-12 Thread Yonik Seeley
' stats.field='{!sum=true }metric_78' > stats.field='{!sum=true }metric_79' stats.field='{!sum=true > }metric_80' stats.field='{!sum=true }metric_81' > stats.field='{!sum=true }metric_82' stats.field='{!sum=true > }metric_83'

Re: Solr Aggregation queries are way slower than Elastic Search

2017-12-11 Thread Yonik Seeley
I think the SolrJ below uses the old stats component. Hopefully the JSON Facet API would be faster for this, but it's not completely clear what the main query here looks like, and if it's the source of any bottleneck rather than the aggregations. What does the generated query string actually look l

Re: Skewed IDF in multi lingual index, again

2017-12-05 Thread Yonik Seeley
On Tue, Dec 5, 2017 at 5:15 AM, alessandro.benedetti wrote: > "Lucene/Solr doesn't actually delete documents when you delete them, it > just marks them as deleted. I'm pretty sure that the difference between > docCount and maxDoc is deleted documents. Maybe I don't understand what > I'm talking

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread Yonik Seeley
On Mon, Dec 4, 2017 at 1:35 PM, Shawn Heisey wrote: > I'm pretty sure that the difference between docCount and maxDoc is deleted > documents. docCount (not the best name) here is the number of documents with the field being searched. docFreq (df) is the number of documents actually containing t
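
A toy illustration of the three numbers may help here. This is a sketch over an in-memory list, not Lucene code; the field names and documents are invented, and deletions are left out on purpose to show that docCount can be smaller than maxDoc without any deleted documents.

```python
# Hypothetical multilingual index: not every document has every field.
docs = [
    {"id": 1, "title_en": "solr search"},
    {"id": 2, "title_en": "lucene index"},
    {"id": 3, "title_de": "solr suche"},
    {"id": 4, "title_en": "solr facets"},
]

def field_stats(docs, field, term):
    """Return (maxDoc, docCount, docFreq) for one field and one term."""
    max_doc = len(docs)                                   # all documents
    doc_count = sum(1 for d in docs if field in d)        # documents having the field
    doc_freq = sum(1 for d in docs
                   if term in d.get(field, "").split())   # documents containing the term
    return max_doc, doc_count, doc_freq

max_doc, doc_count, doc_freq = field_stats(docs, "title_en", "solr")
print(max_doc, doc_count, doc_freq)
```

Here one document has no `title_en` value at all, which is enough to make docCount differ from maxDoc.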

Re: JVM GC Issue

2017-12-03 Thread Yonik Seeley
On Sat, Dec 2, 2017 at 8:59 PM, S G wrote: > I am a bit curious on the docValues implementation. > I understand that docValues do not use JVM memory and > they make use of OS cache - that is why they are more performant. > > But to return any response from the docValues, the values in the > docVal

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this. I should be able to look into this shortly if no one else does. -Yonik On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley wrote: > Thanks for the complete info that allowed me to easily reproduce this! > The bug se

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
Thanks for the complete info that allowed me to easily reproduce this! The bug seems to extend beyond hll/unique... I tried min(string_s) and got wonky results as well. -Yonik On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev wrote: > Hello, > > I've encountered 2 issues while trying to apply

Re: Nested facet complete wrong counts

2017-11-11 Thread Yonik Seeley
Also, if you're looking at all constraints, you shouldn't need refine:true. But if you do need it, it was only added in Solr 7.0 (and I see you're using 6.6) -Yonik On Sat, Nov 11, 2017 at 9:48 AM, Yonik Seeley wrote: > On Sat, Nov 11, 2017 at 9:18 AM, Kenny Knecht wrote: >

Re: Nested facet complete wrong counts

2017-11-11 Thread Yonik Seeley
On Sat, Nov 11, 2017 at 9:18 AM, Kenny Knecht wrote: > Hi Yonik, > > I am aware of the estimate on the hll. But we don't use the hll as a > baseline for comparison. We ask the values for one facet (for example > Gender). We store these counts for each bucket. Next we do another request. > This tim

Re: Nested facet complete wrong counts

2017-11-10 Thread Yonik Seeley
I do notice you are using hll (hyper-log-log) which is a distributed cardinality *estimate* : https://en.wikipedia.org/wiki/HyperLogLog -Yonik On Fri, Nov 10, 2017 at 11:32 AM, kenny wrote: > Hi all, > > We are doing some tests in solr 6.6 with json facet api and we get > completely wrong count

Re: Upgrade path from 5.4.1

2017-11-01 Thread Yonik Seeley
On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson wrote: > I _always_ prefer to reindex if possible. Additionally, as of Solr 7 > all the numeric types are deprecated in favor of points-based types > which are faster on all fronts and use less memory. They are a good step forward in general, and fast

Re: Really slow facet performance in 6.6

2017-10-25 Thread Yonik Seeley
On Mon, Oct 23, 2017 at 3:06 PM, John Davis wrote: > Hello, > > We are seeing really slow facet performance with new solr release. This is > on an index of 2M documents. A few things we've tried: What happens when you run this facet request again? The first time a UIF faceting method runs for a f

Re: Jetty maxThreads

2017-10-20 Thread Yonik Seeley
The high number of maxThreads is to avoid distributed deadlock. The fix is multiple thread pools, depending on request type: https://issues.apache.org/jira/browse/SOLR-7344 -Yonik On Wed, Oct 18, 2017 at 4:41 PM, Walter Underwood wrote: > Jetty maxThreads is set to 10,000 which seems way too bi

Re: Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread Yonik Seeley
"discovers" a new constraint which ranks higher). Regular faceting does more overrequest by default, and does refinement by default. So adding refine:true and a deeper overrequest for json facets should perform equivalently. -Yonik Kenny > > On 20-10-17 17:12, Yonik Seeley wrote: >

Re: Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread Yonik Seeley
Facet refinement in Solr guarantees that counts for returned constraints are correct, but does not guarantee that the top N returned isn't missing a constraint. Consider the following shard counts (3 shards) for the following constraints (aka facet values): constraintA: 2 0 0 constraintB: 0 2 0 co
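
The guarantee described above can be simulated in a few lines. This is a simplified sketch with invented shard counts (the constraint names `x`, `y`, `z`, `w` are hypothetical and are not the ones from the message), no overrequest, and a refinement step that simply re-counts every candidate on every shard.

```python
from collections import Counter

def top_n(counts, n):
    """Top n (name, count) pairs, highest count first, ties broken by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def distributed_top_n(shards, n):
    # Phase 1: each shard reports only its local top n constraints.
    candidates = set()
    for shard in shards:
        candidates.update(name for name, _ in top_n(shard, n))
    # Refinement: exact counts are gathered for every candidate.
    refined = {name: sum(shard.get(name, 0) for shard in shards)
               for name in candidates}
    return top_n(refined, n)

def exact_top_n(shards, n):
    total = Counter()
    for shard in shards:
        total.update(shard)
    return top_n(total, n)

# Hypothetical counts: "w" is never a local winner but is the global winner.
shards = [
    {"x": 2, "w": 1},
    {"y": 2, "w": 1},
    {"z": 2, "w": 1},
]

print(distributed_top_n(shards, 1))  # count is exact, but the winner is wrong
print(exact_top_n(shards, 1))
```

Raising the per-shard limit (overrequest) in this model is what lets `w` become a candidate in the first place.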

Re: Trying to fix Too Many Boolean Clauses Exception

2017-10-18 Thread Yonik Seeley
On Wed, Oct 18, 2017 at 12:23 PM, Erick Erickson wrote: > What have you tried? And what is the current setting? > > This usually occurs when you are assembling very large OR clauses, > sometimes for ACL calculations. > > So if you have a query of the form > q=field:(A OR B OR C OR) > or >

Re: Concern on solr commit

2017-10-18 Thread Yonik Seeley
On Wed, Oct 18, 2017 at 5:09 AM, Leo Prince wrote: > Is there any known negative impacts in setting up autoSoftCommit as 1 > second other than RAM usage..? Briefly: Don't use autowarming (but keep caches enabled!) Use docValues for fields you will facet and sort on (this will avoid using FieldCac

Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Yonik Seeley
It pointed to 7.1.0 for me; perhaps a browser cache issue? Anyway, you can go directly as well: http://www.apache.org/dyn/closer.lua/lucene/solr/7.1.0 -Yonik On Tue, Oct 17, 2017 at 11:25 AM, Susheel Kumar wrote: > Thanks, Shalin. > > But the download mirror still has 7.0.1 not 7.1.0. > > ht

Re: Concern on solr commit

2017-10-17 Thread Yonik Seeley
Related: maxWarmingSearchers behavior was fixed (block for another commit to succeed first rather than fail) in Solr 6.4 and later. https://issues.apache.org/jira/browse/SOLR-9712 Also, if any of your "realtime" search requests only involve retrieving certain documents by ID, then you can use "rea

Re: FieldValueCache in solr 6.6

2017-10-06 Thread Yonik Seeley
On Fri, Oct 6, 2017 at 12:45 PM, sile wrote: > Hi Yonik, > > Thanks for your answer :). > > It works. > > Another question: > > What is recommended to be used in solr 6.6 for faceting (docValues or > UnInvertedField), because UnInvertedField performs better for subsequent > requests? > > I assume

Re: FieldValueCache in solr 6.6

2017-10-06 Thread Yonik Seeley
If you're using regular faceting (as opposed to the JSON Facet API), you can try facet.method=uif https://issues.apache.org/jira/browse/SOLR-8466 Background: UIF (UnInvertedField which are the entries in the FieldValueCache) was completely removed from use at some point in the 5.x timeframe. It wa

Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Yonik Seeley
On Fri, Oct 6, 2017 at 6:50 AM, Toke Eskildsen wrote: > Letting the default use maxSizeMB would be better IMO. But I assume > that FastLRUCache is used for a reason, so that would have to be > extended to support that parameter first. FastLRUCache is the default on the filter cache because it was

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote: > On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > > It seems that the memory limit option maxSizeMB was added in Solr 5.2: > https://issues.apache.org/jira/browse/SOLR-7372 > I am not sure if it works with all caches in Solr, but in my world it

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Yonik Seeley
On Thu, Oct 5, 2017 at 10:07 AM, Erick Erickson wrote: > The other thing I'd point out is that if your hit ratio is low, you > might as well disable it entirely. I'd normally recommend against turning it off entirely, except in *very* custom cases. Even if the user doesn't reuse filter queries,

Re: SOLR 6.1 | Continuous hits coming for unwanted URL pattern

2017-09-26 Thread Yonik Seeley
Looks like it's some sort of ping (liveness) query, probably from a load balancer? Actually, it looks like it's a SolrJ client... here's the code that sets up that exact query: https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttpSolrClient.j

Re: When will be solr 7.1 released?

2017-09-26 Thread Yonik Seeley
On Tue, Sep 26, 2017 at 2:02 PM, Nawab Zada Asad Iqbal wrote: > Thanks Yonik and Erick. > > That is helpful. > I am slightly confused about the branch name conventions. I expected 7x to > be named as branch_7_0 branch_7x is the main branch for all 7.x releases. When it's time for 7.1 to be relea

Re: When will be solr 7.1 released?

2017-09-26 Thread Yonik Seeley
One can also use a nightly snapshot build to try out the latest stuff: 7.x: https://builds.apache.org/job/Solr-Artifacts-7.x/lastSuccessfulBuild/artifact/solr/package/ 8.0: https://builds.apache.org/job/Solr-Artifacts-master/lastSuccessfulBuild/artifact/solr/package/ -Yonik On Tue, Sep 26, 201

Re: Consecutive calls to a query give different results

2017-09-07 Thread Yonik Seeley
n't (and can't easily) update statistics when a document is marked as deleted. -Yonik > Erick > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: >> Different replicas of the same shard can have different numbers of >> deleted documents (really just marked

Re: Consecutive calls to a query give different results

2017-09-06 Thread Yonik Seeley
Different replicas of the same shard can have different numbers of deleted documents (really just marked as deleted), and deleted documents still contribute to term statistics (like the number of documents a term appears in). Documents marked for deletion stop contributing to corpus statistics when

Re: NumberFormatException for multvalue, pint

2017-09-06 Thread Yonik Seeley
On Wed, Sep 6, 2017 at 4:09 PM, Steve Pruitt wrote: > Can't get a multi-valued pint field to update. > > The schema defines the field: multiValued="true" required="false" docValues="true" stored="true"/> > > I get the exception on this input: 7780386,7313483 > > Caused by: java.lang.NumberForma

Re: slow solr facet processing

2017-09-05 Thread Yonik Seeley
The number-of-segments noise probably swamps this... but one optimization around deep-facet-paging that didn't get carried forward is https://issues.apache.org/jira/browse/SOLR-2092 -Yonik On Tue, Sep 5, 2017 at 6:49 AM, Toke Eskildsen wrote: > On Mon, 2017-09-04 at 11:03 -0400, Yoni

Re: slow solr facet processing

2017-09-04 Thread Yonik Seeley
On Mon, Sep 4, 2017 at 6:38 AM, Toke Eskildsen wrote: > On Mon, 2017-09-04 at 13:21 +0300, Ere Maijala wrote: >> Thanks for the insight, Yonik. I can confirm that #2 is true. I ran >> >> >> >> and after it completed I was able to retrieve 2000 values in 17ms. > > Very interesting. Is this on spin

Re: slow solr facet processing

2017-09-01 Thread Yonik Seeley
0.2, but a whole lot better. It seems >> that docValues needs to be disabled for facet.method=uif to have effect >> though, which is unfortunate. Otherwise it reports that applied method is >> UIF, but the performance is actually much worse than with FC. I'll do just >> another

Re: slow solr facet processing

2017-09-01 Thread Yonik Seeley
nfortunate. Otherwise it reports that applied method is >> UIF, but the performance is actually much worse than with FC. I'll do just >> another round of testing to verify all this. I can report to SOLR-8096 when >> I have something conclusive. >> >> --Ere >>

Re: slow solr facet processing

2017-08-31 Thread Yonik Seeley
A possible improvement for some multiValued fields might be to use the "uif" facet method (UnInvertedField was the default method for multiValued fields in 4.x) I'm not sure if you would need to reindex without docValues on that field to try it though. Example: to enable on the "union" field, add

Re: Huge Facets and Streaming

2017-08-21 Thread Yonik Seeley
On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev wrote: > Hello! > > I need to count a really wide facet on a 30-shard index with roughly 100M > docs; the facet response is about 100M values and takes 0.5G in a text file. > > So far, I experimented with old facets. It calculates per shard facets > fine, b

Re: QueryParser changes query by itself

2017-08-16 Thread Yonik Seeley
The queryCache shouldn't be involved, this is somehow an issue in parsing (and Solr doesn't currently cache parsing). Perhaps there is something shared in your SynonymQParser instances that isn't quite thread safe? It could also be something in the text analysis in lucene as well (related to the ne

Re: JSON facet SUM precision and accuracy is incorrect

2017-08-08 Thread Yonik Seeley
This is due to function queries currently lacking type information (this problem will occur anywhere function queries are used and is not unique to JSON Facet). Function queries were originally only used in lucene scoring (which only uses float). The inner sum(amount1_d,amount2_d) uses SumFloatFunc
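
The precision loss is easy to reproduce outside Solr. The sketch below emulates single-precision arithmetic with the `struct` module; the two values stand in for `amount1_d` and `amount2_d` and are chosen only to make the effect visible.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Stand-in values for amount1_d and amount2_d (2**24 and 1).
amount1 = 16777216.0
amount2 = 1.0

# Summed as doubles: the exact result is kept.
double_sum = amount1 + amount2

# Summed as floats: 16777217 has no 32-bit float representation.
float_sum = to_float32(to_float32(amount1) + to_float32(amount2))

print(double_sum)
print(float_sum)
```

A 32-bit float carries 24 bits of precision, so any total above 2**24 can lose whole units, well before fractional cents come into play.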

Re: _version_ as LongPointField returns error

2017-06-12 Thread Yonik Seeley
's not yet supported for Point* fields. -Yonik > On Mon, Jun 12, 2017 at 10:13 AM Yonik Seeley wrote: > >> I think the _version_ field should be >> - indexed="false" >> - stored="false" >> - docValues="true" >> >> -Yo

Re: _version_ as LongPointField returns error

2017-06-12 Thread Yonik Seeley
I think the _version_ field should be - indexed="false" - stored="false" - docValues="true" -Yonik On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman wrote: > I changed all my TrieLong Fields to Point fields. _version_ always returns > an error unless i turn on docvalues > > > > > Gettin

Re: JSON facet performance for aggregations

2017-05-24 Thread Yonik Seeley
On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley wrote: > I opened https://issues.apache.org/jira/browse/SOLR-10634 to address > this performance issue. OK, this has been committed. A quick test shows about a 30x speedup when faceting on a string/numeric docvalues field with 100K unique valu

Re: JSON facet performance for aggregations

2017-05-08 Thread Yonik Seeley
ming at that case? > > Please advise. > > Thanks > Mikhail > > -Original Message- > From: Yonik Seeley [mailto:ysee...@gmail.com] > Sent: Sunday, May 07, 2017 6:25 PM > To: solr-user@lucene.apache.org > Subject: Re: JSON facet performance for aggregations >

Re: JSON facet performance for aggregations

2017-05-07 Thread Yonik Seeley
30, 2017 at 8:58 AM, Mikhail Ibraheem wrote: > Hi Yonik, > We are using Solr 6.5 > Both studentId and grades are double: >stored="true" docValues="true" multiValued="false" required="false"/> > > We have 1.5 million records. >

Re: Poll: Master-Slave or SolrCloud?

2017-04-30 Thread Yonik Seeley
On Tue, Apr 25, 2017 at 1:33 PM, Otis Gospodnetić wrote: > I think I saw mentions (maybe on user or dev MLs or JIRA) about > potentially, in the future, there only being SolrCloud mode (and dropping > SolrCloud name in favour of Solr). I personally never saw this actually happening, and not becau

Re: JSON facet performance for aggregations

2017-04-30 Thread Yonik Seeley
It is odd there would be quite such a big performance delta. What version of solr are you using? What is the fieldType of "grades"? -Yonik On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem wrote: > 1- > studentId has docValue = true . it is of type double which is name="double" class="solr.Trie

Re: prefix facet performance

2017-04-24 Thread Yonik Seeley
In SimpleFacets.getFacetTermEnumCounts, we seek to the first term matching the prefix using the index and then for each term after compare the prefix until it no longer matches. -Yonik On Mon, Apr 24, 2017 at 5:04 AM, alessandro.benedetti wrote: > Thanks Yonik and Maria. > It makes sense, if we
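
The seek-then-scan idea can be sketched over a plain sorted list. This is not the Solr implementation, just the same access pattern using `bisect`; the term values are invented.

```python
from bisect import bisect_left

def prefix_terms(sorted_terms, prefix):
    """Seek to the first term >= prefix, then scan while the prefix still matches."""
    matches = []
    i = bisect_left(sorted_terms, prefix)   # the "seek" step
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])     # the "scan" step
        i += 1
    return matches

# Hypothetical term dictionary, kept in sorted order like an index would.
terms = sorted(["A/1", "A/2", "A/2/x", "B/1", "C/9"])

print(prefix_terms(terms, "A/"))
print(prefix_terms(terms, "B/"))
print(prefix_terms(terms, "Z"))
```

The cost is proportional to the number of terms sharing the prefix, not to the size of the whole term dictionary.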

Re: prefix facet performance

2017-04-21 Thread Yonik Seeley
On Fri, Apr 21, 2017 at 4:25 PM, Maria Muslea wrote: > The field is: > > > > and using unique() I found that it has 700K+ unique values. > > The query before (that takes ~10s): > > wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/ > > the query after (that is almost

Re: prefix facet performance

2017-04-18 Thread Yonik Seeley
How many unique values in the index? You could try facet.method=enum -Yonik On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea wrote: > Hi, > > I have ~40K documents in SOLR (not many) and a multivalued facet field that > contains at least 2K values per document. > > The values of the facet field lo

Re: Disable All kind of caching in Solr/Lucene

2017-03-31 Thread Yonik Seeley
On Fri, Mar 31, 2017 at 1:53 PM, Nilesh Kamani wrote: > @Alexandre - Could you please point me to reference doc to remove default > cache settings ? > > @Yonik - The code change is in Solr Indexer to sort the results. OK, so to test indexing performance, there are no caches to worry about (as lon

Re: Disable All kind of caching in Solr/Lucene

2017-03-31 Thread Yonik Seeley
On Fri, Mar 31, 2017 at 9:44 AM, Nilesh Kamani wrote: > I am planning to do load testing for some of my code changes and I need to > disable all kind of caching. Perhaps you should be aiming to either: 1) seek a config + query load that maximizes time spent in your code in order to optimize it 2)

Re: JSON Facet API Virtual Field Support

2017-03-24 Thread Yonik Seeley
On Fri, Mar 24, 2017 at 7:52 PM, Furkan KAMACI wrote: > Hi, > > I test JSON Facet API of Solr. Is it possible to create a virtual field > which is generated by using existing fields at response and supports > elementary arithmetic operations? > > Example: > > Schema fields: > > products, > sold_pr

Re: fq performance

2017-03-17 Thread Yonik Seeley
On Fri, Mar 17, 2017 at 2:17 PM, Shawn Heisey wrote: > On 3/17/2017 8:11 AM, Yonik Seeley wrote: >> For Solr 6.4, we've managed to circumvent this for filter queries and >> other contexts where scoring isn't needed. >> http://yonik.com/solr-6-4/ "More effici

Re: fq performance

2017-03-17 Thread Yonik Seeley
On Fri, Mar 17, 2017 at 9:09 AM, Shawn Heisey wrote: [...] > Lucene has a global configuration called "maxBooleanClauses" which > defaults to 1024. For Solr 6.4, we've managed to circumvent this for filter queries and other contexts where scoring isn't needed. http://yonik.com/solr-6-4/ "More ef

Re: Get handler not working

2017-03-16 Thread Yonik Seeley
at it has distributed the > documents appropriately from our basic testing. > > On Thu, Mar 16, 2017 at 9:42 AM David Hastings > wrote: > > i still would like to see an experiment where you change the field to id > instead of iqdocid, > > On Thu, Mar 16, 2017 at 9:33 AM

Re: Get handler not working

2017-03-16 Thread Yonik Seeley
Something to do with routing perhaps? (the mapping of ids to shards, by default is based on hashes of the id) -Yonik On Thu, Mar 16, 2017 at 9:16 AM, Chris Ulicny wrote: > iqdocid is already set to be the uniqueKey value. > > I tried reindexing a few documents back into the problematic cloud and

Re: Simulating group.facet for JSON facets, high mem usage w/ sorting on aggregation...

2017-02-10 Thread Yonik Seeley
FYI, I just opened https://issues.apache.org/jira/browse/SOLR-10122 for this -Yonik On Fri, Feb 10, 2017 at 4:32 PM, Yonik Seeley wrote: > On Thu, Feb 9, 2017 at 6:58 AM, Bryant, Michael > wrote: >> Hi all, >> >> I'm converting my legacy facets to JSON fac

Re: Simulating group.facet for JSON facets, high mem usage w/ sorting on aggregation...

2017-02-10 Thread Yonik Seeley
On Thu, Feb 9, 2017 at 6:58 AM, Bryant, Michael wrote: > Hi all, > > I'm converting my legacy facets to JSON facets and am seeing much better > performance, especially with high cardinality facet fields. However, the one > issue I can't seem to resolve is excessive memory usage (and OOM errors)

Re: ClassCastException: BasicResultContext cannot be cast to SolrDocumentList

2016-12-20 Thread Yonik Seeley
This is a bug (that code should no longer be expecting a SolrDocumentList) Can you open a JIRA issue? -Yonik On Tue, Dec 20, 2016 at 12:02 PM, Yago Riveiro wrote: > I'm hitting this exception in 6.3.0, any ideas? > > null:java.lang.ClassCastException: > org.apache.solr.response.BasicResultConte

Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Yonik Seeley
Interesting I don't recall a bug like that being fixed. Anyway, glad it works for you now! -Yonik On Thu, Dec 15, 2016 at 11:01 AM, Chantal Ackermann wrote: > Hi Yonik, > > after upgrading to Solr 6.3.0, the nested function works as expected! (Both > with and without docValues.) > > "facets

Re: Nested JSON Facets (Subfacets)

2016-12-14 Thread Yonik Seeley
That should work... what version of Solr are you using? Did you change the type of the popularity field w/o completely reindexing? You can try to verify the number of documents in each bucket that have the popularity field by adding another sub-facet next to cat_pop: num_pop:{query:"popularity:[*

Re: Rollback w/ Atomic Update

2016-12-13 Thread Yonik Seeley
On Tue, Dec 13, 2016 at 10:36 AM, Todd Long wrote: > We've noticed that partial updates are not rolling back with subsequent > commits based on the same document id. Our only success in mitigating this > issue has been to issue an empty commit immediately following the rollback. "rollback" is a l

Re: empty result set for a sort query

2016-12-12 Thread Yonik Seeley
Ah, 2-phase distributed search is the most likely answer (and currently classified as more of a limitation than a bug)... Phase 1 collects the top N ids from each shard (and merges them to find the global top N) Phase 2 retrieves the stored fields for the global top N If any of the ids have been d
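
The two phases can be modeled roughly as below. This is a sketch over in-memory dictionaries, sorted by `creationTime` descending as in the original question; the delete between the phases is an assumption made to show one way a page could come back short, not a statement about what happened in the reported case.

```python
def phase1_top_ids(shards, n):
    """Each shard reports its top n (creationTime, id); merge into a global top n."""
    merged = []
    for shard_no, shard in enumerate(shards):
        local = sorted(((doc["creationTime"], doc_id)
                        for doc_id, doc in shard.items()), reverse=True)[:n]
        merged.extend((ts, doc_id, shard_no) for ts, doc_id in local)
    merged.sort(reverse=True)
    return [(doc_id, shard_no) for _, doc_id, shard_no in merged[:n]]

def phase2_fetch(shards, winners):
    """Retrieve stored fields for the winning ids; ids that are gone are skipped."""
    return [shards[s][doc_id] for doc_id, s in winners if doc_id in shards[s]]

# Hypothetical two-shard index.
shards = [
    {"a": {"id": "a", "creationTime": 100}},
    {"b": {"id": "b", "creationTime": 200}},
]

winners = phase1_top_ids(shards, 1)    # rows=1, newest first
del shards[1]["b"]                     # assumed: the doc vanishes between phases
docs = phase2_fetch(shards, winners)

print(winners)
print(docs)
```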

Re: empty result set for a sort query

2016-12-11 Thread Yonik Seeley
On Sun, Dec 11, 2016 at 11:22 AM, moscovig wrote: > Hi > In solr 6.2.1 as server and solr 6.2.0 for client > It's a 2 shards index, 3 replicas for each shard. > > We are fetching the latest document with sorting over creationTime desc and > rows=1. > > At the same time we are committing sanity tes

Re: "on deck" searcher vs warming searcher

2016-12-09 Thread Yonik Seeley
We've got a patch to prevent the exceptions: https://issues.apache.org/jira/browse/SOLR-9712 -Yonik On Fri, Dec 9, 2016 at 7:45 PM, Joel Bernstein wrote: > The question about allowing more than one on-deck searcher is a good one. > The current behavior with maxWarmingSearcher config is to throw

Re: Solr 6 Performance Suggestions

2016-11-22 Thread Yonik Seeley
It depends highly on what your requests look like, and which ones are slower. If your request mix is heterogeneous, find the types of requests that seem to have the largest slowdown and let us know what they look like. -Yonik On Tue, Nov 22, 2016 at 8:54 AM, Max Bridgewater wrote: > I migrate

Re: How to get "max(date)" from a facet field? (Solr 6.3)

2016-11-21 Thread Yonik Seeley
On Mon, Nov 21, 2016 at 3:42 PM, Michael Joyner wrote: > Help, > > (Solr 6.3) > > Trying to do a "sub-facet" using the new json faceting API, but can't seem > to figure out how to get the "max" date in the subfacet? > > I've tried a couple of different ways: > > == query == > > json.facet={ >

Re: SolrJ optimize method -- not returning immediately when the "wait" options are false

2016-11-08 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-2018 There used to be a waitFlush parameter (wait until the IndexWriter has written all the changes) as well as a waitSearcher parameter (wait until a new searcher has been registered... i.e. whatever changes you made will be guaranteed to be visible). The

Re: Parallelize Cursor approach

2016-11-04 Thread Yonik Seeley
No, you can't get cursor-marks ahead of time. They are the serialized representation of the last sort values encountered (hence not known ahead of time). -Yonik On Fri, Nov 4, 2016 at 8:48 PM, Chetas Joshi wrote: > Hi, > > I am using the cursor approach to fetch results from Solr (5.5.0). Most
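Because each cursor mark is derived from the last sort values of the previous page, the pages have to be fetched one after another. A minimal sketch of that loop follows. The `cursorMark` and `nextCursorMark` names are Solr's; `fetch_all`, `fake_solr` and the sample documents are hypothetical stand-ins so the loop can run without a server. The stand-in uses the last id as its cursor, which is a simplification of what Solr actually serializes.

```python
def fetch_all(fetch_page, sort="id asc", rows=100):
    """Follow cursor marks until Solr hands back the same mark we sent."""
    cursor = "*"  # "*" asks for the first page
    docs = []
    while True:
        params = {"q": "*:*", "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch_page(params)
        docs.extend(response["response"]["docs"])
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged mark means there are no more results
        cursor = next_cursor
    return docs


# Hypothetical stand-in for a Solr request; the cursor is simply the last id seen.
ALL_DOCS = [{"id": f"doc{i}"} for i in range(1, 6)]
calls = []


def fake_solr(params):
    calls.append(params["cursorMark"])
    cursor = params["cursorMark"]
    remaining = [d for d in ALL_DOCS if cursor == "*" or d["id"] > cursor]
    page = remaining[: params["rows"]]
    next_cursor = page[-1]["id"] if page else cursor
    return {"response": {"docs": page}, "nextCursorMark": next_cursor}


result = fetch_all(fake_solr, rows=2)
```

Each request depends on the mark returned by the one before it, which is why the marks cannot be handed out to parallel workers in advance.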

Re: Facets based on sampling

2016-11-04 Thread Yonik Seeley
Sampling has been on my TODO list for the JSON Facet API. How much it would help depends on where the bottlenecks are, but that in conjunction with a hashing approach to collection (assuming field cardinality is high) should definitely help. -Yonik On Fri, Nov 4, 2016 at 3:02 PM, John Davis wro

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread Yonik Seeley
On Fri, Nov 4, 2016 at 2:25 PM, Furkan KAMACI wrote: > I mean, I have to facet by dates and aggregate values inside that facet > range. Is it possible to do that without multiple queries at Solr? This (old) blog shows a percentiles calculation under a range facet: http://yonik.com/percentiles-for
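As a rough sketch of what such a request could look like, the snippet below builds a `json.facet` parameter with a range facet over a date field and aggregations nested under each bucket. The field names (`timestamp_dt`, `price_f`), the facet labels and the helper name are invented for illustration; only the `range` facet type and the `percentile()` and `avg()` aggregation functions come from the JSON Facet API.

```python
import json


def range_with_percentiles(date_field, value_field, start, end, gap, percentiles):
    """Build a range facet whose buckets each carry percentile and average stats."""
    pct_args = ",".join(str(p) for p in percentiles)
    return {
        "by_date": {
            "type": "range",
            "field": date_field,
            "start": start,
            "end": end,
            "gap": gap,
            # Sub-facets are computed once per range bucket, in the same request.
            "facet": {
                "pct": f"percentile({value_field},{pct_args})",
                "avg_value": f"avg({value_field})",
            },
        }
    }


facet = range_with_percentiles(
    "timestamp_dt", "price_f", "NOW/DAY-7DAYS", "NOW/DAY", "+1DAY", [50, 90]
)
# rows=0 because only the facet buckets are of interest here.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
```

The whole aggregation travels in a single request, so no follow-up query per date bucket is needed.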

Re: Merge policy

2016-10-27 Thread Yonik Seeley
On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson wrote: > Thanks for the answer! > Do you know if there is a way to trigger an optimize for only 1 shard and > not the whole collection at once? > Adding a "distrib=false" parameter should work I think. -Yonik
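A small sketch of building such a request URL is shown below, assuming the request is sent straight to the core that hosts the shard in question. The helper name, host and core name are placeholders, and, as the reply itself says, the effect of `distrib=false` here is untested.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, distrib=False):
    """Build an update URL that asks one core to optimize without distributing."""
    query = urlencode({"optimize": "true", "distrib": str(distrib).lower()})
    return f"{base_url.rstrip('/')}/{core}/update?{query}"


# Hypothetical host and core name; substitute the replica core of the target shard.
url = optimize_url("http://localhost:8983/solr", "collection1_shard1_replica1")
```

The URL is aimed at a single core rather than at the collection, so that only the shard behind that core is meant to be affected.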

Re: JSON Facet Syntax Sorting

2016-10-26 Thread Yonik Seeley
On Wed, Oct 26, 2016 at 3:16 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I'm using Solr 6.2.1. > > For the JSON Facet Syntax, are we able to sort on multiple values at one go? > > Like for example, if I want to sort by count, followed by the average price. > is this the correct way to do? Sorting by

Re: Graph Traversal Question

2016-10-26 Thread Yonik Seeley
On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll wrote: > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley wrote: > > In your example below it would be akin to injecting the rating onto those > responses as well, not just in the 'fq'. Gotcha... Yeah, I remember wondering

Re: Does _version_ field in schema need to be indexed and/or stored?

2016-10-25 Thread Yonik Seeley
On Tue, Oct 25, 2016 at 6:41 PM, Brent wrote: > I know that in the sample config sets, the _version_ field is indexed and not > stored, like so: > > > > Is there any reason it needs to be indexed? It may depend on your solr version, but the starting configsets currently only have docvalues: ./s
