Re: How to query against dynamic fields without listing them all?

2019-07-14 Thread David Santamauro
Hi Steven, You can dump all the dynamic fields into a copyField Then you can just set "qf":"CC_COMP_NAME_ALL" On 7/14/19, 10:42 AM, "Steven White" wrote: Hi everyone, In my schema, I have the following field: When I index, I create dynamic

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
I use the same algorithm and for me, initialMaxSegments is always the number of segments currently in the index (seen, e.g., in the SOLR admin UI). finalMaxSegments depends on what kind of updates have happened. If I know that "older" documents are untouched, then I'll usually use -60% or even

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
5. Best, Erick > On Jun 7, 2019, at 7:07 AM, David Santamauro wrote: > > Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
/clarification/ ... expungeDeletes will merge every segment *touched by the current commit* that has a deleted document. On 6/7/19, 10:07 AM, "David Santamauro" wrote: Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an ind

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment

Re: Solr boolean query with phrase match

2019-03-25 Thread David Santamauro
Perhaps the Complex Phrase Query Parser might be what you are looking for. https://lucene.apache.org/solr/guide/7_3/other-parsers.html // On 3/25/19, 1:41 AM, "krishan goyal" wrote: Hi, I want to execute a solr query with boolean clauses using the eDismax Query Parser.

Re: Is it possible to force solr show all facet values for the field with an enum type?

2019-01-06 Thread David Santamauro
Seeing that the field is an enumeration, couldn't you just use a set of facet.query(s)? ?q=*:* =user_s:Bar =true =enumfield:A =enumfield:B =0 // On 1/5/19, 3:01 PM, "Arvydas Silanskas" wrote: Thanks for your reply. No, not exactly what I want. Consider I
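The parameter names in the query above did not survive archiving; only the values remain. A minimal sketch of how such a request could be assembled, assuming the missing names were `fq`, `facet`, `facet.query` and `rows` (an assumption, not something the original confirms). The helper name `enum_facet_params` is hypothetical.

```python
from urllib.parse import urlencode

def enum_facet_params(enum_field, enum_values, filter_query=None):
    """Build a Solr query string with one facet.query per enum value."""
    # rows=0 is assumed here: only the facet counts are of interest.
    params = [("q", "*:*"), ("facet", "true"), ("rows", "0")]
    if filter_query:
        # Assumed to be a filter query; the original parameter name is lost.
        params.append(("fq", filter_query))
    # A list of tuples lets the same key (facet.query) appear several times.
    params.extend(("facet.query", f"{enum_field}:{value}") for value in enum_values)
    return urlencode(params)

query_string = enum_facet_params("enumfield", ["A", "B"], filter_query="user_s:Bar")
```

Each `facet.query` is reported with its own count, including a count of zero, which is what makes this approach list every enum value.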

Re: ComplexPhraseQParser vs phrase slop

2018-10-10 Thread David Santamauro
Anyone have any insight here? On 10/8/18, 3:34 PM, "David Santamauro" wrote: Hi, quick question. Should 1) {!complexphrase inOrder=false}f: ( "cat jump"~2 ) ... and 2) f: ( "cat jump"~2 ) ... yield the same results?

ComplexPhraseQParser vs phrase slop

2018-10-08 Thread David Santamauro
Hi, quick question. Should 1) {!complexphrase inOrder=false}f: ( "cat jump"~2 ) ... and 2) f: ( "cat jump"~2 ) ... yield the same results? I'm trying to diagnose a more complicated discrepancy that I've boiled down to this simple case. I understand #1 creates a SpanQuery and #2 a

Re: how to access solr in solrcloud

2018-09-12 Thread David Santamauro
... or haproxy. On 9/12/18, 10:23 AM, "Vadim Ivanov" wrote: Hi, Steve If you are using solr1:8983 to access solr and solr1 is down IMHO nothing helps you to access dead ip. You should switch to any other live node in the cluster or I'd propose to have nginx as frontend to

Re: Overlapped Gap Facets

2016-11-17 Thread David Santamauro
I had a similar question a while back but it was regarding date differences. Perhaps that might give you some ideas. http://lucene.472066.n3.nabble.com/date-difference-faceting-td4249364.html // On 11/17/2016 09:49 AM, Furkan KAMACI wrote: Is it possible to do such a facet on a date

Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread David Santamauro
I believe your answer is in the subject => facet.range https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-RangeFaceting // On 11/04/2016 02:25 PM, Furkan KAMACI wrote: I have documents like that id:5 timestamp:NOW //pseudo date representation count:13 id:4 timestamp:NOW

Re: how to remove duplicate from search result

2016-09-27 Thread David Santamauro
Have a look at https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 09/27/2016 11:03 AM, googoo wrote: hi, We want to provide remove duplicate from search result function. like we have below documents. id(uniqueKey) guid doc1 G1 doc2 G2

Re: Removing SOLR fields from schema

2016-09-22 Thread David Santamauro
On 09/22/2016 08:55 AM, Shawn Heisey wrote: On 9/21/2016 11:46 PM, Selvam wrote: We use SOLR 5.x in cloud mode and have huge set of fields. We now want to remove some 50 fields from Index/schema itself so that indexing & querying will be faster. Is there a way to do that without losing

Re: script to get core num docs

2016-09-19 Thread David Santamauro
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API wget -O- -q \ '/admin/cores?action=STATUS&core=coreName&wt=json&indent=true' \ | grep numDocs // /admin/cores?action=STATUS&core=alexandria_shard2_replica1&wt=json&indent=1'|grep numDocs|cut -f2 -d':'| On 09/19/2016 11:22 AM, KRIS MUSSHORN wrote:
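The same lookup can be sketched in Python instead of grep/cut, assuming the standard CoreAdmin parameters `core` and `wt`, and assuming the STATUS response nests `numDocs` under `status -> <core> -> index`. The base URL and the sample body below are illustrative, not taken from the thread.

```python
import json
from urllib.parse import urlencode

def status_url(base_url, core):
    """URL for a CoreAdmin STATUS request that returns JSON."""
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url}/admin/cores?{query}"

def num_docs(status_json, core):
    """Extract numDocs for one core from a STATUS response body."""
    return json.loads(status_json)["status"][core]["index"]["numDocs"]

# Illustrative response body, trimmed to the fields used here (not real output).
sample = '{"status": {"coreName": {"index": {"numDocs": 42, "maxDoc": 50}}}}'
count = num_docs(sample, "coreName")
```

Parsing the JSON avoids depending on how the response happens to be indented, which the grep approach relies on.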

Re: analyzer for _text_ field

2016-07-15 Thread David Santamauro
The opening and closing single quotes don't match -data-binary '{ ... }’ it should be: -data-binary '{ ... }' On 07/15/2016 02:59 PM, Steve Rowe wrote: Waldyr, maybe it got mangled by my email client or yours? Here’s the same command:

Re: json facet - date range & interval

2016-06-28 Thread David Santamauro
Have you tried %-escaping? json.facet = { daterange : { type : range, field : datefield, start : "NOW/DAY%2D10DAYS", end : "NOW/DAY", gap : "%2B1DAY" } } On 06/28/2016 01:19 PM, Jay Potharaju wrote:
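The reason escaping matters: a raw `+` in a URL query string is decoded as a space, so a gap of `+1DAY` has to travel as `%2B1DAY`. A hyphen normally survives unescaped, although escaping it as `%2D` does no harm. A minimal sketch that lets the standard library do the escaping; the field name `datefield` comes from the message, everything else is illustrative.

```python
import json
from urllib.parse import quote

facet = {
    "daterange": {
        "type": "range",
        "field": "datefield",
        "start": "NOW/DAY-10DAYS",
        "end": "NOW/DAY",
        "gap": "+1DAY",
    }
}

# quote() with safe="" percent-encodes every reserved character, including "+".
encoded = "json.facet=" + quote(json.dumps(facet), safe="")
```

Building the facet as a dictionary and serializing it also guarantees the JSON itself is well-formed before it is escaped.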

Re: Deleted documents and expungeDeletes

2016-04-01 Thread David Santamauro
The docs on reclaimDeletesWeight say: "Controls how aggressively merges that reclaim more deletions are favored. Higher values favor selecting merges that reclaim deletions." I can't imagine you would notice anything after only a few commits. I have many shards that size or larger and what

Re: Deleted documents and expungeDeletes

2016-03-30 Thread David Santamauro
On 03/30/2016 08:23 AM, Jostein Elvaker Haande wrote: On 30 March 2016 at 12:25, Markus Jelsma wrote: Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and frequent updates, it is not uncommon to see a ratio of 25%. If you want deletes to

Re: docValues error

2016-02-29 Thread David Santamauro
thanks Shawn, that seems to be the error exactly. On 02/29/2016 09:22 AM, Shawn Heisey wrote: On 2/28/2016 3:31 PM, David Santamauro wrote: I'm porting a 4.8 schema to 5.3 and I came across this new error when I tried to group.field=f1: unexpected docvalues type SORTED_SET for field 'f1

Re: docValues error

2016-02-29 Thread David Santamauro
On 02/29/2016 07:59 AM, Tom Evans wrote: On Mon, Feb 29, 2016 at 11:43 AM, David Santamauro <david.santama...@gmail.com> wrote: You will have noticed below, the field definition does not contain multiValues=true What version of the schema are you using? In pre 1.1 schemas, multiValued

Re: docValues error

2016-02-29 Thread David Santamauro
On 02/29/2016 06:05 AM, Mikhail Khludnev wrote: On Mon, Feb 29, 2016 at 12:43 PM, David Santamauro < david.santama...@gmail.com> wrote: unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). Use UninvertingReader or index with docvalues. DocValues is primary citiz

Re: docValues error

2016-02-29 Thread David Santamauro
) at org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:233) at org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:160) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398) etc ... On 02/28/2016 05:31 PM, David Santamauro wrote: I'm

docValues error

2016-02-28 Thread David Santamauro
I'm porting a 4.8 schema to 5.3 and I came across this new error when I tried to group.field=f1: unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). Use UninvertingReader or index with docvalues. f1 is defined as positionIncrementGap="100">

Re: date difference faceting

2016-01-08 Thread David Santamauro
hardware. This assumes that the stat you're interested in is predictable of course... Best, Erick On Fri, Jan 8, 2016 at 2:23 AM, David Santamauro <david.santama...@gmail.com> wrote: Hi, I have two date fields, d_a and d_b, both of type solr.TrieDateField, that represent different events asso

date difference faceting

2016-01-08 Thread David Santamauro
Hi, I have two date fields, d_a and d_b, both of type solr.TrieDateField, that represent different events associated with a particular document. The interval between these dates is relevant for corner-case statistics. The interval is calculated as the difference: sub(d_b,d_a) and I've been

Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread David Santamauro
On 12/23/2015 01:42 AM, William Bell wrote: I agree that when using timeAllowed in the header info there should be an entry that indicates timeAllowed triggered. If I'm not mistaken, there is => partialResults:true "responseHeader":{ "partialResults":true } // This is the only reason
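A client-side check for that flag might look like the sketch below. It assumes the response has already been parsed from JSON into a dictionary; the function name `is_partial` is hypothetical.

```python
def is_partial(response):
    """Return True if the response header carries partialResults=true."""
    header = response.get("responseHeader", {})
    # The key is absent on complete responses, so default to False.
    return bool(header.get("partialResults", False))

complete = {"responseHeader": {"status": 0, "QTime": 12}}
partial = {"responseHeader": {"status": 0, "partialResults": True}}
```

Treating a missing key as "not partial" matches the behavior described above, where the entry only shows up when the limit was hit.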

collection mbeans: requests

2015-08-04 Thread David Santamauro
I have a question about how the stat 'requests' is calculated. I would really appreciate it if anyone could shed some light on the figures below. Assumptions: version: 5.2.0 layout: 8 node solrcloud, no replicas (node71-node78) collection: col1 handler: /search stats request:

Re: collection mbeans: requests

2015-08-04 Thread David Santamauro
I have your suggested shards.qt set up in another collection for another reason but I'll do that redirect here as well, thanks for the confirmation. On 08/04/2015 10:45 AM, Shawn Heisey wrote: On 8/4/2015 5:19 AM, David Santamauro wrote: I have a question about how the stat 'requests

Re: Frequent deletions

2015-01-11 Thread David Santamauro
[ disclaimer: this worked for me, ymmv ... ] I just battled this. Turns out incrementally optimizing using the maxSegments attribute was the most efficient solution for me. In particular when you are actually running out of disk space. #!/bin/bash # n-segments I started with high=400 #
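The script in the message is cut off, so the following is only a sketch of the idea of stepping `maxSegments` down gradually rather than optimizing to the final count in one pass. The step size, the final target, the base URL and the collection name are all assumptions; only the starting value of 400 appears in the original.

```python
def segment_targets(current, final, step):
    """Yield decreasing maxSegments targets, ending exactly at final."""
    if step <= 0:
        raise ValueError("step must be positive")
    if current <= final:
        return  # nothing to merge
    target = current - step
    while target > final:
        yield target
        target -= step
    yield final

def optimize_url(base_url, collection, max_segments):
    """Update-handler URL requesting an optimize down to max_segments."""
    return f"{base_url}/{collection}/update?optimize=true&maxSegments={max_segments}"

# Hypothetical run: 400 segments reduced to 100 in steps of 100.
urls = [optimize_url("http://localhost:8983/solr", "col1", n)
        for n in segment_targets(current=400, final=100, step=100)]
```

Each request would be sent and allowed to finish before the next one. The presumed benefit is that every pass rewrites only part of the index, so less temporary disk space is needed at any one time.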

Re: A bad idea to store core data directory over NAS?

2014-11-04 Thread David Santamauro
Interestingly enough, one of our installations has a 16-node cluster using 4 NAS devices (xen as virtualization backbone). The data drive for the individual node that holds the index is a stripe of 2x 500GB disks. Each disk of the stripe is on a different NAS device (scattered pattern). With

moving to new core.properties setup

2014-06-11 Thread David Santamauro
I have configured many tomcat+solrCloud setups but I'm trying now to research the new solr.properties configuration. I have a functioning zookeeper to which I manually loaded a configuration using: zkcli.sh -cmd upconfig \ -zkhost xx.xx.xx.xx:2181 \ -d /test/conf \ -n test My

Re: Stuck on SEVERE: Error filterStart

2014-04-16 Thread David Santamauro
You need to copy solr/example/lib/ext/*.jar into your tomcat lib directory (/usr/share/tomcat/lib) Also make sure a /usr/share/tomcat/conf/log4j.properties is there as well. ... then restart. HTH David On 4/16/2014 11:47 AM, Arthur Pemberton wrote: I am trying Solr for the first time,

Re: Strange relevance scoring

2014-04-08 Thread David Santamauro
Is there any general setting that removes this punishment or must omitNorms=false be part of every field definition? On 4/8/2014 7:04 AM, Ahmet Arslan wrote: Hi, length normal is computed for every document at index time. I think it is 1/sqrt(number of terms). Please see section 6.
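To make the quoted formula concrete, here is a small worked example of the length norm as stated in the reply ("I think it is 1/sqrt(number of terms)"). It is a simplification: it ignores any index-time boost and the lossy encoding applied when norms are stored. For reference, norms are switched off per field with `omitNorms="true"`.

```python
import math

def length_norm(num_terms):
    """Length normalization as quoted: 1 / sqrt(number of terms in the field)."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)

# A 4-term field versus a 100-term field containing the same match.
short_field = length_norm(4)
long_field = length_norm(100)
```

With these numbers the short field gets a factor of 0.5 and the long field roughly 0.1, which is the "punishment" for longer fields that the question refers to.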

Re: Facetting by field then query

2014-03-27 Thread David Santamauro
For pivot facets in SolrCloud, see https://issues.apache.org/jira/browse/SOLR-2894 Resolution: Unresolved Fix Version/s 4.8 I am waiting patiently ... On 03/27/2014 05:04 AM, Alvaro Cabrerizo wrote: I don't think you can do it, as pivot

Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-02-28 Thread David Santamauro
Have you tried to just use a copyField? For example, I had a similar use case where I needed to have a particular field (f1) tokenized but also needed to facet on the complete contents. For that, I created a copyField <copyField source="f1" dest="f2"/> f1 used tokenizers and filters but f2 was

boost group doclist members

2014-02-11 Thread David Santamauro
Without falling into the x/y problem area, I'll explain what I want to do: I would like to group my result set by a field, f1 and within each group, I'd like to boost the score of the most appropriate member of the group so it appears first in the doc list. The most appropriate member is

Re: UTF-8 encoding problems while replicating an index using SolrCloud

2014-02-05 Thread David Santamauro
I had that same error. I cleared it up by commenting out all the /update/xxx handlers and changing /update class to solr.UpdateRequestHandler Hope that helps David On 02/05/2014 01:37 PM, Ugo Matrangolo wrote: Hi, we are having problems with an installation of SolrCloud where a leader

Re: shard1 gone missing ... (upgrade to 4.6.1)

2014-02-03 Thread David Santamauro
Miller wrote: On Jan 31, 2014, at 11:15 AM, David Santamauro david.santama...@gmail.com wrote: On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since

Re: need help in understating solr cloud stats data

2014-02-03 Thread David Santamauro
Zabbix 2.2 has a jmx client built in as well as a few JVM templates. I wrote my own templates for my solr instance and monitoring and graphing is wonderful. David On 02/03/2014 12:55 PM, Joel Cohen wrote: I had to come up with some Solr stats monitoring for my Zabbix instance. I found

shard1 gone missing ...

2014-01-31 Thread David Santamauro
Hi, I have a strange situation. I created a collection with 4 nodes (separate servers, numShards=4), I then proceeded to index data ... all has been seemingly well until this morning when I had to reboot one of the nodes. After reboot, the node I rebooted went into recovery mode! This is

Re: shard1 gone missing ...

2014-01-31 Thread David Santamauro
On 01/31/2014 10:35 AM, Mark Miller wrote: On Jan 31, 2014, at 10:31 AM, Mark Miller markrmil...@gmail.com wrote: Seems unlikely by the way. Sounds like what probably happened is that for some reason it thought when you restarted the shard that you were creating it with numShards=2

Re: shard1 gone missing ...

2014-01-31 Thread David Santamauro
On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. You can follow the progress in the CHANGES file we update for each release. Can I do a

Re: Can I store only the index in Solr and not the actual data

2014-01-13 Thread David Santamauro
On 01/13/2014 06:16 AM, Bijoy Deb wrote: Hi, I have my data in HDFS, which I need to index using Solr. In that case, does Solr always store both the data (the fields that need to be retrieved) as well as the index, or can it be configured to store only the index that points to the original

Re: Perl Client for SolrCloud

2014-01-08 Thread David Santamauro
On 01/07/2014 04:41 PM, Saumitra Srivastav wrote: Is there any perl client for SolrCloud. There are some Solr clients in perl but they are for single node Solr. I couldn't find anyone which can connect to SolrCloud similar to SolrJ's CloudSolrServer. Since I have a load balancer in front of 8

combining cores into a collection

2014-01-02 Thread David Santamauro
Hi, I have a few cores on the same machine that share the schema.xml and solrconfig.xml from an earlier setup. Basically from the older distribution method of using shards=localhost:1234/core1,localhost:1234/core2[,etc] for searching. They are unique sets of documents, i.e., no overlap of

Re: combining cores into a collection

2014-01-02 Thread David Santamauro
On 01/02/2014 08:29 AM, michael.boom wrote: Hi David, They are loaded with a lot of data so avoiding a reload is of the utmost importance. Well, reloading a core won't cause any data loss. Is it 100% availability during the process is what you need? Not really ... uptime is irrelevant because

Re: combining cores into a collection

2014-01-02 Thread David Santamauro
On 01/02/2014 12:44 PM, Chris Hostetter wrote: : Not really ... uptime is irrelevant because they aren't in production. I just : don't want to spend the time reloading 1TB of documents. terminology confusion: you mean you don't want to *reindex* all of the documents ... in solr reloading a

Re: adding a node to SolrCloud

2013-12-26 Thread David Santamauro
On 12/23/2013 05:43 PM, Greg Preston wrote: I believe you can just define multiple cores: <core default="true" instanceDir="shard1/" name="collectionName_shard1" shard="shard1"/> <core default="true" instanceDir="shard2/" name="collectionName_shard2" shard="shard2"/> ... (this is the old style solr.xml. I don't

Re: adding a node to SolrCloud

2013-12-26 Thread David Santamauro
On 12/26/2013 02:29 PM, Shawn Heisey wrote: On 12/24/2013 8:35 AM, David Santamauro wrote: You may have one or more of the SolrCloud 'bootstrap' options on the startup commandline. The bootstrap options are intended to be used once, in order to bootstrap from a non-SolrCloud setup

Re: adding a node to SolrCloud

2013-12-24 Thread David Santamauro
On 12/23/2013 08:42 PM, Shawn Heisey wrote: On 12/23/2013 12:23 PM, David Santamauro wrote: I managed to create 8 new cores and the Solr Admin cloud page showed them wonderfully as active replicas. The only issue I have is what goes into solr.xml (I'm using tomcat)? Putting core name

Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro
On 12/22/2013 09:48 PM, Shawn Heisey wrote: On 12/22/2013 2:10 PM, David Santamauro wrote: My goal is to have a redundant copy of all 8 currently running, but non-redundant shards. This setup (8 nodes with no replicas) was a test and it has proven quite functional from a performance perspective

Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro
machine from the distribution (by removing zk attributes) and restarted .. all is well again. Any idea what could have gone wrong on tomcat restart? thanks. On 12/22/2013 09:48 PM, Shawn Heisey wrote: On 12/22/2013 2:10 PM, David Santamauro wrote: My goal is to have a redundant copy

Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro
and throughput has increased 10-fold. Larger boolean queries can still take 2-3s but we can live with that. At any rate, I still can't figure out what my solr.xml is supposed to look like on the node with all 8 redundant shards. David On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro david.santama

adding a node to SolrCloud

2013-12-22 Thread David Santamauro
Hi, I have an 8-node setup currently with 1 shard per node (no redundancy). These 8 nodes are smaller machines not capable of supporting the entire collection. I have another machine resource that can act as another node and this last node is capable of holding the entire collection. I'd

Re: adding a node to SolrCloud

2013-12-22 Thread David Santamauro
any hint? On 12/22/2013 06:48 AM, David Santamauro wrote: Hi, I have an 8-node setup currently with 1 shard per node (no redundancy). These 8 nodes are smaller machines not capable of supporting the entire collection. I have another machine resource that can act as another node and this last

Re: adding a node to SolrCloud

2013-12-22 Thread David Santamauro
they will be replicas of each shard and you will accomplish what you want. However if you can give more detail about your hardware infrastructure and needs I can offer you a design. Thanks; Furkan KAMACI On Sunday, 22 December 2013, David Santamauro david.santama...@gmail.com wrote

Re: AND query on multivalue text

2008-11-24 Thread David Santamauro
On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote: On Nov 24, 2008, at 8:37 AM, David Santamauro wrote: i need to search something as myText:billion AND guarantee i need to be extracted only the record where the words exists in the same value (in this case only the first record) because