Solr 6.5 autosuggest suggests misspelt words and unwanted words

2018-06-19 Thread Sri Sirisha Vallabhaneni
Hi, my data contains uncurated data, which consists of *cuss words and misspelt words* like *nd* instead of *need*. We are using an auto-suggest/auto-complete that relies heavily on indexed data to recommend suggestions as the user types their query. We are using a list of stop words consistin

How to split index more than 2GB in size

2018-06-19 Thread Sushant Vengurlekar
How do I split indexes that are more than 2GB in size? I get this error when I try to use SPLITSHARD on a collection of more than 2GB: 2018-06-20 02:25:49.810 ERROR (qtp1025799482-19) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: SPLITSHARD failed to invoke SPLIT core

Re: Solrcloud doesn't like relative path

2018-06-19 Thread Sushant Vengurlekar
Hi Eric, based on your suggestion I moved the helpers under configsets/conf, so my new folder structure looks like this: -configsets - conf helpers synonyms_vendors.txt - collection1 -conf schema.xml solrconfig.xm

Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
I might have been wrong there. An explicit check of the number of results returned against the rows requested would let you avoid the last request, which would otherwise come back with 0 results. Solr doesn't do that check automatically. Anshum > On Jun 19, 2018, at 2:39 PM, Anshum Gupta
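
Anshum's point can be sketched as a client-side paging loop. This is a simulation under stated assumptions, not SolrJ code. `fetch` is a hypothetical stand-in for a request that sends `cursorMark` and `rows`, and the "cursor mark" here is just an offset string. The loop stops on the documented condition, where the returned `nextCursorMark` equals the one sent. It can also stop on a short page, which saves one request when the last page is partial.

```java
import java.util.ArrayList;
import java.util.List;

// Simulated cursorMark paging over an in-memory "index" of totalDocs documents.
// fetch() is a hypothetical stand-in for a Solr request; the mark is an offset
// string ("*" for the first page), not a real Solr cursor value.
public class CursorLoop {
    static final class Page {
        final List<Integer> docs;
        final String nextCursorMark;

        Page(List<Integer> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    static Page fetch(int totalDocs, String cursorMark, int rows) {
        int start = cursorMark.equals("*") ? 0 : Integer.parseInt(cursorMark);
        int end = Math.min(start + rows, totalDocs);
        List<Integer> docs = new ArrayList<>();
        for (int i = start; i < end; i++) {
            docs.add(i);
        }
        // Mirror Solr: hand back the same mark when nothing is left to read.
        String next = (end == start) ? cursorMark : String.valueOf(end);
        return new Page(docs, next);
    }

    // Returns {requestsMade, docsSeen}.
    static int[] run(int totalDocs, int rows, boolean stopOnShortPage) {
        String cursorMark = "*";
        int requests = 0;
        int seen = 0;
        while (true) {
            Page page = fetch(totalDocs, cursorMark, rows);
            requests++;
            seen += page.docs.size();
            // Documented termination: the mark did not change.
            if (page.nextCursorMark.equals(cursorMark)) {
                break;
            }
            // Optional client-side check: fewer docs than rows means we are done.
            if (stopOnShortPage && page.docs.size() < rows) {
                break;
            }
            cursorMark = page.nextCursorMark;
        }
        return new int[] {requests, seen};
    }

    public static void main(String[] args) {
        int[] withCheck = run(25, 10, true);
        int[] withoutCheck = run(25, 10, false);
        System.out.println("with check: " + withCheck[0] + " requests, " + withCheck[1] + " docs");
        System.out.println("without check: " + withoutCheck[0] + " requests, " + withoutCheck[1] + " docs");
    }
}
```

Running `main` prints `with check: 3 requests, 25 docs` and `without check: 4 requests, 25 docs`. When the total is an exact multiple of `rows`, both variants need the final empty request. If documents are added while you page, that extra request is also what could pick up new matches that sort after the cursor.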

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Shawn Heisey
On 6/19/2018 11:50 AM, Sushant Vengurlekar wrote: > I created a solr cloud collection with 2 shards and a replication factor of > 2. How can I load data into this collection which I have currently stored > in a core on a standalone solr. I used the conf from this core on > standalone solr to create

Re: Extracting top level URL when indexing document

2018-06-19 Thread Gus Heck
I don't understand the inclusion of 'n' in the character classes in this pattern... it's pretty clear that the broken examples in the OP were where the letter n occurred in the domain name. I expect a similar problem for user parts that contain n... ^https?://(?:[^@/n]+@)?(?:www.)?([^:/n]+) On Tu
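
Gus's diagnosis suggests that the `n` in the pattern was originally `\n` and lost its backslash in transit. The unescaped `.` in `www.` looks like the same loss. Assuming that, here is a minimal Java sketch of the intended pattern with both backslashes restored. The class name and sample URLs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopLevelHost {
    // Assumption: "n" in the quoted pattern was "\n" (newline) with its backslash
    // stripped, and "www." was meant to match a literal dot.
    private static final Pattern HOST =
        Pattern.compile("^https?://(?:[^@/\\n]+@)?(?:www\\.)?([^:/\\n]+)");

    // Returns the host part of an http(s) URL, or null if the URL does not match.
    static String host(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(host("https://www.example.com/a/b"));
        System.out.println(host("http://admin@intranet.example.net:8080/x"));
        System.out.println(host("http://news.example.org"));
    }
}
```

Hosts containing the letter `n` now come through intact: `intranet.example.net` and `news.example.org` are both returned whole.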

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Erick Erickson
Personally I'd start with a 1-shard, 1-replica collection (i.e. leader-only). From there, split the shard. Once all that has been done satisfactorily, just use the Collections API ADDREPLICA command to build out your collection to whatever degree of redundancy you need. Best, Erick On Tue, Jun
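
Erick's sequence has four steps: create a 1-shard, leader-only collection, copy the standalone index in, run SPLITSHARD, then run ADDREPLICA. It can be sketched as the Collections API calls it implies. This sketch only builds the URLs and makes these assumptions:

- The base URL, collection, configset and node names are placeholders.
- Step 2 (copying the index files) is manual.
- `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollectionsApiUrls {
    // Placeholder base URL; point it at any node in the cluster.
    static final String BASE = "http://localhost:8983/solr/admin/collections";

    static String url(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + "?" + query;
    }

    // Step 1: a 1-shard, leader-only collection using an uploaded configset.
    static String create(String collection, String configName) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", collection);
        p.put("numShards", "1");
        p.put("replicationFactor", "1");
        p.put("collection.configName", configName);
        return url(p);
    }

    // Step 3 (after copying the standalone index into the new core): split it.
    static String splitShard(String collection, String shard) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "SPLITSHARD");
        p.put("collection", collection);
        p.put("shard", shard);
        return url(p);
    }

    // Step 4: add a replica of one sub-shard on another node.
    static String addReplica(String collection, String shard, String node) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "ADDREPLICA");
        p.put("collection", collection);
        p.put("shard", shard);
        p.put("node", node);
        return url(p);
    }

    public static void main(String[] args) {
        System.out.println(create("collection1", "collection1_conf"));
        System.out.println(splitShard("collection1", "shard1"));
        System.out.println(addReplica("collection1", "shard1_0", "host2:8983_solr"));
    }
}
```

SPLITSHARD names the sub-shards `shard1_0` and `shard1_1` and leaves the parent shard inactive. Each sub-shard needs its own ADDREPLICA call to reach the redundancy you want.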

Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
Hi David, the cursorMark would be the same if you get back fewer than the maximum number of records requested, so you should exit, as per the documentation. I think the documentation says just what you are suggesting, but if you think it could be improved, feel free to put up a patch. Anshum > On Ju

Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
That explains it :) I assume you made those changes on disk but did not upload the updated configset to ZooKeeper. SolrCloud instances use the configset from ZooKeeper, so all changed files have to be uploaded there. You can re-upload the configset using the zkcli.sh script that comes with

Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
I reloaded the collection with the command: http://localhost:8983/solr/admin/collections?action=RELOAD&name=documentos_ce But I still have the same problem... On Tue, Jun 19, 2018 at 4:48 PM Monique Monteiro wrote: > Hi Anshum, > > I'm using SolrCloud, but both instances are on the same Solr installat

Re: Is anybody using UIMA with Solr?

2018-06-19 Thread Nicolas Paris
Sorry, I thought I was on the UIMA mailing list. That said, my position is the same: let UIMA folks load data into Solr using the most optimized way (what would be the best way? Loading JSON?). 2018-06-19 22:48 GMT+02:00 Nicolas Paris : > Hi > > Not realy a direct answer - Never used it, h

Re: Is anybody using UIMA with Solr?

2018-06-19 Thread Nicolas Paris
Hi, not really a direct answer (I have never used it), but this feature was attractive to me when I first looked at UIMA. Right now, I would say UIMA connectors in general are by design a pain to maintain. Sources and targets often have optimised ways to bulk export/import data. For example, usi

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
I see. By definition of splitting, the new shards will have the same number of replicas as the original shard. You could use replicationFactor>=2 to ensure that both of your Solr nodes are used. You could also use the maxShardsPerNode parameter alone or in conjunction with the replicationFa

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
Thank you Aroop. After I import the data into the collection from the standalone Solr core, I want to split it into 2 shards across the 2 nodes that I have. So I will have to set replicationFactor=2 and numShards=2? On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly wrote: > Hi Sushant > > replication

Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
Hi Anshum, I'm using SolrCloud, but both instances are on the same Solr installation (it's just for test purposes), so I suppose they share the configuration in solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml. So should I recreate the collection? Thanks, Monique On Tue, Jun 19, 2018

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Aroop Ganguly
Hi Sushant, replicationFactor defaults to 1 and is not mandatory. numShards is mandatory; you'd set it to 1. Aroop > On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar > wrote: > > Thank you Eric. > > In the create collection command I need to set the replication factor > though corre

Re: MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Anshum Gupta
Hi Monique, is this standalone Solr or SolrCloud? If it is SolrCloud, you'd have to make sure that you uploaded the right config, and the collection should also be reloaded if you enabled the handler after creating the collection. Also, did you check the MLT Query parser that does the same thing but doesn’

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
Thank you Eric. I still need to set the replication factor in the create collection command, though, correct? On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson wrote: > Probably the easiest way would be to recreate your collection with 1 > shard. Then copy the index from your standalone setup. > > After

Re: sharding and placement of replicas

2018-06-19 Thread Shawn Heisey
On 6/15/2018 11:08 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote: > If I start with a collection X on two nodes with one shard and two replicas > (for redundancy, in case a node goes down): a node on host1 has > X_shard1_replica1 and a node on host2 has X_shard1_replica2: when I try > SPLITSHARD, I

Re: Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Erick Erickson
Probably the easiest way would be to recreate your collection with 1 shard. Then copy the index from your standalone setup. After verifying your setup, use the Collections SPLITSHARD command. Best, Erick On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar wrote: > I created a solr cloud colle

Re: Solrcloud doesn't like relative path

2018-06-19 Thread Erick Erickson
Configsets are presumed to contain any auxiliary files under them, not a relative path _on Zookeeper_. So try putting your synonyms_vendors.txt in configsets/conf/helpers/synonyms_vendors.txt, then reference it as helpers/synonyms_vendors.txt. Best, Erick On Tue, Jun 19, 2018 at 10:28 AM, Sushan

MoreLikeThis in Solr 7.3.1

2018-06-19 Thread Monique Monteiro
Hi all, I'm trying to access /mlt in Solr, but Solr returns an HTTP 404 error. I've already configured the following in /solr-7.3.1/server/solr/configsets/_default/conf/solrconfig.xml: [XML configuration stripped by the archive; surviving values: _text_, AND, list] But none of

RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Does anyone have any additional possible causes for this issue? I checked the buffer status using "/cdcr?action=STATUS" and it says buffer disabled at both target and source. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, June 19, 2018 11:55 AM T

Import data from standalone solr into a solrcloud collection

2018-06-19 Thread Sushant Vengurlekar
I created a SolrCloud collection with 2 shards and a replication factor of 2. How can I load data that is currently stored in a core on a standalone Solr into this collection? I used the conf from this core on standalone Solr to create the collection on SolrCloud. Thank you

Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Wei
Thanks Mikhail and Alessandro. On Tue, Jun 19, 2018 at 2:37 AM, Mikhail Khludnev wrote: > you need to index num vals > apache/solr/update/processor/CountFieldValuesUpdateProcessorFactory.html> > in the separate field, and then *:* -(V:(A AN

Re: Limited Search Results During Full Reindexing - Fine Once Completed

2018-06-19 Thread Erick Erickson
I'd set your soft commit interval to as long as you can stand. Every soft commit opens a new searcher and does significant work, including throwing away your queryResultCache and filterCache. The time here should be as long as you can afford to not be able to search updates. Don't go totally overb

Solrcloud doesn't like relative path

2018-06-19 Thread Sushant Vengurlekar
I have this line in my schema.xml synonyms="../../helpers/synonyms_vendors.txt" My current folder structure is solr - helpers synonyms_vendors.txt -configsets - collection1 -conf schema.xml solrconfig.xml I get the below error wh

RE: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Davis, Daniel (NIH/NLM) [C]
Elastic allows the mappings to be set all at once, either in the template or as index settings. That is an important feature because it allows the field definitions to be source code artifacts, which can be deployed very easily by an automatic script. Solr's Managed Schema API allows multiple

Re: Limited Search Results During Full Reindexing - Fine Once Completed

2018-06-19 Thread THADC
Thanks, I changed autoSoftCommit from -1 to 3000 and that seemed to do the trick. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Doug Turnbull
I actually prefer the classic config-files approach over managed schemas. Having done Elasticsearch (where everything is configured through an API) as well as managed and non-managed Solr, I prefer the legacy non-managed Solr way of doing things when it's possible. With 'managed' approaches, the config c

Re: Spring Boot missing required field:

2018-06-19 Thread Andrea Gazzarini
Hi Rushikesh, If the issue is: "when I set required=true Solr says the field is missing, and if I set required="false" I have no problem at all, but Solr documents have no value for that field", then trust me, the field is missing. I see two possible points where the issue could be: * clien

Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Alexandre Rafalovitch
And that managed-schema will reorder the entries and delete the comments on first API modification. Regards, Alex On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey, wrote: > On 6/17/2018 6:48 PM, S G wrote: > > I only wanted to know if schema.xml offer anything that managed-schema > does > > not.

Re: tlogs not deleting

2018-06-19 Thread Erick Erickson
bq. Do you recommend disabling the buffer on the source SolrCloud as well? Disable them all on both source and target IMO. On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee wrote: > Thank you Erick. I am running Solr 6.6. From the documentation: > "Replicas do not need to buffer updates, and it is reco
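
Erick's advice is to disable the buffer on both clusters. The CDCR API calls involved can be generated per cluster as below. This is a sketch only: the base URLs and collection name are placeholders, and the code builds the URLs without sending them. DISABLEBUFFER changes the buffer state at runtime. Where your version supports it, the CDCR section of its reference guide describes how to make a disabled buffer the default in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.List;

public class CdcrBufferUrls {
    // For each cluster (source and target), build the CDCR call that disables
    // the update-log buffer, followed by a STATUS call to confirm the result.
    static List<String> disableBufferEverywhere(List<String> solrBaseUrls, String collection) {
        List<String> urls = new ArrayList<>();
        for (String base : solrBaseUrls) {
            // Drop trailing slashes so the path joins cleanly.
            String prefix = base.replaceAll("/+$", "") + "/" + collection + "/cdcr?action=";
            urls.add(prefix + "DISABLEBUFFER");
            urls.add(prefix + "STATUS");
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder hosts and collection name.
        List<String> clusters = List.of("http://source-host:8983/solr", "http://target-host:8983/solr");
        for (String url : disableBufferEverywhere(clusters, "mycollection")) {
            System.out.println(url);
        }
    }
}
```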

RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Thank you Erick. I am running Solr 6.6. From the documentation: "Replicas do not need to buffer updates, and it is recommended to disable buffer on the target SolrCloud." Do you recommend disabling the buffer on the source SolrCloud as well? It looks like I already have the buffer disabled at ta

Re: Remove schema.xml in favor of managed-schema

2018-06-19 Thread Shawn Heisey
On 6/17/2018 6:48 PM, S G wrote: > I only wanted to know if schema.xml offer anything that managed-schema does > not. The only difference between the two is that there is a different filename and the managed version can be modified by API calls.  The schema format and what you can do within that f

Re: A field-wide remove duplicate tokens filter

2018-06-19 Thread sarita
- What would be the solution if the search query is (lenovo-A600 in lenovo mobile)? I have to use WordDelimiterFilterFactory because users sometimes search for (lenovoA600) and sometimes for (lenovo a600). After the query passes through WordDelimiterFilterFactory, the tokens of the main query (lenovo-A600 i

Re: tlogs not deleting

2018-06-19 Thread Erick Erickson
Take a look at the CDCR section of your reference guide; be sure you get the version that matches your Solr release, which you can download from here: https://archive.apache.org/dist/lucene/solr/ref-guide/ There's a CDCR API call you can use for in-flight disabling, and depending on the version of Solr you can set it in solrco

Re: Retrieving Results from both child and parent

2018-06-19 Thread Rushikesh Garadade
I found one solution, but I am not sure whether it is an optimized/correct one. To return mails that contain the word 'pdf' anywhere: *({!parent which=internetMessageId:* v=pdf}) OR (internetMessageId:* AND pdf)* a) ({!parent which=internetMessageId:* v=pdf}) ==> return mails whose a

RE: tlogs not deleting

2018-06-19 Thread Brian Yee
Thanks for the suggestion. Can you please elaborate a little bit about what DISABLEBUFFER does? The documentation is not very detailed. Is this something that needs to be done manually whenever this problem happens or is it something that we can do to fix it so it won't happen again? -Origi

Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay
Here it is: https://issues.apache.org/jira/browse/SOLR-12499 -- Christian Spitzlay Diplom-Physiker (graduate physicist), Senior Software Developer Tel: +49 69 / 348739116 E-Mail: christian.spitz...@biologis.com bio.logis Genetic Information Management GmbH Altenhöferallee 3 60438 Frankfurt am Main Geschäftsfüh

Running Solr 5.3.1 with JDK10

2018-06-19 Thread Li, Yi
Hi, we are currently running Solr 5.3.1 with JDK8 and are trying to run Solr 5.3.1 with JDK10. Initially we got a few errors complaining that some JVM options were removed as of JDK9. We removed those options in solr.in.sh: UseConcMarkSweepGC UseParNewGC PrintHeapAtGC PrintGCDateStamps PrintGCTimeS

Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay
OK. I'm about to create the issue, and I have a draft version of what I had in mind in a branch on GitHub. Christian Spitzlay > On 19.06.2018 at 15:27, Joel Bernstein wrote: > > Let's move the discussion to the jira ticket. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Jun

Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Joel Bernstein
Let's move the discussion to the JIRA ticket. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jun 19, 2018 at 3:42 AM, Christian Spitzlay < christian.spitz...@biologis.com> wrote: > > > > On 18.06.2018 at 15:30, Joel Bernstein wrote: > > > > You are doing things correctly. I was incorrect

Re: Connection Problem with CloudSolrClient.Builder().build When passing a Zookeeper Addresses and RootParam

2018-06-19 Thread THADC
Thank you Andy, the problem was as you suspected: the "http://" prefixes. The odd thing is that I used to use the one-parameter constructor with the Solr node URL list (like: CloudSolrClient.Builder(solrServerURLLList).build();), and I could not get that one to work without the "http://" prefix. Anyway
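
The fix here was to drop the "http://" prefixes, because ZooKeeper addresses are plain host:port strings rather than URLs. The hypothetical helper below (`ZkHosts.normalize` is not part of SolrJ) could clean a host list before it is passed to the SolrJ builder constructor that takes ZooKeeper hosts plus an optional chroot. The chroot belongs in that separate argument, not in the host strings. Check the exact constructor signature in the SolrJ javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;

public class ZkHosts {
    // Hypothetical helper: strip URL schemes and trailing slashes that creep in
    // when Solr node URLs are reused as ZooKeeper addresses.
    static List<String> normalize(List<String> raw) {
        List<String> hosts = new ArrayList<>();
        for (String h : raw) {
            String s = h.trim();
            if (s.startsWith("http://")) {
                s = s.substring("http://".length());
            } else if (s.startsWith("https://")) {
                s = s.substring("https://".length());
            }
            while (s.endsWith("/")) {
                s = s.substring(0, s.length() - 1);
            }
            hosts.add(s);
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(normalize(List.of("http://zk1:2181", "zk2:2181/", " https://zk3:2181 ")));
    }
}
```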

Re: Spring Boot missing required field:

2018-06-19 Thread Rushikesh Garadade
Yes Andrea, I have already tried that; I have a value associated with the field. This issue occurs when I set 'required="true"'. If I remove it, everything works fine. I don't understand why this issue occurs when I set required="true". Can you please give me some pointers on where to look to see what ma

Retrieving Results from both child and parent

2018-06-19 Thread Rushikesh Garadade
Hello, I have stored emails in Solr, with their attachments as child documents. As per Solr's structure, these attachments got stored at the same level as the mails. Ex: { "id":"1528801242887_f662e5fe-b5d7-4494-acab-c1a99e6cd025", "attachmentName ":"example_multipage.doc", "attachmentType":"application/mswor

Re: Solr cloud with different JVM size nodes

2018-06-19 Thread Emir Arnautović
Hi Rishi, it is not uncommon to have tiers in your cluster, assuming you have weighed whether it is the best choice. I would remind you that 32GB is not a good heap size, since the JVM cannot use compressed OOPs at that size. Check the limit for your JVM, but 30GB is a safe bet. Also, what did you mean by “got high f
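
A small Java check like the following could be run with a candidate `-Xmx` value to see where the compressed-OOPs cutoff falls for a given JVM. It assumes a 64-bit HotSpot-based JVM, because `UseCompressedOops` is a HotSpot flag and is read here through `com.sun.management.HotSpotDiagnosticMXBean`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CompressedOopsCheck {
    // Effective value ("true" or "false") of the HotSpot flag UseCompressedOops
    // for the JVM running this code. HotSpot-specific; may throw on other VMs.
    static String compressedOops() {
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption("UseCompressedOops").getValue();
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap ~" + maxHeapMb + " MB, UseCompressedOops=" + compressedOops());
    }
}
```

For example, compare `java -Xmx30g CompressedOopsCheck` with `java -Xmx32g CompressedOopsCheck`. With default settings on a 64-bit HotSpot JVM, the 32g run should report `UseCompressedOops=false`.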

Re: SOLR migration

2018-06-19 Thread Emir Arnautović
Hi Ana, there is no documentation because this is not a common task. Assuming you are using SolrCloud and don't want any downtime, you could set up a new Solr node on the same box but configure it to use the new disk. After it is set up, you use ADDREPLICA and REMOVERE

Re: some solr replicas down

2018-06-19 Thread Chris Ulicny
Satya, There should be some other log messages that are probably relevant to the issue you are having. Something along the lines of "leader cannot communicate with follower...publishing replica as down." It's likely there also is a message of "expecting json/xml but got html" in another instance's

Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Mikhail Khludnev
You need to index the number of values in a separate field, and then *:* -(V:(A AND B) AND numVals:2) -(V:(A OR B) AND numVals:1) On Tue, Jun 19, 2018 at 9:20 AM Wei wrote: > Hi,
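
Both replies read Wei's question as: drop documents whose multi-valued field holds only A and/or B. Under that reading, and assuming no document repeats a value, the rule and Mikhail's query template can be written in Java as follows. The field names `V` and `numVals` come from his example. The count field would be filled at index time, for instance by the CountFieldValuesUpdateProcessorFactory referenced in the thread.

```java
import java.util.List;
import java.util.Set;

public class ExcludeOnlyValues {
    // In-memory version of the rule both replies' queries implement: filter a
    // document out when every value in its multi-valued field is an excluded
    // value. Assumes no duplicate values within one document.
    static boolean excluded(List<String> values, Set<String> excludedValues) {
        return !values.isEmpty() && excludedValues.containsAll(values);
    }

    // Fills in Mikhail's two-value template; field names are placeholders.
    static String filterQuery(String field, String countField, String a, String b) {
        return String.format(
            "*:* -(%1$s:(%3$s AND %4$s) AND %2$s:2) -(%1$s:(%3$s OR %4$s) AND %2$s:1)",
            field, countField, a, b);
    }

    public static void main(String[] args) {
        Set<String> ab = Set.of("A", "B");
        System.out.println(excluded(List.of("A", "B"), ab));
        System.out.println(excluded(List.of("A", "C"), ab));
        System.out.println(filterQuery("V", "numVals", "A", "B"));
    }
}
```

The count field is what makes the query exact. `V:(A OR B)` alone would also drop a document holding A and C, which should be kept.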

Re: How to exclude certain values in multi-value field filter query

2018-06-19 Thread Alessandro Benedetti
The first idea that comes to mind is to build a single-valued copy field that concatenates them. This way you will have very specific values to filter on: query1 -(copyfield:(A B AB)) To concatenate, you can use this update request processor: https://lucene.apache.org/solr/6_6_0//solr-cor

Re: SOLR migration

2018-06-19 Thread Ana Mercan (RO)
Hello guys, I would appreciate it if you could treat this topic with priority, as the lack of documentation is something of a blocker for us. Thanks in advance, Ana On Mon, Jun 18, 2018 at 4:56 PM, Ana Mercan (RO) wrote: > Hi, > > I have the following scenario, I'm having a shared cluster solr

Re: Solrj does not support ltr ?

2018-06-19 Thread Alessandro Benedetti
Pretty sure you can't. As far as I know, there is no client-side implementation to help with managed resources in general. Any contribution is welcome! --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io

Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()

2018-06-19 Thread Christian Spitzlay
> On 18.06.2018 at 15:30, Joel Bernstein wrote: > > You are doing things correctly. I was incorrect about the behavior of the > group() operation. > > I think the behavior you are looking for should be done using reduce() but > we'll need to create a reduce operation that does this. If you w

Solr cloud with different JVM size nodes

2018-06-19 Thread Rishikant Snigh
Hello everyone, I am planning to create a SolrCloud cluster with 16GB and 32GB nodes, essentially an underlying pseudo-cluster: the 32GB nodes will hold historical data (which has a high field cache), and the 16GB nodes will hold the regular collections. NOTE: shards of collections placed on 16GB nodes will never be placed on 32GB nodes and vice ve