Re: Nested facet complete wrong counts

2017-11-10 Thread Yonik Seeley
I do notice you are using hll (HyperLogLog), which is a distributed cardinality *estimate*: https://en.wikipedia.org/wiki/HyperLogLog -Yonik On Fri, Nov 10, 2017 at 11:32 AM, kenny wrote: > Hi all, > > We are doing some tests in solr 6.6 with json facet api and we get > completely wrong count

Re: Solr / HDPSearch related

2017-11-10 Thread Cassandra Targett
Some of these questions should be directed to Hortonworks, but I'm glad you posted them here because I noticed you asked similar questions on the IRC channel but left before I could jump in and help. Full disclosure, I work for Lucidworks and one of my jobs is managing the development team that mak

Semantic Knowledge Graph

2017-11-10 Thread David Hastings
I'm looking through the slides from 2016 as well as the presentation again from 2017, and in them there is a user interface for this project that I don't see as being available, so I'm assuming it was created as a different project. It would be nice to have access to that. Also, all of the examples in t

Re: Nested facet complete wrong counts

2017-11-10 Thread Amrit Sarkar
Kenny, This is known behavior in a multi-sharded collection where the field values belonging to the same facet don't reside in the same shard. Yonik Seeley has improved the JSON Facet feature by introducing the "overrequest" and "refine" parameters. Kindly check out Jira: https://issues.apache.org/jira/brow
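
A minimal sketch of how the "overrequest" and "refine" parameters mentioned above might appear in a nested JSON Facet request. The field names (category_s, brand_s), facet labels, and numeric values are hypothetical, and whether "refine" is available depends on the Solr version in use, so treat this as an illustration rather than a confirmed fix for the counts in this thread.

```python
import json


def nested_terms_facet(outer_field, inner_field, limit=10, overrequest=50):
    """Build a JSON Facet request body with a terms facet nested in another.

    Both levels ask each shard for extra buckets (overrequest) and request
    a refinement pass (refine). All field names here are placeholders.
    """
    inner = {
        "type": "terms",
        "field": inner_field,
        "limit": limit,
        "overrequest": overrequest,
        "refine": True,
    }
    outer = {
        "type": "terms",
        "field": outer_field,
        "limit": limit,
        "overrequest": overrequest,
        "refine": True,
        "facet": {"by_inner": inner},
    }
    return {"query": "*:*", "facet": {"by_outer": outer}}


body = nested_terms_facet("category_s", "brand_s")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the collection's query endpoint; raising overrequest trades extra per-shard work for a better chance that every shard reports the buckets that end up in the merged top list.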

Nested facet complete wrong counts

2017-11-10 Thread kenny
Hi all, We are doing some tests in Solr 6.6 with the JSON Facet API and we get completely wrong counts for some combinations of facets. Setting: We have a set of fields for 376k documents in our query (total 120M documents). We work with 2 shards. When doing first a faceting over the first facet

Solr / HDPSearch related

2017-11-10 Thread Greenhorn Techie
Hi, We have a HDP product cluster and are now planning to build a search solution for some of our business requirements. In this regard, I have the following questions. Can you please answer the below questions with respect to Solr? - As I understand, it is more performant to have SolrCloud se

Re: Multiple collections for a write-alias

2017-11-10 Thread Emir Arnautović
This approach could work only if it is an append-only index. If you have updates/deletes, you have to process them in order, otherwise you will get incorrect results. I think that is one of the reasons it might not be supported, since it is not too useful. Emir -- Monitoring - Log Management -

RE: How to routing document for send to particular shard range

2017-11-10 Thread Ketan Thanki
Thanks Amrit, I get it now. Can you please tell me how I can achieve this using composite routing, given my requirement mentioned below? I will need to send a particular client's data to a particular shard. Regards, -Original Message- From: Amrit Sarkar [mailto:sarkaramr...@gmail.c

Spellcheck returning suggestions for words that exist in the dictionary

2017-11-10 Thread Sanjana Sridhar
Spellcheck works perfectly when I misspell a word, but if a word already exists in the dictionary, Solr still returns suggestions for it, e.g. bike gets spell-corrected to bake. I unfortunately cannot use the *maxResultsForSuggest* field as I need to return the correct spelling irres

Re: Re: ygc problem on solr 5.5.1

2017-11-10 Thread Samuel Tatipamula
Promotion failure usually means your old gen doesn't have enough space to accommodate the incoming (promoted from young to old) object. In your case, you have specified NewRatio as 3, which means, you have approximately 30*(3/4) = 22.5 GB old gen heap space. If a heap this big gets fragmented, it w
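
The arithmetic above can be sketched as follows, assuming the usual meaning of -XX:NewRatio=N (old generation to young generation is N:1) and the 30 GB heap discussed in this thread. The function name is hypothetical, and the real sizes the JVM picks may differ slightly.

```python
def generation_sizes_gb(heap_gb, new_ratio):
    """Split a heap into (old, young) sizes for a given NewRatio.

    NewRatio=N is taken to mean old:young = N:1, so the old generation
    gets N/(N+1) of the heap and the young generation gets the rest.
    """
    old = heap_gb * new_ratio / (new_ratio + 1)
    young = heap_gb - old
    return old, young


for ratio in (3, 2):
    old, young = generation_sizes_gb(30, ratio)
    print(f"NewRatio={ratio}: old={old} GB, young={young} GB")
```

Lowering NewRatio from 3 to 2 shrinks the old generation in this model, which is consistent with the concern raised elsewhere in the thread that promotion failures could then become more frequent.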

Re: Solr - phrase suggestion returning duplicate

2017-11-10 Thread alessandro.benedetti
"In case you decide to use an entire new index for the autosuggestion, you can potentially manage that on your own" This refers to the fact that is possible to define an index just for autocompletion. You can model the document as you prefer in this additional index, defining the field types tha

Recovering shards from down state

2017-11-10 Thread decenttp
I have a 3-server (Debian) ensemble using ZooKeeper and Solr 6.6.0 on AWS. The setup has 3 shards per server with a replication factor of 3. It has around 11 collections, 2 of which are large, having over 5 million records each. Since I was maxing out the RAM, I tried to launch the servers with a hig

Re: How to routing document for send to particular shard range

2017-11-10 Thread Amrit Sarkar
Ketan, here I have also created new field 'core' which value is any shard where I > need to send documents and on retrieval use '_route_' parameter with > mentioning the particular shard. But issue facing still my > clusterstate.json showing the "router":{"name":"compositeId"} is it means > my se
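
For reference, a minimal sketch of what routing looks like under the compositeId router reported in the clusterstate.json above: the route key is prefixed to the document id with "!", and the same prefix is passed in the _route_ parameter at query time. The client key, document id, and helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def composite_id(route_key, doc_id):
    """Prefix the document id with the route key, e.g. 'clientA!doc1'."""
    return f"{route_key}!{doc_id}"


def routed_query_params(route_key, q="*:*"):
    """Build query-string parameters restricted to one route key."""
    return urlencode({"q": q, "_route_": f"{route_key}!"})


doc = {"id": composite_id("clientA", "doc1")}
params = routed_query_params("clientA")
print(doc["id"])
print(parse_qs(params))
```

With compositeId the shard is chosen by hashing the prefix, so documents sharing a prefix land together but the shard name itself is not chosen by the client; naming a shard directly is what the implicit router is for.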

RE: How to routing document for send to particular shard range

2017-11-10 Thread Ketan Thanki
Hi Erik, My requirement is to index the documents of a particular organization to a specific shard. I have also made changes in core.properties as mentioned below. Model Collection: name=model shard=shard1 collection=model router.name=implicit router.field=core shards=shard1,shard2 Workset Collection: n

Index Message-ID from EML file to Solr

2017-11-10 Thread Zheng Lin Edwin Yeo
Hi, Can we index the Message-ID that is from the EML file into Solr? Tika does have the Message-ID in the MIME data, but is Solr able to read it and index it? I'm using Solr 6.5.1. Regards, Edwin

Re: Make search on the particular field to be case sensitive

2017-11-10 Thread Karan Saini
Hi Erick, Please ignore my earlier mail. I got it working! I missed the rule attribute. Now it is working. Thanks, Karan On 10 November 2017 at 15:59, Karan Saini wrote: > Hi Erick, > > Thanks for the help. It is working fine with the *KeywordTokenizerFactory. > *Like you mentioned, i wan

Re: Make search on the particular field to be case sensitive

2017-11-10 Thread Karan Saini
Hi Erick, Thanks for the help. It is working fine with the *KeywordTokenizerFactory*. Like you mentioned, I want to search for "dog" or "*dog*" alone also. Case sensitivity is working fine, but I want to have wildcard-based search also. So I tried this changed code, but no luck!

Re:Re: ygc problem on solr 5.5.1

2017-11-10 Thread 胡一博
Thank you for your suggestion about the NewRatio param! I found some 'promotion failed' entries in the GC log. It triggers a STW GC instead of a CMS GC. If I change NewRatio to 2, the promotion failures may appear more frequently. Is the 'promotion failed' caused by some inappropriate use of SolrCloud

Re: ygc problem on solr 5.5.1

2017-11-10 Thread Samuel Tatipamula
Hi, There are a couple of things I can suggest based on your configuration. -XX:ConcGCThreads=4 - try removing this restriction on the threads. -Xms30g - you should reconsider this param, as 30 GB is a huge heap size. Instead, in SolrCloud, try spawning multiple instances if you have system resou

Re: Solr server partial update is very slow

2017-11-10 Thread Sujay Bawaskar
Hi Erick, Some of the partial updates are taking a huge amount of time. The average QTime for updates in a 15-minute interval is 14344. 2017-11-10 08:15:11.863 INFO (qtp225493257-43961) [ x:collection] o.a.s.c.S.Request [collection] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=1007390

Re: Atomic Updates with SolrJ

2017-11-10 Thread Martin Keller
Hello, as Amrit mentioned, I attached the schema.xml of such an index. Perhaps there is something to find in it. The responses of update and commit look quite normal: {responseHeader={status=0,QTime=2}} {responseHeader={status=0,QTime=14}} After committing the fields fld_BA1F56CD9C87419CB9A271D
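
As background for this thread, a minimal sketch of the shape of an atomic update document: the id is given as a plain value and each changed field is wrapped in a modifier such as "set". The document id and field name are hypothetical, and the poster's actual SolrJ code and schema are not shown here. Atomic updates generally need the other fields to be stored (or to have docValues) so that Solr can rebuild the full document, which is one thing worth checking in a schema.

```python
import json


def atomic_set(doc_id, field, value):
    """Build one atomic-update document that replaces a single field."""
    return {"id": doc_id, field: {"set": value}}


# A list of update documents, as it would be posted to an update handler.
payload = json.dumps([atomic_set("doc1", "title_s", "new title")])
print(payload)
```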