Empty value fields not indexed

2017-04-27 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 6.4.2, and I realized that for those fields which have no values, the field name is not indexed into Solr. It was working fine in the previous version. Any reason for this or any settings which need to be done so that the field name can be indexed even though its value is

Re: DIH Speed

2017-04-27 Thread Vijay Kokatnur
Let me clarify - DIH is running on Solr 6.5.0 that calls a different solr instance running on 4.5.0, which has 150M documents. If we try to fetch them using DIH onto new solr cluster, wouldn't it result in deep paging on solr 4.5.0 and drastically slow down indexing on solr 6.5.0? On Thu, Apr

Re: Poll: Master-Slave or SolrCloud?

2017-04-27 Thread David Lee
As someone who moved from ES to Solr, I can say that one of the things that makes ES so much easier to configure is that the majority of things that need to be set for a specific environment are all in pretty much one config file. Also, I didn't have to deal with the "magic stuff" that many

Re: DIH Speed

2017-04-27 Thread Shawn Heisey
On 4/27/2017 9:15 PM, Vijay Kokatnur wrote: > Hey Shawn, Unfortunately, we can't upgrade the existing cluster. That > was my first approach as well. Yes, SolrEntityProcessor is used so it > results in deep paging after certain rows. I have observed that > instead of importing for a larger period,

Re: DIH Speed

2017-04-27 Thread Vijay Kokatnur
Hey Shawn, Unfortunately, we can't upgrade the existing cluster. That was my first approach as well. Yes, SolrEntityProcessor is used so it results in deep paging after certain rows. I have observed that instead of importing for a larger period, if data is imported only for 4 hours at a time,

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Derek Poh
Richard I am considering the same option as your suggestion to put them in 1 single collection of products documents. A product doc containing the supplier info. In this option, a supplier info will get repeated in each of the supplier's product doc. I may be influenced by DB concepts. Guess it's a

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Derek Poh
Hi Shawn 1 set of data is suppliers info and 1 set is the suppliers products info. User can either do a product search or a supplier search. 1 option I am thinking of is to put them in 1 single collection with each product as a document. Each product document will have the supplier info in it.

Re: DIH Speed

2017-04-27 Thread Shawn Heisey
On 4/27/2017 5:40 PM, Erick Erickson wrote: > I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging > is querying. > > If it's querying, consider cursorMark or the /export handler. >

Re: Atomic Updates

2017-04-27 Thread Erick Erickson
Been there, done that, got the t-shirt. Thanks for closing it out! Erick On Thu, Apr 27, 2017 at 10:29 AM, Chris Ulicny wrote: > While recreating it with a fresh schema, I realized that this was a case of > a very, very stupid user error during configuring the cores. > > I

Re: DIH Speed

2017-04-27 Thread Erick Erickson
I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging is querying. If it's querying, consider cursorMark or the /export handler. https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/ If it's DIH, please explain a bit
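
For readers unfamiliar with the cursorMark suggestion, here is a minimal sketch of the loop it implies. It assumes only the documented contract: the first request sends cursorMark=*, every response carries a nextCursorMark, and iteration is finished when the returned mark equals the one that was sent. The names iterate_with_cursor and make_fake_fetch are hypothetical, and the fake below stands in for a real Solr request (which would also need a sort that includes the uniqueKey field). Its cursor is a plain offset purely for illustration; real Solr cursor marks are opaque strings.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# One page of results: (documents, nextCursorMark)
Page = Tuple[List[Dict], str]


def iterate_with_cursor(fetch_page: Callable[[str], Page]) -> Iterator[Dict]:
    """Yield every document by following cursor marks until they stop changing."""
    cursor = "*"  # the initial cursorMark value
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        # An unchanged mark signals that the result set is exhausted.
        if next_cursor == cursor:
            break
        cursor = next_cursor


def make_fake_fetch(all_docs: List[Dict], rows: int) -> Callable[[str], Page]:
    """Hypothetical in-memory stand-in for a Solr query (illustration only)."""
    ordered = sorted(all_docs, key=lambda d: d["id"])  # mimics sort=id asc

    def fetch_page(cursor: str) -> Page:
        start = 0 if cursor == "*" else int(cursor)
        page = ordered[start:start + rows]
        # Echo the incoming mark back when there is nothing left to return.
        next_cursor = cursor if not page else str(start + len(page))
        return page, next_cursor

    return fetch_page


if __name__ == "__main__":
    docs = [{"id": i} for i in (3, 1, 4, 0, 2)]
    for doc in iterate_with_cursor(make_fake_fetch(docs, rows=2)):
        print(doc["id"])
```

Unlike start/rows paging, each request here costs roughly the same no matter how far into the result set it is, which is why the approach is suggested for large exports.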

Solr Query Performance benchmarking

2017-04-27 Thread Suresh Pendap
Hi, I am trying to perform Solr Query performance benchmarking and trying to measure the maximum throughput and latency that I can get from a given Solr cluster. Following are my configurations Number of Solr Nodes: 4 Number of shards: 2 replication-factor: 2 Index size: 55 GB Shard/Core

DIH Speed

2017-04-27 Thread Vijay Kokatnur
We have a new solr 6.5.0 cluster, for which data is being imported via DIH from another Solr cluster running version 4.5.0. This question comes back to deep paging, but we have observed that after 30 minutes of querying the rate of processing goes down from 400/s to about 120/s. At that point it

TransactionLog doesn't know how to serialize class java.util.UUID; try implementing ObjectResolver?

2017-04-27 Thread Mahmoud Almokadem
Hello, When I try to update a document that exists on SolrCloud I get this message: TransactionLog doesn't know how to serialize class java.util.UUID; try implementing ObjectResolver? With the stack trace:

Re: Atomic Updates

2017-04-27 Thread Chris Ulicny
While recreating it with a fresh schema, I realized that this was a case of a very, very stupid user error during configuring the cores. I set up the testing cores with the wrong configset, and then proceeded to edit the schema in the right configset. So, the field was actually stored by default,

Re: Split Shard not working

2017-04-27 Thread Walter Underwood
What is the message in the log when it crashes? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 27, 2017, at 10:10 AM, Vijay Kokatnur wrote: > > We recently upgraded 4.5 index to 6.5 using IndexUpgrader. The index

Split Shard not working

2017-04-27 Thread Vijay Kokatnur
We recently upgraded 4.5 index to 6.5 using IndexUpgrader. The index size is around 600 GB on disk. When we try to split it using SPLITSHARD, it creates two new sub shards on the node and eventually crashes before completing the split. After restart, the original shard size is around 100 GB and

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Walter Underwood
Design backwards from the search result pages (SRP). Make flat schema(s) with the fields you will search and display. One example is the schema I used at Netflix. I used one collection to hold movies, people (actors), and genres. There were collisions between the integer IDs, movies IDs were

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Rick Leir
Does it make sense to use nested documents here? Products could be nested in a supplier document perhaps. Alternately, consider de-normalizing "til it hurts". A product doc might be able to contain supplier info. On April 27, 2017 8:50:59 AM EDT, Shawn Heisey wrote: >On

Re: size-estimator-lucene-solr.xls error in disk space estimator

2017-04-27 Thread Matteo Grolla
Right Alessandro that's another bug Cheers 2017-04-27 12:30 GMT+02:00 alessandro.benedetti : > +1 > I would add that what is called : "Avg. Document Size (KB)" seems more to > me > "Avg. Field Size (KB)". > Cheers > > > > - > --- > Alessandro Benedetti >

Re: Atomic Updates

2017-04-27 Thread Chris Ulicny
I'm sending commit=true with every update while testing. I'll write up the tests and see if someone else can reproduce it. On Thu, Apr 27, 2017 at 10:54 AM Erick Erickson wrote: > bq: but is there any possibility that the values stick around until > there is a segment

Re: Atomic Updates

2017-04-27 Thread Erick Erickson
bq: but is there any possibility that the values stick around until there is a segment merge for some strange reason There better not be or it's a bug. Things will stick around until you issue a commit, is there any chance that's the problem? If you can document the exact steps, maybe we can

Blocked ConcurrentUpdateSolrClient

2017-04-27 Thread Christian Belka
Hello I am trying to update larger amounts of Documents (mostly ADD/DELETE) through various threads. After a certain amount of time (a few hours) all my threads get stuck at "taskExecutor-46" prio=5 tid=0x268 nid=0x10c BLOCKED owned by taskExecutor-9 Id=230 - stats: cpu=2788

Re: Indexing I/O errors and CorruptIndex messages

2017-04-27 Thread simon
Nope ... huge file system (600gb) only 50% full, and a complete index would be 80gb max. On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson wrote: > Disk space issue? Lucene requires at least as much free disk space as > your index size. Note that the disk full issue will

Re: Atomic Updates

2017-04-27 Thread Chris Ulicny
Yeah, something's not quite right somewhere. We never even considered in-place updates an option since it requires the fields to be non-indexed and non-stored. Our schemas never have any field that satisfies those two conditions let alone the other necessary ones. I went ahead and tested the

Re: Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-27 Thread freddy79
It does work with "solr.LatLonPointSpatialField" instead of "solr.LatLonType". But why not with "solr.LatLonType"?

Re: Update to Solr 6 - Amazon EC2 high CPU SYS usage

2017-04-27 Thread Shawn Heisey
On 4/27/2017 3:03 AM, Elodie Sannier wrote: > We have migrated from Solr 5.4.1 to Solr 6.4.0 on Amazon EC2 and we have > a high CPU SYS usage and it drastically decreases the Solr performance. > > The JVM version (java-1.8.0-openjdk-1.8.0.131-0.b11.el6_9.x86_64), the > Jetty version (9.3.14) and

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Shawn Heisey
On 4/26/2017 11:57 PM, Derek Poh wrote: > There are some common fields between them. > At the source data end (database), the supplier info and product info > are updated separately. In this regard, I should separate them? > If it's in 1 single collection, when there are updates to only the >

Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-27 Thread freddy79
Hi, when doing a query with spatial search i get the error: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate *SOLR Version:* 6.1.0 *schema.xml:* *Query:*

[ANNOUNCE] Apache Solr 6.5.1 released

2017-04-27 Thread jim ferenczi
27 April 2017, Apache Solr™ 6.5.1 available The Lucene PMC is pleased to announce the release of Apache Solr 6.5.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: Help with facet.limit

2017-04-27 Thread alessandro.benedetti
In addition to what Erick mentioned, (if) you can use Json faceting and sort your facets according to your preferences using the stats integration [1]. Cheers [1] https://cwiki.apache.org/confluence/display/solr/Faceted+Search - --- Alessandro Benedetti Search Consultant, R

Re: counting_number_of_term_in_a_doc

2017-04-27 Thread alessandro.benedetti
I think the closest you get out of the box is the term vector component [1]. Cheers [1] https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io

Re: size-estimator-lucene-solr.xls error in disk space estimator

2017-04-27 Thread alessandro.benedetti
+1 I would add that what is called : "Avg. Document Size (KB)" seems more to me "Avg. Field Size (KB)". Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io

size-estimator-lucene-solr.xls error in disk space estimator

2017-04-27 Thread Matteo Grolla
It seems to me that the estimation in MB is in fact an estimation in GB: the formula includes the avg doc size, which is in KB, so the result is in KB and should be divided by 1024 to obtain the result in MB. But it's divided by 1024*1024
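
The unit argument above can be checked with a little arithmetic. The sketch below is not the spreadsheet's actual formula; it assumes a deliberately simplified estimate (number of documents times average document size in KB) just to show what each divisor yields. All function names and the sample figures are hypothetical.

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024


def estimated_size_kb(num_docs: int, avg_doc_size_kb: float) -> float:
    """Simplified estimate (an assumption): the result is in KB because the input is."""
    return num_docs * avg_doc_size_kb


def kb_to_mb(size_kb: float) -> float:
    """Dividing a KB figure by 1024 gives MB."""
    return size_kb / KB_PER_MB


def kb_to_gb(size_kb: float) -> float:
    """Dividing a KB figure by 1024*1024 gives GB, not MB."""
    return size_kb / KB_PER_GB


if __name__ == "__main__":
    size_kb = estimated_size_kb(num_docs=1_000_000, avg_doc_size_kb=2)
    print(f"{size_kb:.0f} KB")
    print(f"{kb_to_mb(size_kb):.3f} MB")
    print(f"{kb_to_gb(size_kb):.3f} GB")
```

With these sample inputs the estimate is 2,000,000 KB, which is 1953.125 MB. Dividing by 1024*1024 instead produces about 1.907, a value in GB, so a cell labelled MB would be understated by a factor of 1024.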

Update to Solr 6 - Amazon EC2 high CPU SYS usage

2017-04-27 Thread Elodie Sannier
Hello, We have migrated from Solr 5.4.1 to Solr 6.4.0 on Amazon EC2 and we have a high CPU SYS usage and it drastically decreases the Solr performance. The JVM version (java-1.8.0-openjdk-1.8.0.131-0.b11.el6_9.x86_64), the Jetty version (9.3.14) and the OS version (CentOS 6.9) have not changed

Re: Poll: Master-Slave or SolrCloud?

2017-04-27 Thread Emir Arnautovic
I think creating a poll for ES people with the question: "How do you run master nodes? A) on some data nodes B) dedicated node C) dedicated server" would give some insight into how big an issue having ZK is and whether hiding ZK behind Solr would do any good. Emir On 25.04.2017 23:13, Otis Gospodnetić wrote: Hi