solr tika extraction video creation date problem (hours ahead)

2019-04-03 Thread Where is Where
Hello, I was following the instructions at https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html to upload files with metadata stored and indexed in Solr. I was checking the extracted creation date (attr_meta_creation_date); for image, jpg etc, the creation
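The message is cut off before it describes the cause, so the following is only an illustration of one common way a timestamp can end up hours off: a wall-clock value with no zone information gets stamped as UTC. Whether this is what happens with `attr_meta_creation_date` is an assumption; the zone offset and the sample date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone of the recording device (UTC-5); an assumption for illustration.
device_zone = timezone(timedelta(hours=-5))

# A creation time stored in the file as wall-clock time, without zone information.
recorded = datetime(2019, 4, 3, 10, 0, 0)

# Interpretation 1: the value is stamped as UTC without conversion.
as_utc = recorded.replace(tzinfo=timezone.utc)

# Interpretation 2: the value is treated as local time in the device's zone.
as_local = recorded.replace(tzinfo=device_zone)

# The two interpretations point at instants that differ by the zone offset.
shift = abs(as_local - as_utc)
print(as_utc.isoformat(), as_local.isoformat(), shift)
```

Comparing the indexed value against the file's known local creation time would show whether the difference matches a zone offset.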

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi David, Yes, I do have this field "_root_" in the schema. However, I don't think I have used the field, and there is no difference in the indexing speed after I remove the field. Regards, Edwin On Wed, 3 Apr 2019 at 22:57, David Smiley wrote: > Hi Edwin, > > I'd like to rule something

high cpu threads (solr 7.5)

2019-04-03 Thread Hari Nakka
We are noticing high CPU utilization on below threads. Looks like a known issue with. (https://github.com/netty/netty/issues/327) But not sure if this has been addressed in any of the 1.8 releases. Can anyone help with this? Version: solr cloud 7.5 OS: CentOS 7 JDK: Oracle JDK 1.8.0_191

Re: [ANNOUNCE] Apache Solr 8.0.0 released

2019-04-03 Thread Noble Paul
Thanks Jim On Fri, Mar 15, 2019 at 1:39 AM Toke Eskildsen wrote: > > On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote: > > http://lucene.apache.org/solr/8_0_0/changes/Changes.html > > Thank you for the hard work of rolling the release! > Looking forward to upgrading. > > - Toke Eskildsen,

Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Yes, I know the reasons to put this work on a client rather than use Solr directly, and that should maybe be my next task. But I need to finish my current task first: indexing PDF files stored in a SqlBase database. The PDF files are pretty simple, sometimes only dozens of text lines. Regards, Aruna On

Re: Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread Erick Erickson
Looks like: https://issues.apache.org/jira/browse/SOLR-11728 > On Apr 3, 2019, at 1:09 AM, JiaJun Zhu wrote: > > Hello, > > > I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform > group facet on an IntPointField. Debugging into the source code, the cause is > that

Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Erick Erickson
For a lot of reasons, I greatly prefer to put this work on a client rather than use Solr directly. Here’s a place to get started, it connects to a DB and also scans local file directory for docs to push through (local) Tika and index. So you should be able to modify it relatively easily to get

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
Hi Edwin, I'd like to rule something out. Does your schema define a field "_root_"? If you don't have nested documents then remove it. Its presence adds indexing weight in 8.0 that was not there previously. I'm not sure how much though; I've hoped small but who knows. ~ David Smiley Apache

Re: SolrCloud with separate JAVA instances

2019-04-03 Thread Shawn Heisey
On 4/3/2019 8:16 AM, Bernd Fehling wrote: If I now use the Admin GUI at port 8983 and select "Cloud"->"Graph" I see both collections. Also with Admin GUI at port 7574. And I can select both collections in "Collection Selection" dropdown box. Why and is this how it should be? I thought

Re: SolrCloud with separate JAVA instances

2019-04-03 Thread Erick Erickson
bq. I thought different JAVA instances at different ports are separated by each other? Not at all. If that were true, how would you use more than one physical machine? The combination URL:PORT is, from Solr’s perspective, just some Solr node. There’s no assumption about what machine it’s

SolrCloud with separate JAVA instances

2019-04-03 Thread Bernd Fehling
I have SolrCloud with a collection "test1" with 5 shards 2 replicas across 5 servers. This cloud is started at port 8983 on each server. Now I have a second collection "test2" with 5 shards 1 replica across the same 5 servers. But this second collection is started in separate JAVA instances at

Re: Documentation for Apache Solr 8.0.0?

2019-04-03 Thread Yoann Moulin
Hello, I’m looking for the documentation for the latest release of Solr (8.0) but it looks like it’s not online yet. https://lucene.apache.org/solr/news.html http://lucene.apache.org/solr/guide/ Do you know when it will be available? >>> >>> The Solr

Re: Documentation for Apache Solr 8.0.0?

2019-04-03 Thread Cassandra Targett
The *DRAFT* 8.0 Guide is also available from Jenkins: https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/ Cassandra On Apr 2, 2019, 3:23 AM -0500, Jan Høydahl , wrote: > There is also a *DRAFT* HTML version of the to-be 8.1 guide built by Jenkins, > see >

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
What/where is this benchmark? I recall once Ishan was working with a volunteer to set up something like Lucene has but sadly it was not successful. On Wed, Apr 3, 2019 at 6:04 AM Đạt Cao Mạnh wrote: > Hi guys, > > I'm seeing the same problems with Shalin's nightly indexing benchmark. This >

Spatial Search using two separate fields for lat and long

2019-04-03 Thread Tim Hedlund
Hi all, I'm importing documents (rows in an Excel file) that include latitude and longitude fields. I want to use those two separate fields for searching with a bounding box. Is this possible (not using deprecated LatLonType) or do I need to combine them into one single field when indexing? The
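If combining the two values at index time turns out to be acceptable, one option can be sketched as below: join each row's latitude and longitude into Solr's "lat,lon" point syntax and express the bounding box as a range filter over that single field. This is a minimal sketch, not an answer from the thread; the field name `location_p`, the column names `latitude` and `longitude`, and the sample coordinates are assumptions, and the target field is assumed to be a point type such as `LatLonPointSpatialField`.

```python
def to_latlon(lat, lon):
    """Join separate latitude/longitude values into a "lat,lon" string."""
    lat, lon = float(lat), float(lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"


def bbox_filter(field, min_lat, min_lon, max_lat, max_lon):
    """Build a rectangle filter: field:[lower-left TO upper-right]."""
    lower = to_latlon(min_lat, min_lon)
    upper = to_latlon(max_lat, max_lon)
    return f"{field}:[{lower} TO {upper}]"


# Hypothetical spreadsheet row; the column names are assumptions.
row = {"id": "1", "latitude": "59.3293", "longitude": "18.0686"}

# Document to index, with the two columns merged into one point field.
doc = {"id": row["id"], "location_p": to_latlon(row["latitude"], row["longitude"])}

# Filter query string to pass as the fq parameter.
fq = bbox_filter("location_p", 59.0, 17.5, 60.0, 18.5)
print(doc, fq)
```

The original latitude and longitude columns can still be indexed as separate numeric fields alongside the combined one if they are needed for display or sorting.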

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 18:04 +0800, Zheng Lin Edwin Yeo wrote: > I have tried to set all the docValues in my schema.xml to false and > do the indexing again. > There isn't any difference with the indexing speed as compared to > when we have enabled the docValues. Thank you for sparing me the work.

Re: Solr 7 not removing a node completely due to too small thread pool

2019-04-03 Thread Roger Lehmann
Oh great, thanks for the hint! I've upvoted this issue, since I think it might be worth being able to configure that (rather low) ThreadPool count. On Wed, 3 Apr 2019 at 10:23, Shalin Shekhar Mangar wrote: > Thanks Roger. This was reported earlier but missed our attention. > > The issue is

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi Toke, I have tried to set all the docValues in my schema.xml to false and do the indexing again. There isn't any difference with the indexing speed as compared to when we have enabled the docValues. Seems like the cause of the regression might be somewhere else? Regards, Edwin On Wed, 3 Apr

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Đạt Cao Mạnh
Hi guys, I'm seeing the same problems with Shalin's nightly indexing benchmark. This happened around this period: git log --before=2018-12-07 --after=2018-11-21 On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen wrote: > On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote: > > Yes, I am using

NestPathField

2019-04-03 Thread Vincenzo D'Amore
Hi all, I've found the NestPathField fieldType in the Solr 8.0.0 configuration, but I haven't found anything about it in the documentation. Just curious: does someone have time to share something about it? For example, explain how to use it? Best regards, Vincenzo -- Vincenzo D'Amore

Unexpected docvalues type SORTED_NUMERIC Exception when grouping by a PointField facet

2019-04-03 Thread JiaJun Zhu
Hello, I got an "Unexpected docvalues type SORTED_NUMERIC" exception when I perform group facet on an IntPointField. Debugging into the source code, the cause is that internally the docvalue type for PointField is "NUMERIC" (single value) or "SORTED_NUMERIC" (multi value), while the

Re: Solr 7 not removing a node completely due to too small thread pool

2019-04-03 Thread Shalin Shekhar Mangar
Thanks Roger. This was reported earlier but missed our attention. The issue is https://issues.apache.org/jira/browse/SOLR-11208 On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann wrote: > To be more specific: I currently have 19 collections, where each node has > exactly one replica per collection.

Basic auth and index replication

2019-04-03 Thread Dwane Hall
Hey Solr community. I’ve been following a couple of open JIRA tickets relating to use of the basic auth plugin in a Solr cluster (https://issues.apache.org/jira/browse/SOLR-12584 , https://issues.apache.org/jira/browse/SOLR-12860) and recently I’ve noticed similar behaviour when adding tlog

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote: > Yes, I am using DocValues for most of my fields. So that's a culprit. Thank you. > Currently we can't share the test data yet as some of the records are > sensitive. Do you have any data from CSV file that you can test? Not

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Yes, I am using DocValues for most of my fields. I am using dynamicField, in which I have appended the field name with things like _s, _i, etc in the CSV file. Currently we can't share the test data yet as some of the
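The suffix convention described here can be sketched as a lookup from CSV header names to field types. This is only an illustration: the suffix-to-type mapping below is an assumption modelled on common dynamicField patterns (`*_s`, `*_i`), not the poster's actual schema, and the sample header is hypothetical.

```python
import csv
import io

# Assumed suffix-to-type mapping; the real schema's dynamicField rules may differ.
SUFFIX_TYPES = {"_s": "string", "_i": "pint"}


def field_type(name):
    """Return the assumed type for a field name based on its suffix, or None."""
    for suffix, type_name in SUFFIX_TYPES.items():
        if name.endswith(suffix):
            return type_name
    return None


# Hypothetical CSV with suffixed column names, as described in the message.
sample = io.StringIO("id,title_s,count_i\n1,hello,3\n")
header = next(csv.reader(sample))

# Map each column to the type its suffix would select.
types = {name: field_type(name) for name in header}
print(types)
```

Columns that match no suffix (such as `id` here) would need an explicit field definition in the schema.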

Solr not starting after enabling SSL

2019-04-03 Thread Anchal Sharma2
Hi All, We recently migrated our existing Solr (version 5.3.0) from an AIX server to a Linux-based server, and it works fine over HTTP. RHEL version 7.6, Java version 1.8 (IBM Java). But now, when trying to enable SSL on the same setup, Solr doesn't start. It says "Address already

Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Hello, I got a task to index in Solr 7.71 PDF files which are stored in a SqlBase database. I did half the job: I can index all table fields and search in them, except the field in which the PDF file content is stored. As I am totally new to Solr, I unsuccessfully spent a lot of time

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 10:17 +0800, Zheng Lin Edwin Yeo wrote: > What could be the reason that causes the indexing to be slower in > Solr 8.0.0? As Aroop states there can be multiple explanations. One of them is the change to how DocValues are handled in 8.0.0. The indexing impact should be tiny,