Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-14 Thread Pushkar Raste
Hi Philippa, Try taking a heap dump (when heap usage is high) and then use a profiler to see which objects are taking up most of the memory. I have seen that if you are using faceting/sorting on a large number of documents then fieldCache grows very big and dominates most of the heap. Enabling

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Jeff Wartes
Don’t set solr.data.dir. Instead, set the install dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr I have many solrcloud collections, and separate data/install dirs, and I’ve never had to do anything with manual per-collection or per-replica data dirs. That said,

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Rahul Ramesh
We recently moved data from a magnetic drive to SSD. We run Solr in cloud mode. Only data is stored on the drive; configuration is stored in ZK. We start solr using the -s option specifying the data dir. Command to start solr: ./bin/solr start -c -h -p -z -s We followed the following steps to

Re: Defining SOLR nested fields

2015-12-14 Thread Alessandro Benedetti
Exactly. In Solr there is no concept of "nested fields". But there's the concept of nested documents (via query-time join and index-time (block) join). You can have a "flat" schema which actually will be used to model nested documents at index and query time. There is plenty of documentation

Re: Getting a document version back after updating

2015-12-14 Thread Debraj Manna
Is there any separate api available in solrj 5.2.1 for setting version=true while adding or updating a solr doc? On Dec 13, 2015 8:03 AM, "Debraj Manna" wrote: > Thanks Alex. This is what I was looking for. One more query how to set > this from solrj while calling add()

Re: Highlighting large documents

2015-12-14 Thread Jens Brandt
Hi Edwin, you are limiting the portion of the document analyzed for highlighting in your solrconfig.xml by 100 Thus, snippets are only produced correctly if the query was found in the first 100 characters of the document. If you set this parameter to -1 the original highlighter

Re: Getting a document version back after updating

2015-12-14 Thread Mikhail Khludnev
what about UpdateRequest().getParam().add("versions","true") ? On Mon, Dec 14, 2015 at 1:15 PM, Debraj Manna wrote: > Is there any separate api available in solrj 5.2.1 for setting version=true > while adding or updating a solr doc? > On Dec 13, 2015 8:03 AM, "Debraj
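The same request parameter can also be passed on a plain HTTP update call. The following is a minimal sketch, assuming a Solr instance at a hypothetical base URL and a hypothetical collection name (neither appears in the thread); it only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; they are not from the thread.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_update_url(base, collection, versions=True):
    """Return an /update URL, optionally asking Solr to echo document versions."""
    params = {"wt": "json"}
    if versions:
        # "versions=true" is the request parameter discussed in the thread.
        params["versions"] = "true"
    return f"{base}/{collection}/update?{urlencode(params)}"


url = build_update_url(SOLR_BASE, COLLECTION)
print(url)
```

Posting documents to a URL of this shape should return the assigned _version_ values alongside the usual response header.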

Re: Block Join query

2015-12-14 Thread Mikhail Khludnev
In addition to the link in the previous response, http://blog.griddynamics.com/2013/09/solr-block-join-support.html provides an example of such combination. From my experience fq doesn't participate in highlighting or scoring. On Mon, Dec 14, 2015 at 2:45 PM, Novin Novin

solr cloud invalid shard/collection configuration

2015-12-14 Thread ig01
I have an existing solrcloud 4.4 configured with zookeeper. The current setting is 3 shards, each shard has a leader and replica. All are mapped to the same collection1. {"collection1":{ "shards":{ "shard1":{ "range":"8000-d554", "state":"active", "replicas":{

Re: Block Join query

2015-12-14 Thread Novin Novin
Hi Mikhail, I'm having a bit of a problem constructing the query for solr when trying to use a block join query. As you said, I can't use + or in front of the block join query, so I have to put {!parent which="doctype:200"} in front, and after this, all fields are child document, so

Re: Security Problems

2015-12-14 Thread Jan Høydahl
> 1) "read" should cover all the paths This is very fragile. If all paths were closed by default, forgetting to configure a path would not result in a security breach like today. /Jan

Re: pf2 pf3 and stopwords

2015-12-14 Thread Binoy Dalal
Moreover, the stopword 'de' will work on your queries and not on your documents, meaning if you query 'Gare de Saint Lazare', the terms actually searched for will be Gare, Saint and Lazare; 'de' will be filtered out. On Mon, Dec 14, 2015 at 8:49 PM Binoy Dalal wrote: > This

Re: pf2 pf3 and stopwords

2015-12-14 Thread Binoy Dalal
This isn't a bug. During pf3 matching, since your query has only three tokens, the entire query will be treated as a single phrase, and with slop = 0, any word that comes in the middle of your query - 'de' in this case will cause the phrase to not be matched. If you want to get around this, try
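The effect described here can be illustrated with a toy position-based phrase matcher. This is a sketch under stated assumptions, not Lucene's implementation: the `analyze` and `exact_phrase_match` helpers are hypothetical, and the model assumes that a removed stopword still occupies a token position in the indexed document.

```python
STOPWORDS = {"de"}


def analyze(text, keep_gaps=True):
    """Return (term, position) pairs with stopwords removed.

    With keep_gaps=True a removed stopword still consumes a position,
    which is the assumption this toy model makes about the index.
    """
    tokens = []
    pos = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if keep_gaps:
                pos += 1
            continue
        tokens.append((word, pos))
        pos += 1
    return tokens


def exact_phrase_match(doc_tokens, query_tokens):
    """True if the query terms occur at the same relative positions (slop 0)."""
    doc_positions = {}
    for term, pos in doc_tokens:
        doc_positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query_tokens[0]
    for start in doc_positions.get(first_term, ()):
        if all(start + (qpos - first_pos) in doc_positions.get(term, ())
               for term, qpos in query_tokens):
            return True
    return False


query = analyze("Gare Saint Lazare")
doc_with_gap = analyze("Gare de Saint Lazare")
doc_without_gap = analyze("Gare de Saint Lazare", keep_gaps=False)

print(exact_phrase_match(doc_with_gap, query))
print(exact_phrase_match(doc_without_gap, query))
```

In this model the phrase fails against the document that keeps the gap left by 'de' and succeeds once the gap is removed, which matches the behaviour reported in the thread.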

re: nested fields

2015-12-14 Thread Rick Leir
On Sun, Dec 13, 2015 at 8:26 PM, wrote: > > I want to define nested fields in SOLR using schema.xml. Us too (using Solr 5.3.1). And doco is not jumping out at me. My approach is (please suggest a better way) 1/ create a blank core 2/ add a few nested

Re: Block Join query

2015-12-14 Thread Novin Novin
Thanks Man. On Mon, 14 Dec 2015 at 12:19 Mikhail Khludnev wrote: > In addition to the link in the previous response, > http://blog.griddynamics.com/2013/09/solr-block-join-support.html provides > an example of such combination. From my experience fq doesn't

RE: how to secure standalone solr

2015-12-14 Thread Davis, Daniel (NIH/NLM) [C]
Wait a second. There are other sorts of ways to secure Solr that don't work with any sort of role-based security control. What I do is place a reverse-proxy in front of Apache Solr on port 80, and have that reverse proxy use CAS authentication. I also have a list of "valid-users" who may

Memory leak in SolrCloud 4.6

2015-12-14 Thread Mark Houts
I am running a SolrCloud 4.6 cluster with three solr nodes and three external zookeeper nodes. Each Solr node has 12GB RAM. 8GB RAM dedicated to the JVM. When solr is started it consumes barely 1GB but over the course of 36 to 48 hours physical memory will be consumed and swap will be used. The

Re: Providing own _version field in solr doc

2015-12-14 Thread Debraj Manna
Can I somehow get "documentVersion" for each doc back in the Update Response like the way we get _version back in Optimistic Concurrency when we set "version=true" in the update request? On Dec 14, 2015 10:58 PM, "Chris Hostetter" wrote: > > The _version_ field used to

Partial sentence match with block join

2015-12-14 Thread Yangrui Guo
Hello I've been using 5.3.1. I would like to enable this feature: when user enters a query, the results should include documents that also partially match the query. For example, the document is Apple Company and user query is "apple computer company". Though the document is missing the term

Is DIH going to be removed from Solr future versions?

2015-12-14 Thread Anil Cherian
Dear Team, I use DIH extensively and even wrote my own custom transformers in some situations. Recently during an architecture discussion one of my team members told me that Solr is going to remove DIH from its future versions. Is that true? Also is using DIH for say 2 or 3 million docs a good

Re: solr cloud invalid shard/collection configuration

2015-12-14 Thread ig01
Hi, thanks for the answer. We installed solr with the solr.cmd -e cloud utility that comes with the installation. The names of shards are odd because in this case after the installation we've migrated an old index from our other environment (which is solr single node) and split it with Collection

pf2 pf3 and stopwords

2015-12-14 Thread elisabeth benoit
Hello, I am using solr 4.10.1. I have a field with stopwords, and I use pf2 pf3 on that field with a slop of 0. If the request is "Gare Saint Lazare", and I have a document "Gare de Saint Lazare", "de" being a stopword, this document doesn't get the pf3 boost, because of "de". I was

[ANNOUNCE] Apache Solr 5.4.0 released

2015-12-14 Thread Upayavira
14 December 2015, Apache Solr™ 5.4 available Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g.,

Re: Security Problems

2015-12-14 Thread Noble Paul
". If all paths were closed by default, forgetting to configure a path would not result in a security breach like today." But it will still mean that unauthorized users are able to access, like guest being able to post to "/update". Just authenticating is not enough without proper authorization

Re: Solr5.3.1 solrcloud Enabling Basic AUthentication

2015-12-14 Thread Noble Paul
You don't need to submit a sha256, Solr will do it itself. Just use the provided commands; please refer to this: https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin On Mon, Dec 14, 2015 at 6:56 AM, soledede_w...@ehsy.com < soledede_w...@ehsy.com> wrote: > I want to restrict

Re: Help Indexing Large File

2015-12-14 Thread Erick Erickson
Well, this usually means the maximum packet size has been exceeded, there are several possibilities here that I'm going to skip over because I have to ask the purpose of indexing a 5G file. Indexing such a huge file has several problems from a user's perspective: 1> assuming the bulk of it is

Providing own _version field in solr doc

2015-12-14 Thread Debraj Manna
We have a use case in which there are multiple clients writing concurrently to solr. Each doc has a 'timestamp' field which indicates when the doc was generated. We also have to ensure that an old doc doesn't overwrite a newer doc in solr. So to achieve this we were thinking if

Best practice for incremental Data Import Handler

2015-12-14 Thread Gian Maria Ricci - aka Alkampfer
Hi, I just want some feedback on best practice to run incremental DIH. Over the last few years I always preferred to have a dedicated application that pushes data into ElasticSearch / Solr, but now I have a situation where we are forced to use DIH. I have several SQL Server databases with a

Re: Help Indexing Large File

2015-12-14 Thread Toke Eskildsen
Antelmo Aguilar wrote: > I am trying to index a very large file in Solr (around 5GB). However, I >get out of memory errors using Curl. I tried using the post script and I > had some success with it. After indexing several hundred thousand records > though, I got the

Re: Providing own _version field in solr doc

2015-12-14 Thread Andrea Gazzarini
Hi Debraj, I think this nice article [1] from Yonik could be helpful. Andrea [1] http://yonik.com/solr/optimistic-concurrency/ 2015-12-14 18:17 GMT+01:00 Debraj Manna : > We have a use case in which there are multiple clients writing concurrently > to solr. Each of

Re: Defining SOLR nested fields

2015-12-14 Thread Tom Evans
On Sun, Dec 13, 2015 at 6:40 PM, santosh sidnal wrote: > Hi All, > > I want to define nested fields in SOLR using schema.xml. we are using Apache > Solr 4.7.0. > > i see some links which says how to do, but not sure how can i do it in > schema.xml >

Re: Providing own _version field in solr doc

2015-12-14 Thread Alexandre Rafalovitch
At first glance, this sounds like a perfect match for https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints Just make sure your "timestamps" are truly atomic and not local clock-based. The drift could cause
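The idea behind document-centric versioning can be sketched with a small in-memory model. This is not Solr code: the `VersionedStore` class is hypothetical, the user-supplied version is assumed to live in a "timestamp" field as in the original question, and rejecting an equal timestamp is an assumption of this sketch.

```python
class VersionedStore:
    """Toy store that keeps only the newest version of each document."""

    def __init__(self, version_field="timestamp"):
        self.version_field = version_field
        self._docs = {}

    def add(self, doc):
        """Store doc unless an equal or newer one exists; return True if stored."""
        current = self._docs.get(doc["id"])
        if current is not None and doc[self.version_field] <= current[self.version_field]:
            return False  # stale update: the existing document is kept
        self._docs[doc["id"]] = doc
        return True

    def get(self, doc_id):
        return self._docs.get(doc_id)


store = VersionedStore()
first = store.add({"id": "a", "timestamp": 200, "value": "new"})
stale = store.add({"id": "a", "timestamp": 100, "value": "old"})
print(first, stale, store.get("a")["value"])
```

The caveat about clock drift applies to this model as well: if two writers disagree about the time, the comparison can keep the wrong document.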

Help Indexing Large File

2015-12-14 Thread Antelmo Aguilar
Hello, I am trying to index a very large file in Solr (around 5GB). However, I get out of memory errors using Curl. I tried using the post script and I had some success with it. After indexing several hundred thousand records though, I got the following error message: *SimplePostTool: FATAL:

Re: Providing own _version field in solr doc

2015-12-14 Thread Chris Hostetter
The _version_ field used for optimistic concurrency can't be user supplied -- it's not just a record of the *document's* version, but actually a record of the *update command* version -- so even deleteByQuery commands have one -- and the order must (internally) increase across all types of

Re: solr cloud invalid shard/collection configuration

2015-12-14 Thread Erick Erickson
On a quick glance those look OK, what commands did you use _exactly_ to create your new collection? The names are a bit odd and it's not clear how they could have gotten that way. how many documents have you tried to index to your new collection? Any errors in the logs? And how many documents are

Re: SOLR-7996

2015-12-14 Thread Upayavira
On Mon, Dec 14, 2015, at 06:20 PM, Jamie Johnson wrote: > Has anyone looked at this issue? I'd be willing to take a stab at it if > someone could provide some high level design guidance. This would be a > critical piece preventing us from moving to version 5. Just start working on it, Jamie.

How for distributed search only log collective search response

2015-12-14 Thread Koorosh Vakhshoori
In my use case, I have a number of shards where a query would run as distributed search. I am not using Solr Cloud, I have just a Solr server. Now, when the search runs, I see one entry for each shard query as well as the final collective search query response. As a result, I am ending

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Shawn Heisey
On 12/14/2015 10:49 AM, Tom Evans wrote: > When I tried this in SolrCloud mode, specifying > "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine > for the first collection, but then the second collection tried to use > the same directory to store its index, which obviously failed.

Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Tom Evans
Hi all We're currently in the process of migrating our distributed search running on 5.0 to SolrCloud running on 5.4, and setting up a test cluster for performance testing etc. We have several cores/collections, and in each core's solrconfig.xml, we were specifying an empty , and specifying the

SOLR-7996

2015-12-14 Thread Jamie Johnson
Has anyone looked at this issue? I'd be willing to take a stab at it if someone could provide some high level design guidance. This would be a critical piece preventing us from moving to version 5. Jamie

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Tom Evans
On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey wrote: > On 12/14/2015 10:49 AM, Tom Evans wrote: >> When I tried this in SolrCloud mode, specifying >> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine >> for the first collection, but then the second

Re: Help Indexing Large File

2015-12-14 Thread Jack Krupansky
What is the nature of the file? Is it Solr XML, CSV, PDF (via Solr Cell), or... what? If a PDF, maybe it has lots of hi-resolution images. If so, you may need to strip out the images and just send the text, which would be a lot smaller. For example, you could run Tika locally to extract the text

RE: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-12-14 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Anshum and Nobel, I've downloaded 5.4, and this seems to be working so far Thanks again -Original Message- From: Anshum Gupta [mailto:ans...@anshumgupta.net] Sent: Tuesday, December 01, 2015 12:52 AM To: solr-user@lucene.apache.org Subject: Re: Re:Re: Implementing security.json is

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Erick Erickson
Currently, it'll be a little tedious but here's what you can do (going partly from memory)... When you create the collection, specify the special value EMPTY for createNodeSet (Solr 5.3+). Use ADDREPLICA to add each individual replica. When you do this, you can add a dataDir for each individual
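The two steps described above map onto two Collections API calls. The following is a minimal sketch that only builds the request URLs; the admin URL, collection name, shard name, node name, and data directory are hypothetical placeholders, and the parameter names follow the description in the message, which was given partly from memory.

```python
from urllib.parse import urlencode

# Hypothetical admin endpoint for illustration only.
ADMIN = "http://localhost:8983/solr/admin/collections"


def create_empty_collection_url(name, num_shards):
    """CREATE a collection with no replicas placed (createNodeSet=EMPTY)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "createNodeSet": "EMPTY",
    }
    return ADMIN + "?" + urlencode(params)


def add_replica_url(collection, shard, node, data_dir):
    """ADDREPLICA for one shard on one node with an explicit dataDir."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
        "dataDir": data_dir,
    }
    return ADMIN + "?" + urlencode(params)


create_url = create_empty_collection_url("coll1", 2)
replica_url = add_replica_url("coll1", "shard1", "host1:8983_solr", "/mnt/solr/coll1_shard1")
print(create_url)
print(replica_url)
```

One ADDREPLICA call is needed per replica, which is the tedious part mentioned above.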

Re: how to secure standalone solr

2015-12-14 Thread Ishan Chattopadhyaya
Hi Daniel, That sounds good. It is a custom solution, which is a way to secure just about any server. I think Noble's point was about out of the box, community supported, way of securing Solr. Regards, Ishan On Mon, Dec 14, 2015 at 9:26 PM, Davis, Daniel (NIH/NLM) [C] < daniel.da...@nih.gov>