Re: SolR performance problem

2014-01-31 Thread Furkan KAMACI
Hi; Could you give more information about your hardware infrastructure and JVM settings? Thanks; Furkan KAMACI 2014-01-30 MayurPanchal mayur.panc...@silvertouch.com: Hi, I am working on solr 4.2.1 jetty and we are facing some performance issue and heap memory overflow issue as well. So i

Realtimeget SolrCloud

2014-01-31 Thread StrW_dev
Hello, I am currently experimenting to move our Solr instance into a SolrCloud setup. I am getting an error trying to access the realtimeget handlers: HTTP ERROR 404 Problem accessing /solr/collection1/get. Reason: Not Found They work fine in the normal Solr setup. Do I need some changes

Re: Realtimeget SolrCloud

2014-01-31 Thread Rafał Kuć
Hello! Do you have realtime get handler defined in your solrconfig.xml? This part should be present: <requestHandler name="/get" class="solr.RealTimeGetHandler"> <lst name="defaults"> <str name="omitHeader">true</str> <str name="wt">json</str> <str name="indent">true</str> </lst>
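
For readers following this thread, a minimal sketch (not from the original messages) of how a client might build a request URL for the /get handler once it is defined. The base URL, the helper name realtime_get_url and the document ids are assumptions for illustration; the `id` and `ids` parameters are the ones the realtime get handler accepts.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, collection, doc_ids):
    """Build a URL for Solr's /get handler for one or more document ids.

    base_url and collection are placeholders; adjust them to your setup.
    """
    if not doc_ids:
        raise ValueError("at least one document id is required")
    if len(doc_ids) == 1:
        params = {"id": doc_ids[0]}
    else:
        # Several ids travel in one comma-separated `ids` parameter.
        params = {"ids": ",".join(doc_ids)}
    return f"{base_url}/{collection}/get?{urlencode(params)}"


# Hypothetical host and id, for illustration only.
print(realtime_get_url("http://localhost:8983/solr", "collection1", ["ABC123"]))
```

A 404 on a URL of this shape is what the original poster reported before the handler was added to solrconfig.xml.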

Re: Realtimeget SolrCloud

2014-01-31 Thread StrW_dev
That seemed to be the issue. I had several other request handlers as I wasn't using the simple /get, but apparently in SolrCloud this handler must be present in order to use the class RealTimeGetHandler at all. Thank you! -- View this message in context:

Re: Regarding Solr Faceting on the query response.

2014-01-31 Thread Mikhail Khludnev
On Thu, Jan 30, 2014 at 9:35 PM, Kuchekar kuchekar.nil...@gmail.com wrote: docs: [ { id: ABC123, company: [ APPLE ] }, { id: ABC1234, company: [ APPLE ] }, { id: ABC1235, company: [ APPLE ] }, { id: ABC1236, company: [ APPLE ] } ] }, facet_counts: { facet_queries: { p_company:ucsf\n: 1 },

Re: Regarding Solr Faceting on the query response.

2014-01-31 Thread Jérôme Étévé
On 30 January 2014 17:35, Kuchekar kuchekar.nil...@gmail.com wrote: Hi Mikhail, I would like my faceting to run only on my resultset returned as in only on numFound, rather than the whole index. As far as I know, unless you define filter tagging and exclusion, this is the

Re: Realtimeget SolrCloud

2014-01-31 Thread Rafał Kuć
Hello! No problem. Also remember that you need the _version_ field to be present in your schema. -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ That seemed to be the issue. I had several other request

MoreLikeThis

2014-01-31 Thread rubenboada
Hi everybody, I'm working on DSpace 3.2 and I want to change the 'Related Documents' functionality, which is based on Solr MoreLikeThis. Now when I open an item, 'Related Documents' appears below its metadata; it should show other items by the current item's author, but it shows items by another

Re: Realtimeget SolrCloud

2014-01-31 Thread Jack Krupansky
The reason is that although you can configure handlers with any name you want, internal requests to other shards (other Solr servers) will assume that the handlers have the default handler names, like /get. Probably that should be configurable, or have some way of determining what the original

Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-31 Thread svante karlsson
It seems to be faster to first restrict the search space and then do the scoring compared to just using the full query and letting Solr handle everything. For example in my application one of the scoring fields effectively hits 1/12 of the database (a month field) and if we have 100'' items in the

Re: Storing ranges on documents and searching all document with specific value included

2014-01-31 Thread Jack Krupansky
What does your actual query look like? Is it two range queries and an AND? Also, you have spaces in your field names, so that makes it more difficult to write queries since they need to be escaped. -- Jack Krupansky -Original Message- From: Avner Levy Sent: Saturday, January 18,
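
A minimal sketch (not from the thread) of the "two range queries and an AND" shape asked about above, including backslash-escaping of spaces in field names. The field names "range start" and "range end" and both helper names are hypothetical; the poster's real schema is not shown in this snippet.

```python
def escape_field_name(name):
    """Backslash-escape spaces so the query parser reads the name as one token."""
    return name.replace(" ", "\\ ")


def range_contains_query(start_field, end_field, value):
    """Match documents whose stored [start, end] range includes `value`."""
    start = escape_field_name(start_field)
    end = escape_field_name(end_field)
    # start <= value AND end >= value, written as two open-ended range queries.
    return f"{start}:[* TO {value}] AND {end}:[{value} TO *]"


# Hypothetical field names containing spaces.
print(range_contains_query("range start", "range end", 42))
```

Renaming the fields to avoid spaces would make the escaping step unnecessary.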

List and Edit Config Files at Zookeeper from a Client Application

2014-01-31 Thread Furkan KAMACI
Hi; I am developing an application that will be able to list and edit SolrCloud config files at Zookeeper. Basically the operator will be able to see stopwords, synonyms (also elevator). The operator will edit them from my dashboard and these files will be updated at Zookeeper. Currently I use

shard1 gone missing ...

2014-01-31 Thread David Santamauro
Hi, I have a strange situation. I created a collection with 4 nodes (separate servers, numShards=4), I then proceeded to index data ... all has been seemingly well until this morning when I had to reboot one of the nodes. After reboot, the node I rebooted went into recovery mode! This is

Re: shard1 gone missing ...

2014-01-31 Thread Mark Miller
Would probably need to see some logs to have an idea of what happened. Would also be nice to see the after state of zk in a text dump. You should be able to fix it, as long as you have the index on a disk, just make sure it is where it is expected and manually update the clusterstate.json.

Re: shard1 gone missing ...

2014-01-31 Thread Mark Miller
On Jan 31, 2014, at 10:13 AM, David Santamauro david.santama...@gmail.com wrote: Oh, and I'm assuming shard1 is completely corrupt. Seems unlikely by the way. Sounds like what probably happened is that for some reason it thought when you restarted the shard that you were creating it with

Re: shard1 gone missing ...

2014-01-31 Thread Mark Miller
On Jan 31, 2014, at 10:31 AM, Mark Miller markrmil...@gmail.com wrote: Seems unlikely by the way. Sounds like what probably happened is that for some reason it thought when you restarted the shard that you were creating it with numShards=2 instead of 1. No, that’s not right. Sorry. It

Re: JVM heap constraints and garbage collection

2014-01-31 Thread Michael Della Bitta
Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look pretty tasty primarily because of the SSD, and I'll probably push for a switch to those when our reservations run out. http://www.ec2instances.info/ Michael Della Bitta Applications Developer o: +1 646 532 3062

Re: JVM heap constraints and garbage collection

2014-01-31 Thread Joseph Hagerty
Thanks, Shawn. This information is actually not all that shocking to me. It's always been in the back of my mind that I was getting away with something in serving from the m1.large. Remarkably, however, it has served me well for nearly two years; also, although the index has not always been 30GB,

Re: shard1 gone missing ...

2014-01-31 Thread David Santamauro
On 01/31/2014 10:35 AM, Mark Miller wrote: On Jan 31, 2014, at 10:31 AM, Mark Miller markrmil...@gmail.com wrote: Seems unlikely by the way. Sounds like what probably happened is that for some reason it thought when you restarted the shard that you were creating it with numShards=2

Re: shard1 gone missing ...

2014-01-31 Thread David Santamauro
On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. You can follow the progress in the CHANGES file we update for each release. Can I do a

Re: shard1 gone missing ...

2014-01-31 Thread Mark Miller
<solr persistent="false"> You have to set that to true. When a core starts up, it's assigned a coreNodeName. That is persisted in solr.xml. This will happen every time you restart with persistent=false. As far as fixing. Yes, you simply want shard1 and remove the replica info. You would also

Re: shard1 gone missing ...

2014-01-31 Thread Mark Miller
On Jan 31, 2014, at 11:15 AM, David Santamauro david.santama...@gmail.com wrote: On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. You

Re: JVM heap constraints and garbage collection

2014-01-31 Thread Michael Della Bitta
Joseph: Not so much after using some of the settings available on Shawn's Solr Wiki page: https://wiki.apache.org/solr/ShawnHeisey This is what we're running with right now: -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 Michael Della Bitta Applications Developer o:

SolrCloudServer questions

2014-01-31 Thread Software Dev
Can someone clarify what the following options are: - updatesToLeaders - shutdownLBHttpSolrServer - parallelUpdates Also, I remember in older versions of Solr there was an efficient format that was used between SolrJ and Solr that is more compact. Does this still exist in the latest version of

Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Software Dev
Is there a way to disable commit/hard-commit at runtime? For example, we usually have our hard commit and soft-commit set really low but when we do bulk indexing we would like to disable this to increase performance. If there isn't an easy way of doing this would simply pushing a new solrconfig

Re: Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Alexei Martchenko
Why don't you set both solrconfig commits to very high values and issue a commit command in sparse, small updates? I've been doing this for ages and it works perfectly for me. alexei martchenko Facebook http://www.facebook.com/alexeiramone | Linkedin http://br.linkedin.com/in/alexeimartchenko |
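
A minimal sketch (not from the thread) of the batching idea suggested above: send documents in batches and issue one explicit commit at the end instead of relying on frequent auto-commits. It only builds JSON request bodies of the shape Solr's JSON update handler accepts (an array of documents, then a commit command); the helper name update_payloads, the batch size and the sample documents are assumptions, and sending the bodies over HTTP is left out.

```python
import json


def update_payloads(docs, batch_size):
    """Yield JSON request bodies: one per batch of documents, then one commit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(docs), batch_size):
        # Each batch is a JSON array of documents to add.
        yield json.dumps(docs[start:start + batch_size])
    # A single explicit commit once every batch has been sent.
    yield json.dumps({"commit": {}})


# Made-up documents, for illustration only.
docs = [{"id": str(i)} for i in range(5)]
for body in update_payloads(docs, batch_size=2):
    print(body)
```

Whether this beats a short auto-commit interval depends on the workload, so it is worth measuring both before settling on one.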

Re: Regarding Solr Faceting on the query response.

2014-01-31 Thread Kuchekar
Hi Mikhail, The actual result is as follows: facet_counts: { facet_queries: {}, facet_fields: { company: [ Apple, 215, BOSE, 0, Walmart, 0, Oracle, 25, ... ... ... ... Microsoft, 34, ATT, 45 ] }, facet_dates: {}, facet_ranges: {} } The expected result would be

Re: Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Mark Miller
It’s not a good idea to disable hard commit because the transaction can grow without limit in RAM. Also, try some performance tests. I’ve never seen it matter if it’s set to like a minute, both for bulk and NRT. As far as soft commit, you could turn it off and control visibility when adding

Special character search in Solr and boosting without altering the resultset

2014-01-31 Thread abhishek jain
Hi friends, I am facing a strange problem. When I search for a term, e.g. .Net, Solr searches for Net and does not include the '.'. Is dot a special character in Solr? I tried escaping it with a backslash in the URL call to Solr, but to no avail: same result set. Also, is there a way to boost some

Re: Disabling Commit/Auto-Commit (SolrCloud)

2014-01-31 Thread Alexei Martchenko
I didn't mean to disable, just to put some high value there. I have a script that updates my Solr in batches of thousands so I set my commit to 100,000 because when it runs it updates 100,000 records in a short time. The other script updates in batches of hundreds and it's not so fast, so its

Re: SolrCloudServer questions

2014-01-31 Thread Greg Walters
I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore my response. -updatesToLeaders Only send documents to shard leaders while indexing. This saves cross-talk between slaves and leaders which results in more efficient document routing. shutdownLBHttpSolrServer

Re: SolrCloudServer questions

2014-01-31 Thread Mark Miller
On Jan 31, 2014, at 1:56 PM, Greg Walters greg.walt...@answers.com wrote: I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore my response. -updatesToLeaders Only send documents to shard leaders while indexing. This saves cross-talk between slaves and leaders which

RE: Geospatial clustering + zoom in/out help

2014-01-31 Thread Smiley, David W.
Hi Bojan. You've got some good ideas here along the lines of some that others have tried. I've thrown together a page on the wiki about this subject some time ago that I'm sure you will find interesting. It references a relevant stack-overflow post, and also a presentation at DrupalCon

Re: Special character search in Solr and boosting without altering the resultset

2014-01-31 Thread Ahmet Arslan
Hi Abhishek, dot is not a special character. Your field type / analyzer is stripping that character.  Please see similar discussions and alternative solutions. http://search-lucene.com/m/6dbI9zMSob1 http://search-lucene.com/m/Ac71G0KlGz http://search-lucene.com/m/RRD2D1p1mi Ahmet On Friday,

Solr 4.x EdgeNGramFilterFactory and highlighting

2014-01-31 Thread Dmitriy Shvadskiy
Hello, We are using EdgeNGramFilterFactory to provide partial match on the search phrase for type ahead/autocomplete. Field type definition <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter

Re: SolrCloudServer questions

2014-01-31 Thread Software Dev
Which of any of these settings would be beneficial when bulk uploading? On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller markrmil...@gmail.com wrote: On Jan 31, 2014, at 1:56 PM, Greg Walters greg.walt...@answers.com wrote: I'm assuming you mean CloudSolrServer here. If I'm wrong please

Re: SolrCloudServer questions

2014-01-31 Thread Mark Miller
Just make sure parallel updates is set to true. If you want to load even faster, you can use the bulk add methods, or if you need more fine grained responses, use the single add from multiple threads (though bulk add can also be done via multiple threads if you really want to try and push the

Removing last replica from a SolrCloud collection

2014-01-31 Thread David Smiley (@MITRE.org)
Hi, If I issue either a core UNLOAD command, or a collection DELETEREPLICA command, (which both seem pretty much equivalent) it works but if there are no other replicas for the shard, then the metadata for the shard is completely gone in clusterstate.json! That's pretty disconcerting because

Clone (or Restore) Solrcloud

2014-01-31 Thread David Smiley (@MITRE.org)
Hi, I'm attempting to come up with a SolrCloud restore / clone process, either to recover to a known good state or to clone the environment for experimentation. At the moment my process involves either creating a new zookeeper environment or at least deleting the existing Collection so that I

Re: JVM heap constraints and garbage collection

2014-01-31 Thread Erick Erickson
Be a little careful when looking at on-disk index sizes. The *.fdt and *.fdx files are pretty irrelevant for the in-memory requirements. They are just read to assemble the response (usually 10-20 docs). That said, you can _make_ them more relevant by specifying very large document cache sizes.
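
A minimal sketch (not from the thread) of how one might separate the stored-field files mentioned above from the rest of an index directory when estimating memory needs. The helper name index_size_breakdown and the demo file names and sizes are made up; point the function at a real index directory to get real numbers.

```python
import os
import tempfile

# Extensions of the stored-field files named in the message above.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def index_size_breakdown(index_dir):
    """Return (stored_field_bytes, other_bytes) for files directly in index_dir."""
    stored = other = 0
    for entry in os.scandir(index_dir):
        if not entry.is_file():
            continue
        size = entry.stat().st_size
        if os.path.splitext(entry.name)[1] in STORED_FIELD_EXTENSIONS:
            stored += size
        else:
            other += size
    return stored, other


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names and sizes.
    with tempfile.TemporaryDirectory() as index_dir:
        for name, size in (("_0.fdt", 100), ("_0.fdx", 10), ("_0.tim", 7)):
            with open(os.path.join(index_dir, name), "wb") as handle:
                handle.write(b"x" * size)
        print(index_size_breakdown(index_dir))
```

The second number is only a rough starting point for memory sizing, not a precise requirement.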

facet.prefix or separation?

2014-01-31 Thread William Bell
Which would be better for performance to get those facets that begin with A? 1. facet=true&facet.field=conditions&facet.prefix=A 2. When indexing, create a new field conditions_A, and use it: facet=true&facet.field=conditions_A Thoughts? -- Bill Bell billnb...@gmail.com cell 720-256-8076
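
A minimal sketch (not from the thread) that builds the query-string parameters for the two options being compared. The helper names are hypothetical; the parameter names facet, facet.field and facet.prefix are the ones used in the question.

```python
from urllib.parse import urlencode


def prefix_facet_params(field, prefix):
    """Option 1: facet on the full field and filter buckets with facet.prefix."""
    return urlencode([
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),
    ])


def partitioned_facet_params(field, prefix):
    """Option 2: facet on a pre-built per-letter field such as conditions_A."""
    return urlencode([
        ("facet", "true"),
        ("facet.field", f"{field}_{prefix}"),
    ])


print(prefix_facet_params("conditions", "A"))
print(partitioned_facet_params("conditions", "A"))
```

Option 2 moves work to index time and multiplies the number of fields, so the trade-off depends on how many prefixes are needed.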

Re: facet.prefix or separation?

2014-01-31 Thread Alexandre Rafalovitch
I am quite sure that the binary flag will be faster as you will just get a gigantic vector pre-loaded into memory. The problem starts if you are going to have lots of those prefixes. Then, the memory requirements may become an issue. Then, the facet becomes more flexible as it uses the same list

Re: facet.prefix or separation?

2014-01-31 Thread William Bell
Just to be perfectly clear, it is not a binary field. conditions = A west side story conditions = The edge of reason I look for those strings beginning with A and set that in conditions_A: conditions_A = A west side story OK? On Fri, Jan 31, 2014 at 9:29 PM, Alexandre Rafalovitch

Re: facet.prefix or separation?

2014-01-31 Thread Alexandre Rafalovitch
Ok, so you are pre-partitioning the facet field based on initial letter. So all the texts that start from A will go into conditions_A and all the texts that start from C will go into conditions_C. Interesting approach. Ignore whatever I said before. If this does not cause other issues, then it is
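
A minimal sketch (not from the thread) of the pre-partitioning described above, done on the client before a document is sent for indexing. The helper name partition_by_initial is hypothetical, and upper-casing the initial and skipping non-alphabetic initials are assumptions; the sample values come from the earlier message in this thread.

```python
def partition_by_initial(doc, field="conditions"):
    """Return a copy of doc with per-letter fields such as conditions_A added."""
    out = dict(doc)
    for value in doc.get(field, []):
        if not value:
            continue
        initial = value[0].upper()
        # Assumption: only alphabetic initials get their own field.
        if initial.isalpha():
            out.setdefault(f"{field}_{initial}", []).append(value)
    return out


doc = {"id": "1", "conditions": ["A west side story", "The edge of reason"]}
print(partition_by_initial(doc))
```

With documents shaped like this, a click on A in the alpha-span would facet on conditions_A only.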

Re: facet.prefix or separation?

2014-01-31 Thread William Bell
This is the approach for words that begin with using an alpha-span on the site: A B C D E F G ... The user clicks A and I would use conditions_A. On Fri, Jan 31, 2014 at 9:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Ok, so you are pre-partitioning the facet field based on initial

solr joins

2014-01-31 Thread anand chandak
Folks, have a basic question regarding solr join, the wiki, http://wiki.apache.org/solr/Join states : - Fields or other properties of the documents being joined from are not available for use in processing of the resulting set of to documents (ie: you can not return fields in the from