RE: out of memory during indexing due to large incoming queue

2013-06-03 Thread Yoni Amir
Solrconfig.xml - http://apaste.info/dsbv Schema.xml - http://apaste.info/67PI This solrconfig.xml file has optimization enabled. I had another file which I can't locate at the moment, in which I defined a custom merge scheduler in order to disable optimization. When I say 1000 segments, I

Solr + Groovy

2013-06-03 Thread Achim Domma
Hi, I have some query building and result processing code, which is currently running as a normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would

how are you handling killer queries?

2013-06-03 Thread Bernd Fehling
How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) is trying to do its best, I sometimes see stupid queries in my logs with extremely long query times. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this

Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Thanks for your answer. Can you please elaborate on "mssql text searching is pretty primitive compared to Solr" (a link or anything)? Thanks. On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote: 1 Maybe, maybe not. mssql text searching is pretty primitive compared to

Re: Removing a single value from a multiValue field

2013-06-03 Thread Dotan Cohen
On Thu, May 30, 2013 at 5:01 PM, Jack Krupansky j...@basetechnology.com wrote: You gave an XML example, so I assumed you were working with XML! Right, I did give the output as XML. I find XML to be a great document markup language, but a terrible command format! Mostly, due to (mis-)use of the

/non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
Hi, I am constantly getting this error in my solr log: Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning). Anyone got any idea on how to solve

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello! You should remove that entry from your solrconfig.xml file. It is something like this: <lib dir="/non/existent/dir/yields/warning" /> -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
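A quick way to spot entries like this is to list every `<lib dir="...">` in solrconfig.xml whose directory does not exist. The following is a minimal Python sketch, not a Solr tool: the function name `missing_lib_dirs` is hypothetical, and it assumes that relative `dir` values resolve against the core's instance directory, as the resolved path in the warning suggests.

```python
import os
import xml.etree.ElementTree as ET


def missing_lib_dirs(solrconfig_path, instance_dir):
    """Return the <lib dir="..."> values whose directory does not exist.

    Assumption: relative dirs resolve against the core's instance
    directory; absolute dirs are checked as-is.
    """
    root = ET.parse(solrconfig_path).getroot()
    missing = []
    # iter() walks the whole tree, so <lib> elements at any depth are found.
    for lib in root.iter("lib"):
        d = lib.get("dir")
        if d is None:
            continue  # <lib path="..."> entries name a single jar; skipped here
        resolved = d if os.path.isabs(d) else os.path.join(instance_dir, d)
        if not os.path.isdir(resolved):
            missing.append(d)
    return missing
```

Anything it reports can then be removed or commented out of solrconfig.xml.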

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
ok thanks :) But why was it there anyway? I mean, the comments say: "If a 'dir' option (with or without a regex) is used and nothing is found that matches, a warning will be logged." So it looks like a kind of exception handling or logging for libs not found... so shouldn't this folder actually

HostPort attribute of core tag in solr.xml

2013-06-03 Thread Prathik Puthran
Hi, I am not very sure what the hostPort attribute in the core tag of solr.xml means. Can someone please let me know? Thanks, Prathik

Constant score for more like this reference document

2013-06-03 Thread Achim Domma
I call the mlt handler using a query which searches for a certain document (?q=id:some_document_id). The reference document is included in the result and the score is also returned. I found out that the score is fixed, independent of the document. So for each document id I get the same score.

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello! That's a good question. I suppose it's there to show users how to set up a custom path to libraries. -- Regards, Rafał Kuć

Re: How can a Tokenizer be CoreAware?

2013-06-03 Thread Michael Sokolov
Benson, I think the idea is that Tokenizers are created as needed (from the TokenizerFactory), while those other objects are singular (one created for each corresponding stanza in solrconfig.xml). So Tokenizers should be short-lived; they'll be cleaned up after each use, and the assumption is

Re: Solr + Groovy

2013-06-03 Thread Michael Sokolov
On 6/3/13 3:07 AM, Achim Domma wrote: Hi, I have some query building and result processing code, which is currently running as normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a

Re: Reindexing strategy

2013-06-03 Thread Dotan Cohen
On Fri, May 31, 2013 at 3:57 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: On UNIX platforms, take a look at vmstat for basic I/O measurement, and iostat for more detailed stats. One coarse measurement is the number of blocked/waiting processes - usually this is due to I/O

SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Hi, I'm seeing really slow query times, 7-25 seconds, when I run a simple filter query that uses my SpatialRecursivePrefixTreeFieldType field. My index is about 30k documents. Prior to adding the spatial field, the on-disk size was about 100 MB, so it's a really tiny index. Once I add the spatial

ContributorsGroup

2013-06-03 Thread Emrah Kara
Hi, Could you please add EmrahKara to ContributorsGroup in solr wiki?

Re: ContributorsGroup

2013-06-03 Thread Erick Erickson
Done, looking forward to your contributions! Erick

Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Also, here is a sample query and the debugQuery output: fq={!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1) In case the formatting is bad, here is a raw paste of the debugQuery: http://pastie.org/pastes/872/text?key=ksjyboect4imrha0rck8sa <?xml version="1.0" encoding="UTF-8"?>

Re: Estimating the required volume to

2013-06-03 Thread Erick Erickson
Here's a link to various transformations you can do while indexing and searching in Solr: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Consider: stemming, ngrams, WordDelimiterFilterFactory, ASCIIFoldingFilterFactory, phrase queries, boosting, synonyms, blah blah blah. You can't do

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
Hi, but the path looks like it shows how to set up a nonexistent-lib warning... :D

Re: FieldCache insanity with field used as facet and group

2013-06-03 Thread Elodie Sannier
I'm reproducing the problem with the 4.2.1 example with 2 shards. 1) started up solr shards, indexed the example data, and confirmed empty fieldCaches [sanniere@funlevel-dx example]$ java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar

Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
Hi, I am importing multiple tables (by join) into Solr using DIH. All is set, except for one confusion: what to do with *uniqueKey* in the schema? When I had only one table, it was fine. Now how do I put two uniqueKeys (each from a different table)? For example: <uniqueKey>table1_id</uniqueKey>

Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Hi, Thanks for your answer. I want to refer to your message, because I am trying to choose the right tool. 1. Regarding stemming: I am running this in MS SQL: SELECT * FROM sys.dm_fts_parser ('FORMSOF(INFLECTIONAL,provide)', 1033, 0, 0) and I receive group_id phrase_id occurrence special_term

Re: how are you handling killer queries?

2013-06-03 Thread Shawn Heisey
On 6/3/2013 2:39 AM, Bernd Fehling wrote: How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) is trying to do its best, I sometimes see stupid queries in my logs with extremely long query times. Example:

Re: HostPort attribute of core tag in solr.xml

2013-06-03 Thread Shawn Heisey
On 6/3/2013 3:16 AM, Prathik Puthran wrote: I am not very sure what the hostPort attribute in core tag of solr.xml mean. Can someone please let me know? This only has meaning if you are using SolrCloud. This is how each Solr server in the cloud informs the cloud what port it is using.

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Shawn Heisey
On 6/3/2013 5:58 AM, Raheel Hasan wrote: but the path looks like it shows how to setup non existent lib warning... :D The reason for its existence is encoded in its name. A nonexistent path results in a warning. It's a way to illustrate to a novice what happens when you have a non-fatal

Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Eric Wilson
I would like to have the min-match set differently for different fields in my dismax handler. Is this possible?

Re: how are you handling killer queries?

2013-06-03 Thread Bernd Fehling
Hi Shawn, well, the user is the world and the servers have enough capacity. So it's nothing really to worry about. OK, I could raise the timeout from the standard 60 to 90, 120 or even 180 seconds. I just wanted to know how other Solr developers handle this. The technical question, where is the difference

Re: Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
Hi, Thanks for the replies. Actually, I had only a small confusion: from table_1 I got key_1, and using this I join into table_2. But table_2 also gave another key, key_2, which is needed for joining with table_3. So for table_1 and table_2 it's obviously just fine... but what will happen when table_3 is

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
ok fantastic... now I will comment it out to be sure. Thanks a lot! Regards, Raheel

Re: how are you handling killer queries?

2013-06-03 Thread Shawn Heisey
On 6/3/2013 8:43 AM, Bernd Fehling wrote: Hi Shawn, well, the user is the world and the servers have enough capacity. So it's nothing really to worry about. OK, I could raise the timeout from the standard 60 to 90, 120 or even 180 seconds. I just wanted to know how other Solr developers handle this. The

Re: Multitable import - uniqueKey

2013-06-03 Thread Jack Krupansky
Same answer. Whether it is 2, 3, 10 or 1000 tables, you, the data architect, must decide how to uniquely identify Solr documents. In general, when joining n tables, combine the n keys into one composite key. Either do it on the SQL query side, or with a Solr update request processor. -- Jack
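The composite-key idea can be sketched in Python: join the per-table keys into one string for the single uniqueKey field. The function name and the `-` separator are assumptions; rejecting a missing key, or a key that already contains the separator, is a design choice that keeps two different row combinations from collapsing onto the same value. The same concatenation could just as well be done in the DIH SQL query.

```python
def composite_key(*parts, sep="-"):
    """Join per-table keys (e.g. key_1, key_2, key_3) into one uniqueKey value.

    Raises ValueError if a part is missing or contains the separator,
    since either would let two different rows share the same key.
    """
    values = []
    for p in parts:
        if p is None:
            raise ValueError("every table key must be present")
        s = str(p)
        if sep in s:
            raise ValueError(f"key part {s!r} contains separator {sep!r}")
        values.append(s)
    return sep.join(values)


# Hypothetical row joined from table_1, table_2 and table_3.
print(composite_key(1, 42, 7))  # prints: 1-42-7
```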

RE: Spell Checker (DirectSolrSpellChecker) correct settings

2013-06-03 Thread Dyer, James
My first guess is that no documents match the query "provinical court". Because you have spellcheck.maxCollationTries set to a non-zero value, it will not return these as collations unless the correction will return hits. You can test my theory out by removing spellcheck.maxCollationTries from

Re: how are you handling killer queries?

2013-06-03 Thread Jack Krupansky
There are two radically distinct use cases: 1. Consumers on the open Internet. They do stupid things. Give them a very constrained search experience, enforced with query preprocessing. Maybe give them only dismax queries. 2. Professional power users. They typically have credentials for using
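For the first use case, the query preprocessing could look something like this minimal Python sketch, run on the client before the request reaches Solr. Everything in it is a hypothetical policy rather than a Solr feature: the 10-term cap, dropping bare and/or/not words (as in Bernd's example), and escaping query-syntax characters so they are matched literally.

```python
import re

MAX_TERMS = 10  # hypothetical cap; tune for your own traffic

# Characters with special meaning in the Lucene/Solr standard query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')
_BOOLEAN_WORDS = {"and", "or", "not"}


def preprocess(raw, max_terms=MAX_TERMS):
    """Turn free text from an anonymous user into a constrained query string.

    - splits on whitespace and drops empty terms
    - drops bare boolean words, which such users rarely mean as operators
    - escapes query-syntax characters so they are matched literally
    - keeps at most max_terms terms
    """
    terms = [t for t in re.split(r"\s+", raw) if t]
    terms = [t for t in terms if t.lower() not in _BOOLEAN_WORDS]
    escaped = [_SPECIAL.sub(r"\\\1", t) for t in terms]
    return " ".join(escaped[:max_terms])
```

The result would then typically be sent through a dismax-style handler, in line with the suggestion above.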

Re: Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
ok. But do we need it? That's what I am confused about. Should one key from table_1 pull all the data in the relationship as they were inserted?

Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jack Krupansky
No, but you can with the LucidWorks Search query parser: f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2 See: http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries -- Jack Krupansky

Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jason Hellman
Well, there is a hack(ish) way to do it: _query_:"{!type=edismax qf='someField' v='$q' mm=100%}" This is clearly not a solrconfig.xml setting, but part of your query string, using LocalParams behavior. This is going to get really messy if you have plenty of fields you'd like to search, where
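The nested-query trick can be generated per field. A Python sketch, assuming the user's text is sent once in a separate request parameter (hypothetically named `qq`) and pulled into each nested query with `v=$qq`; dereferencing `$q` from inside `q` would point the nested queries back at the outer query itself. Each field becomes its own nested BooleanQuery carrying its own mm, which fits the later point that mm belongs to a whole BooleanQuery. Joining the clauses with OR is an assumption.

```python
def per_field_mm_params(user_query, mm_by_field):
    """Build request params that apply a different mm to each field.

    Each field gets its own nested edismax query, so each can carry its
    own mm. The user's text travels once, in the hypothetical "qq" param,
    and is dereferenced with v=$qq instead of being pasted into q.
    """
    clauses = [
        f'_query_:"{{!type=edismax qf={field} mm={mm} v=$qq}}"'
        for field, mm in mm_by_field.items()
    ]
    return {"q": " OR ".join(clauses), "qq": user_query}


params = per_field_mm_params("cat dog fox", {"f1": "50%", "f2": "2"})
print(params["q"])
```

The returned dict can be URL-encoded as-is (for example with `urllib.parse.urlencode`) and sent to the select handler.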

Re: updating docs in solr cloud hangs

2013-06-03 Thread Yago Riveiro
Hi, My cluster hangs again running an update process; the HTTP POST request was aborted because of a timeout error. After the hang, I couldn't do more updates without restarting the cluster. I could see this error in the node's log after killing it. It's as if Solr waits for the update response forever …

RE: Spell Checker (DirectSolrSpellChecker) correct settings

2013-06-03 Thread Dyer, James
For each of the 4 cases listed below, can you give your query request string (q=...&fq=...&qt=...&etc) and also the spellchecker output? James Dyer Ingram Content Group (615) 213-4311

Re: how are you handling killer queries?

2013-06-03 Thread Roman Chyla
I think you should take a look at the TimeLimitingCollector (it is also used inside SolrIndexSearcher). My understanding is that it will stop your server from consuming unnecessary resources. --roman

Solr: separating index and storage

2013-06-03 Thread Sourajit Basak
Consider the following use case. Certain words are extracted from a document and indexed. The exact sentence containing the word cannot be stored alongside the extracted word because of the volume at which the documents grow. How can the index and the (let's call them) doc servers be separated? An

Re: Solr query performance tool

2013-06-03 Thread bbarani
You can use this tool to analyze the logs: https://github.com/dfdeshom/solr-loganalyzer We use SolrMeter for performance and stress testing: https://code.google.com/p/solrmeter/

Re: how are you handling killer queries?

2013-06-03 Thread Jack Krupansky
There is the timeAllowed parameter: http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed -- Jack Krupansky
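A minimal Python sketch of using timeAllowed from a client: build the request with a time limit in milliseconds, then check whether the result was cut short. It assumes Solr reports a cut-off search as `partialResults` in the response header; `select_url`, `is_partial` and the base URL are illustrative names, not part of any Solr client. Since timeAllowed relies on the TimeLimitingCollector, it probably bounds only the document-collection phase, so it may not stop every slow query.

```python
from urllib.parse import urlencode


def select_url(base, q, time_allowed_ms):
    """Build a /select URL asking Solr to stop collecting after time_allowed_ms."""
    params = {"q": q, "timeAllowed": time_allowed_ms, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def is_partial(response):
    """True if the (parsed JSON) response says the time limit cut the search short."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))
```

Partial results can then be shown with a note, or the query rejected, rather than raising the global timeout.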

Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-03 Thread SandeepM
Hi, Using the same schema for both Solr 3.5 and Solr 4.2.1 and posting the same data to both servers, the memory requirements seem to have gone up sharply during request handling. Requests come in at around 200 QPS. Document sizes are very large, but that did not seem to be a problem

Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jack Krupansky
Also, just to be clear, MM (minMatch) is not an option for a field but for a full BooleanQuery. I mean, you can't have two different MM values within the same BooleanQuery, except with nested BooleanQueries, where each BQ has its own MM. -- Jack Krupansky

Re: Disable all caches in solr

2013-06-03 Thread bbarani
You can also check out this link. http://lucene.472066.n3.nabble.com/Is-there-a-way-to-remove-caches-in-SOLR-td4061216.html#a4061219

Re: Solr + Groovy

2013-06-03 Thread Achim Domma
Looks interesting, but it's just for the UpdateHandler, right? Does a similar handler for searching already exist? Achim On 03.06.2013 at 17:22, Jack Krupansky wrote: Check out the support for external scripting of update request processors:

Re: Solr + Groovy

2013-06-03 Thread Jack Krupansky
Sorry about that. Unfortunately, scripting is only on the update side. But I imagine that a lot of the logic could be repurposed for the query side. -- Jack Krupansky

Re: Solr + Groovy

2013-06-03 Thread Erik Hatcher
Yeah, it's currently just for the update side of things. But this issue is open https://issues.apache.org/jira/browse/SOLR-3669 and assigned to me, for one of these days. I set it for my 5.0 radar. Certainly anyone that wants to make this happen sooner than I maybe will possibly hopefully

Re: Dynamic Indexing using DB and DIH

2013-06-03 Thread Shawn Heisey
On 6/3/2013 12:35 PM, PeriS wrote: I noticed the delta-import is creating a new indexed entry on top of the existing one..is that normal? Not sure what you are asking here, so I'll give an answer to the question I think you're asking: If you have a uniqueKey defined in your schema, then

Re: Dynamic Indexing using DB and DIH

2013-06-03 Thread PeriS
Shawn, You got the point; I do have the unique key defined, but for some reason, when I run the delta-import, a new entry is created for the same record with a new unique key. It's almost as if it doesn't detect the existing record. On Jun 3, 2013, at 3:51 PM, Shawn Heisey
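The symptom of a new entry with a new unique key on every delta-import is what you would expect if the uniqueKey value is regenerated on each import instead of being derived from the database row. A toy model of Solr's overwrite rule in plain Python (no Solr involved); the field names and the `row-` prefix are hypothetical.

```python
import uuid


def apply_adds(store, docs, key_field="id"):
    """Toy model of Solr adds: a doc whose uniqueKey is already present
    replaces the stored one; any other key adds a new document."""
    for doc in docs:
        store[doc[key_field]] = doc
    return store


def run_imports(make_key, row, times=2):
    """Import the same database row several times, keying it with make_key."""
    store = {}
    for _ in range(times):
        apply_adds(store, [{"id": make_key(row), **row}])
    return store


row = {"db_id": 17, "title": "some title"}

# Key derived from the database primary key: re-imports overwrite.
stable = run_imports(lambda r: f"row-{r['db_id']}", row)

# Key regenerated on every import (here a random UUID): each import adds a copy.
regenerated = run_imports(lambda r: str(uuid.uuid4()), row)

print(len(stable), len(regenerated))  # prints: 1 2
```

So the first thing to check is whether the delta query maps the same database column to the uniqueKey field as the full import does.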

Re: Custom Response Handler

2013-06-03 Thread vibhoreng04
Hi Erik, In my case I have to calculate a custom value depending on the retrieved candidates. This will be for each document, so my choice will be a DocTransformer. Let's say in this case I need to include a Java class which does the computation; how do I tie that to the DocTransformer? Solr

Re: Custom Response Handler

2013-06-03 Thread bbarani
You can refer to this post on using DocTransformers: http://java.dzone.com/news/solr-40-doctransformers-first

Inconsistent Full import document index counts.

2013-06-03 Thread chris . donaher
Hello All, I've been working on a 2-shard SolrCloud instance with several million documents, and the import process has recently begun to miss documents as they are added to the underlying Postgres database. There are no glaring failures in the log files (all SEVERE and WARNING level errors

RE: Solr query performance tool

2013-06-03 Thread Greg Harris
You have to be careful looking at the QTimes. They do not include garbage collection. I've run into issues where QTime was short (because it was), but the query happened to come in during a long garbage collection where everything was paused. So you can get into situations where once the
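One way to catch this case is to time each request on the client and compare that with QTime. A minimal Python sketch: `send` stands for whatever function performs the HTTP request and returns the parsed JSON response (hypothetical), and the gap is only a hint, since it also includes network and queueing time, not just GC pauses.

```python
import time


def timed_query(send):
    """Call send() and compare client-side wall time with Solr's QTime.

    Returns (wall_ms, qtime_ms, gap_ms). A large gap points at time spent
    outside the query itself: a GC pause, queueing, or the network.
    """
    start = time.perf_counter()
    response = send()
    wall_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = response["responseHeader"]["QTime"]
    return wall_ms, qtime_ms, wall_ms - qtime_ms
```

Logging the gap next to QTime makes pauses like this visible even when QTime looks fine.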

Re: Solr query performance tool

2013-06-03 Thread Shawn Heisey
On 6/3/2013 3:33 PM, Greg Harris wrote: You have to be careful looking at the QTimes. They do not include garbage collection. I've run into issues where QTime was short (because it was), but the query happened to come in during a long garbage collection where everything was paused. So

SolrCloud Load Balancer weight

2013-06-03 Thread Tim Vaillancourt
Hey guys, I have recently looked into an issue with my SolrCloud related to very high load when performing a full-import on DIH. While some work could be done to improve my queries, etc. in DIH, this led me to a new feature idea in Solr: weighted internal load balancing. Basically, I can think

Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Smiley, David W.
Hi Chris: Have you read: http://wiki.apache.org/solr/SpatialForTimeDurations You're modeling your data sub-optimally. Full-precision rectangles (distErrPct=0) don't scale well, and you're seeing that. You should represent your durations as a point and it will take up a fraction of the space
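The point encoding can be illustrated in plain Python, with no Solr calls. In this sketch, following the idea of the SpatialForTimeDurations page, each duration [start, end] is indexed as the point (x=start, y=end), and "durations overlapping [q_start, q_end]" becomes the rectangle x in [lo, q_end], y in [q_start, hi]. Here `lo` and `hi` are assumed bounds of a non-geographic field configured to cover your time values (configuration not shown). The rectangle is returned in minX minY maxX maxY order, the same order as the Intersects(...) filter earlier in this thread.

```python
def duration_to_point(start, end):
    """Index a [start, end] duration as the 2-D point (x=start, y=end)."""
    if end < start:
        raise ValueError("end must not precede start")
    return (start, end)


def intersects_rect(q_start, q_end, lo, hi):
    """Rectangle (min_x, min_y, max_x, max_y) containing exactly the points of
    durations that overlap [q_start, q_end]: start <= q_end and end >= q_start.
    lo and hi are the assumed bounds of the coordinate space."""
    return (lo, q_start, q_end, hi)


def in_rect(point, rect):
    """Plain containment test, standing in for the spatial filter."""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y
```

Each document is then a single point instead of a full-precision rectangle, which is where the space and speed savings come from.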

Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread John Guerrero
SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1 SMP), java 6u27 64 bit 6 nodes, 2 shards, 3 replicas each. Names changed to r1s2 (replica1 - shard 2), r2s2, and r3s2 for each replica in shard 2. What we see: * Under production load, we restart a leader (r1s2), and observe in

Re: SolrCloud Load Balancer weight

2013-06-03 Thread Mark Miller
On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt t...@elementspace.com wrote: Should I JIRA this? Thoughts? Yeah - it's always been in the back of my mind - it's come up a few times - eventually we would like nodes to report some stats to zk to influence load balancing. - mark

How to Get Cluster State By Solrj?

2013-06-03 Thread Furkan KAMACI
I want to get the cluster state of my SolrCloud via SolrJ (I know that the admin page shows it, but I want to customize it in my application). First, the wiki says: CloudSolrServer server = new CloudSolrServer("localhost:9983"); Why does CloudSolrServer take only one ZooKeeper host:port as an argument? I

Re: How to Get Cluster State By Solrj?

2013-06-03 Thread Mark Miller
It actually accepts a comma-separated list of zk host addresses (your quorum). Same format as zk describes in its docs. To get the cluster state, get the ZkStateReader from the CloudSolrServer and then its getClusterState or something. - Mark On Jun 3, 2013, at 5:30 PM, Furkan KAMACI
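To make the format concrete: the string is a comma-separated list of host:port pairs, optionally followed by a chroot path that applies to the whole ensemble, so a single `localhost:9983` is just a one-element list. A Python sketch for illustration only, not SolrJ code (in SolrJ the whole string is simply passed to the constructor); the 2181 default for a missing port is an assumption, and IPv6 addresses are not handled.

```python
def parse_zk_host(zk_host, default_port=2181):
    """Split a ZooKeeper connect string such as "zk1:2181,zk2:2181,zk3:2181/solr"
    into ([(host, port), ...], chroot). chroot is None when absent."""
    hosts_part, slash, rest = zk_host.partition("/")
    chroot = slash + rest if slash else None
    hosts = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            host, port = entry, default_port  # assumed default client port
        hosts.append((host, int(port)))
    return hosts, chroot
```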

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread Mark Miller
Thanks - I can try and look into this perhaps next week. You might copy the details into a JIRA issue to prevent it from getting lost though... - Mark On Jun 3, 2013, at 4:46 PM, John Guerrero jguerr...@tagged.com wrote: SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1