Re: Solr interface

2014-04-07 Thread Andre Bois-Crettez
You can use Solrj : https://wiki.apache.org/solr/Solrj Anyway, even using http the performance is good. André On 2014-04-07 13:52, Jonathan Varsanik wrote: Do you mean to tell me that the people on this list that are indexing 100s of millions of documents are doing this over http? I have

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Andre Bois-Crettez
1 node having more load should be the leader (because of the extra work of receiving and distributing updates, but my experiences show only a bit more CPU usage, and no difference in disk IO). A suggestion would be to hard commit much less often, ie every 10 minutes, and see if there is a

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Andre Bois-Crettez
We are using Solr running on Tomcat. I think the top reasons for us are : - we already have nagios monitoring plugins for tomcat that trace queries ok/error, http codes / response time etc in access logs, number of threads, jvm memory usage etc - start, stop, watchdogs, logs : we also use our

Re: optimization suggstions

2013-11-12 Thread Andre Bois-Crettez
I suggest putting autoCommit at something as big as your memory allows (eg 15 minutes) to flush the update log to disk and start merging segments, without yet making the documents visible to searches. Then at the end, send an explicit <commit/>, which will both persist on disk the remainder of indexed docs and make

Re: Facet performance

2013-10-22 Thread Andre Bois-Crettez
This is with Solr 1.4. Really ? This sounds really outdated to me. Have you tried a more recent version ? 4.5 just came out. -- André Bois-Crettez Software Architect Search Developer http://www.kelkoo.com/

Solr 4.4 : using SolrCloud, on reconnection to zookeeper, core sometimes goes down, never coming back alive

2013-10-15 Thread Andre Bois-Crettez
Hello all, We had this problem twice in 4 days, only in one of our 14 servers (2 shards 7 replicas) in Solr 4.4 : after successful re-connection to Zookeeper (triggered by Connection expired - starting a new one), sometimes the core stays down without coming back, and we have to restart the

Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Andre Bois-Crettez
(Your schema and query only appear on the nabble.com forum, it is mostly empty for me on the mailing list) What you want is probably to change OR to AND : params.set("q.op", "AND"); André On 09/23/2013 04:44 PM, asuka wrote: Hi Jack, I've been working with the following schema field

Re: Solr - how do I index barcode

2013-08-08 Thread Andre Bois-Crettez
I would go with a tokenizer to split each character as a separate token. (maybe https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer can do) Add a LowerCaseFilterFactory so that casing is ignored. Untested :

Re: softCommit doesn't work - ?

2013-08-07 Thread Andre Bois-Crettez
(a bit late, I know) On 07/23/2013 02:09 PM, Erick Erickson wrote: First a minor nit. The server.add(doc, time) is a hard commit, not a soft one. By default, no, commitWithin is indeed a soft commit. As per

Re: Clearing old nodes from zookeper without restarting solrcloud cluster

2013-07-17 Thread Andre Bois-Crettez
Indeed we are using UNLOAD of cores before shutting down extra replica nodes. It works well but, as already said, it needs such nodes to be up. Once UNLOADed it is possible to stop them, which works well for our use case. But if nodes are already down, maybe it is possible to manually create and upload a

Re: solrcloud 4.3.1 - stability and failure scenario questions

2013-06-24 Thread Andre Bois-Crettez
On 06/23/2013 05:53 AM, Shalin Shekhar Mangar wrote: Use shards.tolerant=true to return documents that are available in the shards that are still alive. Beware that currently shards.tolerant=true prevents grouping and facets : https://issues.apache.org/jira/browse/SOLR-3369 -- André

Re: yet another optimize question

2013-06-19 Thread Andre Bois-Crettez
facet fields eh? Thanks for the tip. Thanks Robi -Original Message- From: Andre Bois-Crettez [mailto:andre.b...@kelkoo.com] Sent: Tuesday, June 18, 2013 3:03 AM To: solr-user@lucene.apache.org Subject: Re: yet another optimize question Recently we had steadily increasing memory usage

Re: yet another optimize question

2013-06-18 Thread Andre Bois-Crettez
Recently we had steadily increasing memory usage and OOM due to facets on dynamic fields. The default facet.method=fc needs to build a large array of maxDoc ints for each field (a fieldCache or fieldValueCache entry), whether it is sparsely populated or not. Once you have reduced your number of

Re: Solr 4 memory usage increase

2013-05-17 Thread Andre Bois-Crettez
Can you explain your setup more ? ie. is it master/slave, indexing in parallel, etc ? We had to commit more often to reduce JVM memory usage due to transaction logs in SolrCloud mode, compared with previous setups without tlogs. update?commit=true&openSearcher=false André On 05/17/2013 09:56
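
A minimal sketch of how the commit request mentioned in this message could be assembled, written in Python for illustration (the thread itself only shows the URL). It shows that commit and openSearcher are two separate query parameters. The base URL and the helper name hard_commit_url are assumptions, not part of the original message.

```python
from urllib.parse import urlencode


def hard_commit_url(base_url):
    """Build an update URL that asks for a hard commit without opening a new searcher.

    base_url is a hypothetical Solr core URL, e.g. http://localhost:8983/solr
    """
    # commit=true flushes pending documents to disk;
    # openSearcher=false keeps the current searcher, so the new documents
    # are persisted but not yet visible to queries.
    params = {"commit": "true", "openSearcher": "false"}
    return f"{base_url}/update?{urlencode(params)}"


if __name__ == "__main__":
    print(hard_commit_url("http://localhost:8983/solr"))
```

Sending this request periodically is one way to apply the advice above; the same effect can also be configured server-side with autoCommit.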

Re: Getting explain information of more like this search in a more usable format

2013-05-14 Thread Andre Bois-Crettez
On 05/13/2013 03:12 PM, Achim Domma wrote: I'm mainly interested in showing the terms which each result document has in common with the reference document. regards, Achim It seems a good job for highlighting ? http://docs.lucidworks.com/display/solr/Highlighting

Re: Getting explain information of more like this search in a more usable format

2013-05-14 Thread Andre Bois-Crettez
On 05/14/2013 03:44 PM, Andre Bois-Crettez wrote: On 05/13/2013 03:12 PM, Achim Domma wrote: I'm mainly interested in showing the terms which each result document has in common with the reference document. regards, Achim It seems a good job for highlighting ? http://docs.lucidworks.com

Re: Search performance: shards or replications?

2013-05-07 Thread Andre Bois-Crettez
Some clarifications : 1) *lots of docs, few queries* : If you have a high number of documents (+dozen millions) and lowish number of queries per second (say less than 10), replicas will not help to reduce the Qtime. For this kind of task it is better to shard the index, as each query will

Re: iterate through each document in Solr

2013-05-06 Thread Andre Bois-Crettez
On 05/06/2013 06:03 AM, Michael Sokolov wrote: On 5/5/13 7:48 PM, Mingfeng Yang wrote: Dear Solr Users, Does anyone know what is the best way to iterate through each document in a Solr index with billion entries? I tried to use select?q=*:*&start=xx&rows=500 to get 500 docs each time and then

Re: Memory problems with HttpSolrServer

2013-05-06 Thread Andre Bois-Crettez
On 05/06/2013 09:32 AM, Rogowski, Britta wrote: Hi! When I write from our database to a HttpSolrServer, (using a LinkedBlockingQueue to write just one document at a time), I run into memory problems (due to various constraints, I have to remain on a 32-bit system, so I can use at most 2 GB

Re: Indexing off of the production servers

2013-05-06 Thread Andre Bois-Crettez
Excellent idea ! And it is possible to use collection aliasing with the CREATEALIAS to make this transparent for the query side. ex. with 2 collections named : collection_1 collection_2 /collections?action=CREATEALIAS&name=collectionalias&collections=collection_1 collectionalias is now a virtual
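
The alias switch described in this message can be sketched as two CREATEALIAS requests, one pointing the alias at each collection in turn. This is an illustrative Python sketch only: the base URL and the helper name create_alias_url are assumptions, and the /admin/collections path used here is the usual Collections API endpoint rather than something quoted from the message.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collection):
    """Build a Collections API URL that points `alias` at `collection`.

    Issuing CREATEALIAS again with the same alias name re-points it,
    which is what makes the swap transparent for the query side.
    """
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical Solr base URL
    # Queries go to collectionalias, currently backed by collection_1.
    print(create_alias_url(base, "collectionalias", "collection_1"))
    # After re-indexing into collection_2, re-point the alias.
    print(create_alias_url(base, "collectionalias", "collection_2"))
```

Clients keep querying collectionalias throughout; only the collection behind it changes.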

Re: XInclude in data-config.xml

2013-04-12 Thread Andre Bois-Crettez
On 04/12/2013 09:31 AM, stockii wrote: hello. is it possible to include some entities with XInclude in my data-config.xml? We first struggled with XInclude, and then switched to use custom entities, which worked much better for our needs (reusing common parts in several SearchHandlers). ex.

Re: solr 3.4: memory leak?

2013-04-11 Thread Andre Bois-Crettez
On 04/11/2013 08:49 AM, Dmitry Kan wrote: SEVERE: The web application [/solr] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak. Apr 11, 2013 6:38:14 AM

Re: solre scores remains same for exact match and nearly exact match

2013-04-04 Thread Andre Bois-Crettez
On 04/03/2013 07:22 AM, amit wrote: Below is my query http://localhost:8983/solr/select/?q=subject:session management in php&fq=category:[*%20TO%20*]&fl=category,score,subject You specify that you want session to appear in field subject, but the other tokens only match to the default search

Re: Flow Chart of Solr

2013-04-02 Thread Andre Bois-Crettez
On 04/02/2013 04:20 PM, Koji Sekiguchi wrote: (13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation something like flow chart of Solr. i.e. Documents comes into Solr(maybe indicating which classes get documents) and goes to parsing process (i.e. stemming processes etc.) and then

Re: Out of memory on some faceting queries

2013-04-02 Thread Andre Bois-Crettez
On 04/02/2013 05:04 PM, Dotan Cohen wrote: How might I time the warming? I've been googling warming since your earlier message but there does not seem to be any really good documentation on the subject. If there is anything that you feel I should be reading I would appreciate a link or a keyword

Re: [ANNOUNCE] Apache Solr 4.2 released

2013-03-12 Thread Andre Bois-Crettez
On 03/12/2013 01:37 AM, Robert Muir wrote: * Collection Aliasing. Got time based data? Want to re-index in a temporary collection and then swap it into production? Done. Stay tuned for Shard Aliasing. Nice :) Seems that this solves the main use case I have for core SWAP (was missing in

Re: Why SolrInputDocument use a LinkedHashMap

2013-02-14 Thread Andre Bois-Crettez
Almost. I did not benchmark it but tend to believe this http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html : iteration over the collection-views of a LinkedHashMap requires time proportional to the /size/ of the map, regardless of its capacity. Iteration over a HashMap is

Re: Why SolrInputDocument use a LinkedHashMap

2013-02-13 Thread Andre Bois-Crettez
Maybe it is more about having fast iterations even on a large collection of fields ? André On 02/13/2013 12:43 PM, knort wrote: Programming some tests I found that two SolrInputDocuments with the same fields and values are different. Trying to figure it out why it's happening I found that the

Re: Blogpost about SOLR at Issuu

2013-02-13 Thread Andre Bois-Crettez
Thanks, very interesting. The admin interface is very useful (although it would be useful with a sample admin-extras.html file somewhere - where it should go and what can go in it would be good to know. Right now, all we get is an exception in the logs about the file not existing). You only

Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez
Worth to note that some characters are completely forbidden in XML, such as chr(0). When dealing with external text input, some cleanup might be necessary to avoid breaking indexation. For example you could replace each forbidden XML character with . André On 01/15/2013 09:55 PM, Alexandre
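
A minimal sketch of the cleanup step described in this message, in Python for illustration. It replaces every character outside the XML 1.0 character range (tab, newline, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) before the text is sent for indexing. The function name clean_for_xml and the default replacement (a single space) are assumptions; the message does not say which replacement was used.

```python
import re

# Matches any character that is NOT allowed in an XML 1.0 document.
_INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text, replacement=" "):
    """Return `text` with every forbidden XML 1.0 character replaced.

    The replacement defaults to a space (an assumption); pass "" to
    drop the characters instead.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    raw = "title\x00with\x0bcontrol chars"
    print(repr(clean_for_xml(raw)))
```

Running the cleanup on all external text input before building the update document avoids one bad character rejecting the whole request.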

Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez
Forgot the link : http://en.wikipedia.org/wiki/Valid_characters_in_XML André On 01/16/2013 02:24 PM, Andre Bois-Crettez wrote: Worth to note that some characters are completely forbidden in XML, such as chr(0). When dealing with external text input, some cleanup might be necessary to avoid

Re: how to optimize same query with different start values

2013-01-15 Thread Andre Bois-Crettez
It looks like a use case for using Solrj with queryAndStreamResponse ? http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrServer.html#queryAndStreamResponse%28org.apache.solr.common.params.SolrParams,%20org.apache.solr.client.solrj.StreamingResponseCallback%29 André

Re: SOLRJ - Error while using CommonsHttpSolrServer

2012-12-12 Thread Andre Bois-Crettez
On 11/01/2012 05:06 AM, Jegannathan Mehalingam wrote: Here is my code which uses CommonsHttpSolrServer: String url = "http://localhost:8983/solr/#/solr/update/"; your solr url looks wrong, try this : http://localhost:8983/solr/update/ or maybe this one if you have a core named solr :

Re: How to SWAP cores (or collections) with SolrCloud (SOLR-3866)

2012-12-05 Thread Andre Bois-Crettez
On 12/05/2012 02:09 AM, Mark Miller wrote: On Dec 4, 2012, at 4:57 AM, Andre Bois-Crettez <andre.b...@kelkoo.com> wrote: * what can we do to help progress on SOLR-3866 ? Maybe use case scenarios, detailing desired behavior ? Constraints on what cores or collections are allowed to SWAP, ie. same

Re: Restricting search results by field value

2012-12-05 Thread Andre Bois-Crettez
If you do grouping on source_id, it should be enough to request 3 times more documents than you need, then reorder and drop the bottom. Is a 3x overhead acceptable ? On 12/05/2012 12:04 PM, Tom Mortimer wrote: Hi everyone, I've got a problem where I have docs with a source_id field, and

Re: FW: Replication error and Shard Inconsistencies..

2012-12-05 Thread Andre Bois-Crettez
Not sure but, maybe you are running out of file descriptors ? On each solr instance, look at the dashboard admin page, there is a bar with File Descriptor Count. However if this was the case, I would expect to see lots of errors in the solr logs... André On 12/05/2012 06:41 PM, Annette Newton

How to SWAP cores (or collections) with SolrCloud (SOLR-3866)

2012-12-04 Thread Andre Bois-Crettez
Hello, With solr-4.0.0, the useful SWAP command http://wiki.apache.org/solr/CoreAdmin#SWAP , which lets a main core serve searches while a temp core is re-indexed from scratch, no longer works on SolrCloud, as was discussed here : Solr Swap Function doesn't work when using Solr

Re: Solr Swap Function doesn't work when using Solr Cloud - SOLR3866

2012-10-31 Thread Andre Bois-Crettez
Hello, Same as Sam, I believe the SWAP command matters for important use cases. For example, with Solr 3, we do use Current and Temp cores, so that incremental updates to the index are done live on Current, as well as searches. Whenever a full/baseline/from scratch index needs to be

Re: Building a resilient cluster

2012-02-29 Thread Andre Bois-Crettez
You have to run ZK on at least 3 different machines for fault tolerance (a ZK ensemble). http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble Ranjan Bagchi wrote: Hi, I'm interested in setting up a solr cluster where each machine [at

Re: SolrCloud on Trunk

2012-02-28 Thread Andre Bois-Crettez
Consistent hashing seems like a solution to reduce the shuffling of keys when adding/deleting shards : http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/ Twitter describes a more flexible sharding in section Gizzard handles partitioning through a forwarding
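
To illustrate why consistent hashing reduces the shuffling of keys, here is a small self-contained hash ring in Python. It is a generic sketch of the technique, not SolrCloud's actual routing code; the class name HashRing, the use of MD5, the shard names, and the number of virtual nodes are all assumptions chosen for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node owns several points so that load spreads evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        # A key belongs to the first ring point at or after its hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"doc{i}" for i in range(1000)]
    ring = HashRing(["shard1", "shard2", "shard3"])
    before = {k: ring.lookup(k) for k in keys}
    ring.add("shard4")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    # Only keys claimed by the new shard change owner.
    print(f"{len(moved)} of {len(keys)} keys moved after adding shard4")
```

With plain modulo hashing (hash(key) % number_of_shards), going from 3 to 4 shards would reassign most keys; on the ring, only the keys that now fall to the new shard move.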

Re: Solrj commit affecting documents count

2012-01-31 Thread Andre Bois-Crettez
Why do you commit in the middle of a full import then, if you don't have to ? dprasadx wrote: Hi, I am using solrj server to commit few changes in the data into the master index through a java program. It works OK unless we do not do a full-import. But when I do a full-import (say for 800

Re: Solr Tomcat Maximum Heap Memory

2011-12-21 Thread Andre Bois-Crettez
Try running a 64-bit JVM on your 64-bit OS, it should work for much larger heap sizes, be it Linux or Windows. Beware that the memory need is around 30% higher with a 64-bit JVM (bigger object pointers) if you are not using Compressed Oops :

Re: Collection Distribution vs Replication in Solr

2011-11-23 Thread Andre Bois-Crettez
Indeed, I cannot see any of the 3 images here : http://wiki.apache.org/solr/SolrReplication#Admin_Page_for_Replication It just displays the name of the image file, as the img url seems to point to a logged-only link such as this one :

Re: FunctionQuery score=0

2011-11-18 Thread Andre Bois-Crettez
... no? Debug doesn't include filter query only the below (changed a bit): BoostedQuery(boost(+fieldName:,boostedFunction(ord(fieldName),query))) On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez <andre.b...@kelkoo.com> wrote: John wrote: Some of the results are receiving score=0 in my

Re: FunctionQuery score=0

2011-11-17 Thread Andre Bois-Crettez
John wrote: Some of the results are receiving score=0 in my function and I would like them not to appear in the search results. you can use frange, and filter by score: q=ipod&fq={!frange l=0 incl=false}query($q) -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
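
A minimal sketch of how the frange filter from this message could be built as request parameters, in Python for illustration. The filter keeps only documents whose score for the main query is strictly greater than 0 (l=0 is the lower bound, incl=false excludes the bound itself). The helper name positive_score_params is an assumption.

```python
from urllib.parse import urlencode, parse_qs


def positive_score_params(user_query):
    """Return query parameters that drop documents scoring exactly 0.

    query($q) re-evaluates the main query as a function, and the frange
    parser keeps only documents whose value is above the lower bound.
    """
    return {
        "q": user_query,
        "fq": "{!frange l=0 incl=false}query($q)",
    }


if __name__ == "__main__":
    query_string = urlencode(positive_score_params("ipod"))
    print(query_string)
    # Decoding the string again shows the filter survives URL encoding.
    print(parse_qs(query_string)["fq"][0])
```

Letting urlencode handle the braces, spaces, and dollar sign avoids hand-escaping the local-params syntax.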

Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Andre Bois-Crettez
Using Solr 3.4.0. That changelog actually says it should reduce memory usage for that version. We were on a much older version previously, 1.something. Norms are off on all fields that it can be turned off on. I'm just hoping this new version doesn't have any leaks. Does FastLRUCache vs

Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread Andre Bois-Crettez
I do not think this is possible directly out of the box in Solr. A quick workaround would be to fully denormalize the data, ie instead of multivalued notes for a customer, have a completely flat index of customer_note. Or maybe a custom request handler plugin could actually check that matches

Re: Out of memory during the indexing

2011-11-09 Thread Andre Bois-Crettez
How much memory do you actually allocate to the JVM ? http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM You need to increase the -Xmx value, otherwise your large ram buffers won't fit in the java heap. sivaprasad wrote: Hi, I am getting the following error

Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread Andre Bois-Crettez
. Dave. Sent from my iPhone On Nov 9, 2011, at 6:03 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: I do not think this is possible directly out of the box in Solr. A quick workaround would be to fully denormalize the data, ie instead of multivalued notes for a customer, have a completely

Re: [Profiling] How to profile/tune Solr server

2011-11-04 Thread Andre Bois-Crettez
SolrMeter is useful too, it can be plugged to a production server just to watch evolution of caches usage : http://code.google.com/p/solrmeter/wiki/Screenshots#CacheHistoryStatistic André