You can use Solrj : https://wiki.apache.org/solr/Solrj
Anyway, even using http the performance is good.
André
On 2014-04-07 13:52, Jonathan Varsanik wrote:
Do you mean to tell me that the people on this list that are indexing 100s of
millions of documents are doing this over http? I have
The one node having more load should be the leader (because of the extra work
of receiving and distributing updates), but my experience shows only a
bit more CPU usage, and no difference in disk IO.
A suggestion would be to hard commit much less often, i.e. every 10
minutes, and see if there is a
We are using Solr running on Tomcat.
I think the top reasons for us are:
- we already have Nagios monitoring plugins for Tomcat that track
queries ok/error, HTTP codes / response times etc. in access logs, number
of threads, JVM memory usage etc.
- start, stop, watchdogs, logs: we also use our
I suggest setting autoCommit to an interval as big as your memory allows
(e.g. 15 minutes), to flush the update log to disk and start merging
segments without yet making the new documents visible to searches.
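For reference, an untested solrconfig.xml sketch of that idea (900000 ms = 15
minutes is only an example value, adjust to your memory):

  <!-- hard commit every 15 minutes: flushes the update log and lets segments
       merge, but openSearcher=false keeps the new docs invisible to queries -->
  <autoCommit>
    <maxTime>900000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>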
Then at the end, send an explicit commit, which will both persist the
remainder of indexed docs to disk and make
This is with Solr 1.4.
Really?
This sounds really outdated to me.
Have you tried a more recent version? 4.5 just went out.
--
André Bois-Crettez
Software Architect
Search Developer
http://www.kelkoo.com/
Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
With a share capital of €4,168,964.30
Registered office
Hello all,
We had this problem twice in 4 days, only on one of our 14 servers (2
shards, 7 replicas) in Solr 4.4: after a successful re-connection to
Zookeeper (triggered by "Connection expired - starting a new one"),
sometimes the core stays down without coming back, and we have to
restart the
(Your schema and query only appear on the nabble.com forum; it is mostly
empty for me on the mailing list)
What you want is probably to change OR to AND:
params.set("q.op", "AND");
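In context, an untested SolrJ sketch (assuming a ModifiableSolrParams import
and an existing SolrServer instance named "server"; the query text is just a
placeholder):

  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("q", "some multi word query");  // placeholder query
  params.set("q.op", "AND");                 // require all terms to match instead of OR
  QueryResponse rsp = server.query(params);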
André
On 09/23/2013 04:44 PM, asuka wrote:
Hi Jack,
I've been working with the following schema field
I would go with a tokenizer to split each character as a separate token.
(maybe
https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer
can do)
Add a LowerCaseFilterFactory so that casing is ignored.
Untested :
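Something along these lines (the fieldType name is arbitrary):

  <fieldType name="single_chars" class="solr.TextField">
    <analyzer>
      <!-- pattern "." with group="0" emits every character as its own token -->
      <tokenizer class="solr.PatternTokenizerFactory" pattern="." group="0"/>
      <!-- ignore casing -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>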
(a bit late, I know)
On 07/23/2013 02:09 PM, Erick Erickson wrote:
First a minor nit. The server.add(doc, time) is a hard commit, not a soft one.
By default, no, commitWithin is indeed a soft commit.
As per
Indeed we are using UNLOAD of cores before shutting down extra replica
nodes; it works well but, as already said, it needs such nodes to be up.
Once UNLOADed it is possible to stop them, which works well for our use case.
But if nodes are already down, maybe it is possible to manually create
and upload a
On 06/23/2013 05:53 AM, Shalin Shekhar Mangar wrote:
Use shards.tolerant=true to return documents that are available in the
shards that are still alive.
Beware that currently shards.tolerant=true prevents grouping and facets :
https://issues.apache.org/jira/browse/SOLR-3369
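For example, a request using it could look like this (host and collection
name are placeholders):

  http://localhost:8983/solr/collection1/select?q=*:*&shards.tolerant=true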
--
André
facet fields eh?
Thanks for the tip.
Thanks
Robi
-Original Message-
From: Andre Bois-Crettez [mailto:andre.b...@kelkoo.com]
Sent: Tuesday, June 18, 2013 3:03 AM
To: solr-user@lucene.apache.org
Subject: Re: yet another optimize question
Recently we had steadily increasing memory usage and OOM due to facets
on dynamic fields.
The default facet.method=fc needs to build a large array of maxDoc ints
for each field (a fieldCache or fieldValueCache entry), whether it is
sparsely populated or not.
Once you have reduced your number of
Can you explain your setup more?
i.e. is it master/slave, indexing in parallel, etc.?
We had to commit more often to reduce JVM memory usage due to
transaction logs in SolrCloud mode, compared with previous setups
without tlogs.
update?commit=true&openSearcher=false
André
On 05/17/2013 09:56
On 05/13/2013 03:12 PM, Achim Domma wrote:
I'm mainly interested in showing the terms which each result document has in
common with the reference document.
regards,
Achim
It seems like a good job for highlighting?
http://docs.lucidworks.com/display/solr/Highlighting
Some clarifications:
1) *lots of docs, few queries*: If you have a high number of documents
(more than a dozen million) and a lowish number of queries per second (say less than
10), replicas will not help to reduce the QTime. For this kind of task
it is better to shard the index, as each query will
On 05/06/2013 06:03 AM, Michael Sokolov wrote:
On 5/5/13 7:48 PM, Mingfeng Yang wrote:
Dear Solr Users,
Does anyone know what is the best way to iterate through each document in a
Solr index with billion entries?
I tried to use select?q=*:*&start=xx&rows=500 to get 500 docs each time
and then
On 05/06/2013 09:32 AM, Rogowski, Britta wrote:
Hi!
When I write from our database to a HttpSolrServer, (using a
LinkedBlockingQueue to write just one document at a time), I run into memory
problems (due to various constraints, I have to remain on a 32-bit system, so I
can use at most 2 GB
Excellent idea !
And it is possible to use collection aliasing with the CREATEALIAS to
make this transparent for the query side.
ex. with 2 collections named :
collection_1
collection_2
/collections?action=CREATEALIAS&name=collectionalias&collections=collection_1
collectionalias is now a virtual
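Queries can then go to the alias as if it were a collection, e.g. (host is a
placeholder):

  http://localhost:8983/solr/collectionalias/select?q=*:*

and re-running CREATEALIAS pointing at collection_2 later should switch
searches over transparently.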
On 04/12/2013 09:31 AM, stockii wrote:
hello.
is it possible to include some entities with XInclude in my data-config.xml?
We first struggled with XInclude, and then switched to use custom
entities, which worked much better for our needs (reusing common parts
in several SearchHandlers).
ex.
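Roughly like this (the file and entity names are made up for the example):

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE dataConfig [
    <!-- reusable chunk kept in an external file, included below via &commonEntities; -->
    <!ENTITY commonEntities SYSTEM "common-entities.xml">
  ]>
  <dataConfig>
    <document>
      &commonEntities;
    </document>
  </dataConfig>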
On 04/11/2013 08:49 AM, Dmitry Kan wrote:
SEVERE: The web application [/solr] appears to have started a thread named
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it.
This is very likely to create a memory leak.
Apr 11, 2013 6:38:14 AM
On 04/03/2013 07:22 AM, amit wrote:
Below is my query
http://localhost:8983/solr/select/?q=subject:session management in
php&fq=category:[*%20TO%20*]&fl=category,score,subject
You specify that you want "session" to appear in the field "subject", but
the other tokens only match against the default search
On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
(13/04/02 21:45), Furkan KAMACI wrote:
Is there any documentation something like flow chart of Solr. i.e.
Documents comes into Solr(maybe indicating which classes get documents) and
goes to parsing process (i.e. stemming processes etc.) and then
On 04/02/2013 05:04 PM, Dotan Cohen wrote:
How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword
On 03/12/2013 01:37 AM, Robert Muir wrote:
* Collection Aliasing. Got time based data? Want to re-index in a
temporary collection and then swap it into production? Done. Stay
tuned for Shard Aliasing.
Nice :)
Seems that this solves the main use case I have for core SWAP (was
missing in
Almost. I did not benchmark it but tend to believe this
http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html :
iteration over the collection-views of a LinkedHashMap requires time
proportional to the /size/ of the map, regardless of its capacity.
Iteration over a HashMap is
Maybe it is more about having fast iterations even on a large collection
of fields ?
André
On 02/13/2013 12:43 PM, knort wrote:
Programming some tests I found that two SolrInputDocuments with the same
fields and values are different.
Trying to figure out why it's happening I found that the
Thanks, very interesting.
The admin interface is very useful (although it would be useful with a
sample admin-extras.html file somewhere - where it should go and what
can go in it would be good to know. Right now, all we get is an
exception in the logs about the file not existing).
You only
Worth noting that some characters are completely forbidden in XML, such
as chr(0).
When dealing with external text input, some cleanup might be necessary
to avoid breaking indexing.
For example you could strip or replace each forbidden XML character.
André
On 01/15/2013 09:55 PM, Alexandre
Forgot the link : http://en.wikipedia.org/wiki/Valid_characters_in_XML
André
On 01/16/2013 02:24 PM, Andre Bois-Crettez wrote:
Worth noting that some characters are completely forbidden in XML, such
as chr(0).
When dealing with external text input, some cleanup might be necessary
to avoid
It looks like a use case for using Solrj with queryAndStreamResponse ?
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrServer.html#queryAndStreamResponse%28org.apache.solr.common.params.SolrParams,%20org.apache.solr.client.solrj.StreamingResponseCallback%29
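An untested sketch of what that could look like with SolrJ 4.x (the URL,
query and field name are placeholders; assume this runs inside a method):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.StreamingResponseCallback;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.ModifiableSolrParams;

  SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("q", "*:*");
  params.set("rows", 100);

  server.queryAndStreamResponse(params, new StreamingResponseCallback() {
    @Override
    public void streamSolrDocument(SolrDocument doc) {
      // each document is handed over as it is read off the wire,
      // instead of being accumulated in memory
      System.out.println(doc.getFieldValue("id"));
    }
    @Override
    public void streamDocListInfo(long numFound, long start, Float maxScore) {
      System.out.println("numFound=" + numFound);
    }
  });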
André
On 11/01/2012 05:06 AM, Jegannathan Mehalingam wrote:
Here is my code which uses CommonsHttpSolrServer:
String url = "http://localhost:8983/solr/#/solr/update/";
Your Solr URL looks wrong, try this:
http://localhost:8983/solr/update/
or maybe this one if you have a core named solr:
On 12/05/2012 02:09 AM, Mark Miller wrote:
On Dec 4, 2012, at 4:57 AM, Andre Bois-Crettezandre.b...@kelkoo.com wrote:
* what can we do to help progress on SOLR-3866 ? Maybe use case
scenarios, detailing desired behavior? Constraints on what cores or
collections are allowed to SWAP, i.e. same
If you do grouping on source_id, it should be enough to request 3 times
more documents than you need, then reorder and drop the bottom.
Is a 3x overhead acceptable ?
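For example, something like this (values only as an illustration, assuming 10
documents are actually wanted):

  /select?q=*:*&group=true&group.field=source_id&group.main=true&rows=30

then reorder the returned docs on the client and drop the bottom.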
On 12/05/2012 12:04 PM, Tom Mortimer wrote:
Hi everyone,
I've got a problem where I have docs with a source_id field, and
Not sure, but maybe you are running out of file descriptors?
On each solr instance, look at the dashboard admin page, there is a
bar with File Descriptor Count.
However if this was the case, I would expect to see lots of errors in
the solr logs...
André
On 12/05/2012 06:41 PM, Annette Newton
Hello,
With solr-4.0.0, the useful SWAP command
http://wiki.apache.org/solr/CoreAdmin#SWAP that allows having a main
core serving searches while a temp core can be re-indexed from scratch,
no longer works on SolrCloud, as was discussed here: Solr Swap Function
doesn't work when using Solr
Hello,
Same as Sam, I believe the SWAP command is important for several use
cases.
For example, with Solr 3, we do use Current and Temp cores, so that
incremental updates to the index are done live on Current, as well as
searches.
Whenever a full/baseline/from-scratch index needs to be
You have to run ZK on at least 3 different machines for fault
tolerance (a ZK ensemble).
http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
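With the example Jetty start script, pointing Solr at the ensemble looks
something like this (hostnames are placeholders):

  java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar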
Ranjan Bagchi wrote:
Hi,
I'm interested in setting up a solr cluster where each machine [at
Consistent hashing seems like a solution to reduce the shuffling of keys
when adding/deleting shards:
http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
Twitter describes a more flexible sharding in the section Gizzard handles
partitioning through a forwarding
Why do you commit in the middle of a full import then, if you don't have
to ?
dprasadx wrote:
Hi, I am using solrj server to commit few changes in the data into the master
index through a java program. It works OK unless we do not do a full-import.
But when I do a full-import (say for 800
Try running a 64-bit JVM on your 64-bit OS, it should work for much
larger heap sizes, be it Linux or Windows.
Beware that memory usage is around 30% higher with a 64-bit
JVM (bigger object pointers) if you are not using Compressed Oops:
Indeed, I can not see any of the 3 images here :
http://wiki.apache.org/solr/SolrReplication#Admin_Page_for_Replication
It just displays the name of the image file, as the img URL seems to point to
a link that requires being logged in, such as this one:
... no?
Debug doesn't include filter query only the below (changed a bit):
BoostedQuery(boost(+fieldName:,boostedFunction(ord(fieldName),query)))
On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez
andre.b...@kelkoo.comwrote:
John wrote:
Some of the results are receiving score=0 in my
John wrote:
Some of the results are receiving score=0 in my function and I would like
them not to appear in the search results.
you can use frange, and filter by score:
q=ipod&fq={!frange l=0 incl=false}query($q)
--
André Bois-Crettez
Search technology, Kelkoo
http://www.kelkoo.com/
Using Solr 3.4.0. That changelog actually says it should reduce memory usage
for that version. We were on a much older version previously, 1.something.
Norms are off on all fields that it can be turned off on.
I'm just hoping this new version doesn't have any leaks. Does FastLRUCache vs
I do not think this is possible directly out of the box in Solr.
A quick workaround would be to fully denormalize the data, i.e. instead of
multivalued notes for a customer, have a completely flat index of
customer_note.
Or maybe a custom request handler plugin could actually check that
matches
How much memory do you actually allocate to the JVM?
http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
You need to increase the -Xmx value, otherwise your large ram buffers
won't fit in the java heap.
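For example with the stock Jetty start script (the value is only an example,
size it to your RAM buffers and caches):

  java -Xms2g -Xmx2g -jar start.jar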
sivaprasad wrote:
Hi,
I am getting the following error
. Dave.
Sent from my iPhone
On Nov 9, 2011, at 6:03 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote:
I do not think this is possible directly out of the box in Solr.
A quick workaround would be to fully denormalize the data, i.e. instead of
multivalued notes for a customer, have a completely
SolrMeter is useful too, it can be plugged into a production server just
to watch the evolution of cache usage:
http://code.google.com/p/solrmeter/wiki/Screenshots#CacheHistoryStatistic
André