Any rule of thumb regarding the document size limit when storing documents
in Solr?
Otis Gospodnetic-5 wrote
Use Solr. It's pretty clear you don't yet have any problems that
would make you think about alternatives. Using Solr to store and not
just index will make your life simpler (and
You can use multiple threads.
For speed, you can also compute the hash yourself (a general hash algorithm) and call the solr server to add docs.
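A minimal sketch of multithreaded indexing with ConcurrentUpdateSolrServer, which queues docs and sends them from background threads (URL, queue size, and thread count are example values, not from this thread):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentIndexing {
    public static void main(String[] args) throws Exception {
        // queueSize=1000, threadCount=4: tune for your hardware
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
            "http://localhost:8983/solr/collection1", 1000, 4);
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            server.add(doc);             // queued; sent by background threads
        }
        server.blockUntilFinished();     // wait for the queue to drain
        server.commit();
        server.shutdown();
    }
}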
Hey Mark,
What did you use to prepare your presentation? It's really nice.
2013/4/17 Furkan KAMACI furkankam...@gmail.com
Really nice presentation.
2013/4/17 Mark Miller markrmil...@gmail.com
On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:
Hi, can someone explain more
Rogalon wrote
On 16 April 2013 at 14:46, Yonik Seeley-4 [via Lucene] <ml-node+s472066n4056299h21@.nabble> wrote:
On Tue, Apr 16, 2013 at 7:51 AM, Rogalon [hidden email] wrote:
Hi,
I am using pretty complex function queries to completely customize (not
only
boost) the score
Field type is string and this has happened for multiple docs over the past week.
Regards,
Ayush
Date: Tue, 16 Apr 2013 14:06:40 -0600
Subject: Re: Document Missing from Share in Solr cloud
From: thelabd...@gmail.com
To: solr-user@lucene.apache.org
btw ... what is the field type of your
I managed to get this done. The facet queries now facet on a multivalued
field as opposed to the dynamic field names.
Unfortunately it doesn't seem to have made much difference, if any at all.
Some more information that might help:
The JVM memory seems to be eaten up slowly. I don't think that
Well, your numDocs *is* the same. Your maxDocs isn't, which sounds right
to me.
maxDocs is the number of documents, including deleted ones. Given that
deleted docs are purged by background merges, it makes sense that each
index decides differently when to do those merges. But the number of
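If you want to check the two counters programmatically, here is a sketch using SolrJ's LukeRequest (the core URL is an example):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.common.util.NamedList;

public class DocCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        NamedList<Object> info = new LukeRequest().process(server).getIndexInfo();
        // numDocs excludes deleted documents; maxDoc includes them until merges purge them
        System.out.println("numDocs=" + info.get("numDocs") + ", maxDoc=" + info.get("maxDoc"));
    }
}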
Sorry... I didn't understand that.
Did you mean to configure CloudSolrServer with a general hash algorithm?
./zahoor
On 17-Apr-2013, at 1:06 PM, rulinma ruli...@gmail.com wrote:
you can also compute the hash yourself (a general hash algorithm) and call the solr server to add docs.
tl;dr: retrieving 10,000 docs is a bad idea. Look into docValues for
storing security info
I suspect that you'll be better served by keeping the permissions
up-to-date in solr and invalidating the caches rather than trying to return
10,000 docs. On average, you'll be attempting to read up to
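A sketch of the filter-query approach being suggested; the acl field name and group value are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PermissionFilter {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("user query here");
        // restrict results to what this user's group may see; the fq result is
        // cached, so invalidate/refresh the cache when permissions change
        q.addFilterQuery("acl:groupA");
        q.setRows(100);    // return a bounded page instead of 10,000 docs
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " visible docs");
    }
}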
Sorry, made a copy paste mistake. The numbers are different.
My cloud has two shards, each shard having 1 replica. One shard and its
replica have the same number of docs, while in the other shard there is a
mismatch.
Regards,
Ayush
From: u...@odoko.co.uk
To:
Hi
I am pumping parallel select queries through CloudSolrServer.
It looks like it can handle only a certain number of max connections...
My question is:
How many concurrent queries can a CloudSolrServer handle?
An old thread tries to answer this by asking us to give our own instance of
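Presumably that old thread means supplying your own HttpClient. A sketch for SolrJ 4.x (connection limits and hosts are example values):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class TunedCloudClient {
    public static void main(String[] args) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);          // total pool
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64);  // per Solr node
        HttpClient httpClient = HttpClientUtil.createClient(params);
        LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
        CloudSolrServer cloud = new CloudSolrServer("zkhost1:2181,zkhost2:2181", lb);
        cloud.setDefaultCollection("collection1");
    }
}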
John Nielsen [j...@mcb.dk] wrote:
I managed to get this done. The facet queries now facet on a multivalued
field as opposed to the dynamic field names.
Unfortunately it doesn't seem to have made much difference, if any at all.
I am sorry to hear that.
documents = ~1.400.000
references
I have just experienced the same thing on 4.2.1: 4 shards, each with 2
replicas. Did some bulk loading and all but one shard match up. Small
discrepancy between the replicas, but no obvious errors either. Will be
doing further loading shortly and will report findings.
Regards.
Netty.
On 17
Hi
We are currently considering running Solr Cloud on VMware.
Do you have any insights regarding the issue you encountered, and generally
regarding using virtual machines instead of physical machines for Solr
Cloud?
Frank Wennerdahl wrote
Hi Otis and thanks for your response.
We are indeed
Hi,
We have run solr in VM environments extensively (3.6 not Cloud, but the
issues will be similar).
There are some significant things to be aware of when running Solr in a
virtualized environment (these can be equally true with Hyper-V and Xen as
well):
If you're doing heavy indexing, the
Personally I've never heard of a 500 document limit, I routinely use
1,000 doc batches (relatively small documents). Possibly your
co-worker exceeded the packet size or some other outside-solr
limitation?
Erick
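A sketch of the batching Erick describes, assuming a plain HttpSolrServer and a batch size of 1,000 (both example choices):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            batch.add(doc);
            if (batch.size() == 1000) {    // one request per 1,000 docs
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.commit();
    }
}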
On Mon, Apr 15, 2013 at 6:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
How big are your transaction logs? They can be replayed on startup.
They are truncated and a new one is started when you do a hard commit
(openSearcher true or false doesn't matter).
So a quick test of this theory would be to just stop your indexing
process, issue a hard commit on all your cores and
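A sketch of that quick test, issuing an explicit hard commit per core (core names are examples):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HardCommitAll {
    public static void main(String[] args) throws Exception {
        String[] cores = {"core0", "core1"};   // example core names
        for (String core : cores) {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/" + core);
            server.commit(true, true);  // hard commit; truncates the tlog, starts a new one
            server.shutdown();
        }
    }
}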
Sorry for not providing enough details initially. You're right, it's
difficult for me to share the real code but let me try and give you an
example.
<dataConfig>
  <xi:include href="mydatasource.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude"/>
  <document>
I am surprised about the lack of UnInverted in your logs as it is
logged at INFO level.
Nope, no trace of it. No mention either in Logging - Level from the admin
interface.
It should also be available from the admin interface under
collection/Plugin / Stats/CACHE/fieldValueCache.
I never
That post lost a lot of formatting. Please find attached instead.
db-data-config.xml
http://lucene.472066.n3.nabble.com/file/n4056649/db-data-config.xml
Version 4.2.0
collection1 example
I currently have indexed over 1.5 million HTML files, with more to come.
Here is an issue I am running into: if I search the word mayor I get a great
list of results.
Now if I search the word bing I get results. Searching the words together,
mayor bing, with
Hi,
I need my tokenizer factory to split on everything except numbers,
letters, '&', ':' and the single quote character.
I use PatternTokenizerFactory as below:
<tokenizer class="solr.PatternTokenizerFactory"
  pattern="[^a-zA-Z0-9&amp;-:]"/>
but it's splitting tokens only on spaces. Not sure what I
Hyphen indicates a character range (as in a-z), so if you want to include
a hyphen as a character, escape it with a single backslash.
-- Jack Krupansky
-Original Message-
From: meghana
Sent: Wednesday, April 17, 2013 7:58 AM
To: solr-user@lucene.apache.org
Subject: Pattern Tokenizer
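A small java.util.regex sketch of the difference Jack describes (PatternTokenizerFactory uses the same Pattern class; the sample text and patterns are illustrative):

import java.util.Arrays;

public class HyphenRange {
    public static void main(String[] args) {
        String text = "a+b-c&d:e";
        // '&-:' is a character RANGE (0x26..0x3A) that covers '+', '-', '/', the digits...
        // so '+' falls inside the class and never becomes a split character
        System.out.println(Arrays.toString(text.split("[^a-zA-Z0-9&-:]")));   // [a+b-c&d:e]
        // escaping the hyphen makes it a literal: only & - : are kept
        System.out.println(Arrays.toString(text.split("[^a-zA-Z0-9&\\-:]"))); // [a, b-c&d:e]
    }
}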
On 17 April 2013 17:10, paulblyth blythy_...@hotmail.com wrote:
That post lost a lot of formatting. Please find attached instead.
db-data-config.xml
http://lucene.472066.n3.nabble.com/file/n4056649/db-data-config.xml
I do not see how this could be working in either case.
Your select statement
Jack Krupansky-2 wrote
Hyphen indicates a character range (as in a-z), so if you want to
include
a hyphen as a character, escape it with a single backslash.
-- Jack Krupansky
-Original Message-
From: meghana
Sent: Wednesday, April 17, 2013 7:58 AM
To:
solr-user@.apache
On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla roman.ch...@gmail.com wrote:
Is there some profound reason why the defType is not passed onto the filter
query?
defType is a convenience so that the main query parameter q can
directly be the user query (without specifying its type, like
edismax).
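In SolrJ terms, a sketch of the asymmetry (edismax is the stock parser; the fq opts into a parser via localparams):

import org.apache.solr.client.solrj.SolrQuery;

public class DefTypeSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("ipod gift");
        q.set("defType", "edismax");                 // parses q only
        q.addFilterQuery("{!edismax}brand:(apple)"); // an fq must name its own parser
        System.out.println(q);                       // q=ipod+gift&defType=edismax&fq=...
    }
}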
John Nielsen [j...@mcb.dk]:
I never seriously looked at my fieldValueCache. It never seemed to get used:
http://screencast.com/t/YtKw7UQfU
That was strange. As you are using a multi-valued field with the new setup,
they should appear there. Can you find the facet fields in any of the other
If the updateLog tag is mandatory, then why is it given as a parameter in
solrconfig.xml? I mean, by default it should always write update logs
in my data directory even if I don't use the updateLog parameter in the config file.
Also, the same config file works for Solr 4.0 but not Solr 4.2.
I will be
Hi Gora,
Please forgive the typo. This is merely a simplified example to illustrate
the scenario (if/else and switch) we're trying to achieve; although the
values have been changed, the if/else and switch statements remain as is.
The fact that the switch statement should work is the problem - it
Whoops. I made some mistakes in the previous post.
Toke Eskildsen [t...@statsbiblioteket.dk]:
Extrapolating from 1.4M documents and 180 clients, let's say that
there are 1.4M/180/5 unique terms for each sort-field and that their
average length is 10. We thus have
1.4M*log2(1500*10*8) +
Hi Team,
I am using Solr for indexing data. I need some statistics, like
max, min, and stddev, from the indexed data. I read about `SolrStatsComponent`
and I used it too.
I read this line in `apache_solr_4_cookbook.pdf`:
Please be careful when using this component on the multivalued
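A sketch of requesting those stats through SolrJ (the field name is taken from this thread; the URL is an example):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;

public class StatsExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/daycore");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                               // stats only, no documents
        q.setGetFieldStatistics("login_attempts");  // stats=true&stats.field=login_attempts
        FieldStatsInfo stats = server.query(q).getFieldStatsInfo().get("login_attempts");
        System.out.println("min=" + stats.getMin() + " max=" + stats.getMax()
            + " stddev=" + stats.getStddev());
    }
}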
Hi,
If you are not afraid of looking into the code, you could trace and
possibly fix this. Remember to commit a patch :)
Another (easier?) way is to compile a repeatable test and file a Jira.
Dmitry
On Tue, Apr 16, 2013 at 4:12 PM, juancesarvillalba
juancesarvilla...@gmail.com wrote:
Hi,
On Apr 17, 2013, at 9:17 AM, vicky desai vicky.de...@germinait.com wrote:
If the updateLog tag is mandatory, then why is it given as a parameter in
solrconfig.xml?
Because it's not mandatory.
- Mark
Hi
Thanks For your reply.
We will try to index the permissions in Solr, add the filter query, and
try to fetch an optimal number of rows (100 or 150) from Solr.
In the future we will try SSDs as well.
Thanks to all for such a great response.
Thanks & Regards
Montu v Boda
hi Adeel,
I have used Solr with Maven since 2011, and my dependency is not solr but
solr-core plus some other dependencies.
Therefore my project structure is just like an unpacked solr.war without
the 'WEB-INF/lib' directory.
So I can write code that works with Solr, e.g. a listener that sets up system
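For reference, a sketch of that dependency in pom.xml (the version is only an example; match it to your Solr release):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.2.1</version>
</dependency>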
Hi,
Although we use logical sharding, there are cases in our environment as you
described. We handle them manually:
0. prepare new version of a document
1. remove the old version of the document
2. post it and commit
With logical sharding it is relatively easy, but we do need to store
location
How are you searching? From the Web UI Admin or from a client? If from a
client, check the number of rows being returned. For example, SolrNet asks
for 2 rows unless overruled (to force you to be explicit about
your paging), so you could be stuck on result
serialization/deserialization. Try searching
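In SolrJ the equivalent is to set paging explicitly; a sketch (values are examples):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExplicitPaging {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setStart(0);   // page offset
        q.setRows(50);   // be explicit; client defaults can be tiny
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().size() + " of " + rsp.getResults().getNumFound());
    }
}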
Thanks, the earlier presentation was done with Keynote and the later one (more
animation) with Tumult Hype.
- Mark
On Apr 17, 2013, at 3:43 AM, Furkan KAMACI furkankam...@gmail.com wrote:
Hey Mark,
What did you use to prepare your presentation? It's really nice.
2013/4/17 Furkan
updateLog is not mandatory in general for Solr, but it is mandatory for
cloud mode, right?
Solrconfig mentions Solr Cloud replica recovery, but doesn't explicitly
say that it's a required part of cloud mode. Maybe just a little
clarification in solrconfig would help, like solr cloud replica
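For reference, a sketch of the stock updateLog definition from the 4.x example solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>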
okay this looks promising. I will give it a try and let you know how it
goes. Thanks
On Wed, Apr 17, 2013 at 9:19 AM, jnduan jnd...@gmail.com wrote:
hi Adeel,
I have used Solr with Maven since 2011, and my dependency is not solr but
solr-core plus some other dependencies.
Therefore my project
Is this a bug? I can create the ticket in Jira if it is, but it's not clear
to me what should be happening.
I noticed that it is using the value set in the home directory, but that
value does not get updated, so my imports get slower and slower.
I guess I could create a cron job to update
Thanks Erick.
A couple of questions:
Our transaction logs are huge as we have disabled auto commit. The biggest
one is 6.1 GB.
567M  autosuggest/data/tlog
22M   avmediaCore/data/tlog
388M  booksCore/data/tlog
4.9G  books/data/tlog
6.1G  mp3-downloads/data/tlog (150% of index
Hi
You probably AND them by default. Look at your mm value or default boolean
operator setting in solrconfig.xml:
http://search-lucene.com/?q=mm+default+boolean+operator&fc_project=Solr
Otis
Solr & ElasticSearch Support
http://sematext.com/
On Apr 17, 2013 7:43 AM, zeroeffect
Makes sense, thanks. One more question. Shouldn't there be a mechanism to
define a default query parser?
something like (inside QParserPlugin):
public static String DEFAULT_QTYPE = "default"; // now it is LuceneQParserPlugin.NAME;
public static final Object[] standardPlugins = {
On 4/17/2013 1:20 AM, Maciej Pestka wrote:
Hi,
I've configured basic authentication on Tomcat on my slave Solr instance and it
works.
Any idea how to configure the slave to replicate properly with digest
authentication?
On the Solr WIKI I could find only a basic authentication example:
You specify it as a default parameter for a requestHandler in your
solrconfig.xml, giving a default value for defType. Not sure that you
can set a default that will cover filter queries too.
Upayavira
On Wed, Apr 17, 2013, at 05:46 PM, Roman Chyla wrote:
Makes sense, thanks. One more question.
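For example, a sketch of such a default in solrconfig.xml (handler name and parser choice are illustrative):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
  </lst>
</requestHandler>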
True, you cannot currently specify a default (other than the trick Roman showed
earlier) query parser for fq parameters. I think of the bulk of my fq's in the
form of fq={!term f=facet_field}value so setting a default term query parser
for fq's wouldn't really help me exactly, as it needs an
On 4/17/2013 10:29 AM, Umesh Prasad wrote:
We use DIH and have turned off auto commit because we sometimes have to
build the index from scratch (clean=true) and we do not want to
Our master server sees a lot of restarts, sometimes 2-3 times a day. It
polls other data sources for updates, which are
On Apr 17, 2013, at 1:42 PM, Shawn Heisey s...@elyograg.org wrote:
On 4/17/2013 10:29 AM, Umesh Prasad wrote:
We use DIH and have turned off auto commit because we sometimes have to
build the index from scratch (clean=true) and we do not want to
Our master server sees a lot of restarts,
I am doing faceting on an index of 120M documents, on the field of url,
using the following two queries. Note that the only difference between the two
queries is that the first one uses the default facet.method, and the second one
uses facet.method=enum. (Each document in the index contains a review we
Please post the field definition from your Solr schema.xml for
stats.field=login_attempts:
http://localhost:8080/solr/daycore/select?q=*:*&stats=true&stats.field=login_attempts&rows=0
It depends on how you have defined the stats field.
On 4/17/2013 3:46 AM, J Mohamed Zahoor wrote:
Hi
I am pumping parallel select queries through CloudSolrServer.
It looks like it can handle only a certain number of max connections...
My question is:
How many concurrent queries can a CloudSolrServer handle?
Looking into the code for 4.x
When I set distrib=false the spellchecker works perfectly. So I take it the
spellchecker doesn't work in Solr 4.1 in cloud mode. Does anybody know if it
works in 4.2.1?
On 4/17/2013 11:56 AM, Mark Miller wrote:
There is one additional caveat - when you disable the updateLog, you have to
switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory. The NRT
directory implementation will cache a portion of a commit (including hard
commits) into RAM
Spellcheck is broken when using both distributed search and grouping. The fix is
here: https://issues.apache.org/jira/browse/SOLR-3758 . This will be part of
4.3, which will likely be released within the next few weeks. In the meantime
you can apply the patch to 4.2 or, as a workaround, re-issue a
Thank you for the response
What are your results when using facet.method=fcs?
On Wed, Apr 17, 2013 at 12:06 PM, Mingfeng Yang mfy...@wisewindow.comwrote:
I am doing faceting on an index of 120M documents, on the field of url,
using the following two queries. Note that the only difference between the two
queries is that
: Is it possible to use multiple text files? I tried the following:
...
: But the second list, the cities, are apparently undetected, after
: restarting the tomcat and rebuilding the dictionary. Can this be done? If
: not, how would you recommend managing different dictionaries?
Skimming
Does Solr 3.6 have facet.method=fcs? I tried anyway, and got
ERROR 500: GC overhead limit exceeded java.lang.OutOfMemoryError: GC
overhead limit exceeded.
On Wed, Apr 17, 2013 at 12:38 PM, Timothy Potter thelabd...@gmail.comwrote:
What are your results when using facet.method=fcs?
On
: Side issue: shouldn't that be setMaxConnectionsPerHost instead of including
: the word Default? If there's no objection, I would plan on adding the renamed
: method and using a typical deprecation procedure for the old one.
I think the name comes from the effect it has on the underlying
I've just started to read about Solr caching. I want to learn one thing.
Let's assume that I have given 4 GB of RAM to my Solr application and I have
10 GB of RAM in total. When the Solr caching mechanism starts to work, does it
use memory from that 4 GB part, or does it let the operating system cache from the other 6 GB part of
On Apr 17, 2013, at 3:09 PM, Furkan KAMACI wrote:
I've just started to read about Solr caching. I want to learn one thing.
Let's assume that I have given 4 GB of RAM to my Solr application and I have
10 GB of RAM in total. When the Solr caching mechanism starts to work, does it
use memory from that 4 GB part
I see that while merging indexes (I mean optimizing via the admin GUI), my Solr
instance can still respond to select queries (as well). How does that querying
mechanism work (because merging is not finished yet but my Solr instance
can still return a consistent response)?
On 4/16/2013 2:39 PM, Jie Sun wrote:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="default" instanceDir="./"/>
    <core name="413a" instanceDir="./"/>
    <core name="blah" instanceDir="./"/>
    ...
  </cores>
</solr>
the command I ran was to rename from
On 4/17/2013 3:21 PM, Chris Hostetter wrote:
I think the name comes from the effect it has on the underlying HttpClient
code ... it's possible to configure a HttpConnectionManager such that it
has a different number of max connections per host -- i.e.: host1 has max
connections of 23, host2 has max
merging indexes
The proper terminology is merging segments.
Until the new, merged segment is complete, the existing segments remain
untouched and readable.
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Wednesday, April 17, 2013 6:28 PM
To:
I would like to use the Query Elevation Component. As I understand, it only
elevates based on term. I would also like it to consider the list of fq
parameters. Well, really just one fq parameter, e.g. (fq=siteid:4), since I use
the same Solr index for many sites. Is something like this available
: Is this a bug? I can create the ticket in Jira if it is, but it's not clear
: to me what should be happening.
It certainly sounds like it, but I too am not certain what is actually
supposed to be happening here, or why it changed.
Please open a jira with the details of your DIH requestHandler
thanks Shawn for filing the issue.
by the way my solrconfig.xml has:
<dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir>
For now I will have to shut down Solr and write a script to modify the
solr.xml manually and rename the core data directory to the new one.
by the way
Perhaps you should describe the problem you are trying to solve. There
may be other ways to solve it.
Upayavira
On Thu, Apr 18, 2013, at 01:08 AM, davers wrote:
I would like to use the Query Elevation Component. As I understand, it
only
elevates based on term. I would also like it to consider
On 4/17/2013 7:07 PM, Jie Sun wrote:
thanks Shawn for filing the issue.
by the way my solrconfig.xml has:
<dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir>
For now I will have to shut down Solr and write a script to modify the
solr.xml manually and rename the