If you are using ZK-coordinating Solr (SolrCloud - you need 4.0+) you
can maintain an in-memory, always-up-to-date data structure containing the
information - ClusterState. You can get it through CloudSolrServer or
ZkStateReader that you connect to ZK once and it will automatically
update the
Hi Otis,
thanks for your reply.
When I disable grouping, everything is fine!
Also, when I limit the rows to return with the rows= parameter I don't run
into memory problems.
Since I still get the information about the hits:
<int name="matches">3793</int>
<int name="ngroups">2175</int>
I was wondering if
I'm having a problem using a property file in my data-import.xml file.
My aim is to not hard-code some values inside my xml file, but rather
reuse the values from a property file. I'm using multicore and some of
the values are being changed from time to time and I do not want to change
them in
Thanks Per.
I'm currently not using SolrCloud but that's a good tip to keep in mind.
Thanks,
Shahar.
-Original Message-
From: Per Steffensen [mailto:st...@designware.dk]
Sent: Thursday, January 10, 2013 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance
Hi
I use Solr 4.0 and have documents which have a timestamp field.
For a given query I know how to boost the latest documents and show them up
first.
But I have a requirement to show **only** the latest documents and the stats
along with it.
If I use boosting... the older documents still
Hi,
Our current architecture is as follows:
- Single server [ On which we do both Indexing and Searching]
- Solr version 3.6.1 Multicores
- We have several small and big indexes as cores within a webapp
- Our indexing to the individual cores happens via an index queue, due to
which
Hi,
Can you sort on the timestamp field (descending) and take only the top most
row (rows=1)?
Or are you saying that you want a query such that it matches only the most
recent one?
On Thu, Jan 10, 2013 at 5:27 PM, jmozah jmo...@gmail.com wrote:
Hi
I use solr 4.0 and have documents which
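A minimal request for the sort-descending suggestion above might look like this (a sketch; the host, query and the literal field name `timestamp` are placeholders):

```text
http://localhost:8983/solr/select?q=your_query&sort=timestamp+desc&rows=1
```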
Let's say I have one collection with 3 shards. Every shard contains indexed
data.
I want to unload one shard. Is there any way for the data from the unloaded
shard to not be lost?
How do I remove a shard with data without losing it?
I need a query that matches only the most recent ones...
Because my stats depend on it..
./Zahoor
HBase Musings
On 10-Jan-2013, at 3:37 PM, Naresh nnar...@gmail.com wrote:
Hi,
Can you sort on the timestamp field (descending) and take only the top most
row (rows=1)?
Or are you saying
On 1/10/13 10:09 AM, Shahar Davidson wrote:
search request, the system must be aware of all available cores in order to
execute distributed search on _all_ relevant cores
For this purpose I would definitely recommend that you go SolrCloud.
Furthermore we do something extra:
We have several
Hi,
I have a query that searches through every field to find the text 'london'
(constituencies:(london) OR label:(london) OR name:(london) OR
office:(london))
Which works fine, but when I want to filter my results. Say I want to
filter down to constituencies that exactly match 'london', but
we have a SolrCloud with 3 nodes. we add documents to leader node and use
commitwithin(100secs) option in SolrJ to add documents. AutoSoftCommit in
SolrConfig is 1000ms.
Transaction logs on replicas grew bigger than the index and we ran out of
disk space in a few days. Leader's tlogs are very small
Use filter queries to filter or drill down:
http://wiki.apache.org/solr/CommonQueryParameters#fq
Also consider using dismax/edismax queries, which are designed to match on
any of multiple fields.
Also be careful to put a space between each left parenthesis and field name
since there is a
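As a sketch of the dismax/edismax suggestion, a request handler like the following searches all four fields for every query term (the handler name is made up here; the field list is borrowed from the earlier message):

```xml
<!-- sketch: edismax handler that queries several fields at once -->
<requestHandler name="/select_people" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">constituencies label name office</str>
  </lst>
</requestHandler>
```

A query then becomes simply q=london, with fq=constituencies:london as the drill-down filter.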
Yes, you must issue hard commits. You can use autocommit and use
openSearcher=false. Autocommit is not distributed, it has to be configured
in every node (which will automatically be, because you are using the exact
same solrconfig for all your nodes).
Other option is to issue an explicit hard
Gaaah, thought I looked at which version that page was from, obviously
screwed it up. Thanks for the correction.
Erick
On Wed, Jan 9, 2013 at 12:12 PM, Smiley, David W. dsmi...@mitre.org wrote:
Erick,
Alex asked about Solr 4 spatial, and his use-case requires it because
he's got
Thanks, I've tried doing
<lst name="params">
<str name="wt">xml</str>
<str name="fq">
(constituencies:(ian paisley) OR label:(ian paisley) OR office:(ian paisley))
</str>
<str name="q">name_long:paisley, ian</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
and
<str name="fq">+ian +paisley paisley, ian</str>
<str
No, you're pretty much on track. You can also just include the field
multiple times if you want, itemModelNoExactMatchStr:123-4567 OR
itemModelNoExactMatchStr:345-034985
But
itemModelNoExactMatchStr:(123-4567 OR 345-034985) works just as well and is
more compact.
15 terms is actually quite short
There's no really easy way that I know of. I've seen several approaches
used though
1 do it in the UI. This assumes that your users aren't typing in raw
queries, they're picking field names from a drop-down or similar. Then the
UI maps the chosen fields into what the schema defines.
2 Do it in
Hi,
I'm quite new with Solr.
I've searched for this but found no answer to my needs, which seems to
be quite common though.
Let's say I have three entities in my database (I have more than that
but it's for the sake of simplicity) :
File *--* Procedure *--* Organ
1 n
Hi Per,
Thanks for your reply!
That's a very interesting approach.
In your system, how are the collections created? In other words, are the
collections created dynamically upon an update (for example, per new day)?
If they are created dynamically, who handles their creation (client/server)
It may still be related. Even a non empty index can have no versions (eg one
that was just replicated). Should behave better in this case in 4.1.
- Mark
On Jan 10, 2013, at 12:41 AM, Zeng Lames lezhi.z...@gmail.com wrote:
thanks Mark. will further dig into the logs. there is another problem
Set up hard auto commit with openSearcher=false. I would do it at least once a
minute. Don't worry about the commit being out of sync on the different nodes -
you will be using soft commits for visibility. The hard commits will just be
about relieving the pressure on the tlog.
- Mark
On Jan 10,
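In solrconfig.xml, that advice might look like the following sketch (the one-minute and one-second values echo the numbers mentioned in this thread; tune them for your load):

```xml
<!-- hard commit: truncates the tlog, does not open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: controls visibility of new documents -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```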
Not sure if I fully understood that, but it seems to be that you are
currently indexing file with extra child info, when you want to be
indexing organs with extra parent info.
Have you tried inverting the order of your nested entities (organ as an
outer one, etc) and seeing if that's better for
dataimport.properties is for DIH to store its own properties for delta
processing and things. Try solrcore.properties instead, as per recent
discussion:
http://lucene.472066.n3.nabble.com/Reading-database-connection-properties-from-external-file-td4031154.html
Regards,
Alex.
Personal blog:
Alexandre Rafalovitch wrote:
Not sure if I fully understood that, but it seems to be that you are
currently indexing file with extra child info, when you want to be
indexing organs with extra parent info.
Given a File with file_ref = 1,
it has two procedures:
- proc_ref = 1,
- proc_ref
And you don't need to open a searcher (openSearcher=false) because
you've got caches built up already alongside the in-memory NRT segment
which you can continue to use once the hard commit has happened? Is that
correct?
(sorry for hijacking the thread - hopefully it is somewhat relevant)
Hi,
Just add the required query clause with the appropriate range on that
timestamp field you have to your existing query.
Otis
Solr ElasticSearch Support
http://sematext.com/
On Jan 10, 2013 5:55 AM, jmozah jmo...@gmail.com wrote:
I need a query that matches only the most recent ones...
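For example, the extra clause could be a range on the timestamp field (the field name and the one-day window are assumptions; adjust to your schema and freshness requirement):

```text
q=(your_query) AND timestamp:[NOW-1DAY TO NOW]
```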
Hi,
There may be a slicker way, but one way is to take an index
snapshot/backup before unloading. Search recent messages on this list for
pointers.
Otis
Solr ElasticSearch Support
http://sematext.com/
On Jan 10, 2013 5:18 AM, mizayah miza...@gmail.com wrote:
Lets say i got one collection
Hi,
You are going in the right direction and your assumptions are correct. In
short, if the performance hit is too big then you simply need more ec2
instances (some have high cpu, some memory, some disk IO ... pick wisely).
Otis
Solr ElasticSearch Support
http://sematext.com/
On Jan 10, 2013
On 1/10/2013 2:09 AM, Shahar Davidson wrote:
As for your first question, the core info needs to be gathered upon every
search request because cores are created dynamically.
When a user initiates a search request, the system must be aware of all
available cores in order to execute distributed
You're using query and filter query backwards - the query is what you are
looking for (the OR), while the filter query is the constraint on the
query - the drill down.
-- Jack Krupansky
-Original Message-
From: Michael Jones
Sent: Thursday, January 10, 2013 7:38 AM
To:
Thanks Alexandre!
I followed your example and created a solrcore.properties in
solr.home/conf/solrcore.properties.
I created a symlink in my core/conf to the solrcore.properties file, but I
can't read the properties.
My properties file:
username=myusername
password=mypassword
My
The collections are created dynamically. Not on update though. We use
one collection per month and we have a timer-job running (every hour or
so), which checks if all collections that need to exist actually do
exist - if not it creates the collection(s). The rule is that the
collection for
I am on 3.6 and this is my setup:
Properties file under solr.home, so right under /jetty/solr
solr.xml modified as follows:
<core name="corename" instanceDir="instancedir"
properties="../solrcore.properties" />
http://wiki.apache.org/solr/CoreAdmin#property - the path is relative to
instancedir
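One pattern for getting those values into DIH (a sketch; the property and parameter names here are illustrative) is to expand the core properties in solrconfig.xml, where ${...} substitution is known to work, and hand them to the DIH handler as defaults:

```xml
<!-- solrconfig.xml: expand core properties and pass them to DIH -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="db.user">${username}</str>
    <str name="db.pass">${password}</str>
  </lst>
</requestHandler>
```

Inside db-data-config.xml these are then available as ${dataimporter.request.db.user} and ${dataimporter.request.db.pass}.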
If you do a standard unload, it won't remove any of the on disk data. You have
to explicitly ask for that. So you can do a vanilla unload and pull that core
out of rotation - later you can recreate the core with the same parameters it
had, and it will come back with the same data it had.
-
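A vanilla unload followed by a later re-create would be along these lines (host, core name and instanceDir are placeholders for your own values):

```text
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=corename
http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&instanceDir=instancedir
```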
There is no need to open a Searcher because you are controlling visibility
through the faster 'soft' commit. That will reopen the reader from the
IndexWriter. Because of that, there is no reason to do a heavy, non NRT
Searcher reopen on hard commits. Essentially, the hard commit becomes simply
That's great Mark. Thx. One final question... all the stuff to do with
autowarming and static warming of caches - I presume all of that
configuration is still relevant (if less so) as you still need to warm
caches on a soft commit, even if those caches are much smaller than they
would be
Why do you want to unload one shard of a collection? Doing so would
render your collection incomplete and therefore non-functional. It'd
help to understand a bit more what you're trying to achieve.
Upayavira
On Thu, Jan 10, 2013, at 10:17 AM, mizayah wrote:
Lets say i got one collection with 3
I recently started working with the clustering plugin on solr 4.
I assigned a request handler to clustering: /clustering and got the following
errors
lazy loading error: org.apache.solr.common.SolrException: lazy loading error
at
On Thu, Jan 10, 2013 at 9:37 AM, Bruno Dusausoy bdusau...@yp5.be wrote:
Have you tried inverting the order of your nested entities (organ as an
outer one, etc) and seeing if that's better for your needs?
Interesting idea, never thought about that.
But unfortunately my real case is much much
Alexandre,
Unfortunately this is poorly documented and it takes a little trial-and-error
to figure out what is going on. I believe the order is this:
1. Get data from the EntityProcessor (in your case, MailEntityProcessor)
2. Run transformers on the data.
3. Run any post-transform operations
Is the registration of the search component failing earlier in your
logs?
Upayavira
On Thu, Jan 10, 2013, at 04:23 PM, obi240 wrote:
I recently started working with the clustering plugin on solr 4.
I assigned a request handler to clustering: /clustering and got the
following
errors
lazy
Also, you have enable=${solr.clustering.enabled:false} in there. Are
you setting solr.clustering.enabled=true anywhere? I'd just remove that
bit, you clearly want it enabled.
Upayavira
On Thu, Jan 10, 2013, at 04:23 PM, obi240 wrote:
I recently started working with the clustering plugin on solr
I think it really depends - if you are going for very fast visibility, you're
going to spend a bunch of time warming, and then just throw it out before it
even gets much if any reuse. For very fast visibility turnaround, I suspect you
don't want to do any warming. I think it depends on many
On 1/9/2013 8:54 PM, Mark Miller wrote:
I'd put everything into one. You can upload different named sets of config
files and point collections either to the same sets or different sets.
You can really think about it the same way you would setting up a single node
with multiple cores. The main
I've tried both ways and I still get zero results with this.
Even though name_long:paisley, ian will return results.
<str name="fq">name_long:paisley, ian</str>
<str name="q">
(constituencies:(ian paisley) OR label:(ian paisley) OR office:(ian paisley))
</str>
On Thu, Jan 10, 2013 at 3:27 PM, Jack
Thanks, Otis.
But then what exactly is the advantage of a master-slave architecture
for multicore, when replication has the same effect as a commit,
and if I am going to have worse performance by moving to master/slave over
a single server with sequential indexing? Am I missing
If the fields you're querying are of type String (1 token), then you need to
escape the whitespace with a backslash, like this:
label:ian\ paisley
If they are of type Text (multiple tokens), sometimes you need to explicitly
insert AND between each token, either with:
label:(ian AND paisley)
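Side by side, the two forms suggested above (field name taken from the thread):

```text
label:ian\ paisley          string field (one token): escape the space
label:(ian AND paisley)     text field (multiple tokens): require both tokens
```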
Heh, the "it depends" answer :-)
Thanks for the clarification.
Upayavira
On Thu, Jan 10, 2013, at 05:01 PM, Mark Miller wrote:
I think it really depends - if you are going for very fast visibility,
you're going to spend a bunch of time warming, and then just throw it out
before it even gets much
My fields are
<field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true"
multiValued="true"/>
<field name="name_long" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="type" type="string" indexed="true"
In the end, the best advice is try it.
You'll save the effort of indexing with this master/slave setup, but
you'll still need to warm your caches on each slave, which is a
reasonable portion of the work done on a commit. However, with a
master/slave setup, you get the option to go to two slaves,
I downloaded the latest from solr.
I applied a patch
cd to solr dir
and I try ant dist
I get these ivy errors
ivy-availability-check:
[echo] Building analyzers-phonetic...
ivy-fail:
[echo] This build requires Ivy and Ivy could not be found in your
ant classpath.
[echo]
<str name="fq">name_long:paisley, ian</str>
<str name="q">
*:* OR (constituencies:(ian paisley) OR label:(ian paisley) OR office:(ian paisley))
</str>
</lst>
Does actually give me incorrect results for other queries. :(
On Thu, Jan 10, 2013 at 7:42 PM, Michael Jones michaelj...@gmail.com wrote:
My fields
I notice here that both constituencies and office are type string, so
these probably have only 1 token. In this case, you need to search with the
whitespace escaped with a backslash.
Besides this, I'm not entirely sure what more to tell you. You're going to
have to verify that some documents
I have a multi-threaded application using solrj 4. There are a maximum of 25
threads. Each thread creates a connection using HttpSolrServer, and runs one
query. Most of the time this works just fine. But occasionally I get the
following exception:
Jan 10, 2013 9:29:07 AM
Ok, I think I see where the problem is coming from.
I do a search for 'ian paisley' over all four fields and get 900 results.
And in my returned facets there is 'paisley, ian' in the name facet. If you
only search for his name, name_long:paisley, ian - you get 200 results.
But if someone does a
Tim -
You've likely been bitten by https://issues.apache.org/jira/browse/SOLR-4286,
which is now fixed and will be in 4.1 coming up soon.
Erik
On Jan 10, 2013, at 15:03 , Timothy Potter wrote:
Hi,
Using Solr 4.0, I'm sending a partial document update request (as JSON) and
it's
Ok, the old problem was that eclipse was using a different version of ant
1.8.3.
I dropped the ivy jar in the build path and now I get these errors:
[ivy:retrieve] ERRORS
[ivy:retrieve] Server access Error: Connection timed out: connect
Hi Solr Users,
Can someone give me some good parsing rules of thumb to make the debug explain
output human readable? I found this cool site for visualizing the output but
our queries are too complex and break their parser: http://explain.solr.pl
I tried adding new lines plus indenting after
Robert -
Two options here:
- Use debug.explain.structured
http://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
- Use wt=ruby&indent=on and it'll come out in an indented, browser-friendly
manner, but even in XML it should come out with whitespace and
newlines
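Putting both tips together, the extra request parameters would be (a sketch; append these to your existing query):

```text
&debugQuery=true&debug.explain.structured=true&wt=ruby&indent=on
```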
We are in the midst of upgrading from Solr 3.6 to Solr 4.0 and have
encountered an issue with the method the SnapPuller now uses to determine
if a new directory is needed when fetching files to a slave from master.
With Solr 3.6, our reindexing process was:
On master:
1. Reindex in a separate
On Jan 10, 2013, at 4:11 PM, Gregg Donovan gregg...@gmail.com wrote:
If the commitTimeMSec based check in Solr 4.0 is needed for SolrCloud,
It's not. SolrCloud just uses the force option. I think this other change was
made because Lucene stopped using both generation and version. I can try
I actually think the example NRT setting of one second is probably lower than
it should be.
When you think about most NRT cases, do you really need 1 second visibility?
You normally could easily handle 10, 20, 30 seconds or more. Rather than just
going for the low time, I think people should
Thanks, Mark.
The relevant commit on the solrcloud branch appears to be 1231134 and is
focused on the recovery aspect of SolrCloud:
http://svn.apache.org/viewvc?diff_format=h&view=revision&revision=1231134
I recently configured a Solr 4.0 instance with the Carrot2
ClusteringComponent handler as specified here:
http://wiki.apache.org/solr/ClusteringComponent
If I run a query like:
http://myhost/solr/mycollection/clustering?q=*.*&rows=100
I receive (in XML) the document hits
Hmm…I don't recall that change. We use the force, so SolrCloud certainly does
not depend on it.
It seems like it might be a mistake - some dev code that got caught up with the
commit?
I'm a little surprised it wouldn't trip any tests…I still have to read your
first email closely though.
-
Hi,
I am trying to index multiple tables in Solr. I am not sure which data
config file needs to be changed; there are so many of them (like
solr-data-config, db-data-config).
Also, do I have to change the id, name and desc to the names of the columns
in my table? And
how do I add the solr_details field in
Hello,
I am a new Solr user. I updated schema.xml and solrconfig.xml, and now at
http://localhost:8983/solr/browse auto completion is not working :( .
Please tell me how I should modify those files so I can get auto
completion working. Please reply urgently.
thanks
Hi Erik,
Thanks, debug.explain.structured=true helps a lot! Could you also tell me what
these `#8;#0;#0;#0;#1; strings represent in the debug output? Are they some
internal representation of the field name/value combos in the query? They come
out like this:
PS the wt=ruby param is even better! Great tips.
-Original Message-
From: Petersen, Robert [mailto:rober...@buy.com]
Sent: Thursday, January 10, 2013 3:17 PM
To: solr-user@lucene.apache.org
Subject: RE: parsing debug output for readability
Hi Erik,
Thanks,
On Thu, Jan 10, 2013 at 6:16 PM, Petersen, Robert rober...@buy.com wrote:
Thanks, debug.explain.structured=true helps a lot! Could you also tell me
what these `#8;#0;#0;#0;#1; strings represent in the debug output?
That's internally how a number is encoded into a string (5 bytes, the
first
Thx, that makes a lot of sense.
Upayavira
On Thu, Jan 10, 2013, at 09:37 PM, Mark Miller wrote:
I actually think the example NRT setting of one second is probably lower
than it should be.
When you think about most NRT cases, do you really need 1 second
visibility? You normally could easily
thanks Mark. may I know the target release date of 4.1?
On Thu, Jan 10, 2013 at 10:13 PM, Mark Miller markrmil...@gmail.com wrote:
It may still be related. Even a non empty index can have no versions (eg
one that was just replicated). Should behave better in this case in 4.1.
- Mark
On
Looks that way ... upgrading to 4.1 fixed the issue - thanks Erik.
Cheers,
Tim
On Thu, Jan 10, 2013 at 1:10 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Tim -
You've likely been bitten by
https://issues.apache.org/jira/browse/SOLR-4286, which is now fixed and
will be in 4.1 coming up
Looks like we are talking about making a release candidate next week.
Mark
Sent from my iPhone
On Jan 10, 2013, at 7:50 PM, Zeng Lames lezhi.z...@gmail.com wrote:
thanks Mark. may I know the target release date of 4.1?
On Thu, Jan 10, 2013 at 10:13 PM, Mark Miller markrmil...@gmail.com
What modifications did you make? If you changed the field name (it's
literally 'name') in the example, you'll need to change the JavaScript
(terms.fl parameter) in the autocomplete stuff in conf/velocity/head.vm, and
also the 'name' reference in suggest.vm. That should be all you need to
Thanks for the info. I was thinking that autocommit would be propagated
through the cloud like the explicit commit command. If it is not logged
into tlogs as mentioned, we can just set autocommit and forget about it.
Thanks
Shyam
On Thu, Jan 10, 2013 at 8:15 PM, Tomás Fernández Löbbe
thanks Mark. looking forward it
On Fri, Jan 11, 2013 at 9:28 AM, Mark Miller markrmil...@gmail.com wrote:
Looks like we are talking about making a release candidate next week.
Mark
Sent from my iPhone
On Jan 10, 2013, at 7:50 PM, Zeng Lames lezhi.z...@gmail.com wrote:
thanks Mark. may
hi, guys
when using the dataimport handler to import the index, it's ok.
but later, when the data became bigger and bigger, I found using the dataimport
handler is very slow. it costs nearly 20 mins.
So I tried another way to build the index: using a script to put the data into a csv
file then import to
You can send csv format data directly to Solr. For example:
curl http://localhost:8983/solr/update?commit=true -H
'Content-type:application/csv' -d '
id,f1_s,f2_i
d-1,abc,123
d-2,def,456
d-3,xyz,999'
Just include your dynamic field names on the first line of data.
-- Jack Krupansky
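The _s and _i suffixes in that CSV header assume dynamic field rules like those shipped in the example schema (check your own schema.xml; these lines are a sketch):

```xml
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```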
Can you omit norms and check?
On Fri, Jan 11, 2013 at 2:25 PM, wwhite1133 wwhite1...@gmail.com wrote:
While reading the lists, I found that overriding similarity to return 1 for
tf and idf is a way to do this.
Is that correct or is there any other way?
Thanks
WW
Hi Romita,
The 3rd parameter should be '/solr/corename' because ping() sends the request
to the /solr/corename/admin/ping handler. Try it, it should work.
Regards.
On 17 December 2012 03:23, Romita Saha romita.s...@sg.panasonic.com wrote:
Hi,
I open the Solr browser using the following url: