Hi,
I want to use SolrCloud in a more federated mode rather than
replication. The failover is nice, but I am more interested in
increasing capacity of an index through horizontal scaling (shards).
How can I configure shards such that they retain their own documents and
don't replicate (or
Thanks for the reply Mark.
I did example A. One of the instances had zookeeper. If I shut down the
other instance, all searches on the other (running) instance produced an
error in the browser.
I don't have the error handy but it was one line. Something like missing
shard in collection IIRC.
in an app server?
Any tips appreciated!
Darren
On 01/30/2012 06:58 PM, Darren Govoni wrote:
Hi,
Is there any issue with running the new SolrCloud deployed as a war
in another app server?
Has anyone tried this yet?
thanks.
'the leafs'),
then perhaps your hierarchy needs refactoring for redundancy?
Happy to help with more details.
Darren
On 01/24/2012 11:22 AM, Yuhao wrote:
Darren,
One challenge for me is that a term can appear in multiple places of the hierarchy. So it's not safe to
simply use the term
On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao nfsvi...@yahoo.com
wrote:
Programmatically, something like this might work: for each facet field,
add another hidden field that identifies its parent. Then, program
additional logic in the UI to show only the facet terms at the currently
.
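The parent-field idea above could be sketched roughly as follows. This is only an illustration under assumptions: the `parent_of` mapping stands in for the hidden parent field, and the function name `visible_facets` is hypothetical, not anything Solr provides.

```python
from typing import Dict, List, Optional, Tuple


def visible_facets(
    facet_counts: List[Tuple[str, int]],
    parent_of: Dict[str, Optional[str]],
    current_node: Optional[str],
) -> List[Tuple[str, int]]:
    """Keep only the facet terms whose parent is the currently selected node.

    `parent_of` maps a facet term to its parent term (None for top level).
    Terms missing from the mapping are never shown.
    """
    return [
        (term, count)
        for term, count in facet_counts
        if term in parent_of and parent_of[term] == current_node
    ]
```

If the same term can appear in several places of the hierarchy, keying on the bare term is not safe; keying `parent_of` on a full path instead of the term would be one way around that.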
On 01/18/2012 11:08 PM, Steven A Rowe wrote:
Hi Darren,
I think it's rare because it's rare: if this were found to be a useful
advertising space, rare would cease to be descriptive of it. But I could be
wrong.
Steve
-Original Message-
From: Darren Govoni [mailto:dar...@ontrenet.com
Agree. There's probably some unwritten etiquette there.
On 01/19/2012 05:52 AM, Patrick Plaatje wrote:
Partially agree. If just the facts are given, and not a complete sales talk
instead, it'll be fine. Don't overdo it like this though.
Cheers,
Patrick
2012/1/19 Darren Govonidar
Try changing the URI/HTTP/GET size limitation on your app server.
On 01/18/2012 05:59 PM, Daniel Bruegge wrote:
Hi,
I am just wondering how I can 'grow' a distributed Solr setup to an index
size of a couple of terabytes, when one of the distributed Solr limitations
is max. 4000 characters in
And to be honest, many people on this list are professionals who not
only build their own solutions, but also buy tools and tech.
I don't see what the big deal is if some clever company has something of
imminent value to share here, considering that it's a rare event.
On 01/18/2012 08:28
The first query causes the index caches to be warmed up, which is why
it takes some time.
You can prewarm the caches with a query of your choosing (run when Solr
starts up) in the config file. Google around the Solr wiki on cache/index
warming.
hth
hi,
I had an solr3.3 index of
What index analyzer or field settings are you using for that field? Sounds
like it might be tokenized. Maybe look at alternatives that don't tokenize
fields. Just a guess here though. Good luck.
On Fri, 13 Jan 2012 09:04:00 -0500, Christopher Gross cogr...@gmail.com
wrote:
My index has a
Maybe also have a look at these links.
http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes
http://www.hathitrust.org/blogs/large-scale-search
On Fri, 13 Jan 2012 15:49:06 +0100, Daniel Brügge dan...@bruegge.eu
wrote:
Hi,
it's definitely a problem to store 5TB in
I will. Thanks.
Hi Darren,
Would you please tell us all the parameters that you are sending in the
request? You can use the parameter echoParams=all to get the list in the
output.
Thanks,
*Juan*
On Mon, Jan 2, 2012 at 8:37 PM, Darren Govoni dar...@ontrenet.com wrote:
Forgot to add
.
Thanks,
Darren
with the query.
Is it a bug?
Darren
On 01/02/2012 04:39 PM, Juan Grande wrote:
Hi Darren,
This is the expected behavior. Have you tried setting the
hl.requireFieldMatch parameter to true? See:
http://wiki.apache.org/solr/HighlightingParameters#hl.requireFieldMatch
*Juan*
On Mon, Jan 2, 2012 at 10
for it.
If there are no query term matches for the df, then getting ALL the
field terms highlighted (as it does now) is a rather perplexing feature.
Darren
On 01/02/2012 06:28 PM, Darren Govoni wrote:
Hi Juan,
Setting that parameter produces the same extraneous results. Here is
my query:
{!lucene q.op
I see what you are asking. This is an interesting question. It seems
inefficient for Solr to apply the
requested rows to all shards only to discard most of the results on merge.
That would consume lots of resources not used in the final result set.
On 12/19/2011 04:32 PM, ku3ia wrote:
Uhm,
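To make the concern above concrete, here is a toy model of merging per-shard results. It assumes each shard returns up to `rows` hits and the coordinator keeps only the best `rows` overall; it is a sketch of the idea being discussed, not Solr's actual merge code, and the function names are hypothetical.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_top_rows(shard_results: List[List[Hit]], rows: int) -> List[Hit]:
    """Merge per-shard hit lists and keep only the `rows` best by score."""
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(rows, all_hits, key=lambda hit: hit[0])


def discarded_count(shard_results: List[List[Hit]], rows: int) -> int:
    """Number of fetched hits that do not make it into the merged page."""
    fetched = sum(len(shard) for shard in shard_results)
    return fetched - min(rows, fetched)
```

With three shards each returning two hits and `rows=2`, six hits are fetched and four are thrown away on merge, which is the overhead the message is asking about.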
How do you determine a duplicate?
Solr has de-duplication built in and also you may consider hashing
documents on some fields to create a consistent doc id that would be the
same for same documents and let Solr re-write them. Either approach would
reduce or eliminate the possibility of duplicates
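The hashing approach mentioned above might look something like this on the client side. It is a minimal sketch under assumptions: the field names, the choice of SHA-1, and the function name `content_id` are all illustrative, and the resulting value would be sent as the document's unique key so that re-adding the same content overwrites rather than duplicates.

```python
import hashlib
from typing import Dict, Iterable


def content_id(doc: Dict[str, str], fields: Iterable[str]) -> str:
    """Derive a stable id from the chosen fields of a document."""
    digest = hashlib.sha1()
    for name in sorted(fields):
        value = doc.get(name, "")
        # Field name plus separators keep ("ab", "c") distinct from ("a", "bc").
        digest.update(name.encode("utf-8") + b"\x1f")
        digest.update(value.encode("utf-8") + b"\x1e")
    return digest.hexdigest()
```

Only the fields that define "sameness" should go into the hash; volatile fields such as an indexing timestamp would be left out.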
I read this response, but it lacks the quoted text so I have no clue what
your advice is in reference to. This makes it hard for others to benefit
from the advice. Just a thought.
Go ahead with Solr-based text search. That's what it is meant for, and it
does it great.
Regards
Pravesh
--
View
This would seem to indicate that you are using a whitespace analyzer on
the default search field. I believe other analyzers will properly tokenize
around the comma.
same problem with Solr 4.0
2011/12/8 elisabeth benoit elisaelisael...@gmail.com
Hello,
I'm using Solr 3.4, and I'm having a
Yes. That's what I would expect. I guess I didn't understand when you said
The facet counts are the counts of the *values* in that field
Because it seems it's the count of the number of matching documents,
irrespective of whether one document has 20 values for that field and
another 10, the facet count
Sorry to jump into this thread, but are you saying that the facet count is
not # of result hits?
So if I have 1 document with field CAT that has 10 values and I do a query
that returns this 1 document with faceting, that the CAT facet count will
be 10 not 1? I don't seem to be seeing that
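A small model may help with the question above. It assumes, as my understanding of field faceting goes, that each facet value is counted once per matching document that contains it; the function below is a plain-Python illustration, not Solr code.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(docs: List[Dict[str, List[str]]], field: str) -> Dict[str, int]:
    """Count, for each value of `field`, how many documents contain it."""
    counts: Counter = Counter()
    for doc in docs:
        # set() so a value repeated inside one document is counted once.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return dict(counts)
```

Under this model, one document with 10 distinct CAT values yields 10 facet entries, each with a count of 1, rather than a single entry with a count of 10.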
Solr doesn't support these kinds of business rules internally. It isn't
intended to.
Thus, you will have to manage them externally. What's unstable about a
cronjob?
You will have to run your business rules externally, then apply the
necessary
field updates to the documents in Solr, ensuring the
Monitoring this thread makes me ask the question of whether there are
standardized performance benchmarks for Solr.
Such that they are run and published with each new release. This would
affirm its performance under known circumstances,
with which people can try in their own environments and
Any suspicious activity in the logs? What about disk activity?
On 11/29/2011 05:22 PM, Pawel Rog wrote:
On Tue, Nov 29, 2011 at 9:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
Let's back up a minute and cover some basics...
1) You said that you built a brand new index on a brand new
).
I use Solr in my project and am hoping to gain some of the distributed
features mentioned in ES with Solr.
Does anyone know what the similarities/differences might be?
Many thanks,
Darren
Thanks Erick. I'll check that and report any oddities I find.
Something you may have overlooked is that with debugQuery=on,
way down near the end of the list is a timing section, looks like
this:
<lst name="timing">
that lists the time each of the components takes to do its thing,
things like
?
Server has 15GB RAM. Responses are not unreasonably large. I use paging.
Many thanks,
Darren
Yeah, I figured that. I guess I will have to dig deeper because the data
transferred is only about 60k and all local on one machine. Shouldn't take
13 seconds for that.
I am running Solr 3.4 in a glassfish domain for
itself. I have about 12,500 documents with 100 or so
fields with the
Another interesting note. When I use the Solr Admin screen to perform the
same query, it doesn't take as long. Only when using SolrJ and Http Solr
server connection.
I am running Solr 3.4 in a glassfish domain for
itself. I have about 12,500 documents with 100 or so
fields with the works
I have rows=10. Good idea, I will set it to 1.
Should I expect a constant return time with rows=10 despite the # of total
found documents since they aren't returned?
Another thing to note is that QTime does not include the time it takes to
retrieve the stored documents to include in the
My interpretation of your results is that your FQ found 1281 documents
with 1213206 value in sou_codeMetier field. Of those results, 476 also
had 1212104 as a value...and so on. Since ALL the results will have
the field value in your FQ, then I would expect the other values to
be equal or less
Interesting Yury. Thanks.
On 10/20/2011 11:00 AM, Yury Kats wrote:
On 10/19/2011 5:15 PM, Darren Govoni wrote:
Hi Otis,
Yeah, I saw the page, but it says for merging cores, which I presume
must reside locally to the solr instance doing the merging?
What I'm interested in doing is merging
?
Darren
). Is it still possible or did I
misread the wiki?
Thanks!
Darren
On 10/19/2011 11:57 AM, Otis Gospodnetic wrote:
Hi Darren,
http://search-lucene.com/?q=solr+merge&fc_project=Solr
Check hit #1
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http
, Otis Gospodnetic wrote:
Darren,
No, that is not possible without one copying an index/shard to a single machine
on which you would then merge indices as described on the Wiki.
Hm, wouldn't it be nice to make use of existing replication code to make it
possible to move shards around the cluster
was referring to nodes. For all I know, those are two
different things in the new cloud design terminology (I believe they are).
I guess understanding cores vs. nodes vs shards is helpful. :)
cheers!
Darren
On 09/29/2011 12:00 AM, Pulkit Singhal wrote:
@Darren: I feel that the question itself
Agree. Thanks also for clarifying. It helps.
On 09/29/2011 08:50 AM, Yury Kats wrote:
On 9/29/2011 7:22 AM, Darren Govoni wrote:
That was kinda my point. The new cloud implementation
is not about replication, nor should it be. But rather about
horizontal scalability where nodes manage
On 09/27/2011 05:05 PM, Yury Kats wrote:
You need to either submit the docs to both nodes, or have a replication
setup between the two. Otherwise they are not in sync.
I hope that's not the case. :/ My understanding (or hope maybe) is that
the new Solr Cloud implementation will support
Hi,
Maybe you are aware of this[1] already, but I'm not sure of its status.
It seems to be removed currently from Solr. Also, I'm not sure if it
works cross-index, but thought you might look there.
Darren
[1] https://issues.apache.org/jira/browse/SOLR-2272
On Mon, 19 Sep 2011 22:21:17 -0700
Please see [1]
[1] https://issues.apache.org/jira/browse/SOLR-1632
On Tue, 20 Sep 2011 16:14:08 +0200, Massimo Schiavon
mschia...@volunia.com wrote:
Seems that when I submit a query in a sharded environment the idf
component of the scoring formula takes into consideration the local
terms
, paging etc?
Thanks!
Darren
Thank you. Should be awesome when it's ready!
On Wed, 14 Sep 2011 10:25:26 -0400, Yonik Seeley
yo...@lucidimagination.com wrote:
On Wed, Sep 14, 2011 at 10:17 AM, dar...@ontrenet.com wrote:
Hi,
I am very excited to see this direction for Solr. I realize it's early
still,
but is there any
Here's a thought.
If dist is under solr.solr.home but your lib dir is set to be
../../dist.
Wouldn't the lib dir be relative to solr.solr.home and therefore should
just be dist?
On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang
just4l...@yahoo.com wrote:
Hi all,
I am trying set up
I also would like to know the answer to this. But my feeling is
that you can't do what you want. I also had to use the highlighting
workaround and an aggregate dynamic field to get around the inability
of multivalued fields to accommodate it.
On Mon, 12 Sep 2011 11:44:01 -0400, Rahul Warawdekar
See http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams
On Fri, 9 Sep 2011 09:59:57 -0400, Tirthankar Chatterjee
tchatter...@commvault.com wrote:
Hi,
Is there a way that we can give an alias name for a field so that the
schema is not required to change.
Use Case: We defined the schema
It doesn't.
On 08/29/2011 01:37 PM, Mike Austin wrote:
I've been trying to follow the progress of this and I'm not sure what the
current status is. Can someone update me on what is currently in Solr4 and
does it support multi-valued location in a single document? I saw that
SOLR-2155 was not
.
Best
Erick
On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni dar...@ontrenet.com
wrote:
Hi Erick,
Sure thing.
I have a document schema where I put the sentences of that document in
a
multivalued field sentences.
I search that field in a query but get back the document results
On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni dar...@ontrenet.com
wrote:
Hi Erick,
Sure thing.
I have a document schema where I put the sentences of that document
in
a
multivalued field sentences.
I search that field in a query but get back the document results,
naturally.
I
Hi,
Is it possible to construct a query in Solr where the paged results are
matching multivalued fields and not documents?
thanks,
Darren
result)
and then do my own paging since I am only returning pages of sentences
and not the whole document.
(i.e. I don't want to page the document results).
Does this make sense? Or is there a better way Solr can accommodate this?
Much appreciated.
Darren
On 08/25/2011 07:24 PM, Erick Erickson
patent rights only last 17 years then it is public domain.
On 08/17/2011 11:05 AM, Walter Underwood wrote:
I have no plan to look at the patents, but there is some serious prior art in
faceted search. First, faceted classification for libraries was invented by S.
R. Ranganathan in 1933.
Hi,
Is it possible to restrict the /terms component output to the results
of a query?
thanks,
Darren
the parameter facet.mincount=1. It will be slower than the
TermsComponent.
On Wed, Aug 17, 2011 at 1:19 PM, Darren Govoni dar...@ontrenet.com wrote:
Hi,
Is it possible to restrict the /terms component output to the results of a
query?
thanks,
Darren
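The faceting suggestion in the reply above can be sketched as a request URL. The parameters `q`, `rows`, `facet`, `facet.field`, `facet.mincount`, and `facet.limit` are standard Solr request parameters; the host, port, `/select` path, field name, and the helper name `terms_for_query_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def terms_for_query_url(base_url: str, query: str, field: str, limit: int = 20) -> str:
    """Build a facet request that lists only terms occurring in the query's results."""
    params = [
        ("q", query),
        ("rows", 0),              # only the facet section is of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),    # drop terms absent from the result set
        ("facet.limit", limit),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, `terms_for_query_url("http://localhost:8983/solr", "body:cloud", "body")` gives a URL whose facet section contains only terms found in documents matching `body:cloud`.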
Off the top of my head, maybe you can get the number of results and
then look at the last document and check its score. I believe the results
will be ordered by score?
On 08/04/2011 05:44 PM, Kissue Kissue wrote:
Hi,
I am using Solr 3.1 with the SolrJ client library. I can see that it is
You can issue a new facet search as you drill down from your UI.
You have to specify the fields you want to facet on and they can be
dynamic.
Take a look at recent threads here on taxonomy faceting for help.
Also, look here[1]
[1] http://wiki.apache.org/solr/SimpleFacetParameters
On Tue, 5 Jul
That's a good way. How does it perform?
Another way would be to store the parent topics in a field.
Whenever a parent node is drilled-into, simply search for all documents
with that parent. Perhaps not as elegant as your approach though.
I'd be interested in the performance comparison between
Will it be possible to do spatial searches on multi-valued spatial
fields soon?
I have a latlon field (point) that is multi-valued and don't know how to
search against it
such that the lats and lons match correctly - since they are split apart.
e.g. I have a document with 10 point/latlon
)...
But I don't really know which value in Cardiologist match perfectly.
Again, I only want it to return:
Cardiologist: 3
If I searched on q=internist&defType=dismax&qf=specialties, I want the
result to be:
Internist: 1
Does this all make sense?
On 6/21/11 8:23 PM, Darren Govonidar
I once tried to load wordnet synsets as a synonym file and it was
prohibitively slow and unusable. fyi.
On 06/22/2011 12:23 PM, Robert Muir wrote:
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
While trying some synonyms.txt files I noticed a huge
Yeah, I agree with that last statement.
It seems to me that the use case where it _might_ matter is where
you have a query for MORE than one.
q=cardiologist OR family
and in that case, it MIGHT be useful to separate the facets
in a XOR sense where you don't get cross-pollution. But
the
So are you saying that for all results for cardiologist,
you don't want facets not matching Cardiologist to be
returned as facets?
what happens when you make q=specialities:Cardiologist?
instead of just q=Cardiologist?
Seems that if you make the query on the field, then all
your results will
That's pretty awesome. Thanks Renaud!
On Tue, 2011-05-31 at 22:56 +0100, Renaud Delbru wrote:
Hi,
have a look at the flexible query parser of lucene (contrib package)
[1]. It provides a framework to easily create different parsing logic.
You should be able to access the AST and to modify
the query logic cannot be
semantically altered. (e.g. AND, OR, paren's etc) so it must be parsed
first.
How can this be done with SolrJ?
thanks for any tips.
Darren
with SolrJ?
thanks for any tips.
Darren
Ludovic,
Thank you for this tip, it sounds useful.
Darren
On Tue, 2011-05-31 at 14:38 -0700, lboutros wrote:
Darren,
you can even take a look to the DebugComponent which returns the parsed
query in a string form.
It uses the QueryParsing class to parse the query, you could perhaps do
You should be able to retrieve the snippets in your search engine and
combine or format
them however you like before returning the results to your client Right?
So in your middle tier, you invoke solr with a search, get the results,
retrieve the snippets,
iterate over them and format to your
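The middle-tier step described above could look roughly like this. The sketch assumes the search response has already been parsed into a dict with a `highlighting` section keyed by document id and then by field name, which is how Solr's JSON response lays out highlights; the function name and separator are hypothetical.

```python
from typing import Any, Dict


def format_snippets(response: Dict[str, Any], field: str, separator: str = " ... ") -> Dict[str, str]:
    """Join each document's highlight snippets for `field` into one display string."""
    highlighting = response.get("highlighting", {})
    formatted: Dict[str, str] = {}
    for doc_id, fields in highlighting.items():
        snippets = fields.get(field, [])
        if snippets:  # skip documents with no snippet for this field
            formatted[doc_id] = separator.join(s.strip() for s in snippets)
    return formatted
```

The client then receives one ready-to-display string per document, and any other formatting rule can be swapped in at the join step.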
Can I ask if you do any faceted or MLT type searches? Do those even work
across shards?
On Fri, 2011-05-13 at 08:59 -0600, Shawn Heisey wrote:
Our system, which I am not at liberty to disclose, consists of 55
million documents, mostly photos and text, but video is starting to
become
Thanks for the info Shawn. I'll look into the issue as well.
On Fri, 2011-05-13 at 12:34 -0600, Shawn Heisey wrote:
On 5/13/2011 11:09 AM, Darren Govoni wrote:
Can I ask if you do any faceted or MLT type searches? Do those even work
across shards?
We currently aren't using facets
I have the same questions.
But from your message, I couldn't tell. Are you using Solr now? Or some
other indexing server?
Darren
On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote:
Hi,
I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take. The data
Ok, thanks. Yeah, I'm in the same boat and want to know what others have
done with document numbers that large.
I know there is SolrCloud that can federate numerous solr instances and
query across them, so I suspect some solution with 100's of M's of docs
would require a federation.
If anyone
I think what's being asked is obvious, in that, they want to modify the
sorted relevancy of the results of MLT. Where, instead of (or in addition
to) sorting by the mlt score, some modified function/subquery can be used
to further sort the results.
One example. You run a MLT query against a
What about standing up a VM (search appliance that you would make) for
each client?
If there's no data sharing across clients, then using the same solr
server/index doesn't seem necessary.
Solr will easily meet your needs though, it's the best there is.
On Wed, 2011-02-09 at 14:23 -0500, Greg
Hi,
Is there a way to get the relevant nearby words in the entire index
given a single word?
I want to know all the relevance ranked words before and after the queried
word.
thanks for any tips.
Darren
Hi,
I built the trunk and deployed the war, but cannot access the admin URL
anymore.
Error loading class
'org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder'
This class seems to be missing?
thanks,
Darren
Take a look at term proximity and phrase query.
http://wiki.apache.org/solr/SolrRelevancyCookbook
Hey guys,
I have a solr index where i store information about experts from
various fields. The thing is when I search for channel marketing i
get people that have the word channel or marketing
Hi Koji,
I tried to apply your patch to the 1.4.0 tagged branch, but it didn't
take completely.
What branch does it work for?
Darren
On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote:
(10/10/21 20:33), dar...@ontrenet.com wrote:
Hi,
Does the latest Solr provide an explanation
Hi,
Does the latest Solr provide an explanation for results returned by MLT?
I want to get the interesting terms for each result that overlap with the
source document. This set of terms will vary from result to result
possibly.
Thanks!
Darren
Thank you!
On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote:
(10/10/21 20:33), dar...@ontrenet.com wrote:
Hi,
Does the latest Solr provide an explanation for results returned by MLT?
No, but there is an open issue:
https://issues.apache.org/jira/browse/SOLR-860
Koji
Do the spatial constraints for latlon types work for multivalued latlon
fields? Is there an example of it? Using a field conjunction with
operators didn't work, last I checked.
On Wed, Oct 13, 2010 at 7:28 AM, PeterKerk vettepa...@hotmail.com wrote:
Hi,
Thanks for the quick reply :)
I
in
the query?
thank you, I will read the links again,
Darren
On Thu, 2010-07-22 at 10:15 +0200, Stanislaw Osinski wrote:
Hi,
I am attempting to cluster a query. It kinda works, but where my
(regular) query returns 500 results the cluster only shows 1-10 hits for
each cluster (5 clusters). Never
I set the rows=50 on my clustering URL in a browser and it returns more.
In my SolrJ, I used ModifiableSolrParams and I set (rows,50) but it
still returns less than 10 for each cluster.
Is there a way to set rows wanted with ModifiableSolrParams?
thanks and sorry for the double post.
Darren
Yeah, my results count is 151 and only 21 documents appear in 6
clusters.
This is true whether I use URL or SolrJ.
When I use carrot workbench and point to my Solr using local clustering,
the workbench
has numerous clusters and all documents are placed
On Thu, 2010-07-22 at 18:06 +0200,
cluster.
thanks,
Darren
Hi,
What could cause a facet query on a field (say 'name') to differ in count
from a basic query on the field using the same value?
e.g
name:'Darren'
If there are 10 documents that match this, the facet count should be 10
for 'Darren', and I should get 10 results if I query on the field
Hi,
I really think there is something not quite right going on here
after much study. Here are my findings.
Using MLT, I get terms that appear to be long concatenations of words
that are space delimited in the original text.
I can't think of any reason for these sentence-like terms to exist
Jan,
Looks interesting. I will try this.
Thanks!
Darren
On Mon, 2010-06-28 at 19:54 +0200, Jan Høydahl / Cominvent wrote:
Hi,
You might also want to check out the new Lucene-Hunspell stemmer at
http://code.google.com/p/lucene-hunspell/
It uses OpenOffice dictionaries with known stems
.
Is this correct or acceptable behavior? Previous discussions here on
stemming, I was told it's OK as long as all the words reduce
to the same stem, but when different words reduce to the same stem it
seems to affect search results in a bad way.
Darren
idea?
Darren
Hi Darren,
You might want to look at the KStemmer
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
instead of the standard PorterStemmer. It essentially has a 'dictionary'
of exception words where stemming stops if found, so in your case
president won't
I'm using SOLR 1.4 with a few multi-cores, running under a Tomcat 6
environment. I'm using the web services to pass xml documents for adding
records with no problem, using a URL on my development machine of
http://localhost:8080/Solr/product/update/;
I've tried implementing an XML-based delete,
That does work. I re-checked my code, and there was a bug which submitted
an empty string as the xml instead of the generated delete command. With
this fixed, it seems to work without a problem.
On Mon, Jun 21, 2010 at 1:21 AM, Ahmet Arslan iori...@yahoo.com wrote:
How are you submitting
appreciated!
Darren
It works! Thanks Sascha. I swear I tried that combination. Hehe.
On Sat, 2010-06-19 at 21:19 +0200, Sascha Szott wrote:
Hi Darren,
try mlt.fl=field1 field2
Best,
Sascha
Darren Govoni wrote:
Hi,
I read the wiki and tried about a dozen variations such as:
...mlt.fl
Hi,
I am using a recent nightly build of Solr with no significant schema
mods. I index a couple documents and view the TFV's in this query.
stemming (e.g.requir? require.)
This seems buggy to me. Are these correct? If so, how can I sort out the
legit terms from these messy ones?
thanks for any tips!
Darren
On Fri, 2010-06-18 at 15:33 -0400, Darren Govoni wrote:
Hi,
I am using a recent nightly build of Solr with no significant schema
Thanks for the explanation Chris. I'll try it but the term
lst
name=queriesandreadmoreresultseventhoughthisexampleissimpleconsidercaseswheretherear
strikes me as not very legitimate and the source text is just space
bounded words so even if it's doing what it is supposed to, I'm not sure
this
document, the relevant line the
term seems to be related to. Just a sentence with no specific
boundaries.
...perform more queries and read more results. Even though this example
is simple, consider cases where there are intersections between
thousands ...
Maybe I need to indicate tokenized?
Darren
From what I've seen so far, using separate fields for latitude and
longitude, especially with multiple values of each, does not work correctly
in all situations.
The hole in my understanding is how Solr knows how to pair a latitude and
longitude field _back_ into a POINT.
I can say that it
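The pairing problem described above can be shown with a toy bounding-box check. This is an illustration of why separate multivalued latitude and longitude fields can produce false matches, under the assumption that each field is tested independently; it does not describe how Solr stores or searches points internally.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_lat, max_lat, min_lon, max_lon)


def separate_fields_match(lats: List[float], lons: List[float], box: Box) -> bool:
    """Match if ANY latitude and ANY longitude fall in range, ignoring pairing."""
    min_lat, max_lat, min_lon, max_lon = box
    return any(min_lat <= lat <= max_lat for lat in lats) and any(
        min_lon <= lon <= max_lon for lon in lons
    )


def paired_points_match(points: List[Tuple[float, float]], box: Box) -> bool:
    """Match only if a single (lat, lon) point falls inside the box."""
    min_lat, max_lat, min_lon, max_lon = box
    return any(
        min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        for lat, lon in points
    )
```

A document with points (10, 100) and (50, 20) has no point inside the box lat 5..15, lon 15..25, yet the separate-field check matches because latitude 10 and longitude 20 each fall in range on their own.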
From what I can tell, the spatial stuff is incomplete and not entirely
functional at this point.
In the trunk I get the same error, but also I get results no matter what
the distance is, so it was broken as of last week for me.
On Wed, 2010-06-09 at 23:18 -0700, nickdos wrote:
I'm running the