I have Solr as the backend to an ECommerce solution where the fields can
be configured to be searchable, which generates a schema.xml and loads
it into Solr.
Now we also allow configuring a Solr search weight per field to affect
queries, so my queries usually look something like this:
Hi,
I have a SolrCloud setup, running 4.10.3. The setup consists of several cores,
each with a single shard and initially each shard has a single replica (so,
basically, one machine). I am using core discovery, and my deployment tools
create an empty core on newly provisioned machines.
Michael
Jani, Vrushank <vrushank.j...@truelocal.com.au>
2015-05-19 at 03:51
Hello,
We have production SOLR deployed on AWS Cloud. We have currently 4
live SOLR servers running on m3xlarge EC2 server instances behind ELB
(Elastic Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat
or vmstat command in
Linux.)
-Michael
-Original Message-
From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
Sent: Saturday, May 02, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Upgraded to 4.10.3, highlighting performance unusably slow
Hello,
We recently upgraded solr
think you'll
need to run it on a separate thread.
-Michael
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, May 01, 2015 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: How to start an optimize in SolrJ without waiting for it to
complete?
On 5/1/2015
(Replicable) 1430107573634 27 -
Slave (Searching) 1429762011916 23 287.14 GB
Any idea why the replication is not triggered here or what I could try
to fix it?
Solr Version is 4.10.3.
-Michael
Not sure if there’s a better way, but this works
From: Motulewicz, Michael <michael.motulew...@healthsparq.com>
Reply-To: solr-user@lucene.apache.org
Hi,
I’m attempting to facet on the results of a custom solr function. I’ve been
trying all kinds of combinations that I think would work, but keep getting
errors. I’m starting to wonder if it is possible.
I’m using Solr 4.0 and here is how I am calling:
I can at least say that Solr 3.x works fine with Java 7.
-Michael
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, April 06, 2015 5:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Are there known issues with Java 8 in older versions of Solr?
On 4/6
with Solr 4.8.X
Cheers,
Michael
Glad you are sorted out!
Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b
happens at query time. Not sure if that's significant for you.
Michael Della Bitta
is
the query supposed to retrieve the lower-case version?
(sorry, if this sounds like a naive question, but I have a feeling that I
am missing something really basic here).
Michael Della Bitta
the most.
Solr on HDFS currently doesn't have any sort of rack locality like there is
with, say, HBase colocated on the HDFS nodes. So you can expect that even
with Solr installed on the same nodes as your HDFS datanodes, there will
be remote IO.
Michael Della Bitta
You'll need to wrap the date in quotes, since it contains a colon:
String a = "speechDate:\"1992-07-10T17:33:18Z\"";
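A self-contained sketch of the same idea (the helper name is mine, not a Solr API; for escaping individual special characters SolrJ also ships ClientUtils.escapeQueryChars):

```java
// Hypothetical helper: wrap a raw field value in quotes so special
// characters such as ':' are not parsed as query syntax.
public class QueryQuoting {
    static String quote(String field, String rawValue) {
        // Escape backslashes and embedded quotes, then wrap in quotes.
        String escaped = rawValue.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("speechDate", "1992-07-10T17:33:18Z"));
        // prints: speechDate:"1992-07-10T17:33:18Z"
    }
}
```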
-Michael
-Original Message-
From: Mirko Torrisi [mailto:mirko.torr...@ucdconnect.ie]
Sent: Tuesday, March 10, 2015 3:34 PM
To: solr-user@lucene.apache.org
Subject: Invalid
thought maybe I was the only one...
-Michael
-Original Message-
From: lei [mailto:simpl...@gmail.com]
Sent: Thursday, March 05, 2015 2:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance on faceting using docValues
Here are the specs of some example query faceting on three
October 2014, Apache Solr™ 4.10.4 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted
time,
but on the other hand, you don't have to maintain a Zookeeper ensemble or
devote brain cells to understanding collections/shards/etc.
Michael Della Bitta
Benson:
Are you trying to run independent invocations of Solr for every node?
Otherwise, you'd just want to create a 8 shard collection with
maxShardsPerNode set to 8 (or more I guess).
Michael Della Bitta
There is also PostingsHighlighter -- I recommend it, if only for the
performance improvement, which is substantial, but I'm not completely
sure how it handles this issue. The one drawback I *am* aware of is
that it is insensitive to positions (so words from phrases get
highlighted even in
You're probably launching Solr using the older version of Java somehow. You
should make sure your PATH and JAVA_HOME variables point at your Java 8
install from the point of view of the script or configuration that launches
Solr.
Hope that helps.
Michael Della Bitta
At the layer right before you send that XML out, add a fallback option on
error: if there's a failure with the batch, send each document one at a
time.
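A minimal pure-Java sketch of that fallback (the send step is simulated with a Consumer and all names are made up; with SolrJ the send would be something like a server.add call):

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: try to index a whole batch in one round trip; if the batch is
// rejected, retry each document on its own so one bad document cannot
// sink the rest. Returns how many documents made it in.
public class BatchFallback {
    static int indexWithFallback(List<String> docs, Consumer<List<String>> send) {
        int indexed = 0;
        try {
            send.accept(docs);          // one round trip for the whole batch
            indexed = docs.size();
        } catch (RuntimeException batchFailed) {
            for (String doc : docs) {   // fall back to one document at a time
                try {
                    send.accept(List.of(doc));
                    indexed++;
                } catch (RuntimeException ignored) {
                    // log and skip the single bad document
                }
            }
        }
        return indexed;
    }
}
```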
Michael Della Bitta
On 02/17/2015 03:46 AM, Volkan Altan wrote:
First of all thank you for your answer.
You're welcome - thanks for sending a more complete example of your
problem and expected behavior.
I don’t want to use KeywordTokenizer. Because, as long as the compound words
written by the user are
StandardTokenizer splits your text into tokens, and the suggester
suggests tokens independently. It sounds as if you want the suggestions
to be based on the entire text (not just the current word), and that
only adjacent words in the original should appear as suggestions.
Assuming that's
, which is not very handy.
-Michael
see the current year (2015) is hard
coded. Is there an easy way to get the current year within the function?
Messing around with NOW looks very complicated.
-Michael
(and
the content has the entities) and it will be difficult to add the DTD to the
content...
Thanks
- Raul
El 03/02/15 a las 17:15, Michael Sokolov escribió:
If the entities are in the content, you would need to add the DTD to
the content, not to the stylesheet. Or you could transform the
content
If the entities are in the content, you would need to add the DTD to the
content, not to the stylesheet. Or you could transform the content
converting the entities.
-Mike
On 02/03/2015 10:41 AM, Raul wrote:
Hi all!
I'm trying to use Solr with the DIH and xslt processing. All is fine
till i
If you're trying to do a bulk ingest of data, I recommend committing less
frequently. Don't soft commit at all until the end of the batch, and hard
commit every 60 seconds.
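That advice maps to a solrconfig.xml fragment along these lines (a sketch; the values are illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 60 seconds, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- leave autoSoftCommit disabled during the bulk load;
       issue a single commit at the end of the batch -->
</updateHandler>
```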
Michael Della Bitta
Have a look here:
https://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
it might answer your question. Typically what I recommend is to keep
the selected facet in view, but without any limitation on its counts.
However if you want to hide it altogether, I
Please go ahead and play with autocomplete on safaribooksonline.com/home
- if you are not a subscriber you will have to sign up for a free
trial. We use the AnalyzingInfixSuggester. From your description, it
sounds as if you are building completions from a field that you also use
for
I was tempted to suggest rehab -- but seriously it wasn't clear if Nitin
meant the log files Michael is referring to, or the transaction log
(tlog). If it's the transaction log, the solution is more frequent hard
commits.
-Mike
On 2/2/2015 11:48 AM, Michael Della Bitta wrote:
If you'd like
If you'd like to reduce the number of lines Solr logs, you need to edit the
file example/resources/log4j.properties in Solr's home directory. Change
lines that say INFO to WARN.
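For reference, the stock Solr 4.x log4j.properties defines a root logger line roughly like the one below; raising its threshold from INFO to WARN is the edit described above (the appender names may differ in your copy):

```properties
# before: log4j.rootLogger=INFO, file, CONSOLE
log4j.rootLogger=WARN, file, CONSOLE
```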
Michael Della Bitta
Good call, it could easily be the tlog Nitin is talking about.
As for which definition of high, I was making assumptions as well. :)
Michael Della Bitta
We were using grouping (no DocValues, though) and recently switched to
using block-indexing and joins (see
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers).
We got a nice speedup on average (perhaps 2x faster) and an even better
improvement in
On 1/31/2015 2:47 PM, Mikhail Khludnev wrote:
Michael,
Please check two questions inlined below
Hi Mikhail,
On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
You can only handle a single relation this way since you have to
restructure your index
If you have a finite known set of hosts, you could do something truly awful:
create a field for each distinct host and set all of them to have
value={id of the document} except for the host to which the document
belongs: assign that hostname field some constant value, like true.
Then query
Here's a Jira for this: https://issues.apache.org/jira/browse/SOLR-3031
I've attached a patch there that might be useful for you.
-Michael
-Original Message-
From: Jorge Luis Betancourt González [mailto:jlbetanco...@uci.cu]
Sent: Thursday, January 22, 2015 4:34 PM
To: solr-user
Hi,
I'm seeing some odd behavior that I am hoping someone could explain to me.
The configuration I'm using to repro the issue, has a ZK cluster and a single
Solr instance. The instance has 10 Cores, and none of the cores are sharded.
The initial startup is fine, the Solr instance comes up and
the
commits to the Solr-transaction.log?
-Clemens
-Ursprüngliche Nachricht-
Von: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
Gesendet: Dienstag, 20. Januar 2015 14:54
An: solr-user@lucene.apache.org
Betreff: Re: transactions@Solr(J)
On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote:
http
On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote:
http://stackoverflow.com/questions/10805117/solr-transaction-management-using-solrj
Is it true, that a SolrServer-instance denotes a transaction context?
Say I have two concurrent threads, each having a SolrServer-instance pointing to the
same
You can also implement your own cursor easily enough if you have a
unique sortkey (not relevance score). Say you can sort by id, then you
select batch 1 (50k docs, say) and record the last (maximum) id in the
batch. For the next batch, limit it to id > last_id and get the first
50k docs (don't
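A pure-Java sketch of that cursor loop over a simulated id list (in Solr itself the equivalent filter would be a range query such as id:{last_id TO *], with rows set to the batch size):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Keyset ("cursor by unique sort key") pagination: each batch takes only
// ids strictly greater than the last id seen, sorted, up to batchSize.
public class KeysetPager {
    static List<Integer> nextBatch(List<Integer> allIds, int lastId, int batchSize) {
        return allIds.stream()
                .filter(id -> id > lastId)          // skip everything already seen
                .sorted(Comparator.naturalOrder())  // stable, unique sort key
                .limit(batchSize)
                .collect(Collectors.toList());
    }
}
```

Unlike start=N deep paging, each batch stays equally cheap because the filter prunes everything already retrieved.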
I've seen the same thing, poked around a bit and eventually decided to
ignore it. I think there may be a ticket related to that saying it's a
logging bug (ie not a real issue), but I couldn't swear to it.
-Mike
On 01/16/2015 12:36 PM, Tom Burton-West wrote:
Hello,
I'm running Solr 4.10.2
is that we can avoid rebuilding the index on every commit or
optimize.
Is this the right way, or is there anything I missed?
Regards
dhanesh s.r
On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
did you build the spellcheck index using spellcheck.build
at 12:47 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
I think you are probably getting bitten by one of the issues addressed in
LUCENE-5889
I would recommend against using buildOnCommit=true - with a large index
this can be a performance-killer. Instead, build the index yourself
Yep, you'll have to increase the heap size for your Tomcat container.
http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly
Michael Della Bitta
As a foolish dev (not malicious I hope!), I did mess around with
something like this once; I was writing my own Codec. I found I had to
create a file called META-INF/services/org.apache.lucene.codecs.Codec in
my solr plugin jar that contained the fully-qualified class name of my
codec: I
I think you are probably getting bitten by one of the issues addressed
in LUCENE-5889
I would recommend against using buildOnCommit=true - with a large index
this can be a performance-killer. Instead, build the index yourself
using the Solr spellchecker support (spellcheck.build=true)
if
there are any errors on the Oracle side?
Michael Della Bitta
Another way of doing it is by setting the -Dhost=$hostname parameter when
you start Solr.
Michael Della Bitta
It looks like this is a good starting point:
http://wiki.apache.org/solr/SolrConfigXml#codecFactory
-Mike
On 01/12/2015 03:37 PM, Tom Burton-West wrote:
Hello all,
Our indexes have around 3 billion unique terms, so for Solr 3, we set
TermIndexInterval to about 8 times the default. The net
can fix it?
--Michael
[1]
http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/
both titles).
In such cases it is almost impossible to move the search fields to the
qf parameter.
--Michael
Am 11.01.2015 um 14:19 schrieb Michael Lackhoff:
Or put another way: How can I do this boost in more complex queries like:
title:foo AND author:miller AND year:[2010 TO *]
It would be nice to have a title "foo" before another title "some foo
and bar" (given the other criteria also match both
that have an exact(ish) match.
--Michael
exactly at the
top, even if combined with dozens of other criteria.
And it doesn't really help to question the demand since the demand is
there and somewhat external. The point is how to best meet it.
--Michael
f.title.pf=title_exact^10 title_proper^5
analogous to (the existing)
f.title.qf=title_proper^10 title_related
everything should work just fine
But I guess this will only come if or when one of the developers has an
itch to scratch ;-)
Anyway, thanks a lot for all help and a great product
--Michael
I would do one of either:
1. Set a different Solr home for each instance. I'd use the
-Dsolr.solr.home=/d/2 command line switch when launching Solr to do so.
2. RAID 10 the drives. If you expect the Solr instances to get uneven
traffic, pooling the drives will allow a given Solr instance to
The downsides that come to mind:
1. Every write gets amplified by the number of nodes in the cloud. 1000
write requests end up creating 1000*N HTTP calls as the leader forwards
those writes individually to all of the followers in the cloud. Contrast
that with classical replication where only
The Jetty servlet container that Solr uses doesn't understand those
files. It would not use them to determine access, and would likely make
them accessible to web requests in plain text.
On 1/6/15 16:01, Craig Hoffman wrote:
Thanks Otis. Do think a .htaccess / .passwd file in the Solr admin
Also see this G+ post I wrote up recently showing how the percentage of
deletions changes over time for an "every add also deletes a previous
document" stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD
Mike McCandless
http://blog.mikemccandless.com
On Wed, Dec 31, 2014 at 12:21 PM,
On 12/30/14 12:42 PM, Jonathan Rochkind wrote:
On 12/30/14 12:35 PM, Walter Underwood wrote:
You want preserveOriginal=“1”.
You should only do this processing at index time.
If I only do this processing at index time, then mixedCase at query
time will no longer match mixed Case in the
I noticed that your suggester analyzers include
<filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])"
        replacement="" replace="all"/>
which seems like a bad idea -- this will strip all those Arabic, Russian
and Japanese characters entirely, leaving you with probably only
thrashing.
You might try bumping up your heap some to see if that helps. It's made
a difference for me, but mostly in delaying the onset and limiting the
occurrence of this. Likely I just need an even larger heap.
Michael
On 12/18/14 17:36, heaven wrote:
Hi,
We have 2 shards, each one has
Thanks Andrey! I voted for your patch
-Mike
On 12/17/2014 4:01 AM, Kydryavtsev Andrey wrote:
For support scoreMode parameter in BlockJoinParentQParser we have this jira
with attached patch https://issues.apache.org/jira/browse/SOLR-5882
17.12.2014, 06:54, Michael Sokolov msoko
Have other people tried migrating an index that was created without
block (parent/child) indexing to one that *does* have it? Did you find
that you got duplicate documents - ie multiple documents with the same
uniqueField value? That's what I found, and I don't see how that's
possible.
, Michael Sokolov
msoko...@safaribooksonline.com wrote:
Have other people tried migrating an index that was created without block
(parent/child) indexing to one that *does* have it? Did you find that you
got duplicate documents - ie multiple documents with the same uniqueField
value? That's what I
I'm trying to use BJPQP and ran into a few little gotchas that I'd like
to share with y'all in case you have any advice.
First I ran into an NPE that probably should be handled better - maybe
just an exception with a better message. The framework I'm working in
makes it slightly annoying to
I'm not sure, but is it necessary to set positionIncAttr to 1 when there
are *not* any lemmas found? I think the usual pattern is to call
clearAttributes() at the start of incrementToken
-Mike
On 12/15/14 7:38 AM, Erlend Garåsen wrote:
I have written a dictionary-based lemmatizer for
Well I think your first step should be finding a reproducible test case
and encoding it as a unit test. But I suspect ultimately the fix will
be something to do with positionIncrement ...
-Mike
On 12/15/2014 09:08 AM, Erlend Garåsen wrote:
On 15.12.14 14:11, Michael Sokolov wrote:
I'm
I want terms to be stemmed, unless they are quoted, using dismax.
On 12/12/14 8:19 PM, Amit Jha wrote:
Hi Mike,
What is exact your use case?
What do mean by controlling the fields used for phrase queries ?
Rgds
AJ
On 12-Dec-2014, at 20:11, Michael Sokolov msoko...@safaribooksonline.com
for edismax phrase boosting, although it might be
interesting to support both, so more-precise phrases get an even
higher boost than less-precise phrases. But it does need to be
optional since it has an added cost at query time.
-- Jack Krupansky
-Original Message- From: Michael
, Michael Della Bitta wrote:
Only thing you have to worry about (in both the CUSS and the home grown
case) is a single bad document in a batch fails the whole batch. It's up
to you to fall back to writing them individually so the rest of the
batch makes it in.
With CUSS, your program will never
On Thursday, December 11, 2014 10:50 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
I'd like to supply a different set of fields for phrases than for bare
terms. Specifically, we'd like to treat phrases as more exact -
probably turning off stemming and generally having a tighter analysis
chain
Doug - I believe pf controls the fields that are used for the phrase
queries *generated by the parser*.
What I am after is controlling the fields used for the phrase queries
*supplied by the user* -- ie surrounded by double-quotes.
-Mike
On 12/12/2014 08:53 AM, Doug Turnbull wrote:
Michael
document in a batch fails the whole batch. It's up
to you to fall back to writing them individually so the rest of the
batch makes it in.
Michael
On 12/11/14 11:04, Erick Erickson wrote:
I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things
I'd like to supply a different set of fields for phrases than for bare
terms. Specifically, we'd like to treat phrases as more exact -
probably turning off stemming and generally having a tighter analysis
chain. Note: this is *not* what's done by configuring pf which
controls fields for the
So the short answer to your original question is no. Highlighting is
designed to find matches *within* a tokenized (text) field only. That
is difficult because text gets processed and there are all sorts of
complications, but for integers it should be pretty easy to match the
values in the
<filter class="solr.LowerCaseFilterFactory"/> -->
</analyzer>
</fieldType>
...
-Ursprüngliche Nachricht-
Von: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
Gesendet: Donnerstag, 4. Dezember 2014 14:05
An: solr-user@lucene.apache.org
Betreff: Re: Keeping capitalization in suggestions?
Have
Alex, I spent some time answering questions there, but got ultimately
got turned off by the competitive nature of it. I wanted to increase my
score -- fun! But if you are not watching it all the time, the questions
go by very fast, and you lose your edge. The typical pattern seems to
be:
I get the impression there was a concern that the caller could hold on
to the query generated by JoinUtil for too long - eg across requests in
Solr. I'm not sure why the OP thinks that would happen, though.
-Mike
On 12/08/2014 04:57 AM, Mikhail Khludnev wrote:
On Fri, Dec 5, 2014 at 10:44
Right - allowing Solr to manage these queries (SOLR-6234) seems like the
way to go
... OP == original poster (I lost track of who started the discussion)
-Mike
On 12/08/2014 10:19 AM, Mikhail Khludnev wrote:
On Mon, Dec 8, 2014 at 5:38 PM, Michael Sokolov
msoko...@safaribooksonline.com
They should be reused if the impl. allows for it.
Besides reducing GC cost, it can also be a sizable performance gain
since these enums can have quite a bit of state that otherwise must be
re-initialized.
If you really don't want to reuse them (force a new enum every time), pass null.
Mike
How about creating a new core that only holds a single week's documents,
and retrieving all of its terms? Then each week, flush it and start over.
-Mike
On 12/05/2014 07:54 AM, lboutros wrote:
Dear all,
I would like to get the new terms of fields since last update (once a week).
If I
Have a look at AnalyzingInfixSuggester - it does what you want.
-Mike
On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or
"Chamä", I'd expect to get "Chamäleon" (uppercased).
But what happens is
If LowerCaseFilter (see below
There's no appreciable RAM cost during querying, faceting, sorting of
search results and so on. Stored fields are separate from the inverted
index. There is some cost in additional disk space required and I/O
during merging, but I think you'll find these are not significant. The
main cost
, it will probably answer some questions:
https://wiki.apache.org/solr/SolrCaching
I hope that helps!
Michael
Stefan, I had problems like this -- and the short answer is -- it's a
PITA. Solr is not really designed to be extended in this way. In fact
I believe they are moving towards an architecture where this is even
less possible - folks will be encouraged to run solr using a bundled
exe, perhaps
Have you considered using grouping? If I understand your requirements,
I think it does what you want.
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
On 12/02/2014 12:59 PM, Darin Amos wrote:
Thanks!
I will take a look at this. I do have an additional question, since after
I would keep trying with the highlighters. Some of them, at least, have
options to provide an external text source, although you will almost
certainly have to write some java code to get this working; extend the
highlighter you choose and supply its text from an external source.
-Mike
On
, Michael Sokolov msoko...@safaribooksonline.com
wrote:
Have you considered using grouping? If I understand your requirements, I think
it does what you want.
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
On 12/02
Mikhail - I can imagine a filter that strips out everything but numbers
and then indexes those with a (separate) numeric (trie) field. But I
don't believe you can do phrase or other proximity queries across
multiple fields. As long as an or-query is good enough, I think this
problem is not
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote:
Thanks for suggestions. Do I remember correctly that you ignored last
Lucene Revolution?
I wouldn't say I ignored it, but it's true I wasn't there in DC: I'm
excited to catch up on the presentations as the videos become available,
though.
Of course testing is best, but you can also get an idea of the size of
the non-storage part of your index by looking in the solr index folder
and subtracting the size of the files containing the stored fields from
the total size of the index. This depends of course on the internal
storage
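As a rough sketch of that arithmetic (the extension list follows the usual Lucene defaults, .fdt/.fdx for stored fields and .tvd/.tvx for term vectors, but codecs may use other file names):

```java
import java.util.Map;
import java.util.Set;

// Given file names and sizes from an index directory, sum everything
// except the stored-field and term-vector files to approximate the
// "non-storage" portion of the index.
public class IndexSizeEstimate {
    static final Set<String> STORAGE_EXTS = Set.of("fdt", "fdx", "tvd", "tvx");

    static long nonStorageBytes(Map<String, Long> fileSizes) {
        long total = 0;
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            String name = e.getKey();
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1);
            if (!STORAGE_EXTS.contains(ext)) {
                total += e.getValue();   // count only non-storage files
            }
        }
        return total;
    }
}
```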
: https://www.linkedin.com/groups?gid=6713853
On 29 November 2014 at 13:16, Michael Sokolov
msoko...@safaribooksonline.com wrote:
Of course testing is best, but you can also get an idea of the size of the
non-storage part of your index by looking in the solr index folder and
subtracting the size
On 11/29/14 1:30 PM, Toke Eskildsen wrote:
Michael Sokolov [msoko...@safaribooksonline.com] wrote:
I wonder if there's any value in providing this metric (total index size
- stored field size - term vector size) as part of the admin panel? Is
it meaningful? It seems like there would be a lot
Yes - here's a working example we have in production (tested in 4.8.1
and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1
I'm pretty sure):
The index size will not increase as quickly as you might think, and is
not an issue in most cases. An alternative to two fields, though, is to
index both upper- and lower-case tokens at the same position in a single
field, and then to perform no case folding at query time. There is no
right -- missed Ahmet's answer there in my haste to respond ...
-Mike
On 11/25/14 6:56 AM, Ahmet Arslan wrote:
Hi Apurv,
I wouldn't worry about index size, increase in index size is not linear (2x)
like that.
Please see similar discussion :
https://issues.apache.org/jira/browse/LUCENE-5620
Scores are related to total term frequencies *in each shard*, not
globally, and I think they may include term counts from deleted
documents as well, which could account for the discrepancy in scores
across the two shards.
-Mike
On 11/25/14 3:22 AM, rashi gandhi wrote:
Hi,
I have created