Solr: Elevate with complex query specifying field names

2015-05-31 Thread Thomas Michael Engelke
I have Solr as the backend to an e-commerce solution where the fields can be configured to be searchable, which generates a schema.xml and loads it into Solr. We also allow configuring a Solr search weight per field to affect queries, so my queries usually look something like this:

Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Michael Roberts
Hi, I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, each with a single shard and initially each shard has a single replica (so, basically, one machine). I am using core discovery, and my deployment tools create an empty core on newly provisioned machines. The

Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Michael Della Bitta
. Michael Jani, Vrushank mailto:vrushank.j...@truelocal.com.au 2015-05-19 at 03:51 Hello, We have production SOLR deployed on AWS Cloud. We have currently 4 live SOLR servers running on m3xlarge EC2 server instances behind ELB (Elastic Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat

RE: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Ryan, Michael F. (LNG-DAY)
or vmstat command in Linux.) -Michael -Original Message- From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu] Sent: Saturday, May 02, 2015 4:13 PM To: solr-user@lucene.apache.org Subject: Upgraded to 4.10.3, highlighting performance unusably slow Hello, We recently upgraded solr

RE: How to start an optimize in SolrJ without waiting for it to complete?

2015-05-01 Thread Ryan, Michael F. (LNG-DAY)
think you'll need to run it on a separate thread. -Michael -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Friday, May 01, 2015 10:09 AM To: solr-user@lucene.apache.org Subject: Re: How to start an optimize in SolrJ without waiting for it to complete? On 5/1/2015
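A minimal sketch of the "separate thread" suggestion, using only the JDK. The `optimizeCall` Runnable is a hypothetical stand-in for whatever blocking SolrJ optimize call the application makes; nothing here is taken from the original thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundOptimize {
    // Hands the blocking call to a worker thread and returns immediately.
    public static Future<?> startInBackground(ExecutorService pool, Runnable optimizeCall) {
        return pool.submit(optimizeCall);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Hypothetical stand-in: real code would wrap the blocking optimize call here.
        Runnable optimizeCall = () -> System.out.println("optimize finished");
        Future<?> pending = startInBackground(pool, optimizeCall);
        System.out.println("caller continues without waiting");
        pending.get();   // optional: block only if and when the outcome matters
        pool.shutdown();
    }
}
```

The returned Future lets the caller check for completion or failure later instead of losing errors raised on the worker thread.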

Replication not triggered

2015-04-27 Thread Michael Lackhoff
(Replicable) 1430107573634 27 - Slave (Searching) 1429762011916 23 287.14 GB Any idea why the replication is not triggered here or what I could try to fix it? Solr Version is 4.10.3. -Michael

Re: Is it possible to facet on the results of a custom solr function?

2015-04-20 Thread Motulewicz, Michael
Not sure if there’s a better way, but this works. From: Motulewicz, Michael michael.motulew...@healthsparq.com Reply-To: solr-user@lucene.apache.org

Is it possible to facet on the results of a custom solr function?

2015-04-13 Thread Motulewicz, Michael
Hi, I’m attempting to facet on the results of a custom solr function. I’ve been trying all kinds of combinations that I think would work, but keep getting errors. I’m starting to wonder if it is possible. I’m using Solr 4.0 and here is how I am calling:

RE: Are there known issues with Java 8 in older versions of Solr?

2015-04-06 Thread Ryan, Michael F. (LNG-DAY)
I can at least say that Solr 3.x works fine with Java 7. -Michael -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Monday, April 06, 2015 5:26 PM To: solr-user@lucene.apache.org Subject: Re: Are there known issues with Java 8 in older versions of Solr? On 4/6

Best way to monitor Solr regarding crashes

2015-03-28 Thread Michael Bakonyi
with Solr 4.8.X Cheers, Michael

Re: Applying Tokenizers and Filters to CopyFields

2015-03-26 Thread Michael Della Bitta
Glad you are sorted out! Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b

Re: Applying Tokenizers and Filters to CopyFields

2015-03-25 Thread Michael Della Bitta
happens at query time. Not sure if that's significant for you.

Re: Applying Tokenizers and Filters to CopyFields

2015-03-25 Thread Michael Della Bitta
is the query supposed to retrieve the lower-case version? (sorry, if this sounds like a naive question, but I have a feeling that I am missing something really basic here).

Re: Solr and HDFS configuration

2015-03-24 Thread Michael Della Bitta
the most. Solr on HDFS currently doesn't have any sort of rack locality like there is with say HBase colocated on the HDFS nodes. So you can expect that even with Solr installed on the same nodes as your datanodes for HDFS, that there will be remote IO.

RE: Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Ryan, Michael F. (LNG-DAY)
You'll need to wrap the date in quotes, since it contains a colon: String a = "speechDate:\"1992-07-10T17:33:18Z\""; -Michael -Original Message- From: Mirko Torrisi [mailto:mirko.torr...@ucdconnect.ie] Sent: Tuesday, March 10, 2015 3:34 PM To: solr-user@lucene.apache.org Subject: Invalid
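As a rough illustration of the quoting advice, here is a small JDK-only helper that builds such a query term. The class and method names are made up for this sketch; only the field name and timestamp come from the message above.

```java
public class SolrQueryQuoting {
    // Wraps a field value in double quotes so the query parser does not
    // treat the colons inside the timestamp as field separators.
    public static String quotedTerm(String field, String value) {
        // Escape backslashes and embedded quotes so the quoted value stays intact.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Prints: speechDate:"1992-07-10T17:33:18Z"
        System.out.println(quotedTerm("speechDate", "1992-07-10T17:33:18Z"));
    }
}
```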

RE: Performance on faceting using docValues

2015-03-05 Thread Ryan, Michael F. (LNG-DAY)
thought maybe I was the only one... -Michael -Original Message- From: lei [mailto:simpl...@gmail.com] Sent: Thursday, March 05, 2015 2:40 PM To: solr-user@lucene.apache.org Subject: Re: Performance on faceting using docValues Here are the specs of some example query faceting on three

[ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Michael McCandless
March 2015, Apache Solr™ 4.10.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: 8 Shards of Cloud with 4.10.3.

2015-02-24 Thread Michael Della Bitta
time, but on the other hand, you don't have to maintain a ZooKeeper ensemble or devote brain cells to understanding collections/shards/etc.

Re: 8 Shards of Cloud with 4.10.3.

2015-02-24 Thread Michael Della Bitta
Benson: Are you trying to run independent invocations of Solr for every node? Otherwise, you'd just want to create an 8-shard collection with maxShardsPerNode set to 8 (or more I guess).

Re: highlighting the boolean query

2015-02-24 Thread Michael Sokolov
There is also PostingsHighlighter -- I recommend it, if only for the performance improvement, which is substantial, but I'm not completely sure how it handles this issue. The one drawback I *am* aware of is that it is insensitive to positions (so words from phrases get highlighted even in

Re: incorrect Java version reported in solr dashboard

2015-02-23 Thread Michael Della Bitta
You're probably launching Solr using the older version of Java somehow. You should make sure your PATH and JAVA_HOME variables point at your Java 8 install from the point of view of the script or configuration that launches Solr. Hope that helps.

Re: ignoring bad documents during index

2015-02-20 Thread Michael Della Bitta
At the layer right before you send that XML out, have it have a fallback option on error where it sends each document one at a time if there's a failure with the batch.
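The fallback described above could look roughly like the following JDK-only sketch. The `sender` callback is a hypothetical stand-in for the code that posts documents to Solr, and the sketch assumes that re-sending a document is harmless (for example, because documents are keyed by a unique id).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class BatchFallback {
    // Tries the whole batch first; on failure, resends each document
    // individually and returns the ones that still fail.
    public static <T> List<T> sendWithFallback(List<T> batch, Consumer<List<T>> sender) {
        List<T> rejected = new ArrayList<>();
        try {
            sender.accept(batch);
        } catch (RuntimeException batchFailure) {
            for (T doc : batch) {
                try {
                    sender.accept(Collections.singletonList(doc));
                } catch (RuntimeException docFailure) {
                    rejected.add(doc);   // keep for logging or later inspection
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Hypothetical sender that rejects any batch containing "bad".
        Consumer<List<String>> sender = docs -> {
            if (docs.contains("bad")) throw new IllegalArgumentException("bad doc");
        };
        List<String> batch = new ArrayList<>();
        Collections.addAll(batch, "a", "bad", "c");
        System.out.println(sendWithFallback(batch, sender));
    }
}
```

Returning the rejected documents rather than throwing keeps the rest of the batch flowing while leaving the caller free to decide how to report the failures.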

Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Michael Sokolov
On 02/17/2015 03:46 AM, Volkan Altan wrote: First of all thank you for your answer. You're welcome - thanks for sending a more complete example of your problem and expected behavior. I don’t want to use KeywordTokenizer. Because, as long as the compound words written by the user are

Re: Solr suggest is related to second letter, not to initial letter

2015-02-15 Thread Michael Sokolov
StandardTokenizer splits your text into tokens, and the suggester suggests tokens independently. It sounds as if you want the suggestions to be based on the entire text (not just the current word), and that only adjacent words in the original should appear as suggestions. Assuming that's

Re: variation on boosting recent documents gives exception

2015-02-13 Thread Michael Lackhoff
, which is not very handy. -Michael

variation on boosting recent documents gives exception

2015-02-12 Thread Michael Lackhoff
see the current year (2015) is hard coded. Is there an easy way to get the current year within the function? Messing around with NOW looks very complicated. -Michael

Re: DIH: entities in xml problem

2015-02-04 Thread Michael Sokolov
(and the content has the entities) and it will be difficult to add the DTD to the content... Thanks - Raul On 03/02/15 at 17:15, Michael Sokolov wrote: If the entities are in the content, you would need to add the DTD to the content, not to the stylesheet. Or you could transform the content

Re: DIH: entities in xml problem

2015-02-03 Thread Michael Sokolov
If the entities are in the content, you would need to add the DTD to the content, not to the stylesheet. Or you could transform the content converting the entities. -Mike On 02/03/2015 10:41 AM, Raul wrote: Hi all! I'm trying to use Solr with the DIH and xslt processing. All is fine till i

Re: Solr Logging files get high

2015-02-03 Thread Michael Della Bitta
If you're trying to do a bulk ingest of data, I recommend committing less frequently. Don't soft commit at all until the end of the batch, and hard commit every 60 seconds.

Re: How to exclude selected filter (facet) from search result?

2015-02-02 Thread Michael Sokolov
Have a look here: https://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams; it might answer your question. Typically what I recommend is to keep the selected facet in view, but without any limitation on its counts. However if you want to hide it altogether, I

Re: Solr Suggester Autocomplete Working Example

2015-02-02 Thread Michael Sokolov
Please go ahead and play with autocomplete on safaribooksonline.com/home - if you are not a subscriber you will have to sign up for a free trial. We use the AnalyzingInfixSuggester. From your description, it sounds as if you are building completions from a field that you also use for

Re: Solr Logging files get high

2015-02-02 Thread Michael Sokolov
I was tempted to suggest rehab -- but seriously it wasn't clear if Nitin meant the log files Michael is referring to, or the transaction log (tlog). If it's the transaction log, the solution is more frequent hard commits. -Mike On 2/2/2015 11:48 AM, Michael Della Bitta wrote: If you'd like

Re: Solr Logging files get high

2015-02-02 Thread Michael Della Bitta
If you'd like to reduce the number of lines Solr logs, you need to edit the file example/resources/log4j.properties in Solr's home directory. Change lines that say INFO to WARN.

Re: Solr Logging files get high

2015-02-02 Thread Michael Della Bitta
Good call, it could easily be the tlog Nitin is talking about. As for which definition of high, I was making assumptions as well. :)

Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov
We were using grouping (no DocValues, though) and recently switched to using block-indexing and joins (see https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers). We got a nice speedup on average (perhaps 2x faster) and an even better improvement in

Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov
On 1/31/2015 2:47 PM, Mikhail Khludnev wrote: Michael, Please check two questions inlined below Hi Mikhail, On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: You can only handle a single relation this way since you have to restructure your index

Re: [MASSMAIL]Re: Contextual sponsored results with Solr

2015-01-31 Thread Michael Sokolov
If you have a finite known set of hosts, you could do something truly awful: create a field for each distinct host and set all of them to have value={id of the document} except for the host to which the document belongs: assign that hostname field some constant value, like true. Then query

RE: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Ryan, Michael F. (LNG-DAY)
Here's a Jira for this: https://issues.apache.org/jira/browse/SOLR-3031 I've attached a patch there that might be useful for you. -Michael -Original Message- From: Jorge Luis Betancourt González [mailto:jlbetanco...@uci.cu] Sent: Thursday, January 22, 2015 4:34 PM To: solr-user

SolrCloud timing out marking node as down during startup.

2015-01-22 Thread Michael Roberts
Hi, I'm seeing some odd behavior that I am hoping someone could explain to me. The configuration I'm using to repro the issue has a ZK cluster and a single Solr instance. The instance has 10 cores, and none of the cores are sharded. The initial startup is fine, the Solr instance comes up and

Re: AW: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
the commits to the Solr-transaction.log? -Clemens -Original Message- From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] Sent: Tuesday, January 20, 2015 14:54 To: solr-user@lucene.apache.org Subject: Re: transactions@Solr(J) On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote: http

Re: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote: http://stackoverflow.com/questions/10805117/solr-transaction-management-using-solrj Is it true, that a SolrServer-instance denotes a transaction context? Say I have two concurrent threads, each having a SolrServer-instance pointing to the same

Re: Need Debug Direction on Performance Problem

2015-01-18 Thread Michael Sokolov
You can also implement your own cursor easily enough if you have a unique sortkey (not relevance score). Say you can sort by id, then you select batch 1 (50k docs, say) and record the last (maximum) id in the batch. For the next batch, limit it to id > last_id and get the first 50k docs (don't
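The cursor idea can be sketched without any Solr dependency. In the sketch below, `fetchAfter` is a hypothetical in-memory stand-in for a query sorted by id, filtered to ids greater than the last one seen, with a row limit; in Solr the filter would presumably be an exclusive range on the id field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IdCursor {
    // Stand-in for "sort by id, keep only id > lastId, return at most `rows` docs".
    static List<Integer> fetchAfter(List<Integer> sortedIds, int lastId, int rows) {
        return sortedIds.stream()
                .filter(id -> id > lastId)
                .limit(rows)
                .collect(Collectors.toList());
    }

    // Reads everything batch by batch, remembering the largest id seen so far.
    public static List<List<Integer>> readAll(List<Integer> sortedIds, int rows) {
        List<List<Integer>> batches = new ArrayList<>();
        int lastId = Integer.MIN_VALUE;
        while (true) {
            List<Integer> batch = fetchAfter(sortedIds, lastId, rows);
            if (batch.isEmpty()) {
                break;
            }
            batches.add(batch);
            lastId = batch.get(batch.size() - 1);   // maximum id, since results are sorted
        }
        return batches;
    }

    public static void main(String[] args) {
        // Prints: [[1, 2], [3, 4], [5]]
        System.out.println(readAll(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

Because each request starts from the last id rather than from an offset, the cost per batch stays roughly constant however deep into the result set the reader gets.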

Re: Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name

2015-01-16 Thread Michael Sokolov
I've seen the same thing, poked around a bit and eventually decided to ignore it. I think there may be a ticket related to that saying it's a logging bug (ie not a real issue), but I couldn't swear to it. -Mike On 01/16/2015 12:36 PM, Tom Burton-West wrote: Hello, I'm running Solr 4.10.2

Re: Occasionally getting error in solr suggester component.

2015-01-15 Thread Michael Sokolov
is that we can avoid the rebuilt index on every commit or optimize. Is this the right way ?? or any that I missed ??? Regards dhanesh s.r On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: did you build the spellcheck index using spellcheck.build

Re: Occasionally getting error in solr suggester component.

2015-01-14 Thread Michael Sokolov
at 12:47 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: I think you are probably getting bitten by one of the issues addressed in LUCENE-5889 I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, build the index yourself

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-14 Thread Michael Della Bitta
Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly

Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Michael Sokolov
As a foolish dev (not malicious I hope!), I did mess around with something like this once; I was writing my own Codec. I found I had to create a file called META-INF/services/org.apache.lucene.codecs.Codec in my solr plugin jar that contained the fully-qualified class name of my codec: I

Re: Occasionally getting error in solr suggester component.

2015-01-13 Thread Michael Sokolov
I think you are probably getting bitten by one of the issues addressed in LUCENE-5889 I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true)

Re: Solr limiting number of rows to indexed to 21500 every time.

2015-01-13 Thread Michael Della Bitta
if there are any errors on the Oracle side?

Re: solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Michael Della Bitta
Another way of doing it is by setting the -Dhost=$hostname parameter when you start Solr.

Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Michael Sokolov
It looks like this is a good starting point: http://wiki.apache.org/solr/SolrConfigXml#codecFactory -Mike On 01/12/2015 03:37 PM, Tom Burton-West wrote: Hello all, Our indexes have around 3 billion unique terms, so for Solr 3, we set TermIndexInterval to about 8 times the default. The net

pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
can fix it? --Michael [1] http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
both titles). In such cases it is almost impossible to move the search fields to the qf parameter. --Michael

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
On 11.01.2015 at 14:19, Michael Lackhoff wrote: Or put another way: How can I do this boost in more complex queries like: title:foo AND author:miller AND year:[2010 TO *] It would be nice to have a title "foo" before another title "some foo and bar" (given the other criteria also match both

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
that have an exact(ish) match. --Michael

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
exactly at the top, even if combined with dozens of other criteria. And it doesn't really help to question the demand since the demand is there and somewhat external. The point is how to best meet it. --Michael

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Michael Lackhoff
f.title.pf=title_exact^10 title_proper^5 analogous to (the existing) f.title.qf=title_proper^10 title_related everything should work just fine But I guess this will only come if or when one of the developers has an itch to scratch ;-) Anyway, thanks a lot for all help and a great product --Michael

Re: Running Multiple Solr Instances

2015-01-06 Thread Michael Della Bitta
I would do one of the following: 1. Set a different Solr home for each instance. I'd use the -Dsolr.solr.home=/d/2 command line switch when launching Solr to do so. 2. RAID 10 the drives. If you expect the Solr instances to get uneven traffic, pooling the drives will allow a given Solr instance to

Re: solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Michael Della Bitta
The downsides that come to mind: 1. Every write gets amplified by the number of nodes in the cloud. 1000 write requests end up creating 1000*N HTTP calls as the leader forwards those writes individually to all of the followers in the cloud. Contrast that with classical replication where only

Re: .htaccess / password

2015-01-06 Thread Michael Della Bitta
The Jetty servlet container that Solr uses doesn't understand those files. It would not use them to determine access, and would likely make them accessible to web requests in plain text. On 1/6/15 16:01, Craig Hoffman wrote: Thanks Otis. Do think a .htaccess / .passwd file in the Solr admin

Re: Frequent deletions

2015-01-01 Thread Michael McCandless
Also see this G+ post I wrote up recently showing how %tg deletions changes over time for an every add also deletes a previous document stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD Mike McCandless http://blog.mikemccandless.com On Wed, Dec 31, 2014 at 12:21 PM,

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Michael Sokolov
On 12/30/14 12:42 PM, Jonathan Rochkind wrote: On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then mixedCase at query time will no longer match mixed Case in the

Re: Multi Language Suggester Solr Issue

2014-12-28 Thread Michael Sokolov
I noticed that your suggester analyzers include <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all" /> which seems like a bad idea -- this will strip all those Arabic, Russian and Japanese characters entirely, leaving you with probably only

Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-18 Thread Michael Della Bitta
thrashing. You might try bumping up your heap some to see if that helps. It's made a difference for me, but mostly in delaying the onset and limiting the occurrence of this. Likely I just need an even larger heap. Michael On 12/18/14 17:36, heaven wrote: Hi, We have 2 shards, each one has

Re: questions about BlockJoinParentQParser

2014-12-17 Thread Michael Sokolov
Thanks Andrey! I voted for your patch -Mike On 12/17/2014 4:01 AM, Kydryavtsev Andrey wrote: For support scoreMode parameter in BlockJoinParentQParser we have this jira with attached patch https://issues.apache.org/jira/browse/SOLR-5882 17.12.2014, 06:54, Michael Sokolov msoko

converting to parent/child block indexing

2014-12-17 Thread Michael Sokolov
Have other people tried migrating an index that was created without block (parent/child) indexing to one that *does* have it? Did you find that you got duplicate documents - ie multiple documents with the same uniqueField value? That's what I found, and I don't see how that's possible.

Re: converting to parent/child block indexing

2014-12-17 Thread Michael Sokolov
, Michael Sokolov msoko...@safaribooksonline.com wrote: Have other people tried migrating an index that was created without block (parent/child) indexing to one that *does* have it? Did you find that you got duplicate documents - ie multiple documents with the same uniqueField value? That's what I

questions about BlockJoinParentQParser

2014-12-16 Thread Michael Sokolov
I'm trying to use BJPQP and ran into a few little gotchas that I'd like to share with y'all in case you have any advice. First I ran into an NPE that probably should be handled better - maybe just an exception with a better message. The framework I'm working in makes it slightly annoying to

Re: My new lemmatizer interferes with the highlighter

2014-12-15 Thread Michael Sokolov
I'm not sure, but is it necessary to set positionIncAttr to 1 when there are *not* any lemmas found? I think the usual pattern is to call clearAttributes() at the start of incrementToken -Mike On 12/15/14 7:38 AM, Erlend Garåsen wrote: I have written a dictionary-based lemmatizer for

Re: My new lemmatizer interferes with the highlighter

2014-12-15 Thread Michael Sokolov
Well I think your first step should be finding a reproducible test case and encoding it as a unit test. But I suspect ultimately the fix will be something to do with positionIncrement ... -Mike On 12/15/2014 09:08 AM, Erlend Garåsen wrote: On 15.12.14 14:11, Michael Sokolov wrote: I'm

Re: different fields for user-supplied phrases in edismax

2014-12-13 Thread Michael Sokolov
I want terms to be stemmed, unless they are quoted, using dismax. On 12/12/14 8:19 PM, Amit Jha wrote: Hi Mike, What exactly is your use case? What do you mean by controlling the fields used for phrase queries? Rgds AJ On 12-Dec-2014, at 20:11, Michael Sokolov msoko...@safaribooksonline.com

Re: different fields for user-supplied phrases in edismax

2014-12-13 Thread Michael Sokolov
for edismax phrase boosting, although it might be interesting to support both, so more precise phrases get an even higher boost as do less-precise phrases. But it does need to be optional since it has an added cost at query time. -- Jack Krupansky -Original Message- From: Michael

Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-12 Thread Michael Della Bitta
, Michael Della Bitta wrote: Only thing you have to worry about (in both the CUSS and the home grown case) is a single bad document in a batch fails the whole batch. It's up to you to fall back to writing them individually so the rest of the batch makes it in. With CUSS, your program will never

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
On Thursday, December 11, 2014 10:50 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: I'd like to supply a different set of fields for phrases than for bare terms. Specifically, we'd like to treat phrases as more exact - probably turning off stemming and generally having a tighter analysis chain

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
Doug - I believe pf controls the fields that are used for the phrase queries *generated by the parser*. What I am after is controlling the fields used for the phrase queries *supplied by the user* -- ie surrounded by double-quotes. -Mike On 12/12/2014 08:53 AM, Doug Turnbull wrote: Michael

Re: Details on why ConcurrentUpdateSolrServer is recommended for maximum index performance

2014-12-11 Thread Michael Della Bitta
document in a batch fails the whole batch. It's up to you to fall back to writing them individually so the rest of the batch makes it in. Michael On 12/11/14 11:04, Erick Erickson wrote: I don't think so, it uses SolrInputDocuments and lists thereof. So if you parse the xml and then put things

different fields for user-supplied phrases in edismax

2014-12-11 Thread Michael Sokolov
I'd like to supply a different set of fields for phrases than for bare terms. Specifically, we'd like to treat phrases as more exact - probably turning off stemming and generally having a tighter analysis chain. Note: this is *not* what's done by configuring pf which controls fields for the

Re: Highlighting integer field

2014-12-11 Thread Michael Sokolov
So the short answer to your original question is no. Highlighting is designed to find matches *within* a tokenized (text) field only. That is difficult because text gets processed and there are all sorts of complications, but for integers it should be pretty easy to match the values in the

Re: AW: AW: Keeping capitalization in suggestions?

2014-12-09 Thread Michael Sokolov
=solr.LowerCaseFilterFactory/-- /analyzer /fieldType ... -Original Message- From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] Sent: Thursday, December 4, 2014 14:05 To: solr-user@lucene.apache.org Subject: Re: Keeping capitalization in suggestions? Have

Re: Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Michael Sokolov
Alex, I spent some time answering questions there, but ultimately got turned off by the competitive nature of it. I wanted to increase my score -- fun! But if you are not watching it all the time, the questions go by very fast, and you lose your edge. The typical pattern seems to be:

Re: Anti-Pattern in lucene-join jar?

2014-12-08 Thread Michael Sokolov
I get the impression there was a concern that the caller could hold on to the query generated by JoinUtil for too long - eg across requests in Solr. I'm not sure why the OP thinks that would happen, though. -Mike On 12/08/2014 04:57 AM, Mikhail Khludnev wrote: On Fri, Dec 5, 2014 at 10:44

Re: Anti-Pattern in lucene-join jar?

2014-12-08 Thread Michael Sokolov
Right - allowing Solr to manage these queries (SOLR-6234) seems like the way to go ... OP == original poster (I lost track of who started the discussion) -Mike On 12/08/2014 10:19 AM, Mikhail Khludnev wrote: On Mon, Dec 8, 2014 at 5:38 PM, Michael Sokolov msoko...@safaribooksonline.com

Re: DocsEnum and TermsEnum reuse in lucene join library?

2014-12-06 Thread Michael McCandless
They should be reused if the impl. allows for it. Besides reducing GC cost, it can also be a sizable performance gain since these enums can have quite a bit of state that otherwise must be re-initialized. If you really don't want to reuse them (force a new enum every time), pass null. Mike

Re: Get the new terms of fields since last update

2014-12-05 Thread Michael Sokolov
How about creating a new core that only holds a single week's documents, and retrieving all of its terms? Then each week, flush it and start over. -Mike On 12/05/2014 07:54 AM, lboutros wrote: Dear all, I would like to get the new terms of fields since last update (once a week). If I

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Michael Sokolov
Have a look at AnalyzingInfixSuggester - it does what you want. -Mike On 12/4/14 3:05 AM, Clemens Wyss DEV wrote: When I index a text such as Chamäleon and look for suggestions for chamä and/or Chamä, I'd expect to get Chamäleon (uppercased). But what happens is If lowecasefilter (see below

Re: Large fields storage

2014-12-04 Thread Michael Sokolov
There's no appreciable RAM cost during querying, faceting, sorting of search results and so on. Stored fields are separate from the inverted index. There is some cost in additional disk space required and I/O during merging, but I think you'll find these are not significant. The main cost

Re: Question on Solr Caching

2014-12-04 Thread Michael Della Bitta
, it will probably answer some questions: https://wiki.apache.org/solr/SolrCaching I hope that helps! Michael

Re: Problem with additional Servlet Filter (SolrRequestParsers Exception)

2014-12-03 Thread Michael Sokolov
Stefan I had problems like this -- and the short answer is -- it's a PITA. Solr is not really designed to be extended in this way. In fact I believe they are moving towards an architecture where this is even less possible - folks will be encouraged to run solr using a bundled exe, perhaps

Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 12/02/2014 12:59 PM, Darin Amos wrote: Thanks! I will take a look at this. I do have an additional question, since after

Re: Getting the position of a word via Solr API

2014-12-02 Thread Michael Sokolov
I would keep trying with the highlighters. Some of them, at least, have options to provide an external text source, although you will almost certainly have to write some java code to get this working; extend the highlighter you choose and supply its text from an external source. -Mike On

Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
, Michael Sokolov msoko...@safaribooksonline.com wrote: Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 12/02

Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov
Mikhail - I can imagine a filter that strips out everything but numbers and then indexes those with a (separate) numeric (trie) field. But I don't believe you can do phrase or other proximity queries across multiple fields. As long as an or-query is good enough, I think this problem is not

Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote: Thanks for suggestions. Do I remember correctly that you ignored last Lucene Revolution? I wouldn't say I ignored it, but it's true I wasn't there in DC: I'm excited to catch up on the presentations as the videos become available, though.

Re: Constantly high disk read access (40-60M/s)

2014-11-29 Thread Michael Sokolov
Of course testing is best, but you can also get an idea of the size of the non-storage part of your index by looking in the solr index folder and subtracting the size of the files containing the stored fields from the total size of the index. This depends of course on the internal storage
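
The subtraction described above can be sketched in a few lines of Python. This is only a rough illustration, not something from the thread: the function name index_size_breakdown is hypothetical, and the extension lists assume the Lucene 4.x file naming (.fdt/.fdx for stored fields, .tvd/.tvx/.tvf for term vectors). Segments written as compound files (.cfs) bundle everything together, so for those the breakdown cannot be read off the directory listing.

```python
import os

# Assumption: Lucene 4.x file extensions. Stored fields live in .fdt/.fdx,
# term vectors in .tvd/.tvx/.tvf. Other codecs or versions may differ.
STORED_FIELD_EXTS = {".fdt", ".fdx"}
TERM_VECTOR_EXTS = {".tvd", ".tvx", ".tvf"}


def index_size_breakdown(index_dir):
    """Sum file sizes in a Solr index folder, split by file kind (bytes)."""
    total = stored = vectors = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        total += size
        ext = os.path.splitext(name)[1]
        if ext in STORED_FIELD_EXTS:
            stored += size
        elif ext in TERM_VECTOR_EXTS:
            vectors += size
    return {
        "total": total,
        "stored_fields": stored,
        "term_vectors": vectors,
        # What remains approximates the non-storage part of the index.
        "non_storage": total - stored - vectors,
    }
```

Calling index_size_breakdown on a core's data/index folder returns the four byte counts; the "non_storage" figure is the one to compare against available RAM for the OS disk cache.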

Re: Constantly high disk read access (40-60M/s)

2014-11-29 Thread Michael Sokolov
: https://www.linkedin.com/groups?gid=6713853 On 29 November 2014 at 13:16, Michael Sokolov msoko...@safaribooksonline.com wrote: Of course testing is best, but you can also get an idea of the size of the non-storage part of your index by looking in the solr index folder and subtracting the size

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

2014-11-29 Thread Michael Sokolov
On 11/29/14 1:30 PM, Toke Eskildsen wrote: Michael Sokolov [msoko...@safaribooksonline.com] wrote: I wonder if there's any value in providing this metric (total index size - stored field size - term vector size) as part of the admin panel? Is it meaningful? It seems like there would be a lot

Re: updateNumericDocValue in solr 4.6.1

2014-11-26 Thread Michael Sokolov
Yes - here's a working example we have in production (tested in 4.8.1 and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1 I'm pretty sure):

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
right -- missed Ahmet's answer there in my haste to respond ... -Mike On 11/25/14 6:56 AM, Ahmet Arslan wrote: Hi Apurv, I wouldn't worry about index size, increase in index size is not linear (2x) like that. Please see similar discussion : https://issues.apache.org/jira/browse/LUCENE-5620

Re: Fwd: Change in the Score of Similiar Documents

2014-11-25 Thread Michael Sokolov
Scores are related to total term frequencies *in each shard*, not globally, and I think they may include term counts from deleted documents as well, which could account for the discrepancy in scores across the two shards. -Mike On 11/25/14 3:22 AM, rashi gandhi wrote: Hi, I have created
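
To illustrate why per-shard statistics change scores, here is a minimal Python sketch. It assumes the classic TF-IDF similarity that was the default in the Lucene 4.x line, where the inverse document frequency is 1 + ln(numDocs / (docFreq + 1)); the function name classic_idf and the shard numbers are made up for the example and do not come from the thread.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as computed by Lucene's classic TF-IDF similarity (assumed here).

    doc_freq: number of documents in the shard containing the term
    num_docs: number of documents in the shard (may count deleted documents
              until they are merged away)
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shards of equal size in which the same term is rarer on A.
idf_shard_a = classic_idf(doc_freq=10, num_docs=1000)
idf_shard_b = classic_idf(doc_freq=100, num_docs=1000)

# The same document and query score higher on the shard where the term is rarer.
print(idf_shard_a > idf_shard_b)
```

Because each shard plugs in its own doc_freq and num_docs, two otherwise identical documents can receive different scores depending on which shard holds them.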
