RE: Quantity wise price searching in Apache SOLR

2015-07-28 Thread Reitzel, Charles
This is a job for a custom query function. -Original Message- From: unique.jim...@gmail.com [mailto:unique.jim...@gmail.com] Sent: Tuesday, July 28, 2015 2:35 AM To: solr-user@lucene.apache.org Subject: Quantity wise price searching in Apache SOLR Currently I am working on e-commerce web

RE: Optimizing Solr indexing over WAN

2015-07-22 Thread Reitzel, Charles
Indexing over a WAN will be slow, limited by the bandwidth of the pipe. I think you will be better served to move the data in bulk to the same LAN as your target solr instances. I would suggest ZIP+scp ... or your favorite file system replication/synchronization tool. It's true, if you are u

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
the distribution of data. -Original Message----- From: Reitzel, Charles Sent: Tuesday, July 21, 2015 9:55 AM To: solr-user@lucene.apache.org Subject: RE: Solr Cloud: Duplicate documents in multiple shards When are you generating the UUID exactly? If you set the unique ID field on an "updat

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
When are you generating the UUID exactly? If you set the unique ID field on an "update", and it contains a new UUID, you have effectively created a new document. Just a thought. -Original Message- From: mesenthil1 [mailto:senthilkumar.arumu...@viacomcontractor.com] Sent: Tuesday, Ju

RE: copying data from one collection to another collection (solr cloud 521)

2015-07-15 Thread Reitzel, Charles
as suggested https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores Trying to find if there are any better ways than above option. Thanks Raja On 7/15/15, 10:23 AM, "Reitzel, Charles" wrote: >Since they want explicitly search within a given "vers

RE: MapReduceIndexerTool

2015-07-15 Thread Reitzel, Charles
%20%22Hadoop%22%29%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, July 15, 2015 11:24 AM To: solr-user@lucene.apache.org Subject: RE: copying data from one collection to an

RE: copying data from one collection to another collection (solr cloud 521)

2015-07-15 Thread Reitzel, Charles
Since they explicitly want to search within a given "version" of the data, this seems like a textbook application for collection aliases. You could have N public collection names: current_stuff, previous_stuff_1, previous_stuff_2, ... At any given time, these will be aliased to reference the
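A minimal sketch of how such an alias could be pointed at a physical collection, assuming the SolrCloud Collections API action CREATEALIAS. The base URL and the physical collection name "stuff_2015_07" are hypothetical and only illustrate the idea; the snippet builds the request URL and does not send it.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that points `alias` at `collections`.

    Re-issuing CREATEALIAS with the same alias name and a different
    collection list is how the public name gets repointed.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and physical collection name.
url = create_alias_url("http://localhost:8983/solr", "current_stuff", ["stuff_2015_07"])
print(url)
```

Clients keep querying current_stuff; only the alias changes when a new version of the data is promoted.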

RE: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Reitzel, Charles
A common approach to this problem is to include the spellcheck component and, if there are corrections, include a "Did you mean ..." link in the results page. -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Wednesday, July 15, 2015 10:36 AM To: solr-user@lu

RE: SOLR nrt read writes

2015-07-15 Thread Reitzel, Charles
And, to answer your other question, yes, you can turn off auto-warming. If your instance is dedicated to this client task, it may serve no purpose or be actually counter-productive. In the past, I worked on a Solr-based application that committed frequently under application control (vs. aut

RE: Range Facet queries for date ranges with with non-constant gaps

2015-07-13 Thread Reitzel, Charles
Try facet.mincount=1. It will still apply to range facets. -Original Message- From: JoeSmith [mailto:fidw...@gmail.com] Sent: Monday, July 13, 2015 5:56 PM To: solr-user Subject: Range Facet queries for date ranges with with non-constant gaps I am trying to do a range facet query f

RE: Multiple facet fields Query

2015-07-13 Thread Reitzel, Charles
Indeed, it is built into the HTML Forms specification that any query parameter may be repeated any number of times. If your ESB tool didn't support this, it would be very broken. My expectation is that it does and a bit more debugging and/or research into the product will yield results.
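As an illustration of repeated query parameters, here is a small sketch using Python's standard library. The field names "category" and "brand" are hypothetical; the point is only that facet.field appears once per value in the encoded query string.

```python
from urllib.parse import urlencode, parse_qs

# A list value is expanded into one key=value pair per element
# when doseq=True is passed to urlencode.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": ["category", "brand"],  # hypothetical field names
}
query_string = urlencode(params, doseq=True)
print(query_string)

# Parsing the string back yields both values for the repeated key.
decoded = parse_qs(query_string)
print(decoded["facet.field"])
```

Any HTTP client or integration tool that follows the forms convention should be able to produce an equivalent query string.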

RE: LowerCaseFilterFactory burns CPU

2015-07-09 Thread Reitzel, Charles
ware of the all characters in Unicode. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory The ICU classes require additional jars to be loaded into Solr before they will work. Thanks, Shawn -Original Message- From: Reitzel, Charles [mailto:charl

RE: Do I really need copyField when my app can do the copy?

2015-07-09 Thread Reitzel, Charles
-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, July 09, 2015 9:55 AM To: solr-user@lucene.apache.org Subject: Re: Do I really need copyField when my app can do the copy? On 7/9/2015 2:35 AM, Nir Barel wrote: > I wants to add a question regarding copyF

RE: LowerCaseFilterFactory burns CPU

2015-07-09 Thread Reitzel, Charles
That should be fixable. In a past life, I generated a perfect hash to fold case for Unicode in a locale-neutral manner and it was very fast. If I remember right, there are only about 2500 Unicode characters that can be case folded at all. So the generated, collision-free hash function was v

RE: Spell checking the synonym list?

2015-07-09 Thread Reitzel, Charles
One of the uses of synonyms is to replace a mis-spelled query term with a correctly spelled value. The "2 sided" synonym file format allows you to control which values "survive" into the actual query. lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney I am not aware, howev
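To make the "2 sided" format concrete, the sketch below parses one explicit mapping line into its left side (terms to match) and right side (terms that survive into the query). This is an illustrative parser written for this example, not Solr's own implementation, and it handles only lines containing "=>".

```python
def parse_synonym_rule(line):
    """Split an explicit synonym mapping 'a, b => c, d' into two term lists."""
    left, right = line.split("=>", 1)

    def terms(side):
        # Terms are comma separated and may contain spaces ("ambulance chaser").
        return [t.strip() for t in side.split(",") if t.strip()]

    return terms(left), terms(right)


matched, surviving = parse_synonym_rule(
    "lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney"
)
print(matched)
print(surviving)
```

The misspellings appear only on the left side, so they are matched in the query but never emitted into it.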

RE: optimize status

2015-07-01 Thread Reitzel, Charles
On 6/29/2015 2:48 PM, Reitzel, Charles wrote: > > I take your point about shards and segments being different things. I > > understand that the hash ranges per segment are not kept in ZK. I guess I > > wish they were. > > > > In this regard, I liked Mongodb,

RE: optimize status

2015-06-29 Thread Reitzel, Charles
I see what you mean. Many thanks for the details. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, June 29, 2015 6:36 PM To: solr-user@lucene.apache.org Subject: Re: optimize status Reitzel, Charles wrote: > Question, Toke: in your "i

RE: optimize status

2015-06-29 Thread Reitzel, Charles
say 2.5?) be a good best practice? -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, June 29, 2015 3:56 PM To: solr-user@lucene.apache.org Subject: Re: optimize status Reitzel, Charles wrote: > Is there really a good reason to consolidate down t

RE: optimize status

2015-06-29 Thread Reitzel, Charles
ing would be to ensure that document route ranges are handled properly, and I don't think the value used for routing has anything to do with what segment they happen to be stored into. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Monday, June 29

RE: optimize status

2015-06-29 Thread Reitzel, Charles
Is there really a good reason to consolidate down to a single segment? Any incremental query performance benefit is tiny compared to the loss of manageability. I.e. shouldn't segments _always_ be kept small enough to facilitate re-balancing data across shards? Even in non-cloud instances th

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
>> _problem_ other than "queries are too slow". Let's nail down the >> reason queries are taking a second before jumping into sharding. I've >> just spent too much of my life fixing the wrong thing ;) >> >> It would be useful to see a couple of

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
two primary queries and some work in middle tier). But this approach is unlikely to work for most cases. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Friday, June 19, 2015 9:52 AM To: solr-user@lucene.apache.org Subject: RE: How to do a Data shard

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Hi Wenbin, To me, your instance appears well provisioned. Likewise, your analysis of test vs. production performance makes a lot of sense. Perhaps your time would be well spent tuning the query performance for your app before resorting to sharding? To that end, what do you see when you se

RE: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Reitzel, Charles
Yes. Typically, the content file is used to populate a single field in each document, e.g. "content". Typically, this field is the primary target for searches. Sometimes, additional metadata (title, author, etc.) can be extracted from the source files. But the idea remains the same: the t

RE: Show all fields in Solr highlighting output

2015-06-11 Thread Reitzel, Charles
Moving the highlighted snippets to the main response is a bad thing for some applications. E.g. if you do any sorting or searching on the returned fields, you need to use the original values. The same is true if any of the values are used as a key into some other system or table lookup. Spe

RE: The best way to exclude "seen" results from search queries

2015-06-11 Thread Reitzel, Charles
So long as the fields are indexed, I think performance should be ok. Personally, I would also look at using a single document per user with a multi-valued field for recommendation ID. Assuming only a small fraction of all recommendation IDs are ever presented to any single user, this schema wo

RE: The best way to exclude "seen" results from search queries

2015-06-10 Thread Reitzel, Charles
I don't see any way around storing which recommendations have been delivered to each user. Sounds like a separate collection with the unique ID created from the combination of the user ID and the recommendation ID (with the IDs also available as separate, searchable and returnable fields).
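A rough sketch of what one document in such a "delivered" collection might look like. The field names user_id and recommendation_id and the "_" separator are assumptions made for this example; nothing in the thread prescribes them.

```python
def delivery_doc(user_id, recommendation_id):
    """Build one 'delivered recommendation' document.

    The unique key combines both IDs, so re-sending the same pair
    overwrites the existing document instead of creating a duplicate.
    """
    return {
        "id": f"{user_id}_{recommendation_id}",  # separator is an assumption
        "user_id": user_id,                      # hypothetical field name
        "recommendation_id": recommendation_id,  # hypothetical field name
    }


doc = delivery_doc("u42", "r1001")
print(doc)
```

With both IDs also stored as separate fields, the set of recommendations already seen by a user can be fetched with a filter on user_id.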

RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-29 Thread Reitzel, Charles
ponse SLAs are. If there's 1 query/second peak load, the sharding overhead for queries is probably not noticeable. If there are 1,000 QPS, then it might be worth it. Measure, measure, measure.. I think your composite ID understanding is fine. Best, Erick On Thu, May 28, 2015 at 1

RE: When is too many fields in "qf" is too many?

2015-05-29 Thread Reitzel, Charles
ndlers, > > some > > >> have 3490 field items in "qf" (that's the most and the qf line > > >> spans > > over > > >> 95,000 characters in solrconfig.xml file) and the least one has > > >> 1341 fields. I'm working on s

RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-28 Thread Reitzel, Charles
We have used a similar sharding strategy for exactly the reasons you say. But we are fairly certain that the # of documents per user ID is < 5000 and, typically, <500. Thus, we think the overhead of distributed searches clearly outweighs the benefits. Would you agree? We have done some l

RE: When is too many fields in "qf" is too many?

2015-05-28 Thread Reitzel, Charles
still have outstanding with using copyField in this way is that it >> could lead to a complete re-indexing of all the data in that view >> when a field is adding / removing from that view. >> >> Thanks >> >> Steve >> >> On Wed, May 27, 2015 at 6:02 P

RE: docValues: Can we apply synonym

2015-05-28 Thread Reitzel, Charles
-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok and what synonym processor you is talking about maybe it could help ? With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles < charles.reit...@tiaa-cref.org> wrote: > Sorry, my bad. The synon

RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message----- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 20

RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles < charles.reit...@tiaa-cref.org> wrote: > Is there

RE: When is too many fields in "qf" is too many?

2015-05-27 Thread Reitzel, Charles
One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if th

RE: docValues: Can we apply synonym

2015-05-27 Thread Reitzel, Charles
Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommen

RE: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-12 Thread Reitzel, Charles
Fwiw, we ended up preferring the 4.x spellcheck approach. For starters, it is supported by SolrJ ... :-) But more importantly, we wanted a mix of both terms and field values in our suggestions. We found the Suggester component doesn't do that. We also weren't interested in matching in the

RE: indexing java byte code in classes / jars

2015-05-08 Thread Reitzel, Charles
There are a number of reverse compilers for Java. Some are quite good and very detailed, so long as the byte code has not been deliberately obfuscated. Of course the original sources would be better for picking up comments. But, then you'd need a java parser (the compiler front end), of wh

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles
what does the _version_ in the *RAW* response from your /get or /select request return when you use something like curl that does *NO* processing of the response data? : Date: Fri, 17 Apr 2015 15:37:21 + : From: "Reitzel, Charles" : Reply-To: solr-user@lucene.apache.org

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles
ing place. But it looks to be upstream from my client code (which is seeing the same thing as the Admin UI). I am running 4.10.3 on 64-bit Windows desktop. Java is jdk1.7.0_67, 64-bit. -----Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Friday, April

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles
Thanks for getting back. Something like that crossed my mind but I checked the values on the way into SolrJ SolrInputDocument match the values printed in the Admin Query interface and they both match the expected value in the error message exactly. Besides the difference is only in the last f

Spurious _version_ conflict?

2015-04-16 Thread Reitzel, Charles
Hi All, I have been getting intermittent 409 conflict responses to updates. I check and double-check that the _version_ I am passing in matches the current value in the index. I notice that the expected value in the error message matches both what I pass in and the index contents. But the ac
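For readers unfamiliar with the 409 responses mentioned here, the sketch below models the optimistic concurrency rules for the _version_ field as described in the Solr reference guide. It is a simplified illustration written for this listing, not Solr's code, and it says nothing about the cause of the conflicts reported in this thread.

```python
def version_check_passes(requested, stored):
    """Return True if an update carrying _version_ == requested would be accepted.

    `stored` is the document's current _version_, or None if the
    document does not exist. A failed check corresponds to HTTP 409.
    """
    if requested > 1:
        return stored == requested   # must match the stored version exactly
    if requested == 1:
        return stored is not None    # document must exist, any version
    if requested < 0:
        return stored is None        # document must not exist
    return True                      # 0: no version check is performed


print(version_check_passes(1498562471222312960, 1498562471222312960))
print(version_check_passes(1498562471222312960, 1498562471222312961))
```

Note that the exact-match case compares full 64-bit values, so a difference in only the last digits is enough to fail the check.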

RE: _version_ returned from /update?

2015-04-15 Thread Reitzel, Charles
Hey, that's great! I'll give it a try. File under, "never hurts to ask" ... :-) -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, April 15, 2015 5:15 PM To: solr-user@lucene.apache.org Subject: Re: _version_ returned from /update? : In the i

_version_ returned from /update?

2015-04-15 Thread Reitzel, Charles
Hi All, In the interests of minimizing round-trips to the database, is there any way to get the added/changed _version_ values returned from /update? Or do you always have to do a fresh get? Yes, I am using optimistic concurrency. No, I am not using atomic updates (yet). Has anyone tried t

RE: SOLR searching

2015-04-09 Thread Reitzel, Charles
The other question is how often do prices change? Is it much more often than other product info (or per-user-and-product info)? These are use cases for things like CurrencyField and ExternalFileField. The thing to know about these is that CurrencyField values are searchable, while External

RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
now) but concerning hl.simple.pre hl.simple.post I can define only one color no ? in the sample solrconfig.xml there are several color, How can I tell to solr to use these color instead of hl.simple.pre/post ? Le 01/04/2015 20:58, Reitzel,

RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
mounting element has a compensation section which is made of an elastic material, particularly a<em>plastic</em> material. The compensation section So my last question is why I haven't instead having colored ? How can I tell to solr to use the colored ? Thank

RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
Haven't used Solr 3.x in a long time. But with 4.10.x, I haven't had any trouble with multiple terms. I'd look at a few things. 1. Do you have a typo in your query? Shouldn't it be q=aben:(plastic and bicycle)?

RE: How to find out which fields a search came from

2015-03-31 Thread Reitzel, Charles
Highlighting is the way to go. Note, you have options to make it better suit your application. e.g. You can control the delimiters the highlighter uses. You can also choose from a couple different implementations. We have been able to use the highlight results, as is, to pull data from fie

RE: Solr Unexpected Query Parser Exception

2015-03-30 Thread Reitzel, Charles
Saw that one. Can't remember for certain, but recall the actual syntax error was in a filter query. It could have been a quoting error or a date math error in a range expression. But, either way, the issue was in the fq. Using edismax. hth -Original Message- From: Jack Krupansky [

RE: Structured and Unstructured data indexing in SolrCloud

2015-03-30 Thread Reitzel, Charles
Hi Vijay, The short answer is yes, you can combine almost anything you want into a single collection. But, in addition to working out your queries, you might want work out your data life cycle. In our application, we have comingled the structured and unstructured documents into a single col

RE: Solr TCP layer

2015-03-09 Thread Reitzel, Charles
A couple thoughts: 0. Interesting topic. 1. But perhaps better suited to the dev list. 2. Given the existing architecture, shouldn't we be looking to transport projects, e.g. Jetty, Apache HttpComponents, for support of new socket or even HTTP layer protocols? 3. To the extent such support exists

RE: Combine multiple SOLR Query Results

2015-03-09 Thread Reitzel, Charles
Hi AnilJayanti, You shouldn't need 2 separate solr queries. Just make sure both 'track name' and 'artist name' fields are queried. Solr will rank and sort the results for you. e.g. q=foo&qf=trackName,artistName This is preferable for a number of reasons. It will be faster and simpler. But

RE: SpellCheck component query

2015-03-09 Thread Reitzel, Charles
Hi Ashish, We are doing something very close to what you describe. As Aman says, it requires two solr queries to achieve that result. I.e. you need to build this logic into your application. Solr won't do it for you. In our case, for the second query, we use a faceted results against an ng

RE: Solr query to match document "templates" - sort of a reverse wildcard match

2015-03-09 Thread Reitzel, Charles
Have a look at solr.StopFilterFactory. https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter If your place holders (?) are words like and, the, is, to, etc (see lang/stopwords_??.txt), the stop filter is designed to do what you want. It leaves

RE: Check the return of suggestions

2015-03-09 Thread Reitzel, Charles
Hi Alex, It looks like your search term and index are both subject to a stem filter. Is that right? To avoid the default query parser for spellcheck purposes, you might try spellcheck.q=cartouche. But that may not be sufficient if the spellcheck field is also "aggressively" stemmed. I.e.

RE: Validate data Indexed and versioning

2015-03-02 Thread Reitzel, Charles
First, I would invest the largest effort towards developing good test cases and a good test harness for your ETL software itself. If validation in production does encounter errors, it should be considered a bug in your code! So be sure to always add these cases to your test harness. Also, th

RE: Collations are not working fine.

2015-02-25 Thread Reitzel, Charles
t; sort them? > > On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles < > charles.reit...@tiaa-cref.org> wrote: > > > Hi Nitin, > > > > I was trying many different options for a couple different queries. In > > fact, I have collations working ok now with the S

RE: Collations are not working fine.

2015-02-23 Thread Reitzel, Charles
How you patch the suggester to get frequency information in the spellcheck response? It's very good. I also want to do that? On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles < charles.reit...@tiaa-cref.org> wrote: > I have been working with collations the last couple days and I

RE: Collations are not working fine.

2015-02-17 Thread Reitzel, Charles
Will you please send the configuration which you tried. It will help to solve my problem. Have you sorted the collations on hits or frequencies of suggestions? If you did than please assist me. On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles < charles.reit...@tiaa-cref.org> wro

RE: Collations are not working fine.

2015-02-16 Thread Reitzel, Charles
I have been working with collations the last couple days and I kept adding the collation-related parameters until it started working for me. It seems I needed 50. But, I am using the Suggester with the WFSTLookupFactory. Also, I needed to patch the suggester to get frequency information in