This is a job for a custom query function.
-Original Message-
From: unique.jim...@gmail.com [mailto:unique.jim...@gmail.com]
Sent: Tuesday, July 28, 2015 2:35 AM
To: solr-user@lucene.apache.org
Subject: Quantity wise price searching in Apache SOLR
Currently I am working on e-commerce web
Indexing over a WAN will be slow, limited by the bandwidth of the pipe.
I think you will be better served to move the data in bulk to the same LAN as
your target solr instances. I would suggest ZIP+scp ... or your favorite
file system replication/synchronization tool.
It's true, if you are u
the distribution
of data.
-Original Message-
From: Reitzel, Charles
Sent: Tuesday, July 21, 2015 9:55 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Cloud: Duplicate documents in multiple shards
When are you generating the UUID exactly? If you set the unique ID field on
an "update", and it contains a new UUID, you have effectively created a new
document. Just a thought.
-Original Message-
From: mesenthil1 [mailto:senthilkumar.arumu...@viacomcontractor.com]
Sent: Tuesday, Ju
as suggested
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores
Trying to find if there are any better ways than above option.
Thanks
Raja
On 7/15/15, 10:23 AM, "Reitzel, Charles"
wrote:
>Since they want explicitly search within a given "vers
"Hadoop") ORDER BY updated DESC, priority DESC, created ASC
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Wednesday, July 15, 2015 11:24 AM
To: solr-user@lucene.apache.org
Subject: RE: copying data from one collection to an
Since they want to search explicitly within a given "version" of the data, this
seems like a textbook application for collection aliases.
You could have N public collection names: current_stuff, previous_stuff_1,
previous_stuff_2, ... At any given time, these will be aliased to reference
the
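For illustration (the physical collection name stuff_20150715 below is made
up), an alias can be pointed at the current physical collection with the
Collections API, e.g.:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=current_stuff&collections=stuff_20150715'

Re-running CREATEALIAS with the same alias name repoints it, so clients
querying current_stuff never need to change.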
A common approach to this problem is to include the spellcheck component and,
if there are corrections, include a "Did you mean ..." link in the results page.
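As a rough sketch of the query side (assuming a spellcheck component is
already configured on the request handler; the misspelled term is made up):

  q=plasttic+bicycle&spellcheck=true&spellcheck.collate=true&spellcheck.count=5

The application then renders the returned collation as the "Did you mean ..."
link.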
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Wednesday, July 15, 2015 10:36 AM
To: solr-user@lu
And, to answer your other question, yes, you can turn off auto-warming. If
your instance is dedicated to this client task, it may serve no purpose or
actually be counter-productive.
In the past, I worked on a Solr-based application that committed frequently
under application control (vs. aut
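For reference, auto-warming is controlled per cache in solrconfig.xml;
setting autowarmCount to 0 disables it (the sizes below are only placeholders):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>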
Try facet.mincount=1. It will still apply to range facets.
-Original Message-
From: JoeSmith [mailto:fidw...@gmail.com]
Sent: Monday, July 13, 2015 5:56 PM
To: solr-user
Subject: Range Facet queries for date ranges with non-constant gaps
I am trying to do a range facet query f
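One way to get non-constant gaps (facet.range assumes a single fixed
facet.range.gap) is a set of explicit facet.query parameters; the field name
pubDate here is hypothetical:

  q=*:*&facet=true&
  facet.query=pubDate:[NOW/DAY-1DAY TO NOW]&
  facet.query=pubDate:[NOW/DAY-7DAYS TO NOW/DAY-1DAY]&
  facet.query=pubDate:[NOW/DAY-30DAYS TO NOW/DAY-7DAYS]

Each facet.query returns its own count, so the buckets can be any shape you
like.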
Indeed, it is built into the HTML Forms specification that any query parameter
may be repeated any number of times. If your ESB tool didn't support this, it
would be very broken. My expectation is that it does and a bit more debugging
and/or research into the product will yield results.
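For example, a typical Solr request repeats fq freely (parameter values here
are made up):

  q=plastic&fq=category:bicycles&fq=inStock:true

Any HTTP client or ESB that can send standard form data can produce this.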
ware of all the
characters in Unicode.
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
The ICU classes require additional jars to be loaded into Solr before they will
work.
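As a sketch (directory layout varies by Solr version), the jars are pulled in
from the analysis-extras contrib via <lib> directives in solrconfig.xml:

  <lib dir="../../contrib/analysis-extras/lib" regex="icu4j-.*\.jar"/>
  <lib dir="../../contrib/analysis-extras/lucene-libs" regex="lucene-analyzers-icu-.*\.jar"/>

and the filter is then usable in a field type:

  <fieldType name="text_icu" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
    </analyzer>
  </fieldType>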
Thanks,
Shawn
-Original Message-
From: Reitzel, Charles [mailto:charl
-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, July 09, 2015 9:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Do I really need copyField when my app can do the copy?
On 7/9/2015 2:35 AM, Nir Barel wrote:
> I want to add a question regarding copyF
That should be fixable. In a past life, I generated a perfect hash to fold
case for Unicode in a locale-neutral manner and it was very fast. If I
remember right, there are only about 2500 Unicode characters that can be case
folded at all. So the generated, collision-free hash function was v
One of the uses of synonyms is to replace a mis-spelled query term with a
correctly spelled value.
The "2 sided" synonym file format allows you to control which values "survive"
into the actual query.
lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney
I am not aware, howev
On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
> > I take your point about shards and segments being different things. I
> > understand that the hash ranges per segment are not kept in ZK. I guess I
> > wish they were.
> >
> > In this regard, I liked MongoDB,
I see what you mean. Many thanks for the details.
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Monday, June 29, 2015 6:36 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status
Reitzel, Charles wrote:
> Question, Toke: in your "i
say
2.5?) be a good best practice?
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Monday, June 29, 2015 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status
Reitzel, Charles wrote:
> Is there really a good reason to consolidate down t
ing would be to ensure that document route
ranges are handled properly, and I don't think the value used for routing has
anything to do with what segment they happen to be stored into.
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Monday, June 29
Is there really a good reason to consolidate down to a single segment?
Any incremental query performance benefit is tiny compared to the loss of
manageability.
I.e. shouldn't segments _always_ be kept small enough to facilitate
re-balancing data across shards? Even in non-cloud instances th
>> _problem_ other than "queries are too slow". Let's nail down the
>> reason queries are taking a second before jumping into sharding. I've
>> just spent too much of my life fixing the wrong thing ;)
>>
>> It would be useful to see a couple of
two primary queries and some
work in the middle tier). But this approach is unlikely to work for most cases.
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Friday, June 19, 2015 9:52 AM
To: solr-user@lucene.apache.org
Subject: RE: How to do a Data shard
Hi Wenbin,
To me, your instance appears well provisioned. Likewise, your analysis of test
vs. production performance makes a lot of sense. Perhaps your time would be
well spent tuning the query performance for your app before resorting to
sharding?
To that end, what do you see when you se
Yes. Typically, the content file is used to populate a single field in each
document, e.g. "content". Usually, this field is the primary target for
searches. Sometimes, additional metadata (title, author, etc.) can be
extracted from the source files. But the idea remains the same: the t
Moving the highlighted snippets to the main response is a bad thing for some
applications. E.g. if you do any sorting or searching on the returned fields,
you need to use the original values. The same is true if any of the values
are used as a key into some other system or table lookup. Spe
So long as the fields are indexed, I think performance should be ok.
Personally, I would also look at using a single document per user with a
multi-valued field for recommendation ID. Assuming only a small fraction of
all recommendation IDs are ever presented to any single user, this schema wo
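A minimal schema sketch of that idea (field names are hypothetical):

  <field name="userId" type="string" indexed="true" stored="true"/>
  <field name="deliveredRecIds" type="string" indexed="true" stored="true" multiValued="true"/>

Checking or excluding already-delivered recommendations is then a simple
filter, e.g. fq=userId:u42 plus fq=-deliveredRecIds:(rec1 rec2).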
I don't see any way around storing which recommendations have been delivered to
each user. Sounds like a separate collection with the unique ID created from
the combination of the user ID and the recommendation ID (with the IDs also
available as separate, searchable, and returnable fields).
ponse SLAs are.
If there's 1 query/second peak load, the sharding overhead for queries is
probably not noticeable. If there are 1,000 QPS, then it might be worth it.
Measure, measure, measure ...
I think your composite ID understanding is fine.
Best,
Erick
On Thu, May 28, 2015 at 1
ndlers,
> > some
> > >> have 3490 field items in "qf" (that's the most and the qf line
> > >> spans
> > over
> > >> 95,000 characters in solrconfig.xml file) and the least one has
> > >> 1341 fields. I'm working on s
We have used a similar sharding strategy for exactly the reasons you say. But
we are fairly certain that the # of documents per user ID is < 5000 and,
typically, <500. Thus, we think the overhead of distributed searches clearly
outweighs the benefits. Would you agree? We have done some l
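For reference, the compositeId router supports exactly this co-location (the
IDs below are made up): prefix each document ID with the user ID, and all of a
user's documents hash to the same shard; queries can then be pinned there with
the _route_ parameter:

  id = user42!doc123
  ...&q=foo&_route_=user42!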
still have outstanding with using copyField in this way is that it
>> could lead to a complete re-indexing of all the data in that view
>> when a field is adding / removing from that view.
>>
>> Thanks
>>
>> Steve
>>
>> On Wed, May 27, 2015 at 6:02 P
-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
OK, and what synonym processor are you talking about? Maybe it could help.
With Regards
Aman Tandon
On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:
> Sorry, my bad. The synon
Sorry, my bad. The synonym processor I mention works differently. It's an
extension of the EDisMax query processor and doesn't require field level
synonym configs.
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Wednesday, May 27, 20
The problem here is that docValues works only with primitive data types
like String, int, etc. So how could we apply synonyms to a primitive data type?
With Regards
Aman Tandon
On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:
> Is there
One request handler per view?
I think if you are able to make the actual view in use for the current request
a single value (vs. all views that the user could use over time), it would keep
the qf list down to a manageable size (e.g. specified within the request
handler XML). Not sure if th
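A rough sketch of such a per-view handler, with entirely hypothetical names:

  <requestHandler name="/select-view1" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">fieldA fieldB fieldC</str>
    </lst>
  </requestHandler>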
Is there any reason you cannot apply the synonyms at query time? Applying
synonyms at indexing time has problems, e.g. polluting the term frequency for
synonyms added, preventing distance queries, ...
Since city names often have multiple terms, e.g. New York, Den Haag, etc., I
would recommen
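A minimal sketch of a query-time-only synonym setup (the analysis chain
details here are just an example):

  <fieldType name="text_syn" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>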
Fwiw, we ended up preferring the 4.x spellcheck approach. For starters, it is
supported by SolrJ ... :-)
But more importantly, we wanted a mix of both terms and field values in our
suggestions. We found the Suggester component doesn't do that. We also
weren't interested in matching in the
There are a number of reverse compilers for Java. Some are quite good and
very detailed, so long as the byte code has not been deliberately obfuscated.
Of course the original sources would be better for picking up comments. But,
then you'd need a java parser (the compiler front end), of wh
what does the _version_ in the *RAW* response from your /get or /select request
return when you use something like curl that does *NO* processing of the
response data?
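For example (collection name and document ID are placeholders):

  curl 'http://localhost:8983/solr/collection1/get?id=doc123&fl=id,_version_'

shows the stored _version_ with no client-side processing of the response in
the way.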
: Date: Fri, 17 Apr 2015 15:37:21 +
: From: "Reitzel, Charles"
: Reply-To: solr-user@lucene.apache.org
ing place. But it looks to be upstream
from my client code (which is seeing the same thing as the Admin UI).
I am running 4.10.3 on 64-bit Windows desktop. Java is jdk1.7.0_67, 64-bit.
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Friday, April
Thanks for getting back. Something like that crossed my mind, but I checked
that the values on the way into the SolrJ SolrInputDocument match the values
printed in the Admin Query interface, and they both match the expected value
in the error message exactly.
Besides, the difference is only in the last f
Hi All,
I have been getting intermittent 409 conflict responses to updates. I check
and double-check that the _version_ I am passing in matches the current value
in the index.
I notice that the expected value in the error message matches both what I pass
in and the index contents. But the ac
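For reference, a minimal optimistic-concurrency update looks something like
this (all values are placeholders); if the supplied _version_ no longer
matches the indexed one, Solr answers with a 409:

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc123","_version_":1499871611106721792,"price":42}]'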
Hey, that's great! I'll give it a try.
File under, "never hurts to ask" ... :-)
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, April 15, 2015 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: _version_ returned from /update?
: In the i
Hi All,
In the interests of minimizing round-trips to the database, is there any way to
get the added/changed _version_ values returned from /update? Or do you
always have to do a fresh get?
Yes, I am using optimistic concurrency. No, I am not using atomic updates
(yet).
Has anyone tried t
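For reference (a sketch with placeholder values): Solr supports a
versions=true parameter on /update that returns the new _version_ of each
added/updated document in the response:

  curl 'http://localhost:8983/solr/collection1/update?versions=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc123","price":42}]'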
The other question is how often do prices change? Is it much more often than
other product info (or per-user-and-product info)?
These are use cases for things like CurrencyField and ExternalFileField. The
thing to know about these is that CurrencyField values are searchable, while
External
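A sketch of the ExternalFileField side (names are hypothetical): the type is
declared in the schema, and the values live in a file named
external_<fieldname> in the index data directory, reloadable without
re-indexing:

  <fieldType name="extPriceType" class="solr.ExternalFileField" keyField="id" defVal="0" valType="pfloat"/>
  <field name="extPrice" type="extPriceType" indexed="false" stored="false"/>

  # external_extPrice, one line per document:
  doc123=19.99

Such values can be used in function queries (sorting, boosting) but not in
regular search clauses.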
now)
but concerning hl.simple.pre/hl.simple.post, I can define only one color, no?
In the sample solrconfig.xml there are several colors.
How can I tell Solr to use these colors instead of hl.simple.pre/post?
On 01/04/2015 20:58, Reitzel,
mounting element has a compensation
section which is made of an elastic material, particularly
a <em>plastic</em> material. The compensation section
So my last question is: why am I not getting the colored highlighting instead?
How can I tell Solr to use the colored version?
Thank
Haven't used Solr 3.x in a long time. But with 4.10.x, I haven't had any
trouble with multiple terms. I'd look at a few things.
1. Do you have a typo in your query? Shouldn't it be q=aben:(plastic and
bicycle)?
Highlighting is the way to go. Note, you have options to make it better suit
your application. e.g. You can control the delimiters the highlighter uses.
You can also choose from a couple different implementations. We have been
able to use the highlight results, as is, to pull data from fie
Saw that one. Can't remember for certain, but recall the actual syntax error
was in a filter query. It could have been a quoting error or a date math
error in a range expression. But, either way, the issue was in the fq. Using
edismax. hth
-Original Message-
From: Jack Krupansky [
Hi Vijay,
The short answer is yes, you can combine almost anything you want into a single
collection. But, in addition to working out your queries, you might want to
work out your data life cycle.
In our application, we have comingled the structured and unstructured documents
into a single col
A couple thoughts:
0. Interesting topic.
1. But perhaps better suited to the dev list.
2. Given the existing architecture, shouldn't we be looking to the transport
projects, e.g. Jetty, Apache HttpComponents, for support of new socket or even
HTTP layer protocols?
3. To the extent such support exists
Hi AnilJayanti,
You shouldn't need 2 separate solr queries. Just make sure both 'track name'
and 'artist name' fields are queried. Solr will rank and sort the results for
you.
e.g. q=foo&defType=edismax&qf=trackName+artistName
This is preferable for a number of reasons. It will be faster and simpler.
But
Hi Ashish,
We are doing something very close to what you describe. As Aman says, it requires
two solr queries to achieve that result.
I.e. you need to build this logic into your application. Solr won't do it for
you. In our case, for the second query, we use faceted results against an
ng
Have a look at solr.StopFilterFactory.
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter
If your placeholders (?) are words like and, the, is, to, etc. (see
lang/stopwords_??.txt), the stop filter is designed to do what you want. It
leaves
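A typical declaration, for reference (the stopword file name depends on your
language):

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>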
Hi Alex,
It looks like your search term and index are both subject to a stem filter. Is
that right?
To avoid the default query parser for spellcheck purposes, you might try
spellcheck.q=cartouche. But that may not be sufficient if the spellcheck
field is also "aggressively" stemmed. I.e.
First, I would invest the largest effort towards developing good test cases and
a good test harness for your ETL software itself. If validation in production
does encounter errors, it should be considered a bug in your code! So be sure
to always add these cases to your test harness.
Also, th
> sort them?
>
> On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles <
> charles.reit...@tiaa-cref.org> wrote:
>
> > Hi Nitin,
> >
> > I was trying many different options for a couple different queries. In
> > fact, I have collations working ok now with the S
How did you patch the suggester to get frequency information in the
spellcheck response?
That sounds very good. I also want to do that.
On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:
> I have been working with collations the last couple days and I
Will you please send the configuration which you tried? It
will help to solve my problem. Have you sorted the collations on hits or
frequencies of suggestions? If you did, then please assist me.
On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wro
I have been working with collations the last couple days and I kept adding the
collation-related parameters until it started working for me. It seems I
needed 50.
But, I am using the Suggester with the WFSTLookupFactory.
Also, I needed to patch the suggester to get frequency information in
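The exact parameters used aren't shown here; a plausible set, given the
mention of 50, looks something like:

  spellcheck=true&spellcheck.collate=true&spellcheck.maxCollations=5&
  spellcheck.maxCollationTries=50&spellcheck.collateExtendedResults=true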