On Tue, May 28, 2013 at 2:20 PM, Upayavira wrote:
> The schema provides Solr with a description of what it will find in the
> Lucene indexes. If you, for example, changed a string field to an
> integer in your schema, that'd mess things up bigtime. I recently had to
> upgrade a date field from the
Hi All
I am trying to sort the results as per last updated date. My url looks as
below.
&fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_i
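Those parameters only filter on the date; to actually sort by it, a sort parameter on the same field is the usual approach. A sketch, assuming last_updated_date is declared as a date (or tdate) field in the schema:

```
&fq=last_updated_date:[NOW/DAY-60DAYS TO NOW]&sort=last_updated_date desc
```

Rounding the range start to NOW/DAY also makes the filter cacheable across requests, since an unrounded NOW changes every millisecond.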
Hi there,
Checked out branch_4x and applied the latest patch
LUCENE-2899-current.patch; however, I ran into 2 problems.
Followed the wiki page instructions and set up a field with this type, aiming
to keep nouns and verbs and do a facet on the field
==
Hi Koji,
Great news! I am looking forward to this OpenNLP toolkit.
Thanks a lot !
Rajesh
On Wed, May 29, 2013 at 4:12 AM, Koji Sekiguchi wrote:
> Hi Rajesh,
>
> Thanks!
> I'm planning to open an NLP tool kit for Lucene, and the tool kit will
> include
> the following synonym library.
>
> k
Hi everyone,
I've been searching about how to configure the SpellCheckerComponent in
Solr 4.0 to support suggestion queries based on a subset of the
configured fields in schema.xml. Let's say the spell checking is
configured to use these 4 fields:
I'd like to know if there's any possib
Thanks a lot for all your input.
I will go ahead and store as strings.
Best Regards
Kamal
On Wed, May 29, 2013 at 9:00 AM, Jack Krupansky wrote:
> As a general rule with Solr, do a proof of concept implementation with the
> simplest sensible approach and only start piling on complexity if
> per
As a general rule with Solr, do a proof of concept implementation with the
simplest sensible approach and only start piling on complexity if
performance or capacity become problematic. If the data is naturally a
string, use a string. If it is naturally a number, use a number. Use
whatever the q
Store them as a string token in multivalued fields. Solr/Lucene will
do the necessary mapping and lookups. That's what you are paying it
for. :-) That way you can easily facet and so on.
You may need to change some parts of your architecture later, but you
seem to be over-thinking it too early in
Thanks Alex.
I am in a dilemma over how to store the skill sets in the Solr index: as
string tokens or as integers. To give a little background -
As of today, I assign each skill a unique id (an auto-increment field in a
MySQL table), and then store them against the user id in a separate table.
That's how
Better still start here: http://en.wikipedia.org/wiki/Inverted_index
http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html
And there are several books on search engines and related algorithms.
On Tue, May 28, 2013 at 10:41 PM, Alexandre Rafalovitch
And you need to know this why?
If you are really trying to understand how this all works under the
covers, you need to look at Lucene's inverted index as a start. Start
here:
http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
Might
Dear All
I have a basic doubt about how data is stored in Apache Solr indexes.
Say I have a thousand registered users on my site. Let's say I want to store
the skills of each user as a multivalued string index.
Say
user 1 has skill set - Java, MySql, PHP
user 2 has skill set - C++, MySql, PHP
user 3 has
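Under the hood this is exactly what an inverted index is: a map from each term (skill) to the list of documents (users) that contain it. A toy sketch in plain Java — purely illustrative, not the Lucene data structures or Solr API:

```java
import java.util.*;

public class SkillIndex {
    // Build a toy inverted index: skill term -> sorted set of user ids.
    static Map<String, SortedSet<Integer>> build(Map<Integer, List<String>> users) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : users.entrySet()) {
            for (String skill : e.getValue()) {
                index.computeIfAbsent(skill, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> users = new HashMap<>();
        users.put(1, Arrays.asList("Java", "MySql", "PHP"));
        users.put(2, Arrays.asList("C++", "MySql", "PHP"));
        // Each skill now maps to the "posting list" of users having it.
        System.out.println(build(users));
        // {C++=[2], Java=[1], MySql=[1, 2], PHP=[1, 2]}
    }
}
```

A query like skills:MySql then just looks up the posting list [1, 2]; faceting counts the list sizes. This is why storing skills as string tokens in a multivalued field needs no id mapping on your side.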
You can store them and then use different analyzer chains on it (stored,
doesn't need to be indexed)
I'd probably use the collector pattern
se.search(new MatchAllDocsQuery(), new Collector() {
    private AtomicReader reader;
    private int i = 0;
    @Override public boolean acceptsDocsOutOfOrder() { return true; }
    @Override public void collect(int doc) { i++; }
    @Override public void setNextReader(AtomicReaderContext ctx) { reader = ctx.reader(); }
    @Override public void setScorer(Scorer scorer) {}
});
I can't quite apply SolrMeter to my problem, so I did something of my
own. The brains of the operation are the function here.
This feeds a ConcurrentUpdateSolrServer about 95 documents, each about
10mb, and 'threads' is six. Yet Solr just barely uses more than one
core.
private long doIterati
Thanks Steve, that worked for branch_4x
-Original Message-
From: Steve Rowe [mailto:sar...@gmail.com]
Sent: Friday, 24 May 2013 3:19 a.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP current patch compiling problem for 4.x branch
Hi Patrick,
I think you should check out and app
Volume of data:
1 log insert every 30 seconds; queries are done sporadically and
asynchronously, at a much lower frequency (every few days).
Also, the majority of the requests are indeed going to be within a splice of
time (typically hours, or at most a few days)
Type of queries:
Keyword or te
: I've created a custom ValueSourceParser and ValueSource that retrieve the
: availability information from a MySQL database. An example query is as
: follows.
:
:
http://localhost:8983/solr/collection1/select?q=restaurant_id:*&fl=*,available:availability(2013-05-23,
: 2, 1700, 2359)
:
: This r
: As erik alluded to in his response, you should be able to configure an
: "appended" fq using the "switch" QParserPlugin to get something like what you
are
: describing, by taking advantage of the "default" behavior.
I've updated the javadocs with 2 additional examples inspired by this
thread.
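For reference, the pattern being discussed looks roughly like this in solrconfig.xml — a sketch only (parameter names fq_where/where are illustrative, not from the thread):

```xml
<lst name="appends">
  <!-- if ?where=... is absent/empty, the bare case fires and yields a match-all;
       otherwise fall through to the real filter -->
  <str name="fq">{!switch case='*:*' default=$fq_where v=$where}</str>
</lst>
<lst name="defaults">
  <str name="fq_where">{!field f=city v=$where}</str>
</lst>
```

See the SwitchQParserPlugin javadocs for the authoritative examples.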
: I want to use Solr for an academical research. One step of my purpose is I
: want to store tokens in a file (I will store it at a database later) and I
you could absolutely write a java program which accesses the analyzers
directly and does whatever you want with the results of analysing a piece
: This is kind of the approach used by elastic search , if I'm not using
: solrcloud will I be able to use shard aliasing, also with this approach
: how would replication work, is it even needed?
you haven't said much about the volume of data you expect to deal with,
nor have you really explai
Great. And I did verify that the field order cannot be guaranteed by a
single CloneFieldUpdateProcessorFactory with multiple field names - the
underlying code iterates over the input values, checks the field selector
for membership and then immediately adds to the output, so changing the
input
Hi Rajesh,
Thanks!
I'm planning to open an NLP tool kit for Lucene, and the tool kit will include
the following synonym library.
koji
(13/05/28 14:12), Rajesh Nikam wrote:
Hello Koji,
This seems like a pretty useful post on how to create a synonyms file.
Thanks a lot for sharing this !
Have you sh
Hoss, you read my mind. Thanks a lot for your awesome
explanation! You rock!!!
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-How-to-make-fq-optional-tp4066592p4066630.html
Sent from the Solr - User mailing list archive at Nabble.com.
You may wish to explore the concept of using the Result Grouping (Field
Collapsing) feature in which your paragraphs are individual documents that
share a field to group them by (the ID of the document/book/article/whatever).
http://wiki.apache.org/solr/FieldCollapsing
This will net you absolut
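A grouped query of that shape might look like this — a sketch, assuming each paragraph is its own document and book_id is the shared grouping field (field names are illustrative):

```
select?q=text:searchterm&group=true&group.field=book_id&group.limit=5
```

Each group is then one book/monograph, and the documents inside the group are exactly the matching paragraphs, so their ids come back directly instead of via the highlighter.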
Thanks, Alexandre.
But I need to know in which paragraph the request matched. I need it
because paragraphs are bound to some extra data that I need to output on the
result page. So I need to know the paragraph ids. How do I bind such an
attribute to a multivalued field?
--
View this message in context:
: David, I felt like there should be a flag with which we can either throw the
: error message or do nothing in case of bad inputs..
As erik alluded to in his response, you should be able to configure an
"appended" fq using the "switch" QParserPlugin to get something like what you
are
describ
Thanks Jack, That fixed it and guarantees the order.
As far as I can tell SOLR cloud 4.2.1 needs a uniquekey defined in its schema,
or I get an exception.
SolrCore Initialization Failures
* testCloud2_shard1_replica1:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Q
Erik,
I am trying to enable / disable a part of fq based on a particular value
passed from the query.
For Ex: If I have the value for the keyword where in the query then I would
like to enable this fq, else just ignore it..
select?where="New york,NY"
Enable only when where has some value. (I g
You have mentioned Pivot Facets, but have you looked at the Path Hierarchy
Tokenizer Factory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory
This matches your use case, as best as I understand it.
Jason
On May 28, 2013, at 12:47 PM, vibhoreng04
I imagine the new "switch" query parser could help here somehow.
Erik
On May 28, 2013, at 16:43, "David Smiley (@MITRE.org)"
wrote:
> Your client needs to know to submit the proper filter query conditionally.
> It's not really a spatial issue, and I disagree with the idea to make bbox
>
Thanks for looking at this.
> What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck
> entirely turned off? Is it about (or a little more than) half the total when
> maxCollationTries=1 ?
With spellcheck off I get 8ms for 4fq query.
> Also, with the varying # of fq's, how m
David, I felt like there should be a flag with which we can either throw the
error message or do nothing in case of bad inputs..
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-How-to-make-fq-optional-tp4066592p4066610.html
Sent from the Solr - User mailing list
We'll have a blog for the book. We hope to have a first
raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
As soon as we get that process under control, we'll start the blog. I'll
keep your email on file and keep you posted.
-- Jack Krupansky
-Original Message
Your client needs to know to submit the proper filter query conditionally.
It's not really a spatial issue, and I disagree with the idea to make bbox
(and all other query parsers for that matter) do nothing if not given an
expected input.
~ David
bbarani wrote
> I am using the SOLR geospatial c
Andy,
What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck
entirely turned off? Is it about (or a little more than) half the total when
maxCollationTries=1 ? Also, with the varying # of fq's, how many collation
tries does it take to get 10 collations?
Possibly, a better wa
The TL;DR response: Try this:
(config snippet - the archive stripped the XML tags; only the values remain:
userid_s, id, docid_s, id, id)
That will assure that the userid gets processed before the docid.
I'll have to review the contract for CloneFieldUpdateProcessorFactory to see
what is or ain't guaranteed when there are multiple inpu
I am using the SOLR geospatial capabilities for filtering the results based
on the particular radius (something like below).. I have added the below fq
query in solrconfig and passing the latitude and longitude information
dynamically..
select?q=firstName:john&fq={!bbox%20sfield=geo%20pt=40.279392
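For reference, a complete filter of that shape looks like this — the coordinates and distance here are illustrative, not from the original mail:

```
select?q=firstName:john&fq={!bbox sfield=geo pt=40.27,-74.0 d=10}
```

sfield names the spatial field, pt is the center point, and d is the radius in kilometers.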
I thought the same, but that doesn't seem to be the case.
-Original Message-
From: Jack Krupansky
To: solr-user
Sent: Tue, May 28, 2013 3:32 pm
Subject: Re: Solr Composite Unique key from existing fields in schema
The order in the ID should be purely dependent on the order of
Hi Erick and Markus,
Any Idea on this ? can we resolve this by group by queries?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Nested-Facets-and-distributed-shard-system-tp4065847p4066583.html
Sent from the Solr - User mailing list archive at Nabble.com.
The order in the ID should be purely dependent on the order of the field
names in the processor configuration:
docid_s
userid_s
-- Jack Krupansky
-Original Message-
From: Rishi Easwaran
Sent: Tuesday, May 28, 2013 2:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Composite Un
I'm working on using spellcheck for giving suggestions, and collations
are giving me good results, but they turn out to be very slow if
my original query has any FQs in it. We can do 100 maxCollationTries
in no time at all, but if there are FQs in the query, things get
very slow. As maxCollationT
At first glance, unless I missed something, Hourglass will definitely not work
for our use-case, which just involves real-time inserts of new log data and no
appends at all. However, I would like to examine the guts of Hourglass to see
if we can customize it for our use-case.
> From: arafa...@gmai
Jack,
Not sure if this is the correct behaviour.
I set up the updateRequestProcessor chain as mentioned below, but it looks like
the compositeId that is generated is based on input order.
For example:
If my input comes in as
1
12345
I get the following compositeId: 1-12345.
If I reverse the input
Hi,
Try passing the URL-encoded value (%23) for #.
Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Query-syntax-error-Cannot-parse-tp4066560p4066566.html
Sent from the Solr - User mailing list archive at Nabble.com.
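The fix can be checked with plain Java's URLEncoder — a small sketch (the class and method names here are mine):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncode {
    // URL-encode a raw query value so reserved characters like '#'
    // reach Solr instead of being eaten by URL parsing.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("source_id:(7D1FFB# OR 7D1FFB) city:ES"));
        // source_id%3A%287D1FFB%23+OR+7D1FFB%29+city%3AES
    }
}
```

An unencoded # starts the URL fragment, so everything from it onward never reaches Solr — which is why the parser saw end-of-input mid-query.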
I'd definitely prefer the spiral bound as well. E-books are great and your
draft version seems very reasonably priced (aka I would definitely get it).
Really looking forward to this. Is there a separate mailing list / etc. for the
book for those who would like to receive updates on the status o
Thanks Jack, looks like that will do the trick for me. I will try it out.
-Original Message-
From: Jack Krupansky
To: solr-user
Sent: Tue, May 28, 2013 12:07 pm
Subject: Re: Solr Composite Unique key from existing fields in schema
You can do this by combining the builtin upd
Hi,
When I try run this query,
http://localhost:8983/solr/coreA/select?q=source_id:(7D1FFB# OR 7D1FFB)
city:ES, I have the error below:
400
1
org.apache.solr.search.SyntaxError: Cannot parse 'source_id:(7D1FFB':
Encountered "<EOF>" at line 1, column 43. Was expecting one of: ... "+" ...
Jack,
It is worth considering something like https://leanpub.com/ . That way
people can pre-pay for the result and enjoy (however 'draft'-y)
results earlier.
In terms of reference vs narrative, my strong desire would have been
for the narrative part. The problem always seems to be around
understa
I copied those accented words directly from web pages in Google Chrome on a
Windows PC, but then copied them to a text file as well, so their encoding
is dubious. You will have to make sure to use accented characters for UTF-8
in your environment. And... make sure that you are using an editor th
You can do this by combining the builtin update processors.
Add this to your solrconfig:
(config snippet - the archive stripped the XML tags; only the values remain:
docid_s, userid_s, id, id)
Add documents such as:
curl
"http://localhost:8983/solr/update?commit=true&update.chain=composite-id"; \
-H 'Content-type:application/json' -d '
[{"
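The archive stripped the XML from the snippet above; the kind of chain being described looks roughly like this — the processor classes and layout are my reconstruction from the surviving field names, not a verbatim copy of the original:

```xml
<updateRequestProcessorChain name="composite-id">
  <!-- copy both source fields into id, in the desired order -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">docid_s</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">userid_s</str>
    <str name="dest">id</str>
  </processor>
  <!-- collapse the multiple id values into one delimited key -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">-</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```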
Absolutely. Use "location_rpt" in the example schema. Do *not* use
LatLonType, which doesn't support multiValued data.
~ David Smiley
On 5/28/13 8:02 AM, "Spadez" wrote:
> I currently have an item which gets imported into Solr; let's call it a book
> entry. That has a single location associa
Hello Steve
Thanks for your reply
I don't want to upgrade to Solr 4,
so your suggestion would be as below:
---
you should instead convert these HTML character entities yourself to the
characters they represent (e.g. "&eacute;" -> "é") before sending the
docs to Solr.
---
Ple
Eric,
Thank you for the explanation.
My problem was that allowing docs with the same unique ids to be
present in multiple shards in a "normal" situation
makes it impossible to estimate the number of shards needed for an index
with a "really large" number of docs.
Thanks,
Val
On 05
On 5/28/2013 12:31 AM, Kristian Rink wrote:
(a) The usual tutorials outline something like
WHERE LASTMODIFIED > '${dih.last_index_time}'
[snip]
(b) I see that "last_index_time" returns a particularly fixed format.
In our database, with a modestly more complex SELECT, we also could
figure out
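For reference, the usual DIH delta setup looks like this — table and column names here are illustrative:

```xml
<entity name="item"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE lastmodified &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>
```

deltaQuery only selects the changed primary keys; deltaImportQuery then fetches each changed row by id.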
Hello Jack
Thanks for your reply..
I have tried to add the below contents to Solr, as you suggested
-
doc-1
Hola Mañana en le Café, habla el Académie
française!
--
BUT I am getting below error
--
I:\Program
Files\EasyPHP-
The cleanest is to do this from the outside.
Alternatively, it will perhaps work to populate your uniqueKey in a custom
UpdateProcessor. You can try.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
28. mai 2013 kl. 17:12 skrev Rishi Easwaran :
> Hi All,
>
> Historic
Hi All,
Historically we have used a single field in our schema as a uniqueKey (docid).
Wanted to change this to a composite key, something like userid-docid.
I know I can auto generate compositekey at document insert time, using custom
code to generate a new field, but wanted to know if th
:)
-- Jack Krupansky
-Original Message-
From: Alexandre Rafalovitch
Sent: Tuesday, May 28, 2013 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Paging with all Hits
I feel that the strength of Jack's rant is somewhat unprovoked by
the original question. I also feel that the
Indeed, I commented out all entries for caches in solrconfig, but SolrMeter
shows me stats for the field cache type. Now I know why.
Thanks Shalin,
--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, May 28, 2013 at 3:53 PM, Shalin Shekhar Mangar wrote:
> Edit the solr
Edit the solrconfig.xml and remove/comment <filterCache>, <queryResultCache>,
<documentCache>. Note that some caches such as FieldCache (created for
sorting/faceting on demand) cannot be disabled.
On Tue, May 28, 2013 at 8:10 PM, yriveiro wrote:
> Hi,
>
> How can I disable all caches that Solr uses?
>
> Regards
>
> /Yago
>
>
>
> -
> Bes
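In solrconfig.xml this amounts to commenting out the cache declarations — a sketch with the stock example sizes:

```xml
<!-- comment out to disable the Solr-level caches -->
<!--
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
-->
```

The Lucene FieldCache is built lazily whenever sorting or faceting needs it, so it cannot be switched off this way.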
I feel that the strength of Jack's rant is somewhat unprovoked by
the original question. I also feel that the rant itself is worth being
printed and framed :-)
But more than anything else, I feel that supposedly-known limitations
of Solr/Lucene are not actually exposed all that much. Certainl
Hi,
How can I disable all caches that Solr uses?
Regards
/Yago
-
Best regards
--
View this message in context:
http://lucene.472066.n3.nabble.com/Disable-all-caches-in-solr-tp4066517.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr-user-unsubscribe
2013/5/28 Michał Matulka
> Thanks for your responses, I must admit that after hours of trying I
> made some mistakes.
> So the most problematic phrase will now be:
> "4nSolution Inc." which cannot be found using query:
>
> name:4nSolution
>
> or even
>
> name:4nSolution
Hi,
I have added the missing WIKI pages for
https://wiki.apache.org/solr/Solr4.1
https://wiki.apache.org/solr/Solr4.2
https://wiki.apache.org/solr/Solr4.3
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Dynamic and multi-valued fields are both powerful but dangerous features.
Yes, they offer wonderful capabilities - if used in moderation - but
expecting that they are "get out of jail free / go past go as many times as
you want" cards to ignore the limits of Solr and do anything you want is
The technical answer: Undefined and not guaranteed.
Sure, you can experiment and see what the effects "happen" to be in any
given release, and maybe they don't tend to change (too much) between most
releases, but there is no guarantee that any given "change schema but keep
existing data withou
Thanks for your responses, I must admit
that after hours of trying I made some mistakes.
So the most problematic phrase will now be:
"4nSolution Inc." which cannot be found using query:
name:4nSolution
or even
name:4nSolution Inc.
This sounds like a bug. I'll open an issue. Thanks!
On Tue, May 28, 2013 at 2:29 PM, AlexeyK wrote:
> The cluster state problem reported above is not an issue - it was caused by
> our own code.
> Speaking about the update log - I have noticed a strange behavior
> concerning
> the replay. The re
Hmmm, with 4.x I get much different behavior than you're
describing, what version of Solr are you using?
Besides Alex's comments, try adding &debug=query to the url and see what comes
out from the query parser.
A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any an
I currently have an item which gets imported into Solr; let's call it a book
entry. That has a single location associated with it as a coordinate
and a location name, but I am now finding out that a single entry may actually
need to be associated with more than one location, for example "New York"
Hmmm, that's the second time somebody's had that problem. It's
assigned to me now anyway, thanks for creating it!
Erick
On Mon, May 27, 2013 at 10:11 AM, André Widhani
wrote:
> I created SOLR-4862 ... I found no way to assign the ticket to somebody
> though (I guess it is is under "Workflow", b
What does the analysis screen say in the Web AdminUI when you try to do that?
Also, what are the tokens stored in the field (also in the Web AdminUI)?
I think it is very strange to have a TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standa
On Tue, May 28, 2013, at 10:21 AM, Dotan Cohen wrote:
> When adding or removing a text field to/from the schema and then
> restarting Solr, what exactly happens to extant documents? Is the
> schema only consulted when Solr writes a document, therefore extant
> documents are unaffected?
>
> Consi
When adding or removing a text field to/from the schema and then
restarting Solr, what exactly happens to extant documents? Is the
schema only consulted when Solr writes a document, therefore extant
documents are unaffected?
Considering that Solr supports dynamic fields, my experimentation with
re
Hello,
I indexed some monographs with Solr. Within each document I have a
multi-valued field where I store the paragraphs. When I search for a
specific term within the monographs I get the whole monograph as a
result object. The single hits can be accessed via the highlight
component. This prevents
The cluster state problem reported above is not an issue - it was caused by
our own code.
Speaking about the update log - I have noticed a strange behavior concerning
the replay. The replay is *supposed* to be done for a predefined number of
log entries, but actually it is always done for the whole
You also get some smooth UI stuff "for free"
F
On Tue, May 28, 2013 at 10:58 AM, Fergus McDowall
wrote:
> Hi Richa
>
> Solrstrap is probably the best way to go if you just want to get up a PoC
> as fast as possible. Solrstrap requires no installation of middleware, you
> just add in the address
Hi Richa
Solrstrap is probably the best way to go if you just want to get up a PoC
as fast as possible. Solrstrap requires no installation of middleware, you
just add in the address of your solr server and open the file in your
browser.
Regards
Fergus
On Wed, Apr 24, 2013 at 5:23 PM, richa wr
Switching from single to multivalued shouldn't cause your index to break
(but your app might not like it).
Do you have a deduplication issue, or does each message have a unique
ID? You might be able to use the SignatureUpdateProcessorFactory
(deduplication) to prevent
updates to an existing message getting into the i
Hello,
I've got the following problem. I have a text type in my schema and a field
"name" of that type.
That field contains data; there is, for example, a record that has
"300letters" as its name.
Now field type definition:
And, of course, field definition:
yes, that's all - there are no tokenize
I've created https://issues.apache.org/jira/browse/SOLR-4866
Elodie
Le 07.05.2013 18:19, Chris Hostetter a écrit :
: I am using the Lucene FieldCache with SolrCloud and I have "insane" instances
: with messages like:
FWIW: I'm the one that named the result of these "sanity checks"
"FieldCacheI