Hi,
We are planning to set up 2 High-Memory Quadruple Extra Large instances as
master and slave for our multicore Solr setup, which has more than 200
cores spread between a couple of webapps on a single JVM on AWS.
All indexing (via a queue) will go to the master. One slave server will
Will create an issue for that.
It's probably a good idea to recreate the shards and index all documents
again or is there a way to fix this?
Thx!
On 03/04/2013 07:13 PM, Mark Miller wrote:
Yeah, you need numShards from 4.1 up, or you are in a mode where you have to
distribute updates yourself.
Hello,
I have a folder containing about 50 Word doc files. Is there a way to index
them in one shot? The only experience I have with indexing is with DIH.
Is it possible to provide a link to a tutorial or info on how to do the
above task (data-config and schema examples)?
Many thanks in
Hello,
Look towards Tika. It can handle these MS Word file formats:
http://tika.apache.org/1.3/formats.html#Microsoft_Office_document_formats
Solr Wiki:
http://wiki.apache.org/solr/ExtractingRequestHandler
I don't have a link for a tutorial with example schemas.
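That said, a rough sketch of the solrconfig.xml wiring for the extracting handler (untested; the lib paths and field mappings below are assumptions you'd adjust for your install) would be:

```xml
<!-- Hypothetical sketch: register the ExtractingRequestHandler so Word
     files can be POSTed to /update/extract. Paths and field mappings
     are assumptions. -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- dump all extracted body text into a catch-all field -->
    <str name="fmap.content">text</str>
    <!-- prefix Tika metadata fields the schema doesn't define -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```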
Dmitry
On Tue, Mar 5, 2013
Hello Bruno,
Am 01.03.2013 12:43, schrieb Bruno Mannina:
Dear Users,
Currently we use Solr 3.6/Tomcat 6 on a specific port, e.g. 1234.
We connected our software to Solr on this specific port,
but several users have a lot of problems opening this specific port on
their company network.
I
Hello,
I'd like to know if there is some specific way, in Solr 3.6.1, to get
something like a homogeneous dispersion of documents in a bbox.
My use case: I have a request returning, let's say, 1000 documents in a
bbox (they all have the same Solr score), and I want only 50 documents, but
not
Hi,
I'm about to take a look at the source to debug this but any input
appreciated. I'm trying to cluster mlt results. Clustering works, MLT
works, but MLT query with clustering does not. My query handler is
<requestHandler name="/mlt_clustering" class="solr.MoreLikeThisHandler">
lst
Probably the bulk indexing feature is not implemented for Tika processing,
but you can easily write a script yourself:
Extract in a loop over the word files in a directory:
curl
"http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text"
--data-binary @tutorial.html -H
Hi Sujatha,
If I understand correctly, you will have only 1 slave (and 1 master), so
that's not really a HA architecture. You could manually turn master into
slave, but that's going to mean some down time...
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Tue, Mar 5, 2013 at
The code seems to indicate MLTHandler doesn't support components... the best
route here seems to be creating my own handler.
On Tue, Mar 5, 2013 at 10:43 AM, Dale McDiarmid d...@ravn.co.uk wrote:
Hi,
I'm about to take a look at the source to debug this but any input
appreciated. I'm trying to cluster mlt
You can use more like this as a component, but you don't get info about
what terms made the documents similar.
If you don't need that stuff, then just have MLT and clustering as
components within a standard SearchHandler.
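A rough sketch of such a handler in solrconfig.xml (the handler name, field list, and engine name are assumptions, and it presumes a clustering searchComponent is already defined):

```xml
<!-- Hypothetical sketch: MLT and clustering as components on a standard
     SearchHandler instead of MoreLikeThisHandler. Names are assumptions. -->
<requestHandler name="/mlt_cluster" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- enable the MoreLikeThis component on each request -->
    <bool name="mlt">true</bool>
    <str name="mlt.fl">title,body</str>
    <!-- enable the clustering component -->
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```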
Upayavira
On Tue, Mar 5, 2013, at 11:53 AM, Dale McDiarmid wrote:
code
Hello,
I am trying to index some FS folder tree.
I spent 2 days trying to find what the problem could be - got nothing :) There are
not many examples on indexing a file system.
In the logs I can't find any exceptions explaining why it does not process the info.
Data import configuration and debug response are
If i go down that route, all query parameters will apply to the search
results, and MLT will be calculated on those search results.
Clustering will also be calculated on search results... not on the MLT
results.
On Tue, Mar 5, 2013 at 11:56 AM, Upayavira u...@odoko.co.uk wrote:
You can use
Hi Otis,
Since currently we are planning for only one slave due to cost
considerations, can we have an ELB fronting the master and slave for HA.
1. All index requests will go to the master .
2. Slave replicates from master .
3. Search request can go either to master /slave via ELB.
Hi Jack, I've updated the gist:
https://gist.github.com/caarlos0/4ad53583fb2b30ef0bec
I gave you the wrong browser-tab result yesterday, sorry.
The schema seems right to me. I have a field named BoosterField, with the
synonyms etc. enabled in its fieldType...
I can't figure out what's wrong.
On 5 March 2013 15:08, Syao Work syao.w...@gmail.com wrote:
Hello,
I am trying to index some FS folder tree.
Spent 2 days finding what could be the problem - got nothing :) There are
not so much examples on indexing File System.
In the logs I cant find any exceptions why it does not process
And if I need to index file name, path, size and/or mime?
On Tue, Mar 5, 2013 at 2:45 PM, Gora Mohanty g...@mimirtech.com wrote:
On 5 March 2013 15:08, Syao Work syao.w...@gmail.com wrote:
Hello,
I am trying to index some FS folder tree.
Spent 2 days finding what could be the problem -
The fix to throw an exception if the user incorrectly attempts to perform a
phrase query on a field that does not have position info was made as part of
LUCENE-2370 - Reintegrate flex branch into trunk. Unfortunately, there is no
discussion there of that specific change, which was bundled as
In that last example you're doing a wildcard query (java*), and by default that
does not run (all of) the analysis chain you have defined.
If you need to expand synonyms for wildcarded terms like this, you'll need to
define a multiterm analysis chain. See here for more details:
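As a rough, untested sketch (the filter choices below are assumptions; keep the multiterm chain as simple as possible):

```xml
<!-- Hypothetical sketch: an explicit multiterm analyzer so wildcard
     queries like java* are normalized before term expansion. -->
<fieldType name="text_multiterm" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/prefix/fuzzy terms; keep it lightweight -->
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```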
Thanks for your answer Erik!
I changed the FieldType to:
https://gist.github.com/caarlos0/89b7c0484b154550bc63
And got a 400 error with message analyzer returned too many terms for
multiTerm term: java.
I also tried to change the query to not use a wildcard, but it's still
ignoring the
You're getting the 400 error because you are using the keyword tokenizer,
which means that there will be lots of terms (really just raw strings)
that begin with java. That simply isn't going to work. Stick with the
standard tokenizer.
You have way too much going on here that is clearly way
Hi,
I'm hitting a brick wall trying to diagnose this issue. We have a field,
configured like this:
<fieldType name="class" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer
Hi Jack,
Thanks for your answer, and yes, I'm pretty confused.
The thing is: This problem is going on in one of my job applications, and I
must fix it.
Can you give me some tips or links that I should read to clear my mind and
understand it?
Thanks in advance.
On Tue, Mar 5, 2013 at 10:48 AM,
Is there anything wrong with set up?
On Tue, Mar 5, 2013 at 5:43 PM, Sujatha Arun suja.a...@gmail.com wrote:
Hi Otis,
Since currently we are planning for only one slave due to cost
considerations, can we have an ELB fronting the master and slave for HA.
1. All index requests will go to
Hello,
I spent some more time on this and used Mikhail's suggestions of which
classes would need to be implemented.
1. Since we use SpanQuery family, we would need to modify the SpanScorer to
collect some stats over matched spans.
2. DelegatingCollector takes Scorer class via setScorer() method.
Hi,
That may be fine. I'd use the sticky-session setting in ELB to avoid
having the same user's query hit both master and slave, say while paging
through results, and risking inconsistent results. This will also
help with cache utilization. That said, this is not a recommended setup.
Hi,
I'm not sure what the cause of this is, but:
1. no need to keep increasing max warming searchers. 2 is often enough
2. using cold searcher doesn't sound right - you should warm them up
3. solr 4.1 is out, consider using it
4. solr 4.2 may be out this month, consider trying the snapshot
5. no
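For points 1 and 2, the relevant solrconfig.xml settings would look something like this (values are illustrative):

```xml
<!-- Sketch of the solrconfig.xml settings mentioned above: a small
     maxWarmingSearchers value, and warmed (not cold) searchers. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
<useColdSearcher>false</useColdSearcher>
```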
Follow the advice you've already been given: 1) switch from the keyword
tokenizer to the standard tokenizer, 2) get rid of regex replace (for now),
and otherwise simplify your analyzers as much as possible. Then run a test
with a simple, consistent example, and review the debugQuery and parsed
OK, thanks. I will do this and try to make this thing work.
Thank you very much for your help.
On Tue, Mar 5, 2013 at 11:34 AM, Jack Krupansky j...@basetechnology.comwrote:
Follow the advice you've already been given: 1) switch from the keyword
tokenizer to the standard tokenizer, 2) get rid
Maybe you made changes to the analyzer but then failed to fully reindex your
data. I mean, it sounds like your index still contains terms that had been
tokenized by the standard tokenizer.
-- Jack Krupansky
-Original Message-
From: John, Phil (CSS)
Sent: Tuesday, March 05, 2013 8:53
Hi everyone!
I'm trying to develop a central index. I installed Solr and reached the screen
that I attach. But the problem is that I don't know how to continue from this
point. I want to develop an app in PHP which uses Solr, but I don't know how;
can anyone help me, maybe with a tutorial
Hi,
See http://lucene.apache.org/solr/tutorial.html :)
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Tue, Mar 5, 2013 at 9:52 AM, Álvaro Vargas Quezada al...@outlook.comwrote:
Hi everyone!
I'm trying to develop a central index, I installed Solr and I reach the
screen that
On 5 March 2013 18:22, Syao Work syao.w...@gmail.com wrote:
And if I need to index file name, path, size and/or mime?
[...]
You would need to create separate entities for each field that
you need to index. The referenced Wiki page on DIH has
other examples of configurations with multiple
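An untested sketch of such a data-config.xml (the paths, file pattern, and Solr field names are assumptions; FileListEntityProcessor exposes implicit columns such as file, fileAbsolutePath, and fileSize):

```xml
<!-- Hypothetical sketch: FileListEntityProcessor walks a directory tree
     and exposes file metadata; a nested TikaEntityProcessor extracts the
     document body. Paths and field names are assumptions. -->
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/docs" fileName=".*\.docx?"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="file" name="filename"/>
      <field column="fileAbsolutePath" name="path"/>
      <field column="fileSize" name="size"/>
      <entity name="doc" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text"
              dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```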
Thanks for the quick answer!
I have read that tutorial, but I have a problem I forgot to mention: I'm
using Windows (it suc** I know, but it is for my enterprise). Do you know any
tutorial or way to implement this?
Date: Tue, 5 Mar 2013 09:59:17 -0500
Subject: Re: Building a central index
That's not great, yes
Maybe you can attempt something with Solr and when you hit specific
problems ask here. That's going to work much better than asking for
general help.
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Tue, Mar 5, 2013 at 10:20 AM, Álvaro Vargas Quezada
You should not have any issues starting with Solr on the Windows side either.
You may have more issues on the PHP side, but there seems to be at least
some support for Solr there too: https://packagist.org/search/?tags=solr
Once you get going, you may have some issues running Solr as a service,
etc.
If your index is on EBS, you'll see big iowait percentages when merges happen.
I'm not sure what that's going to do to your master's ability to
service requests. You should test.
Alternatively, you might figure out the size of machine you need to
index vs. the size of machine you need to service
Can you send an example?
On Tue, Mar 5, 2013 at 5:11 PM, Gora Mohanty g...@mimirtech.com wrote:
On 5 March 2013 18:22, Syao Work syao.w...@gmail.com wrote:
And if I need to index file name, path, size and/or mime?
[...]
You would need to create separate entities for each field that
you
Would Solr's post.jar work for you? It has a directory recurse option. The
usage/help output is pasted below.
Here's what should work for you: java -Dauto -Drecursive -jar post.jar
/some/folder
Erik
exampledocs java -jar post.jar --help
SimplePostTool version 1.5
Usage: java
Hi Alvaro,
I agree with Otis and Alexandre (esp. re: Windows + PHP!). However, there are plenty
of people using Solr with PHP out there very successfully. There's another good
package at http://code.google.com/p/solr-php-client/ which is easy to implement
and has some example usage.
Regards,
DQ
There's also a very full featured PHP front-end to Solr. It's historically a
bit library-centric, but I imagine that it is general purpose enough to be
useful to get started. VUFind: http://vufind.org/
Erik
On Mar 5, 2013, at 10:56 , David Quarterman wrote:
Hi Alvaro,
I agree
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 + PHP
(Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent results.
Even Solarium as a PHP library is great; right now it lacks Solr 4 support,
but for Solr 3.6 it's great.
- Original Message -
From:
I use Solarium as a PHP library too, and I would greatly recommend it.
2013/3/5 Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 +
PHP (Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent
results. Even
I've changed my solr-4.1.0.war file to use log4j, now every instance of solr I
have on tomcat produces logs to /logs/solr.log
In tomcat/solr_app1/WEB-INF/classes/log4j.properties I have a variable to set
the path for the log file
log4j.appender.FILE.File=${solr.logs.home}/solr.log which is set
On 3/4/2013 6:52 PM, Erick Erickson wrote:
No, folding doesn't apply to punctuation, only to a set of accents, circumflexes,
etc. It essentially just removes all of the diacritics and folds the
letters into their unaccented counterparts.
I get that it would fold an accented character into the
I'm doing a search for prod and would assume it would pull back matches for
product, production, etc. but I get zero hits. Any ideas?
Here is my field type:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer
Hi
I am using timestamp field configured in the schema, so this way:
<field name="timestamp" type="date" indexed="true" stored="true"
    default="NOW" multiValued="false"/>
When I check the new data, I see the datetime value is one hour less
than the current date.
I thought it was a problem on the Java
Your assumption is wrong. Solr and Lucene match entire words.
You can use wildcards, but you need to be aware of the performance issues.
If the words are related parts of speech, like singular and plural, you can
use a stemmer to index a root form.
You can also configure synonyms at index
Thank you!
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, March 05, 2013 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Unable to match partial word
Your assumption is wrong. Solr and Lucene match entire words.
You can use wildcards, but
: when I've checked the new data, I see datetime value have one hour less than
: current date.
Please note the documentation about DateField (and TrieDateField which
extends it)...
https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/schema/DateField.html
The response format is
Hi Otis, Michael,
Thanks for your input and suggestions.
Yes, we were considering sticky sessions for pagination, and we are not
planning to have the index on EBS.
I would like to understand why it's not the recommended approach; can you
please explain?
Till now we have had a single
Hmm; weird. It looks right. Does it work without the sort? -- i.e. does the
filter work? Are there more interesting looking error messages output by
Solr?
Rakudten wrote
Hello!
I'm trying to sort by geodist() distance, but it seems that I can't:
The query:
And alternatively, http://yonik.com/solr/getting-started/
- Mark
On Mar 5, 2013, at 6:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hi,
See http://lucene.apache.org/solr/tutorial.html :)
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Tue, Mar 5,
Something like this.
On Tue, Mar 5, 2013 at 6:16 PM, Dmitry Kan solrexp...@gmail.com wrote:
Hello,
I spent some more time on this and used Mikhail's suggestions of which
classes would need to be implemented.
1. Since we use SpanQuery family, we would need to modify the SpanScorer to
Without the sort it works perfectly, and there are no other error messages,
just the one I copy-pasted :-(
El 05/03/2013 19:05, David Smiley (@MITRE.org) dsmi...@mitre.org
escribió:
Hmm; weird. It looks right. Does it work without the sort? -- i.e. does
the
filter work? Are there more
You could also consider using EdgeNGramFilterFactory at index time, which
can index all or some of the prefixes for each term, so that a query of
prod would find product, production, etc.
See:
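A rough fieldType sketch (gram sizes and names are assumptions; note the query-side analyzer deliberately omits the n-gram filter):

```xml
<!-- Hypothetical sketch: index-time edge n-grams so a query of "prod"
     matches "product", "production", etc. without wildcards. -->
<fieldType name="text_prefix" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "pr", "pro", "prod", ... for each term -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```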
I upgraded one of my slaves by replacing the solr.war; all other slaves and
the master were still on 4.0. When I started to monitor it with SPM I noticed
that the request rate was way up while the request count was way down. I've
since put back the solr.war for 4.0 and the slave has returned to normal.
1) Which version of Solr are you using?
2) What is the fieldType for geolocation?
3) Can you try changing your query to q={!func}geodist() to verify that the
function works at all?
: Date: Tue, 5 Mar 2013 19:30:33 +0100
: From: Luis Cappa Banda luisca...@gmail.com
: Reply-To:
: I get that it would fold an accented character into the non-accented
: character, that's a prime reason why I use it ... but it's taking the accent
: as a standalone character (like ` and ¨) and just getting rid of it entirely.
: That seems a little odd.
Isn't that part of the point though? to
Hello,
I'm trying to set up Solr with a multi core configuration but I'm running
into troubles starting using start.jar.
Specifically, running java -jar start.jar inside of the example directory
works fine. However, I've created a new directory some place else with the
following:
solr.xml
On 3/5/2013 2:17 PM, JW West wrote:
Hello,
I'm trying to set up Solr with a multi core configuration but I'm running
into troubles starting using start.jar.
Specifically, running java -jar start.jar inside of the example directory
works fine. However, I've created a new directory some place
On Mar 5, 2013, at 3:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: I get that it would fold an accented character into the non-accented
: character, that's a prime reason why I use it ... but it's taking the accent
: as a standalone character (like ` and ¨) and just getting rid of it
Hi,
Check out core/conf/solrcore.properties
http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution
You could put your core name in there. But I seem to recall that the variable
${core} may be pre-filled for you, or perhaps that's just in the scope of
solr.xml?
Or perhaps you
Solr 3x had a master/slave architecture which meant that indexing did not
happen in the same process as querying, in fact normally not even on the
same machine. The querier only needed to copy down snapshots of the new
index files and commit them. Great isolation for maximum query performance
On Mar 5, 2013, at 3:44 PM, Mike Schultz mike.schu...@gmail.com wrote:
Solr 3x had a master/slave architecture which meant that indexing did not
happen in the same process as querying, in fact normally not even on the
same machine. The querier only needed to copy down snapshots of the new
4.1 turns on stored field compression by default, perhaps what's happening
here is that you're seeing the spike when you fetch your very large
document and it gets uncompressed? Just a shot in the dark.
But you could test it by turning off compression...
That said, I shouldn't think that
1) That's pretty much it. But the number of segments will change
all the time based on your merge policy. Great visualization
of segment merging from Mike McCandless here:
http://www.youtube.com/watch?v=YOklKW9LJNY
2) What do you mean by a failure with the tlog? Unfortunately
there's
What's f.{!ex.? Do you mean
fq={ex...?
Best
Erick
On Tue, Mar 5, 2013 at 9:15 AM, Giorgi Jvaridze
giorgi.jvari...@gmail.comwrote:
Hi all,
I want to use date range facet and I want to allow user to select several
facet values.
So I added date range facet with 'ex' LocalParam
: <dynamicField name="stamp_*" type="string" indexed="false"
:     stored="false" multiValued="true"/>
Take a look at IgnoreFieldUpdateProcessorFactory...
https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html
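A sketch of how that could be wired up in solrconfig.xml (the chain name and regex are assumptions; select the chain on updates via the update.chain parameter):

```xml
<!-- Hypothetical sketch: an update chain that silently drops the
     unwanted stamp_* fields before documents are indexed. -->
<updateRequestProcessorChain name="ignore-stamps">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldRegex">stamp_.*</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```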
Thanks Hoss. Is this available in 4.0?
On Tue, Mar 5, 2013 at 5:14 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:
: <dynamicField name="stamp_*" type="string" indexed="false"
:     stored="false" multiValued="true"/>
Take a look at IgnoreFieldUpdateProcessorFactory...
Hi,
In earlier Lucene versions, segments are merged periodically
according to the merge policy; when a merge kicks in, indexing
requests may take longer to finish (in my tests the delay was
10-30 seconds, depending on indexed data size).
I read the Solr 3.6 - 4.1 docs and we have entries in
Mark - just added https://issues.apache.org/jira/browse/SOLR-4532 for
whipping post-analyzed fields.
For Mike, it sounds like he should just stick to master-slave with 4.x for
now, although I see what he is saying - what Mike is after could be
thought of as SolrCloud with very non-RT replication
Hello Richard,
Did you see anything in the logs?
What did other metrics look like? I'd look at system metrics like disk IO
and network IO, CPU, and also JVM/GC metrics first. Any sudden changes in
those metrics could point you in the right direction.
Otis
--
Solr ElasticSearch Support
Hello,
This is not recommended because people typically don't want the load from
indexing to affect queries/user experience. If your numbers are low, then
this may not be a big deal. If you already need to create a core on 2
machines, creating it on 3 doesn't seem a big deal. There is a slight
I am getting the following error when I try to use a stats.facet for a date
field. Anyone know how to fix this?
<str name="msg">Invalid Date String:' #1;#0;#0;#0;#5;oT$#0;'</str>
I have checked the values of the date and they are all fine.
I am not sure where this is coming from.
The really
Thanks Otis. Yes, true, but considering that the indexing is via a queue,
there would actually be minimal load on the machine. And we are planning to
replicate this setup by adding more machines when the server reaches
about 80% capacity for adding more cores.
Regards,
Sujatha
On Wed, Mar 6,
Hi,
I am new to PayloadTermQuery usage and found it working for simple matches
from the example given @ Search Hub.
As with Lucene-4.1, I couldn't find any API to support Fuzzy Query inside
PayloadTermQuery.
Can you help me in understanding why there is restriction on Term
specification rather
I just upgraded from solr3 to solr4, and I wiped the previous work and
reloaded 500,000 documents.
I see in solr that I loaded the documents, and from the console, if I do a
query *:* I see documents returned.
I copied a single word from the text of the query results I got from *:*
but any query
You may simply need to set the default value of the df parameter in the
/select request handler in solrconfig.xml to be your default query field
name if it is not text.
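Something along these lines in solrconfig.xml (the field name "text" is an assumption; use whatever your catch-all field is called):

```xml
<!-- Sketch: set the default query field via the df parameter in the
     /select handler's defaults. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="df">text</str>
  </lst>
</requestHandler>
```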
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Wednesday, March 06, 2013 1:26 AM
To:
You should probably be looking at which Analyzer you used in solr version
3.x and which one you are using in solr version 4.x.
If there is any change in that you may have to do either of the following:
- Do a full-import so that documents are created according to your new
schema
- Do a
Good thought, thanks for the quick reply too.
Seems that this is still set to my unique ID field:
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst
Hi Chris, thank you for replying. My content field in the schema is
stored=true and indexed=false because I am copying the content field
into the text field, which is indexed=true by default.
My situation is that I am able to search in the HTML documents I had
fed to Solr, but as the results
All but the unique ID field use the out-of-the-box default text_en_splitting
field type, this copied over from v3 to v4 without change as far as I know.
I've done the import from scratch (deleted the solr data directory and
re-imported and committed).
fieldType name=text_en_splitting
Oops, I didn't include the full XML there, hopefully this formats ok.
<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms
Ah, I think I see the issue: in the debug results it's only searching the id
field, which is the unique ID; that must have gotten changed in the upgrade. In
fact I think I might have had a misconfiguration in the 3.x version here. Can I
set it to query multiple fields by default? I tried a
Hello,
When I set term vectors on, it returns information for all the words in the
document.
I also have highlighting on.
I want to get term vector information only for the words that appear in the
highlighting fragment.
How do I do that?
Thanks Chris,
I've used this configuration for my timestamp field and it works:
<field name="timestamp" type="date" indexed="true" stored="true"
    default="NOW+1HOUR" multiValued="false"/>
Anyway, I would like to know about possible configurations of the TZ parameter.
When you say clients can specify a TZ param,