Yes indeed I currently use a workaround with regex filter.
Example for limiting to 30 characters:
<filter class="solr.PatternReplaceFilterFactory" pattern="(.{1,30})(.{31,})"
        replacement="$1" replace="all"/>
Just thought there might be already a filter.
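For context, a minimal sketch of where such a filter could sit in a schema.xml fieldType (the type name and tokenizer here are assumptions, not from the original message):

```xml
<fieldType name="text_truncated" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Keep the first 30 characters of the token, drop the rest -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>
```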
But as Karsten showed it is pretty easy to
Hi Luis,
As far as I know, the position increment gap only affects some queries,
such as phrase queries when you use slop. The position increment gap does
not affect Lucene's similarity scoring formula:
score(q,d) =
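For reference, the classic Lucene practical scoring function of that era (per the Similarity documentation) is roughly:

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^2\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)
```

Note that norm(t,d) encodes field length and index-time boosts, but the position increment gap does not appear anywhere in it.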
In my Solr (3.3) configuration I specified these two params:
<str name="hl.simple.pre"><![CDATA[<b>]]></str>
<str name="hl.simple.post"><![CDATA[</b>]]></str>
when I do a simple search I correctly obtain highlighted results where
matches are enclosed in the right tag.
If I do the same request with
Sorry. The jar files I had included were insufficient.
Regards,
Shinichiro Abe
On 2011/08/08, at 14:31, Shinichiro Abe wrote:
Hi.
I use EmbeddedSolrServer. The SolrJ indexing code (attached) worked well
on Solr 1.4 but didn't work on Solr 3.3 (since 3.1). Do I need to do anything
else?
Exception:
Try using -
<str name="hl.tag.pre"><![CDATA[<b>]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
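As a hedged illustration (core URL and field name are assumptions), the same parameters can also be passed per-request:

```
http://localhost:8983/solr/select?q=text:solr&hl=true&hl.fl=text&hl.tag.pre=%3Cb%3E&hl.tag.post=%3C%2Fb%3E
```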
Regards,
Jayendra
On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon mschia...@volunia.com wrote:
In my Solr (3.3) configuration I specified these two params:
<str name="hl.simple.pre"><![CDATA[<b>]]></str>
Hi!
After 1.5 days of digging through Google, the Solr wiki, the Solr 1.4 book (Smiley/Pugh),
and the solr-user mailing list, no solution has turned up for my problem *sigh*.
I use:
- solr 3.3
- Data Import Handler 3.3
- JDBC source is MySQL
Constraints:
- No changes to core database schema
- I can only add new views, stored
Hi list,
while searching with debug on I see strange query parsing:
<str name="rawquerystring">identifier:ub.uni-bielefeld.de</str>
<str name="querystring">identifier:ub.uni-bielefeld.de</str>
<str name="parsedquery">
+MultiPhraseQuery(identifier:"(ub.uni-bielefeld.de ub) uni bielefeld de")
</str>
str
Hi all.
I've tried to index PDF documents using the libraries included in the
example distribution of Solr 3.3.0.
I've copied all the jars included in the /dist and /contrib directories into a
common /lib directory and added that path to the solrconfig.xml
file.
The request handler
OK, what does "not working" mean? You never answered Markus' question:
Are you looking at the returned result set or what you've actually indexed?
Analyzers are not run on the stored data, only on indexed data.
If not working means that your returned results contain the markup, then
you're
The most common way to handle this is to just index to
language-specific fields, e.g. text_ex, text_en, text_de. Since
you know what language the user is searching in, you can
route the queries to the correct set of fields
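A minimal schema sketch of that setup (field and type names here are assumptions):

```xml
<field name="text_en" type="text_en" indexed="true" stored="true"/>
<field name="text_de" type="text_de" indexed="true" stored="true"/>
<field name="text_fr" type="text_fr" indexed="true" stored="true"/>
```

A German user's query could then be routed with, e.g., `defType=dismax&qf=text_de`.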
That said, this is an interesting approach. You don't
necessarily need
Christian,
It looks like you should probably write a Transformer for your DIH script. I
assume you have a child entity set up for PriceTable. Add a Transformer to
this entity that will look at the value of currency and price, remove these
from the row, then add them back in with currency as
Yes, I understand the difference between generateWordParts and catenateWords,
but I can't fix my problem with these options; they don't cover all the
possibilities.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3239186.html
Sent from the
OK, what are the other possibilities that it doesn't fix? Just saying
it won't work without some examples doesn't leave much to
go on...
Best
Erick
On Tue, Aug 9, 2011 at 10:41 AM, roySolr royrutten1...@gmail.com wrote:
Yes, i understand the difference between generateWordParts and
The TermsComponent is looking at *indexed* terms that have
been passed through the analysis chain. So I suspect you're
seeing the results of stemming.
WordDelimiterFilterFactory will also break things up, as will
other tokenizers/analyzers. If you want your original input
you'll need to have a
Please review:
http://wiki.apache.org/solr/UsingMailingLists
Have you looked at:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
Best
Erick
On Tue, Aug 9, 2011 at 7:28 AM, nagarjuna nagarjuna.avul...@gmail.com wrote:
Hi everybody ...
pls help me to get the data from mysql
Because you've got a stemmer in your analysis chain for those fields. If you
want unstemmed terms, remove the stemmer, or copyField to a different field to
use for the terms component.
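A hedged sketch of the copyField approach (field and type names are hypothetical):

```xml
<!-- A text type without a stemming filter in its analyzer chain -->
<field name="terms_raw" type="text_general" indexed="true" stored="false"/>
<copyField source="text" dest="terms_raw"/>
```

The terms component would then be pointed at the unstemmed field, e.g. `terms.fl=terms_raw`.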
Erik
On Aug 9, 2011, at 10:20 , Royi Ronen wrote:
Hi,
I am using the terms component.
Many times
OK, there are three query possibilities:
Manchester-united
Manchester united
Manchesterunited
The original name of the club is manchester-united.
generateWordParts will fix two of these possibilities:
Manchester-united => manchester, united
I can search for Manchester-united and
I believe the FilterFactory is not designed to be called for each
instance of field processing. Think about it: that would be terribly
inefficient. The instantiated stemmer is meant to be reused as much as
possible. Maybe the FilterFactory is called to instantiate a new stemmer in
association
Hello, everyone,
My company will be using Solr on the server appliance we deliver to our
clients. We would like to maintain remote backups of clients' search
indexes to avoid rebuilding a large index when an appliance fails.
One of our clients backs up their data onto a remote server
Hi, I might be wrong as I've not tried it out to be sure, but from the wiki docs:
These parameters may be combined in any way.
Example of generateWordParts=1 and catenateWords=1:
PowerShot -> 0:Power, 1:Shot 1:PowerShot
(where 0,1,1 are token positions)
does that fit the bill ?
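To make the split-and-catenate example concrete, here is a rough Python sketch of that behavior (a simplification for illustration, not WordDelimiterFilterFactory's actual implementation):

```python
import re

def word_delimiter(token, generate_word_parts=True, catenate_words=True):
    """Rough sketch of generateWordParts/catenateWords behavior."""
    # Split on case transitions, digit boundaries, and non-alphanumerics.
    parts = re.findall(r"[A-Z][a-z]*|[a-z]+|[0-9]+", token)
    out = []
    if generate_word_parts:
        out.extend(parts)
    if catenate_words and len(parts) > 1:
        # The catenated form joins all the word parts back together.
        out.append("".join(parts))
    return out

print(word_delimiter("PowerShot"))          # the parts plus the catenated form
print(word_delimiter("Manchester-united"))
```

Note this ignores token positions; in Solr the catenated token is emitted at the same position as the last part, which is what the 0/1/1 numbers above describe.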
On 9 August 2011
You can use rsync to automatically only transfer the files that have
changed. I don't think you'll have to home grow your own 'only transfer
the diffs' solution, I think rsync will do that for you.
But yes, running an optimization, after many updates/deletes, will
generally mean nearly
: during indexing). However, due to the pre-analysis whitespace tokenization
: done by lucene query parser, the reverse is not handled well - document with
: string 'thunderbolt' being matched to query 'thunder bolt'.
it's not so much pre-analysis whitespace tokenization as it is query
parser
We've seen a few problems lately, and I'm hoping someone can offer insight on
resolving them. We are currently on 1151296 on machines that are definitely not
overloaded on mem/CPU/IO/network.
1) When moving to build 1151296 from 1150478 the index format changed, or
some other marker
Next IR Meetup will be held at Farmington Hills Community Library on August 17,
2011. Please RSVP here:
http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group
Thank you,
Ivan Provalov
: Subject: Weighted facet strings
First off: a terminology clarification. what you are describing has very
little to do with facets. it's true that your category field is a
facet of your documents, but in the context of your question, you aren't
asking about any facet related features of solr.
: A quick question - is it possible to have 2 cores in Solr on two different
: machines?
your question is a little vague ... like asking is it possible to have
two betamax VCRs in two different rooms of my house ... sure, if you
want ... but why are you asking the question? are you
: then in the CustomQueryParser I iterate over all the arguments adding
: each key/value to a Map. I then pass in this to the constructor of a
: basically copied ExtendedDismaxQParser (only difference is the added
: aliases and the logic to add those to the ExtendedSolrQParser).
:
: Now, the
: E.g. I want to pass the query red shoes as q=shoesfq=color:red. I have
: a service that can tell me that in the phrase red shoes the word red is
: the color.
:
: My question is where should I invoke this external service,
:
: 1) should my search client call the service, form the request and
: I recently modified the DefaultSolrHighlighter to support external
: fields, but is there a way to do this for solr itself? I'm looking to
: store a field in an external store and give Solr access to that field.
: Where in Solr would I do this?
it depends on when/how you want to use that
: I have arrived a site where solr is being run under jetty. It is ubuntu 10.04
: i386 hosted on AWS (xen). Our combined solr index size is a mere 21 MB. What
: I am seeing is that solr is steadily consuming about 150 MB of swap per week
: and won't relinquish it until sunspot is restarted.
how
Hello all
We've just switched from the default parser to the edismax parser and a user
has noticed some inconsistencies when using implicit/explicit ANDs, ORs and
grouping search terms
in parenthesis.
First, the default query operator is AND. I switched it from OR today.
The query:
I'm wondering if the caches on all the slaves are replicated across (such as
queryResultCache). That is to say, if I hit one of my slaves and cache a
result, and I make a search later and that search happens to hit a different
slave, will that first cached result be available for use?
This is
I have done this using a custom tokenfilter that (among other things)
detects hyphenated words and converts them to the 3 variations, using a
regex match on the incoming token:
(\w+)-(\w+)
that runs the following regex transform:
s/(\w+)-(\w+)/$1$2__$1 $2/
and then splits by __ and passes the
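That transform can be sketched in standalone Python (an illustration of the logic, not the actual token filter):

```python
import re

def expand_hyphenated(token):
    """Return the 3 variations of a hyphenated token:
    the catenated form plus the two split parts."""
    m = re.fullmatch(r"(\w+)-(\w+)", token)
    if not m:
        return [token]
    a, b = m.groups()
    # Apply s/(\w+)-(\w+)/$1$2__$1 $2/ ...
    transformed = f"{a}{b}__{a} {b}"
    # ... then split by "__" and by whitespace.
    variants = []
    for chunk in transformed.split("__"):
        variants.extend(chunk.split())
    return variants

print(expand_hyphenated("manchester-united"))
```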
No, caches are not replicated across slaves. You really have
two choices:
1 use some sort of sticky addressing whereby requests
from the same client are sent to the same slave.
2 don't worry about it <G>. Examine your cache stats
to see how often your caches, particularly your
That's not what I get. This is for Solr 3.3, but there's no
reason that I know of that other versions should give
different results.
Here's the field def from the 3.3 example; this is just
the standard implementation.
<fieldType name="text_en_splitting" class="solr.TextField"
Please verify my understanding. I have a field called category and it has a
value computers. If I use this same field and value for all of my documents,
it is really only stored on disk once because category:computers is a unique
term. Is this correct?
But what about multi-valued fields? So,
Thanks for the informative response. I'll consider using the 'sticky'
addressing as you suggested. The reason cache is so important for me is
because I'm actually doing more processing after the query component to come
up with my query result and I want to avoid that processing as much as
Chris, sorry for not being clear when I asked the question.
We are still experimenting with Solr. We have 2 tables in Postgres that we
want to migrate to Solr for faster query results. One index is of static
data and the other related index would be of data that changes once or twice
a month.
Betamax VCR? Really? :-)
On Tue, Aug 9, 2011 at 3:38 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:
: A quick question - is it possible to have 2 cores in Solr on two
different
: machines?
your question is a little vague ... like asking is it possible to have
have two betamax
tables. Others are suggesting 2 separate indexes on 2 different machines and
using Solr's capability to combine cores and generate a third index that
denormalizes the tables for us.
What capability is that, exactly? I think you may be imagining it.
Solr does have some capability to distribute
Arian,
I've been doing results post-processing in some versions of the ActiveMath
server, and it has mostly been the wrong choice.
Maybe this is not what you do, but the biggest flaw was that the
post-processing was eliminating or adding results (for insiders of ActiveMath: