Re: string cut-off filter?

2011-08-09 Thread Bernd Fehling
Yes indeed, I currently use a workaround with a regex filter. Example for limiting to 30 characters: <filter class="solr.PatternReplaceFilterFactory" pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/> Just thought there might already be a filter. But as Karsten showed it is pretty easy to
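
A minimal Python sketch of the cut-off idea, for readers who want to try the regex outside Solr. The function names `cut_off` and `quoted_pattern` are hypothetical, and the sketch assumes Python's `re` backtracks like Java's regex engine for these simple patterns. It also shows that the pattern as quoted in this snippet only matches values of 32 or more characters and can keep fewer than 30 of them, so an anchored pattern may be closer to the intent.

```python
import re

LIMIT = 30  # cut-off length mentioned in the message


def cut_off(value: str, limit: int = LIMIT) -> str:
    """Keep at most `limit` leading characters of `value`."""
    # Anchored pattern: capture up to `limit` characters, discard the rest.
    return re.sub(r"^(.{0,%d}).*$" % limit, r"\1", value, flags=re.DOTALL)


def quoted_pattern(value: str) -> str:
    """Apply the pattern as quoted in the snippet, for comparison only."""
    return re.sub(r"(.{1,30})(.{31,})", r"\1", value)


if __name__ == "__main__":
    sample = "x" * 40
    print(len(cut_off(sample)))         # length after the anchored cut-off
    print(len(quoted_pattern(sample)))  # length after the quoted pattern
```

With a 40-character input the anchored pattern keeps 30 characters, while the quoted pattern has to shrink its first group so that the second group can still find 31 characters.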

Re: PositionIncrement gap and multi-valued fields.

2011-08-09 Thread Marco Martinez
Hi Luis, As far as I know, the position increment gap only affects some queries, like phrase queries if you use slop. The position increment gap does not affect the similarity scoring formula of Lucene: score(q,d) =
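
To make the point about phrase queries and slop concrete, here is a toy Python model of token positions in a multi-valued field. It is a simplified sketch, not Lucene code: `positions` and `in_order_phrase_matches` are hypothetical helpers, tokens are split on whitespace, and only in-order two-term phrases are considered.

```python
def positions(values, gap):
    """Assign token positions across the values of a multi-valued field."""
    result = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance inserted between consecutive values
        for token in value.split():
            pos += 1
            result.append((token, pos))
    return result


def in_order_phrase_matches(tokens, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    firsts = [p for t, p in tokens if t == first]
    seconds = [p for t, p in tokens if t == second]
    return any(0 < s - f <= slop + 1 for f in firsts for s in seconds)


if __name__ == "__main__":
    values = ["red apple", "green pear"]
    for gap in (0, 100):
        tokens = positions(values, gap)
        print(gap, tokens, in_order_phrase_matches(tokens, "apple", "green", 0))
```

In this model a gap of 0 lets the phrase "apple green" match across the two values, while a gap of 100 prevents the match unless the slop is at least 100.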

Saravanan Chinnadurai/Actionimages is out of the office.

2011-08-09 Thread Saravanan . Chinnadurai
I will be out of the office starting 09/08/2011 and will not return until 10/08/2011. Please email to itsta...@actionimages.com for any urgent issues. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data

Possible bug in FastVectorHighlighter

2011-08-09 Thread Massimo Schiavon
In my Solr (3.3) configuration I specified these two params: <str name="hl.simple.pre"><![CDATA[<b>]]></str> <str name="hl.simple.post"><![CDATA[</b>]]></str> When I do a simple search I obtain correctly highlighted results where matches are enclosed with the correct tag. If I do the same request with

Re: ServerSolrException: No such core: collection1

2011-08-09 Thread Shinichiro Abe
Sorry. The required jar files were incomplete. Regards, Shinichiro Abe On 2011/08/08, at 14:31, Shinichiro Abe wrote: Hi. I use EmbeddedSolrServer. The SolrJ indexing code (attached) worked well on Solr 1.4 but didn't work on Solr 3.3 (since 3.1). Do I need to do anything else? Exception:

Re: Possible bug in FastVectorHighlighter

2011-08-09 Thread Jayendra Patil
Try using - <str name="hl.tag.pre"><![CDATA[<b>]]></str> <str name="hl.tag.post"><![CDATA[</b>]]></str> Regards, Jayendra On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon mschia...@volunia.com wrote: In my Solr (3.3) configuration I specified these two params: <str name="hl.simple.pre"><![CDATA[<b>]]></str>

Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

2011-08-09 Thread Christian Bordis
Hi! After 1.5 days of digging through Google, the Solr wiki, the Solr 1.4 book (Smiley/Pugh), and the solr-user mailing list, no solution has turned up for my problem *sigh*. I use: - Solr 3.3 - Data Import Handler 3.3 - JDBC source is MySQL Constraints: - No changes to core database schema - I can only add new views, stored

question about query parsing

2011-08-09 Thread Bernd Fehling
Hi list, while searching with debug on I see strange query parsing: <str name="rawquerystring">identifier:ub.uni-bielefeld.de</str> <str name="querystring">identifier:ub.uni-bielefeld.de</str> <str name="parsedquery">+MultiPhraseQuery(identifier:(ub.uni-bielefeld.de ub) uni bielefeld de)</str> <str

Trying to index pdf docs - lazy loading error - ClassNotFoundException: solr.extraction.ExtractingRequestHandler

2011-08-09 Thread Rode González
Hi all. I've tried to index PDF documents using the libraries included in the example distribution of Solr 3.3.0. I've copied all the jars included in the /dist and /contrib directories into a common /lib directory and I've added this path to the solrconfig.xml file. The request handler

Re: strip html from data

2011-08-09 Thread Erick Erickson
OK, what does "not working" mean? You never answered Markus' question: Are you looking at the returned result set or what you've actually indexed? Analyzers are not run on the stored data, only on indexed data. If "not working" means that your returned results contain the markup, then you're

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread Erick Erickson
The most common way to handle this is to just index to language-specific fields, e.g. text_ex, text_en, text_de. Since you know what language the user is searching in, you can route the queries to the correct set of fields. That said, this is an interesting approach. You don't necessarily need
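
The routing idea can be sketched on the client side in a few lines of Python. This is only an illustration under stated assumptions: the field names `text_en` and `text_de` come from the message, while `DEFAULT_FIELD`, `field_for`, and `build_query` are hypothetical, and no query escaping is attempted.

```python
# Field names taken from the message; the mapping keys are assumed language codes.
FIELD_BY_LANGUAGE = {"en": "text_en", "de": "text_de"}
DEFAULT_FIELD = "text"  # hypothetical fallback field, not from the thread


def field_for(language):
    """Pick the language-specific field for a user's language."""
    return FIELD_BY_LANGUAGE.get(language, DEFAULT_FIELD)


def build_query(language, user_terms):
    """Build a fielded query string; special characters are NOT escaped here."""
    field = field_for(language)
    return " ".join(f"{field}:{term}" for term in user_terms.split())


if __name__ == "__main__":
    print(build_query("de", "rote schuhe"))
```

Each language-specific field can then carry its own analysis chain in the schema, so the stemmer choice is made once per field rather than per token.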

RE: Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

2011-08-09 Thread Dyer, James
Christian, It looks like you should probably write a Transformer for your DIH script. I assume you have a child entity set up for PriceTable. Add a Transformer to this entity that will look at the value of currency and price, remove these from the row, then add them back in with currency as

Re: Strip special chars like -

2011-08-09 Thread roySolr
Yes, I understand the difference between generateWordParts and catenateWords. But I can't fix my problem with these options; they don't cover all the possibilities. -- View this message in context: http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3239186.html Sent from the

Re: Strip special chars like -

2011-08-09 Thread Erick Erickson
OK, what are the other possibilities that it doesn't fix? Just saying it won't work without some examples doesn't leave much to go on... Best Erick On Tue, Aug 9, 2011 at 10:41 AM, roySolr royrutten1...@gmail.com wrote: Yes, i understand the difference between generateWordParts and

Re: problem with terms component results ?

2011-08-09 Thread Erick Erickson
The TermsComponent is looking at *indexed* terms that have been passed through the analysis chain. So I suspect you're seeing the results of stemming. WordDelimiterFilterFactory will also break things up, as will other tokenizers/analyzers. If you want your original input you'll need to have a

Re: How to retrieve data from mysql table using DataImportHandler?

2011-08-09 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists Have you looked at: http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS Best Erick On Tue, Aug 9, 2011 at 7:28 AM, nagarjuna nagarjuna.avul...@gmail.com wrote: Hi everybody ...       pls help me to get the data from mysql

Re: problem with terms component results ?

2011-08-09 Thread Erik Hatcher
Because you've got a stemmer in your analysis chain for those fields. If you want unstemmed terms, remove the stemmer, or copyField to a different field to use for the terms component. Erik On Aug 9, 2011, at 10:20 , Royi Ronen wrote: Hi, I am using the terms component. Many times

Re: Strip special chars like -

2011-08-09 Thread roySolr
Ok, there are three query possibilities: Manchester-united, Manchester united, Manchesterunited. The original name of the club is manchester-united. generateWordParts will fix two of these possibilities: Manchester-united = manchester,united I can search for Manchester-united and

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread cnyee
I believe that the FilterFactory is not designed to be called for each instance of field processing. Think about it: that would be terribly inefficient. The instantiated stemmer is meant to be reused as much as possible. Maybe the FilterFactory is called to instantiate a new stemmer in association

Remote backup of Solr index over low-bandwidth connection

2011-08-09 Thread Peter Kritikos
Hello, everyone, My company will be using Solr on the server appliance we deliver to our clients. We would like to maintain remote backups of clients' search indexes to avoid rebuilding a large index when an appliance fails. One of our clients backs up their data onto a remote server

Re: Strip special chars like -

2011-08-09 Thread lee carroll
Hi, I might be wrong as I've not tried it out to be sure, but from the wiki docs: These parameters may be combined in any way. Example of generateWordParts=1 and catenateWords=1: PowerShot -> 0:Power, 1:Shot 1:PowerShot (where 0,1,1 are token positions). Does that fit the bill? On 9 August 2011
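
The wiki example can be reproduced with a toy Python model of the two options. This is a rough sketch and not the real WordDelimiterFilterFactory: `word_delimiter` is a hypothetical function that only splits on hyphens and case changes, and it places the catenated token at the position of the last part, as in the example above.

```python
import re


def word_delimiter(token, generate_word_parts=True, catenate_words=True):
    """Return (position, text) pairs for one input token (toy model)."""
    # Word parts: an optional capital followed by lowercase letters,
    # a run of capitals, or a run of digits. Hyphens are dropped.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)
    out = []
    if generate_word_parts:
        out.extend(enumerate(parts))
    if catenate_words and len(parts) > 1:
        # The joined form shares the position of the last part.
        out.append((len(parts) - 1, "".join(parts)))
    return out


if __name__ == "__main__":
    print(word_delimiter("PowerShot"))
    print(word_delimiter("Manchester-united"))
```

Under this model "Manchester-united" yields the parts Manchester and united plus the joined form Manchesterunited, which is the combination the thread is asking about.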

Re: Remote backup of Solr index over low-bandwidth connection

2011-08-09 Thread Jonathan Rochkind
You can use rsync to automatically only transfer the files that have changed. I don't think you'll have to home grow your own 'only transfer the diffs' solution, I think rsync will do that for you. But yes, running an optimization, after many updates/deletes, will generally mean nearly

Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-08-09 Thread Chris Hostetter
: during indexing). However, due to the pre-analysis whitespace tokenization : done by lucene query parser, the reverse is not handled well - document with : string 'thunderbolt' being matched to query 'thunder bolt'. it's not so much pre-analysis whitespace tokenization as it is query parser

Solr replication oddities

2011-08-09 Thread Dan Pinkard
We've seen a few problems lately, and I'm hoping someone can offer insight on resolving them. We are currently on 1151296 on machines that are definitely not overloaded on mem/CPU/IO/network. 1) When moving to build 1151296 from 1150478 the index format changed, or some other marker

Michigan Information Retrieval Enthusiasts Group Quarterly Meetup - August 17th 2011 - Solr in the Cloud, Erick Erickson

2011-08-09 Thread Provalov, Ivan
Next IR Meetup will be held at Farmington Hills Community Library on August 17, 2011. Please RSVP here: http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group Thank you, Ivan Provalov

Re: Weighted facet strings

2011-08-09 Thread Chris Hostetter
: Subject: Weighted facet strings First off: a terminology clarification. what you are describing has very little to do with facets. it's true that your category field is a facet of your documents, but in the context of your question, you aren't asking about any facet related features of solr.

Re: Multiple Cores on different machines?

2011-08-09 Thread Chris Hostetter
: A quick question - is it possible to have 2 cores in Solr on two different : machines? your question is a little vague ... like asking "is it possible to have two betamax VCRs in two different rooms of my house" ... sure, if you want ... but why are you asking the question? are you

Re: Query Rewrite

2011-08-09 Thread Chris Hostetter
: then in the CustomQueryParser I iterate over all the arguments adding : each key/value to a Map. I then pass in this to the constructor of a : basically copied ExtendedDismaxQParser (only difference is the added : aliases and the logic to add those to the ExtendedSolrQParser). : : Now, the

Re: extending edismax?

2011-08-09 Thread Chris Hostetter
: E.g. I want to pass the query "red shoes" as q=shoes&fq=color:red. I have : a service that can tell me that in the phrase "red shoes" the word "red" is : the color. : : My question is where should I invoke this external service, : : 1) should my search client call the service, form the request and

Re: Solr and External Fields

2011-08-09 Thread Chris Hostetter
: I recently modified the DefaultSolrHighlighter to support external : fields, but is there a way to do this for solr itself? I'm looking to : store a field in an external store and give Solr access to that field. : Where in Solr would I do this? it depends on when/how you want to use that

Re: solr chewing up system swap

2011-08-09 Thread Chris Hostetter
: I have arrived at a site where solr is being run under jetty. It is ubuntu 10.04 : i386 hosted on AWS (xen). Our combined solr index size is a mere 21 MB. What : I am seeing is that solr is steadily consuming about 150 MB of swap per week : and won't relinquish it until sunspot is restarted. how

edismax, inconsistencies with implicit/explicit AND when used with explicit OR

2011-08-09 Thread Mark juszczec
Hello all, We've just switched from the default parser to the edismax parser and a user has noticed some inconsistencies when using implicit/explicit ANDs, ORs and grouping search terms in parentheses. First, the default query operator is AND. I switched it from OR today. The query:

Cache replication

2011-08-09 Thread arian487
I'm wondering if the caches on all the slaves are replicated across (such as queryResultCache). That is to say, if I hit one of my slaves and cache a result, and I make a search later and that search happens to hit a different slave, will that first cached result be available for use? This is

Re: Strip special chars like -

2011-08-09 Thread Sujit Pal
I have done this using a custom TokenFilter that (among other things) detects hyphenated words and converts them to the 3 variations, using a regex match on the incoming token: (\w+)-(\w+) that runs the following regex transform: s/(\w+)-(\w+)/$1$2__$1 $2/ and then splits by __ and passes the
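
The regex transform described above can be tried out in Python before writing a Java TokenFilter. This is a sketch under assumptions: `hyphen_variants` is a hypothetical name, the whole token is required to match the pattern, and the original hyphenated token is kept as the third variation, which the truncated message does not confirm.

```python
import re

HYPHENATED = re.compile(r"(\w+)-(\w+)")  # pattern quoted in the message


def hyphen_variants(token):
    """Return the variations of a hyphenated token, or the token unchanged."""
    if not HYPHENATED.fullmatch(token):
        return [token]
    # Equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/ followed by a split on "__".
    transformed = HYPHENATED.sub(r"\1\2__\1 \2", token)
    # Assumption: the original token is emitted alongside the two new forms.
    return [token] + transformed.split("__")


if __name__ == "__main__":
    print(hyphen_variants("manchester-united"))
```

Note that `\w` also matches underscores, so a token that itself contains `__` would need a different separator.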

Re: Cache replication

2011-08-09 Thread Erick Erickson
No, caches are not replicated across slaves. You really have two choices: 1) use some sort of sticky addressing whereby requests from the same client are sent to the same slave. 2) don't worry about it <G>. Examine your cache stats to see how often your caches, particularly your
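
One simple way to get sticky addressing is to hash a stable client identifier onto the list of slaves. The Python sketch below is illustrative only: `pick_slave` and the host names are hypothetical, and a real deployment would more likely configure this in the load balancer than in application code.

```python
import hashlib


def pick_slave(client_id, slaves):
    """Map a client identifier to the same slave on every call."""
    # A stable hash (unlike Python's built-in hash(), which is salted per process).
    digest = hashlib.md5(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(slaves)
    return slaves[index]


if __name__ == "__main__":
    slaves = ["slave1:8983", "slave2:8983", "slave3:8983"]  # hypothetical hosts
    print(pick_slave("user-42", slaves))
    print(pick_slave("user-42", slaves))  # same slave as the line above
```

The mapping changes for most clients whenever the slave list changes, so caches would go cold after adding or removing a slave; consistent hashing is a common refinement if that matters.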

Re: Strip special chars like -

2011-08-09 Thread Erick Erickson
That's not what I get. This is for Solr 3.3, but there's no reason that I know of that other versions should give different results. Here's the field def from the 3.3 example, this is just the standard implementation. <fieldType name="text_en_splitting" class="solr.TextField"

unique terms and multi-valued fields

2011-08-09 Thread Kevin Osborn
Please verify my understanding. I have a field called category and it has a value computers. If I use this same field and value for all of my documents, it is really only stored on disk once because category:computers is a unique term. Is this correct? But, what about multi-valued fields. So,

Re: Cache replication

2011-08-09 Thread arian487
Thanks for the informative response. I'll consider using the 'sticky' addressing as you suggested. The reason cache is so important for me is because I'm actually doing more processing after the query component to come up with my query result and I want to avoid that processing as much as

Re: Multiple Cores on different machines?

2011-08-09 Thread Satish Talim
Chris, sorry for not being clear when I asked the question. We are still experimenting with Solr. We have 2 tables in Postgres that we want to migrate to Solr for faster query results. One index is of static data and the other related index would be of data that changes once or twice a month.

Re: Multiple Cores on different machines?

2011-08-09 Thread Shashi Kant
Betamax VCR? Really? :-) On Tue, Aug 9, 2011 at 3:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : A quick question - is it possible to have 2 cores in Solr on two different : machines? your question is a little vague ... like asking is it possible to have two betamax

RE: Multiple Cores on different machines?

2011-08-09 Thread Jonathan Rochkind
tables. Others are suggesting 2 separate indexes on 2 different machines and using SOLRs capacity to combine cores and generate a third index that denormalizes the tables for us. What capability is that, exactly? I think you may be imagining it. Solr does have some capability to distribute

Re: Cache replication

2011-08-09 Thread Paul Libbrecht
Arian, I've been doing results post-processing in some versions of the ActiveMath server and it has been the wrong choice as much as possible. Maybe this is not what you do, but the biggest flaw was that the post-processing was eliminating or adding results (for insiders of ActiveMath: