2010-10-07 Thread Scott Yeadon
Thanks. Not sure what the value should be (assume it is the servlet name, but is there a default servlet name for term vectors? - the docs don't really say much, so any guidance useful). It also looks like using the ModifiableParams returns only a single offset for each term i.e. if tf 1

Re: Solr UIMA integration

2010-10-07 Thread maheshkumar
Hi Tommaso, Thanks a lot. I am able to index the content and extract the entities as mentioned by you. I have made the xml content like this add doc field name=referenceEntity.xml/field field name=textSenator Dick Durbin (D-IL) Chicago , March 3,2007./field field name=titleEntity

Re: Query slop vs. phrase slop

2010-10-07 Thread Ahmet Arslan
phrase slop is about proximity search. ps (Phrase Slop) affects boosting, if you play with ps value, numFound and result set do not change. But the order of the result set changes. This is about the phrase query that is

Re: Invalid boolean value for query with exclamation

2010-10-07 Thread Ahmet Arslan
Our tests show we cannot query for !Fast and find the document. I've just taken a look and Solr is coming back with Invalid boolean value: !Fast It's a valid query for a valid value. I appreciate that exclamation is a not operator. But how does one get around this scenario? You can

Re: Invalid boolean value for query with exclamation

2010-10-07 Thread Markus Jelsma
On Thu, 7 Oct 2010 06:55:14 -0400, Allistair Crossley wrote: Hi, Quick one ... we have some documents that have punctuation in their indexed title, e.g. !Fast Note the leading exclamation. Our tests show we cannot query for !Fast and find the document. I've just taken a
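
The replies in this thread are cut off in the archive, so the following is only a sketch of one common workaround, not necessarily what was suggested: backslash-escaping the characters that the Lucene query syntax treats as operators before sending a user-supplied term to Solr. The function name escape_query_term is made up for illustration.

```python
import re

# Characters (and the two-character operators && and ||) that the Lucene
# query parser syntax treats as special. Assumption: the standard parser is in use.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')


def escape_query_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r'\\\1', term)


print(escape_query_term("!Fast"))  # prints \!Fast
```

Whether the escaped term then matches depends on how the title field is analyzed at index time; if the analyzer strips punctuation, the leading exclamation mark is not in the index at all.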

Very slow queries

2010-10-07 Thread Christos Constantinou
Hello everyone, All of a sudden, I am experiencing some very slow queries with solr. I have 13GB of indexed documents, each averaging 50-100kb. They have an id key, so I expect to be getting results really fast if I execute id:7cd6cb99fd239c1d743a51bb85a48f790f4a6d3c as the query with no other

Re: Strategy for re-indexing

2010-10-07 Thread Shawn Heisey
On 10/6/2010 10:49 AM, Allistair Crossley wrote: Hi, I was interested in gaining some insight into how you guys schedule updates for your Solr index (I have a single index). Right now during development I have added deltaQuery specifications to data import entities to control the number of

slow inserts (updatecsv) on solr

2010-10-07 Thread Sharma, Raghvendra
Hi, I am running my instance on an old Pentium D (two cores) with 3 GB RAM on Ubuntu 64 bit server. My schema is a mix of various data types from int, float, double and string. I am using uuid as my unique key, and my schema is pretty wide, 232 columns to be exact. The average load speed

Re: Query slop vs. phrase slop

2010-10-07 Thread David Boxenhorn
Thank you very much! Forgive me, but I'm still having some trouble understanding: qs = the maximum number of words apart the match can be ps = ? - something that affects boosting, but how? On Thu, Oct 7, 2010 at 12:57 PM, Ahmet Arslan wrote: phrase slop is about proximity

Re: Query slop vs. phrase slop

2010-10-07 Thread Ahmet Arslan
ps = ? - something that affects boosting, but how? Let's say your query is apache solr. (without quotation marks) Let's say these three documents contain all of these words and are returned. 1-) solr is built on the top of apache lucene. 2-) apache solr is fast, mature and popular. 3-) solr is

access control for spellcheck suggestions?

2010-10-07 Thread Peter Wolanin
We have a content access control system that works well for the actual search results, but we see that the spellcheck suggestions include words that are not within the set of documents the current user is allowed to access. Does anyone have an approach to this problem for Solr 1.4.x? Anything

TikaEntityProcessor and metadata

2010-10-07 Thread Peter Blokland
hi, I'm using Solr to index document both through a combination of DataImportHandler/TikaEntityProcessor and Solr's ExtractingRequestHandler. The latter gives me the option of dynamically mapping metadata to fields using uprefix='attr_' in the configuration. Is it possible to do the same thing

RE: access control for spellcheck suggestions?

2010-10-07 Thread Dyer, James
Look at SOLR-2010 which has patches for 1.4.1 and trunk. It works with the spellcheck collate functionality and ensures that collations are returned only if they can result in hits if requeried (it tests each collation with any fq you put on the original query). This would effectively prevent

Re: Query slop vs. phrase slop

2010-10-07 Thread David Boxenhorn
Got it! Thanks a lot. On Thu, Oct 7, 2010 at 3:00 PM, Ahmet Arslan wrote: ps = ? - something that affects boosting, but how? Let's say your query is apache solr. (without quotation marks) Let's say these three documents contain all of these words and are returned. 1-) solr

Index time boosting is not working with boosting value in document level

2010-10-07 Thread Shanmugavel SRD
We are having 10 documents (all 10 documents with name_s=john) with boosting value 10.0 in doc level out of 100 documents. DataImportHandler is used to index the documents in xml. We gave omitNorms=false in a field called text and having schema.xml configured as below. Default query field is

CoreContainer Usage

2010-10-07 Thread Amit Nithian
I am trying to understand the multicore setup of Solr more and saw that SolrCore.getCore is deprecated in favor of CoreContainer.getCore(name). How can I get a reference to the CoreContainer? I assume it's been created somewhere in Solr. Is it possible for one core to get access to another

Re: Query slop vs. phrase slop

2010-10-07 Thread Jonathan Rochkind
What you said in your own quoted message is correct. qs is slop applied to phrases explicitly in the q with double quotes. ps is slop applied to the phrases created from the entire query for evaluating pf boosts. qs will (potentially) change your result set. ps will only (potentially) change
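
To make the distinction concrete, here is a minimal sketch of a dismax request that sets both parameters. The field name text and the helper name dismax_params are assumptions for illustration; defType, q, qf, pf, qs and ps are the dismax request parameters discussed in this thread.

```python
from urllib.parse import urlencode


def dismax_params(user_query: str, qs: int, ps: int) -> str:
    """Build a dismax query string with both kinds of slop set."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": "text",  # hypothetical field searched for matches
        "pf": "text",  # hypothetical field used for the phrase boost
        "qs": qs,      # slop for phrases the user quoted explicitly in q
        "ps": ps,      # slop for the implicit whole-query phrase used by pf
    }
    return urlencode(params)


print(dismax_params("apache solr", qs=1, ps=2))
```

Changing qs here can add or remove documents from the result set when q contains quoted phrases; changing ps only affects how much of the pf boost each matching document receives.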

SOLR 1.4.1 to SOLR 1.3 issues (backward compatibility issue)

2010-10-07 Thread Shanmugavel SRD
When I tried to roll back from SOLR 1.4.1 to SOLR 1.3, I am unable to view the admin page; the rest of the functionality is fine. I removed the index folder before reverting back to SOLR 1.3. Does anyone have an answer for this?

subquery with stopwords

2010-10-07 Thread Rodrigo Rezende
I'm not sure but it seems to me that subqueries query(.) [ ] with only stopwords are evaluated for all documents. Example: q={!func}myFunction(query(field:the))fq=field:(helloworld) Since the is a stopword for field field, query(field:the) will

Re: Very slow queries

2010-10-07 Thread Amit Nithian
Try stopping replication and see if your query performance may improve. I think the caches get reset each time replication occurs. You can look at the cache performance using the admin console.. try and see if any of the caches are constantly being missed.. this could be due to your

Re: Memory usage

2010-10-07 Thread Jeff Moss
Taking Chris's information into mind I was able to isolate this to a test case. I found this ticket that seems to indicate a fundamental problem in the solr/lucene boundary. Here's how to reproduce my results: 1. Create an index with a field like

Re: having problem about Solr Date Field.

2010-10-07 Thread Lance Norskog
Solr stores dates in UTC. There is no timezone conversion or other date-format processing in Solr. The admin screen only shows in UTC. -- I want to get local time(JST) on Solr Admin. On Wed, Oct 6, 2010 at 9:14 AM, Gora Mohanty wrote: On Wed, Oct 6, 2010 at 9:17 PM, Kouta
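
Since Solr itself does no timezone conversion, one option is to convert on the client side when displaying results. A minimal sketch, assuming the date comes back in Solr's usual UTC form without fractional seconds (for example 2010-10-07T20:30:00Z); the function name solr_utc_to_jst is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset with no daylight saving.
JST = timezone(timedelta(hours=9), "JST")


def solr_utc_to_jst(value: str) -> datetime:
    """Parse a Solr UTC date string and return it as a JST datetime."""
    utc = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(JST)


print(solr_utc_to_jst("2010-10-07T20:30:00Z").isoformat())  # prints 2010-10-08T05:30:00+09:00
```

This changes only what the application shows; the admin screen will still display UTC.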

How do I get the solr error response as XML instead of HTML

2010-10-07 Thread Scott K
solr errors come back as HTML instead of XML or JSON. Is it possible to get the response to come back as XML or JSON, or at least something I could show to an end user? Is there a way to tell solr to ignore unparseable terms and still return a result, ideally with a warning so the end user doesn't

Re: script transformer vs. custom transformer

2010-10-07 Thread Lance Norskog
The javascript will be slower. There is no bytecode compilation. On Wed, Oct 6, 2010 at 1:36 PM, Tim Heckman wrote: This might be a dumb question, but should I expect that a custom transformer written in java will perform better than a javascript script transformer? Or

CollapseComponent with MLT component

2010-10-07 Thread Amit Nithian
Few questions about the CollapseComponent: 1) From what I can tell in either SOLR-236, SOLR-1682, this component extends the QueryComponent which allows one to dedup when doing a normal search. Is there a concern about performance in the worst case that you have a bunch of docs with the same value

Re: access control for spellcheck suggestions?

2010-10-07 Thread Lance Norskog
Thanks, I've been hunting for a solution to this problem. On Thu, Oct 7, 2010 at 7:43 AM, Dyer, James wrote: Look at SOLR-2010 which has patches for 1.4.1 and trunk.  It works with the spellcheck collate functionality and ensures that collations are returned only if

case-insensitive phrase query for string fields

2010-10-07 Thread Matt Mitchell
What's the recommended approach for handling case-insensitive phrase queries? I've got this setup, but no luck: <fieldType name="ci_string" class="solr.StrField"> <analyzer> <filter class="solr.LowerCaseFilterFactory"/> <tokenizer class="solr.KeywordTokenizerFactory"/> </analyzer>

Re: Can anyone compare Solr with Autonomy?

2010-10-07 Thread Otis Gospodnetic
Scott, I doubt anyone (here or elsewhere) can give you a thorough and unbiased comparison. My suggestion is: 1) Figure out what functionality you need and what matters to you 2) Evaluate Solr with those requirements in mind. Check with the list or an expert if you see/think that Solr can't do

Re: How do I get the solr error response as XML instead of HTML

2010-10-07 Thread Otis Gospodnetic
Scott, Regarding unparseable terms - I think even the edismax query parser is more forgiving than the standard one, but if that is not the case, one can always build a custom query parser that is more forgiving regarding invalid query string syntax. Re HTML response - I'm guessing you are seeing

Re: Copying index to another server

2010-10-07 Thread Otis Gospodnetic
Of course, you have to make sure that the source index was not in flux, not being written to while you copied it. Otis - Original Message From: Robert Petersen

Re: subquery with stopwords

2010-10-07 Thread Otis Gospodnetic
Hi Rodrigo, I'm not sure I follow the question, so I'll simply suggest using debugQuery=true as step one. But yes, if "the" is a stop word and if stop words are being removed from queries, then field:the will have no effect. Otis

RE: case-insensitive phrase query for string fields

2010-10-07 Thread Jonathan Rochkind
If you are going to put explicit phrase quotes in the query string like that, an ordinary text field will match fine, on phrase searches or other searches. That is a solr.TextField, not a solr.StrField as you're using. And then you can put a LowerCaseFilter on it of course. And use an ordinary

Re: SOLR 1.4.1 to SOLR 1.3 issues (backward compatibility issue)

2010-10-07 Thread Otis Gospodnetic
Hi, If this is not about index format incompatibility, and it sounds like it is not, I would: * stop the servlet container * look at your servlet container's work directory (or maybe your temp dir) * remove any remnants of Solr from those directories * start the servlet container Why are you

Re: slow inserts (updatecsv) on solr

2010-10-07 Thread Otis Gospodnetic
Hello, Perhaps it's that old Pentium D with 3 GB RAM that simply can't handle both reading files that are a few hundred MBs and writing them at the same time? You could: * split the big file (man split) * open two terminals * from each terminal make that same updatecsv call, but with one half
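
The splitting step above can be sketched as follows, as an alternative to the split command. This assumes the CSV file starts with a header line naming the fields, which may not match the poster's file; in that case each piece needs its own copy of the header so it can be posted independently. The function name split_csv_lines is made up for illustration.

```python
from typing import List


def split_csv_lines(lines: List[str], parts: int) -> List[List[str]]:
    """Split CSV lines into roughly equal chunks, repeating the header in each."""
    header, rows = lines[0], lines[1:]
    # Ceiling division so at most `parts` chunks are produced; at least 1 row per chunk.
    size = max(1, -(-len(rows) // parts))
    return [[header] + rows[i:i + size] for i in range(0, len(rows), size)]


chunks = split_csv_lines(["id,name", "1,a", "2,b", "3,c"], parts=2)
print(chunks)  # prints [['id,name', '1,a', '2,b'], ['id,name', '3,c']]
```

Each chunk can then be written to its own file and sent with a separate update call, one per terminal as suggested. Whether two concurrent loads actually help on a two-core machine is something to measure rather than assume.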

Re: SOLR 1.4.1 to SOLR 1.3 issues (backward compatibility issue)

2010-10-07 Thread Shanmugavel SRD
Thank you Otis. I will try these steps. Yes, it is not an indexing issue. Index and search functionality are working fine. Only the admin page is not accessible. There is a process in our company to look for the roll back option while moving any changes to LIVE/Production. As part of that we found