Remove - from search query

2010-11-18 Thread ionysis
Hi, I have the following search query: LTJ Bukem - Horizons. When this is processed by Solr, it subtracts Horizons from the search, e.g. <str name="parsedquery_toString">+text:LTJ +text:Bukem -text:Horizons</str>. I do not want to subtract Horizons from the search, but to ignore/remove the minus sign, to

Re: Remove - from search query

2010-11-18 Thread Ahmet Arslan
I have the following search query: LTJ Bukem - Horizons. When this is processed by Solr, it subtracts Horizons from the search, e.g. <str name="parsedquery_toString">+text:LTJ +text:Bukem -text:Horizons</str>. I do not want to subtract Horizons from the search, but to ignore/remove the minus
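One client-side way to get the behaviour asked for here is to strip a free-standing hyphen before the query reaches Solr. The following is a minimal sketch, not necessarily what was suggested in this thread; the function name drop_standalone_minus is made up for illustration, and it assumes that only a hyphen surrounded by whitespace should be dropped, while "-term" and "drum-and-bass" are left alone.

```python
import re


def drop_standalone_minus(query: str) -> str:
    """Remove hyphens that stand alone between whitespace.

    Hypothetical helper: it leaves "-term" (an intentional exclusion) and
    hyphenated words untouched, and only drops a "-" that the user typed
    as a visual separator.
    """
    # Match a "-" preceded by start-of-string or whitespace and followed
    # by whitespace or end-of-string.
    cleaned = re.sub(r"(?:^|\s)-(?=\s|$)", " ", query)
    # Collapse the extra whitespace left behind by the substitution.
    return " ".join(cleaned.split())


print(drop_standalone_minus("LTJ Bukem - Horizons"))
```

The cleaned string can then be sent as the q parameter in place of the raw user input.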

Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-18 Thread gwk
While the year zero exists, month zero and day zero don't. And while APIs ofttimes accept those values (i.e., day zero is the last day of the previous month), the ISO 8601 spec does not accept them as far as I know. On 11/18/2010 4:26 AM, Dennis Gearon wrote: I thought that that value was a
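The point about month zero and day zero can be checked before a value is handed to a date transformer. This is a rough sketch under stated assumptions: the helper name is hypothetical, the timestamp shape is taken from the thread subject, and the day check is deliberately coarse (it does not look at month lengths or leap years).

```python
import re

# Shape of the timestamp in the thread subject: YYYY-MM-DDThh:mm:ssZ
_TIMESTAMP = re.compile(r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$")


def has_valid_month_and_day(timestamp: str) -> bool:
    """Return False for values such as 0000-00-00T00:00:00Z.

    Coarse check only: month must be 1..12 and day 1..31.
    """
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        return False
    month = int(match.group(2))
    day = int(match.group(3))
    return 1 <= month <= 12 and 1 <= day <= 31


print(has_valid_month_and_day("0000-00-00T00:00:00Z"))
print(has_valid_month_and_day("2010-11-18T00:00:00Z"))
```

Rows that fail the check could be mapped to a missing value upstream instead of being passed on as a date.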

JMX Cache values are wrong

2010-11-18 Thread dan sutton
Hi, I've used three different JMX clients to query solr/core:id=org.apache.solr.search.FastLRUCache,type=queryResultCache and solr/core:id=org.apache.solr.search.FastLRUCache,type=documentCache beans and they appear to return old cache information. As new searchers come online, the newer

Re: Must require quote with single word token query?

2010-11-18 Thread Ahmet Arslan
This is happening because the query parser pre-tokenizes your query on whitespace. It is tokenized before it reaches your query analyzer. And you are using KeywordTokenizer in your field definition. Is there a special reason for you to use KeywordTokenizer? --- On Thu, 11/18/10, Chamnap Chhorn

Multivalued field search...

2010-11-18 Thread Dario Rigolin
I think this question is more related to Lucene query search, but I'm posting here because I feel more like a Solr user :-) I have a multivalued field named field1 containing codes separated by a space: <doc> <field name="id">doc1</field> <field name="field1">A BB1 B BB2 C BB3</field> <field name="field1">A CC1 B CC2 C

Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote: Sorry wrong query: q=field1:(A BB1 AND B BB2) Dario

Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Moritz Krinke
Hello, I have a running Solr 1.3 installation and would like to migrate it to Solr 3 in order to get speed improvements by using multiple threads for indexing. When starting Solr 3, I get the following error message: SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'textfc'

Re: ranged and boolean query

2010-11-18 Thread Peter Blokland
hi, On Wed, Nov 17, 2010 at 05:00:04PM +0100, Peter Blokland wrote: pubdate:([* TO NOW] OR (NOT *)) i've gone back to the examples provided with solr 1.4.1. the standard example has 19 documents, one of which has a date-field called 'incubationdate_dt'. so the query incubationdate_dt:[* TO

Re: case insensitive sort and LowerCaseFilterFactory

2010-11-18 Thread Erick Erickson
On a quick glance: You're trying to sort on a tokenized field, which is not good. Not good at all. You'll go 'round and 'round and it'll never give you what you want. Consider your example: Whithers, Alfred Robert. The WhitespaceTokenizer breaks this up into three tokens (I'm ignoring everything
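To illustrate why a tokenized field is a poor sort field, here is a small Python sketch. It only mimics whitespace tokenization with str.split and is not Solr's implementation; the sample names other than the one in the message are invented.

```python
def whitespace_tokens(value: str) -> list[str]:
    # Rough stand-in for a whitespace tokenizer: one token per run of
    # non-whitespace characters.
    return value.split()


def case_insensitive_sort_key(value: str) -> str:
    # For sorting, the whole value is kept as a single lowercased token,
    # so each document has exactly one sort term.
    return value.lower()


name = "Whithers, Alfred Robert"
print(whitespace_tokens(name))  # three tokens, no single obvious sort term

names = ["whithers, alfred", "Adams, Bob", "Whithers, Alfred Robert"]
print(sorted(names, key=case_insensitive_sort_key))
```

With three tokens per value there is no single term to order by, whereas the lowercased whole-string key gives each document exactly one sortable value.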

Re: Dismax is failing with json response writer

2010-11-18 Thread Erick Erickson
What version of Solr are you using? Could we see the actual query you're sending? And the dismax definition, and perhaps the relevant parts of schema.xml. There's not much information to go on here to help debug this. Best Erick On Thu, Nov 18, 2010 at 1:21 AM, sivaprasad

Re: Does edismax support wildcard queries?

2010-11-18 Thread Erick Erickson
Well, the claim is that eDismax supports full Lucene syntax, so I assume so. Here's the JIRA: https://issues.apache.org/jira/browse/SOLR-1553 which indicates that you have two choices: Trunk or the 3.x build, see https://hudson.apache.org/hudson/

Re: Reading Solr Index directly

2010-11-18 Thread Erick Erickson
See below: On Thu, Nov 18, 2010 at 2:59 AM, Sasank Mudunuri sas...@gmail.com wrote: Hi, I've been poking around the JavaDocs a bit, and it looks like it's possible to directly read the index using the Solr Java API. Hoping to clarify a couple of things -- 1) Do I need to read the index

Re: Meaning of avgTimePerRequest avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
Average time per request is the total time spent servicing X requests divided by X (in milliseconds, I believe). If no searches are being processed, this number doesn't change. It's a measure of how long it takes, on average, to service a single request. avgRequestsPerSecond is the total time

Spell-Check Component Functionality

2010-11-18 Thread rajini maski
All, I am trying to apply the Solr spell check component functionality to our data. The configuration I needed to set up for it, by updating config.xml and schema.xml, is as follows. Please let me know if there are any errors in it. I am not getting any suggestions in suggestion tags of solr

RE: Does edismax support wildcard queries?

2010-11-18 Thread Thumuluri, Sai
It does support wildcard queries - we are using that feature from edismax -Original Message- From: Swapnonil Mukherjee [mailto:swapnonil.mukher...@gettyimages.com] Sent: Thursday, November 18, 2010 1:39 AM To: solr-user@lucene.apache.org Subject: Does edismax support wildcard queries?

Re: Meaning of avgTimePerRequest avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread mesenthil
We have recently upgraded some of our Solr instances to 1.4.1 from 1.3. Interestingly, both these parameter values increased after our upgrade. When avgRequestsPerSecond increases, avgTimePerRequest should increase. But it does not in our case. Any thoughts? -- View this message in

Re: Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Shawn Heisey
I did a quick grep through the directory listing of the Solr 3.1 source; the only part of your analysis chain that came up empty was HTMLStripWhitespaceTokenizerFactory. I think you'll have to replace it with something like this: <charFilter class="solr.HTMLStripCharFilterFactory"/>

Re: Does edismax support wildcard queries?

2010-11-18 Thread Swapnonil Mukherjee
Thanks. I downloaded and built the solr trunk to test for wild card queries and as you guys reported, edismax does support wildcards and it works beautifully. My next challenge will be to apply these patches 1. https://issues.apache.org/jira/browse/SOLR-1553 2.

Re: Spell-Check Component Functionality

2010-11-18 Thread Peter Karich
Hi Rajani, some notes: * try spellcheck.q=curst or completely without spellcheck.q but with q * compared to the normal q parameter spellcheck.q can have a different analyzer/tokenizer and is used if present * do not do spellcheck.build=true for every request (creating the spellcheck index

Re: Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Moritz Krinke
Thanks for the tip. This seems to work ;) But now I ran into another problem: I'm trying to use the threads parameter in my entities in order to speed up the index creation. As soon as I use the threads parameter (e.g. threads=2) I get the following errors in my log:

Respect token order in matches

2010-11-18 Thread Robert Gründler
Hi, is there a way to make solr respect the order of token matches when the query is a multi-term string? Here's an example: Query String: John C Indexed Strings: - John Cage - Cargill John This will return both indexed strings as a result. However, Cargill John should not match in that

Re: Save the file sent to the ExtractingRequestHandler locally on the server.

2010-11-18 Thread Chad Salamon
I'm using solr as a part of what could be described as a content management system. I don't have a problem with uploading the files independently of Solr, but I'm trying to avoid sending excess data. I'm also trying to avoid any solutions that are system dependent. Perhaps another option would be

Re: Meaning of avgTimePerRequest avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
No, that's not true. As long as you're not limited by some resource, avgRequestsPerSecond can grow without impacting avgTimePerRequest much. avgTimePerRequest is the elapsed time from the beginning of Solr handling the request to the end, measured in clock time. Say it takes 100 ms. Of that 100
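The relationship between the two statistics can be worked through with a small idealized model. This is a sketch under stated assumptions: the function names are made up, every request is assumed to take exactly 100 ms, workers never contend for a resource, and avgRequestsPerSecond is assumed to be the request count divided by elapsed wall-clock seconds (the definition is cut off in the archive snippet).

```python
def avg_time_per_request_ms(total_time_ms: float, request_count: int) -> float:
    # Total time spent servicing the requests, divided by their number.
    return total_time_ms / request_count


def avg_requests_per_second(request_count: int, elapsed_seconds: float) -> float:
    # Assumed definition: requests handled per second of wall-clock time.
    return request_count / elapsed_seconds


def simulate(workers: int, per_request_ms: float = 100.0,
             wall_clock_s: float = 10.0) -> tuple[float, float]:
    """Idealized run: each worker handles requests back to back."""
    per_worker = int(wall_clock_s * 1000 / per_request_ms)
    count = workers * per_worker
    total_ms = count * per_request_ms
    return (avg_time_per_request_ms(total_ms, count),
            avg_requests_per_second(count, wall_clock_s))


print(simulate(1))  # one worker
print(simulate(4))  # four workers: throughput rises, latency does not
```

In this model, going from one to four workers quadruples the requests per second while the average time per request stays at 100 ms, which is the situation described above for a server that is not resource-limited.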

Re: Issue with copyField when updating document

2010-11-18 Thread Pramod Goyal
I am using the Solr admin to query the document. The returned document is showing old values. Lance, I will not be able to post my configuration but I will create a simple schema just to highlight the issue. On Wed, Nov 17, 2010 at 9:56 PM, Erick Erickson erickerick...@gmail.com wrote: How are

Re: Issue with copyField when updating document

2010-11-18 Thread Pramod Goyal
Hi, Forgot to mention solr version number: Solr Implementation Version: 2010-04-30_08-05-41 939580 - hudson - 2010-04-30 08:37:22 On Thu, Nov 18, 2010 at 10:50 PM, Pramod Goyal pramod.go...@gmail.com wrote: I am using the solr admin to query the document. The returned document is showing old

[solved] Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:42:49 pm Dario Rigolin wrote: On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote: Sorry wrong query: q=field1:(A BB1 AND B BB2) Dario q=field1:("A BB1 B BB2"~10) I discovered that proximity search works well with multiple terms. Ciao. Dario.
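The proximity form used in the solution follows the Lucene syntax of a quoted phrase followed by ~N. A minimal sketch of building such a query string is shown here; the helper name proximity_query is hypothetical, and it assumes the terms contain no double quotes.

```python
def proximity_query(field: str, terms: list[str], slop: int) -> str:
    """Build field:"t1 t2 ..."~slop for a Lucene-style proximity search.

    Hypothetical helper; terms containing a double quote are rejected
    rather than escaped, to keep the sketch short.
    """
    if any('"' in term for term in terms):
        raise ValueError("terms must not contain double quotes")
    phrase = " ".join(terms)
    return f'{field}:"{phrase}"~{slop}'


print(proximity_query("field1", ["A", "BB1", "B", "BB2"], 10))
```

The slop value caps how far apart the terms may be, so it needs to be chosen with the field's positionIncrementGap in mind if matches should stay within a single value of the multivalued field.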

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
Hi Peter! * I believe the NRT patches are included in the 4.x trunk. I don't think there's any support as yet in 3x (uses features in Lucene 3.0). I'll investigate how much effort it is to update to solr4 * For merging, I'm talking about commits/writes. If you merge while commits are going

Re: Meaning of avgTimePerRequest avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Shanmugavel SRD
Erick, Thanks a lot for explaining about these two fields. Could you please let us know which one we have to look for if we have to monitor the performance? avgTimePerRequest OR avgRequestsPerSecond. Thanks, SRD -- View this message in context:

Re: Dismax - Boosting

2010-11-18 Thread Solr User
Ahmet, I modified the schema as follows: (Added more fields for faceting) <field name="title" type="text" indexed="true" stored="true" omitNorms="true" /> <field name="author" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" /> <field name="authortype" type="text" indexed="true" stored="true"

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Sturge
Maybe I didn't fully understand what you explained: but doesn't this mean that you'll have one index per day? Or are you overwriting, via replicating, every shard and the number of shards is fixed? And why are you replicating from the local replica to the next shard? (why not directly from

LockReleaseFailedException

2010-11-18 Thread Robert Gründler
Hi, I'm suddenly getting a LockReleaseFailedException when starting a full-import using the DataImportHandler: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component This worked without problems until just now.

WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, I am going crazy but which config is necessary to include the missing doc 2? I have: doc1 tw:aBc doc2 tw:abc Now a query aBc returns only doc 1 although when I try doc2 from admin/analysis.jsp then the term text 'abc' of the index gets highlighted as intended. I even indexed a simple

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). This will make sure the original input token is preserved along with the newly generated tokens. If you then pass it all through a lowercase filter, it should match both documents.
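The effect described above can be pictured with a toy model. This is a sketch only: it splits on lower-to-upper case changes with a regular expression, optionally keeps the original token, and lowercases everything. It ignores token positions and every other WordDelimiterFilterFactory option, so it is not a faithful reimplementation, and the function name is made up.

```python
import re


def toy_word_delimiter(token: str, split_on_case_change: bool = True,
                       preserve_original: bool = False) -> list[str]:
    """Toy model of case-change splitting followed by lowercasing."""
    if split_on_case_change:
        # Split at each lowercase-to-uppercase boundary: "aBc" -> "a", "Bc".
        parts = re.split(r"(?<=[a-z])(?=[A-Z])", token)
    else:
        parts = [token]
    tokens = list(parts)
    if preserve_original and token not in tokens:
        tokens.append(token)
    # Stand-in for a lowercase filter placed after the delimiter filter.
    return [t.lower() for t in tokens]


print(toy_word_delimiter("aBc"))
print(toy_word_delimiter("aBc", preserve_original=True))
print(toy_word_delimiter("abc"))
```

In this model only the preserve_original=True output for "aBc" contains "abc", the single term produced for the document indexed as "abc".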

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
Does yours need to be once a day? no, I only thought you use one day :-) so you don't or do you have 31 shards? having a look at Solr Cloud or Katta - could be useful here in dynamically allocating shards. ah, thx! I will take a look at it (after trying solr4)! Regards, Peter.

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateAll="0"

Re: Respect token order in matches

2010-11-18 Thread Markus Jelsma
Hi, I'm not sure what QParser you're using, but with the DismaxQParser you can specify slop on explicit phrase queries. Did you set it? It can make a difference. Check it out: http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29 Cheers, Hi, is there a way to

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich peat...@yahoo.de wrote: Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"

Reindex Solr Using Tomcat

2010-11-18 Thread Eric Martin
Hi, I searched google and the wiki to find out how I can force a full re-index of all of my content and I came up with zilch. My goal is to be able to adjust the weight settings, re-index my entire database and then search my site and view the results of my weight adjustments. I am using

Re: Meaning of avgTimePerRequest avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
avgTimePerRequest is the important one for your users. They don't care if you're processing a million QPS, they care how long *their* query took. But you also have to pay attention to longest response times. Best, Erick On Thu, Nov 18, 2010 at 12:54 PM, Shanmugavel SRD

Re: Reindex Solr Using Tomcat

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:33 PM, Eric Martin e...@makethembite.com wrote: Hi, I searched google and the wiki to find out how I can force a full re-index of all of my content and I came up with zilch. My goal is to be able to adjust the weight settings, re-index my entire database and then

Re: Dismax - Boosting

2010-11-18 Thread Erick Erickson
The changes that you made have no relevance to the fields you named in your query. Things like author, format, etc. You have to ask to facet by your new fields... And if you did send a different query, did you reindex after your config changes? It would be better if you made a habit of showing

RE: Reindex Solr Using Tomcat

2010-11-18 Thread Eric Martin
Ah, I am using an ApacheSolr module in Drupal and used nutch to insert the data into the Solr index. When I was using Jetty I could just delete the data contents in sshd and then restart the service, forcing the reindex. Currently, the ApacheSolr module for Drupal allows for a 200 record re-index

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange=0 to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario',

Containers running SOLR: supported or unsupported?

2010-11-18 Thread Dyer, James
We're working on budgeting for an environment to begin using SOLR in Production and the question came up about whether or not we should pay for commercial support on the container that SOLR runs under. We've pretty much decided to run on JBOSS simply because that's what we use company-wide.

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Sturge
no, I only thought you use one day :-) so you don't or do you have 31 shards? No, we use 1 shard per month - e.g. 7 shards will hold 7 months of data. It can be set to 1 day, but you would need to have a huge amount of data in a single day to warrant doing that. On Thu, Nov 18, 2010 at
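The one-shard-per-month layout can be pictured as a mapping from a document's date to a shard name. This is a minimal sketch; the shard_YYYY_MM naming scheme and the function name are invented for illustration and are not taken from the setup described in the thread.

```python
from datetime import date


def shard_for(day: date) -> str:
    # Hypothetical naming scheme: one shard per calendar month.
    return f"shard_{day.year:04d}_{day.month:02d}"


print(shard_for(date(2010, 11, 18)))

# Seven consecutive months of data map to seven distinct shards.
months = [date(2010, m, 1) for m in range(5, 12)]
print(len({shard_for(d) for d in months}))
```

Switching to daily shards would only mean adding the day to the key, at the cost of many more shards to query.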

using DIH with mets/alto file sets

2010-11-18 Thread Fred Gilmore
mets/alto is an xml standard for describing physical objects. In this case, we're describing books. The mets file holds the metadata (author, title, etc.), the alto file is the physical description (words on the page, formatting of the page). So it's a one (mets) to many (alto)

simple production set up

2010-11-18 Thread lee carroll
Hi I'm pretty new to SOLR and interested in getting an idea about a simple standard way of setting up a production SOLR service. I have read the FAQs and the wiki around SOLR security and performance but have not found much on a best practice architecture. I'm particularly interested in best

Re: simple production set up

2010-11-18 Thread Markus Jelsma
Hi, It's a common practice not to use Solr as a frontend. Almost all deployed instances live in the backend near the database servers. And if Solr is being put to the front, it's still being secured by a proxy. Setting up staging and production instances depends on your needs. If the load is

Re: Reindex Solr Using Tomcat

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:42 PM, Eric Martin e...@makethembite.com wrote: Ah, I am using an ApacheSolr module in Drupal and used nutch to insert the data into the Solr index. When I was using Jetty I could just delete the data contents in sshd and then restart the service, forcing the reindex.

Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
We currently have a 30G index with 73M of .tii files running on a machine with 4 Intel 2.27GHz Xeons with 15G of memory. About once a second a process indexes ~10-20 smallish documents using the XML Update Handler. A commit happens after every update. However we see this behaviour even if the

Re: Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
On Fri, Nov 19, 2010 at 12:01:09AM +, me said: I'm baffled - I've had way bigger indexes than this before with no performance problems. At first it was the frequent updates but the fact that it happens even when the indexer isn't running seems to put paid to that. More information: -

Re: Must require quote with single word token query?

2010-11-18 Thread Chamnap Chhorn
Well, this field is a keyphrase. I want to make it a case-insensitive single-token field. It matches only when the user types the same as the data in Solr. What's wrong with that? Can it be done in another way? On Thu, Nov 18, 2010 at 6:08 PM, Ahmet Arslan iori...@yahoo.com wrote: This

Re: Experiencing lots of full GC runs

2010-11-18 Thread Lance Norskog
Does it need 10G to run? Have you cycled it down to, say, 4-5G as a test? Large memory sizes can just cause more garbage collection. What is the disk activity when this happens? Do you have paging turned on? I generally turn it off- having things go into page-thrash mode is lame. How many

Re: using DIH with mets/alto file sets

2010-11-18 Thread Lance Norskog
Some ideas: XPathEntityProcessor parses a very limited XPath syntax. However, you can add an XSL script as an attribute, and this somehow gets called instead. With this, you might be able to create an XPath that selects out every combination that you want. A second option: SOLR-1499 is an

Re: Issue with copyField when updating document

2010-11-18 Thread Lance Norskog
Have you tried removing the index files and rebuilding it from scratch? The index could be corrupted. It's rare, but it does happen. On Thu, Nov 18, 2010 at 9:30 AM, Pramod Goyal pramod.go...@gmail.com wrote: Hi, Forgot to mention solr version number: Solr Implementation Version:

Re: Master/Slave High CPU Usage

2010-11-18 Thread Lance Norskog
If they are on the same server, you do not need to replicate. If you only do queries, the query server can use the same index directory as the master. Works quite well. Both have to have the same LockPolicy in solrconfig.xml. For security reasons, I would run the query server as a different user

Re: Spell-Check Component Functionality

2010-11-18 Thread rajini maski
Hello Peter, Thanks for the reply :) I did spellcheck.q=Curst as you said. The query is like: http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true I am getting this error :( HTTP Status 500 - null java.lang.NullPointerException at

Re: Spell-Check Component Functionality

2010-11-18 Thread rajini maski
And if I try: http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true&q=Curst The XML output is: http://localhost:8090/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&q=Curst# response
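Hand-assembled request URLs like the ones in this thread are easy to get wrong, so a short sketch of building one with the standard library may help. The parameter names are the ones used in the thread; the helper name spellcheck_url and the choice to send both q and spellcheck.q are assumptions for illustration.

```python
from urllib.parse import urlencode


def spellcheck_url(base: str, text: str) -> str:
    """Build a select URL that passes both q and spellcheck.q."""
    params = [
        ("q", text),
        ("spellcheck", "true"),
        ("spellcheck.q", text),
        ("rows", "10"),
        ("indent", "on"),
    ]
    # urlencode joins the pairs with "&" and escapes the values.
    return f"{base}?{urlencode(params)}"


print(spellcheck_url("http://localhost:8909/solr/select", "Curst"))
```

Letting urlencode insert the separators avoids lost "&" characters and handles search text containing spaces or other reserved characters.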

Re: Spell-Check Component Functionality

2010-11-18 Thread Shanmugavel SRD
Did you configure the one below in your default request handler? <arr name="last-components"> <str>spellcheck</str> </arr> -- View this message in context: http://lucene.472066.n3.nabble.com/Spell-Check-Component-Functionality-tp1923954p1929124.html Sent from the Solr - User mailing list

Doubts regarding Multiple Keyword Search

2010-11-18 Thread Pawan Darira
Hi I am searching for keywords: ad testing (without quotes). I want result containing both words on the top. But it is giving me results containing words: ad test. Is it correct or any logic behind that i.e. will it consider the word test also ? Please help -- Thanks, Pawan Darira

how about another SolrIndexSearcher.numDocs method?

2010-11-18 Thread kafka0102
In my app, I want to get numDocs for some queries. I see SolrIndexSearcher has two methods: public int numDocs(Query a, DocSet b) public int numDocs(Query a, Query b) But these don't fit my case. From the search params I get q and fq, and q's results are not in the filterCache. But above methods are