RE: Problem committing on 40GB index

2010-01-13 Thread Frederico Azeiteiro
Sorry, my bad... I replied to a current mailing list message, only changing the subject... Didn't know about this hijacking problem. Will not happen again. Just to close this issue: if I understand correctly, for an index of 40G I will need, for running an optimize: - 40G if all activity on

Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-13 Thread Shalin Shekhar Mangar
On Wed, Jan 13, 2010 at 7:48 AM, Lance Norskog goks...@gmail.com wrote: You can do this stripping in the DataImportHandler. You would have to write your own stripping code using regular expressions. Note that DIH has a HTMLStripTransformer which wraps Solr's HTMLStripReader. -- Regards,

Problem with 'sint' in More Like This feature

2010-01-13 Thread vi...@8kmiles.com
Hi, I am using the More Like This feature. I have configured it in solrconfig.xml as a dedicated request handler and I am using SolrJ. It's working properly when the similarity fields are all text data types. But when I add a field whose datatype is 'sint', it's throwing an exception.
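[Editorial sketch] A minimal SolrJ call of the kind described above, assuming a dedicated MLT handler registered at /mlt and text similarity fields named title and description; the URL, handler name, and field names are placeholders, not taken from the thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MoreLikeThisSketch {
        public static void main(String[] args) throws Exception {
            // URL, handler name, and field list are assumptions for illustration
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery("id:123");   // seed document for similarity
            query.setQueryType("/mlt");                  // route to the dedicated MLT request handler
            query.set("mlt.fl", "title,description");    // similarity fields (text types; the thread reports an exception with 'sint')
            query.set("mlt.mintf", "1");
            query.set("mlt.mindf", "1");

            QueryResponse rsp = server.query(query);
            System.out.println(rsp.getResults().getNumFound() + " similar documents");
        }
    }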

Re: Queries of type field:value not functioning

2010-01-13 Thread Chantal Ackermann
try /solr/select?q.alt=*:*&qt=dismax or /solr/select?q=some search term&qt=dismax. dismax should be configured in solrconfig.xml by default, but you have to adapt it to list the fields from your schema.xml. And for anything with a known field: /solr/select?q=field:value&qt=standard Cheers, Chantal
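[Editorial sketch] For readers using SolrJ rather than raw URLs, a rough equivalent of the three queries above; the server URL is a placeholder, and the dismax fields still come from your solrconfig.xml:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DismaxQuerySketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // match-all via q.alt, because dismax does not parse *:* in q
            SolrQuery matchAll = new SolrQuery();
            matchAll.setQueryType("dismax");
            matchAll.set("q.alt", "*:*");
            System.out.println(server.query(matchAll).getResults().getNumFound());

            // ordinary dismax search over the fields listed in the handler config
            SolrQuery search = new SolrQuery("some search term");
            search.setQueryType("dismax");
            System.out.println(server.query(search).getResults().getNumFound());

            // field:value queries go through the standard (lucene) parser
            QueryResponse byField = server.query(new SolrQuery("field:value"));
            System.out.println(byField.getResults().getNumFound());
        }
    }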

Restricting Facet to FilterQuery in combination with mincount

2010-01-13 Thread Chantal Ackermann
Hi all, is it possible to restrict the returned facets to only those that apply to the filter query but still use mincount=0? Keeping those that have a count of 0 but apply to the filter, and at the same time leaving out those that are not covered by the filter (and thus 0, as well). Some
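[Editorial sketch] A short SolrJ version of the setup being asked about -- a filter query combined with facet.mincount=0 -- with placeholder field and filter names; whether the zero-count values can be limited to only those covered by the filter is exactly the open question in the thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetFilterSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery("*:*");
            query.addFilterQuery("category:books");  // the filter the facets should respect
            query.setFacet(true);
            query.addFacetField("genre");
            query.setFacetMinCount(0);               // keep facet values with a count of 0

            QueryResponse rsp = server.query(query);
            for (FacetField ff : rsp.getFacetFields()) {
                for (FacetField.Count c : ff.getValues()) {
                    System.out.println(ff.getName() + ": " + c.getName() + " = " + c.getCount());
                }
            }
        }
    }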

Re: Multi language support

2010-01-13 Thread Robert Muir
right, but we should not encourage users to significantly degrade overall relevance for all movies due to a few movies and a band (very special cases, as I said). In English, by not using stopwords, it doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not

Re: Problem committing on 40GB index

2010-01-13 Thread Erick Erickson
That's my understanding.. But fortunately disk space is cheap G On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Sorry, my bad... I replied to a current mailing list message only changing the subject... Didn't know about this Hijacking

Re: DataImportHandler - synchronous execution

2010-01-13 Thread Alexey Serba
Hi, I created Jira issue SOLR-1721 and attached simple patch ( no documentation ) for this. HIH, Alex 2010/1/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com: it can be added On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote: Hi, I found that there's no explicit

Re: Multi language support

2010-01-13 Thread Paul Libbrecht
Isn't the conclusion here that some stopword- and stemming-free matching should be the best match, if ever, and to then gently degrade to weaker forms of matching? paul On 13 Jan 2010 at 07:08, Walter Underwood wrote: There is a band named The The. And a producer named Don Was. For a

Problem indexing files

2010-01-13 Thread Thomas Stuettner
Hi all, I'm trying to add multiple files to Solr 1.4 with SolrJ. With this program only 1 doc is added to Solr: SolrServer server = SolrHelper.getServer(); server.deleteByQuery( "*:*" ); // delete everything! server.commit(); QueryResponse rsp =
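[Editorial sketch] For comparison, a minimal SolrJ loop that adds several documents and commits once at the end; this is not the poster's code, and the field names and URL are made up for illustration. One common cause of the "only 1 doc" symptom is giving every document the same uniqueKey value, in which case later adds overwrite earlier ones.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class MultiDocIndexSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 10; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);        // each document needs its own unique key
                doc.addField("title", "Document " + i);
                docs.add(doc);
            }

            server.add(docs);   // one request for the whole batch
            server.commit();    // commit once, after all documents are added
        }
    }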

Boosting fields with localsolr

2010-01-13 Thread Kevin Thorley
I have tried several variations now, but have been unable to come up with a way to boost fields in a localsolr query. What I need to do is do a localsolr search and sort the result set so that a specific value is at the top. My idea was to use a nested dismax query with a boost field like

Re: Boosting fields with localsolr

2010-01-13 Thread Kevin Thorley
On Jan 13, 2010, at 10:44 AM, Kevin Thorley wrote: I have tried several variations now, but have been unable to come up with a way to boost fields in a localsolr query. What I need to do is do a localsolr search and sort the result set so that a specific value is at the top. My idea was

RE: Need help Migrating to Solr

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
I don't have experience with migrating, but you should consider using the example schema.xml in the distro as a starting basis for creating your schema. -Original Message- From: Abin Mathew [mailto:abin.mat...@toostep.com] Sent: Tuesday, January 12, 2010 8:42 PM To:

copyField with Analyzer?

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
Hi all, I tried creating a case-insensitive string field populated via copyField from the values of a string field. This didn't work, since copyField does its job before the analyzer on the case-insensitive string field is invoked. Is there another way I might accomplish this field replication on the

RE: Problem committing on 40GB index

2010-01-13 Thread Marc Des Garets
Just curious, have you checked if the hanging you are experiencing is not garbage collection related? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 13 January 2010 13:33 To: solr-user@lucene.apache.org Subject: Re: Problem committing on 40GB index That's

Re: Question

2010-01-13 Thread Bill Bell
On Wed, Jan 13, 2010 at 10:17 AM, Bill Bell bb...@kaango.com wrote: I am using Solr 1.4, and have 3 cores defined in solr.xml. Question on replication 1. How do I set up rsync replication from master to slaves? It was easy to do with just one core and one script.conf, but with multiple

Re: How to display Highlight with VelocityResponseWriter?

2010-01-13 Thread qiuyan . xu
Thanks a lot. It works now. When I added the line #set($hl = $response.highlighting) I got the highlighting. But I wonder if there's any document that describes the usage of that. I mean I didn't know the names of those methods. Actually I just managed to guess it. best regards, Qiuyan

RE: Problem committing on 40GB index

2010-01-13 Thread Frederico Azeiteiro
The hanging hasn't happened again since yesterday, and I haven't run out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky... Where can I see the garbage collection info? -Original Message- From: Marc Des Garets

Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Minutello, Nick
Hi, I have a bit of an interesting OutOfMemoryError that I'm trying to figure out. My client and Solr server are running in the same JVM (for deployment simplicity). FWIW, I'm using Jetty to host Solr. I'm using the supplied code for the http-based client interface. Solr 1.3.0. My app is adding

case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
Hi I have a field: <field name="srcANYSTRStrCI" type="string_ci" indexed="true" stored="true" multiValued="true" /> With type definition: <!-- A Case insensitive version of string type --> <fieldType name="string_ci" class="solr.StrField"

Re: case-insensitive string type

2010-01-13 Thread Rob Casson
From http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters: "On wildcard and fuzzy searches, no text analysis is performed on the search word." I'd just lowercase the wildcard-ed search term in your client code before you send it to Solr. hth, rob On Wed, Jan 13, 2010 at 2:18 PM,
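[Editorial sketch] A tiny SolrJ illustration of that suggestion -- lowercasing the wildcard term in client code before the query is sent; the field name comes from the thread, but the URL and sample input are placeholders:

    import java.util.Locale;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WildcardLowercaseSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            String userInput = "miXCAse*";
            // wildcard terms bypass the analyzer, so lowercase them on the client side
            String term = userInput.toLowerCase(Locale.ENGLISH);

            SolrQuery query = new SolrQuery("srcANYSTRStrCI:" + term);
            System.out.println(server.query(query).getResults().getNumFound());
        }
    }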

Re: case-insensitive string type

2010-01-13 Thread Erick Erickson
What do you get when you add debugQuery=on to your lower-case query? And does Luke show you what you expect in the index? On Wed, Jan 13, 2010 at 2:18 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote: Hi I have a field: <field name="srcANYSTRStrCI" type="string_ci"

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
I considered that, but I'm also having the issue that I can't get an exact match as case insensitive either. -Original Message- From: Rob Casson [mailto:rob.cas...@gmail.com] Sent: Wednesday, January 13, 2010 11:26 AM To: solr-user@lucene.apache.org Subject: Re: case-insensitive string

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
From the query http://localhost:8080/solr/select?q=idxPartition%3ASOMEPART%20AND%20srcANYSTRStrCI:%22mixcase%20or%20lower%22&debugQuery=on Debug info attached -Original Message- From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] [mailto:timothy.j.har...@nasa.gov] Sent: Wednesday, January

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
The value in the srcANYSTRStrCI field is "miXCAse or LowER", according to Luke. -Original Message- From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] [mailto:timothy.j.har...@nasa.gov] Sent: Wednesday, January 13, 2010 11:31 AM To: solr-user@lucene.apache.org Subject: RE: case-insensitive

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
I created a document that has a string field and a case-insensitive string field using my string_ci type; both have the same value sent at document creation time: "miXCAse or LowER". I attach two debug query results, one against the string type and one against mine. The query is only different

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
That seems to work. But why? Does string type not support LowerCaseFilterFactory? Or KeywordTokenizerFactory? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, January 13, 2010 11:51 AM To: solr-user@lucene.apache.org Subject: RE: case-insensitive

RE: case-insensitive string type

2010-01-13 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
Thanks, I know I read that sometime back, but I guess I thought that was because there were no analyzer tags defined on the string field in the schema. I guess because I'm still kind of a noob, I didn't take that to mean that it couldn't be made to have analyzers. A subtle but important

Re: Multi language support

2010-01-13 Thread Lance Norskog
Robert Muir: Thank you for the pointer to that paper! On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht p...@activemath.org wrote: Isn't the conclusion here that some stopword- and stemming-free matching should be the best match, if ever, and to then gently degrade to weaker forms of matching?

Re: copyField with Analyzer?

2010-01-13 Thread Lance Norskog
You can do this filtering in the DataImportHandler. The regular expression tool is probably enough: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer On Wed, Jan 13, 2010 at 8:57 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote: Hi all, I tried

Re: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Lance Norskog
The time in autocommit is in milliseconds. You are committing every second while indexing. This then causes a build-up of successive index readers that absorb each commit, which is probably the cause of the out-of-memory. On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick nick.minute...@credit-suisse.com

RE: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Minutello, Nick
Agreed, commit every second. Assuming I understand what you're saying correctly: there shouldn't be any index readers, as at this point I'm just writing to the index. Did I understand correctly what you meant? -Nick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent:

Need deployment strategy

2010-01-13 Thread Paul Rosen
Hi all, The way the indexing works on our system is as follows: We have a separate staging server with a copy of our web app. The clients will index a number of documents in a batch on the staging server (this happens about once a week), then they play with the results on the staging server

RE: Problem committing on 40GB index

2010-01-13 Thread Sven Maurmann
Hi! Garbage collection is an issue of the underlying JVM. You may use -XX:+PrintGCDetails as an argument to your JVM in order to collect details of the garbage collection. If you also use the parameter -XX:+PrintGCTimeStamps you get the time stamps of the garbage collection. For further

Re: Question

2010-01-13 Thread Otis Gospodnetic
Bill, If you are using Solr 1.4, don't bother with rsync, use the Java-based replication - info on zee Wiki. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch From: Bill Bell bb...@kaango.com To: solr-user@lucene.apache.org Sent: Wed, January

Re: Queries of type field:value not functioning

2010-01-13 Thread Otis Gospodnetic
Hi, Pointers: * What happens when you don't use a field name? * What are your logs showing? * What is debugQuery=on showing? * What is the Analysis page for some of the problematic queries showing? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

Re: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Ryan McKinley
On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote: Agreed, commit every second. Do you need the index to be updated this often? Are you reading from it every second and need results that are that fresh? If not, I imagine increasing the auto-commit time to 1 min or even 10 secs would

RE: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Minutello, Nick
if you are using auto-commit, you should not call commit from the client Cheers, thanks. Do you need the index to be updated this often? Wouldn't increasing the autocommit time make it worse? (i.e. more documents buffered) I can extend it and see what effect it has -Nick -Original

Re: How to display Highlight with VelocityResponseWriter?

2010-01-13 Thread Sascha Szott
Hi Qiuyan, Thanks a lot. It works now. When I added the line #set($hl = $response.highlighting) I got the highlighting. But I wonder if there's any document that describes the usage of that. I mean I didn't know the names of those methods. Actually I just managed to guess it. Solritas (aka

RE: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Minutello, Nick
Hm, Ryan, you may have inadvertently solved the problem. :) Going flat out in a loop, indexing 1 doc at a time, I can only index about 17,000 per minute - roughly what I was seeing with my app running... which makes me suspicious. The number is too close to be coincidental. It could very well
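[Editorial sketch] Roughly the batching being hinted at here: buffer documents and send them in groups instead of one add (and one commit) per document. The batch size, URL, and field names below are made up for illustration, and the single commit at the end assumes autoCommit (or a later explicit commit) covers freshness needs.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexingSketch {
        private static final int BATCH_SIZE = 1000;   // assumed batch size, tune as needed

        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>(BATCH_SIZE);
            for (int i = 0; i < 100000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("body", "some generated text " + i);
                buffer.add(doc);

                if (buffer.size() == BATCH_SIZE) {
                    server.add(buffer);   // one HTTP round trip per batch, not per document
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                server.add(buffer);
            }
            server.commit();   // a single commit at the end
        }
    }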

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-13 Thread Chris Hostetter
: Dedupe is completely the wrong word. Deduping is something else : entirely - it is about trying not to index the same document twice. Dedup can also certainly be used with field collapsing -- that was one of the initial use cases identified for the SignatureUpdateProcessorFactory ... you can

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-13 Thread Kelly Taylor
Hoss, Would you suggest using dedup for my use case; and if so, do you know of a working example I can reference? I don't have an issue using the patched version of Solr, but I'd much rather use the GA version. -Kelly hossman wrote: : Dedupe is completely the wrong word. Deduping is

Re: What is this error means?

2010-01-13 Thread Ellery Leung
Hi Israel, Thank you for your response. However, I used both ini_set and set the _defaultTimeout to 6000, but the error still occurs with the same error message. Now, when I start building the index, the error pops up much faster than it did before the change. So do you have any idea? Thank you in advance

RE: Reverse sort facet query [SOLR-1672]

2010-01-13 Thread Chris Hostetter
: i.e. just extend facet.sort to allow a 'count desc'. By convention, ok : to use a space in the name? - or would count.desc (and count.asc as : alias for count) be more compliant? i would use space to remain consistent with the existing sort param. it might even make sense to refactor

Re: What is this error means?

2010-01-13 Thread Ellery Leung
Here is a workaround for this issue: on line 382 of SolrPhpClient/Apache/Solr/Service.php, I changed it to: while (true) { $str = file_get_contents($url, false, $this->_postContext); if (empty($str) == false) { break; } } $response = new

Re: Queries of type field:value not functioning

2010-01-13 Thread Siddhant Goel
Hi, Thanks for the responses. q.alt did the job. Turns out that the dismax query parser was at fault and wasn't able to handle queries of the type *:*. Putting the query in q.alt, or adding defType=lucene (as pointed out to me on the IRC channel), worked. Thanks, -- - Siddhant