Does DataImportHandler do any sanitizing?

2012-08-15 Thread Jon Drukman
I am pulling some fields from a mysql database using DataImportHandler and some of them have invalid XML in them. Does DataImportHandler do any kind of filtering/sanitizing to ensure that it will go in OK or is it all on me? Example bad data: orphaned ampersands (Peanut Butter & Jelly), curly

Re: Running out of memory

2012-08-13 Thread Jon Drukman
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote: It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart. -XX:OnOutOfMemoryError="<cmd args>; <cmd args>" Run

Re: Connect to SOLR over socket file

2012-08-10 Thread Jon Drukman
On Fri, Aug 10, 2012 at 2:44 AM, Jason Axelson jaxel...@referentia.com wrote: You're correct that there is an underlying problem I'm trying to solve. The underlying problem is that due to the security policies I cannot run another service that listens on a TCP port, but a unix domain socket

Re: DataImportHandler WARNING: Unable to resolve variable

2012-08-10 Thread Jon Drukman
* have a value as well - it is getting indexed correctly. Furthermore, the number of warnings I get seems arbitrary. I imported one document (debug mode) and I got roughly ~400 of those warning messages for the single field. -Original Message- From: Jon Drukman [mailto:jdruk

Re: /solr/admin/stats.jsp null pointer exception

2012-08-09 Thread Jon Drukman
On Wed, Aug 8, 2012 at 3:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I can't reproduce with the example configs -- it looks like you've tweaked the logging to use the XML file format, any way to get the stacktrace of the Caused by exception so we can see what is null and where?

/solr/admin/stats.jsp null pointer exception

2012-08-08 Thread Jon Drukman
New install of Solr 3.6.1, getting a Null Pointer Exception when trying to access admin/stats.jsp: <record> <date>2012-08-08T17:55:09</date> <millis>138509624</millis> <sequence>694</sequence> <logger>org.apache.solr.servlet.SolrDispatchFilter</logger> <level>SEVERE</level>

Solr always at 100% (or more) CPU

2012-07-09 Thread Jon Drukman
I have a very small Solr setup. The index is 32MB and there are only 8 fields, most of which are ints. I run a cron job every hour to use DataImportHandler to do a full reimport of a database which has 42,600 rows. There is minimal traffic on the server. Maybe a few dozen queries a minute.

Re: Solr always at 100% (or more) CPU

2012-07-09 Thread Jon Drukman
last week. http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/ Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Mon, Jul 9, 2012 at 1:13 PM, Jon Drukman jdruk

Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
? Michael On Tue, May 15, 2012 at 4:33 PM, Jon Drukman jdruk...@gmail.com wrote: I have a machine which does a full update using DataImportHandler every hour. It worked up until a little while ago. I did not change the dataconfig.xml or version of Solr. Here is the beginning of the error

Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
and get this fixed in DIH. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jon Drukman [mailto:jdruk...@gmail.com] Sent: Tuesday, May 15, 2012 4:12 PM To: solr-user@lucene.apache.org Subject: Re: Exception in DataImportHandler (stack

Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
OK, setting the wait_timeout back to its previous value and adding readOnly didn't help, I got the stack overflow again. I re-upped the mysql timeout value again. -jsd- On Tue, May 15, 2012 at 2:42 PM, Jon Drukman jdruk...@gmail.com wrote: I fixed it for now by upping the wait_timeout

Facet auto-suggest

2012-01-17 Thread Jon Drukman
I don't even know what to call this feature. Here's a website that shows the problem: http://pulse.audiusanews.com/pulse/index.php Notice that you can end up in a situation where there are no results. For example, in order, press: People, Performance, Technology, Photos. The client wants it so

Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
I want a string field that is case insensitive. This is what I tried: <fieldType name="cistring" class="solr.StrField" sortMissingLast="true" omitNorms="true"> <analyzer type="index"> <tokenizer class="solr.LowerCaseTokenizerFactory"/> </analyzer> <analyzer type="query"

Re: Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
Ahmet Arslan iorixxx at yahoo.com writes: I want a string field that is case insensitive. This is what I tried: <fieldType name="cistring" class="solr.StrField" sortMissingLast="true" omitNorms="true"> <analyzer type="index"> <tokenizer

Sorting - bad performance

2011-02-22 Thread Jon Drukman
The performance factors wiki says: If you do a lot of field based sorting, it is advantageous to add explicitly warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by

DataImportHandler: regex debugging

2011-02-09 Thread Jon Drukman
I am trying to use the regex transformer but it's not returning anything. Either my regex is wrong, or I've done something else wrong in the setup of the entity. Is there any way to debug this? Making a change and waiting 7 minutes to reindex the entity sucks. <entity name="boxshot"

DataImportHandler: no queries when using entity=something

2011-02-02 Thread Jon Drukman
So I'm trying to update a single entity in my index using DataImportHandler. http://solr:8983/solr/dataimport?command=full-import&entity=games It ends near-instantaneously without hitting the database at all, apparently. Status shows: <str name="Total Requests made to DataSource">0</str> <str

Re: DataImportHandler: full import of a single entity

2011-01-18 Thread Jon Drukman
Ahmet Arslan iorixxx at yahoo.com writes: I've got a DataImportHandler set up with 5 entities. I would like to do a full import on just one entity. Is that possible? Yes, there is a parameter named entity for that. solr/dataimport?command=full-import&entity=myEntity That seems
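
For anyone scripting this, here is a minimal sketch of building the single-entity request quoted above. The host and handler path are taken from the URL earlier in this thread listing and are assumptions about any particular install; `dih_url` is a made-up helper name, not part of Solr.

```python
from urllib.parse import urlencode


def dih_url(base, command, entity=None):
    """Build a DataImportHandler request URL.

    `base` is the handler address (assumed here, adjust for your install);
    `entity` limits the command to one entity from the data config.
    """
    params = {"command": command}
    if entity is not None:
        params["entity"] = entity
    # urlencode joins the parameters with "&", the character that is easy
    # to lose when the URL is pasted through HTML or a shell.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(dih_url("http://solr:8983/solr/dataimport", "full-import", entity="myEntity"))
```

When running the resulting URL through curl or wget from a shell, quote it so the `&` is not treated as a background operator.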

DataImportHandler: full import of a single entity

2011-01-14 Thread Jon Drukman
I've got a DataImportHandler set up with 5 entities. I would like to do a full import on just one entity. Is that possible? I worked around it temporarily by hand editing the dataimport.properties file and deleting the delta line for that one entity, and kicking off a delta. But for

Boosting on a document value

2010-11-15 Thread Jon Drukman
I've got a document with a type field. If the type is 1, I want to boost the document's relevancy, but type=1 is not a requirement. Types other than 1 should still be returned and scored as normal, just without the boost. How do I do this? -jsd-
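
One possible approach, sketched under assumptions: if the request goes through the dismax query parser, its `bq` (boost query) parameter adds an optional clause that raises the score of matching documents without excluding the rest. The replies to this message are not shown here, so this is not necessarily what the list recommended; the weight of 5.0 and the helper name are arbitrary.

```python
from urllib.parse import urlencode, parse_qs


def boosted_params(user_query, field="type", value="1", weight=5.0):
    """Return an encoded query string asking dismax to boost field:value.

    Assumes the dismax parser is available as defType=dismax and that
    `field` is indexed; the weight is an illustrative guess to be tuned.
    """
    params = {
        "defType": "dismax",
        "q": user_query,
        # Optional clause: documents with type:1 score higher,
        # documents without it are still returned.
        "bq": f"{field}:{value}^{weight}",
    }
    return urlencode(params)


if __name__ == "__main__":
    encoded = boosted_params("some search terms")
    print(encoded)
    print(parse_qs(encoded)["bq"])
```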

Searching with AND + OR and spaces

2010-11-12 Thread Jon Drukman
I want to search two fields for the phrase "Call Of Duty". I tried this: (title:"Call of Duty" OR subhead:"Call of Duty") No matches, despite the fact that there are many documents that should match. So I left out the quotes, and it seems to work. But now when I try doing things like title:Call of
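
A note on the unquoted form: with the standard Lucene query syntax, a field prefix applies only to the term (or quoted phrase) directly after it, so in `title:Call of Duty` only `Call` is searched in `title` and the remaining words go to the default field. A small sketch of building the quoted form for several fields follows; the function name is made up for illustration.

```python
def phrase_across_fields(phrase, fields):
    """Build (f1:"phrase" OR f2:"phrase" ...) for the Lucene query syntax.

    Backslashes and double quotes inside the phrase are escaped so the
    phrase cannot terminate its own quotes.
    """
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{field}:"{escaped}"' for field in fields]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(phrase_across_fields("Call of Duty", ["title", "subhead"]))
```

Whether the quoted phrase actually matches still depends on how the field type analyzes the text, which is what the reply in this thread asks about.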

Re: Searching with AND + OR and spaces

2010-11-12 Thread Jon Drukman
Ahmet Arslan iorixxx at yahoo.com writes: (title:"Call of Duty" OR subhead:"Call of Duty") No matches, despite the fact that there are many documents that should match. Field types of title and subhead are important here. Do you use stopwordfilterfactory with enable position

Re: SEVERE: Could not start SOLR. Check solr/home property

2010-04-28 Thread Jon Drukman
On 4/27/10 12:04 PM, Chris Hostetter wrote: : SEVERE: Could not start SOLR. Check solr/home property it means something went horribly wrong when starting solr, and since this is frequently caused by either an incorrect explicit solr/home or an incorrect implicitly guessed solr home, that is

Re: SEVERE: Could not start SOLR. Check solr/home property

2010-04-26 Thread Jon Drukman
On 4/26/10 1:18 PM, Siddhant Goel wrote: Did you by any chance set up multicore? Try passing in the path to the Solr home directory as -Dsolr.solr.home=/path/to/solr/home while you start Solr. Nope, no multicore. I destroyed the index and re-created it from scratch and now it works fine. No

Boost documents based on a constant value in a field

2010-02-05 Thread Jon Drukman
I have a very simple schema: two integers and two text fields. <fields> <field name="answer_id" type="integer" indexed="true" stored="true" required="true" /> <field name="question" type="text" indexed="true" stored="true"/> <field name="question_source" type="integer" indexed="true" stored="true"/> <field

DataImportHandler delta-import confusion

2010-02-01 Thread Jon Drukman
First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase

Re: stemming (maybe?) question

2009-03-17 Thread Jon Drukman
Yonik Seeley wrote: Not sure... I just took the stock solr example, and it worked fine. I inserted o'meara into example/exampledocs/solr.xml <field name="features">Advanced o'meara Full-Text Search Capabilities using Lucene</field> then indexed everything: ./post.sh *.xml Then queried in various

Re: stemming (maybe?) question

2009-03-16 Thread Jon Drukman
Yonik Seeley wrote: On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman jdruk...@gmail.com wrote: is it possible to make solr think that omeara and o'meara are the same thing? WordDelimiter would handle it if the document had o'meara (but you may or may not want the other stuff that comes

stemming (maybe?) question

2009-03-12 Thread Jon Drukman
is it possible to make solr think that omeara and o'meara are the same thing? -jsd-

Re: exceeded limit of maxWarmingSearchers

2009-02-09 Thread Jon Drukman
Otis Gospodnetic wrote: I'd say: Make sure you don't commit more frequently than the time it takes for your searcher to warm up, or else you risk searcher overlap and pile-up. cool. i found a place in our code where we were committing the same thing twice in very rapid succession. fingers

Re: exceeded limit of maxWarmingSearchers

2009-02-05 Thread Jon Drukman
Otis Gospodnetic wrote: Jon, If you can, don't commit on every update and that should help or fully solve your problem. is there any sort of heuristic or formula i can apply that can tell me when to commit? put it in a cron job and fire it once per hour? there are certain updates that

Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Jon Drukman
Otis Gospodnetic wrote: That should be fine (but apparently isn't), as long as you don't have some very slow machine or if your caches are large and configured to copy a lot of data on commit. this is becoming more and more problematic. we have periods where we get 10 of these

exceeded limit of maxWarmingSearchers

2009-01-30 Thread Jon Drukman
I am getting hit by a storm of these once a day or so: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=16, try again later. I keep bumping up maxWarmingSearchers. It's at 32 now. Is there any way to figure out what the right

Re: exceeded limit of maxWarmingSearchers

2009-01-30 Thread Jon Drukman
Yonik Seeley wrote: I'd advise setting it to a very low limit (like 2) and committing less often. Once you get too many overlapping searchers, things will slow to a crawl and that will just cause more to pile up. The root cause is simply too many commits in conjunction with warming too long.
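
The "commit less often" advice can be sketched on the client side as a simple throttle: updates are sent as usual, but a commit is only issued if enough time has passed since the previous one. This is an illustration, not anything from the thread; the class name and the 60-second interval are assumptions, and the interval should be at least as long as a searcher takes to warm.

```python
import time


class CommitThrottle:
    """Allow at most one commit per `min_interval_s` seconds."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_commit = None     # time of the last allowed commit

    def should_commit(self):
        """Return True (and record the time) if a commit is allowed now."""
        now = self._clock()
        if self._last_commit is None or now - self._last_commit >= self.min_interval_s:
            self._last_commit = now
            return True
        return False


if __name__ == "__main__":
    throttle = CommitThrottle(min_interval_s=60)
    for doc_id in range(3):
        # ... send the update for doc_id here ...
        if throttle.should_commit():
            print("commit after update", doc_id)
```

A caller using this needs one final unconditional commit at the end of a batch, otherwise the last few updates stay invisible until the next allowed commit.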

Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Jon Drukman
Julian Davchev wrote: Hi, Any documents or something I can read on how locks work and how I can control it. When do they occur etc. Cause only way I got out of this mess was restarting tomcat SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock:

Re: permanently setting log level?

2009-01-29 Thread Jon Drukman
Vannia Rajan wrote: On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman jdruk...@gmail.com wrote: if i go to /solr/admin/logging, i can set the root log level to WARNING, which is what i want. however, every time solr restarts, it is set back to INFO. Is there a way to get the WARNING level

Handling proper names

2008-11-07 Thread Jon Drukman
Is there any way to tell Solr that Stephen is the same as Steven and Steve? Carl and Karl? Bobby/Bob/Robert, and so on... -jsd-

exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman
I am getting this error quite frequently on my Solr installation: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=8, try again later. I've done some googling but the common explanation of it being related to autocommit doesn't

Re: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman
Feak, Todd wrote: Have you looked at how long your warm up is taking? If it's taking longer to warm up a searcher than it does for you to do an update, you will be behind the curve and eventually run into this no matter how big that number. Most of them say warmupTime=0. It ranges from 0 to

dismax and stopwords (was Re: dismax and long phrases)

2008-10-09 Thread Jon Drukman
Norberto Meijome wrote: On Tue, 07 Oct 2008 09:27:30 -0700 Jon Drukman [EMAIL PROTECTED] wrote: Yep, you can fake it by only using fieldsets (qf) that have a consistent set of stopwords. does that mean changing the query or changing the schema? Jon, - you change schema.xml to define which

Re: dismax and long phrases

2008-10-07 Thread Jon Drukman
Mike Klaas wrote: On 6-Oct-08, at 11:20 AM, Jon Drukman wrote: Chris Hostetter wrote: It's not a bug in the implementation, it's a side effect of the basic tenet of how dismax works since it inverts the input and creates a DisjunctionMaxQuery for each word in the input, any word

Re: dismax and long phrases

2008-10-06 Thread Jon Drukman
Chris Hostetter wrote: It's not a bug in the implementation, it's a side effect of the basic tenet of how dismax works since it inverts the input and creates a DisjunctionMaxQuery for each word in the input, any word that is valid in at least one of the qf fields generates a should clause

dismax and long phrases

2008-10-03 Thread Jon Drukman
i have a document with the following field <name>Saying goodbye to Norman</name> if i search for saying goodbye to norman with the standard query, it works fine. if i specify dismax, however, it does not match. here's the output of debugQuery, which I don't understand at all: <str

Re: help required: how to design a large scale solr system

2008-09-24 Thread Jon Drukman
Martin Iwanowski wrote: How can I setup to run Solr as a service, so I don't need to have a SSH connection open? The advice that I was given on this very list was to use daemontools. I set it up and it is really great - starts when the machine boots, auto-restart on failures, easy to bring

How to use copyfield with dynamicfield?

2008-09-22 Thread Jon Drukman
I have a dynamicField declaration: <dynamicField name="*_t" type="text" indexed="true" stored="true"/> I want to copy any *_t's into a text field for searching with dismax. As it is, it appears you can't search dynamicfields this way. I tried adding a copyField: <copyField source="*_t" dest="text"/> I do

Re: dismax - undefined field exception

2008-09-22 Thread Jon Drukman
Sean Timm wrote: Add echoParams=all to your URL and look for the cat field in one of the passed parameters. Specifically, in pf and qf. These can be defaulted in the solrconfig.xml file. i tried that but the exception prevents solr from returning anything. but i did look in solrconfig.xml

Re: Illegal character in xml file

2008-09-19 Thread Jon Drukman
James liu wrote: first, u should escape some string like (code by php) function escapeChars($string) { $string = str_replace("&", "&amp;", $string); $string = str_replace("<", "&lt;", $string); $string = str_replace(">", "&gt;", $string); $string = str_replace("'", "&apos;", $string); $string =
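
For comparison, a rough Python take on the same idea as the PHP snippet quoted above: escape the XML special characters and also drop control characters that XML 1.0 does not allow at all. This is a sketch, not the code from the thread; `sanitize_for_xml` is a made-up name, and it assumes the input is raw text that has not already been escaped (already-escaped input would be escaped twice).

```python
import re
from xml.sax.saxutils import escape

# Control characters that are illegal in XML 1.0 documents
# (tab, newline and carriage return are allowed and kept).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_for_xml(text):
    """Strip illegal control characters, then escape & < > ' and "."""
    cleaned = _INVALID_XML_CHARS.sub("", text)
    # escape() handles &, < and > itself; quotes are passed as extras.
    return escape(cleaned, {"'": "&apos;", '"': "&quot;"})


if __name__ == "__main__":
    print(sanitize_for_xml("Peanut Butter & Jelly"))
```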

Re: Dismax + Dynamic fields

2008-09-18 Thread Jon Drukman
Daniel Papasian wrote: Norberto Meijome wrote: Thanks Yonik. ok, that matches what I've seen - if i know the actual name of the field I'm after, I can use it in a query, but i can't use the dynamic_field_name_* (with wildcard) in the config. Is adding support for this something that is

Adding a field?

2008-08-26 Thread Jon Drukman
Is there a way to add a field to an existing index without stopping the server, deleting the index, and reloading every document from scratch? -jsd-

Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman
I just migrated my solr instance to a new server, running RHEL5.2. I installed java from yum but I suspect it's different from the one I used to use. Anyway, my Solr no longer works. 2008-08-18 18:01:12.079::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2008-08-18

Re: Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman
Jon Drukman wrote: I just migrated my solr instance to a new server, running RHEL5.2. I installed java from yum but I suspect it's different from the one I used to use. Turns out my instincts were correct. The version from yum does not work. I installed the official sun jdk and now

Re: Administrative questions

2008-08-15 Thread Jon Drukman
Jason Rennie wrote: On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote: Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :) My pleasure. Was nice to hear recently that DJB is moving toward more flexible

Re: Administrative questions

2008-08-13 Thread Jon Drukman
Jason Rennie wrote: On Tue, Aug 12, 2008 at 8:49 PM, Jon Drukman [EMAIL PROTECTED] wrote: 1. How do people deal with having solr start when system reboots, manage the log output, etc. Right now I run it manually under a unix 'screen' command with a wrapper script that takes care of restarts

Administrative questions

2008-08-12 Thread Jon Drukman
1. How do people deal with having solr start when system reboots, manage the log output, etc. Right now I run it manually under a unix 'screen' command with a wrapper script that takes care of restarts when it crashes. That means that only my user can connect to it, and it can't happen when

Re: Wildcard search question

2008-06-24 Thread Jon Drukman
Norberto Meijome wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?

Re: Wildcard search question

2008-06-23 Thread Jon Drukman
Erik Hatcher wrote: Jon, You provided a lot of nice details, thanks for helping us help you :) The one missing piece is the definition of the text field type. In Solr's _example_ schema, bobby gets analyzed (stemmed) to bobbi[1]. When you query for bobby*, the query parser is not running

Re: Wildcard search question

2008-06-23 Thread Jon Drukman
Erik Hatcher wrote: No, because the original data is <str name="name">Bobby Gaza</str>, so Bobby* would match, but not bobby*. string type (in the example schema, to be clear) does effectively no analysis, leaving the original string indexed as-is, case and all. [...] stemming and wildcard

Best type to use for enum-like behavior

2008-06-12 Thread Jon Drukman
I am going to store two totally different types of documents in a single solr instance. Eventually I may separate them into separate instances but we are a long way from having either the size or traffic to require that. I read somewhere that a good approach is to add a 'type' field to the

Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman
I am brand new to Solr. I am trying to get a very simple setup running. I've got just a few fields: name, description, tags. I am only able to search on the default field (name) however. I tried to set up the dismax config to search all the fields, but I never get any results on the other

Re: Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman
Yonik Seeley wrote: <field name="id" type="integer" indexed="true" stored="true" required="true" /> <field name="name" type="text" indexed="true" stored="true"/> <field name="description" type="string" indexed="true" stored="true"/> There is your issue: type string indexes the whole field value as a single token. You

Re: Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman
Yonik Seeley wrote: Verify all the fields you want to search on are indexed. Verify that the query is being correctly built by adding debugQuery=true to the request. here is the schema.xml extract: <field name="id" type="integer" indexed="true" stored="true" required="true" /> <field name="name" type="text"