I am pulling some fields from a mysql database using DataImportHandler and
some of them have invalid XML in them. Does DataImportHandler do any kind
of filtering/sanitizing to ensure that it will go in OK or is it all on me?
Example bad data: orphaned ampersands (Peanut Butter & Jelly), curly
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:
It would be vastly preferable if Solr could just exit when it gets a
memory
error, because we have it running under daemontools, and that would cause
an automatic restart.
-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
Run
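For context, the flag takes a quoted command string; a sketch of a start line under daemontools, assuming Jetty's start.jar and a JVM recent enough to expand %p to the process id:

```
java -Xmx1024m -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar
```

Once the JVM dies on an OutOfMemoryError, daemontools' supervise restarts it automatically.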
On Fri, Aug 10, 2012 at 2:44 AM, Jason Axelson jaxel...@referentia.com wrote:
You're correct that there is an underlying problem I'm trying to
solve. The underlying problem is that due to the security policies I
cannot run another service that listens on a TCP port, but a unix
domain socket
* have a value as well - it is getting indexed correctly.
Furthermore, the number of warnings I get seems arbitrary. I imported one
document (debug mode) and I got roughly 400 of those warning messages for
the single field.
-----Original Message-----
From: Jon Drukman [mailto:jdruk
On Wed, Aug 8, 2012 at 3:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
I can't reproduce with the example configs -- it looks like you've
tweaked the logging to use the XML file format, anyway to get the
stacktrace of the Caused by exception so we can see what is null and
where?
New install of Solr 3.6.1, getting a Null Pointer Exception when trying to
access admin/stats.jsp:
<record>
<date>2012-08-08T17:55:09</date>
<millis>138509624</millis>
<sequence>694</sequence>
<logger>org.apache.solr.servlet.SolrDispatchFilter</logger>
<level>SEVERE</level>
I have a very small Solr setup. The index is 32MB and there are only 8
fields, most of which are ints. I run a cron job every hour to use
DataImportHandler to do a full reimport of a database which has 42,600 rows.
There is minimal traffic on the server. Maybe a few dozen queries a
minute.
last week.
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
Michael Della Bitta
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com
On Mon, Jul 9, 2012 at 1:13 PM, Jon Drukman jdruk
?
Michael
On Tue, May 15, 2012 at 4:33 PM, Jon Drukman jdruk...@gmail.com wrote:
I have a machine which does a full update using DataImportHandler every
hour. It worked up until a little while ago. I did not change the
dataconfig.xml or version of Solr.
Here is the beginning of the error
and get this fixed in DIH.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Jon Drukman [mailto:jdruk...@gmail.com]
Sent: Tuesday, May 15, 2012 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception in DataImportHandler (stack
OK, setting the wait_timeout back to its previous value and adding readOnly
didn't help, I got the stack overflow again. I re-upped the mysql timeout
value again.
-jsd-
On Tue, May 15, 2012 at 2:42 PM, Jon Drukman jdruk...@gmail.com wrote:
I fixed it for now by upping the wait_timeout
I don't even know what to call this feature. Here's a website that shows
the problem:
http://pulse.audiusanews.com/pulse/index.php
Notice that you can end up in a situation where there are no results.
For example,
in order, press: People, Performance, Technology, Photos. The client
wants it so
I want a string field that is case insensitive. This is what I tried:
<fieldType name="cistring" class="solr.StrField" sortMissingLast="true"
omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>
<analyzer type="query">
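For what it's worth, the usual reason this attempt fails is that solr.StrField ignores analyzers entirely; a sketch of the common workaround, using solr.TextField with a keyword tokenizer so the whole value stays a single lowercased token (type name kept from the original attempt):

```xml
<fieldType name="cistring" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the entire value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```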
Ahmet Arslan iorixxx at yahoo.com writes:
I want a string field that is case
insensitive. This is what I tried:
<fieldType name="cistring" class="solr.StrField"
sortMissingLast="true"
omitNorms="true">
<analyzer type="index">
<tokenizer
The performance factors wiki says:
If you do a lot of field based sorting, it is advantageous to add explicit
warming queries to the newSearcher and firstSearcher event listeners in your
solrconfig which sort on those fields, so the FieldCache is populated prior to
any queries being executed by
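The wiki passage above translates into listener entries in solrconfig.xml; a sketch, where `price` stands in for whatever field you actually sort on:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```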
I am trying to use the regex transformer but it's not returning anything.
Either my regex is wrong, or I've done something else wrong in the setup of the
entity. Is there any way to debug this? Making a change and waiting 7 minutes
to reindex the entity sucks.
<entity name="boxshot"
So I'm trying to update a single entity in my index using DataImportHandler.
http://solr:8983/solr/dataimport?command=full-import&entity=games
It ends near-instantaneously without hitting the database at all, apparently.
Status shows:
<str name="Total Requests made to DataSource">0</str>
<str
Ahmet Arslan iorixxx at yahoo.com writes:
I've got a DataImportHandler set up
with 5 entities. I would like to do a full
import on just one entity. Is that possible?
Yes, there is a parameter named entity for that.
solr/dataimport?command=full-import&entity=myEntity
That seems
I've got a DataImportHandler set up with 5 entities. I would like to do a full
import on just one entity. Is that possible?
I worked around it temporarily by hand editing the dataimport.properties file
and deleting the delta line for that one entity, and kicking off a delta. But
for
I've got a document with a type field. If the type is 1, I want to boost the
document's relevancy, but type=1 is not a requirement. Types other than 1
should still be returned and scored as normal, just without the boost.
How do I do this?
-jsd-
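One way to get this (a sketch, not necessarily the answer given on the list): with dismax, a boost query adds score for type:1 without filtering anything out:

```
q=call+of+duty&defType=dismax&bq=type:1^2.0
```

With the standard parser, an optional boosted clause does the same: `(your query) type:1^2.0` -- documents of other types still match and score normally.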
I want to search two fields for the phrase Call Of Duty. I tried this:
(title:"Call of Duty" OR subhead:"Call of Duty")
No matches, despite the fact that there are many documents that should match.
So I left out the quotes, and it seems to work. But now when I try doing things
like
title:Call of
Ahmet Arslan iorixxx at yahoo.com writes:
(title:"Call of Duty" OR subhead:"Call of Duty")
No matches, despite the fact that there are many documents
that should match.
Field types of title and subhead are important here. Do you use
StopFilterFactory with enablePositionIncrements
On 4/27/10 12:04 PM, Chris Hostetter wrote:
: SEVERE: Could not start SOLR. Check solr/home property
it means something went horribly wrong when starting solr, and since this
is frequently caused by either an incorrect explicit solr/home or an
incorrect implicitly guessed solr home, that is
On 4/26/10 1:18 PM, Siddhant Goel wrote:
Did you by any chance set up multicore? Try passing in the path to the Solr
home directory as -Dsolr.solr.home=/path/to/solr/home while you start Solr.
Nope, no multicore.
I destroyed the index and re-created it from scratch and now it works
fine. No
I have a very simple schema: two integers and two text fields.
<fields>
<field name="answer_id" type="integer" indexed="true" stored="true"
required="true" />
<field name="question" type="text" indexed="true" stored="true"/>
<field name="question_source" type="integer" indexed="true"
stored="true"/>
<field
First, let me just say that DataImportHandler is fantastic. It got my
old mysql-php-xml index rebuild process down from 30 hours to 6 minutes.
I'm trying to use the delta-import functionality now but failing miserably.
Here's my entity tag: (some SELECT statements reduced to increase
Yonik Seeley wrote:
Not sure... I just took the stock solr example, and it worked fine.
I inserted o'meara into example/exampledocs/solr.xml
<field name="features">Advanced o'meara Full-Text Search
Capabilities using Lucene</field>
then indexed everything: ./post.sh *.xml
Then queried in various
Yonik Seeley wrote:
On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman jdruk...@gmail.com wrote:
is it possible to make solr think that omeara and o'meara are the same
thing?
WordDelimiter would handle it if the document had o'meara (but you
may or may not want the other stuff that comes
is it possible to make solr think that omeara and o'meara are the
same thing?
-jsd-
Otis Gospodnetic wrote:
I'd say: Make sure you don't commit more frequently than the time it takes for your
searcher to warm up, or else you risk searcher overlap and pile-up.
cool. i found a place in our code where we were committing the same
thing twice in very rapid succession. fingers
Otis Gospodnetic wrote:
Jon,
If you can, don't commit on every update and that should help or fully solve
your problem.
is there any sort of heuristic or formula i can apply that can tell me
when to commit? put it in a cron job and fire it once per hour?
there are certain updates that
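One alternative to a cron-driven commit (a sketch, assuming Solr's standard autocommit support in solrconfig.xml) is to let the update handler commit by document count or elapsed time instead of committing on every update:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```

The numbers here are illustrative; tune them so commits stay further apart than searcher warmup time.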
Otis Gospodnetic wrote:
That should be fine (but apparently isn't), as long as you don't have some very
slow machine or if your caches are large and configured to copy a lot of
data on commit.
this is becoming more and more problematic. we have periods where we
get 10 of these
I am getting hit by a storm of these once a day or so:
SEVERE: org.apache.solr.common.SolrException: Error opening new
searcher. exceeded limit of maxWarmingSearchers=16, try again later.
I keep bumping up maxWarmingSearchers. It's at 32 now. Is there any
way to figure out what the right
Yonik Seeley wrote:
I'd advise setting it to a very low limit (like 2) and committing less
often. Once you get too many overlapping searchers, things will slow
to a crawl and that will just cause more to pile up.
The root cause is simply too many commits in conjunction with warming
too long.
Julian Davchev wrote:
Hi,
Any documents or something I can read on how locks work and how I can
control it. When do they occur, etc.?
Cause only way I got out of this mess was restarting tomcat
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: SingleInstanceLock:
Vannia Rajan wrote:
On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman jdruk...@gmail.com wrote:
if i go to /solr/admin/logging, i can set the root log level to WARNING,
which is what i want. however, every time solr restarts, it is set back to
INFO. Is there a way to get the WARNING level
Is there any way to tell Solr that Stephen is the same as Steven and
Steve? Carl and Karl? Bobby/Bob/Robert, and so on...
-jsd-
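The usual answer for name equivalences like this is a synonyms file wired into the field type's analyzer in schema.xml; a sketch, with illustrative file and entry names:

```xml
<!-- names.txt (illustrative), one group per line:
     stephen, steven, steve
     carl, karl
     robert, bob, bobby -->
<filter class="solr.SynonymFilterFactory" synonyms="names.txt"
        ignoreCase="true" expand="true"/>
```

With expand="true" every name in a group matches every other name in that group, whichever one the document or query uses.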
I am getting this error quite frequently on my Solr installation:
SEVERE: org.apache.solr.common.SolrException: Error opening new
searcher. exceeded limit of maxWarmingSearchers=8, try again later.
I've done some googling but the common explanation of it being related
to autocommit doesn't
Feak, Todd wrote:
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number.
Most of them say warmupTime=0. It ranges from 0 to
Norberto Meijome wrote:
On Tue, 07 Oct 2008 09:27:30 -0700
Jon Drukman [EMAIL PROTECTED] wrote:
Yep, you can fake it by only using fieldsets (qf) that have a
consistent set of stopwords.
does that mean changing the query or changing the schema?
Jon,
- you change schema.xml to define which
Mike Klaas wrote:
On 6-Oct-08, at 11:20 AM, Jon Drukman wrote:
Chris Hostetter wrote:
It's not a bug in the implementation, it's a side effect of the basic
tenet of how dismax works: since it inverts the input and creates a
DisjunctionMaxQuery for each word in the input, any word
Chris Hostetter wrote:
It's not a bug in the implementation, it's a side effect of the basic
tenet of how dismax works: since it inverts the input and creates a
DisjunctionMaxQuery for each word in the input, any word that is valid
in at least one of the qf fields generates a should clause
i have a document with the following field
<name>Saying goodbye to Norman</name>
if i search for saying goodbye to norman with the standard query, it
works fine. if i specify dismax, however, it does not match. here's
the output of debugQuery, which I don't understand at all:
<str
Martin Iwanowski wrote:
How can I setup to run Solr as a service, so I don't need to have a SSH
connection open?
The advice that I was given on this very list was to use daemontools. I
set it up and it is really great - starts when the machine boots,
auto-restart on failures, easy to bring
I have a dynamicField declaration:
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
I want to copy any *_t's into a text field for searching with dismax.
As it is, it appears you can't search dynamicfields this way.
I tried adding a copyField:
<copyField source="*_t" dest="text"/>
I do
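Assuming the copyField does fire, dismax still only searches the fields listed in qf, so the destination field has to appear there; a sketch of the relevant solrconfig.xml fragment (handler name illustrative, from the Solr 1.x era this thread dates to):

```xml
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- the copyField destination must be listed here to be searched -->
    <str name="qf">text</str>
  </lst>
</requestHandler>
```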
Sean Timm wrote:
Add echoParams=all to your URL and look for the cat field in one of
the passed parameters. Specifically, in pf and qf. These can be
defaulted in the solrconfig.xml file.
i tried that but the exception prevents solr from returning anything.
but i did look in solrconfig.xml
James liu wrote:
first, u should escape some string like (code by php)
function escapeChars($string) {
$string = str_replace("&", "&amp;", $string);
$string = str_replace("<", "&lt;", $string);
$string = str_replace(">", "&gt;", $string);
$string = str_replace("'", "&apos;", $string);
$string =
Daniel Papasian wrote:
Norberto Meijome wrote:
Thanks Yonik. ok, that matches what I've seen - if i know the actual
name of the field I'm after, I can use it in a query it, but i can't
use the dynamic_field_name_* (with wildcard) in the config.
Is adding support for this something that is
Is there a way to add a field to an existing index without stopping the
server, deleting the index, and reloading every document from scratch?
-jsd-
I just migrated my solr instance to a new server, running RHEL5.2. I
installed java from yum but I suspect it's different from the one I used
to use.
Anyway, my Solr no longer works.
2008-08-18 18:01:12.079::INFO: Logging to STDERR via
org.mortbay.log.StdErrLog
2008-08-18
Jon Drukman wrote:
I just migrated my solr instance to a new server, running RHEL5.2. I
installed java from yum but I suspect it's different from the one I used
to use.
Turns out my instincts were correct. The version from yum does not
work. I installed the official sun jdk and now
Jason Rennie wrote:
On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote:
Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite
familiar with daemontools.
Thanks!
:) My pleasure. Was nice to hear recently that DJB is moving toward more
flexible
Jason Rennie wrote:
On Tue, Aug 12, 2008 at 8:49 PM, Jon Drukman [EMAIL PROTECTED] wrote:
1. How do people deal with having solr start when system reboots, manage
the log output, etc. Right now I run it manually under a unix 'screen'
command with a wrapper script that takes care of restarts
1. How do people deal with having solr start when system reboots, manage
the log output, etc. Right now I run it manually under a unix 'screen'
command with a wrapper script that takes care of restarts when it
crashes. That means that only my user can connect to it, and it can't
happen when
Norberto Meijome wrote:
ok well let's say that i can live without john/jon in the short term.
what i really need today is a case insensitive wildcard search with
literal matching (no fancy stemming. bobby is bobby, not bobbi.)
what are my options?
Erik Hatcher wrote:
Jon,
You provided a lot of nice details, thanks for helping us help you :)
The one missing piece is the definition of the text field type. In
Solr's _example_ schema, bobby gets analyzed (stemmed) to
bobbi[1]. When you query for bobby*, the query parser is not running
Erik Hatcher wrote:
No, because the original data is <str name="name">Bobby Gaza</str>, so
Bobby* would match, but not bobby*. string type (in the example
schema, to be clear) does effectively no analysis, leaving the original
string indexed as-is, case and all.
[...]
stemming and wildcard
I am going to store two totally different types of documents in a single
solr instance. Eventually I may separate them into separate instances
but we are a long way from having either the size or traffic to require
that.
I read somewhere that a good approach is to add a 'type' field to the
I am brand new to Solr. I am trying to get a very simple setup running.
I've got just a few fields: name, description, tags. I am only able
to search on the default field (name) however. I tried to set up the
dismax config to search all the fields, but I never get any results on
the other
Yonik Seeley wrote:
<field name="id" type="integer" indexed="true" stored="true"
required="true" />
<field name="name" type="text" indexed="true" stored="true"/>
<field name="description" type="string" indexed="true" stored="true"/>
There is your issue: type string indexes the whole field value as a
single token.
You
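The fix Yonik is pointing at (a sketch): switch description to the analyzed text type so individual words are tokenized and searchable, rather than the whole value being one token:

```xml
<field name="description" type="text" indexed="true" stored="true"/>
```

Reindexing is required after the schema change for existing documents to pick up the new analysis.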
Yonik Seeley wrote:
Verify all the fields you want to search on are indexed
Verify that the query is being correctly built by adding
debugQuery=true to the request
here is the schema.xml extract:
<field name="id" type="integer" indexed="true" stored="true"
required="true" />
<field name="name" type="text"