Hello,
We are trying to run Solr on JBoss 4.0.3, and are having an issue.
When we deploy the war and start our server we get a
ExceptionInInitializerError.
This is part of the stacktrace:
Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to
create an XPathFactory for
I think if you add a field that has an analyzer that creates tokens on
alpha/digit/punctuation boundaries, that should go a long way. Use that both
at index and search time.
For example:
* 3555LHP becomes 3555 LHP
Searching for D3555 becomes D OR 3555, so it matches on token 3555
from 3555LHP.
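A minimal illustration of that idea in plain Python (this is not Solr's analyzer API; the split_tokens name is invented for the sketch):

```python
import re

def split_tokens(text):
    """Split a term on alpha/digit boundaries, e.g. '3555LHP' -> ['3555', 'LHP']."""
    return re.findall(r"[A-Za-z]+|[0-9]+", text)

# Index time: 3555LHP is stored as the tokens 3555 and LHP.
indexed = split_tokens("3555LHP")   # ['3555', 'LHP']

# Search time: D3555 becomes D OR 3555, so the query matches on token 3555.
query = split_tokens("D3555")       # ['D', '3555']
matches = any(tok in indexed for tok in query)
```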
Hi there,
It looks a lot like using Solr's standard WordDelimiterFilter (see the sample
schema.xml) does what you need.
It splits on alphabetic-to-numeric boundaries and on the various kinds of
intra-word delimiters like -, _ or '.'. You can decide whether the parts
are put together again in
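A rough sketch of that splitting behavior in plain Python (hand-rolled for illustration only; the real filter is configured in schema.xml, and the function name and catenate_all flag here are invented stand-ins for its options):

```python
import re

def word_delimiter_split(term, catenate_all=False):
    """Split on alpha<->digit boundaries and on the -, _ and . delimiters,
    roughly like WordDelimiterFilter. Optionally emit the joined form too."""
    parts = [p for chunk in re.split(r"[-_.]", term)
             for p in re.findall(r"[A-Za-z]+|[0-9]+", chunk)]
    if catenate_all and len(parts) > 1:
        parts.append("".join(parts))   # the "put back together" variant
    return parts

word_delimiter_split("wi-fi_2.4GHz")                 # ['wi', 'fi', '2', '4', 'GHz']
word_delimiter_split("3555LHP", catenate_all=True)   # ['3555', 'LHP', '3555LHP']
```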
On 5/31/07, Gal Nitzan [EMAIL PROTECTED] wrote:
We have a small index with about 4 million docs.
On this index we have a field tags which is a multiple values field.
Running a facet query on the index with something like:
facet=true&facet.field=tags&q=type:video takes about 1 minute.
We have
On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
I am trying to override the tokenized attribute of a single FieldType from
the field attribute in schema.xml, but it doesn't seem to work
The tokenized attribute is not settable from the schema, and there
is no reason I can think of why
Thanks for the prompt response. Comments below ...
[EMAIL PROTECTED] wrote on 05/31/2007 10:55:57 AM:
On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
I am trying to override the tokenized attribute of a single FieldType from
the field attribute in schema.xml, but it doesn't seem to
On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
You say the tokenized attribute is not settable from the schema, but the
output from IndexSchema.readConfig shows that the properties are indeed
read, and the resulting SchemaField object retains these properties: are
they then ignored?
Hi,
I've had this application running before and I'm not sure what has
changed to cause this error. When trying to do a clean update
(removed the index dir and restarted Solr) with just a <commit/>, Solr is
returning a status 1 with this error at the top:
java.io.EOFException: input contained no
Thanks, but I think I'm going to have to work out a different solution. I
have written my own analyzer that does everything I need: it's not a
different analyzer I need but a way to specify that certain fields should
be tokenized and others not -- while still leaving all other options open.
As
OK figured this out. The short of it is, make sure your schema is
always up to date! : )
The schema did not match the XML docs being posted. And because we
had a previous Solr update with those docs, even trying to post/update
a <commit/> was failing because there was already bad data
: It looks a lot like using Solr's standard WordDelimiterFilter (see the
: sample schema.xml) does what you need.
WordDelimiterFilter will only get you so far. It can split the indexed
text of 3555LHP into the tokens 3555 and LHP, and the user-entered
D3555 into the tokens D and 3555 -- but because
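The message is cut off, but presumably the caveat is query semantics: with AND-style matching the extra D token blocks the match, while OR-style matching still hits. A plain-Python sketch of that distinction (illustrative only, not Solr's query parser):

```python
import re

def tokens(text):
    # Same alpha/digit boundary split discussed above.
    return re.findall(r"[A-Za-z]+|[0-9]+", text)

doc = set(tokens("3555LHP"))      # {'3555', 'LHP'}
q = tokens("D3555")               # ['D', '3555']

all(t in doc for t in q)          # False: AND semantics, D is missing
any(t in doc for t in q)          # True:  OR semantics still matches on 3555
```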
Chris Hostetter [EMAIL PROTECTED] wrote on 05/31/2007 02:28:58
PM:
I'm having a little trouble following this discussion, first off as to
your immediate issue...
: Thanks, but I think I'm going to have to work out a different solution. I
: have written my own analyzer that does everything
I solved something similar to this by creating a stemmer for part
numbers. Variations like -BN on the end can be treated as inflections
in the part number language, similar to plurals in English.
I used a set of regexes to match and transform, in some cases generating
multiple root part numbers.
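A sketch of that regex approach in plain Python (the suffix list and part-number shapes here are invented for illustration; a real stemmer would encode your own catalog's conventions):

```python
import re

# Hypothetical "inflection" suffixes (e.g. finish codes) to strip from the end.
SUFFIXES = re.compile(r"-(BN|CP|PB)$")

def part_number_roots(part):
    """Generate root forms for a part number, possibly several per input."""
    roots = {part}
    stripped = SUFFIXES.sub("", part)    # D3555-BN -> D3555
    roots.add(stripped)
    m = re.match(r"([A-Za-z]*)(\d+)", stripped)
    if m:
        roots.add(m.group(2))            # also emit the bare numeric root, 3555
    return sorted(roots)

part_number_roots("D3555-BN")   # ['3555', 'D3555', 'D3555-BN']
```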
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 31, 2007 9:07 PM
To: solr-user@lucene.apache.org
Subject: Re: facet question
On 31-May-07, at 1:33 AM, Gal Nitzan wrote:
Hi,
We have a small index with about 4 million docs.
On this index
: Unfortunately, unless I've missed something obvious, the tokenized
: property is not available to classes that extend FieldType: the setArgs()
: method of FieldType strips tokenized and other standard properties away
: before calling the init() method. Yes, of course one could override
:
On 31-May-07, at 1:35 PM, Gal Nitzan wrote:
However, the cache size brings us to the 2GB limit.
If the cardinality of many of the tags is low, you can use HashSet-based
filters (the default size at which a HashSet is used is 3000).
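Back-of-the-envelope arithmetic for why cardinality matters here (illustrative numbers, not measured from Solr; the ~12 bytes/entry hash-set cost is an assumption):

```python
# A bitset filter over N docs costs N/8 bytes no matter how few docs match;
# a hash-based set costs roughly a few bytes per matching doc.
num_docs = 4_000_000
bitset_bytes = num_docs // 8        # 500,000 bytes (~0.5 MB) per cached filter

# ~4096 cached bitset filters is already ~2 GB:
cache_bytes = 4096 * bitset_bytes   # 2,048,000,000 bytes

# A sparse tag matching only 1,000 docs, stored as a hash set
# at an assumed ~12 bytes per entry, is far smaller:
hashset_bytes = 1000 * 12           # 12,000 bytes
```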
[Gal Nitzan]
I will appreciate a pointer to documentation
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Friday, June 01, 2007 12:36 AM
To: solr-user@lucene.apache.org
Subject: Re: facet question
On 31-May-07, at 1:35 PM, Gal Nitzan wrote:
However, the cache size brings us to the 2GB limit.
If the
That's the thing: Terracotta persists everything it has in memory to the disk
when it overflows (you can set how much you want to keep in memory), or when the
server goes offline. When the server comes back, the master Terracotta simply
loads it back into the memory of the once offline
: Also, I'm still suspicious about your application. You have 1.5M
: distinct tags for 4M documents? That seems quite dense.
it's possible the app is using the filterCache for other things (on other
fields) besides just the tag field ... but that still doesn't explain one
thing...
:
: That's the thing: Terracotta persists everything it has in memory to the
: disk when it overflows (you can set how much you want to keep in memory), or
: when the server goes offline. When the server comes back, the master
: Terracotta simply loads it back into the memory of the once offline
Sure ..
I have Terracotta working with Lucene, and it works fine with the
RAMDirectory. I am trying to get it to work with Solr (hook the
RAMDirectory). When I do, I'll post the findings, problems, etc. Thanks for
the feedback from everyone.
Jeryl Cook
/^\ Pharaoh /^\