we're indexing around 10M records from a MySQL database into
a single Solr core.
The DataImportHandler needs to join 3 sub-entities to denormalize
the data.
We ran into some trouble on the first 2 attempts, but setting
batchSize="-1" on the dataSource resolved the issues.
Do you need a lo
-robert
Hi all,
I'm getting the following exception when using highlighting for a field
containing HTMLStripCharFilterFactory:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token
... exceeds length of provided text sized 21
It seems this is a known issue:
https://issues.apache.or
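For context, the field type looks roughly like this (a sketch, analyzer
details simplified):

    <fieldType name="html_text" class="solr.TextField">
      <analyzer>
        <!-- strips HTML before tokenization; token offsets still refer to
             positions in the original, unstripped character stream, which
             is what the highlighter compares against the provided text -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>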
On 18.04.11 09:23, Bill Bell wrote:
It runs delta imports faster. Normally you need to get the PKs that
changed and then run them through query="", which is slow when you have a
lot of IDs.
but the query="" only adds/updates entries. I'm not sure how to delete
entries
by running a query like "
Hi,
when using
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport to
periodically
run a delta-import, is it necessary to run a separate "normal"
delta-import after it to delete entries
from the index (using deletedPkQuery)?
If so, what's the point of using this method for r
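For reference, a sketch of the pattern from that wiki page (entity and
column names made up):

    <entity name="item" pk="id"
            query="SELECT id, title FROM item
                   WHERE '${dataimporter.request.clean}' != 'false'
                      OR updated_at &gt; '${dataimporter.last_index_time}'"
            deletedPkQuery="SELECT id FROM item
                            WHERE deleted = 1
                              AND updated_at &gt; '${dataimporter.last_index_time}'"/>
    <!-- command=full-import&clean=false only adds/updates documents;
         deletedPkQuery is only evaluated by command=delta-import -->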
value in the
"popularity" field. All results which have no match in "exact_match"
should use the "popularity" field for scoring.
Is this possible without using a function query?
thanks.
-robert
On 29.03.11 16:34, Erik Hatcher wrote:
On Mar 29, 2011, at
Hi all,
I'm trying to implement a FunctionQuery using the "bf" parameter of the
DisMaxQueryParser; however, I'm getting an exception:
"Unknown function min in FunctionQuery('min(1,2)', pos=4)"
The request that causes the error looks like this:
http://localhost:2345/solr/main/select?qt=dismax
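The full request is along these lines (a reconstruction; the q and bf
values here are illustrative):

    http://localhost:2345/solr/main/select?qt=dismax&q=some+title&bf=min(1,2)

1.4-era functions like recip() and rord() are documented on the
FunctionQuery wiki page, so possibly min() simply isn't available in this
Solr version.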
Hi,
we have 3 Solr cores, each of them running a delta-import every 2 minutes
against a MySQL database.
We've noticed a significant increase in MySQL queries per second since we
started the delta updates.
Before that, the database server received between 50 and 100 queries per
second,
si
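For context, each core's delta entity looks roughly like this (a simplified
sketch), which also explains the query volume: the deltaQuery runs every 2
minutes per entity, and then one deltaImportQuery fires per changed primary
key:

    <entity name="item" pk="id"
            query="SELECT id, title FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM item
                              WHERE id = '${dataimporter.delta.id}'"/>
    <!-- N changed rows mean N separate deltaImportQuery calls,
         multiplied across cores and sub-entities -->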
Hi again,
let's say you have 2 Solr instances which both have exactly the same
configuration (schema, solrconfig, etc.).
Could it cause any trouble if we import an index from a SQL database on Solr
instance A, and copy the whole
index to the datadir of Solr instance B (both Solr instances run
5:43, Tim Heckman wrote:
> 2010/12/15 Robert Gründler :
>> The data-config.xml looks like this (only 1 entity):
>> ... name="sf_unique_id"/>
> What version of Solr are you using?
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
-robert
Hi,
we're looking for some comparison benchmarks for importing large tables from a
MySQL database (full import).
Currently, a full-import of ~8 million rows from a MySQL database takes around
3 hours on a QuadCore machine with 16 GB of
RAM and a RAID 10 storage setup. Solr is running on a apa
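For comparison, the usual indexing-side knobs in solrconfig.xml (example
values, not a recommendation):

    <indexDefaults>
      <!-- a larger buffer means fewer segment flushes during bulk import -->
      <ramBufferSizeMB>256</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>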
problem was a bug in the driver that only showed up with very high
> disk load (as is the case when doing imports)
>
We're running FreeBSD:
RAID controller: 3ware 9500S-8
Corrupt unit: RAID-10, 3725.27GB, 256K stripe size, without BBU
FreeBSD 7.2, UFS filesystem.
> /Sven
>
Our sysadmin speculates that maybe the chunk size of our RAID/hard disks
and the segment size of the Lucene index do not play well together.
Does the Lucene segment size affect how the data is written to disk?
thanks for your help.
-robert
>
> Best
> Erick
>
Hi,
we have a serious hard disk problem, and it's definitely related to a
full-import from a relational
database into a Solr index.
The first time it happened was on our development server, where the RAID
controller crashed during a full-import
of ~8 million documents. This happened 2 weeks ago, and
e for documents that exactly match on 'author_exact'. I assume this is
> ok.
>
> I can't see a way to do it without functionqueries at the moment, which
> doesn't mean there isn't any.
>
> Hope that helps,
>
> Geert-Jan
Hi,
we have a requirement for one of our search results which has a quite complex
sorting strategy. Let me explain the document first, using an example:
The document is a book. It has several indexed text fields: Title, Author,
Distributor. It has two integer columns, where one reflects the num
Hi,
I'm suddenly getting a LockReleaseFailedException when starting a full-import
using the DataImportHandler:
org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a
NativeFSLock which is held by another indexer component
This worked without problems until just now. Is
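The lock in question is the write.lock file in the core's data/index
directory. solrconfig.xml has a force-unlock option, though with native
locks it only helps with lock files left behind by a crashed process (a
sketch, use with care):

    <mainIndex>
      <lockType>native</lockType>
      <!-- unsafe if another writer really is still active -->
      <unlockOnStartup>true</unlockOnStartup>
    </mainIndex>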
Hi,
is there a way to make Solr respect the order of token matches when the query
is a multi-term string?
Here's an example:
Query String: "John C"
Indexed Strings:
- "John Cage"
- "Cargill John"
This will return both indexed strings as a result. However, "Cargill John"
should not match in
It seems adding the '+' (required) operator to each term in a multi-term query
does the trick:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#+
ie: edgytext2:(+Martin +Sco)
-robert
On Nov 16, 2010, at 8:52 PM, Robert Gründler wrote:
> thanks for the explanat
If the index includes multi-word tokens with
> internal whitespace, they will never match. But the standard query parser
> doesn't "pre-tokenize" like this, it passes the whole phrase to the index
> intact.
>
> Robert Gründler wrote:
>>> Did you run your
Hi again,
we're coming closer to the rollout of our newly created Solr/Lucene based
search, and I'm wondering
how people handle changes to their schema on live systems.
In our case, we have 3 cores (i.e. A, B, C), where the largest one takes about 1.5
hours for a full dataimport from the relation
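One route that would avoid downtime is reindexing into a parallel core and
then swapping it in via the CoreAdmin API, e.g. (core names made up):

    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild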
>
> Did you run your query without using () and "" operators? If yes can you try
> this?
> &q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0
I didn't use () and "" in my query before. Using the query with those operators
works now; stopwords are thrown out as they should be, thanks.
However,
doesn't produce any EdgeNGram that would match "Bill Cl", so
> why is it even in the results?
>
> Thanks.
>
> --- On Thu, 11/11/10, Ahmet Arslan wrote:
>
>> You can add an additional field, with
>> using KeywordTokenizerFactory instead of
>> Wh
m.
> Will check the thread you mention.
>
> Best
>
> Nick
>
> On 11 Nov 2010, at 18:13, Robert Gründler wrote:
>
>> I've posted a ConcaFilter in my previous mail which does concatenate tokens.
>> This works fine, but I
>> realized that what
>
> Many thanks
>
> Nick
>
> On 11 Nov 2010, at 00:23, Robert Gründler wrote:
>
>>
>> On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote:
>>
>>> Are you sure you really want to throw out stopwords for your use case? I
>>> don
l"
>
> You can even apply boost so that begins-with matches come first.
>
> --- On Thu, 11/11/10, Robert Gründler wrote:
>
>> From: Robert Gründler
>> Subject: EdgeNGram relevancy
>> To: solr-user@lucene.apache.org
>> Date: Thursday, November 11
Hi,
consider the following fieldtype (used for autocompletion):
This works fine as long as the query string is a single word. For multiple
words, the ranking is weird though.
Example:
Que
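A representative edge-ngram fieldtype for this kind of autocompletion (a
sketch, not necessarily the exact definition in question):

    <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- expands each token into its leading edges,
             e.g. "scorsese" -> "s", "sc", "sco", ... -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With whitespace tokenization at index time, every word of a multi-word entry
is edge-ngrammed separately, which is one reason multi-word ranking can look
weird.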
edgengram it.
>
> If you include whitespace in the token, then when making your queries for
> auto-complete, be sure to use a query parser that doesn't do
> "pre-tokenization", the 'field' query parser should work well for this.
>
> Jonathan
>
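Concretely, that would mean a query like this (host illustrative, URL-encoded
as needed; everything after the closing brace is handed to the field's
analyzer as a whole):

    http://localhost:8983/solr/select?q={!field f=edgytext2}Mr Scorsese

With KeywordTokenizerFactory on the index side, the whole input stays one
token, so "Mr Scorsese" is matched as a single prefix rather than as two
separate tokens.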
Hi,
I've created the following filterchain in a field type; the idea is to use it
for autocompletion purposes:
With that kind of filterchain, the EdgeNGramFilterFactory will receive multiple
tokens for input strings with whitespace in them. This leads to the following
results:
I
Hi all,
we had a severe problem with our RAID controller on one of our servers today
while importing a table with ~8 million rows into a Solr index. After
importing about 4 million
documents, our server shut down and failed to restart due to a corrupt RAID
disk.
The Solr data import was the on