Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Robert Gründler
We're indexing around 10M records from a MySQL database into a single Solr core. The DataImportHandler needs to join 3 sub-entities to denormalize the data. We ran into trouble on the first 2 attempts, but setting batchSize="-1" on the dataSource resolved the issues. Do you need a lo

Re: HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-21 Thread Robert Gründler
-robert 2011/4/20 Robert Gründler: Hi all, i'm getting the following exception when using highlighting for a field containing HTMLStripCharFilterFactory: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token ... exceeds length of provided text sized 21 It seems this is a

HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-20 Thread Robert Gründler
Hi all, I'm getting the following exception when using highlighting for a field containing HTMLStripCharFilterFactory: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token ... exceeds length of provided text sized 21 It seems this is a known issue: https://issues.apache.or

Re: DataImportHandlerDeltaQueryViaFullImport and delete query

2011-04-18 Thread Robert Gründler
On 18.04.11 09:23, Bill Bell wrote: It runs delta imports faster. Normally you need to get the Pks that changed, and then run it through query="" which is slow when you have a lot of Ids but the query="" only adds/updates entries. I'm not sure how to delete entries by running a query like "

DataImportHandlerDeltaQueryViaFullImport and delete query

2011-04-18 Thread Robert Gründler
Hi, when using http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport to periodically run a delta-import, is it necessary to run a separate "normal" delta-import after it to delete entries from the index (using deletedPkQuery)? If so, what's the point of using this method for r

Conditional Scoring (was: Re: DisMaxQueryParser: Unknown function min in FunctionQuery)

2011-03-29 Thread Robert Gründler
value in the "popularity" field. All results which have no match in "exact_match" should use the "popularity" field for scoring. Is this possible without using a function query ? thanks. -robert On 29.03.11 16:34, Erik Hatcher wrote: On Mar 29, 2011, at

DisMaxQueryParser: Unknown function min in FunctionQuery

2011-03-29 Thread Robert Gründler
Hi all, i'm trying to implement a FunctionQuery using the "bf" parameter of the DisMaxQueryParser, however, i'm getting an exception: "Unknown function min in FunctionQuery('min(1,2)', pos=4)" The request that causes the error looks like this: http://localhost:2345/solr/main/select?qt=dismax

MySQL queries high when using delta-import

2011-03-14 Thread Robert Gründler
Hi, we have 3 solr cores, each of them is running a delta-import every 2 minutes on a MySQL database. We've noticed a significant increase of MySQL queries per second, since we've started the delta updates. Before that, the database server received between 50 and 100 queries per second, si

Re: Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
10:05 AM, Robert Gründler wrote: >> Hi again, >> >> let's say you have 2 solr Instances, which have both exactly the same >> configuration (schema, solrconfig, etc). >> >> Could it cause any troubles if we import an index from a SQL database on >

Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
Hi again, let's say you have 2 solr Instances, which have both exactly the same configuration (schema, solrconfig, etc). Could it cause any troubles if we import an index from a SQL database on solr instance A, and copy the whole index to the datadir of solr instance B (both solr instances run

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
5:43 , Tim Heckman wrote: > 2010/12/15 Robert Gründler : >> The data-config.xml looks like this (only 1 entity): >> >> >> >> >> >> >> >>> name="sf_unique_id"/> >>

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
> What version of Solr are you using? Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 -robert > > Adam > > 20

Dataimport performance

2010-12-15 Thread Robert Gründler
Hi, we're looking for some comparison-benchmarks for importing large tables from a mysql database (full import). Currently, a full-import of ~ 8 Million rows from a MySQL database takes around 3 hours, on a QuadCore Machine with 16 GB of ram and a Raid 10 storage setup. Solr is running on a apa
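
As a rough baseline for comparing benchmarks, the figures quoted above (about 8 million rows in about 3 hours) work out to roughly 740 rows per second. A minimal sketch of that arithmetic follows; the function name is made up for illustration and the inputs are the approximate numbers from the message, not measured values.

```python
def import_rate(rows, hours):
    """Average import throughput in rows per second."""
    return rows / (hours * 3600)


# Approximate figures from the message: ~8 million rows in ~3 hours.
rate = import_rate(8_000_000, 3)
print(round(rate))  # 741
```

Expressing a full-import as rows per second makes it easier to compare runs on different table sizes or hardware.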

Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
problem was a bug in the driver that only showed up with very high > disk load (as is the case when doing imports) > We're running freebsd: RaidController 3ware 9500S-8 Corrupt unit: Raid-10 3725.27GB 256K Stripe Size without BBU Freebsd 7.2, UFS Filesystem. > /Sven >

Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
Our sysadmin speculates that maybe the chunk size of our raid/harddisks and the segment size of the lucene index do not play well together. Does the lucene segment size affect how the data is written to the disk? thanks for your help. -robert > > Best > Erick >

Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and

Re: Is this sort order possible in a single query?

2010-11-24 Thread Robert Gründler
e for documents that exactly match on 'author_exact'. I assume this is > ok. > > I can't see a way to do it without functionqueries at the moment, which > doesn't mean there isn't any. > > Hope that helps, > > Geert-Jan > > > > > >

Is this sort order possible in a single query?

2010-11-24 Thread Robert Gründler
Hi, we have a requirement for one of our search results which has a quite complex sorting strategy. Let me explain the document first, using an example: The document is a book. It has several indexed text fields: Title, Author, Distributor. It has two integer columns, where one reflects the num

LockReleaseFailedException

2010-11-18 Thread Robert Gründler
Hi, i'm suddenly getting a LockReleaseFailedException when starting a full-import using the Dataimporthandler: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component This worked without problems until just now. Is

Respect token order in matches

2010-11-18 Thread Robert Gründler
Hi, is there a way to make solr respect the order of token matches when the query is a multi-term string? Here's an example: Query String: "John C" Indexed Strings: - "John Cage" - "Cargill John" This will return both indexed strings as a result. However, "Cargill John" should not match in
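
The behaviour asked for above can be modelled outside Solr as a prefix match on the whole indexed string rather than on its individual tokens. This is only a sketch of the desired matching rule, assuming case-insensitive comparison and normalized whitespace; the function name is hypothetical, and it is not a description of how Solr evaluates the query.

```python
def matches_in_order(query, indexed):
    """True if the indexed string starts with the query string.

    Both sides are lowercased and runs of whitespace are collapsed first,
    so the comparison respects the order of the terms.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    return normalize(indexed).startswith(normalize(query))


for candidate in ["John Cage", "Cargill John"]:
    print(candidate, matches_in_order("John C", candidate))
```

With this rule "John Cage" matches "John C" while "Cargill John" does not, because the whole value is treated as a single token.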

Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
it seems adding the '+' (required) operator to each term in a multi-term query does the trick: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#+ ie: edgytext2:(+Martin +Sco) -robert On Nov 16, 2010, at 8:52 PM, Robert Gründler wrote: > thanks for the explanat
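
A small sketch of how a client could build such a query from raw user input. The helper name is made up for illustration, and it assumes the input contains no characters that are special to the Lucene query syntax (no escaping is done here).

```python
def require_all_terms(field, user_input):
    """Prefix every whitespace-separated term with '+' so each one is required."""
    terms = user_input.split()
    return "{}:({})".format(field, " ".join("+" + term for term in terms))


print(require_all_terms("edgytext2", "Martin Sco"))  # edgytext2:(+Martin +Sco)
```

In real use the terms would need to be escaped before being placed into the query string.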

Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
f the index includes multi-word tokens with > internal whitespace, they will never match. But the standard query parser > doesn't "pre-tokenize" like this, it passes the whole phrase to the index > intact. > > Robert Gründler wrote: >>> Did you run your

Best practices to rebuild index on live system

2010-11-11 Thread Robert Gründler
Hi again, we're coming closer to the rollout of our newly created solr/lucene based search, and i'm wondering how people handle changes to their schema on live systems. In our case, we have 3 cores (ie. A,B,C), where the largest one takes about 1.5 hours for a full dataimport from the relation

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
> > Did you run your query without using () and "" operators? If yes can you try > this? > &q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0 I didn't use () and "" in my query before. Using the query with those operators works now, stopwords are thrown out as they should, thanks. However,

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
esn't produce any EdgeNGram that would match "Bill Cl", so > why is it even in the results? > > Thanks. > > --- On Thu, 11/11/10, Ahmet Arslan wrote: > >> You can add an additional field, with >> using KeywordTokenizerFactory instead of >> Wh

Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
m. > Will check the thread you mention. > > Best > > Nick > > On 11 Nov 2010, at 18:13, Robert Gründler wrote: > >> I've posted a ConcaFilter in my previous mail which does concatenate tokens. >> This works fine, but i >> realized that what

Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
> > Many thanks > > Nick > > On 11 Nov 2010, at 00:23, Robert Gründler wrote: > >> >> On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote: >> >>> Are you sure you really want to throw out stopwords for your use case? I >>> don't

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
l" > > You can even apply boost so that begins with matches comes first. > > --- On Thu, 11/11/10, Robert Gründler wrote: > >> From: Robert Gründler >> Subject: EdgeNGram relevancy >> To: solr-user@lucene.apache.org >> Date: Thursday, November 11

EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
Hi, consider the following fieldtype (used for autocompletion): This works fine as long as the query string is a single word. For multiple words, the ranking is weird though. Example: Que

Re: Concatenate multiple tokens into one

2010-11-10 Thread Robert Gründler
edgengram it. > > If you include whitespace in the token, then when making your queries for > auto-complete, be sure to use a query parser that doesn't do > "pre-tokenization", the 'field' query parser should work well for this. > > Jonathan >

Concatenate multiple tokens into one

2010-11-10 Thread Robert Gründler
Hi, i've created the following filterchain in a field type, the idea is to use it for autocompletion purposes: With that kind of filterchain, the EdgeNGramFilterFactory will receive multiple tokens on input strings with whitespaces in it. This leads to the following results: I
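
The filter chain itself did not survive in this archive, so the following is only a sketch of the effect described: edge n-grams built per whitespace token versus edge n-grams built over the whole concatenated string. The gram sizes and function names are assumptions for illustration, not the values from the original configuration.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """All prefixes of the token with a length between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def per_token_grams(text):
    """Edge n-grams of each whitespace-separated token (tokenize first)."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token))
    return grams


def whole_string_grams(text):
    """Edge n-grams of the whole value treated as a single token."""
    return edge_ngrams(" ".join(text.lower().split()))


print(per_token_grams("John Cage"))
print(whole_string_grams("John Cage"))
```

Only the second variant produces a gram such as "john c", which is what a multi-word autocomplete prefix needs to match against.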

Dataimporthandler crashed raidcontroller

2010-11-04 Thread Robert Gründler
Hi all, we had a severe problem with our raidcontroller on one of our servers today while importing a table with ~8 million rows into a solr index. After importing about 4 million documents, our server shut down, and failed to restart due to a corrupt raid disk. The Solr data import was the on