Problem with Solr not finding a class that is in lucene-analyzers.jar

2012-07-10 Thread Mike O'Leary
I have been running Solr with Tomcat, and I recently wrote a Quartz program that starts and stops Tomcat, starts Solr indexing jobs, and does a few other things. When I start Tomcat programmatically in this way, Solr starts initializing, and when it hits the text_ws field type in schema.xml, it

Writing index files that have the right owner

2012-06-15 Thread Mike O'Leary
I have been putting together an application using Quartz to run several indexing jobs in sequence using SolrJ and Tomcat on Windows. I would like the Quartz job to do the following: 1. Delete index directories from the cores so each indexing job starts fresh with empty indexes to populate

DocumentWriterPerThread sample code?

2012-04-16 Thread Mike O'Leary
Does anyone know of sample code that illustrates how to use the DocumentWriterPerThread class in indexing? Thanks, Mike

RE: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-05 Thread Mike O'Leary
ode. What commitWithin time are you using? Best Erick On Wed, Apr 4, 2012 at 7:50 PM, Mike O'Leary wrote: > I am indexing some database contents using add(docs, commitWithinMs), and > those add calls are taking over 80% of the time once the database begins > returning results. I

RE: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-04 Thread Mike O'Leary
, Mike O'Leary wrote: > If you index a set of documents with SolrJ and use > StreamingUpdateSolrServer.add(Collection docs, int > commitWithinMs), it will perform a commit within the time specified, and it > seems to use default values for waitFlush and waitSearcher. > >

waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-04 Thread Mike O'Leary
If you index a set of documents with SolrJ and use StreamingUpdateSolrServer.add(Collection docs, int commitWithinMs), it will perform a commit within the time specified, and it seems to use default values for waitFlush and waitSearcher. Is there a place where you can specify different values fo

SolrJ updating indexed documents?

2012-04-02 Thread Mike O'Leary
I am working on a component for indexing documents from a database that contains medical records. The information is organized across several tables, and I am supposed to index records for patient sets of varying sizes for others to do IR experiments with. Each patient record has one or more

RE: Including an attribute value from a higher level entity when using DIH to index an XML file

2012-03-12 Thread Mike O'Leary
r what seems like a small problem, and I wonder how much it would slow down the importing of a large XML file. Are there any other ways of handling cases like this, where an attribute of an outer element is to be included in an index document that corresponds to an element nested inside it?

Including an attribute value from a higher level entity when using DIH to index an XML file

2012-03-02 Thread Mike O'Leary
I have an XML file that I would like to index that has a structure similar to this: [message text] ... ... I would like to have the documents in the index correspond to the messages in the XML file, and have the user's [id-num] value stored as a field in each of the user's d
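The flattening asked about here (each nested message becomes its own document, carrying an attribute copied from the enclosing user element) can be illustrated outside of DIH with a small sketch. This is not a DataImportHandler configuration, only a plain-Python picture of the desired output. The XML markup in the original message did not survive in this archive snippet, so the element names `users`, `user`, `message` and the attribute name `id` below are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element and attribute names are assumptions,
# since the original message's XML is not preserved in this listing.
SAMPLE = """
<users>
  <user id="u1">
    <message>first message</message>
    <message>second message</message>
  </user>
  <user id="u2">
    <message>third message</message>
  </user>
</users>
"""

def flatten_messages(xml_text):
    """Return one dict per <message>, each carrying the parent <user>'s id."""
    root = ET.fromstring(xml_text)
    docs = []
    for user in root.iter("user"):
        user_id = user.get("id")  # attribute of the outer element
        for message in user.iter("message"):
            docs.append({
                "user_id": user_id,
                "text": (message.text or "").strip(),
            })
    return docs

docs = flatten_messages(SAMPLE)
for doc in docs:
    print(doc)
```

Each resulting dict corresponds to one index document; in this sketch the outer attribute is simply repeated on every nested message.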

RE: Indexing taking so much time to complete.

2012-02-25 Thread Mike O'Leary
What's your secret? OK, that question is not the kind recommended in the UsingMailingLists suggestions, so I will write again soon with a description of my data and what I am trying to do, and ask more specific questions. And I don't mean to hijack the thread, but I am in the same boat as the p

Is there a way to write a DataImportHandler deltaQuery that compares contents still to be imported to contents in the index?

2012-02-22 Thread Mike O'Leary
I am working on indexing the contents of a database that I don't have permission to alter. In particular, the DataImportHandler examples that show how to specify a deltaQuery attribute value show database tables that have a last_modified column, and it compares these values with last_index_time

RE: Recovering from database connection resets in DataImportHandler

2012-02-22 Thread Mike O'Leary
Best Erick On Sat, Feb 11, 2012 at 2:40 AM, Mike O'Leary wrote: > I am trying to use Solr's DataImportHandler to index a large number of > database records in a SQL Server database that is owned and managed by a > group we are collaborating with. The indexing jobs I have run

Do nested entities have a representation in Solr indexes?

2012-02-22 Thread Mike O'Leary
The data-config.xml file that I have for indexing database contents has nested entity nodes within a document node, and each of the entities contains field nodes. Lucene indexes consist of documents that contain fields. What about entities? If you change the way entities are structured in a data

Using nested entities in FileDataSource import of xml file contents

2012-02-17 Thread Mike O'Leary
Can anybody help me understand the right way to define a data-config.xml file with nested entities for indexing the contents of an XML file? I used this data-config.xml file to index a database containing sample patient records:

Recovering from database connection resets in DataImportHandler

2012-02-10 Thread Mike O'Leary
I am trying to use Solr's DataImportHandler to index a large number of database records in a SQL Server database that is owned and managed by a group we are collaborating with. The indexing jobs I have run so far, except for the initial very small test runs, have failed due to database connectio

Setting up logging for a Solr project that isn't in tomcat/webapps/solr

2012-02-10 Thread Mike O'Leary
I set up a Solr project to run with Tomcat for indexing contents of a database by following a web tutorial that described how to put the project directory anywhere you want and then put a file called .xml in the tomcat/conf/Catalina/localhost directory that contains contents like this: I

Getting started with indexing a database

2012-01-09 Thread Mike O'Leary
I am trying to index the contents of a database for the first time, and I am only getting the primary key of the table represented by the top level entity in my data-config.xml file to be indexed. The database I am starting with has three tables: The table called docs has columns called doc_id,

Identifying common text in documents

2011-12-24 Thread Mike O'Leary
I am looking for a way to identify blocks of text that occur in several documents in a corpus for a research project with electronic medical records. They can be copied and pasted sections inserted into another document, text from a previous email in the corpus that is repeated in a follow-up em
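One common starting point for this kind of problem is word shingling: break every document into overlapping runs of n words and record which documents each run appears in. The sketch below is only an illustration of that general idea, not a method proposed in the thread. The function name `shared_shingles`, the shingle length, and the toy corpus are all hypothetical.

```python
from collections import defaultdict

def shared_shingles(docs, n=8, min_docs=2):
    """Map each n-word shingle to the ids of the documents containing it,
    keeping only shingles found in at least min_docs documents."""
    seen = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(doc_id)
    return {s: ids for s, ids in seen.items() if len(ids) >= min_docs}

# Hypothetical toy corpus, invented for illustration only.
docs = {
    "note1": "Patient reports mild headache and was advised to rest and drink fluids",
    "note2": "Follow up: patient reports mild headache and was advised to rest again",
    "note3": "No complaints at this visit",
}

common = shared_shingles(docs, n=5)
for shingle, ids in sorted(common.items()):
    print(sorted(ids), shingle)
```

Overlapping shared shingles can then be merged into longer repeated blocks. For a large corpus, hashing the shingles rather than storing the full strings would keep memory use down.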

Images for the DataImportHandler page

2011-12-09 Thread Mike O'Leary
There is some very useful information on the http://wiki.apache.org/solr/DataImportHandler page about indexing database contents, but the page contains three images whose links are broken. From their descriptions, those images sound like they would be quite handy to see in the page. Could someone