I have been running Solr with Tomcat, and I recently wrote a Quartz program
that starts and stops Tomcat, starts Solr indexing jobs, and does a few other
things. When I start Tomcat programmatically in this way, Solr starts
initializing, and when it hits the text_ws field type in schema.xml, it
I have been putting together an application using Quartz to run several
indexing jobs in sequence using SolrJ and Tomcat on Windows. I would like the
Quartz job to do the following:
1. Delete index directories from the cores so each indexing job starts
fresh with empty indexes to populate
Does anyone know of sample code that illustrates how to use the
DocumentWriterPerThread class in indexing?
Thanks,
Mike
ode.
What commitWithin time are you using?
Best
Erick
On Wed, Apr 4, 2012 at 7:50 PM, Mike O'Leary wrote:
> I am indexing some database contents using add(docs, commitWithinMs), and
> those add calls are taking over 80% of the time once the database begins
> returning results. I
, Mike O'Leary wrote:
> If you index a set of documents with SolrJ and use
> StreamingUpdateSolrServer.add(Collection docs, int
> commitWithinMs), it will perform a commit within the time specified, and it
> seems to use default values for waitFlush and waitSearcher.
>
>
If you index a set of documents with SolrJ and use
StreamingUpdateSolrServer.add(Collection docs, int
commitWithinMs),
it will perform a commit within the time specified, and it seems to use default
values for waitFlush and waitSearcher.
Is there a place where you can specify different values fo
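
For illustration only, here is a minimal sketch of where these settings appear in Solr's XML update messages, assuming the Solr 3.x-era format in which commitWithin is an attribute of the add element and waitFlush and waitSearcher are attributes of an explicit commit element. The helper names build_add_message and build_commit_message are hypothetical, and the thread does not say whether SolrJ exposes these values on add(docs, commitWithinMs).

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms):
    """Build an <add> update message that requests a commit within the given time.

    Hypothetical helper: `docs` is a list of dicts mapping field names to values.
    """
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_commit_message(wait_flush=True, wait_searcher=True):
    """Build an explicit <commit> message carrying waitFlush and waitSearcher.

    Assumption: these attributes belong to the explicit commit message,
    not to the add message (Solr 3.x-era XML update format).
    """
    attrs = {
        "waitFlush": str(wait_flush).lower(),
        "waitSearcher": str(wait_searcher).lower(),
    }
    return ET.tostring(ET.Element("commit", attrs), encoding="unicode")
```

In this sketch the add message only carries the commit deadline, so non-default waitFlush or waitSearcher values would have to travel in a separate, explicit commit message.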
I am working on a component for indexing documents from a database that
contains medical records. The information is organized across several tables
and I am supposed to index records for patient sets of varying sizes for
others to do IR experiments with. Each patient record has one or more
r what seems
like a small problem, and I wonder how much it would slow down the importing of
a large XML file.
Are there any other ways of handling cases like this, where an attribute of an
outer element is to be included in an index document that corresponds to an
element nested inside it?
I have an XML file that I would like to index, that has a structure similar to
this:
[message text]
...
...
I would like to have the documents in the index correspond to the messages in
the xml file, and have the user's [id-num] value stored as a field in each of
the user's d
What's your secret?
OK, that question is not the kind recommended in the UsingMailingLists
suggestions, so I will write again soon with a description of my data and what
I am trying to do, and ask more specific questions. And I don't mean to hijack
the thread, but I am in the same boat as the p
I am working on indexing the contents of a database that I don't have
permission to alter. In particular, the DataImportHandler examples that show
how to specify a deltaQuery attribute value use database tables that have a
last_modified column, and they compare these values with last_index_time
Best
Erick
On Sat, Feb 11, 2012 at 2:40 AM, Mike O'Leary wrote:
> I am trying to use Solr's DataImportHandler to index a large number of
> database records in a SQL Server database that is owned and managed by a
> group we are collaborating with. The indexing jobs I have run
The data-config.xml file that I have for indexing database contents has nested
entity nodes within a document node, and each of the entities contains field
nodes. Lucene indexes consist of documents that contain fields. What about
entities? If you change the way entities are structured in a data
Can anybody help me understand the right way to define a data-config.xml file
with nested entities for indexing the contents of an XML file?
I used this data-config.xml file to index a database containing sample patient
records:
I am trying to use Solr's DataImportHandler to index a large number of database
records in a SQL Server database that is owned and managed by a group we are
collaborating with. The indexing jobs I have run so far, except for the initial
very small test runs, have failed due to database connectio
I set up a Solr project to run with Tomcat for indexing contents of a database
by following a web tutorial that described how to put the project directory
anywhere you want and then put a file called .xml in the
tomcat/conf/Catalina/localhost directory that contains contents like this:
I
I am trying to index the contents of a database for the first time, and I am
only getting the primary key of the table represented by the top level entity
in my data-config.xml file to be indexed. The database I am starting with has
three tables:
The table called docs has columns called doc_id,
I am looking for a way to identify blocks of text that occur in several
documents in a corpus for a research project with electronic medical records.
They can be copied-and-pasted sections inserted into another document, text
from a previous email in the corpus that is repeated in a follow-up em
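
One common way to approach this kind of problem is word shingling: break each document into overlapping word n-grams and keep the n-grams that appear in more than one document. The sketch below is only an assumption about how that could look; the function names, the shingle size of 5, and the whitespace tokenization are illustrative choices, not something the message specifies.

```python
from collections import defaultdict


def shingles(text, size=5):
    """Return the set of overlapping word n-grams (shingles) in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def shared_shingles(corpus, size=5):
    """Map each shingle to the ids of the documents containing it.

    `corpus` is a dict of document id -> text. Only shingles that occur
    in at least two documents are kept.
    """
    seen = defaultdict(set)
    for doc_id, text in corpus.items():
        for shingle in shingles(text, size):
            seen[shingle].add(doc_id)
    return {sh: ids for sh, ids in seen.items() if len(ids) > 1}
```

Runs of adjacent shared shingles would then point to longer repeated blocks; a real corpus would likely need better tokenization and hashing of shingles to keep memory use down.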
There is some very useful information on the
http://wiki.apache.org/solr/DataImportHandler page about indexing database
contents, but the page contains three images whose links are broken. Judging
by their descriptions, those images would be quite handy to see in the page.
Could someone