Walter,
Thanks! You bring up a very important 'commit' problem which I had
not thought about. So I am running a DIH that is wiping out part of
the index (i.e. all animals), then re-indexing/re-importing. I have
another DIH that is wiping out part of the index (minerals), then
My personal approach would be to take DIH
out of the mix entirely and do the whole thing in SolrJ
where you can exercise control to whatever degree
you want. DIH is a fine tool, but sometimes it's wrong
for a particular situation.
Here's some code to get you started if you want to
go that route.
Erik,
Thanks for all the help, what a great community.
Unfortunately the 2 data sets I want to use the DIH for change a ton and are
changed by a web app accessible to a number of people, as well as a few other
internal server applications. Since the data sets were small, I figured
Sure, you need to define the appropriate delete query for each DIH entry.
Best
Erick
On Fri, Oct 5, 2012 at 5:40 PM, Billy Newman newman...@gmail.com wrote:
Does DIH support only deleting/re-indexing docs of a certain type?
I.E. can I have a DIH for type:vegetable and another for type:mineral
Right. You define three update handlers, something like /update-animal,
/update-mineral, and /update-vegetable. Each one has a separate DIH config.
Each config deletes documents of that type and loads documents of that type.
You will not want to run them at the same time, because a commit in
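To make the "not at the same time" advice concrete, here is a rough Python sketch that starts one import, waits for it to finish, and only then starts the next. The handler paths are the ones named above; the base URL, the `run_imports_in_turn` helper, and the polling interval are made up for illustration. It assumes each handler answers `command=full-import` and `command=status` the way a DIH handler normally does (status "busy" while an import is running), so check that against your Solr version.

```python
import json
import time
from urllib.request import urlopen

# Handler paths from the message above; each is assumed to map to its own DIH config.
HANDLERS = ["/update-animal", "/update-mineral", "/update-vegetable"]


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def run_imports_in_turn(base_url, handlers=HANDLERS, fetch=http_get_json,
                        poll_seconds=5.0, sleep=time.sleep):
    """Start each import only after the previous one has stopped reporting 'busy'."""
    finished = []
    for handler in handlers:
        endpoint = f"{base_url.rstrip('/')}{handler}"
        # full-import returns right away; the import itself runs in the background.
        fetch(f"{endpoint}?command=full-import&wt=json")
        # Poll the same handler until it is no longer busy.
        while fetch(f"{endpoint}?command=status&wt=json").get("status") == "busy":
            sleep(poll_seconds)
        finished.append(handler)
    return finished
```

Calling `run_imports_in_turn("http://localhost:8983/solr")` from a single cron job or scheduler thread keeps the three imports from overlapping. The `fetch` and `sleep` arguments are only there so the loop can be exercised without a running Solr.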
The very first question is what form are your XML docs in?
Solr does NOT index arbitrary XML, so I'm guessing
you're using DIH and some of the xml stuff there. Do note
that the XSLT is a subset of the full capabilities
Second, I'd recommend you just put it all in a single index, it'll be
Erick,
I did mention using the DIH to index the first two datasets, that is
where the root of my problem lies.
I do see the benefit of one index. However the question still
remains, can I use the DIH to index xml from data set 1 and 2, every
15 minutes or so (full index) without wiping out
DIH always gives me indigestion.
Couple of things:
See the 'clean' parameter here for full import:
http://wiki.apache.org/solr/DataImportHandler
It defaults to true. I think if you set it to false
_and_ assuming that your uniqueKey is
defined, it should work OK.
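As a small illustration of the 'clean' suggestion, this Python sketch only builds the request URL for a full import with clean=false. The base URL and the `/dataimport` handler path are placeholders, not something from this thread; `command`, `clean`, and `commit` are the request parameters described on the wiki page linked above.

```python
from urllib.parse import urlencode


def full_import_url(base_url, handler, clean=False, commit=True):
    """Build a DIH full-import URL; clean=False leaves existing documents in place."""
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
        "commit": "true" if commit else "false",
    }
    return f"{base_url.rstrip('/')}/{handler.lstrip('/')}?{urlencode(params)}"


# Placeholder host and handler path, for illustration only.
print(full_import_url("http://localhost:8983/solr", "/dataimport"))
```

With clean=false, re-imported documents replace older ones that share the same uniqueKey, but nothing is removed for rows that have since vanished from the source.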
The other approach would be
Using the same unique key doesn't handle documents which disappear from one
indexing to the next.
Instead, add a field for the type of item, like type:animal, type:vegetable, or
type:mineral. Then the query used to clean up before indexing can delete all
items of that type.
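A minimal sketch of that idea in Python, assuming a string field called `type` and Solr's XML update message format (`<delete>`, `<add>`, `<commit/>`). The helper names and the sample document are invented for illustration. Inside DIH itself, the entity attribute `preImportDeleteQuery` is, as far as the wiki describes it, where such a clean-up query goes in place of the default `*:*`; verify that for your version.

```python
import xml.etree.ElementTree as ET


def delete_by_type(item_type, field="type"):
    """XML update message deleting every document whose type field matches."""
    delete = ET.Element("delete")
    # Simple phrase quoting; assumes item_type itself contains no double quotes.
    ET.SubElement(delete, "query").text = f'{field}:"{item_type}"'
    return ET.tostring(delete, encoding="unicode")


def add_documents(docs, item_type, field="type"):
    """XML update message adding docs, each stamped with the type field."""
    add = ET.Element("add")
    for doc in docs:
        node = ET.SubElement(add, "doc")
        for name, value in {**doc, field: item_type}.items():
            ET.SubElement(node, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def reindex_messages(docs, item_type):
    """Delete one type, re-add it, then commit once at the end."""
    return [delete_by_type(item_type), add_documents(docs, item_type), "<commit/>"]


# Hypothetical sample document, for illustration only.
for message in reindex_messages([{"id": "a1", "name": "otter"}], "animal"):
    print(message)
```

Each message would be POSTed in order to the core's update handler. Because the delete only matches one type, documents of the other types are left alone, and documents that disappeared from the source are cleaned up as well.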
wunder
On Oct 5,
Does DIH support only deleting/re-indexing docs of a certain type?
I.E. can I have a DIH for type:vegetable and another for type:mineral
and each only deletes/recreates the right types?
Thanks.
On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood wun...@wunderwood.org wrote:
Using the same unique
Keep in mind that every time a commit is done, all the caches are thrown
away. If updates for each of these indexes happen at different times,
then the caches get invalidated each time you commit. So in that case
a smaller index helps.
On Wed, Jul 8, 2009 at 4:55 PM, Tim Sell trs...@gmail.com wrote:
It will depend on how much total volume you have. If you are discussing
millions and millions of records, I'd say use multicore and shards.
On Wed, Jul 8, 2009 at 5:25 AM, Tim Sell trs...@gmail.com wrote:
Hi,
I am wondering if it is common to have just one very large index, or
multiple smaller indexes specialized for different content types.
We currently have multiple smaller indexes, although one of them is
much larger than the others. We are considering merging them, to allow
the convenience of