On 27.11.2013 09:58, Paul Libbrecht wrote:
Thomas,

our experience with Curriki.org is that evaluating what I call the
"related documents" is a procedure that needs access to the complete
content and thus is run at the DB level and not at the Solr level.

For example, if a user changes a part of his name, we need to reindex
all of his resources. Sure, we could try to run a Solr query for this,
and maybe add index fields for it, but we felt it better to run this
on the index-trigger side, the component in our (XWiki) wiki which
listens to changes and requests the reindexing of a few documents
(including deletions).

For the maintenance operation, the same issue has appeared. So, if
the indexer or listener or Solr has been down for a few minutes or
hours, we'd need to reindex not only all changed documents but also
their related documents.

If you are able to work out a Solr-only solution that writes down
all depends-on relations at index time, it means you would have to
index-update all "inverse related" documents every time one of them
changes. For the relation above (documents of a user), it means the
user document needs reindexing every time a new document is added. I
wonder if this makes a difference at scale.

I think the two use cases differ a bit. At index time of my master document I have all the information about its dependent documents at hand. So instead of committing one document I commit, let's say, four.

In your case you have to query to get all documents of a user first.

Here is a more detailed use-case. I have metadata in 1 to n languages to describe a document (e.g. journal article).

I commit a master document in a specified default language to SOLR, plus one document for every language I have metadata for. If a user adds or removes metadata (e.g. an abstract in French) there is one document more or one document less in SOLR. So their number changes, and I do not want stale data to be kept in the index.
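
To make the layout concrete, here is a minimal sketch of how such a document set could be assembled before committing. The function name build_docs, the id scheme "<master>_<lang>" and the field names "lang" and "dependsOn" are only assumptions for illustration, not something SOLR prescribes.

```python
def build_docs(master_id, default_lang, metadata_by_lang):
    """Return the master document followed by one dependent document per language.

    metadata_by_lang maps a language code to a dict of metadata fields.
    The "dependsOn" field and the "<master>_<lang>" id scheme are hypothetical.
    """
    # The master document carries the metadata of the default language.
    master = {**metadata_by_lang[default_lang], "id": master_id, "lang": default_lang}
    # One dependent ("weak") document per language that has metadata.
    dependents = [
        {**fields, "id": f"{master_id}_{lang}", "lang": lang, "dependsOn": master_id}
        for lang, fields in sorted(metadata_by_lang.items())
    ]
    return [master] + dependents
```

If the French abstract is removed, the next call simply returns one document less, and the "art1_fr" document still sitting in the index is exactly the stale entry that needs to go.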

A similar use case: I have article documents with authors. I create "author" documents for every article. If someone adds or removes an author I need to track that change. These "dumb" author documents are used for an alphabetical person index and hold a unique field that is used to group them, but these documents exist only as long as their master documents do.

My two use cases are quite similar, so I would like to have this "weak" document functionality somehow.

SOLR knows that if a document is added with id=foo it has to replace the document that matches id:"foo". If I could change this behavior to dependsOn:"foo" I would be done. :-D
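
Until something like that exists, the same effect can be approximated on the client side: first delete everything matching dependsOn:"foo", then add the fresh set of documents. Below is a rough sketch that only builds the two JSON bodies for the update handler. The field name "dependsOn" and the helper names are assumptions, and the two requests are not one atomic operation, so they should be sent before a single commit.

```python
import json


def escape_phrase(value):
    """Escape backslashes and double quotes for use inside a quoted query phrase."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def replace_dependents(master_id, docs):
    """Build the JSON bodies for a delete-by-query followed by an add.

    Returns (delete_body, add_body). Both are meant to be POSTed to the
    update handler as application/json, in that order, before committing.
    "dependsOn" is a hypothetical field holding the id of the master document.
    """
    # Remove all dependent documents of this master, whatever their number was.
    delete_cmd = {"delete": {"query": f'dependsOn:"{escape_phrase(master_id)}"'}}
    # The master itself (id == master_id) is replaced by the normal unique-key rule.
    return json.dumps(delete_cmd), json.dumps(docs)
```

The master document is not matched by the delete query, because it carries no dependsOn value of its own. It is overwritten by the usual id-based replacement when the new set is added.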

regards

Thomas
