Hi All,

Is there a way to retrieve the documents imported by a dataimport
request from an EventListener configured to run at onImportEnd?
I need the values of the content field of all the imported documents
in order to perform an enhancement task.

My example data-config is below:
<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/test" user="usr1" password="pass1"
                batchSize="1" />
    <document name="stanboldata"
              onImportEnd="com.solr.stanbol.processor.StanbolEventListener">
        <entity name="stanbolrequest" query="SELECT * FROM documents">
            <field column="id" name="id" />
            <field column="content" name="content" />
            <field column="title" name="title" />
        </entity>
    </document>
</dataConfig>

This is what I currently do:

1. The /dataimport request handler is configured with a custom
UpdateRequestProcessor that intercepts each document being imported, reads
the value of the field I want (content), and puts it into a static
Map<Long, String> named contents in my custom EventListener class, keyed
by document ID.
2. At the end of the import, StanbolEventListener is triggered by
onImportEnd; in its onEvent(Context cntxt) method, the contents Map is
iterated and all the content field values are sent to an external server
to be enhanced.
3. The documents with those IDs (the keys of the contents Map) are updated
with the enhanced fields and committed.
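
To make the three steps concrete, here is a stripped-down sketch of the shared buffer. It uses only the JDK so it can be run on its own; the class name ContentBuffer, the methods record and enhanceAll, and the Function standing in for the call to the external server are all hypothetical, not my actual Solr code.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the state held by the custom EventListener class.
class ContentBuffer {
    // Static, so it is shared by every /dataimport request in the JVM.
    static final Map<Long, String> contents = new ConcurrentHashMap<>();

    // Step 1: the UpdateRequestProcessor would call this once per imported document.
    static void record(long id, String content) {
        contents.put(id, content);
    }

    // Steps 2 and 3: at onImportEnd, pass every buffered value to the enhancer
    // (a placeholder for the external server) and return id -> enhanced text,
    // which would then be written back to the documents and committed.
    static Map<Long, String> enhanceAll(Function<String, String> enhancer) {
        Map<Long, String> enhanced = new TreeMap<>();
        contents.forEach((id, text) -> enhanced.put(id, enhancer.apply(text)));
        return enhanced;
    }
}
```

Note that enhanceAll processes whatever is in the map at the moment it runs, regardless of which import request put it there.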

This mechanism works fine for a single dataimport process at a time.
But when there are concurrent dataimport requests, the system behaves
unpredictably. I suspect the static Map<Long, String> contents is being
modified by update requests from several dataimport processes at once. To
make the contents Map thread safe, I switched to a ConcurrentHashMap.
However, I still get inconsistent results in the update process.
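
The following sketch replays one interleaving that I suspect could explain this even with a ConcurrentHashMap: each put is thread safe, but the map still mixes entries from different imports. It is JDK-only and deterministic; the class InterleavedImports, the method names, and the assumption that the listener removes the entries it has processed are illustrative only, not taken from my real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class InterleavedImports {
    // Shared by every import, as in my listener.
    static final Map<Long, String> contents = new ConcurrentHashMap<>();

    // Stand-in for onEvent(Context): take every ID currently in the map.
    // Assumption: the listener removes the entries it has processed.
    static List<Long> onImportEnd() {
        List<Long> ids = new ArrayList<>(new TreeSet<>(contents.keySet()));
        ids.forEach(contents::remove);
        return ids;
    }

    // Replays one possible ordering of events from two imports, A and B.
    static List<List<Long>> replay() {
        contents.clear();
        contents.put(1L, "A: first document");
        contents.put(2L, "A: second document");
        contents.put(3L, "B: first document");  // B is still importing
        List<Long> seenByA = onImportEnd();      // A finishes first
        contents.put(4L, "B: second document");
        List<Long> seenByB = onImportEnd();      // B finishes later
        return Arrays.asList(seenByA, seenByB);
    }
}
```

In this replay the listener run for import A picks up IDs 1, 2 and 3, so it also processes a document that belongs to import B, while the run for import B sees only ID 4.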

What I'm looking for is an alternative that avoids having to handle
concurrency myself in the EventListener.
I think this could be achieved if the whole dataimport process were
executed as a single transaction and, in the onImportEnd EventListener, all
the imported documents could be retrieved to read the content field of each.
Is there a way to access the set of imported documents in the
onEvent(Context context) method of an EventListener in Solr? Can I use the
Context object to access my documents?

Any suggestions would be appreciated.