Hi stanbolers,
I'm in the middle of creating a custom DBpedia index for Stanbol, using
some 24 dumps from DBpedia 3.7 (English and Portuguese) and some custom
mappings (specifically, special handling for Portuguese text plus some
additional properties I'd like to see indexed).
I'm following this file:
http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/README.md
and this script for processing the broken images_en file:
http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/fetch_prepare.sh
The process went well up to the point where all triples (some 80M) had
been loaded into TDB.
The problem is that the process then stops and outputs the following:
Exception in thread "Thread-3" java.lang.IllegalStateException: The file with the Entity Scores is missing
at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.initialise(LineBasedEntityIterator.java:424)
...
10:12:22,077 [Thread-2] INFO solryard.SolrYardIndexingDestination - ... create SolrYard
And nothing more happens.
Of course, the file is missing because I don't need it: I want to index
all entities. I did try to generate it once anyway, but after a long
processing time it failed with an out-of-memory exception (I think
during the sorting step).
Is there a way to instruct the indexer to ignore the Entity Scores file?
Or can I write a simple one that says "all entities are to be
indexed"?
Thanks. I can send the complete log if needed.
Best,
Alex