Hi stanbolers,

I'm in the middle of creating a custom DBpedia index for Stanbol, using some 24 dumps from DBpedia 3.7 (English and Portuguese) and some custom mappings (specifically, some special treatment for Portuguese text plus some additional properties I'd like to see indexed).

I'm following this file:

http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/README.md

and this script for processing the broken images_en file:

http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/fetch_prepare.sh

The process went well up to the point where all triples (some 80M) were loaded into TDB.

The problem is that the process stops after that and outputs:

Exception in thread "Thread-3" java.lang.IllegalStateException: The file with the Entity Scores is missing
    at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.initialise(LineBasedEntityIterator.java:424)
...
10:12:22,077 [Thread-2] INFO solryard.SolrYardIndexingDestination - ... create SolrYard

And nothing more happens.

Of course, the file is missing because I didn't need it, since I want to index all entities. I tried to generate it anyway once, but after a long processing time it failed with an out-of-memory exception (I think during sorting).

Is there a way to instruct the indexer to ignore the Entity Scores file? Or is there a way to write a simple one that says "all entities are to be indexed"?

Thanks, I can send the complete log if it is needed.
Best,
Alex
