Hi, I'm trying to use KeywordLinking, as Rupert suggested earlier. I've built the Solr indexes as described in the tutorial, and they seem to be OK (I inspected them with Luke). I copied them to ROOT/sling/dataset and then installed the generated bundle via the console.
Now I get a strange error: it seems that Stanbol is not actually loading my indexes, or for some reason it has not activated the yard:

> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
> Configuration from the Yard.
>     at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>     at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
>     ...
> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
> not active!
>     ...

Does anyone have a suggestion on this?

I have two other related questions:

1) How can I start Stanbol with a specific configuration activated?
2) Is there any way to manage deployment/activation via some kind of REST interface (for example with curl)? It would be helpful for automation.

Thanks in advance,
Alfredo

2012/3/22 seralf <[email protected]>

> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
>
> I think I'll follow your suggestion and try to use my thesaurus
> with workflow option 2).
> I already use Solr as well, so it's probably the best choice for my needs.
>
> On the other hand, I'm still interested in trying to build an OpenNLP
> model for Italian, but I can do my experiments externally, if I understand
> correctly.
>
> Thanks very much, I'll try to make some progress.
> Alfredo
>
> 2012/3/22 Rupert Westenthaler <[email protected]>
>
>> Hi Alfredo
>>
>> On 22.03.2012, at 12:24, seralf wrote:
>>
>> > Hi, I'm new to Stanbol. I'm reading the documentation and examples, and
>> > I'd like to start some testing with it on the Italian language, if
>> > possible.
>> >
>> > Could someone give me some hints regarding the steps to construct my
>> > model (Italian) and configure it inside the platform?
>> > I suppose it's possible and should not be very different from the steps
>> > taken to build, say, the Spanish integration.
>> > What do I need to do? I know it may sound like a very generic question,
>> > but it's not so clear from the documentation, so I need help.
>> > For my test I would like to be able to use a text corpus from a client's
>> > database and a SKOS thesaurus from the same domain.
>> >
>> > Thanks in advance for any help (suggestions, code examples, ideas, etc.)
>> >
>>
>> In principle there are two different workflows for extracting entities
>> from text:
>>
>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
>> (2) KeywordLinking [5]
>>
>> (1) requires an OpenNLP [1] NER model for the language of your documents.
>> However, OpenNLP currently distributes no models for the Italian language,
>> so you would have to build your own. For more information on how to do
>> that, please see the OpenNLP documentation [1]. As soon as you have such
>> models, you only need to copy them into the
>> {stanbol-workingdir}/sling/datafiles folder. If they follow the naming
>> scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g. "it-ner-location.bin"
>> for the model that detects locations in Italian), Stanbol will pick them
>> up automatically.
>>
>> (2) directly matches words of the text against labels of entities within
>> the controlled vocabulary. This process can be improved by Natural Language
>> Processing (e.g. Part-of-Speech tagging), but this is not a requirement.
>> Typically this works fine for datasets that contain named entities, such as
>> concepts of a thesaurus, contacts of a company, projects, products … It
>> does not work well with datasets that contain entities whose labels are
>> also used as common words in the given language, as this will result in
>> a lot of false positives.
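
To make workflow (2) and its false-positive caveat concrete, here is a minimal Python sketch of plain label matching. It is only an illustration of the idea described above, not how the Stanbol engine is implemented. The vocabulary, the entity IDs, and the function names `build_label_index` and `link_keywords` are all hypothetical.

```python
import re

# Hypothetical controlled vocabulary: entity ID -> labels (not from the thread).
VOCABULARY = {
    "ex:concept/olive-oil": ["olio d'oliva", "olio di oliva"],
    "ex:concept/rome": ["Roma"],
    # "era" is a plausible thesaurus label, but also a common Italian verb form.
    "ex:concept/era": ["era"],
}


def build_label_index(vocabulary):
    """Map each lower-cased label to the set of entity IDs that carry it."""
    index = {}
    for entity_id, labels in vocabulary.items():
        for label in labels:
            index.setdefault(label.lower(), set()).add(entity_id)
    return index


def link_keywords(text, label_index):
    """Return sorted (start, end, matched_text, entity_id) tuples for each label hit."""
    matches = []
    for label, entity_ids in label_index.items():
        # Lookarounds keep the label from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            for entity_id in sorted(entity_ids):
                matches.append((m.start(), m.end(), m.group(0), entity_id))
    return sorted(matches)


if __name__ == "__main__":
    sentence = "Roma era famosa per l'olio d'oliva."
    for match in link_keywords(sentence, build_label_index(VOCABULARY)):
        print(match)
```

In the sample sentence, "era" is the verb "was", yet it is still linked to the hypothetical concept `ex:concept/era`. This is the kind of false positive that appears when labels coincide with common words, and it is where Part-of-Speech tagging (for example, linking only nouns) could help.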
>>
>> Based on the information you provided on your use case, I suggest that (2)
>> should work just fine for you. This usage scenario [2] should provide you
>> with all the information needed to configure Stanbol for your use case.
>>
>> I hope this helps. If you have any further questions, feel free to ask.
>>
>> best
>> Rupert Westenthaler
>>
>> [1] http://opennlp.apache.org/
>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
>>
>> > cheers,
>> > Alfredo Serafini
