Re: stambol with italian language

Michel Benevento Wed, 28 Mar 2012 08:09:35 -0700

Have you started your installed bundle in the admin console? Click the little 
triangle next to it so it becomes a square and the status message updates.
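
If you would rather script that step than click it, here is a minimal sketch. It assumes the admin console is the Apache Felix Web Console at /system/console, and that it accepts a POST with action=start on /system/console/bundles/{symbolic-name}. The host, port, credentials and bundle name below are placeholders, and build_start_request is just an illustrative helper, not part of Stanbol.

```python
import base64
import urllib.parse
import urllib.request


def build_start_request(base_url, bundle, user, password):
    """Build (but do not send) a POST asking the web console to start a bundle.

    Assumption: the console is the Felix Web Console and exposes
    /system/console/bundles/<symbolic-name> with an 'action' form field.
    """
    url = f"{base_url.rstrip('/')}/system/console/bundles/{urllib.parse.quote(bundle)}"
    body = urllib.parse.urlencode({"action": "start"}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    # HTTP Basic authentication header built by hand from user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Placeholder values; replace with your own host, bundle and credentials.
    req = build_start_request(
        "http://localhost:8080", "org.example.mysite.index", "admin", "admin"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only constructed here, so you can inspect it before pointing it at a running instance.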


Michel


On 28 Mar 2012, at 17:02, seralf wrote:

> Hi, I'm trying to use the KeywordLinking engine, as Rupert suggested earlier.
> I've built the Solr indexes as in the tutorial and they seem to be OK (I
> looked inside them with Luke). I've copied them into ROOT/sling/dataset and
> then installed the generated bundle via the console.
> 
> Now I get a strange error: it seems that Stanbol is not actually loading my
> indexes, or for some reason it has not activated the yard:
> 
> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
> Configuration from the Yard.
>    at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>    at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
>    ...
> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
> not active!
>    ...
> 
> Does anyone have a suggestion on this?
> 
> I have two other related questions:
> 1) How can I start Stanbol with a specific configuration activated?
> 2) Is there any way to manage deployment/activation via some kind of REST
> interface (for example with curl)? It could be helpful for
> automation.
> 
> thanks in advance,
> Alfredo
> 
> 
> 
> 
> 2012/3/22 seralf <[email protected]>
> 
>> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
>> 
>> I think I'll follow your suggestion and try to use my thesaurus
>> with workflow option (2).
>> I already use Solr too, so it's probably the best choice for my needs.
>> 
>> On the other hand, I'm still interested in trying to build an OpenNLP
>> Italian model, but I can do those experiments externally, if I understand
>> correctly.
>> 
>> Thanks very much, I'll try to make some progress.
>> Alfredo
>> 
>> 
>> 
>> 2012/3/22 Rupert Westenthaler <[email protected]>
>> 
>>> Hi Alfredo
>>> 
>>> On 22.03.2012, at 12:24, seralf wrote:
>>> 
>>>> Hi, I'm new to Stanbol. I'm reading the documentation and examples, and
>>>> I'd like to start some testing with it on the Italian language, if
>>>> possible.
>>>> 
>>>> Could someone give me some hints on the steps to construct my model
>>>> (Italian) and configure it inside the platform? I suppose it's possible,
>>>> and the steps should not be far from those taken to construct, say, the
>>>> Spanish integration.
>>>> What do I need to do? I know it may sound like a very generic question,
>>>> but it's not clear from the documentation, so I need help.
>>>> For my test I would like to use a text corpus from a client's database
>>>> and a SKOS thesaurus from the same domain.
>>>> 
>>>> Thanks in advance for any help (suggestions, code examples, ideas, etc.)
>>>> 
>>> 
>>> In principle there are two different workflows for extracting entities
>>> from text:
>>> 
>>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
>>> (2) KeywordLinking [5]
>>> 
>>> 
>>> (1) requires an OpenNLP [1] NER model for the language of your documents.
>>> However, OpenNLP currently distributes no models for Italian, so you
>>> would have to build your own. For more information on how to do that,
>>> please see the OpenNLP documentation [1].
>>> Once you have such models, you only need to copy them into the
>>> {stanbol-workingdir}/sling/datafiles folder. If they follow the naming
>>> scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g. "it-ner-location.bin"
>>> for the model that detects locations in Italian), Stanbol will pick them up
>>> automatically.
>>> 
>>> (2) directly matches words of the text against the labels of entities in
>>> the controlled vocabulary. This process can be improved by Natural Language
>>> Processing (e.g. Part-of-Speech tagging), but this is not a requirement.
>>> Typically this works well for datasets that contain named entities, such as
>>> concepts of a thesaurus, contacts of a company, projects, products … It
>>> does not work well with datasets that contain entities whose labels are
>>> also common words in the given language, as this will result in
>>> a lot of false positives.
>>> 
>>> Based on the information you provided on your use case, I suggest that (2)
>>> should work just fine for you. This user scenario [2] should provide you
>>> with all the information needed to configure Stanbol for your use
>>> case.
>>> 
>>> I hope this helps. If you have any further questions, feel free to ask.
>>> 
>>> best
>>> Rupert Westenthaler
>>> 
>>> [1] http://opennlp.apache.org/
>>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>>> 
>>> [3]
>>> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
>>> [4]
>>> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
>>> [5]
>>> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
>>> 
>>>> cheers,
>>>> Alfredo Serafini
>>> 
>>> 
>>
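
As a rough illustration of the model naming scheme Rupert describes ("{lang}-ner-{type}.bin" under {stanbol-workingdir}/sling/datafiles), the expected file location can be sketched as below. The helper name ner_model_path and the example working directory are hypothetical, used only to show how the pattern is filled in; this is not Stanbol code.

```python
from pathlib import PurePosixPath


def ner_model_path(working_dir, lang, entity_type):
    """Return where a custom OpenNLP NER model would be expected.

    Follows the "{lang}-ner-{type}.bin" pattern quoted in the thread;
    working_dir stands in for {stanbol-workingdir} (an assumed value).
    """
    file_name = f"{lang}-ner-{entity_type}.bin"
    return PurePosixPath(working_dir) / "sling" / "datafiles" / file_name


if __name__ == "__main__":
    # Hypothetical working directory, Italian models for three entity types.
    for entity_type in ("person", "location", "organization"):
        print(ner_model_path("/opt/stanbol", "it", entity_type))
```

Checking the computed path against the files actually present in the datafiles folder is a quick way to spot a misnamed model before restarting.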
