Re: stambol with italian language

[email protected] Wed, 28 Mar 2012 08:35:38 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've seen this happen when loading custom vocabularies built by the Generic RDF
Indexer, and I'm honestly still not sure why. In my case, restarting the
custom bundle and the Solr Yard bundle seemed to make it work. I imagine that
restarting Stanbol would do the same. Perhaps there is some subtle error in the
building of the custom bundle that makes it possible for a Solr index service
to be created but not started?


As to managing configuration, you may want to follow:

https://issues.apache.org/jira/browse/STANBOL-529

which tracks a planned way to provide configuration at startup. I'm not familiar
enough with the Sling Launcher system to know how difficult it would be to
directly expose deployment via REST, but it might be more feasible using the
Apache Felix Web Console, which is normally included in Stanbol builds:

http://felix.apache.org/site/web-console-restful-api.html
http://felix.apache.org/site/apache-felix-web-console.html#ApacheFelixWebConsole-RESTfulAPI
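
As a rough illustration of what scripting a bundle restart through the Web
Console could look like, here is a minimal Python sketch. It only builds the
HTTP request and does not send it. The console path /system/console/bundles,
the "action" form field, and the admin/admin credentials are the usual Felix
Web Console defaults as far as I know, but treat them as assumptions and check
them against your own Stanbol build. The bundle id 123 is purely hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def bundle_action_request(base_url, bundle, action, user="admin", password="admin"):
    """Build (but do not send) a POST asking the Web Console to start or stop a bundle.

    Assumptions: the console is mounted at /system/console, it accepts a
    form-encoded "action" field, and it uses HTTP Basic authentication.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    url = "{}/system/console/bundles/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(str(bundle))
    )
    body = urllib.parse.urlencode({"action": action}).encode("ascii")
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


if __name__ == "__main__":
    # 123 is a hypothetical bundle id; look up the real one in the console.
    for action in ("stop", "start"):
        req = bundle_action_request("http://localhost:8080", 123, action)
        print(req.get_method(), req.full_url, req.data)
        # To actually send it: urllib.request.urlopen(req)
```

Stopping and then starting both the custom bundle and the Solr Yard bundle
this way would mirror the manual restart described above.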

- ---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Mar 28, 2012, at 11:24 AM, seralf wrote:

> Yes, I have already started the bundle, but if I search from the web
> interface or via a command line like:
> curl -X POST -d "name=roma*&limit=10&offset=0" \
>   http://localhost:8080/entityhub/site/<SITE-NAME>/find
>
> I get the error I pasted.
>
> Any suggestions? Maybe I missed some configuration step?
> 
> 2012/3/28 Michel Benevento <[email protected]>
> 
>> Have you started your installed bundle in the admin console? Click the
>> little triangle next to it so it becomes a square and the status message
>> updates.
>> 
>> Michel
>> 
>> 
>> On 28 mrt. 2012, at 17:02, seralf wrote:
>> 
>>> Hi, I'm trying to use KeywordLinking as Rupert suggested earlier.
>>> I've built the Solr indexes as in the tutorial and they seem to be OK (I
>>> looked inside them with Luke). I've copied them into ROOT/sling/dataset and
>>> then installed the generated bundle via the console.
>>>
>>> Now I have a strange error: it seems that Stanbol is not actually loading my
>>> indexes, or for some reason it has not activated the yard.
>>> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
>>>> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
>>>> Configuration from the Yard.
>>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
>>>>   ...
>>>> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
>>>> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
>>>> not active!
>>>>   ...
>>>> 
>>> 
>>> Does anyone have a suggestion on this?
>>>
>>> I have two other related questions:
>>> 1) How can I start Stanbol with a specific config activated?
>>> 2) Is there any way to manage the deployment/activation via some kind of REST
>>> interface (for example with curl)? It could be helpful for automation.
>>> 
>>> thanks in advance,
>>> Alfredo
>>> 
>>> 
>>> 
>>> 
>>> 2012/3/22 seralf <[email protected]>
>>> 
>>>> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
>>>>
>>>> I think I'll follow your suggestion and try to use my thesaurus
>>>> with workflow option 2).
>>>> I already use Solr too, so it's probably the best choice for my needs
>>>> indeed.
>>>>
>>>> On the other hand, I'm still interested in trying to build an OpenNLP
>>>> Italian model, but I can do my experiments externally, if I understand
>>>> correctly.
>>>> 
>>>> thanks very much, i'll try to make some progress
>>>> Alfredo
>>>> 
>>>> 
>>>> 
>>>> 2012/3/22 Rupert Westenthaler <[email protected]>
>>>> 
>>>>> Hi Alfredo
>>>>> 
>>>>> On 22.03.2012, at 12:24, seralf wrote:
>>>>> 
>>>>>> Hi, I'm new to Stanbol. I'm reading the documentation and examples, and
>>>>>> I'd like to start some testing with it on the Italian language, if
>>>>>> possible.
>>>>>>
>>>>>> Could someone give me some hints regarding the steps to construct my
>>>>>> model (Italian) and configure it inside the platform? I suppose it's
>>>>>> possible and should not be very far from the steps taken to construct,
>>>>>> let's say, the Spanish integration.
>>>>>> What do I need to do? I know it may sound like a very generic question,
>>>>>> but it's not so clear from the documentation, so I need help.
>>>>>> For my test I would like to be able to use a text corpus from the
>>>>>> database of a client, and a SKOS thesaurus from the same domain.
>>>>>>
>>>>>> Thanks in advance for any help (suggestions, code examples, ideas, etc.)
>>>>>> 
>>>>> 
>>>>> In principle there are two different workflows for extracting entities
>>>>> from text:
>>>>> 
>>>>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
>>>>> (2) KeywordLinking [5]
>>>>> 
>>>>> 
>>>>> (1) requires an OpenNLP [1] NER model for the language of your documents.
>>>>> However, there are currently no models for the Italian language
>>>>> distributed by OpenNLP, so you would need to build your own. For more
>>>>> information on how to do that, please see the OpenNLP documentation [1].
>>>>> As soon as you have such models, you only need to copy them into the
>>>>> {stanbol-workingdir}/sling/datafiles folder. If they follow the naming
>>>>> scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g.
>>>>> "it-ner-location.bin" for the model that detects locations for Italian),
>>>>> Stanbol will pick them up automatically.
>>>>> 
>>>>> (2) directly matches words of the text with labels of entities within the
>>>>> controlled vocabulary. This process can be improved by Natural Language
>>>>> Processing (e.g. Part-of-Speech tagging), but this is not a requirement.
>>>>> Typically this works fine for datasets that contain named entities such as
>>>>> concepts of a thesaurus, contacts of a company, projects, products …
>>>>> It does not work well with datasets that contain entities whose labels
>>>>> are also used as common words in the given language, as this will result
>>>>> in a lot of false positives.
>>>>> 
>>>>> Based on the information you provided on your use case, I suggest that (2)
>>>>> should work just fine for you. This user scenario [2] should provide you
>>>>> with all the needed information on how to configure Stanbol for your use
>>>>> case.
>>>>> 
>>>>> I hope this helps. If you have any further questions feel free to ask
>>>>> 
>>>>> best
>>>>> Rupert Westenthaler
>>>>> 
>>>>> [1] http://opennlp.apache.org/
>>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>>>>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
>>>>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
>>>>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
>>>>> 
>>>>>> cheers,
>>>>>> Alfredo Serafini
>>>>> 
>>>>> 
>>>> 
>> 
>> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJPcy+pAAoJEATpPYSyaoIkouwH/imt4ERphKHGc6tXrkLQIFWJ
TclWGjyCjoT1GgOr2OGjwfTS9xmcbsn3mYwfv+tuxNj2FfXfi4OfoVza6z7tZeUZ
WdH4+cmq+4Lg+7lt+Pbt2narYWhvUCg2Dths8tdj8nPtJSEEd2KfW5DQqnwq/CfA
uqOAN5zEb9rsy5gTGzSNxX66fpnM1t7XWHs2gmoD17rfmnJEQBc3l+a6rnLJdnFX
vABg2gEiYt5YGaZRG4V1oVC5SqEoZlysix/tkZyWcFMvXN+nvePbMDhaqBwjWc5k
719uf4gW66Xf7V8zeWgwQcXlNICAebyXnsGiPqkeUaZa4nhm6v+G+FT4Ho/R4lk=
=10yV
-----END PGP SIGNATURE-----
