Re: stambol with italian language

Rupert Westenthaler Wed, 28 Mar 2012 10:48:49 -0700

Hi,

and sorry for the late response ...


>> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
>> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
>> start -c ../sling *doesn't work for me*
>>
>> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
>> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
>> start *works*
>>

The reason is that the default sling folder "/sling" is hardcoded in
many places in Stanbol. One example is the MainDataFileProvider, which
serves the files located in /sling/datafiles.

I consider this behavior a bug and created
https://issues.apache.org/jira/browse/STANBOL-561

While doing that I will also change the default from
"{working-dir}/sling" to "{working-dir}/stanbol" (see
https://issues.apache.org/jira/browse/STANBOL-562).

In the meantime you will need to use the default folder, i.e. start
Stanbol without the -c option, as in your second command.
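
As a rough illustration of that workaround, here is a minimal Python
sketch that builds the launch command without -c and prints the folders
Stanbol would then be expected to use. The helper names
(default_sling_home, launch_command) are hypothetical and exist only for
this example; the jar path and heap size are copied from the commands
quoted above, and the {working-dir}/sling layout is the default described
in this message.

```python
from pathlib import Path

# Jar path copied from the quoted commands; adjust it to your own build.
LAUNCHER_JAR = (
    "full/target/"
    "org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar"
)


def default_sling_home(working_dir):
    """Return the default home folder, i.e. {working-dir}/sling."""
    return Path(working_dir) / "sling"


def launch_command(jar=LAUNCHER_JAR, max_heap="1024m"):
    """Build the launch command without -c, so the default folder is used."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar, "start"]


if __name__ == "__main__":
    home = default_sling_home(Path.cwd())
    print("Expected home folder:     ", home)
    print("Expected datafiles folder:", home / "datafiles")
    print("Launch command:           ", " ".join(launch_command()))
```

Run it from the launchers directory so that the printed folders match
the ones the launcher creates there.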

best
Rupert

>
>
> thanks,
> Alfredo
>
> 2012/3/28 [email protected] <[email protected]>
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I've seen this happen loading custom vocabularies built by the Generic RDF
>> Indexer and I'm honestly still not sure of why. In my case, restarting the
>> custom bundle and the Solr Yard bundle seemed to make it work. I imagine
>> that restarting Stanbol would do the same. Perhaps there is some subtle
>> error in the building of the custom bundle that makes it possible for a
>> Solr index service to be created but not started?
>>
>> As to managing configuration, you may want to follow:
>>
>> https://issues.apache.org/jira/browse/STANBOL-529
>>
>> which offers a future way to provide configuration at startup. I'm not
>> familiar enough with the Sling Launcher system to know how difficult it
>> would be to directly expose deployment via REST, but it might be more
>> feasible using the Apache Felix Web Console which is normally included in
>> Stanbol builds:
>>
>> http://felix.apache.org/site/web-console-restful-api.html
>>
>> http://felix.apache.org/site/apache-felix-web-console.html#ApacheFelixWebConsole-RESTfulAPI
>>
>> - ---
>> A. Soroka
>> Software & Systems Engineering :: Online Library Environment
>> the University of Virginia Library
>>
>> On Mar 28, 2012, at 11:24 AM, seralf wrote:
>>
>> > Yes, I have already started the bundle, but if I search from the web
>> > interface or via a command line like:
>> > curl -X POST -d "name=roma*&limit=10&offset=0"
>> > http://localhost:8080/entityhub/site/<SITE-NAME>/find
>> >
>> > I get the error I pasted.
>> >
>> > Any suggestions? Maybe I am missing some configuration step?
>> >
>> > 2012/3/28 Michel Benevento <[email protected]>
>> >
>> >> Have you started your installed bundle in the admin console? Click the
>> >> little triangle next to it so it becomes a square and the status message
>> >> updates.
>> >>
>> >> Michel
>> >>
>> >>
>> >> On 28 mrt. 2012, at 17:02, seralf wrote:
>> >>
>> >>> Hi, I'm trying to use the KeywordLinking as Rupert suggested earlier.
>> >>> I've built the Solr indexes as in the tutorial and they seem to be OK
>> >>> (I looked inside them with Luke). I've copied them to
>> >>> ROOT/sling/dataset and then installed the generated bundle via the
>> >>> console.
>> >>>
>> >>> Now I have a strange error: it seems that Stanbol is not actually
>> >>> loading my indexes, or for some reason it has not activated the yard:
>> >>>> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
>> >>>> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
>> >>>> Configuration from the Yard.
>> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
>> >>>>   ...
>> >>>> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
>> >>>> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
>> >>>> not active!
>> >>>>   ...
>> >>>>
>> >>>
>> >>> Does anyone have a suggestion on this?
>> >>>
>> >>> I have two other related questions:
>> >>> 1) How can I start Stanbol with a specific config activated?
>> >>> 2) Is there any way to manage the deployment/activation via some kind
>> >>> of REST interface (for example with curl)? It could be helpful for
>> >>> automation.
>> >>>
>> >>> thanks in advance,
>> >>> Alfredo
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> 2012/3/22 seralf <[email protected]>
>> >>>
>> >>>> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
>> >>>>
>> >>>> I think I'll follow your suggestion and try to use my thesaurus
>> >>>> with workflow option (2). I already use Solr as well, so it's
>> >>>> probably the best choice for my needs.
>> >>>>
>> >>>> On the other hand, I'm still interested in trying to build an
>> >>>> OpenNLP Italian model, but I can do those experiments externally,
>> >>>> if I understand correctly.
>> >>>>
>> >>>> thanks very much, i'll try to make some progress
>> >>>> Alfredo
>> >>>>
>> >>>>
>> >>>>
>> >>>> 2012/3/22 Rupert Westenthaler <[email protected]>
>> >>>>
>> >>>>> Hi Alfredo
>> >>>>>
>> >>>>> On 22.03.2012, at 12:24, seralf wrote:
>> >>>>>
>> >>>>>> Hi, I'm new to Stanbol. I'm reading the documentation and
>> >>>>>> examples, and I'd like to start testing it with the Italian
>> >>>>>> language, if that's possible.
>> >>>>>>
>> >>>>>> Could someone give me some hints on the steps needed to construct
>> >>>>>> my model (Italian) and configure it inside the platform? I suppose
>> >>>>>> it's possible and should not be very far from the steps taken to
>> >>>>>> construct, let's say, the Spanish integration.
>> >>>>>> What do I need to do? I know it may sound like a very generic
>> >>>>>> question, but it's not so clear from the documentation, so I need
>> >>>>>> help.
>> >>>>>> For my test I would like to be able to use a text corpus from the
>> >>>>>> database of a client, and a SKOS thesaurus from the same domain.
>> >>>>>>
>> >>>>>> Thanks in advance for any help (suggestions, code examples, ideas,
>> >>>>>> etc.)
>> >>>>>>
>> >>>>>
>> >>>>> In principle there are two different workflows for extracting
>> >>>>> entities from text:
>> >>>>>
>> >>>>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
>> >>>>> (2) KeywordLinking [5]
>> >>>>>
>> >>>>>
>> >>>>> (1) requires an OpenNLP [1] NER model for the language of your
>> >>>>> documents. However, currently there are no models for the Italian
>> >>>>> language distributed by OpenNLP, so you would need to build your
>> >>>>> own models. For more information on how to do that please see the
>> >>>>> documentation of OpenNLP [1]. As soon as you have such models you
>> >>>>> only need to copy them into the
>> >>>>> {stanbol-workingdir}/sling/datafiles folder. If they follow the
>> >>>>> naming scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g.
>> >>>>> "it-ner-location.bin" for the model that detects locations for
>> >>>>> Italian), Stanbol will pick them up automatically.
>> >>>>>
>> >>>>> (2) directly matches words of the text with labels of entities
>> >>>>> within the controlled vocabulary. This process can be improved by
>> >>>>> Natural Language Processing (e.g. Part-of-Speech tagging), but this
>> >>>>> is not a requirement. Typically this works fine for datasets that
>> >>>>> contain named entities such as concepts of a thesaurus, contacts of
>> >>>>> a company, projects, products, etc. It does not work well with
>> >>>>> datasets that contain entities with labels that are also used as
>> >>>>> common words in the given language, as this will result in a lot of
>> >>>>> false positives.
>> >>>>>
>> >>>>> Based on the information you provided on your use case I suggest
>> >>>>> that (2) should work just fine for you. This user scenario [2]
>> >>>>> should provide you with all the needed information on how to
>> >>>>> configure Stanbol for your use case.
>> >>>>>
>> >>>>> I hope this helps. If you have any further questions feel free to
>> >>>>> ask.
>> >>>>>
>> >>>>> best
>> >>>>> Rupert Westenthaler
>> >>>>>
>> >>>>> [1] http://opennlp.apache.org/
>> >>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>> >>>>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
>> >>>>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
>> >>>>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
>> >>>>>
>> >>>>>> cheers,
>> >>>>>> Alfredo Serafini
>> >>>>>
>> >>>>>
>> >>>>
>> >>
>> >>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
>> Comment: GPGTools - http://gpgtools.org
>>
>> iQEcBAEBAgAGBQJPcy+pAAoJEATpPYSyaoIkouwH/imt4ERphKHGc6tXrkLQIFWJ
>> TclWGjyCjoT1GgOr2OGjwfTS9xmcbsn3mYwfv+tuxNj2FfXfi4OfoVza6z7tZeUZ
>> WdH4+cmq+4Lg+7lt+Pbt2narYWhvUCg2Dths8tdj8nPtJSEEd2KfW5DQqnwq/CfA
>> uqOAN5zEb9rsy5gTGzSNxX66fpnM1t7XWHs2gmoD17rfmnJEQBc3l+a6rnLJdnFX
>> vABg2gEiYt5YGaZRG4V1oVC5SqEoZlysix/tkZyWcFMvXN+nvePbMDhaqBwjWc5k
>> 719uf4gW66Xf7V8zeWgwQcXlNICAebyXnsGiPqkeUaZa4nhm6v+G+FT4Ho/R4lk=
>> =10yV
>> -----END PGP SIGNATURE-----
>>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
