Re: stambol with italian language

seralf Wed, 28 Mar 2012 16:08:12 -0700

ok, thanks very much

2012/3/28 Rupert Westenthaler <[email protected]>


> Hi,
>
> and sorry for the late response ...
>
> >> ~/sw/stanbol/launchers$ java -Xmx1024m -jar full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar start -c ../sling
> >> *doesn't work for me*
> >>
> >> ~/sw/stanbol/launchers$ java -Xmx1024m -jar full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar start
> >> *works*
> >>
>
> the reason for that is that in a lot of cases the default sling folder
> "/sling" is hardcoded in Stanbol.
> One such example is the MainDataFileProvider that serves the files
> located in /sling/datafiles.
>
> I consider this behavior as a bug and created
> https://issues.apache.org/jira/browse/STANBOL-561
>
> While doing that I will also change the default from
> "{working-dir}/sling" to "{working-dir}/stanbol"  (see
> https://issues.apache.org/jira/browse/STANBOL-562)
>
> In the meantime you will need to use the default folder ("sling" in the
> working directory).
>
> best
> Rupert
>
> >
> >
> > thanks,
> > Alfredo
> >
> > 2012/3/28 [email protected] <[email protected]>
> >
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> I've seen this happen when loading custom vocabularies built by the
> >> Generic RDF Indexer, and I'm honestly still not sure why. In my case,
> >> restarting the custom bundle and the Solr Yard bundle seemed to make it
> >> work. I imagine that restarting Stanbol would do the same. Perhaps there
> >> is some subtle error in the building of the custom bundle that makes it
> >> possible for a Solr index service to be created but not started?
> >>
> >> As to managing configuration, you may want to follow:
> >>
> >> https://issues.apache.org/jira/browse/STANBOL-529
> >>
> >> which offers a future way to provide configuration at startup. I'm not
> >> familiar enough with the Sling Launcher system to know how difficult it
> >> would be to directly expose deployment via REST, but it might be more
> >> feasible using the Apache Felix Web Console, which is normally included
> >> in Stanbol builds:
> >>
> >> http://felix.apache.org/site/web-console-restful-api.html
> >>
> >> http://felix.apache.org/site/apache-felix-web-console.html#ApacheFelixWebConsole-RESTfulAPI
> >>
> >> - ---
> >> A. Soroka
> >> Software & Systems Engineering :: Online Library Environment
> >> the University of Virginia Library
> >>
> >> On Mar 28, 2012, at 11:24 AM, seralf wrote:
> >>
> >> > Yes, I have already started the bundle, but if I search from the web
> >> > interface or via a command line like:
> >> > curl -X POST -d "name=roma*&limit=10&offset=0" http://localhost:8080/entityhub/site/<SITE-NAME>/find
> >> >
> >> > I get the error I pasted.
> >> >
> >> > Any suggestions? Maybe I missed some configuration step?
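
For scripting the same query, the curl call above can be mirrored in Python. This is only a sketch under assumptions: the helper name `build_find_request` and the site name "mysite" are hypothetical, and the endpoint path and form fields are copied from the curl command in this thread rather than from the Entityhub documentation. The request is built but not sent, so it runs without a Stanbol instance.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_find_request(base_url, site, name, limit=10, offset=0):
    """Build (but do not send) a POST request equivalent to the curl call.

    The path /entityhub/site/<site>/find and the form fields name, limit
    and offset are taken from the curl command quoted in this thread.
    """
    # Form-encode the fields; the '*' wildcard is percent-encoded as %2A.
    body = urlencode({"name": name, "limit": limit, "offset": offset}).encode("ascii")
    url = f"{base_url}/entityhub/site/{site}/find"
    return Request(url, data=body, method="POST")


# "mysite" is a placeholder for the real site name.
req = build_find_request("http://localhost:8080", "mysite", "roma*")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.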
> >> >
> >> > 2012/3/28 Michel Benevento <[email protected]>
> >> >
> >> >> Have you started your installed bundle in the admin console? Click the
> >> >> little triangle next to it so it becomes a square and the status
> >> >> message updates.
> >> >>
> >> >> Michel
> >> >>
> >> >>
> >> >> On 28 mrt. 2012, at 17:02, seralf wrote:
> >> >>
> >> >>> Hi, I'm trying to use KeywordLinking as Rupert suggested earlier.
> >> >>> I've built the Solr indexes as in the tutorial and they seem to be OK
> >> >>> (I looked inside them with Luke). I've copied them into
> >> >>> ROOT/sling/dataset, and then installed the generated bundle via the
> >> >>> console.
> >> >>>
> >> >>> Now I have a strange error: it seems that Stanbol is not actually
> >> >>> loading my indexes, or for some reason it has not activated the yard.
> >> >>>
> >> >>>> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> >> >>>> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
> >> >>>> Configuration from the Yard.
> >> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
> >> >>>>   ...
> >> >>>> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
> >> >>>> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
> >> >>>> not active!
> >> >>>>   ...
> >> >>>>
> >> >>>
> >> >>> Does anyone have a suggestion on this?
> >> >>>
> >> >>> I have two other related questions:
> >> >>> 1) How can I start Stanbol with a specific configuration activated?
> >> >>> 2) Is there any way to manage deployment/activation via some kind of
> >> >>> REST interface (for example with curl)? It could be helpful for
> >> >>> automation.
> >> >>>
> >> >>> thanks in advance,
> >> >>> Alfredo
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> 2012/3/22 seralf <[email protected]>
> >> >>>
> >> >>>> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
> >> >>>>
> >> >>>> I think I'll follow your suggestion and try to use my thesaurus with
> >> >>>> workflow option 2).
> >> >>>> I already use Solr as well, so it's probably the best choice for my
> >> >>>> needs.
> >> >>>>
> >> >>>> On the other hand, I'm still interested in trying to build an OpenNLP
> >> >>>> model for Italian, but I can do my experiments externally, if I
> >> >>>> understand correctly.
> >> >>>>
> >> >>>> thanks very much, i'll try to make some progress
> >> >>>> Alfredo
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> 2012/3/22 Rupert Westenthaler <[email protected]>
> >> >>>>
> >> >>>>> Hi Alfredo
> >> >>>>>
> >> >>>>> On 22.03.2012, at 12:24, seralf wrote:
> >> >>>>>
> >> >>>>>> Hi, I'm new to Stanbol. I'm reading the documentation and examples,
> >> >>>>>> and I'd like to start some testing with it on the Italian language,
> >> >>>>>> if that's possible.
> >> >>>>>>
> >> >>>>>> Could someone give me some hints about the steps needed to build my
> >> >>>>>> (Italian) model and configure it inside the platform? I suppose it's
> >> >>>>>> possible, and it should not be very far from the steps taken to build,
> >> >>>>>> let's say, the Spanish integration.
> >> >>>>>> What do I need to do? I know it may sound like a very generic
> >> >>>>>> question, but it's not so clear from the documentation, so I need help.
> >> >>>>>> For my test I would like to be able to use a text corpus from the
> >> >>>>>> database of a client, and a SKOS thesaurus from the same domain.
> >> >>>>>>
> >> >>>>>> Thanks in advance for any help (suggestions, code examples, ideas,
> >> >>>>>> etc.)
> >> >>>>>>
> >> >>>>>
> >> >>>>> In principle there are two different workflows for extracting
> >> >>>>> entities from text:
> >> >>>>>
> >> >>>>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
> >> >>>>> (2) KeywordLinking [5]
> >> >>>>>
> >> >>>>>
> >> >>>>> (1) requires an OpenNLP [1] NER model for the language of your
> >> >>>>> documents. However, currently there are no models for the Italian
> >> >>>>> language distributed by OpenNLP. This would require you to build your
> >> >>>>> own models. For more information on how to do that, please see the
> >> >>>>> documentation of OpenNLP [1]. As soon as you have such models, you
> >> >>>>> only need to copy them into the
> >> >>>>> {stanbol-workingdir}/sling/datafiles folder. If they follow the naming
> >> >>>>> scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g.
> >> >>>>> "it-ner-location.bin" for the model that detects locations for
> >> >>>>> Italian), Stanbol will pick them up automatically.
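
Before copying models into the datafiles folder, it can help to check that their file names follow the scheme described above. The sketch below is an illustration only: the regular expression is my reading of "{lang}-ner-{type}.bin" (a 2 or 3 letter language code and a lower-case type), and `parse_model_name` is a hypothetical helper, not part of Stanbol or OpenNLP.

```python
import re

# Assumed reading of the "{lang}-ner-{type}.bin" scheme quoted above:
# a 2-3 letter lower-case language code and a lower-case entity type.
MODEL_NAME = re.compile(r"^(?P<lang>[a-z]{2,3})-ner-(?P<type>[a-z]+)\.bin$")


def parse_model_name(filename):
    """Return (lang, type) if the file name follows the scheme, else None."""
    match = MODEL_NAME.match(filename)
    if match is None:
        return None
    return match.group("lang"), match.group("type")


# A name that follows the scheme, and one that does not.
print(parse_model_name("it-ner-location.bin"))
print(parse_model_name("italian-locations.bin"))
```

A file for which the helper returns None would presumably not be picked up automatically and should be renamed first.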
> >> >>>>>
> >> >>>>> (2) directly matches words of the text with labels of entities within
> >> >>>>> the controlled vocabulary. This process can be improved by Natural
> >> >>>>> Language Processing (e.g. Part-of-Speech tagging), but this is not a
> >> >>>>> requirement. Typically this works fine for datasets that contain named
> >> >>>>> entities such as concepts of a thesaurus, contacts of a company,
> >> >>>>> projects, products … It does not work well with datasets that contain
> >> >>>>> entities with labels that are also used as common words in the given
> >> >>>>> language, as this will result in a lot of false positives.
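
The idea behind workflow (2), and the false-positive problem it can run into, can be illustrated with a toy matcher. This is not the KeywordLinkingEngine implementation: it only matches single lower-cased tokens against labels, and the vocabulary, the `urn:example:` identifiers and the function name `link_keywords` are all made up for the example.

```python
import re


def link_keywords(text, labels):
    """Map every vocabulary label that occurs as a word in the text to its entity id.

    labels: dict of lower-cased single-word label -> entity id (hypothetical data).
    """
    tokens = re.findall(r"\w+", text.lower())
    found = {}
    for token in tokens:
        if token in labels:
            found[token] = labels[token]
    return found


# Toy vocabulary: "era" (epoch) is also a common Italian verb form ("was").
vocabulary = {
    "roma": "urn:example:roma",
    "colosseo": "urn:example:colosseo",
    "era": "urn:example:era",
}
text = "Il Colosseo era il simbolo di Roma"
print(link_keywords(text, vocabulary))
```

Here "colosseo" and "roma" are the intended matches, while "era" is linked only because the label coincides with a common word, which is the kind of false positive described above.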
> >> >>>>>
> >> >>>>> Based on the information you provided on your use case, I suggest that
> >> >>>>> (2) should work just fine for you. This user scenario [2] should
> >> >>>>> provide you with all the needed information on how to configure
> >> >>>>> Stanbol for your use case.
> >> >>>>>
> >> >>>>> I hope this helps. If you have any further questions, feel free to
> >> >>>>> ask.
> >> >>>>>
> >> >>>>> best
> >> >>>>> Rupert Westenthaler
> >> >>>>>
> >> >>>>> [1] http://opennlp.apache.org/
> >> >>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
> >> >>>>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
> >> >>>>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
> >> >>>>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
> >> >>>>>
> >> >>>>>> cheers,
> >> >>>>>> Alfredo Serafini
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>
> >> >>
> >>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
> >> Comment: GPGTools - http://gpgtools.org
> >>
> >> iQEcBAEBAgAGBQJPcy+pAAoJEATpPYSyaoIkouwH/imt4ERphKHGc6tXrkLQIFWJ
> >> TclWGjyCjoT1GgOr2OGjwfTS9xmcbsn3mYwfv+tuxNj2FfXfi4OfoVza6z7tZeUZ
> >> WdH4+cmq+4Lg+7lt+Pbt2narYWhvUCg2Dths8tdj8nPtJSEEd2KfW5DQqnwq/CfA
> >> uqOAN5zEb9rsy5gTGzSNxX66fpnM1t7XWHs2gmoD17rfmnJEQBc3l+a6rnLJdnFX
> >> vABg2gEiYt5YGaZRG4V1oVC5SqEoZlysix/tkZyWcFMvXN+nvePbMDhaqBwjWc5k
> >> 719uf4gW66Xf7V8zeWgwQcXlNICAebyXnsGiPqkeUaZa4nhm6v+G+FT4Ho/R4lk=
> >> =10yV
> >> -----END PGP SIGNATURE-----
> >>
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
