Yes, I have already started the bundle, but if I search from the web
interface or via the command line, e.g.:

curl -X POST -d "name=roma*&limit=10&offset=0" http://localhost:8080/entityhub/site/<SITE-NAME>/find
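For reference, the request above can be written as a small script. The site name is a placeholder for the configured ReferencedSite, and the Accept header in the commented line is an optional addition to ask for JSON output:

```shell
# Sketch of the Entityhub /find request shown above.
# <SITE-NAME> is a placeholder -- substitute your configured ReferencedSite id.
SITE="<SITE-NAME>"
URL="http://localhost:8080/entityhub/site/${SITE}/find"

# Quoting the form data keeps the shell from expanding '*'
# and from treating '&' as a command separator.
DATA="name=roma*&limit=10&offset=0"

# Print the command here; run the commented curl against a live Stanbol.
echo "curl -X POST -d '${DATA}' '${URL}'"
# curl -X POST -H "Accept: application/json" -d "${DATA}" "${URL}"
```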
I get the error I pasted. Any suggestions? Maybe I am missing a
configuration step?

2012/3/28 Michel Benevento <[email protected]>

> Have you started your installed bundle in the admin console? Click the
> little triangle next to it so it becomes a square and the status message
> updates.
>
> Michel
>
>
> On 28 mrt. 2012, at 17:02, seralf wrote:
>
> > Hi, I'm trying to use KeywordLinking as Rupert suggested earlier.
> > I've built the Solr indexes as in the tutorial and they seem to be ok
> > (I looked inside them with Luke), I've copied them into
> > ROOT/sling/datafiles, and then installed the generated bundle via the
> > console.
> >
> > Now I have a strange error: it seems like Stanbol is not actually
> > loading my indexes, or for some reason it has not activated the Yard:
> >
> >> java.lang.IllegalStateException: Unable to initialize the Cache with
> >> Yard <SITE-NAME> Index! This is usually caused by Errors while reading
> >> the Cache Configuration from the Yard.
> >>     at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >>     at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
> >>     ...
> >> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
> >> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is
> >> currently not active!
> >>     ...
> >
> > Does anyone have a suggestion on this?
> >
> > I have two other related questions:
> > 1) How can I start Stanbol with a specific configuration activated?
> > 2) Is there any way to manage deployment/activation via some kind of
> > REST interface (for example with curl)? It could be helpful for some
> > automation...
> >
> > Thanks in advance,
> > Alfredo
> >
> >
> > 2012/3/22 seralf <[email protected]>
> >
> >> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
> >>
> >> I think I'll try to follow your suggestion and use my thesaurus with
> >> workflow option (2).
> >> I already use Solr as well, so it's probably the best choice for my
> >> needs, indeed.
> >>
> >> On the other hand I'm still interested in giving the construction of
> >> an OpenNLP Italian model a try, but I can do those experiments
> >> externally, if I understand correctly.
> >>
> >> Thanks very much, I'll try to make some progress.
> >> Alfredo
> >>
> >>
> >> 2012/3/22 Rupert Westenthaler <[email protected]>
> >>
> >>> Hi Alfredo
> >>>
> >>> On 22.03.2012, at 12:24, seralf wrote:
> >>>
> >>>> Hi, I'm new to Stanbol. I'm reading the documentation and examples,
> >>>> and I'd like to start some testing with it on the Italian language,
> >>>> if possible.
> >>>>
> >>>> Could someone give me a hint regarding the steps to construct my
> >>>> (Italian) model and configure it inside the platform? I suppose it's
> >>>> possible, and it should not be very far from the steps taken to
> >>>> construct, let's say, the Spanish integration.
> >>>> What do I need to do? I know it could sound like a very generic
> >>>> question, but it's not so clear from the documentation, so I need
> >>>> help.
> >>>> For my test I would like to be able to use a text corpus from the
> >>>> database of a client, and a SKOS thesaurus from the same domain.
> >>>>
> >>>> Thanks in advance for any help (suggestions, code examples, ideas,
> >>>> etc.)
> >>>>
> >>>
> >>> In principle there are two different workflows for extracting
> >>> entities from text:
> >>>
> >>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
> >>> (2) KeywordLinking [5]
> >>>
> >>> (1) requires an OpenNLP [1] NER model for the language of your
> >>> documents.
> >>> However, currently there are no models for the Italian language
> >>> distributed by OpenNLP. This would require you to build your own
> >>> models. For more information on how to do that, please see the
> >>> documentation of OpenNLP [1]. As soon as you have such models you
> >>> only need to copy them into the {stanbol-workingdir}/sling/datafiles
> >>> folder. If they follow the naming scheme used by OpenNLP
> >>> ("{lang}-ner-{type}.bin", e.g. "it-ner-location.bin" for the model
> >>> that detects locations for Italian), Stanbol will pick them up
> >>> automatically.
> >>>
> >>> (2) directly matches words of the text with labels of entities within
> >>> the controlled vocabulary. This process can be improved by Natural
> >>> Language Processing (e.g. Part-of-Speech tagging), but this is not a
> >>> requirement. Typically this works fine for datasets that contain
> >>> named entities, such as concepts of a thesaurus, contacts of a
> >>> company, projects, products ... It does not work well with datasets
> >>> that contain entities whose labels are also used as common words in
> >>> the given language, as this will result in a lot of false positives.
> >>>
> >>> Based on the information you provided on your use case, I suggest
> >>> that (2) should work just fine for you. This user scenario [2] should
> >>> provide you with all the needed information on how to configure
> >>> Stanbol for your use case.
> >>>
> >>> I hope this helps.
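The copy step and naming scheme described above can be sketched as follows. The working directory path is an assumption, and an empty placeholder file stands in for a real model produced by an OpenNLP training run:

```shell
# Sketch: installing a custom OpenNLP NER model where Stanbol looks for it.
# STANBOL_HOME is an assumed location of the Stanbol working directory.
STANBOL_HOME="./stanbol-workingdir"
DATAFILES="${STANBOL_HOME}/sling/datafiles"
mkdir -p "${DATAFILES}"

# A real model comes out of an OpenNLP training run; an empty
# placeholder file is used here just to show the naming scheme.
touch "it-ner-location.bin"   # follows "{lang}-ner-{type}.bin"

# Stanbol picks up correctly named models from the datafiles folder.
cp "it-ner-location.bin" "${DATAFILES}/"
ls "${DATAFILES}"
```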
> >>> If you have any further questions, feel free to ask.
> >>>
> >>> best
> >>> Rupert Westenthaler
> >>>
> >>> [1] http://opennlp.apache.org/
> >>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
> >>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
> >>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
> >>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
> >>>
> >>>> cheers,
> >>>> Alfredo Serafini
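Regarding the earlier question about managing deployment and activation over REST: the Felix web console that ships with Stanbol accepts the same form posts its browser UI uses, so bundle management can be scripted with curl. A sketch, assuming the default admin:admin credentials and port 8080; the bundle file and symbolic name are hypothetical:

```shell
# Sketch: bundle install/start via the Felix web console HTTP interface.
# Credentials, port, bundle path, and symbolic name are all assumptions.
CONSOLE="http://localhost:8080/system/console/bundles"
BUNDLE="./my-index-bundle.jar"   # hypothetical bundle file

# Install a bundle and start it immediately:
INSTALL_CMD="curl -u admin:admin -F action=install -F bundlestart=start -F bundlefile=@${BUNDLE} ${CONSOLE}"

# Start an already-installed bundle, addressed by its symbolic name:
START_CMD="curl -u admin:admin -F action=start ${CONSOLE}/org.example.my-index"

# Print the commands here; run them only against a live instance.
echo "$INSTALL_CMD"
echo "$START_CMD"
```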
