Did a fresh build, and inside Stanbol at localhost:8080 the engine is installed but not activated. I still see the com.google.inject errors. I do see your pom.xml update.
-harish

On Wed, Aug 1, 2012 at 12:55 AM, Walter Kasper <[email protected]> wrote:
> Hi,
>
> The OSGi bundle declared some package imports that indeed are usually neither available nor required. I fixed that. Just check out the corrected pom.xml. On a fresh, clean Stanbol installation langdetect worked fine for me.
>
> Best regards,
>
> Walter
>
> harish suvarna wrote:
>
>> Thanks Dr Walter. langdetect is very useful. I could successfully compile it, but I am unable to load it into Stanbol as I get the error:
>> ======
>> ERROR: Bundle org.apache.stanbol.enhancer.engines.langdetect [177]: Error starting/stopping bundle. (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engines.langdetect [177]: Unable to resolve 177.0: missing requirement [177.0] package; (package=com.google.inject))
>> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engines.langdetect [177]: Unable to resolve 177.0: missing requirement [177.0] package; (package=com.google.inject)
>>         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>         at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1333)
>>         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
>>         at java.lang.Thread.run(Thread.java:680)
>> ==============
>>
>> I added the dependency
>>
>> <dependency>
>>   <groupId>com.google.inject</groupId>
>>   <artifactId>guice</artifactId>
>>   <version>3.0</version>
>> </dependency>
>>
>> but it looks like it is looking for version 1.3.0, which I can't find in repo1.maven.org. I am not sure what needs the inject library. The entire source of the langdetect plugin does not contain the word "inject"; only the manifest file in target/classes has it listed.
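[Editor's note: the kind of fix Walter describes (dropping a spurious package import) is usually made in the maven-bundle-plugin section of the engine's pom.xml. The snippet below is only an illustrative sketch; the actual change committed to the Stanbol trunk may look different.]

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <!-- Exclude the com.google.inject import that the bnd tool
           picked up but that the engine never actually uses. -->
      <Import-Package>
        !com.google.inject*,
        *
      </Import-Package>
    </instructions>
  </configuration>
</plugin>
```

With `!com.google.inject*` the generated MANIFEST.MF no longer declares that import, so the OSGi resolver stops looking for a bundle exporting the package.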
>>
>> -harish
>>
>> On Tue, Jul 31, 2012 at 1:32 AM, Walter Kasper <[email protected]> wrote:
>>
>>> Hi Harish,
>>>
>>> I checked in a new language identifier for Stanbol based on http://code.google.com/p/language-detection/.
>>>
>>> Just check it out from the Stanbol trunk, install it, and try it out.
>>>
>>> Best regards,
>>>
>>> Walter
>>>
>>> harish suvarna wrote:
>>>
>>>> Rupert,
>>>> My initial debugging for Chinese text told me that the language identification done by the langid enhancer using Apache Tika does not recognize Chinese. Tika's language detection does not seem to support the CJK languages. As a result, Chinese text is identified as Lithuanian ('lt'). The Apache Tika group has had an enhancement issue for detecting CJK languages open since Feb 2012:
>>>>
>>>> https://issues.apache.org/jira/browse/TIKA-856
>>>>
>>>> I am not sure about the use of language identification in Stanbol yet. Is the language id used to select the dbpedia index (the appropriate dbpedia language dump) for entity lookups?
>>>>
>>>> For my purpose I am thinking of picking option 3: make sure the text is in the language of interest, call the paoding segmenter, then iterate over the ngrams and do an entityhub lookup. I still need to understand the code around how the whole entity lookup for dbpedia works.
>>>>
>>>> I find that the language detection library http://code.google.com/p/language-detection/ is very good at language detection. It supports 53 languages out of the box and the quality seems good. It is Apache 2.0 licensed. I could volunteer to create a new langid engine based on it, with the Stanbol community's approval. So if anyone could shed some light on how to add a new Java library into Stanbol, that would be great. I am a Maven beginner for now.
>>>>
>>>> Thanks,
>>>> harish
>>>>
>>>> On Thu, Jul 26, 2012 at 9:46 PM, Rupert Westenthaler <[email protected]> wrote:
>>>>
>>>>> Hi harish,
>>>>>
>>>>> Note: Sorry, I forgot to include the stanbol-dev mailing list in my last answer.
>>>>>
>>>>> On Fri, Jul 27, 2012 at 3:33 AM, harish suvarna <[email protected]> wrote:
>>>>>
>>>>>> Thanks a lot Rupert.
>>>>>>
>>>>>> I am weighing between options 2 and 3. What is the difference? Option 2 sounds like enhancing the KeywordLinkingEngine to deal with Chinese text; it may be like paoding is hardcoded into the KeywordLinkingEngine. Option 3 is like a separate engine.
>>>>>
>>>>> Option (2) will require some improvements on the Stanbol side. However, there have already been discussions on how to create a "text processing chain" that allows splitting things like tokenizing, POS tagging, lemmatizing ... into different Enhancement Engines without suffering from the disadvantage of creating high amounts of RDF triples. One idea was to base this on the Apache Lucene TokenStream [1] API and share the data as a ContentPart [2] of the ContentItem.
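[Editor's note: the language-detection library praised above has a very small API. The sketch below shows how an engine might call it; the profile directory path and sample text are illustrative assumptions, not taken from the Stanbol code.]

```java
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;

public class LangIdSketch {
    public static void main(String[] args) throws Exception {
        // Load the language profiles shipped with the library once at startup.
        // "profiles" is an assumed path to the unpacked profile directory.
        DetectorFactory.loadProfile("profiles");

        // A Detector instance is created per text to analyze.
        Detector detector = DetectorFactory.create();
        detector.append("这是一段中文文本。");

        // detect() returns the most probable language code as a String
        System.out.println(detector.detect());
    }
}
```

The library requires the external langdetect jar and its profile data on the classpath, so this sketch is not runnable standalone.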
>>>>>
>>>>> Option (3) indeed means that you will create your own EnhancementEngine - one similar to the KeywordLinkingEngine.
>>>>>
>>>>>> But will I be able to use the Stanbol dbpedia lookup using option 3?
>>>>>
>>>>> Yes. You only need to obtain an Entityhub "ReferencedSite" and use the "FieldQuery" interface to search for entities (see [3] for an example).
>>>>>
>>>>> best
>>>>> Rupert
>>>>>
>>>>> [1] http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/components/enhancer/contentitem.html#content-parts
>>>>> [3] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/engines/keywordextraction/src/main/java/org/apache/stanbol/enhancer/engines/keywordextraction/linking/impl/EntitySearcherUtils.java
>>>>>
>>>>>> Btw, I created my own enhancement engine chains and I could see them
>>>>>> yesterday at localhost:8080. But today all of them have vanished and only the default chain shows up. Can I dig them up somewhere in the Stanbol directory?
>>>>>>
>>>>>> -harish
>>>>>>
>>>>>> I just created the eclipse project.
>>>>>>
>>>>>> On Thu, Jul 26, 2012 at 5:04 AM, Rupert Westenthaler <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> There are no NER (Named Entity Recognition) models for Chinese text available via OpenNLP, so the default configuration of Stanbol will not process Chinese text. What you can do is configure a KeywordLinking Engine for Chinese text, as this engine can also process texts in unknown languages (see [1] for details).
>>>>>>>
>>>>>>> However, the KeywordLinking Engine also requires at least a tokenizer for looking up words. As there is no Chinese-specific tokenizer in OpenNLP, it will use the default one, which uses a fixed set of characters to split words (white spaces, hyphens ...). You may know better how well this works with Chinese texts; my assumption would be that it is not sufficient, so results will be sub-optimal.
>>>>>>>
>>>>>>> To apply Chinese optimizations I see three possibilities:
>>>>>>>
>>>>>>> 1. add support for Chinese to OpenNLP (tokenizer, sentence detection, POS tagging, named entity detection)
>>>>>>> 2. allow the KeywordLinkingEngine to use other already available tools for text processing (e.g. what is already available for Solr/Lucene [2], or the paoding Chinese segmenter referenced in your mail). Currently the KeywordLinkingEngine is hardwired to OpenNLP, because representing tokens, POS tags ... as RDF would be too much of an overhead.
>>>>>>> 3. implement a new EnhancementEngine for processing Chinese text.
>>>>>>>
>>>>>>> Hope this helps to get you started.
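[Editor's note: Rupert's pointer to the Entityhub "ReferencedSite"/"FieldQuery" interfaces can be sketched roughly as below. The package, method, and field names follow the incubator-era trunk as best recalled and may differ; the EntitySearcherUtils.java file he links is the authoritative example.]

```java
import org.apache.stanbol.entityhub.servicesapi.model.Representation;
import org.apache.stanbol.entityhub.servicesapi.query.FieldQuery;
import org.apache.stanbol.entityhub.servicesapi.query.QueryResultList;
import org.apache.stanbol.entityhub.servicesapi.query.TextConstraint;
import org.apache.stanbol.entityhub.servicesapi.site.ReferencedSite;

public class EntityLookupSketch {

    private static final String RDFS_LABEL =
            "http://www.w3.org/2000/01/rdf-schema#label";

    /** Look up entities whose rdfs:label matches the given term. */
    public static void lookup(ReferencedSite site, String term, String lang)
            throws Exception {
        FieldQuery query = site.getQueryFactory().createFieldQuery();
        // constrain the label field to the search term in the given language
        query.setConstraint(RDFS_LABEL, new TextConstraint(term, lang));
        query.addSelectedField(RDFS_LABEL);
        query.setLimit(10);
        QueryResultList<Representation> results = site.find(query);
        for (Representation r : results) {
            System.out.println(r.getId());
        }
    }
}
```

This depends on a running Stanbol instance providing the ReferencedSite, so it is a sketch rather than a standalone program.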
>>>>>>>
>>>>>>> best
>>>>>>> Rupert
>>>>>>>
>>>>>>> [1] http://incubator.apache.org/stanbol/docs/trunk/multilingual.html
>>>>>>> [2] http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean
>>>>>>>
>>>>>>> On Thu, Jul 26, 2012 at 2:00 AM, harish suvarna <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Rupert,
>>>>>>>> Finally I am getting some time to work on Stanbol. My job is to demonstrate Stanbol annotations for Chinese text. I am just starting on it. I am following the instructions to build an enhancement engine from Anuj's blog. dbpedia has some Chinese data dump too. We may have to depend on the ngrams as keys and look them up in the dbpedia labels.
>>>>>>>>
>>>>>>>> I am planning to use the paoding Chinese segmenter (http://code.google.com/p/paoding/) for word breaking.
>>>>>>>>
>>>>>>>> Just curious: I pasted some Chinese text into the default engine of Stanbol, and it finished the processing in no time at all. This gave me the suspicion that maybe if the language is Chinese, no further processing is done. Is that right? Any more tips for making all this work in Stanbol?
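[Editor's note: the Solr/Lucene language-analysis page linked above ([2]) covers the smartcn module, one ready-made alternative to paoding for Chinese word breaking. Below is a small sketch against the Lucene 3.x API of that era; the field name and sample text are illustrative.]

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ChineseTokenizeSketch {
    public static void main(String[] args) throws Exception {
        SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_36);
        TokenStream stream = analyzer.tokenStream("text",
                new StringReader("我是中国人。"));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        // print one segmented word per line
        while (stream.incrementToken()) {
            System.out.println(term.toString());
        }
        stream.end();
        stream.close();
    }
}
```

This needs the lucene-core and lucene-smartcn jars of a 3.x release on the classpath, so it is not runnable standalone.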
>>>>>>>>
>>>>>>>> -harish
>>>>>>>
>>>>>>> --
>>>>>>> | Rupert Westenthaler [email protected]
>>>>>>> | Bodenlehenstraße 11 ++43-699-11108907
>>>>>>> | A-5500 Bischofshofen
>>>
>>> --
>>> Dr. Walter Kasper
>>> DFKI GmbH
>>> Stuhlsatzenhausweg 3
>>> D-66123 Saarbrücken
>>> Tel.: +49-681-85775-5300
>>> Fax: +49-681-85775-5338
>>> Email: [email protected]
>>> ---------------------------------------------------------------
>>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>>
>>> Geschaeftsfuehrung:
>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>>> Dr. Walter Olthoff
>>>
>>> Vorsitzender des Aufsichtsrats:
>>> Prof. Dr. h.c. Hans A. Aukes
>>>
>>> Amtsgericht Kaiserslautern, HRB 2313
>>> ---------------------------------------------------------------
