Re: UIMA Question

Richard Eckart de Castilho Wed, 05 Aug 2015 09:13:32 -0700

Hi,

as I said, in general UIMA is suitable, but the tools that build on top of UIMA 
may not have been adapted to these languages.

I only know of a few isolated efforts to add support for such languages.

/me puts on DKPro hat (I'm working on that project)

E.g. the DKPro Core [1] collection of UIMA components integrates a couple of 
third-party tools and models, such as Stanford CoreNLP (some Arabic), MaltParser 
dependency parsing (Farsi) or the HunPos POS tagger (Farsi). However, support is 
very spotty. E.g. there is no tokenizer for either of these languages available 
in DKPro Core. I collected most of these models from across the web and 
integrated them. Where possible, I tried to set up at least a few basic unit 
tests to make sure these tools and models at least do something, but since I 
speak neither Arabic nor Farsi... well... ;)

/me takes off DKPro hat and puts on WebAnno hat (I'm also working on that 
project)

Recently, I added basic (experimental) RTL support to the WebAnno 
annotation tool [2]. WebAnno internally uses UIMA data structures (CAS) to 
store annotations and is based on the same UIMA type system as DKPro Core (plus 
you can define your own types in WebAnno). Unfortunately, browser support for 
RTL languages is also rather poor. RTL support in WebAnno works best with 
Safari [3].

/me takes off hats

So, you can use UIMA for these languages, and there are already a few things 
to build on or get inspired by. AFAIK there is no comprehensive open-source 
NLP suite for Arabic or Farsi (or is there?). So if you build one, that would be 
great, and as far as I can tell, you should be able to interface it with UIMA.

Cheers,

-- Richard

[1] https://dkpro.github.io/dkpro-core/releases/1.7.0/models.html
[2] https://webanno.github.io/webanno
[3] https://github.com/webanno/webanno/issues/49

On 05.08.2015, at 17:58, d.heidarp...@ut.ac.ir wrote:

> Hi,
> I have the same goal, but for Persian. Although Persian and Arabic are
> different languages, they use almost the same orthography, and I'm
> planning to develop a framework with basic modules for normalization,
> stemming, POS tagging, syntactic analysis, semantic/sentiment extraction
> and more. We are a team of 6-7 students (more or less), and each of us
> is developing one module as his/her own thesis. The whole effort
> should result in a framework for text/audio engineering apps and, more
> importantly, for an IR system.
> Is this architecture suitable for such a task and language?
> Thanks
> Davood Heidarpour
> 
>> Hi,
>> 
>> at the level of the internal data representation, UIMA certainly supports
>> Arabic. However, specific visualization tools or analysis components may
>> not support it. So if you want to program your own analysis with UIMA, you
>> should be OK. If you want to use UIMA out of the box for Arabic or other
>> RTL languages, you might hit a wall.
>> 
>> If you can explain in more detail what you plan to do, maybe we can give
>> some more specific pointers.
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>> On 05.08.2015, at 11:09, Khaled Zaki <khaledami...@gmail.com> wrote:
>> 
>>> Hi,
>>> this is Khaled from Cairo University. I'm using UIMA for the first
>>> time and I have a question regarding text mining. I was wondering
>>> whether UIMA supports mining Arabic text and, if so, what I should do,
>>> as I tried to browse an Arabic file but it failed.
>>> Regards,
>>> Thank you in advance.
>> 