Hi, as I said, UIMA in general is suitable, but the tools built on top of UIMA may not be adapted to these languages.
I know of only a few isolated efforts towards adding support for such languages.

/me puts on DKPro hat (I'm working on that project)

For example, the DKPro Core [1] collection of UIMA components integrates a couple of third-party tools and models, such as Stanford CoreNLP (some Arabic), MaltParser dependency parsing (Farsi) and the HunPos POS tagger (Farsi). However, support is very spotty; e.g. there is no tokenizer for either of these languages available in DKPro Core. I collected most of these tools and models from across the web and integrated them. Where possible, I tried to set up at least a few basic unit tests to make sure they do at least something, but since I speak neither Arabic nor Farsi... well... ;)

/me takes off DKPro hat and puts on WebAnno hat (I'm also working on that project)

Recently, I added basic (experimental) RTL support to the WebAnno annotation tool [2]. WebAnno internally uses UIMA data structures (CAS) to store annotations and is based on the same UIMA type system as DKPro Core (plus you can define your own types in WebAnno). Unfortunately, support for RTL languages in browsers is also rather poor; RTL support in WebAnno works best with Safari [3].

/me takes off hats

So, you can use UIMA for these languages, and there are already a few things there to build on or get inspired by. AFAIK there is no comprehensive open-source NLP suite for Arabic or Farsi (or is there?). So if you build one, that would be great, and as far as I can tell, you should be able to interface it with UIMA.
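To illustrate why the data representation itself is not the problem: UIMA annotations record begin/end offsets into the document text (Java `String` indices), so Arabic or Persian text is simply stored in logical order, and right-to-left layout only matters when rendering. Since DKPro Core has no tokenizer for these languages, below is a minimal, stdlib-only sketch of offset-based tokenization. The class `OffsetTokenizer` and its rules (a run of letters, digits, combining marks and the Persian zero-width non-joiner U+200C is one token; any other non-space character is a token of its own) are hypothetical simplifications, not part of UIMA or DKPro Core. In a real UIMA component, each span would become a token annotation in the CAS.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a UIMA/DKPro API): splits text into tokens with begin/end offsets. */
final class OffsetTokenizer {

    /** A token span: begin inclusive, end exclusive, both UTF-16 char offsets into the text. */
    public record Span(int begin, int end) {}

    /** Letters, digits, combining marks (e.g. Arabic harakat) and ZWNJ (U+200C) stay inside a word. */
    private static boolean isWordPart(int cp) {
        return Character.isLetterOrDigit(cp)
                || cp == 0x200C
                || Character.getType(cp) == Character.NON_SPACING_MARK;
    }

    public static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                // Whitespace separates tokens but is not itself a token.
                i += Character.charCount(cp);
                continue;
            }
            int begin = i;
            if (Character.isLetterOrDigit(cp)) {
                // Consume the whole word, including marks and ZWNJ.
                while (i < n) {
                    int c = text.codePointAt(i);
                    if (!isWordPart(c)) {
                        break;
                    }
                    i += Character.charCount(c);
                }
            } else {
                // Punctuation or any other symbol: one code point per token.
                i += Character.charCount(cp);
            }
            spans.add(new Span(begin, i));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "سلام دنیا!";
        for (Span s : tokenize(text)) {
            System.out.println(s.begin() + "-" + s.end() + " " + text.substring(s.begin(), s.end()));
        }
    }
}
```

For the input `سلام دنیا!` this should yield the spans 0-4, 5-9 and 9-10. Real Persian processing would also need normalization (e.g. Arabic vs. Persian forms of Yeh and Kaf), which this sketch does not attempt.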
Cheers,

-- Richard

[1] https://dkpro.github.io/dkpro-core/releases/1.7.0/models.html
[2] https://webanno.github.io/webanno
[3] https://github.com/webanno/webanno/issues/49

On 05.08.2015, at 17:58, d.heidarp...@ut.ac.ir wrote:

> Hi,
>
> I have the same goal but for Persian. Although Persian and Arabic are
> different languages, they use almost the same orthography, and I'm
> planning to develop a framework with basic modules for normalizing,
> stemming, POS tagging, syntactic analysis, semantic/sentiment extraction
> and more. We are a team of 6 or 7 students (give or take), and each of
> us is developing one module as his/her own thesis. The whole effort
> should become a framework for text/audio engineering apps and, more
> importantly, for an IR system.
> Is this architecture suitable for such a task and language?
> Thanks
> Davood Heidarpour
>
>> Hi,
>>
>> at the level of the internal data representation, UIMA certainly supports
>> Arabic. However, specific visualization tools or analysis components may
>> not support it. So if you want to program your own analysis with UIMA, you
>> should be OK. If you want to use UIMA out-of-the-box for Arabic or other
>> RTL languages, you might hit a wall.
>>
>> If you can explain in more detail what you plan to do, maybe we can give
>> some more specific pointers.
>>
>> Cheers,
>>
>> -- Richard
>>
>> On 05.08.2015, at 11:09, Khaled Zaki <khaledami...@gmail.com> wrote:
>>
>>> Hi,
>>> this is Khaled from Cairo University. I'm using UIMA for the first
>>> time and I have a question regarding text mining: I was wondering
>>> whether UIMA supports mining Arabic text or not, and if yes, what
>>> should I do? I have tried to browse an Arabic file, but it failed.
>>> Regards
>>> Thank you in advance.