To apply SenseClusters to an agglutinative language (Turkish), I had to do
the following setup, and I got some results.
SenseClusters as it stands only splits the corpus on white space and then
counts word frequencies, bigrams, etc. However, I think we cannot apply the
same principle to an agglutinative language, because a word in such a
language usually encodes a lot of information, such as case and
phonological changes, by means of affixes (mainly suffixes).
E.g. the word "masa" (table) can become:
- masam (my table)
- masaya (to the table)
- masada (on the table)
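
To illustrate the problem with a small Python sketch (this is not
SenseClusters code; the root table below is hand-written for these three
forms only, and a real setup would need a Turkish morphological analyzer or
stemmer in its place): splitting on white space treats the three forms as
unrelated word types, while mapping each form to its root first collapses
them into one.

```python
from collections import Counter

# Hand-made lookup for the example forms above (an assumption for
# illustration; real data would come from a morphological analyzer).
ROOTS = {"masam": "masa", "masaya": "masa", "masada": "masa"}


def count_tokens(text, roots=None):
    """Split on white space and count tokens, optionally mapping to roots."""
    tokens = text.split()
    if roots is not None:
        # Unknown forms are left unchanged.
        tokens = [roots.get(tok, tok) for tok in tokens]
    return Counter(tokens)


surface = count_tokens("masam masaya masada")
rooted = count_tokens("masam masaya masada", ROOTS)

print(len(surface))  # number of distinct surface forms
print(rooted)        # counts after root replacement
```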
In brief, if I want to keep the white-space approach, I need the root or
lemma of each word in agglutinative languages.
Therefore I replaced all words with their roots, and I got not perfect but
somewhat promising results, as follows.
1)
---------------------------------------------
Four Senses Dist: (24.21) (30.53) (18.42) (26.84)
Precision = 46.32(88/190)
Recall = 44.44(88/190+8)
F-Measure = 45.36
---------------------------------
This score is from just one of my experiments. I think the difference
between the majority sense (30.53%) and the F-measure (45.36) can be
considered a promising result.
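
For reference, the three scores above can be reproduced from the counts in
parentheses. The sketch below is Python, not SenseClusters code, and it
assumes that the "+8" in the recall line is the number of instances that
were left out of the clustering, so that recall is computed over 198
instances in total.

```python
def scores(correct, attempted, skipped):
    """Return precision, recall and F-measure as percentages."""
    precision = 100.0 * correct / attempted
    # Assumption: skipped instances count against recall only.
    recall = 100.0 * correct / (attempted + skipped)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = scores(correct=88, attempted=190, skipped=8)
print(f"Precision = {precision:.2f}")
print(f"Recall    = {recall:.2f}")
print(f"F-Measure = {f_measure:.2f}")
```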
2)
To keep the experiments controlled and well planned, I used the pseudo-word
approach, i.e. I conflated some words...
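
A minimal Python sketch of what I mean by conflation (the words, the
pseudo-word name and the function are made up for illustration and are not
the ones from my experiments): every occurrence of one of the chosen words
is replaced by a single artificial word, and the original word is kept as
the gold "sense" label for evaluation.

```python
def conflate(contexts, targets, pseudoword):
    """Build pseudo-word instances from tokenized contexts.

    Returns a list of (gold_label, tokens) pairs, one per occurrence of a
    target word, with that occurrence replaced by the pseudo-word.
    """
    instances = []
    for tokens in contexts:
        for i, tok in enumerate(tokens):
            if tok in targets:
                replaced = tokens[:i] + [pseudoword] + tokens[i + 1:]
                instances.append((tok, replaced))
    return instances


# Hypothetical example data, already reduced to roots.
contexts = [["masa", "buyuk"], ["kitap", "yeni"], ["ev", "eski"]]
instances = conflate(contexts, {"masa", "kitap"}, "masakitap")
for gold, tokens in instances:
    print(gold, tokens)
```

Contexts that contain none of the chosen words (the third one here) produce
no instance.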
3)
SenseClusters does not accept ISO-8859-9 encoded text. That is, the program
deletes special characters such as ü, ğ, ş, ı, so I need to replace them
(ç -> xc, ü -> xu, ğ -> xg, etc.). I think the NGRAM package does not
accept them.
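
The replacement itself is simple; a Python sketch of it is below. Only the
three mappings I listed are filled in; the remaining characters would be
added to the table in the same way. The function name is my own, and the
text is assumed to have been decoded from ISO-8859-9 already (for example
by opening the file with encoding="iso-8859-9").

```python
# Mapping from Turkish characters to ASCII digraphs. Only the three pairs
# mentioned in the text are listed; extend the table for the other letters.
TRANSLIT = str.maketrans({"ç": "xc", "ü": "xu", "ğ": "xg"})


def to_ascii_digraphs(text):
    """Replace the mapped characters; everything else is left unchanged."""
    return text.translate(TRANSLIT)


print(to_ascii_digraphs("çağ"))
print(to_ascii_digraphs("gül"))
```

One caveat of this scheme: if the corpus already contains a literal "xc",
"xu" or "xg", the replacement cannot be undone unambiguously afterwards.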
4) Some of my contexts contain only a few words (5-6 words). I want to
eliminate such contexts, as the program does not handle such weak contexts.
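
As a workaround, such contexts could be dropped before the data is given to
the program. A Python sketch (the function and the threshold of 7 tokens
are my own assumptions, chosen only so that 5-6 word contexts are removed):

```python
def drop_short_contexts(contexts, min_tokens=7):
    """Keep only contexts with at least min_tokens white-space tokens."""
    return [c for c in contexts if len(c.split()) >= min_tokens]


# Hypothetical contexts: the first has 5 tokens, the second has 8.
contexts = [
    "w1 w2 w3 w4 w5",
    "w1 w2 w3 w4 w5 w6 w7 w8",
]
kept = drop_short_contexts(contexts)
print(len(kept))
```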
Thank you very much for your interest.
--
Savas Yildirim
Eberhard Karls Universität Tübingen & Istanbul Bilgi University
Postal Address in Tuebingen:
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstraße 19
Room 1.07
D-72074 Tübingen
Postal Address in Istanbul:
Sisli 34440 Dolapdere Kurtulusdere cad. No:47
Istanbul / Turkey
Phone:
(0090) (212) 311 50 00
_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users