In order to apply SenseClusters to an agglutinative language (Turkish), I
had to do the following setup, and I got some results.

The current SenseClusters treats only white space in the corpus as a word
boundary, and counts word frequencies, bigrams, etc. However, I think we
cannot apply the same principle to an agglutinative language, because a word
in such a language usually encodes a lot of information (case, phonological
transformations) with the help of many affixes or suffixes.
E.g. the word "masa" (table) can become:
- masam (my table)
- masaya (to the table)
- masada (on the table)

In brief, if I want to apply the white-space approach, I need the root or
lemma of the words in agglutinative languages...
Therefore I replaced all words with their root, and I got not perfect but
somewhat promising results, as follows.
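
The root replacement described above can be sketched roughly as follows. This is only an illustration: the ROOTS table and the function name replace_with_roots are hypothetical, and in a real setup the roots would come from a Turkish morphological analyzer rather than a hand-written dictionary.

```python
# Hypothetical surface-form -> root table (assumption, for illustration only).
# A real pipeline would obtain roots from a morphological analyzer.
ROOTS = {
    "masam": "masa",   # my table
    "masaya": "masa",  # to the table
    "masada": "masa",  # on the table
}


def replace_with_roots(text, roots=ROOTS):
    """Replace each whitespace-delimited token by its root when one is known.

    Tokens without an entry in the table are kept unchanged.
    """
    return " ".join(roots.get(token, token) for token in text.split())


if __name__ == "__main__":
    print(replace_with_roots("masam masaya masada"))
```

After this step every inflected form of "masa" counts as the same token, so frequency and bigram counts based on white space are no longer split across surface forms.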

1)
---------------------------------------------
Four Senses Dist: (24.21)   (30.53)   (18.42)   (26.84)

Precision = 46.32(88/190)
Recall = 44.44(88/(190+8))
F-Measure = 45.36
---------------------------------

This score is from just one of my experiments. I think the difference between
the majority sense (30.53%) and the F-measure (45.36) can be considered a
promising result.
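
For reference, the three scores above can be recomputed from the reported counts. This is a small sketch that assumes the "8" in the recall denominator is the number of instances left out of the 190 that were scored; the function name scores is only illustrative.

```python
def scores(correct, attempted, unattempted):
    """Return precision, recall and F-measure as percentages.

    precision = correct / attempted
    recall    = correct / (attempted + unattempted)
    F-measure = harmonic mean of precision and recall
    """
    precision = 100.0 * correct / attempted
    recall = 100.0 * correct / (attempted + unattempted)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    p, r, f = scores(correct=88, attempted=190, unattempted=8)
    print(f"Precision = {p:.2f}  Recall = {r:.2f}  F-Measure = {f:.2f}")
    # prints: Precision = 46.32  Recall = 44.44  F-Measure = 45.36
```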

2)
In order to guarantee sound, well-planned experiments, I used the pseudo-word
approach: I conflated some words...
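
A minimal sketch of the pseudo-word idea, under my own assumptions: every occurrence of one of the chosen words is replaced by a single artificial token, and the original word is kept as the gold "sense" label for evaluation. The function name conflate and the example words are hypothetical, not the actual words used in the experiments.

```python
def conflate(contexts, targets, pseudo_word):
    """Replace the target word in each context by pseudo_word.

    Returns (new_context, gold_label) pairs, where gold_label is the original
    word that was replaced. Contexts containing zero or more than one target
    token are skipped so that every pair has exactly one label.
    """
    pairs = []
    for context in contexts:
        tokens = context.split()
        found = [t for t in tokens if t in targets]
        if len(found) != 1:
            continue
        gold = found[0]
        new_tokens = [pseudo_word if t == gold else t for t in tokens]
        pairs.append((" ".join(new_tokens), gold))
    return pairs


if __name__ == "__main__":
    data = ["masa eski", "kitap yeni", "kalem mavi"]
    for ctx, gold in conflate(data, {"masa", "kitap"}, "masa_kitap"):
        print(gold, "->", ctx)
```

Because the gold label is known by construction, the clusters can be scored without any manual sense tagging.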

3)
SenseClusters does not accept ISO-8859-9 encoded text. That is, the program
deletes special characters such as (ü,ğ,ş,i). So I need to replace the special
characters (ç -> xc, ü -> xu, ğ -> xg, etc.). I think the NGRAM package does
not accept them.
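
The character replacement can be sketched like this. Only the mappings for ç, ü and ğ are taken from the list above; the entries for ş, ö and ı are my assumption of how the same "x + letter" scheme would continue, and upper-case letters are not handled here.

```python
# Mappings for ç, ü, ğ follow the scheme described above; the entries for
# ş, ö and ı are assumed extensions of the same "x + letter" pattern.
TRANSLIT = {
    "ç": "xc",
    "ü": "xu",
    "ğ": "xg",
    "ş": "xs",  # assumption
    "ö": "xo",  # assumption
    "ı": "xi",  # assumption
}
TABLE = str.maketrans(TRANSLIT)


def to_ascii(text):
    """Replace Turkish-specific lower-case letters by ASCII digraphs."""
    return text.translate(TABLE)


def decode_latin5(raw_bytes):
    """Decode ISO-8859-9 (Latin-5) bytes, then transliterate to ASCII."""
    return to_ascii(raw_bytes.decode("iso8859_9"))


if __name__ == "__main__":
    # b"a\xf0a\xe7" is "ağaç" (tree) in ISO-8859-9.
    print(decode_latin5(b"a\xf0a\xe7"))
```

Running the transliteration before tokenization keeps every word intact for tools that only handle ASCII input.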

4) Some of my contexts have only a few words, 5-6 words... I want to eliminate
such contexts; the program does not deal with such weak contexts...
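
One simple way to drop such short contexts before clustering is a token-count filter. This is only a sketch: the threshold of 7 tokens is an assumption chosen so that the 5-6 word contexts mentioned above are removed, and the function name drop_short_contexts is hypothetical.

```python
def drop_short_contexts(contexts, min_tokens=7):
    """Keep only contexts with at least min_tokens whitespace-delimited tokens.

    min_tokens=7 is an assumed threshold; it should be tuned on the corpus.
    """
    return [c for c in contexts if len(c.split()) >= min_tokens]


if __name__ == "__main__":
    sample = [
        "a b c d e",           # 5 tokens, dropped
        "a b c d e f",         # 6 tokens, dropped
        "a b c d e f g h",     # 8 tokens, kept
    ]
    print(drop_short_contexts(sample))
```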

Thank you very much for your interest.


> --
> Savas Yildirim
>
> Eberhard Karls Universität Tübingen & Istanbul Bilgi University
>
> Postal Address in Tuebingen:
> Seminar für Sprachwissenschaft
> Universität Tübingen
> Wilhelmstraße 19
> Room 1.07
> D-72074 Tübingen
>
> Postal Address in Istanbul:
> Sisli 34440 Dolapdere Kurtulusdere cad. No:47
> Istanbul / Turkey
> Phone:
> (0090) (212) 311 50 00
>
>


_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
