Re: [Senseclusters-users] for special character

Ted Pedersen Sun, 08 Mar 2009 20:33:29 -0700

This issue has come up in the context of the Ngram Statistics
Package (NSP), which is one of the underlying pieces of software
for SenseClusters. There is fairly extensive discussion of encoding
issues on the NSP mailing list, and if that interests you, you might
want to subscribe (it's a Yahoo group):


http://tech.groups.yahoo.com/group/ngram/

However, the short version of that discussion is to add the
following line at the top of each .pl file:

use locale;

This is a bit of a hack, however, with some drawbacks that are
discussed on the mailing list above.
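
To make the suggestion concrete, here is a minimal sketch of where the
line would go and what it affects. This is not the actual NSP or
SenseClusters code: the tokenize() helper and the script name are
hypothetical, and the sketch assumes the corpus is stored in a
single-byte encoding that matches the locale in the environment
(for example ISO-8859-9 for Turkish text).

```perl
#!/usr/bin/perl
# Sketch only: a hypothetical tokenizer, not the real NSP tokenizer.
use strict;
use warnings;
use locale;    # the suggested line: \w now follows the LC_CTYPE locale

# Return the "word" tokens in a line. Without "use locale", \w only
# matches ASCII letters, digits and underscore when the data is treated
# as bytes, so a character such as u-umlaut ends up splitting the word.
sub tokenize {
    my ($line) = @_;
    return $line =~ /(\w+)/g;
}

# Print the tokens of every file named on the command line.
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        print join(' ', tokenize($line)), "\n";
    }
    close($fh);
}
```

The locale is taken from the environment, so the script would be run
with something like LC_ALL set to a locale that is installed on your
system and matches the encoding of the corpus. Whether this helps
depends on that match, which is part of why it is a hack.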

But, you might want to try this and see if it helps!

Let us know how things go with this...or if you encounter any
different/better solutions.

Thanks,
Ted

On Thu, Mar 5, 2009 at 7:50 AM, Savas Yildirim <[email protected]> wrote:
> SenseCluster And Ngram delete special character (ü,ö,ş) in context.
> E.g. the word müssen occur as "m ssen" in SenseCluster and n-gram as well.
> Is there any solution for this ?
>
> I know that romanian language is used with SenseCluster.
>
> My simple solution is replacing such word "ü" with "xxu"
>
> --
> Savas Yildirim
>


-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

