Hi,
In Java we can sort by Pinyin like this (Sun provides a Collator, which is a Comparator):

public int compare(String o1, String o2) {
    return Collator.getInstance(Locale.CHINESE).compare(o1, o2);
}
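
For anyone who wants to try the Collator approach on a whole list, here is a minimal self-contained sketch. The class name CollatorSortDemo and the method sortWithCollator are made up for illustration. The order you get for Chinese strings depends on the collation data shipped with your JDK, so I do not show an expected output.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    /** Returns a sorted copy of the input, ordered by the JDK's Chinese collation rules. */
    public static List<String> sortWithCollator(List<String> names) {
        Collator collator = Collator.getInstance(Locale.CHINESE);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // The printed order depends on the JDK's collation data.
        System.out.println(sortWithCollator(Arrays.asList("宋", "孟", "张", "怡")));
    }
}
```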

But it has some flaws.
Chinese has many homophones, but Sun's Collator does not treat them as equal:

Assert.assertTrue(comparator.compare("怕", "帕") != 0); //怕 pà 帕 pà

Uncommon Chinese characters, such as '怡', are also not sorted usefully by this Collator:

Assert.assertTrue(comparator.compare("怡", "张") > 0); //怡 yí 张 zhāng

Fortunately, there is an open source project on SourceForge:
http://pinyin4j.sourceforge.net/

With it we can convert Chinese characters to Pinyin, and then comparing them is easy.
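
To illustrate the idea before the real code, here is a rough sketch that needs no library. It assumes a tiny hard-coded table (PINYIN) in place of pinyin4j, covering only a few characters from the examples in this mail. The names PinyinKeySketch, toKey and BY_PINYIN are hypothetical. It builds a Pinyin key for each string and compares the keys, which is a simplification of the character-by-character comparison in the full comparator.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class PinyinKeySketch {

    // Hypothetical stand-in for a real lookup such as pinyin4j; covers a few characters only.
    private static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('怕', "pa4");
        PINYIN.put('帕', "pa4");
        PINYIN.put('怡', "yi2");
        PINYIN.put('张', "zhang1");
        PINYIN.put('孟', "meng4");
        PINYIN.put('宋', "song4");
    }

    /** Replaces each known character with its Pinyin; unknown characters are kept as they are. */
    public static String toKey(String s) {
        StringBuilder key = new StringBuilder();
        for (char c : s.toCharArray()) {
            String p = PINYIN.get(c);
            key.append(p != null ? p : String.valueOf(c));
            key.append(' '); // separator so that syllable boundaries stay explicit
        }
        return key.toString();
    }

    /** Orders strings by their Pinyin key, so homophones compare as equal. */
    public static final Comparator<String> BY_PINYIN =
            Comparator.comparing(PinyinKeySketch::toKey);

    public static void main(String[] args) {
        System.out.println(BY_PINYIN.compare("怕", "帕") == 0); // true: both keys are "pa4 "
        System.out.println(BY_PINYIN.compare("怡", "张") < 0);  // true: "yi2 " sorts before "zhang1 "
    }
}
```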

Here is some Java code; part of it was not written by me.
-------------------------------------------------------------------------

/**
  * @author Jeff
  *
  * Copyright (c)
  */
package chinese.utility;

import java.util.Comparator;
import net.sourceforge.pinyin4j.PinyinHelper;

public class PinyinComparator implements Comparator<String> {

    public int compare(String o1, String o2) {

        for (int i = 0; i < o1.length() && i < o2.length(); i++) {

            // Read full code points; charAt() never yields a supplementary one.
            int codePoint1 = o1.codePointAt(i);
            int codePoint2 = o2.codePointAt(i);

            if (Character.isSupplementaryCodePoint(codePoint1)
                    || Character.isSupplementaryCodePoint(codePoint2)) {
                i++;
            }

            if (codePoint1 != codePoint2) {
                if (Character.isSupplementaryCodePoint(codePoint1)
                        || Character.isSupplementaryCodePoint(codePoint2)) {
                    return codePoint1 - codePoint2;
                }

                String pinyin1 = pinyin((char) codePoint1);
                String pinyin2 = pinyin((char) codePoint2);

                if (pinyin1 != null && pinyin2 != null) { // both are Chinese characters
                    if (!pinyin1.equals(pinyin2)) {
                        return pinyin1.compareTo(pinyin2);
                    }
                } else {
                    return codePoint1 - codePoint2;
                }
            }
        }
        return o1.length() - o2.length();
    }

    /**
     * For a polyphonic character, return the first reading. If it is not a
     * Chinese character, return null.
     */
    private String pinyin(char c) {
        String[] pinyins = PinyinHelper.toHanyuPinyinStringArray(c);
        if (pinyins == null) {
            return null;
        }
        return pinyins[0];
    }
}
-------------------------------------------------------------------
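
A side note on the supplementary code point checks in the comparator: they only make sense if the loop reads whole code points rather than UTF-16 code units. Below is a minimal sketch of walking a string code point by code point using only JDK methods. The class name CodePointWalk is made up for illustration, and this is not part of the comparator itself.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {

    /** Returns the code points of s in order, treating a surrogate pair as one code point. */
    public static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for supplementary code points, 1 otherwise
        }
        return result;
    }

    public static void main(String[] args) {
        // "\uD840\uDC0B" is U+2000B, a CJK Extension B character stored as a surrogate pair.
        String s = "a\uD840\uDC0B";
        System.out.println(s.length());           // 3 UTF-16 code units
        System.out.println(codePoints(s).size()); // 2 code points
    }
}
```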

The JUnit 4 test:
-------------------------------------------------------------------

/**
  * @author Jeff
  *
  * Copyright (c)
  */
package chinese.utility.test;

import java.util.Comparator;

import org.junit.Assert;
import org.junit.Test;

import chinese.utility.PinyinComparator;

public class PinyinComparatorTest {

    private Comparator<String> comparator = new PinyinComparator();

    /**
     * common characters
     */
    @Test
    public void testCommon() {
        Assert.assertTrue(comparator.compare("孟", "宋") < 0);
    }

    /**
     * different length
     */
    @Test
    public void testDifferentLength() {
        Assert.assertTrue(comparator.compare("天气真好", "天气真好啊") < 0);
    }

    /**
     * compare with non-Chinese character
     */
    @Test
    public void testNoneChinese() {
        Assert.assertTrue(comparator.compare("a", "阿") < 0);
        Assert.assertTrue(comparator.compare("1", "阿") < 0);
    }

    /**
     * unfamiliar characters (怡)
     */
    @Test
    public void testNoneCommon() {
        Assert.assertTrue(comparator.compare("怡", "张") < 0);
    }

    /**
     * homophones
     */
    @Test
    public void testSameSound() {
        Assert.assertTrue(comparator.compare("怕", "帕") == 0);
    }

    /**
     * polyphonic (曾[zēng,céng] )
     */
    @Test
    public void testMultiSound() {
        Assert.assertTrue(comparator.compare("曾经", "曾迪") > 0);
    }

}
----------------------------------------------------------------------


2011/9/5 Peter Neubauer <peter.neuba...@neotechnology.com>

> Yuanlong,
> can you provide Java code on how to sort Pinyin characters? In that case, I
> am sure there is a way to incorporate it into the Cypher sorting routines.
> It would be very helpful since we don't even know how to test Pinyin
> sorting for correctness :/
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
> On Mon, Sep 5, 2011 at 4:59 AM, iamyuanlong <yuanlong1...@gmail.com>
> wrote:
>
> > hi ,
> >
> >  Sorry to disturb you, and please excuse my bad English.
> >
> >  I'd like to use the CypherParser of Neo4j.
> >  When I query the users' info ordered by user.username desc,
> >  I get a result that differs slightly from the result in
> > SQL Server.
> >  I hope the result can be sorted by Chinese Pinyin.
> >
> > E.g.
> > I got:
> > 风过这头
> > 镇定的猎豹
> > 达小鱼儿
> > 财富分享
> > 蝶儿菲菲
> > 脚一滑
> > 股童天尊
> > 股票赢家888
> >
> > i hope:
> > 镇定的猎豹
> > 脚一滑
> > 股童天尊
> > 股票赢家888
> > 风过这头
> > 蝶儿菲菲
> > 达小鱼儿
> > 财富分享
> >
> > --
> > View this message in context:
> > http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-how-Neo4j-work-for-sorting-chinese-character-tp3309754p3309754.html
> > Sent from the Neo4j Community Discussions mailing list archive at
> > Nabble.com.
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
