Re: Emoji and Annotation data

Takao Fujiwara Mon, 27 Jun 2016 00:51:27 -0700

On 06/27/16 16:01, Peter Edberg-san wrote:

I had suggested that you check
http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
which has the line
<annotation cp='[😀]' tts='grinning face'>face; grin</annotation>


Is that not what you want?
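
For readers following the thread, here is a minimal sketch of how an annotation line like the one quoted above could be turned into a keyword-to-emoji dictionary. It assumes the `cp` attribute is a bracketed set holding a single emoji and that the element text is a semicolon-separated keyword list, as in the quoted line. The `<ldml>`/`<annotations>` wrapper in the sample and the function name `build_annotation_index` are illustrative assumptions, not taken from the thread.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample input: the annotation line is the one quoted in the thread;
# the <ldml>/<annotations> wrapper is an assumed file layout.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp='[😀]' tts='grinning face'>face; grin</annotation>
  </annotations>
</ldml>"""


def build_annotation_index(xml_text):
    """Map each keyword to the list of emoji annotated with it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for node in root.iter("annotation"):
        cp = node.get("cp", "")
        # Strip the surrounding [ ] of the set-style attribute (assumption:
        # the set holds one emoji, as in the quoted example).
        emoji = cp[1:-1] if cp.startswith("[") and cp.endswith("]") else cp
        # Keywords are separated by semicolons in the element text.
        keywords = [k.strip() for k in (node.text or "").split(";")]
        for keyword in keywords:
            if keyword:
                index[keyword].append(emoji)
    return dict(index)


if __name__ == "__main__":
    print(build_annotation_index(SAMPLE))
```

Looking up `"grin"` in the resulting dictionary then yields the emoji carrying that keyword, which is the kind of mapping an input method needs.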


I'm sorry. I missed that.
OK, it seems emoji-list.html is a combination of en.xml and 
/Public/emoji/3.0/emoji-*.txt.
However, I cannot find some annotations, e.g. "america".

BTW, I think more categories, such as "animal" and "country", would be 
useful for the annotations.

Fujiwara


- Peter


On Jun 26, 2016, at 10:34 PM, Takao Fujiwara <tfuji...@redhat.com> wrote:


Hi,

E.g. http://unicode.org/emoji/charts/emoji-list.html
"😀" has the annotations "face" and "grin".

The data is available only in the HTML files.

Fujiwara

On 06/27/16 14:16, Peter Edberg-san wrote:

Fujiwara-san,
If you follow the information indicated by UTR 51 (as Mark had suggested), you 
will see that:

1. The annotations data is available in CLDR here, in English:
http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
(or in many other languages, such as Japanese:)
http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/ja.xml

The description of the format for those xml files is here:
http://www.unicode.org/reports/tr35/tr35-general.html#Annotations

2. Other emoji data files are here:
http://www.unicode.org/Public/emoji/latest/

These data files are what drive the generation of the charts.
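
As a rough illustration of reading one of those emoji data files, the sketch below parses lines in the semicolon-separated layout commonly used by Unicode data files (`code point or range ; property # comment`). That layout, the sample lines, and the function name `parse_emoji_data` are assumptions made for illustration; the sample is not copied from a real file, and lines listing multi-code-point sequences are simply skipped.

```python
# Illustrative sample in the assumed "codepoints ; property # comment" layout.
SAMPLE = """\
# comment lines start with '#'
1F600         ; Emoji           # GRINNING FACE
1F601..1F603  ; Emoji           # a range of faces
1F3FB..1F3FF  ; Emoji_Modifier  # skin tone modifiers
"""


def parse_emoji_data(text):
    """Return {property: [characters]} for single code points and ranges."""
    result = {}
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2:
            continue
        cp_field, prop = fields[0], fields[1]
        if " " in cp_field:
            # Multi-code-point sequences are out of scope for this sketch.
            continue
        if ".." in cp_field:
            start, end = cp_field.split("..")
        else:
            start = end = cp_field
        # Expand the inclusive hexadecimal range into characters.
        for cp in range(int(start, 16), int(end, 16) + 1):
            result.setdefault(prop, []).append(chr(cp))
    return result


if __name__ == "__main__":
    for prop, chars in parse_emoji_data(SAMPLE).items():
        print(prop, "".join(chars))
```

Combining this character list with the CLDR annotations would give the same pairing of emoji and keywords that the HTML charts display, without scraping the charts.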

Best regards,
Peter Edberg

On Jun 26, 2016, at 9:09 PM, Takao Fujiwara <tfuji...@redhat.com> wrote:

On 06/25/16 01:04, Mark Davis ☕️-san wrote:

You should never be scraping /any/ Unicode HTML files. They are not made for 
that, and there is no guarantee of stability.


I cannot find a license or description for the HTML files.


The emoji files are built from data which is described in 
http://www.unicode.org/reports/tr51/
(plus CLDR annotations and collation)


OK, I need data that pairs the emoji code points with their annotations.
It would be great if that data could be provided separately from the HTML files.

Thanks,
Fujiwara


Mark

On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfuji...@redhat.com> wrote:

  Hi,

  I'm working on IBus - the input method framework for Linux.
  I parse http://unicode.org/emoji/charts/emoji-list.html and create a
  dictionary that maps the annotations to the emoji characters.
  Since the file is large and often updated, I'm thinking about how to
  maintain it.

  For now, I have copied the file to
  http://ibus.github.io/files/ibus/emoji-list.html for the build.

  I have two questions:
   - Does unicode.org provide a tarball of the stable HTML files or other
     data?
   - What is the license of the HTML files?

  Do you have any ideas?

  Thanks,
  Fujiwara
