Re: Emoji and Annotation data

2016-07-01 Thread Takao Fujiwara
I tested emoji.json, but unfortunately it's less useful than emoji-list.html. 1. The "name" element is too long for the dictionary, e.g. "grinning face with smiling eyes", but I need both single words and multi-word phrases, e.g. "tower" and "united states". 2. Some keywords are adjectives, but I need nouns. E.g. "smi
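One way to get both single-word and multi-word keys out of a long name is to index the whole phrase plus each of its content words. The sketch below assumes a hypothetical `lookup_keys` helper and a small, made-up stopword list; it does not address the adjective-versus-noun problem, which would need curated data.

```python
import re

# Hypothetical stopword list; a real dictionary would need a curated one.
STOPWORDS = {"with", "and", "of", "the", "a", "an", "for"}


def lookup_keys(phrase: str) -> set[str]:
    """Return the normalized whole phrase plus each non-stopword word in it."""
    normalized = " ".join(re.findall(r"[a-z0-9']+", phrase.lower()))
    keys = {normalized} if normalized else set()
    for word in normalized.split():
        if word not in STOPWORDS:
            keys.add(word)
    return keys
```

With this, "united states" is reachable both as the phrase and as "united" or "states", while a long name such as "grinning face with smiling eyes" also yields "face" and "eyes".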

Re: Emoji and Annotation data

2016-06-27 Thread Takao Fujiwara
On 06/27/16 15:58, Ori Avtalion-san wrote: On Mon, Jun 27, 2016 at 7:13 AM, Takao Fujiwara wrote: Why don't you use only annotations? E.g. "us" hits too many Emoji. It's for all kinds of Unicode symbols, not just those that have emoji representation. Sometimes I find myself searching by the "

Re: Emoji and Annotation data

2016-06-27 Thread Takao Fujiwara
On 06/27/16 16:01, Peter Edberg-san wrote: I had suggested that you check http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml which has the line face; grin Is that not what you want? I'm sorry. I missed that. OK, it seems emoji-list.html is the combination of en.xml and
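Reading the CLDR annotations file directly, instead of scraping emoji-list.html, can be sketched with the standard-library XML parser. This is a minimal sketch under stated assumptions: each `<annotation>` element carries the emoji in a `cp` attribute (possibly wrapped in `[...]` in older files), keywords are separated by ";" as in the "face; grin" line above (later CLDR files use "|"), and entries marked `type="tts"` hold a short name rather than keywords. The element layout has changed between CLDR releases, so check it against the release you download.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_annotations(root: ET.Element) -> dict[str, set[str]]:
    """Map each annotation keyword to the set of emoji that carry it."""
    index = defaultdict(set)
    for ann in root.iter("annotation"):
        # Assumption: type="tts" entries hold the short name, not keywords.
        if ann.get("type") == "tts":
            continue
        # Assumption: cp holds a single emoji, optionally wrapped in [...].
        emoji = (ann.get("cp") or "").strip("[]")
        # Accept both ";" (e.g. "face; grin") and "|" as keyword separators.
        for keyword in re.split(r"[;|]", ann.text or ""):
            keyword = keyword.strip().lower()
            if emoji and keyword:
                index[keyword].add(emoji)
    return dict(index)
```

For a local copy of the file, `index_annotations(ET.parse("en.xml").getroot())` would build the keyword-to-emoji dictionary; the file name is only an example.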

Re: Emoji and Annotation data

2016-06-27 Thread Ori Avtalion
On Mon, Jun 27, 2016 at 7:13 AM, Takao Fujiwara wrote: > Why don't you use only annotations? E.g. "us" hits too many Emoji. It's for all kinds of Unicode symbols, not just those that have emoji representation. Sometimes I find myself searching by the "real" Unicode name, and sometimes by keyword,

Re: Emoji and Annotation data

2016-06-26 Thread Takao Fujiwara
Hi, e.g. in http://unicode.org/emoji/charts/emoji-list.html, "😀" has the annotations "face" and "grin". The data is available only in the HTML files. Fujiwara On 06/27/16 14:16, Peter Edberg-san wrote: Fujiwara-san, If you follow the information indicated by UTR 51 (as Mark had suggested), yo

Re: Emoji and Annotation data

2016-06-26 Thread Takao Fujiwara
Thanks for the info and the contribution. I will probably package EmojiOne for Fedora so that I can use emoji.json. Why don't you use only annotations? E.g. "us" hits too many Emoji. Fujiwara On 06/26/16 18:12, Ori Avtalion-san wrote: Hey, I maintain an IBus module(?) that allows inputting emojis [1] (
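One plausible reason a short query such as "us" returns many hits is substring matching over names: "us" occurs inside "bus" and "house". The sketch below contrasts substring matching with whole-word matching; both helper names and the sample names used with them are hypothetical, not taken from ibus-uniemoji or any Unicode data file.

```python
import re


def substring_matches(query: str, names: dict[str, str]) -> set[str]:
    """Emoji whose name contains the query anywhere (noisy for short queries)."""
    q = query.lower()
    return {emoji for emoji, name in names.items() if q in name.lower()}


def word_matches(query: str, names: dict[str, str]) -> set[str]:
    """Emoji whose name contains the query as a whole word."""
    q = query.lower()
    return {
        emoji
        for emoji, name in names.items()
        if q in re.findall(r"[a-z0-9]+", name.lower())
    }
```

With made-up names {"🚌": "bus", "🏠": "house", "🇺🇸": "flag: us"}, substring matching for "us" returns all three emoji, while whole-word matching returns only the flag.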

Re: Emoji and Annotation data

2016-06-26 Thread Takao Fujiwara
On 06/25/16 01:04, Mark Davis ☕️-san wrote: You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability. I cannot find the license or any description of the HTML files. The emoji files are built from data which is described in h

Re: Emoji and Annotation data

2016-06-26 Thread Ori Avtalion
Hey, I maintain an IBus module(?) that allows inputting emojis [1] (I think I mentioned it before on IRC). I use the data provided by EmojiOne, which also includes aliases and the popular (but unofficial) "shortnames". You might find it useful [2]. [1] https://github.com/salty-horse/ibus-uniemoji

Re: Emoji and Annotation data

2016-06-24 Thread Mark Davis ☕️
You should never be scraping *any* Unicode HTML files. They are not made for that, and there is no guarantee of stability. The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/ (plus CLDR annotations and collation) Mark On Fri, Jun 24, 2016 at 7:21 AM, Ta
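The data files that accompany UTR #51 are plain text, so they can be read without touching the HTML charts. The sketch assumes the semicolon-delimited layout `code point or range ; property # comment` (e.g. `1F601..1F602 ; Emoji_Presentation`); the exact columns have changed between emoji versions, so only the first two fields are used here. These files describe emoji properties, not keywords; the keyword annotations come from CLDR.

```python
from collections.abc import Iterable, Iterator


def parse_emoji_data(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (code point, property) pairs from 'cp[..cp] ; property # comment' lines."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        code_points, prop = fields[0], fields[1]
        start, _, end = code_points.partition("..")
        # A single code point has no "..", so the range collapses to one value.
        for cp in range(int(start, 16), int(end or start, 16) + 1):
            yield cp, prop
```

Passing an open file, e.g. `parse_emoji_data(open("emoji-data.txt", encoding="utf-8"))`, would stream the entries; `chr(cp)` turns each code point into the character itself.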

Emoji and Annotation data

2016-06-24 Thread Takao Fujiwara
Hi, I'm working on IBus, the input method framework for Linux. I parse http://unicode.org/emoji/charts/emoji-list.html and create a dictionary between the annotations and the Emoji characters. Since the file is large and often updated, I'm thinking about how to maintain the file. I copied
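If the dictionary is regenerated each time the upstream data changes, writing it in a deterministic order keeps the differences between versions limited to real changes. The tab-separated format and the `write_dictionary` helper below are hypothetical, a sketch rather than an IBus file format.

```python
def write_dictionary(index: dict[str, set[str]]) -> str:
    """Serialize keyword -> emoji as sorted 'keyword<TAB>emoji emoji' lines."""
    lines = []
    for keyword in sorted(index):
        # Sort the emoji too, so the output does not depend on set ordering.
        lines.append(keyword + "\t" + " ".join(sorted(index[keyword])))
    return "".join(line + "\n" for line in lines)
```

Because both keywords and emoji are sorted, two runs over the same input always produce identical text, which makes updates easy to review.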