You should never be scraping *any* Unicode HTML files. They are not made for that, and there is no guarantee of stability.
The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/ (plus CLDR annotations and collation).

Mark

On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfuji...@redhat.com> wrote:
> Hi,
>
> I'm working on IBus - the input method framework for Linux.
> I parse http://unicode.org/emoji/charts/emoji-list.html and create a
> dictionary between the annotations and the Emoji characters.
> Since the file size is large and it's often updated, I'm thinking how to
> maintain the file.
>
> I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for
> the build at the moment.
>
> I have questions:
> - if unicode.org provides the tarball of the stable html files or other
> data.
> - what is the license of the html files.
>
> Do you have any ideas?
>
> Thanks,
> Fujiwara
>
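For the annotation-to-emoji dictionary described in the question, here is a minimal Python sketch that assumes the CLDR annotation XML files are read directly instead of the HTML chart. The embedded sample is a hypothetical, simplified excerpt modeled on the CLDR annotations format. The element and attribute names (`annotation`, `cp`, `type="tts"`) and the `|` keyword separator are assumptions that should be checked against the CLDR release actually used. `build_annotation_index` is a hypothetical helper, not part of any Unicode or CLDR tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified sample modeled on a CLDR annotations file
# (e.g. common/annotations/en.xml). Real files carry more metadata, and
# the exact layout may differ between CLDR releases (assumption).
SAMPLE_XML = """<ldml>
  <annotations>
    <annotation cp="😀">face | grin</annotation>
    <annotation cp="😀" type="tts">grinning face</annotation>
    <annotation cp="😁">eye | face | grin | smile</annotation>
    <annotation cp="😁" type="tts">beaming face with smiling eyes</annotation>
  </annotations>
</ldml>"""


def build_annotation_index(xml_text):
    """Map each annotation keyword (or short name) to the emoji that carry it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for node in root.iter("annotation"):
        emoji = node.get("cp")
        text = node.text or ""
        if node.get("type") == "tts":
            # The short name is treated as one phrase, not split on "|".
            keywords = [text.strip()]
        else:
            # Plain annotations are assumed to be "|"-separated keywords.
            keywords = [k.strip() for k in text.split("|")]
        for keyword in keywords:
            # Skip empty keywords and avoid listing the same emoji twice.
            if keyword and emoji not in index[keyword]:
                index[keyword].append(emoji)
    return dict(index)


if __name__ == "__main__":
    index = build_annotation_index(SAMPLE_XML)
    print(index["grin"])
```

Because the data files are versioned and published alongside the specifications, a build could pin one release of them rather than keeping a private copy of a generated HTML chart.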