Re: UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-08-31 Thread Manuel Strehl via Unicode
To handle the UCD XML file, a streaming parser like Expat is necessary. For codepoints.net I use that data to stuff everything in a MySQL database. If anyone is interested, the code for that is open source: https://github.com/Codepoints/unicode2mysql/ The example for handling the large XML file
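
As a rough illustration of the streaming approach mentioned above (not the code from the unicode2mysql repository), the sketch below walks a UCD XML document one element at a time with Python's `xml.etree.ElementTree.iterparse`, which is built on Expat. The namespace URI and the `char`, `cp` and `na` names are assumptions about the UCD XML layout; `iter_chars` and `SAMPLE` are hypothetical names used only for this example.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace of the UCD XML files.
UCD_NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"


def iter_chars(source):
    """Yield (code point, name) pairs, one <char> element at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == UCD_NS + "char":
            cp = elem.get("cp")
            if cp is not None:  # skip entries that carry no single "cp" attribute
                yield int(cp, 16), elem.get("na", "")
            elem.clear()  # discard the element's attributes and children


# Tiny stand-in for the real (very large) file.
SAMPLE = b"""<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
<repertoire>
<char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
<char cp="0042" na="LATIN CAPITAL LETTER B" gc="Lu"/>
</repertoire>
</ucd>"""

chars = list(iter_chars(io.BytesIO(SAMPLE)))
```

For the real file, a path or an open binary file object could be passed instead of the `io.BytesIO` wrapper, and each pair could be written to a database row rather than collected in a list.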

Re: CLDR (was: Private Use areas)

2018-08-31 Thread Manuel Strehl via Unicode
The XML files in these folders: https://unicode.org/repos/cldr/tags/latest/common/ But I agree. I spent an extreme amount of time getting somewhat used to cldr.unicode.org and the data repo, and I still have no clue where to find a concrete piece of information without digging into the site.

Re: emoji props in the ucdxml ?

2017-07-05 Thread Manuel Strehl via Unicode
>> but are there any plans to integrate the data in the ucdxml [2] >> (possibly as separate files) ? > > No. Not unless and until they become formally part of the UCD. In this context: Would it be possible for the maintainers of the TR #51 data files to add a symlink "latest" under

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Manuel Strehl via Unicode
The rising standard in the world of web development (and others) is called »Semantic Versioning« [1], which many projects adhere to or else must actively explain why they don't. The structure of a »semantic version« string is a set of three integers, MAJOR.MINOR.PATCH, where the »semantics«
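
A minimal sketch of why the three-integer structure matters when comparing versions: parsed into a tuple of integers, versions order numerically, whereas the raw strings order character by character. `parse_version` is a hypothetical helper for this example; it handles only plain MAJOR.MINOR.PATCH strings and ignores pre-release or build suffixes.

```python
def parse_version(text):
    """Turn "MAJOR.MINOR.PATCH" into a tuple of integers, e.g. (10, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


# Tuples compare element by element, so 10 sorts after 9.
newer_by_number = parse_version("10.0.0") > parse_version("9.0.0")

# Raw strings compare character by character, so "1" sorts before "9".
newer_by_string = "10.0.0" > "9.0.0"
```

Here `newer_by_number` is `True` and `newer_by_string` is `False`, which is the pitfall of comparing raw version strings.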

Re: [OT] Europe vs. European Union (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread Manuel Strehl
Maybe I'm missing context, but what is the specific problem of those lists differing? The EU and Europe _are_ two different things. The United States of America similarly do not include the whole of America, despite the name. And Norway and Switzerland and some others (incl. soon England) might

New tool unidump

2017-03-17 Thread Manuel Strehl
Hi, for my work on codepoints.net and Emojipedia I repeatedly found myself in a place where I needed some tool like hexdump to inspect the content of a string. However, instead of raw bytes I am more interested in the code points that the string is composed of. So I wrote this tool. I reasoned,
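
To illustrate the idea of inspecting code points rather than raw bytes, here is a small sketch using Python's standard `unicodedata` module. It is not unidump itself and does not reproduce its output format; `dump_code_points` is a hypothetical name chosen for this example.

```python
import unicodedata


def dump_code_points(text):
    """Return one "U+XXXX NAME" line per code point in the string."""
    lines = []
    for ch in text:
        # Fall back to a placeholder for code points without a name.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines


for line in dump_code_points("A\u00e9"):
    print(line)
```

Iterating over a Python `str` already yields code points rather than bytes, so no explicit decoding step is needed here.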

Re: Unicode in the Curriculum?

2015-12-30 Thread Manuel Strehl
Not technically a school, but I gave a Batman-themed high-level overview of Unicode at Munich's local JavaScript user group two years ago: http://www.manuel-strehl.de/publications/holy-batman/presentation It was well received, especially for its lighter tone on this perceived dry subject, and

Additional adoption form on codepoints.net

2015-12-17 Thread Manuel Strehl
Hi, please let me start by saying that I think the adoption of characters is a very good idea to provide funding for the development of Unicode. To promote this idea, I thought it could be worthwhile to place an "adopt this codepoint" button on the description pages of code points on

Re: Additional adoption form on codepoints.net

2015-12-17 Thread Manuel Strehl
Thank you! Yes, that's an implicit part of the "I'd like feedback from the people involved" :-) In fact, if such a GET parameter existed, I could remove the dialog and replace it with a simple link. This sounds like a good idea in principle. (It would also fix Leo Broukhis' issue.) Does anyone

Re: Additional adoption form on codepoints.net

2015-12-17 Thread Manuel Strehl
Thanks for the comment! > As far as I'm concerned, the pop-up contents should end with the link > "Read more about codepoint adoption." In your brief description, you > might want to add a proviso about the temporary nature of character > adoption. Good catch! The 12-month period is important to

Re: Sources for the B&W emoji samples in the PDFs

2015-12-06 Thread Manuel Strehl
Thank you, Doug and Rick! > If you can find the most recent version of the Symbola font updated > for Unicode 8.0, it contains a huge number of symbols and b/w > emoticons, etc. Yes, that's kind of the go-to font for pan-emoji support. It's a pity that George won't continue development (although

Sources for the B&W emoji samples in the PDFs

2015-12-03 Thread Manuel Strehl
Hello, I am wondering whether there is a list of which font / work is used to render which of the black & white emoji (and other symbols) in the code chart PDFs. Neither http://www.unicode.org/charts/fonts.html nor http://unicode.org/emoji/images.html nor the PDF itself have a sufficiently detailed

Re: Arrow dingbats

2015-05-28 Thread Manuel Strehl
Interesting! Out of curiosity: How come this was recognized in Unicode 7? Is that documented anywhere? 2015-05-28 17:03 GMT+02:00 Doug Ewell d...@ewellic.org: Chris idou747 at gmail dot com wrote: Unicode has the arrow dingbats ⬅⬆⬇⬈⬉⬊⬋ in the range 2b05 with names like “LEFTWARDS BLACK

Re: Another Unicode viewing site

2014-01-23 Thread Manuel Strehl
Yes, they have the huge advantage over my http://codepoints.net, that they have a team providing already so many translations. I envy them for that a bit. But competition is good for business. :-) Cheers, Manuel 2014/1/23 Leo Broukhis l...@mailcom.com I find http://unicode-table.com/ of which

REST API to access Unicode 6.1 information

2013-07-16 Thread Manuel Strehl
Hello, under http://codepoints.net/api/v1/ I published a REST API to access information about Unicode (codepoints, blocks, planes, sample glyphs). There are also tools to transform or filter input. The API documentation is published on GitHub: https://github.com/Boldewyn/Codepoints.net/wiki/API

Re: Preconditions for changing a representative glyph?

2013-05-29 Thread Manuel Strehl
Out of curiosity, has it happened before that a glyph was updated (i.e., substantially changed) in the standard? Cheers, 2013/5/29 Asmus Freytag asm...@ix.netcom.com On 5/29/2013 8:39 AM, Leo Broukhis wrote: I'd like to ask: what is supposed to be the trigger condition for the UTC to

Re: Searching data: map countries to scripts

2012-08-23 Thread Manuel Strehl
That's great news. I'm really looking forward to seeing Decode Unicode having v6.0 displayed. (By the way: The initial idea to display scripts and hence codepoints geographically came from Decode Unicode's solution.) Manuel 2012/8/22 Johannes Bergerhausen johan...@bergerhausen.com: Am 22.08.2012

Re: Searching data: map countries to scripts

2012-08-21 Thread Manuel Strehl
First of all, thanks for all the answers. It's quite interesting for me to learn that the data here is so fragmented. As a calligrapher who was once delighted to learn about the Latf script tag in the context of RFC 5646 et al., I guess I was way too naive when starting the scripts part of

Re: Searching data: map countries to scripts

2012-08-20 Thread Manuel Strehl
, not an effective or particularly error-free approach. Cheers, Manuel 2012/8/20 Asmus Freytag asm...@ix.netcom.com: On 8/19/2012 4:05 PM, Manuel Strehl wrote: Hello, I'm looking for a data source, that maps countries to scripts used in them. The target application is a visualization in the context

Re: Searching data: map countries to scripts

2012-08-20 Thread Manuel Strehl
This might not work too well, since the ISO 15924 code elements you're thinking of are Hira and Kana. This awkward moment... I'm trying to figure out what I was thinking of with Hana. Of course, the mapping must be sensible in a way, that is, explain how the mapping is done. I'd be fine, I

Searching data: map countries to scripts

2012-08-19 Thread Manuel Strehl
Hello, I'm looking for a data source that maps countries to the scripts used in them. The target application is a visualization in the context of my codepoints.net site, namely http://codepoints.net/scripts. At the moment I've extracted the preferred scripts from CLDR (e.g., Cyrl for Russia, Latn