Hello,

What would people think about including specific ICU data tables in WTF in 
order to provide a lightweight (but functional) Unicode implementation?

On embedded systems the size of ICU is prohibitive.  Determining the right way 
to package it to make it small enough isn't simple either.

A patch was reviewed once that attempted to add ICU data tables directly in WTF 
and there were two concerns:
1) Checking in generated files 
(https://bugs.webkit.org/show_bug.cgi?id=27305#c8)
2) Questions about whether the ICU license is compatible with WebCore 
(https://bugs.webkit.org/show_bug.cgi?id=27305#c9)

I believe the patch could be done differently so as not to check in generated 
files.  Regarding the second concern, ICU has a very permissive license 
(http://www.icu-project.org/repos/icu/icu/trunk/license.html).  There are three 
requirements, which basically amount to the copyright and permission notices 
appearing with copies of the software.  I believe that is already a requirement 
for distributions of WebKit that use ICU.  Except for WCHAR Unicode, I believe 
all WebKit builds now use ICU Unicode.

This Unicode path could replace WCHAR_UNICODE or be introduced as a third 
option, call it what you like - BASIC_ICU_UNICODE, ICU_LITE_UNICODE, 
COMPACT_ICU_UNICODE, etc.  I think it might be valuable for other ports that 
are size-conscious; the up-and-coming NIX port comes to mind.

Thanks,
Mark

Background:
After rebasing my WinCE port of WebKit, I ran into an ASSERT in 
acquireLineBreakIterator() in 
WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp.  I thought I'd be able 
to fix this easily, since (on my own branch) I had already modified 
LineBreakIterator to take prior context into account and to find line breaks 
in a stream of non-ASCII characters.

However, the WCHAR Unicode implementation is very bare-bones and does not even 
support returning the Unicode character category 
(http://trac.webkit.org/browser/trunk/Source/WTF/wtf/unicode/wchar/UnicodeWchar.cpp#L35).
WCHAR Unicode was originally called WinCE Unicode and was later renamed, as it 
had nothing to do with WinCE.
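
To give a rough idea of what an embedded category table could look like, here 
is a minimal sketch.  It is not taken from the patch in bug 27305 or from ICU; 
every name in it (GeneralCategory, CategoryRange, kCategoryRanges, categoryOf) 
is hypothetical, and the table is a tiny hand-written excerpt, where a real one 
would be generated from the Unicode data at build time.  It assumes a sorted 
table of code point ranges searched with a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical names throughout; none of this is existing WTF or ICU API.
enum class GeneralCategory : std::uint8_t {
    Unassigned,      // Cn; also returned for anything missing from the excerpt
    UppercaseLetter, // Lu
    LowercaseLetter, // Ll
    DecimalNumber,   // Nd
    SpaceSeparator,  // Zs
};

struct CategoryRange {
    char32_t first; // first code point of the range, inclusive
    char32_t last;  // last code point of the range, inclusive
    GeneralCategory category;
};

// Hand-written excerpt, sorted by code point and non-overlapping.
// Assumption: a real table would cover all of Unicode and be generated.
static const CategoryRange kCategoryRanges[] = {
    { 0x0020, 0x0020, GeneralCategory::SpaceSeparator },
    { 0x0030, 0x0039, GeneralCategory::DecimalNumber },
    { 0x0041, 0x005A, GeneralCategory::UppercaseLetter },
    { 0x0061, 0x007A, GeneralCategory::LowercaseLetter },
    { 0x0391, 0x03A1, GeneralCategory::UppercaseLetter }, // Greek Alpha..Rho
    { 0x03B1, 0x03C9, GeneralCategory::LowercaseLetter }, // Greek alpha..omega
};

GeneralCategory categoryOf(char32_t c)
{
    const CategoryRange* begin = std::begin(kCategoryRanges);
    const CategoryRange* end = std::end(kCategoryRanges);

    // Debug-only sanity check: each range must end before the next one starts.
    assert(std::adjacent_find(begin, end,
        [](const CategoryRange& a, const CategoryRange& b) { return a.last >= b.first; }) == end);

    // Binary search for the first range whose last code point is not below c.
    const CategoryRange* it = std::lower_bound(begin, end, c,
        [](const CategoryRange& range, char32_t value) { return range.last < value; });

    // That range contains c only if it also starts at or before c.
    if (it != end && it->first <= c)
        return it->category;
    return GeneralCategory::Unassigned;
}
```

Each entry costs 12 bytes with this layout, and a lookup is O(log n) in the 
number of ranges.  A two-stage (trie-like) table would be faster but larger, so 
the range table seemed the more natural fit for a size-conscious port.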

WinCE Unicode originally came in here:  
https://bugs.webkit.org/show_bug.cgi?id=27305.  The reason it was introduced 
was to save space (filesystem and RAM).  ICU, if not packaged very carefully 
(http://userguide.icu-project.org/packaging), is actually larger than WebKit 
itself.  On embedded systems, this is a big deal.  The original plan in the 
bug above was to include specific ICU data tables in WebKit.

I've been compiling WTF with Unicode tables embedded for some time now.  I 
don't believe I've seen many layout test regressions due to using a simplified 
ICU implementation.


_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
