Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Rich Felker
On Wed, Aug 13, 2014 at 07:06:38PM +0200, Harald Becker wrote: > Hi Denys! > > > The world seems to be standardizing on UTF-8. > >Thank God, supporting a gazillion encodings is no fun. > > You say this, but libbb/unicode.c contains a unicode_strlen calling > this complex mb to wc conversion fun

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread dietmar.schindler
> From: Harald Becker
> Sent: Wednesday, 13 August 2014 19:07
> ...
> size_t utf8len( const char* s )
> {
>    size_t n = 0;
>    while (*s)
>      if ((*s++ ^ 0x40) < 0xC0)
>        n++;
>    return n;
> }
> ...
> char *utf8skip( char const* s, size_t n )
> {
>    for ( ; n && *s; --n )
>

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Denys! >> 2) shell substitution ${#var} Should this length operation give the number of bytes in var or the number of characters (which may differ for multibyte characters, as in UTF-8)? bash gives the number of Unicode chars. I just fixed both ash and hush to do the same. You fixed this

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
3) applet expr, function length STRING This also may hit the *index*, *substr* and *match* functions. Do we look at character positions or at byte positions? What do the specs say on this? Looks like they removed *length*, *index*, *substr* and *match* from the specification of this appl

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Paul! >>> The POSIX standard says that ${#var} gives the length of variable var "in characters". I can't find, offhand, a definition of "characters" in the standard D'oh! It was only in the most obvious place: 3.87 Character A sequence of one or more bytes representing

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
On 13.08.2014 19:56, Paul Smith wrote: On Wed, 2014-08-13 at 13:52 -0400, Paul Smith wrote: The POSIX standard says that ${#var} gives the length of variable var "in characters". I can't find, offhand, a definition of "characters" in the standard D'oh! It was only in the most obvious place:

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Paul Smith
On Wed, 2014-08-13 at 13:52 -0400, Paul Smith wrote: > The POSIX standard says that ${#var} gives the length of variable var > "in characters". I can't find, offhand, a definition of "characters" > in the standard D'oh! It was only in the most obvious place: 3.87 Character

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Paul Smith
On Wed, 2014-08-13 at 19:23 +0200, Harald Becker wrote: > > bash gives number of Unicode chars. > > I just fixed both ash and hush to do the same. > > bash seems to be the only shell which does this. So is this a > bash-ism? The POSIX standard says that ${#var} gives the length of variable var "in

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Denys! 2) shell substitution ${#var} Should this length operation give the number of bytes in var or the number of characters (which may differ for multibyte characters, as in UTF-8)? bash gives the number of Unicode chars. I just fixed both ash and hush to do the same. Add a big warning

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Denys! 2) shell substitution ${#var} Should this length operation give the number of bytes in var or the number of characters (which may differ for multibyte characters, as in UTF-8)? bash gives the number of Unicode chars. I just fixed both ash and hush to do the same. bash seems to be

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Denys! > The world seems to be standardizing on UTF-8. Thank God, supporting a gazillion encodings is no fun. You say this, but libbb/unicode.c contains a unicode_strlen calling this complex mb to wc conversion function to count the number of characters. Those multibyte functions tend

Old_e2fsprogs patch

2014-08-13 Thread Daniil Gentili
Hello everyone, I'd like to include e2fsck and mke2fs in my busybox. Since I'm kind of a noob in busybox building, could somebody please provide me a fully working patch for busybox 1.22.1 that enables e2fsck and mke2fs in busybox? Thank you.

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi Denys! This unveils an interesting question: Do we want to add UTF-8 support to BB or full multibyte support? The former may be simpler, the latter more correct. The world seems to be standardizing on UTF-8. Thank God, supporting a gazillion encodings is no fun. Full ACK. -- Harald

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Denys Vlasenko
On Wed, Aug 13, 2014 at 4:01 PM, Harald Becker wrote: > >> The real problem with Unicode is UTF-16, which contains \0 chars (but it's >> another and uncommon problem) > > > This unveils an interesting question: Do we want to add UTF-8 support to BB > or full multibyte support. The former may be sim

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Denys Vlasenko
On Wed, Aug 13, 2014 at 3:42 PM, Harald Becker wrote: > >> I've seen several implementations which use mbtowc functions to test some >> special chars; this is not correct for UTF-8 in my opinion. > > > To count the number of UTF-8 characters is really simple, just count all > bytes except those wit

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Denys Vlasenko
On Wed, Aug 13, 2014 at 1:40 PM, Harald Becker wrote: > 2) shell substitution ${#var} > Should this length operation give the number of bytes in var or the > number of characters (which may differ for multibyte characters, as in > UTF-8)? bash gives the number of Unicode chars. I just fixed both

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
The real problem with Unicode is UTF-16, which contains \0 chars (but it's another and uncommon problem) This unveils an interesting question: Do we want to add UTF-8 support to BB or full multibyte support? The former may be simpler, the latter more correct.

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
I've seen several implementations which use mbtowc functions to test some special chars; this is not correct for UTF-8 in my opinion. Counting the number of UTF-8 characters is really simple: just count all bytes except those with a value in the range 0x80 to 0xBF. This has two exceptions 0xFE and
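The counting rule described in this message can be sketched in C as follows. This is only an illustration: the name utf8_count is made up here and is not BusyBox's unicode_strlen, the input is assumed to be valid NUL-terminated UTF-8, and the exceptions the message starts to mention are cut off in this snippet, so they are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BusyBox function): count UTF-8 characters
 * by counting every byte that is not a continuation byte, i.e. not in
 * the range 0x80..0xBF (bit pattern 10xxxxxx).
 * Assumes valid, NUL-terminated UTF-8; no validation is performed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;  /* avoid sign issues with plain char */
        if ((b & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Casting to unsigned char before masking keeps the comparison independent of whether plain char is signed on the target platform.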

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Tanguy Pruvot
In this case yes indeed, my mblen() function posted some days ago could be used to prevent the display of cut-off char sequences. The real problem with Unicode is UTF-16, which contains \0 chars (but it's another and uncommon problem). 2014-08-13 15:17 GMT+02:00 Harald Becker : > Hi ! > > > > if cut fiel

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi!
> if cut fields supports strings bigger than a single char, there
> should be no problem, the sequence is found in input text.
$ echo -n äöü | hd
c3 a4 c3 b6 c3 bc
$ echo -n äöü | cut -c1 | hd
c3 0a
$ echo -n äöü | cut -c2 | hd
a4 0a
This shows the position giv
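In the transcript above, cut -c1 emits only c3, the first byte of the two-byte sequence for ä. A character-aware selection would first have to find the byte boundary of the requested character. The sketch below shows one way to do that in C; utf8_prefix_bytes is a hypothetical helper, not how BusyBox cut is implemented, and it assumes valid NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BusyBox): return how many bytes the
 * first nchars UTF-8 characters of s occupy. Stops early at the
 * terminating NUL. Assumes valid UTF-8; no validation is performed. */
size_t utf8_prefix_bytes(const char *s, size_t nchars)
{
    size_t i = 0;

    assert(s != NULL);
    while (nchars > 0 && s[i] != '\0') {
        i++;                                          /* consume the lead byte */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                                      /* consume continuation bytes */
        nchars--;
    }
    return i;
}
```

For the bytes c3 a4 c3 b6 c3 bc shown in the transcript, utf8_prefix_bytes(s, 1) returns 2, so the first character spans both c3 and a4 rather than c3 alone.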

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Tanguy Pruvot
Just remember UTF-8 is not related to wchar; it's just a series of chars displayed as a single column. I've seen several implementations which use mbtowc functions to test some special chars; this is not correct for UTF-8 in my opinion. If cut fields supports strings bigger than a single char, there

Re: Fix the addgroup help output

2014-08-13 Thread Laszlo Papp
On Wed, Aug 13, 2014 at 12:54 PM, tito wrote: > On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote: > > On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote: > > > > > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a > > > Author: Laszlo Papp > > > Date: Wed Aug 13 09:48:08 2014 +0100 > > >

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Additional commands which may be hit by this question:
cut -c, -f
fold -w
Looks as if BB does it right, but different from upstream.
sort, position specification
___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/bus

Re: Fix the addgroup help output

2014-08-13 Thread tito
On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote: > On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote: > > > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a > > Author: Laszlo Papp > > Date: Wed Aug 13 09:48:08 2014 +0100 > > > > Fix the addgroup help output > > > > Since the ap

Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-13 Thread Harald Becker
Hi all! I am starting this thread to collect and discuss the possible Unicode (UTF-8) problems we detected and which may need further investigation: 1) sed s/./x/: the dot matches bytes, not characters. This at least hits uClibc builds; glibc seems to work correctly with the full set of locale files. Thi

Re: How do I (unconditionally) enable unicode support in busybox?

2014-08-13 Thread Harald Becker
Hi Denys! >> I don't see how init itself shall benefit from UTF8 support? > We run setlocale("") for all applets. If we exclude init, why not exclude e.g. mkdir? I went with the code which is smaller. If someone reports that setlocale() in init is *harmful* in some way, I can reinstate it - thi

Re: Fix the addgroup help output

2014-08-13 Thread Laszlo Papp
On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote: > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a > Author: Laszlo Papp > Date: Wed Aug 13 09:48:08 2014 +0100 > > Fix the addgroup help output > > Since the applet has two options, it is quite misleading to only > mention one in >

Re: Fix the addgroup help output

2014-08-13 Thread Laszlo Papp
Yeah, I see that you are confused, please read the commit message carefully: "The latter seems to be more common in applet, so I picked that one." Have you even tried to run other applets to see their help output? To me, it seems that you have not investigated anything before chiming in. On Wed,

Re: Fix the addgroup help output

2014-08-13 Thread walter harms
On 13.08.2014 10:52, Laszlo Papp wrote: > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a > Author: Laszlo Papp > Date: Wed Aug 13 09:48:08 2014 +0100 > > Fix the addgroup help output > > Since the applet has two options, it is quite misleading to only > mention one in > the us

Re: shutdown busybox and start another PID1 process

2014-08-13 Thread Sam Liddicott
This works in busybox ash:
getch() {
    read -t 1 -n 1 $1
}
getch key
echo $key
You probably also want stty -echo before you start.
Sam
On Tue, Aug 12, 2014 at 5:44 AM, James Bowlin wrote:
> On Mon, Aug 11, 2014 at 07:35 PM, Harald Becker said:
> > Did I get it right?
>
> This is very close t

Fix the addgroup help output

2014-08-13 Thread Laszlo Papp
commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a Author: Laszlo Papp Date: Wed Aug 13 09:48:08 2014 +0100 Fix the addgroup help output Since the applet has two options, it is quite misleading to only mention one in the usage example. It should either use OPTIONS there or enumerate t