On Wed, Aug 13, 2014 at 07:06:38PM +0200, Harald Becker wrote:
> Hi Denys !
>
> > The world seems to be standardizing on utf-8.
> > Thank God, supporting a gazillion encodings is no fun.
>
> You say this, but libbb/unicode.c contains a unicode_strlen calling
> this complex mb to wc conversion fun
> From: Harald Becker
> Sent: Wednesday, 13 August 2014 19:07
> ...
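> size_t utf8len( const char* s )
> {
>     size_t n = 0;
>     /* count every byte that is not a continuation byte (0x80..0xBF) */
>     while (*s)
>         /* cast needed: plain char may be signed, which makes the test always true */
>         if (((unsigned char)*s++ ^ 0x40) < 0xC0)
>             n++;
>     return n;
> }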
> ...
> char *utf8skip( char const* s, size_t n )
> {
>for ( ; n && *s; --n )
>
Hi Denys!
>> 2) shell substitution ${#var}
Shall this length operation give the number of bytes in var or the
number of characters (which may differ for multi-byte characters, as in
UTF-8)?
bash gives number of Unicode chars.
I just fixed both ash and hush to do the same.
You fixed this
3) applet expr, function length STRING
This may also hit the *index*, *substr* and *match* functions. Do we
look at character positions or at byte positions? What do the specs
say on this?
Looks like they removed *length*, *index*, *substr* and *match* from the
specification of this appl
Hi Paul !
>>> The POSIX standard says that ${#var} gives the length of variable var
"in characters". I can't find, offhand, a definition of "characters"
in the standard
D'oh! It was only in the most obvious place:
3.87 Character
A sequence of one or more bytes representing
On 13.08.2014 19:56, Paul Smith wrote:
On Wed, 2014-08-13 at 13:52 -0400, Paul Smith wrote:
The POSIX standard says that ${#var} gives the length of variable var
"in characters". I can't find, offhand, a definition of "characters"
in the standard
D'oh! It was only in the most obvious place:
On Wed, 2014-08-13 at 13:52 -0400, Paul Smith wrote:
> The POSIX standard says that ${#var} gives the length of variable var
> "in characters". I can't find, offhand, a definition of "characters"
> in the standard
D'oh! It was only in the most obvious place:
3.87 Character
On Wed, 2014-08-13 at 19:23 +0200, Harald Becker wrote:
> > bash gives number of Unicode chars.
> > I just fixed both ash and hush to do the same.
>
> bash seems to be the only shell which does this. So is this a
> bash-ism?
The POSIX standard says that ${#var} gives the length of variable var "in
Hi Denys !
2) shell substitution ${#var}
Shall this length operation give the number of bytes in var or the
number of characters (which may differ for multi-byte characters, as in
UTF-8)?
bash gives number of Unicode chars.
I just fixed both ash and hush to do the same.
Add a big warning
Hi Denys !
2) shell substitution ${#var}
Shall this length operation give the number of bytes in var or the
number of characters (which may differ for multi-byte characters, as in
UTF-8)?
bash gives number of Unicode chars.
I just fixed both ash and hush to do the same.
bash seems to be
Hi Denys !
> The world seems to be standardizing on utf-8.
Thank God, supporting a gazillion encodings is no fun.
You say this, but libbb/unicode.c contains a unicode_strlen calling this
complex mb to wc conversion function to count the number of characters.
Those multi byte functions tend
Hello everyone,
I'd like to include e2fsck and mke2fs in my busybox build. Since I'm kind of a
noob at building busybox, could somebody please provide me with a fully working
patch for busybox 1.22.1 that enables e2fsck and mke2fs in busybox?
Thank you.
Hi Denys!
This raises an interesting question: do we want to add UTF-8 support to BB
or full multi-byte support? The former may be simpler, the latter more
correct.
The world seems to be standardizing on utf-8.
Thank God, supporting a gazillion encodings is no fun.
Full ACK.
--
Harald
On Wed, Aug 13, 2014 at 4:01 PM, Harald Becker wrote:
>
>> The real problem with Unicode is UTF-16, which contains \0 chars (but it's
>> another, less common problem)
>
>
> This raises an interesting question: do we want to add UTF-8 support to BB
> or full multi-byte support? The former may be sim
On Wed, Aug 13, 2014 at 3:42 PM, Harald Becker wrote:
>
>> I've seen several implementations which use mbtowc functions to test some
>> special chars; this is not correct for UTF-8 in my opinion.
>
>
> To count the number of UTF-8 characters is really simple, just count all
> bytes except those wit
On Wed, Aug 13, 2014 at 1:40 PM, Harald Becker wrote:
> 2) shell substitution ${#var}
> Shall this length operation give the number of bytes in var or the
> number of characters (which may differ for multi-byte characters, as in
> UTF-8)?
bash gives number of Unicode chars.
I just fixed both
The real problem with Unicode is UTF-16, which contains \0 chars (but it's
another, less common problem)
This raises an interesting question: do we want to add UTF-8 support to
BB or full multi-byte support? The former may be simpler, the latter more
correct.
I've seen several implementations which use mbtowc functions to test some
special chars; this is not correct for UTF-8 in my opinion.
To count the number of UTF-8 characters is really simple, just count all
bytes except those with value in range 0x80 to 0xBF. This has two
exceptions 0xFE and
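The counting rule described here (count every byte except those in the range 0x80 to 0xBF) can be sketched in C as follows. This is only an illustration, not code from libbb: the name `utf8_count` is hypothetical, the input is assumed to be valid, NUL-terminated UTF-8, and no validation is attempted, so any byte outside the continuation range is simply counted as the start of a character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count UTF-8 characters in a NUL-terminated string.
 * Assumes the input is valid UTF-8; no validation is attempted. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++) {
        /* Go through unsigned char so bytes >= 0x80 do not become negative. */
        unsigned char b = (unsigned char)*s;

        /* Continuation bytes are 0x80..0xBF, i.e. of the form 10xxxxxx. */
        if ((b & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

The mask test `(b & 0xC0) != 0x80` is equivalent to the range check on 0x80..0xBF and does not depend on whether plain `char` is signed on the target platform.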
In this case yes, indeed; the mblen() function I posted some days ago could be
used to prevent the display of cut-off character sequences.
The real problem with Unicode is UTF-16, which contains \0 chars (but it's
another, less common problem)
2014-08-13 15:17 GMT+02:00 Harald Becker :
> Hi !
>
>
> > if cut fiel
Hi !
> If cut's field handling supports strings bigger than a single char, there
> should be no problem: the sequence is found in the input text.
$ echo -n äöü | hd
c3 a4 c3 b6 c3 bc
$ echo -n äöü | cut -c1 | hd
c3 0a
$ echo -n äöü | cut -c2 | hd
a4 0a
This shows the position giv
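In the output above, `cut -c1` emits only the byte c3 and `cut -c2` only the byte a4, which together form the single character ä. A character-aware selection would have to step over whole UTF-8 sequences instead. The following C sketch illustrates that idea; it is not BusyBox code, both function names are hypothetical, and it assumes valid, NUL-terminated UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the byte length of the UTF-8 character that
 * starts at s, i.e. the lead byte plus all continuation bytes (10xxxxxx)
 * that follow. Assumes valid UTF-8; returns 0 at the terminating NUL. */
size_t utf8_char_bytes(const char *s)
{
    size_t len;

    assert(s != NULL);
    if (*s == '\0')
        return 0;
    len = 1;
    while (((unsigned char)s[len] & 0xC0) == 0x80)
        len++;
    return len;
}

/* Hypothetical helper: locate the pos-th character (1-based, as in cut -c).
 * Stores its byte length in *len and returns a pointer to its first byte,
 * or returns NULL if the string has fewer than pos characters. */
const char *utf8_nth_char(const char *s, size_t pos, size_t *len)
{
    assert(s != NULL && len != NULL && pos > 0);
    /* Skip pos-1 whole characters, stopping early at the end of the string. */
    while (--pos > 0 && *s)
        s += utf8_char_bytes(s);
    *len = utf8_char_bytes(s);
    return *len ? s : NULL;
}
```

With the six-byte input from the example, position 1 would select the two bytes c3 a4 and position 2 the two bytes c3 b6, rather than one half of a character each.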
Just remember that UTF-8 is not related to wchar; it's just a series of chars
displayed as a single column.
I've seen several implementations which use mbtowc functions to test some
special chars; this is not correct for UTF-8 in my opinion.
If cut's field handling supports strings bigger than a single char, there
On Wed, Aug 13, 2014 at 12:54 PM, tito wrote:
> On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote:
> > On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote:
> >
> > > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
> > > Author: Laszlo Papp
> > > Date: Wed Aug 13 09:48:08 2014 +0100
> > >
Additional commands which may be hit by this question:
cut -c, -f
fold -w
Looks as if BB does it right, but differently from upstream.
sort, position specification
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/bus
On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote:
> On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote:
>
> > commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
> > Author: Laszlo Papp
> > Date: Wed Aug 13 09:48:08 2014 +0100
> >
> > Fix the addgroup help output
> >
> > Since the ap
Hi All !
I'm starting this thread to collect and discuss the possible Unicode (UTF-8)
problems we have detected and which may need further investigation:
1) sed s/./x/ the dot matches bytes, not characters
This at least hits uClibc builds; glibc seems to work correctly with a full
set of locale files.
Thi
Hi Denys!
>> I don't see how init itself shall benefit from UTF8 support?
>
We run setlocale("") for all applets. If we exclude init,
why not exclude e.g. mkdir?
I went with the code which is smaller. If someone reports that setlocale()
in init is *harmful* in some way, I can reinstate it - thi
On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp wrote:
> commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
> Author: Laszlo Papp
> Date: Wed Aug 13 09:48:08 2014 +0100
>
> Fix the addgroup help output
>
> Since the applet has two options, it is quite misleading to only
> mention one in
>
Yeah, I see that you are confused, please read the commit message carefully:
"The latter seems to be more common in applet, so I picked that one."
Have you even tried to run other applets to see their help output? To me,
it seems that you have not investigated anything before chiming in.
On Wed,
On 13.08.2014 10:52, Laszlo Papp wrote:
> commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
> Author: Laszlo Papp
> Date: Wed Aug 13 09:48:08 2014 +0100
>
> Fix the addgroup help output
>
> Since the applet has two options, it is quite misleading to only
> mention one in
> the us
This works in busybox ash
getch() {
read -t 1 -n 1 $1
}
getch key
echo $key
you probably also want stty -echo before you start
Sam
On Tue, Aug 12, 2014 at 5:44 AM, James Bowlin wrote:
> On Mon, Aug 11, 2014 at 07:35 PM, Harald Becker said:
> > Did I get it right?
>
> This is very close t
commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
Author: Laszlo Papp
Date: Wed Aug 13 09:48:08 2014 +0100
Fix the addgroup help output
Since the applet has two options, it is quite misleading to only
mention one in
the usage example. It should either use OPTIONS there or enumerate t