On Tue, Mar 18, 2008 at 5:22 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> felix winkelmann scripsit:
>
>
> > A real module system would solve all these problems cleanly.
>
> It wouldn't solve the data-punning problem. As long as the same object
> can be seen one way by one module and another way
Shawn Rutledge scripsit:
> But you would want the usual string operations to work with either
> kind of string, right?
Indeed.
> It could follow from the general principle of separating metadata from
> data: Put the encoding in the extended attributes of the file, or
> resource fork if you've
On Tue, Mar 18, 2008 at 4:50 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> I'm not arguing that point. I'm arguing that there should be two
> different kinds of strings, one of which is UTF-8 and one of which
> contains single-byte characters in an unspecified ASCII-compatible
> encoding, *and t
Shawn Rutledge scripsit:
> That is a huge advantage. I think unless there are some
> insurmountable gotchas, or it causes major efficiency problems, there
> are some good arguments for using UTF-8 for strings in Chicken.
I'm not arguing that point. I'm arguing that there should be two
differen
On Tue, Mar 18, 2008 at 1:53 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> > Let's see... ASCII is valid UTF-8, so all ASCII external
> > representations wouldn't need any encoding or decoding work.
That is a huge advantage. I think unless there are some
insurmountable gotchas, or it causes majo
Tobia Conforto scripsit:
> Let's see... ASCII is valid UTF-8, so all ASCII external
> representations wouldn't need any encoding or decoding work.
True. However, pure ASCII is less common than people believe, as
indicated by the 59K Google hits for "8-bit ASCII".
> Most recent formats and pr
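
A minimal sketch of the "ASCII is valid UTF-8" point, written in Python rather than Chicken Scheme purely for illustration. The sample strings and the helper name `is_valid_utf8` are made up here and do not come from the thread. It shows that ASCII bytes decode identically under both encodings, while a single-byte-encoded character above 0x7F (the "8-bit ASCII" case) is not valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte sequence decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII: the same bytes mean the same text in ASCII and in UTF-8.
ascii_bytes = b"string-length"
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Latin-1 text with one character above 0x7F: a lone 0xE9 byte is not UTF-8.
latin1_bytes = "caf\u00e9".encode("latin-1")
```

Under these assumptions only the second kind of input needs a real conversion step on input and output.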
John Cowan wrote:
If we all lived in a UTF-8/LF world exclusively, then that would be
fine. As it is, many of us are not in that world at all, and few of
us are in it exclusively. So in practice it is necessary to convert
between internal and external encodings anyhow, which involves
copy
Tobia Conforto scripsit:
> This discussion has convinced me that from a *practical* point of
> view, it makes a lot of sense to use the same underlying object for
> both kinds of operation, instead of copying over the contents every
> time you want to switch between the two views (as I suppo
John Cowan wrote:
The difference between restricted and unrestricted strings may not
be as large as the distinction between pairs and fixnums, but it's
the same *kind* of difference.
I beg to differ.
A pair is no fixnum, and vice-versa. They're two disjoint domains.
On the other hand, an
> "Graham" == Graham Fawcett <[EMAIL PROTECTED]> writes:
Graham> On Tue, Mar 18, 2008 at 12:22 PM, John Cowan <[EMAIL PROTECTED]>
wrote:
>> It wouldn't solve the data-punning problem. As long
>> as the same object can be seen one way by one module
>> and another way by anothe
Graham Fawcett scripsit:
> Just curious, whence the 'restricted' terminology? I would have
> thought 'utf8 and raw/byte strings' since that's the practical
> implication.
Restricted strings are restricted to holding characters between #\x0
and #\xFF. Unrestricted strings can hold any character b
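
As a rough illustration of the restricted/unrestricted distinction, here is a Python sketch (not Chicken code). The predicate name `is_restricted` is hypothetical, and the test simply checks the code point range #\x0 to #\xFF mentioned above.

```python
def is_restricted(text: str) -> bool:
    """True if every character's code point lies in the range 0x00..0xFF."""
    return all(ord(ch) <= 0xFF for ch in text)


within_range = "na\u00efve"      # i-diaeresis is U+00EF, inside the range
outside_range = "\u03bb-calc"    # Greek lambda is U+03BB, outside the range

# A restricted string fits in one byte per character (Latin-1 is used here
# only as a convenient one-byte-per-code-point mapping).
one_byte_each = len(within_range.encode("latin-1")) == len(within_range)
```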
On Tue, Mar 18, 2008 at 12:22 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> It wouldn't solve the data-punning problem. As long as the same object
> can be seen one way by one module and another way by another, problems
> will continue to be endemic. To fix that, we need two run-time types,
> w
Alex Shinn scripsit:
> Why do we need this? That's not rhetorical, I'd like to
> hear of any use cases where you think a problem could arise.
We need it because Scheme is a strongly (dynamically) typed language.
If FOO passes BAR a pair, and BAR is expecting an exact integer, the
programmer exp
On Mar 18, 2008, at 8:46 AM, John Cowan wrote:
Kon Lovett scripsit:
In my usage "byte-string" means "octet-string". See the "levenshtein"
egg. I mean a "blob".
I'd call that a byte vector.
"blob" is incorrect in the "levenshtein" egg context. However, not
incorrect in a previous post.
> "John" == John Cowan <[EMAIL PROTECTED]> writes:
John> felix winkelmann scripsit:
>> A real module system would solve all these problems
>> cleanly.
John> It wouldn't solve the data-punning problem. As
John> long as the same object can be seen one way by one
John> m
felix winkelmann scripsit:
> A real module system would solve all these problems cleanly.
It wouldn't solve the data-punning problem. As long as the same object
can be seen one way by one module and another way by another, problems
will continue to be endemic. To fix that, we need two run-time
Tobia Conforto scripsit:
> So they're ditching {"byte", u"unicode"} strings in favor of {b"byte",
> "unicode"} ones? What are they ditching exactly? It seems to me
> they're just switching the default.
Maybe so, but the word "just" is probably not appropriate, as switching
defaults has a hu
Kon Lovett scripsit:
> In my usage "byte-string" means "octet-string". See the "levenshtein"
> egg. I mean a "blob".
I'd call that a byte vector. The issue is, when you get a component,
what comes out, a character or an exact integer?
--
Work hard, John C
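
The "what comes out" question can be seen in Python 3, used here only as an analogy and not as a statement about any Chicken API: indexing a text string yields a character, while indexing a byte sequence yields an exact integer. The variable names are illustrative.

```python
text = "abc"      # character string
data = b"abc"     # byte vector holding the same ASCII content

char_component = text[1]   # a one-character string
byte_component = data[1]   # an integer in the range 0..255

# The two components are different kinds of object, even though they
# describe the same position in "the same" content.
same_kind = type(char_component) is type(byte_component)
```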
On Mar 18, 2008, at 7:01 AM, John Cowan wrote:
Graham Fawcett scripsit:
So, a byte string would simply be a string with a null auxiliary
vector.
That doesn't work. A byte-string is not a sequence of characters from
the ASCII repertoire, it's a sequence of characters from the
repertoire
On Tue, Mar 18, 2008 at 12:05 PM, Alex Shinn <[EMAIL PROTECTED]> wrote:
>
> That's just changing the procedures used to access strings.
> Changing the fundamental string representation is a more
> substantial change by an order of magnitude, involving
> changes to the core compiler and the FFI
John Cowan wrote:
Tobia Conforto scripsit:
This is more or less how other languages, such as Python, solved
the issue. Two kinds of strings, byte and unicode, and
overloading a few string operations to have a slightly different
meaning when called on either, computing byte length vs. cha
On Tue, Mar 18, 2008 at 10:01 AM, John Cowan <[EMAIL PROTECTED]> wrote:
> Graham Fawcett scripsit:
>
> > So, a byte string would simply be a string with a null auxiliary vector.
>
> That doesn't work. A byte-string is not a sequence of characters from
> the ASCII repertoire, it's a sequence of
Tobia Conforto scripsit:
> This is more or less how other languages, such as Python, solved the
> issue. Two kinds of strings, byte and unicode, and overloading a few
> string operations to have a slightly different meaning when called on
> either, computing byte length vs. character length
Graham Fawcett scripsit:
> So, a byte string would simply be a string with a null auxiliary vector.
That doesn't work. A byte-string is not a sequence of characters from
the ASCII repertoire, it's a sequence of characters from the repertoire
{ASCII set, characters numbered 129 through 255 with
On 18/03/2008, Graham Fawcett <[EMAIL PROTECTED]> wrote:
> For what it's worth, I also think that GMP should be in the core, and
> that no one, nowhere should be allowed to publish an egg with a
> toplevel procedure named (format) in it. Mysterious toplevel
> interactions between indirect depen
On Tue, Mar 18, 2008 at 7:05 AM, Alex Shinn <[EMAIL PROTECTED]> wrote:
> > "Tobia" == Tobia Conforto <[EMAIL PROTECTED]> writes:
>
>
> Tobia> Graham Fawcett wrote:
> >> Here's another thought. It seems to me that if we
> >> were to represent strings as composite values, e.g. a
>
The entire problem revolves around adding Unicode support as
an option, without modifying the core. *If* we allow
ourselves to modify the core, then there is no problem at
all, and we can just copy the utf8 egg code over the
existing string procedures, and add in some procedures for
byte-level ac
> "Peter" == Peter Bex <[EMAIL PROTECTED]> writes:
Peter> On Tue, Mar 18, 2008 at 11:41:08AM +0900, Alex Shinn wrote:
>> > "Kon" == Kon Lovett <[EMAIL PROTECTED]>
>> writes:
>>
Kon> Summary: I want a byte-string API. I want string
Kon> integrations. I want global U
On 18 Mar 2008, at 2:29 am, Alex Shinn wrote:
The problems we're having aren't even about string
representation though, they're about the semantics of the
string operations themselves. Are the string indices byte
positions or character positions? Different libraries
disagree.
IMHO Java doe
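
To make the byte-position versus character-position ambiguity concrete, here is a small Python sketch. The sample word is made up, and Python stands in for Scheme only for illustration. The same index means different things depending on which view a library takes.

```python
text = "na\u00efve"            # 5 characters: n, a, i-diaeresis, v, e
data = text.encode("utf-8")    # 6 bytes: the i-diaeresis takes two bytes

char_length = len(text)
byte_length = len(data)

# Index 3 read as a character position lands on "v" ...
char_at_3 = text[3]
# ... but read as a byte position it lands in the middle of the
# two-byte sequence C3 AF, on the continuation byte 0xAF.
byte_at_3 = data[3]
```

A library that hands byte offsets to one that expects character offsets would silently point at the wrong place for any non-ASCII input.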
> "Tobia" == Tobia Conforto <[EMAIL PROTECTED]> writes:
Tobia> Graham Fawcett wrote:
>> Here's another thought. It seems to me that if we
>> were to represent strings as composite values, e.g. a
>> two-slot record whose first slot is an encoding (the
>> symbol 'utf8, or #f
On Tuesday, 18.03.2008, at 09:38 +0100, Peter Bex wrote:
> On Tue, Mar 18, 2008 at 11:41:08AM +0900, Alex Shinn wrote:
> > > "Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
> >
> > Kon> Summary: I want a byte-string API. I want string
> > Kon> integrations. I want global UTF8 strin
Graham Fawcett wrote:
Here's another thought. It seems to me that if we were to represent
strings as composite values, e.g. a two-slot record whose first slot
is an encoding (the symbol 'utf8, or #f for 'byte' encoding), and
whose second slot contains the string data, then the various string
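
A minimal sketch of the two-slot idea, in Python rather than Scheme. The excerpt is cut off before it says how the string procedures would use the tag, so the dispatch in `tagged_length` is an assumption, and the names `TaggedString` and `tagged_length` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedString:
    # First slot: "utf8", or None for 'byte' encoding (#f in the proposal).
    encoding: Optional[str]
    # Second slot: the raw string data.
    data: bytes


def tagged_length(s: TaggedString) -> int:
    """Length in characters for utf8 strings, in bytes otherwise (assumed rule)."""
    if s.encoding == "utf8":
        return len(s.data.decode("utf-8"))
    return len(s.data)


raw = "na\u00efve".encode("utf-8")
as_utf8 = TaggedString("utf8", raw)
as_bytes = TaggedString(None, raw)
```

Here both records wrap the same six bytes, and only the tag decides whether the length is reported as 5 or 6.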
On Tue, Mar 18, 2008 at 11:41:08AM +0900, Alex Shinn wrote:
> > "Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
>
> Kon> Summary: I want a byte-string API. I want string
> Kon> integrations. I want global UTF8 strings.
>
> The only way this can happen is to push the UTF8 handling
> in
On Mon, Mar 17, 2008 at 10:29 PM, Alex Shinn <[EMAIL PROTECTED]> wrote:
> > "Graham" == Graham Fawcett <[EMAIL PROTECTED]> writes:
>
> Graham> On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]>
> wrote:
>
> Graham> The Factor language borrowed from Larceny a
> Graham>
> "Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
Kon> Summary: I want a byte-string API. I want string
Kon> integrations. I want global UTF8 strings.
The only way this can happen is to push the UTF8 handling
into the core of Chicken itself.
Integration vs. modules are just different
> "Graham" == Graham Fawcett <[EMAIL PROTECTED]> writes:
Graham> On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]>
wrote:
Graham> The Factor language borrowed from Larceny a
Graham> clever mechanism for representing Unicode
Graham> strings efficiently. Perhaps
On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]> wrote:
> Summary: I want a byte-string API. I want string integrations. I want
> global UTF8 strings.
The Factor language borrowed from Larceny a clever mechanism for
representing Unicode strings efficiently. Perhaps such a system i
Summary: I want a byte-string API. I want string integrations. I want
global UTF8 strings.
Is Chicken to be a development tool for all kinds of software,
including i18n applications for general users (my major focus, all
indications to the contrary), or for in-house tools only?
Text is st
On Mon, Mar 17, 2008 at 10:07 AM, Alex Shinn <[EMAIL PROTECTED]> wrote:
>
> Felix> No, integration only happens in operator
> Felix> position.
>
> Well, that's easy enough to change.
But won't, in the foreseeable future.
>
> Felix> But won't syntax-case's module system rewrite the
>
> "Felix" == felix winkelmann <[EMAIL PROTECTED]> writes:
Felix> On Sun, Mar 16, 2008 at 8:04 AM, Alex Shinn <[EMAIL PROTECTED]>
wrote:
>>
>> I actually thought the change you introduced didn't
>> really inline most of the operators but referenced a
>> static table, and t
On Sun, Mar 16, 2008 at 8:04 AM, Alex Shinn <[EMAIL PROTECTED]> wrote:
> > "Felix" == Felix Winkelmann <[EMAIL PROTECTED]> writes:
>
> Felix> Alex, what happens if I pass string operators as first
> Felix> class values? These don't get inlined. What
> Felix> happens now?
>
> I actu
On Mar 16, 2008, at 12:07 AM, Alex Shinn wrote:
"Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
Kon> On Mar 15, 2008, at 9:33 AM, Felix Winkelmann wrote:
Kon> Is this a "char-string" issue or a "byte-string"
Kon> issue? When the source "...string..." is a string
Kon> of ASCII
> "Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
Kon> Is this a "char-string" issue or a "byte-string"
Kon> issue? When the source "...string..." is a string
Kon> of ASCII non-nul char then there should be no
Kon> problem w/ the utf8 egg overriding the string
Kon> operator
> "Kon" == Kon Lovett <[EMAIL PROTECTED]> writes:
Kon> On Mar 15, 2008, at 9:33 AM, Felix Winkelmann wrote:
Kon> Is this a "char-string" issue or a "byte-string"
Kon> issue? When the source "...string..." is a string
Kon> of ASCII non-nul char then there should be no
Kon>
> "Felix" == Felix Winkelmann <[EMAIL PROTECTED]> writes:
Felix> Alex, what happens if I pass string operators as first
Felix> class values? These don't get inlined. What
Felix> happens now?
I actually thought the change you introduced didn't really
inline most of the operators bu
On Mar 15, 2008, at 9:33 AM, Felix Winkelmann wrote:
When Felix says it would incur tremendous breakage now, I
believe he's referring to the fact that people who are
currently using utf8 are all writing:
(use utf8)
(import utf8)
and now they'll have to remove the import line from their
e
From: Alex Shinn <[EMAIL PROTECTED]>
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Sat, 15 Mar 2008 02:36:45 +0900
> >>>>> "Tobia" == Tobia Conforto <[EMAIL PROTECTED]> writes:
>
> Tobia> Alex Shinn wrote:
From: Tobia Conforto <[EMAIL PROTECTED]>
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Fri, 14 Mar 2008 12:41:51 +0100
> Alex Shinn wrote:
> > I'm considering changing the utf8 egg to no longer use syntax-case
> > modules, so t
> "John" == John Cowan <[EMAIL PROTECTED]> writes:
>> I could actually add a syntax module that does this
>> so people could say
>>
>> (use no-string-integrations)
John> Please do; or better yet, make that a consequence
John> of (use utf8).
It already is, otherwise (
Alex Shinn scripsit:
> I'd rather people not include the full list by hand now,
> since I'd like a compiler optimization such as
>
>(declare (not string-integrations))
Very well, but I doubt you will get one. Felix has already declared
against declarations.
> I could actually add a syntax
> "John" == John Cowan <[EMAIL PROTECTED]> writes:
John> Alex Shinn scripsit:
>> The new version actually makes this easier. You can
>> just have one version of spiffy that does
>>
>> (declare (not usual-integrations))
John> But that would really suck: we want car to
Alex Shinn scripsit:
> The new version actually makes this easier. You can just
> have one version of spiffy that does
>
> (declare (not usual-integrations))
But that would really suck: we want car to stay integrated while leaving
string-length not integrated. What you actually want is:
(de
> "Tobia" == Tobia Conforto <[EMAIL PROTECTED]> writes:
Tobia> Alex Shinn wrote:
>> I'm considering changing the utf8 egg to no longer
>> use syntax-case modules, so that it would work like
>> the numbers egg.
>>
>> The way this would work is that, naturally, if you
> "Robin" == Robin Lee Powell <[EMAIL PROTECTED]> writes:
Robin> On Tue, Mar 11, 2008 at 05:05:27PM +0900, Alex Shinn wrote:
>>
>> I'm not entirely sure why you think spiffy would need
>> two versions.
Robin> Because you said:
Robin> External modules, by default,
Alex Shinn wrote:
I'm considering changing the utf8 egg to no longer use syntax-case
modules, so that it would work like the numbers egg.
The way this would work is that, naturally, if you wanted to use
utf8 semantics you'd just (use utf8), this time with no need for
syntax-case and nothin
On Tue, Mar 11, 2008 at 05:05:27PM +0900, Alex Shinn wrote:
> > "Robin" == Robin Lee Powell <[EMAIL PROTECTED]> writes:
> Robin> On Thu, Jun 28, 2007 at 12:25:54PM +0900, Alex Shinn wrote:
>
> >> I'm considering changing the utf8 egg to no longer
> >> use syntax-case modules, so th
>But I want to ask again, do people want this, and is it OK
>to break compatibility in the current utf8 egg? Or should
>we possibly wait to see about the new module system?
IMHO, the current behaviour should not be changed, or at least
it should be the default and a module-free version be option
Alex Shinn scripsit:
> Individual servlets could then be compiled without the
> default integrations, and they would see the utf8 semantics
> if the utf8 egg were used. Both types of libraries can
> co-exist, and they would all share the same string
> representation.
Can things be arranged so th
On Mar 11, 2008, at 1:05 AM, Alex Shinn wrote:
"Robin" == Robin Lee Powell <[EMAIL PROTECTED]> writes:
But I want to ask again, do people want this, and is it OK
to break compatibility in the current utf8 egg? Or should
we possibly wait to see about the new module system?
In all cases,
> "Robin" == Robin Lee Powell <[EMAIL PROTECTED]> writes:
Robin> Replying to very old mail. :)
Very old, but still relevant, and I was actually about to
bring this up again myself. I have the new code ready to
check in (complete with Unicode 5.0 updates), but want to
reconfirm.
Rob
Replying to very old mail. :)
On Thu, Jun 28, 2007 at 12:25:54PM +0900, Alex Shinn wrote:
> Hi all,
>
> Following up on trac ticket #258:
>
> http://trac.callcc.org/ticket/258
Which is down right now, unfortunately.
> I'm considering changing the utf8 egg to no longer use syntax-case
> modules
Alex Shinn scripsit:
> I'm considering changing the utf8 egg to no longer use syntax-case
> modules, so that it would work like the numbers egg.
I am very much in favor of this, *provided* that it does not hurt
maintainability (not only by you, but by your eventual successor).
> (declare (not
On 6/28/07, Alex Shinn <[EMAIL PROTECTED]> wrote:
I'm considering changing the utf8 egg to no longer use syntax-case
modules, so that it would work like the numbers egg.
The way this would work is that, naturally, if you wanted to use utf8
semantics you'd just (use utf8), this time with no need
Hi all,
Following up on trac ticket #258:
http://trac.callcc.org/ticket/258
I'm considering changing the utf8 egg to no longer use syntax-case
modules, so that it would work like the numbers egg.
The way this would work is that, naturally, if you wanted to use utf8
semantics you'd just (use u