On 2008-11-23 19:31, Graeme Geldenhuys wrote:
At least the good thing about UTF-8 is that you don't have to worry about
LE or BE byte orders. UTF-16 and UTF-32 have that nasty issue.
LE/BE only applies when streaming to/from file/device/network, otherwise
life is much simpler with UTF-32.
On Sun, Nov 23, 2008 at 3:45 PM, listmember <[EMAIL PROTECTED]> wrote:
>
> I am referring to going to the nth character in a string. With UTF-8 it is
> no longer simple arithmetic and an index operation. You have to start from
> zero and iterate until you get to your characters --at every step,
> c
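A minimal sketch of the iteration described above, assuming well-formed UTF-8 held in an AnsiString. The function name Utf8CodePointStart is made up for illustration and is not an RTL routine. It counts code points, not combined characters, so a base letter plus a combining mark still counts as two.

```pascal
program Utf8Index;
{$mode objfpc}{$H+}

{ Returns the 1-based byte index at which the N-th code point (1-based)
  of the UTF-8 string S starts, or 0 if S has fewer than N code points.
  Hypothetical helper; assumes S is well-formed UTF-8. }
function Utf8CodePointStart(const S: AnsiString; N: SizeInt): SizeInt;
var
  I, Count: SizeInt;
begin
  Result := 0;
  Count := 0;
  for I := 1 to Length(S) do
    // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Inc(Count);
      if Count = N then
      begin
        Result := I;
        Exit;
      end;
    end;
end;

var
  S: AnsiString;
begin
  // 'a', U+00E7 (2 bytes), U+20AC (3 bytes), 'z', written as raw bytes
  S := 'a' + #$C3#$A7 + #$E2#$82#$AC + 'z';
  WriteLn(Utf8CodePointStart(S, 3)); // byte index where the third code point starts
  WriteLn(Utf8CodePointStart(S, 5)); // 0: the string has only four code points
end.
```

The cost is linear in the byte position of the character, which is the overhead being discussed. With a fixed-width 4-byte encoding the same lookup would be a plain index operation.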
Hello Daniël,
Sunday, November 23, 2008, 5:21:16 PM, you wrote:
>> Combined and uncombined strings are different things for different
>> tasks, the only common point is that both have the same visual
>> representation, but unicode function "CharAt" (or alike) over
>> uncombined string must never
On Sun, 23 Nov 2008, JoshyFun wrote:
Combined and uncombined strings are different things for different
tasks, the only common point is that both have the same visual
representation, but unicode function "CharAt" (or alike) over
uncombined string must never report the combined character as a
Hello Daniël,
Sunday, November 23, 2008, 1:49:32 PM, you wrote:
DM> I am aware of that, but the combining cedilla is not in the "easy to
DM> process range" of UTF-8. In other words, you cannot do
DM> "if char[i]=combining_cedille" in UTF-8.
DM> Instead, in UTF-8, you need to make sure the string ha
In our previous episode, Daniël Mantione said:
> >> AFAIK there are some more elements where it is possible to get a typeinfo
> >> pointer. A compiler specialist can say more. :-)
> >
> > Well, I'm not an expert, but I can only think of enumerations. These have
> > RTTI under Delphi because they ar
On Sun, 23 Nov 2008, Marco van de Voort wrote:
In our previous episode, Martin Schreiber said:
On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:
But RTTI only contains published classes, does it not?
AFAIK there are some more e
On 2008-11-23 15:10, Marco van de Voort wrote:
In our previous episode, listmember said:
[]..
I'd like to know this, in particular, for FPC and Lazarus --to begin with.
And, the reason I'd like to know this is this: Whenever I suggest that
char size be increased to 4, the idea gets opposed o
In our previous episode, Martin Schreiber said:
> On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:
>
> > But RTTI only contains published classes, does it not?
> >
> AFAIK there are some more elements where it is possible to get a typein
On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:
> But RTTI only contains published classes, does it not?
>
AFAIK there are some more elements where it is possible to get a typeinfo
pointer. A compiler specialist can say more. :-)
> Does MSEGui read ppu files?
>
No.
Martin
On 2008-11-23 14:49, Daniël Mantione wrote:
On Sun, 23 Nov 2008, Jonas Maebe wrote:
On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
For an IDE, this is a little bit more complicated. I.e. searching for
a ç in a source file needs to find both the composed and the
decomposed variant, and in
On 2008-11-23 14:19, Mattias Gaertner wrote:
On Sun, 23 Nov 2008 13:35:07 +0200
listmember<[EMAIL PROTECTED]> wrote:
[...]
These dependencies are complex and require exclusive access. The
memory belongs to the program, the source files can be changed by
anyone.
Therefore the files are kept in
On 2008-11-23 14:34, Mattias Gaertner wrote:
On Sun, 23 Nov 2008 14:11:50 +0200
listmember<[EMAIL PROTECTED]> wrote:
That leaves me wondering how much do we lose performance-wise in
endlessly decompressing UTF-8 data, instead of using, say, UCS-4
strings.
I'm wondering what you mean with 'e
In our previous episode, listmember said:
> The last time I joined a relevant discussion, I was told worrying about
> native UCS-4 string-type would be pointless simply because that sort of
> thing is really needed for word processors only.
>
> Now, I have been informed that Lazarus (and perhaps
In our previous episode, listmember said:
> Is there a way to determine how much memory is consumed by strings by a
> running application?
Maybe you can keep a counter in the routines of astrings. Increase/adjust on
newansistring or setlength.
> I'd like to know this, in particular, for FPC an
Daniël Mantione wrote:
Instead, in UTF-8, you need to make sure the string has enough characters
left, and then compare multiple characters. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.
In this example it may be more efficient to enco
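A rough sketch of the bounds check plus multi-byte compare described above. It assumes well-formed UTF-8, in which U+0327 COMBINING CEDILLA is always the two bytes CC A7. The helper name BytesMatchAt is invented for this example and is not an RTL routine.

```pascal
program CedillaCheck;
{$mode objfpc}{$H+}

const
  // UTF-8 encoding of U+0327 COMBINING CEDILLA, written as raw bytes
  CombiningCedilla = #$CC#$A7;

{ True if the bytes of Pattern occur in S starting at byte index I.
  Hypothetical helper for illustration only. }
function BytesMatchAt(const S, Pattern: AnsiString; I: SizeInt): Boolean;
var
  K: SizeInt;
begin
  Result := False;
  // Make sure enough bytes are left before comparing.
  if (I < 1) or (I + Length(Pattern) - 1 > Length(S)) then
    Exit;
  for K := 1 to Length(Pattern) do
    if S[I + K - 1] <> Pattern[K] then
      Exit;
  Result := True;
end;

var
  S: AnsiString;
begin
  // 'c' followed by the combining cedilla: the decomposed form of U+00E7
  S := 'c' + CombiningCedilla + 'a';
  WriteLn(BytesMatchAt(S, CombiningCedilla, 2)); // the mark starts at byte 2
  WriteLn(BytesMatchAt(S, CombiningCedilla, 4)); // too few bytes left, no match
end.
```

This only finds the decomposed form. Matching the precomposed U+00E7 as well would need either a second pattern or normalizing the text first.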
On Sun, 23 Nov 2008 13:49:32 +0100 (CET)
Daniël Mantione <[EMAIL PROTECTED]> wrote:
>
>
> On Sun, 23 Nov 2008, Jonas Maebe wrote:
>
> >
> > On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
> >
> >> For an IDE, this is a little bit more complicated. I.e. searching
> >> for a ç in a source file
On Sun, 23 Nov 2008, Jonas Maebe wrote:
On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
For an IDE, this is a little bit more complicated. I.e. searching for a ç
in a source file needs to find both the composed and the decomposed
variant, and in the case of UTF-8, this character can be
On Sun, 23 Nov 2008 12:37:32 +0100
Martin Schreiber <[EMAIL PROTECTED]> wrote:
> On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote:
> > On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
> >
> > <[EMAIL PROTECTED]> wrote:
> > > On Sat, 22 Nov 2008 23:05:43 +0200
> > > For example the laz
On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
For an IDE, this is a little bit more complicated. I.e. searching
for a ç in a source file needs to find both the composed and the
decomposed variant, and in the case of UTF-8, this character can be
encoded in 1, 2, 3 or 4 bytes which all ne
On Sun, 23 Nov 2008 14:11:50 +0200
listmember <[EMAIL PROTECTED]> wrote:
>[...]
> > For very large projects, that should probably be done anyway at some
> > point. But even in that case, using a more memory-efficient string
> > type enables you to keep more data in memory and hence potentially
> >
On Sun, 23 Nov 2008, listmember wrote:
On 2008-11-23 14:10, Daniël Mantione wrote:
Therefore, any other encoding is a waste of memory and does not gain you
any speed. For that reason, I don't see the compiler switch from 8-bit
processing either.
I nearly fully agree with you.
Except tha
On Sun, 23 Nov 2008 13:35:07 +0200
listmember <[EMAIL PROTECTED]> wrote:
> > Do a 'find declaration' on an identifier, that does not exist. This
> > will explore all units of the uses section.
>
> Now I see what you mean.
>
> But, isn't this a design-choice; caching all sources in memory for
> s
On 2008-11-23 14:10, Daniël Mantione wrote:
Therefore, any other encoding is a waste of memory and does not gain you
any speed. For that reason, I don't see the compiler switch from 8-bit
processing either.
I nearly fully agree with you.
Except that, when a string constant needs to contain no
On 2008-11-23 13:49, Jonas Maebe wrote:
On 23 Nov 2008, at 12:35, listmember wrote:
But, isn't this a design-choice; caching all sources in memory for
speed reasons, as opposed to on-demand opening and closing each file.
For very large projects, that should probably be done anyway at some
On Sun, 23 Nov 2008, listmember wrote:
What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it
would still be UTF-8 or whatever.
I am only considering in memory representation being UTF-32 (or UCS-4).
This way, loading from and saving to would hardly be affected, yet i
On 23 Nov 2008, at 12:35, listmember wrote:
Do a 'find declaration' on an identifier, that does not exist. This
will explore all units of the uses section.
Now I see what you mean.
But, isn't this a design-choice; caching all sources in memory for
speed reasons, as opposed to on-demand ope
Do a 'find declaration' on an identifier, that does not exist. This
will explore all units of the uses section.
Now I see what you mean.
But, isn't this a design-choice; caching all sources in memory for speed
reasons, as opposed to on-demand opening and closing each file.
Still. If that is
On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote:
> On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
>
> <[EMAIL PROTECTED]> wrote:
> > On Sat, 22 Nov 2008 23:05:43 +0200
> > For example the lazarus IDE typically holds 50 to 200mb sources in
> > memory. If this would be changed to unic
However, you may hack into RTL at the NewAnsiString / NewWideString /
NewUnicodeString procedures and install hooks that will record the
number of bytes requested. That shouldn't be too difficult to do.
This is what I was looking for.
Thank you.
On Sun, 23 Nov 2008 13:05:15 +0200
listmember <[EMAIL PROTECTED]> wrote:
> On 2008-11-23 12:50, Jonas Maebe wrote:
> >
> > On 23 Nov 2008, at 11:29, listmember wrote:
> >
> >>> It is not hard to tell that an app that works with text files
> >>> (such as Lazarus) will consume 4 times more memory pe
listmember wrote:
This is my thick-day. So, permit me to ask this:
Are you really saying that strings occupy 50 MB of Lazarus's memory footprint?
I just checked (using Process Explorer, under Windows) and this is what
I see:
Working set: 2,216 K
Peak Working set: 26,988 K
I can't see where th
On Sun, Nov 23, 2008 at 1:13 PM, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
>> I can't see where that 50 MB fits into that.
>
> Well it all depends on how many files you have open, project size etc...
>
As an example. Using a small project, Lazarus sits at 26MB of memory.
I then open the MacOSAl
On Sun, Nov 23, 2008 at 1:05 PM, listmember <[EMAIL PROTECTED]> wrote:
> I just checked (using Process Explorer, under Windows) and this is what I
> see:
>
> Working set: 2,216 K
> Peak Working set: 26,988 K
>
> I can't see where that 50 MB fits into that.
Well it all depends on how many files you
On 2008-11-23 13:07, Graeme Geldenhuys wrote:
On Sun, Nov 23, 2008 at 12:29 PM, listmember<[EMAIL PROTECTED]> wrote:
What I am curious about is: 4 times of what?
RAM, Random Access Memory, DIMMs those little green sticks you
shove into the motherboard. :-)
:)
On Sun, Nov 23, 2008 at 12:29 PM, listmember <[EMAIL PROTECTED]> wrote:
>
> What I am curious about is: 4 times of what?
RAM, Random Access Memory, DIMMs those little green sticks you
shove into the motherboard. :-)
Regards,
- Graeme -
On 2008-11-23 12:50, Jonas Maebe wrote:
On 23 Nov 2008, at 11:29, listmember wrote:
It is not hard to tell that an app that works with text files (such
as Lazarus) will consume 4 times more memory per file loaded.
But, how much memory does, say, Lazarus --itself-- consume
specifically for st
On 23 Nov 2008, at 11:29, listmember wrote:
It is not hard to tell that an app that works with text files (such
as Lazarus) will consume 4 times more memory per file loaded.
But, how much memory does, say, Lazarus --itself-- consume
specifically for string storage when run for the first ti
I thought my example described just that. If strings use 4 bytes per
char then ASCII text will need 4 times more memory.
I am not disputing that.
What I am curious about is: 4 times of what?
It is not hard to tell that an app that works with text files (such as
Lazarus) will consume 4 times m
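The "4 times" figure for ASCII text can be checked with a little arithmetic. The sketch below counts character payload only and ignores string headers, reference counts and allocator overhead. The sample text is arbitrary.

```pascal
program CharSizeCost;
{$mode objfpc}{$H+}

var
  SourceText: AnsiString;
  CodePoints, OneByteTotal, FourByteTotal: SizeInt;
begin
  // Pure ASCII, so one byte per code point in UTF-8.
  SourceText := 'begin WriteLn(42); end.';
  CodePoints := Length(SourceText);

  OneByteTotal := CodePoints * SizeOf(AnsiChar);   // 1 byte per char
  FourByteTotal := CodePoints * SizeOf(UCS4Char);  // 4 bytes per char

  WriteLn('1 byte per char : ', OneByteTotal, ' bytes');
  WriteLn('4 bytes per char: ', FourByteTotal, ' bytes');
  WriteLn('ratio           : ', FourByteTotal div OneByteTotal);
end.
```

The ratio only holds for ASCII-only text. Text with many multi-byte UTF-8 sequences grows by less, and none of this says how much of a running program's total memory is string data in the first place, which is the question being asked here.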
On Sun, 23 Nov 2008 11:09:25 +0200
listmember <[EMAIL PROTECTED]> wrote:
> >> I am only considering in memory representation being UTF-32 (or
> >> UCS-4).
> >
> > What do you mean with 'memory representation'?
>
> That, each char in a string in memory would be 4-bytes (or more);
> yet, when saved
Graeme Geldenhuys wrote:
On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
On Sat, 22 Nov 2008 23:05:43 +0200
For example the lazarus IDE typically holds 50 to 200mb sources in
memory. If this would be changed to unicodestring (2 bytes per char) then
the IDE would need
Actually, load times do not seem to be linear at all.
A file 4 times larger seems to take only twice as long.
I did one very simple test using 2 text files:
File 1: 384 MB (403,248,710 bytes)
File 2: 120 MB (126,680,448 bytes)
with the code below:
procedure TForm1.Button1Click(Sen
I am only considering in memory representation being UTF-32 (or
UCS-4).
What do you mean with 'memory representation'?
That, each char in a string in memory would be 4-bytes (or more); yet,
when saved on disk (or transmitted across the net etc.) it would be
UTF-8 compressed. IOW, no compress
On Sun, 23 Nov 2008 10:31:39 +0200
listmember <[EMAIL PROTECTED]> wrote:
>[...]
> What I had in mind wasn't to store the string data in UTF-32 (or
> UCS-4); it would still be UTF-8 or whatever.
>
> I am only considering in memory representation being UTF-32 (or
> UCS-4).
What do you mean with 'm
On 2008-11-23 10:19, Mattias Gaertner wrote:
On Sat, 22 Nov 2008 23:05:43 +0200
listmember<[EMAIL PROTECTED]> wrote:
Is there a way to determine how much memory is consumed by strings by
a running application?
I'd like to know this, in particular, for FPC and Lazarus --to begin
with.
And, th
On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
> On Sat, 22 Nov 2008 23:05:43 +0200
> For example the lazarus IDE typically holds 50 to 200mb sources in
> memory. If this would be changed to unicodestring (2 bytes per char) then
> the IDE would need 50 to 200mb more me
On Sat, 22 Nov 2008 23:05:43 +0200
listmember <[EMAIL PROTECTED]> wrote:
> Is there a way to determine how much memory is consumed by strings by
> a running application?
>
> I'd like to know this, in particular, for FPC and Lazarus --to begin
> with.
>
> And, the reason I'd like to know this is
Is there a way to determine how much memory is consumed by strings by a
running application?
I'd like to know this, in particular, for FPC and Lazarus --to begin with.
And, the reason I'd like to know this is this: Whenever I suggest that
char size be increased to 4, the idea gets opposed on t