Re: expectation vs requirements for locale facets

Martin Sebor Tue, 21 Aug 2007 08:31:45 -0700

Travis Vitek wrote:

Martin Sebor wrote:



Yes. But notice the text doesn't say anything about time_put_byname or
time_get_byname ;-)


Well, the standard doesn't say much at all about the *_byname<>
facets. All it really says about them is

  [21.1.1.2 p4] For some standard facets a standard "..._byname" class,

[...]

The _byname requirements are extremely vague. Sometimes they are
also implied by the requirements on the base facets, which makes
them difficult to find. It's a mess.


So, if I'm reading that right, the *_byname<> facet classes are just
there to prevent the user from having to instantiate a std::locale
directly.


I'm not sure what you mean by this. The _byname facets are really
just an implementation that's exposed in the interface of the
locale library. They should have never been specified.
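
For reference, the two routes being contrasted look roughly like this. It is a minimal sketch, not code from stdcxx or from the attached test; put_date, date_via_named_locale, date_via_byname_facet and sample_date are names made up for the illustration, and "C" is the only locale name assumed to exist everywhere.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Format `when` with the 'x' (date) conversion, using the time_put
// facet installed in `loc`.
std::string put_date(const std::locale& loc, const std::tm& when)
{
    std::ostringstream out;
    out.imbue(loc);
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(loc);
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', &when, 'x');
    return out.str();
}

// Route 1: construct the whole named locale and use its time_put facet.
std::string date_via_named_locale(const char* name, const std::tm& when)
{
    assert(name != 0);
    return put_date(std::locale(name), when);
}

// Route 2: install only a time_put_byname facet into a copy of the
// classic locale. The locale object takes ownership of the facet.
std::string date_via_byname_facet(const char* name, const std::tm& when)
{
    assert(name != 0);
    std::locale loc(std::locale::classic(),
                    new std::time_put_byname<char>(name));
    return put_date(loc, when);
}

// June 7, 1908, the date that appears in the test output in this thread.
std::tm sample_date()
{
    std::tm t = std::tm();
    t.tm_mday = 7;
    t.tm_mon  = 5;    // months are zero-based
    t.tm_year = 8;    // years since 1900
    return t;
}
```

For the "C" name both routes should produce 06/07/08, since %x in the "C" locale is %m/%d/%y. Whether they agree for any other name is up to the implementation.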

The C++ standard (or even the C standard for that
matter) isn't going to be of help here.


Wait. Say what now? I'm not sure what you're trying to tell me here.
If the C++ Standard says that these facets read or write years as
roman numerals, then they should probably do so, regardless of what
any other standard document requires. I think this will actually get
cleared up in a few seconds...


The C and C++ standards only specify the requirements on the "C"
locale and leave the localized behavior unspecified. So pretty
much anything goes. There are some ground rules but I suspect
you won't be able to tease the requirement on swallowing leading
space for the %e directive out of them.

Of course that isn't what I'm seeing.

Test case?


Yeah. See attachment. Only tested on Win32/VC8 and Linux/GCC.


Thanks. Here are the results with stdcxx and with g++ 3.4.6:

$ ./t.stdcxx | grep fail
string=07/06/08 result=fail     locale=thai
string= 7.06.1908       result=fail     locale=bg_BG
string=07/06/08 result=fail     locale=lo_LA
string=07/06/08 result=fail     locale=th_TH

$ ./t.gcc | grep fail
string=��� %.1d ��� 1908        result=fail     locale=ar_SA
string=۰۸/۰۶/۰۷ result=fail     locale=fa_IR
string=ಗುರುವಾರ 07 ಜೂ 1908       result=fail     locale=kn_IN

Looks like g++ is failing on multibyte character sequences but
not on the spaces. We seem to somehow manage to process the
multibyte sequences (I wonder how, or if it's a weakness in
the test) but have issues with the leading space in bg_BG.
I don't know what the problem is with the other locales...

It's hard to say from just looking at the code (and I haven't looked
very carefully). In general, we [try to] implement the POSIX
semantics, so if it works with strptime()/strftime() it should work
with our time_put_byname/time_get_byname.


Well, there's the problem right there. The standard requires that the
time_put<> facet format its output according to the POSIX function
strftime(), with the option for supporting extensions. It makes no
indication that the time_get<> facet should read data in such a way as
to be compatible with strptime(). The only thing I see that says
anything about the format expected by time_get<> is here...

[...]


Right. Pretty vague.


This paragraph says that time_get<>::get_date() is supposed to process
the output of time_put<>::put(..., 'x').

  [22.2.5.1.2 p4] Effects: Reads characters starting at s until it has
  extracted those struct tm members, and remaining format characters,
  used by time_put<>::put to produce the format specified by 'x' or
  until it encounters an error.
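
Read that way, the paragraph describes a round trip, which can be sketched as below. The sketch assumes the classic locale when used as in the note that follows (the only locale whose formats are actually specified); RoundTrip and date_round_trip are hypothetical names, not part of any library. The year is deliberately left out of the comparison because 'x' may emit only two digits of it.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

struct RoundTrip {
    std::string text;    // what time_put produced for 'x'
    bool        parsed;  // true if get_date did not set failbit
    std::tm     result;  // what get_date extracted from `text`
};

RoundTrip date_round_trip(const std::locale& loc, const std::tm& when)
{
    RoundTrip rt;
    rt.result = std::tm();

    // Write the date with time_put<>::put(..., 'x').
    std::ostringstream out;
    out.imbue(loc);
    std::ostreambuf_iterator<char> it =
        std::use_facet<std::time_put<char> >(loc)
            .put(std::ostreambuf_iterator<char>(out), out, ' ', &when, 'x');
    assert(!it.failed());   // writing to a string stream should not fail
    rt.text = out.str();

    // Read it back with time_get<>::get_date().
    std::istringstream in(rt.text);
    in.imbue(loc);
    std::ios_base::iostate err = std::ios_base::goodbit;
    std::use_facet<std::time_get<char> >(loc)
        .get_date(std::istreambuf_iterator<char>(in),
                  std::istreambuf_iterator<char>(), in, err, &rt.result);
    rt.parsed = (err & std::ios_base::failbit) == 0;
    return rt;
}
```

Calling date_round_trip(std::locale::classic(), t) with t set to June 7, 1908 should give back the same tm_mon and tm_mday; passing a named locale instead is how the failures listed above would show up.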


Yes. The problem with the C++ standard in this area is that the
requirements are vague and not always implementable (e.g., the
multibyte sequences -- all the narrow specializations of the
_get facets operate on single characters).
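
To make the multibyte point concrete, here is a small sketch of what a char-at-a-time parser has to work with. It assumes UTF-8 as the encoding and uses the classic ctype<char> facet rather than the fa_IR one, so it only illustrates the shape of the problem; count_narrow_digits and persian_08_utf8 are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Count the chars in `bytes` that the classic ctype<char> facet
// classifies as digits, looking at one char at a time.
std::size_t count_narrow_digits(const std::string& bytes)
{
    const std::ctype<char>& ct =
        std::use_facet<std::ctype<char> >(std::locale::classic());
    std::size_t n = 0;
    for (std::string::size_type i = 0; i != bytes.size(); ++i)
        if (ct.is(std::ctype_base::digit, bytes[i]))
            ++n;
    return n;
}

// The two characters U+06F0 U+06F8 (Extended Arabic-Indic digits zero
// and eight, as in the fa_IR output above), encoded as UTF-8.
std::string persian_08_utf8()
{
    const std::string s("\xDB\xB0\xDB\xB8");
    assert(s.size() == 4);   // two characters, four chars
    return s;
}
```

count_narrow_digits("08") sees two digits, while count_narrow_digits(persian_08_utf8()) sees four chars and none of them is a digit on its own.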

If we test this behavior it's gotta be right ;-) Where does POSIX
say leading spaces must be skipped? I see this under %e: Equivalent
to %d. And under %d: The day of the month [01,31]; leading zeros
are permitted but not required. Nothing about ignoring spaces.


Absolutely. The docs for POSIX strftime()...

[...]

So strftime() isn't even compatible with strptime() when it comes to '%e'.


Hmm. That seems like a bug in POSIX then, unless we're missing
something. You might want to create a POSIX-only test case to
verify this and if I'm right open a discussion on the Austin
Group list (http://www.opengroup.org/austin/lists.html).
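
Such a POSIX-only test case could be as small as the sketch below. It assumes a POSIX system where strptime() is declared in <time.h> and that the program runs in the default "C" locale; PosixRoundTrip and posix_e_round_trip are hypothetical names. It reports what strptime() does with the output of strftime() rather than asserting either answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <time.h>   // strftime(), plus strptime() on POSIX systems

struct PosixRoundTrip {
    std::string text;    // output of strftime() for "%e"
    bool        parsed;  // did strptime() consume that output in full?
    int         mday;    // tm_mday recovered by strptime(), or -1
};

PosixRoundTrip posix_e_round_trip(int mday)
{
    assert(mday >= 1 && mday <= 31);

    struct tm in = tm();
    in.tm_mday = mday;

    // %e pads single-digit days with a leading space.
    char buf[16];
    const std::size_t n = strftime(buf, sizeof buf, "%e", &in);

    PosixRoundTrip rt;
    rt.text.assign(buf, n);
    rt.mday = -1;

    // Feed the formatted text straight back to strptime().
    struct tm out = tm();
    const char* end = strptime(rt.text.c_str(), "%e", &out);
    rt.parsed = end != 0 && *end == '\0';
    if (rt.parsed)
        rt.mday = out.tm_mday;
    return rt;
}
```

For posix_e_round_trip(7) the text member is " 7"; whether parsed comes back true is exactly the question to put to the Austin Group.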

[...]

Unfortunately, without consistent input/output it is going to be
difficult for this multi-threading test to verify that no data
corruption is occurring with arbitrary locales. Hopefully there is some
system in place that allows us to explicitly specify which locales are
to be used for a test.


Not really. My approach would be to detect locales with this
problem and avoid using them. The test also doesn't need to
be exhaustive, at least not in this iteration. I think
exercising just the most common patterns should be good enough
(although %X is pretty common :)
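
Detecting and skipping such locales might look something like the following. This is only a sketch under stated assumptions: it checks one narrow symptom (date output that starts with whitespace, as in the bg_BG case), and date_has_leading_space and usable_locales are invented names, not anything in the test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One narrow symptom check: does this locale's 'x' output begin with
// whitespace, as " 7.06.1908" does?
bool date_has_leading_space(const std::locale& loc, const std::tm& when)
{
    assert(when.tm_mday >= 1 && when.tm_mday <= 31);
    std::ostringstream out;
    out.imbue(loc);
    std::use_facet<std::time_put<char> >(loc)
        .put(std::ostreambuf_iterator<char>(out), out, ' ', &when, 'x');
    const std::string text = out.str();
    return !text.empty() && std::isspace(text[0], loc);
}

// Keep only the names that refer to an installed locale and do not
// show the symptom for the given date.
std::vector<std::string>
usable_locales(const std::vector<std::string>& names, const std::tm& when)
{
    std::vector<std::string> keep;
    for (std::size_t i = 0; i != names.size(); ++i) {
        try {
            const std::locale loc(names[i].c_str());
            if (!date_has_leading_space(loc, when))
                keep.push_back(names[i]);
        } catch (const std::runtime_error&) {
            // No such locale on this system; nothing to test.
        }
    }
    return keep;
}
```

A single-digit day such as the 7th is the useful input here, since a two-digit day would never be padded.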

Martin
