[Ur] An issue with Cyrillic characters

Artyom Shalkhakov Thu, 04 Jul 2013 21:48:02 -0700

Hello list,

I'm trying to store strings containing Cyrillic characters in a
Postgres 9.1 database. Here's my program:


table entry : {Id : int, Title: string}
  PRIMARY KEY Id
sequence entryS

fun new_handle r =
  id <- nextval entryS;
  dml (INSERT INTO entry (Id, Title) VALUES ({[id]}, {[r.Title]}));
  return <xml><body><p>OK</p></body></xml>

fun main (): transaction page =
  return <xml><body>
  <form>
    Title: <textbox {#Title}/>
    <submit action={new_handle}/>
  </form>
</body></xml>

When I submit "текст" to Ur/Web, I get an error along these lines:

Fatal error: /home/user/proj/simple.ur:7:2-10:2: DML failed:
INSERT INTO uw_Simple_entry (uw_Id, uw_Title) VALUES (20::int8,
E'\377\377\377\377\377\377\377\377'::text)
ERROR:  invalid byte sequence for encoding "UTF8": 0xff

I've prepared a patch (attached; it is made against the tip revision).
On my system, sprintf/printf behave unexpectedly for characters with
the high bit set. For instance, the following program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
  char c = (char)255;

  printf("%03o\n", c);

  return 0;
}

prints "37777777777". If [c] is cast to [unsigned char], the program
prints "377", as expected. Could this have to do with the locale?
FYI, LANG is set to en_US.UTF-8 on my system.
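
A note on the likely cause, offered as a sketch rather than a
description of the attached patch: the output is consistent with plain
char being signed on this platform, which is the usual default for GCC
on x86 Linux. In that case (char)255 holds -1, the default argument
promotion turns it into the int -1, and %o prints that bit pattern as
an unsigned value, giving 37777777777 for a 32-bit int. The locale
would not be involved. The helper below (octal_escape is a
hypothetical name, not something from the Ur/Web sources) shows the
cast that keeps the escape at three octal digits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the Ur/Web sources: writes the
   byte c into buf as a backslash followed by three octal digits.
   Returns the number of characters written, excluding the NUL. */
static int octal_escape(char c, char *buf, size_t size) {
  /* Room for the backslash, three digits and the terminating NUL. */
  assert(size >= 5);

  /* Convert to unsigned char first, so that a byte such as 0xFF
     becomes 255 rather than -1 when it is widened. The outer cast
     matches the unsigned int that %o expects. */
  return snprintf(buf, size, "\\%03o", (unsigned int)(unsigned char)c);
}
```

Called with (char)255 and a buffer of at least five bytes, this
writes a backslash followed by 377 regardless of whether plain char is
signed on the platform.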

--
Cheers,
Artyom Shalkhakov

tip.patch
Description: Binary data

