Re: [GENERAL] UTF-8 on Postgres wire protocol

2016-12-21 Thread Michael Paquier
On Thu, Dec 22, 2016 at 8:25 AM, Rui Pacheco wrote: > I’m toying around with the wire protocol and came across something I don’t > understand. > > I created a table with two columns, one called “id” and one called “señor”. > When I select from that table I get the list of

[GENERAL] UTF-8 on Postgres wire protocol

2016-12-21 Thread Rui Pacheco
I’m toying around with the wire protocol and came across something I don’t understand. I created a table with two columns, one called “id” and one called “señor”. When I select from that table I get the list of columns and while it’s fairly easy to identify the column with the name “id”, I’m

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Daniel Verite
Dev Kumkar wrote: Succeeds but as replied earlier it creates database with LC_COLLATE = 'English_United States.1252' which corresponds to Latin1. Despite windows-1252 being a monobyte encoding sharing most of LATIN1 codes and character set, it does not mean that English_United

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 4:34 PM, Daniel Verite dan...@manitou-mail.org wrote: Despite windows-1252 being a monobyte encoding sharing most of LATIN1 codes and character set, it does not mean that English_United States.1252 is limited to this character set. You may use UTF-8 databases with that

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Upgrade servers to Linux? :-P Actually that's not the solution but running away from it. There is a heavy footprint of customers and huge market on windows too and so not that easy to migrate and convince in

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Gavin Flower
On 21/02/14 02:04, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Upgrade servers to Linux? :-P Actually that's not the solution but running away from it. There is a heavy footprint of customers

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Dev Kumkar
On Fri, Feb 21, 2014 at 12:14 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: On 21/02/14 02:04, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Upgrade servers to Linux? :-P Actually that's not the solution but running

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Adrian Klaver
On 02/20/2014 11:40 AM, Dev Kumkar wrote: Hmm. Don't want to digress here and lose the topic context. Here would really appreciate if there are any suggestions for UTF-8 collation on Windows? Well I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Dev Kumkar
On Fri, Feb 21, 2014 at 1:26 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Well I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know there is no UTF8 collation, it is an encoding. What you want if I am following, is the en_US locale (or equivalent for

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Adrian Klaver
On 02/20/2014 11:40 AM, Dev Kumkar wrote: Hmm. Don't want to digress here and lose the topic context. Here would really appreciate if there are any suggestions for UTF-8 collation on Windows? Just had idea, not sure how feasible it is in your situation though. Run Postgres in a Linux VM

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-20 Thread Adrian Klaver
On 02/20/2014 12:27 PM, Dev Kumkar wrote: On Fri, Feb 21, 2014 at 1:26 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Well I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know there is no UTF8 collation, it is

[GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
Am really going nowhere with this after so much searching over the net, or am missing some basic things, not sure! What is the equivalent for en_US.UTF-8 collation in case of windows? In Linux am creating database with following options, as follows: -E utf8 -l en_US.UTF-8 -T template0 This creates

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 06:41 AM, Dev Kumkar wrote: Am really going nowhere with this after so much searching over the net, or am missing some basic things, not sure! What is the equivalent for en_US.UTF-8 collation in case of windows? In Linux am creating database with following options, as follows: -E

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Wed, Feb 19, 2014 at 10:16 PM, Adrian Klaver adrian.kla...@aklaver.com wrote: I found the below that might help. I do not use Windows much any more so I do not have a machine handy to confirm. http://www.g-loaded.eu/2011/02/27/locale-windows/ Thanks for the pointer. *american_usa* works

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 11:42 AM, Dev Kumkar wrote: On Wed, Feb 19, 2014 at 10:16 PM, Adrian Klaver adrian.kla...@aklaver.com wrote: I found the below that might help. I do not use Windows much any more so I do not have a machine handy to confirm.

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 1:19 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: So what is the exact command you are using? createdb -U postgres -E utf8 -l american_usa DBNAME Above command fails to create utf-8 LC_COLLATE. Regards...

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 12:03 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 1:19 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: So what is the exact command you are using? createdb -U postgres -E utf8 -l american_usa DBNAME Above command fails to create utf-8

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 1:41 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: What does it set LC_CTYPE to? So what happens if you do?: createdb -U postgres -E utf8 -l american_usa.65001 DBNAME *createdb: database creation failed: ERROR: invalid locale name: american_usa.65001 * or

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 12:16 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 1:41 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: What does it set LC_CTYPE to? So what happens if you do?: createdb -U postgres -E utf8 -l american_usa.65001 DBNAME

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 2:01 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Just noticed you are not specifying the template database. Try using template0: createdb -U postgres -E utf8 --lc-ctype=american_usa --lc-collate=american_usa -T template0 DBNAME Same result i.e. LC_COLLATE and

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 12:43 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 2:01 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Just noticed you are not specifying the template database. Try using template0: createdb -U postgres -E utf8

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 2:24 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Alright last shot:) Taking hint from here: http://msdn.microsoft.com/en-us/library/x99tb11d.aspx try: createdb -U postgres -E utf8 -l en-US DBNAME If that does not work, not sure where to go. This won't

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 01:09 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 2:24 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Alright last shot:) Taking hint from here: http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 2:45 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Have you tried it? Note that the locale name is different than the one on Linux. On Linux it is en_US. What I suggested is en-US. Yes. Here is the output: createdb -U postgres -E utf8 -l en-US -T template0

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 01:21 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 2:45 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Have you tried it? Note that the locale name is different than the one on Linux. On Linux it is en_US. What I suggested is

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread John R Pierce
On 2/19/2014 1:21 PM, Dev Kumkar wrote: createdb -U postgres -E utf8 -l en-US -T template0 mynewdb Password: *createdb: database creation failed: ERROR: invalid locale name: en-US* I believe it's en_US ... _ not - -- john r pierce 37N 122W somewhere

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Gavin Flower
On 20/02/14 10:28, Adrian Klaver wrote: On 02/19/2014 01:21 PM, Dev Kumkar wrote: On Thu, Feb 20, 2014 at 2:45 AM, Adrian Klaver adrian.kla...@aklaver.com wrote: Have you tried it? Note that the locale name is different than the one on Linux. On

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Adrian Klaver
On 02/19/2014 01:30 PM, John R Pierce wrote: On 2/19/2014 1:21 PM, Dev Kumkar wrote: createdb -U postgres -E utf8 -l en-US -T template0 mynewdb Password: *createdb: database creation failed: ERROR: invalid locale name: en-US* I believe it's en_US ... _ not - Unfortunately this is a Windows

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread John R Pierce
On 2/19/2014 1:35 PM, Adrian Klaver wrote: Unfortunately this is a Windows install and that does not work either. Windows encodings are a pain. Their Unicode is NOT utf8, it's ucs2 aka utf16. I just checked my default install of postgres 9.2, it appears it's using WIN1252 encoding,

Re: [GENERAL] UTF-8 collation on Windows?

2014-02-19 Thread Dev Kumkar
On Thu, Feb 20, 2014 at 3:17 AM, John R Pierce pie...@hogranch.com wrote: On 2/19/2014 1:35 PM, Adrian Klaver wrote: Unfortunately this is a Windows install and that does not work either. Windows encodings are a pain. Their Unicode is NOT utf8, it's ucs2 aka utf16. I just checked my

[GENERAL] UTF-8 for bytea

2011-11-03 Thread Robert James
When trying to INSERT on Postgres (9.1) to a bytea column, via E'' escaped strings, I get the strings rejected because they're not UTF8. I'm confused, since bytea isn't for strings but for binary. What causes this? How do I fix this? (I know that escaped strings is not the best way for binary

Re: [GENERAL] UTF-8 for bytea

2011-11-03 Thread Marko Kreen
On Thu, Nov 3, 2011 at 4:34 AM, Robert James srobertja...@gmail.com wrote: When trying to INSERT on Postgres (9.1) to a bytea column, via E'' escaped strings, I get the strings rejected because they're not UTF8. I'm confused, since bytea isn't for strings but for binary.  What causes this? How

[GENERAL] UTF-8 and Regular expression

2011-05-31 Thread Håvard Wahl Kongsgård
Hi, in 8.4 how do the regular expression functions in postgresql handle special UTF-8 characters? for example: SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes; fails to select characters like ü ø æ å -- Håvard Wahl Kongsgård http://havard.security-review.net/

Re: [GENERAL] UTF-8 and Regular expression

2011-05-31 Thread Tom Lane
Håvard Wahl Kongsgård haavard.kongsga...@gmail.com writes: Hi, in 8.4 how do the regular expression functions in postgresql handle special UTF-8 characters? Badly :-( for example: SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes; fails to select
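
As a rough analogy for the behavior discussed in this thread, the Python sketch below contrasts an ASCII-only `\w` with a Unicode-aware one. It is not PostgreSQL's regex engine: Python's `re.ASCII` flag merely stands in for a matcher whose character classes only know ASCII letters, and the sample name is an assumption chosen for illustration.

```python
import re

# Same shape as the pattern in the question: word, whitespace, final word.
PATTERN = r"\w+\s(\w+)$"

def last_word(name: str, ascii_only: bool = False):
    """Return the final word of `name`, or None if the pattern does not match."""
    flags = re.ASCII if ascii_only else 0
    match = re.search(PATTERN, name, flags)
    return match.group(1) if match else None

unicode_result = last_word("Håvard Kongsgård")                 # Unicode-aware \w
ascii_result = last_word("Håvard Kongsgård", ascii_only=True)  # ASCII-only \w
```

With the ASCII-only classes the accented letter breaks the final `\w+`, so nothing is returned, which mirrors the symptom reported for ü ø æ å.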

Re: [GENERAL] select random order by random

2007-11-01 Thread Chris Browne
[EMAIL PROTECTED] (piotr_sobolewski) writes: I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I thought I would receive ten random numbers in random order. But I received ten random

[GENERAL] UTF-8 encoding problem

2007-08-16 Thread bhyuan
hi I use UTF-8 as server character encoding, and use sjis as client character encoding. For some reason, some non-SJIS characters were inserted into the database. When I use set client_encoding='SJIS'; select * from xxx I get this error message: Native Error: ERROR: character 0xc2a0 of
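
For context, a small Python sketch (not from the thread) of what the bytes in that error are: 0xc2 0xa0 is the UTF-8 encoding of U+00A0 NO-BREAK SPACE, a character that Shift_JIS cannot represent. Python's `shift_jis` codec is used here as a stand-in for the server-side conversion, which is an assumption; the server's own conversion tables may differ in detail.

```python
import unicodedata

raw = b"\xc2\xa0"              # the byte sequence quoted in the error message
ch = raw.decode("utf-8")       # decodes to a single code point
name = unicodedata.name(ch)    # official Unicode character name

def fits_in_sjis(text: str) -> bool:
    """Return True if every character in `text` can be encoded as Shift_JIS."""
    try:
        text.encode("shift_jis")
        return True
    except UnicodeEncodeError:
        return False
```

A check like `fits_in_sjis` could be used to find stored values that a SJIS client will not be able to read.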

Re: [GENERAL] UTF-8 encoding problem

2007-08-16 Thread Peter Eisentraut
On Thursday, 16 August 2007 08:40, bhyuan wrote: Can I ignore the error message by configuring the config file? No, there are no provisions for that. Some errors of this type used to be ignored, but that led to SQL injection-like security issues, so you don't want that. -- Peter

Re: [GENERAL] UTF-8 encoding problem

2007-08-16 Thread bhyuan
Thanks for your reply. Maybe SQL injection-like security issues will occur, but I find that different versions of PostgreSQL give different results. Such as the sql set client_encoding='SJIS'; select '\xc3\xaa',* from xxx; on V7.4 @RH3 got \xc3\xaa on [EMAIL PROTECTED] got (blank) on

[GENERAL] UTF-8 encoding

2007-08-16 Thread James B. Byrne
On Wed, August 15, 2007 21:15, Phoenix Kiula wrote: Thanks. Here's my locale information: locale LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8

Re: [GENERAL] UTF-8 encoding problem

2007-08-16 Thread Peter Eisentraut
On Thursday, 16 August 2007 15:21, bhyuan wrote: Maybe SQL injection-like security issues will occur, but I find that different versions of PostgreSQL give different results. That just shows that some versions are more broken than others. But there was a lot of thought put into the current

[GENERAL] UTF-8 to ASCII

2007-05-11 Thread Martin Marques
I have a doubt about the function to_ascii() and what the documentation says. Basically, I passed my DB from latin1 to UTF-8, and I started getting an error when using the to_ascii() function on a field of one of my DB [1]: ERROR: la conversión de codificación de UTF8 a ASCII no está

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread LEGEAY Jérôme
for convert my DB, i use this process: createdb -T old_DB copy_old_DB dropdb old_DB createdb -E LATIN1 -T copy_old_DB new_DB_name maybe this process will help you. regards Jérôme LEGEAY At 14:13 11/05/2007, you wrote: I have a doubt about the function to_ascii() and what the

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Martin Marques
LEGEAY Jérôme wrote: for convert my DB, i use this process: createdb -T old_DB copy_old_DB dropdb old_DB createdb -E LATIN1 -T copy_old_DB new_DB_name maybe this process will help you. As I said in my original mail, the DB conversion went OK, but I see some discrepancies in the

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Arnaud Lesauvage
Martin Marques wrote: I have a doubt about the function to_ascii() and what the documentation says. Basically, I passed my DB from latin1 to UTF-8, and I started getting an error when using the to_ascii() function on a field of one of my DB [1]: ERROR: la conversión de codificación de

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Albe Laurenz
I have a doubt about the function to_ascii() and what the documentation says. Basically, I passed my DB from latin1 to UTF-8, and I started What do you mean by 'passed the DB from Latin1 to UTF8'? getting an error when using the to_ascii() function on a field of one of my DB [1]:

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Alvaro Herrera
Martin Marques wrote: I have a doubt about the function to_ascii() and what the documentation says. Basically, I passed my DB from latin1 to UTF-8, and I started getting an error when using the to_ascii() function on a field of one of my DB [1]: ERROR: la conversión de codificación

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Martin Marques
Albe Laurenz wrote: [2]: http://www.postgresql.org/docs/8.1/interactive/functions-string.html#FTN.AEN7625 Well, the documentation for to_ascii states clearly: The to_ascii function supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings only. Sorry, didn't see the

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Martin Gainty
Martin Marques wrote: I have a doubt about the function to_ascii() and what the documentation says. Basically, I passed my DB from latin1 to UTF-8, and I started getting an error when using

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Why on earth is it talking about MULE_INTERNAL? IIRC, a lot of the conversions translate through some common intermediate charset to save on code/table space. In such cases the problem will usually be detected on the backend conversion...

Re: [GENERAL] UTF-8 to ASCII

2007-05-11 Thread Alvaro Herrera
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Why on earth is it talking about MULE_INTERNAL? IIRC, a lot of the conversions translate through some common intermediate charset to save on code/table space. In such cases the problem will usually be detected on the backend

Re: [GENERAL] UTF-8

2006-10-13 Thread Martijn van Oosterhout
On Thu, Oct 12, 2006 at 11:09:53PM +0200, Tomi NA wrote: 2006/10/12, Martijn van Oosterhout kleptog@svana.org: On Tue, Oct 10, 2006 at 11:49:06AM +0300, Martins Mihailovs wrote: There are some misunderstood. Im using Linux 2.6.16.4, postgresql 8.1.4, (there are one of locale: lv_LV.utf8,

Re: [GENERAL] UTF-8

2006-10-13 Thread Tomi NA
2006/10/13, Martijn van Oosterhout kleptog@svana.org: On Thu, Oct 12, 2006 at 11:09:53PM +0200, Tomi NA wrote: 2006/10/12, Martijn van Oosterhout kleptog@svana.org: On Tue, Oct 10, 2006 at 11:49:06AM +0300, Martins Mihailovs wrote: There are some misunderstood. Im using Linux 2.6.16.4,

Re: [GENERAL] UTF-8

2006-10-13 Thread Martijn van Oosterhout
On Fri, Oct 13, 2006 at 03:40:17PM +0200, Tomi NA wrote: This is a reoccurring topic on the list: sure, it's possible to misconfigure pg so that uppercase/lowercase/ilike/tsearch2/order don't work with a single letter outside of the English alphabet, but the problem Martins seems to be facing

Re: [GENERAL] UTF-8

2006-10-13 Thread Tomi NA
2006/10/13, Martijn van Oosterhout kleptog@svana.org: While sorting for multiple languages simultaneously is an issue, that's not the problem here. Linux/GLibc *does* support correct sorting for all language/charset combinations, and that's what he's using. Just for the hell of it I setup

Re: [GENERAL] UTF-8

2006-10-13 Thread Tom Lane
Tomi NA [EMAIL PROTECTED] writes: 2006/10/13, Martijn van Oosterhout kleptog@svana.org: Similarly, upper/lower are also supported, although postgresql doesn't take advantage of the system support in that case. I think this is the crux of the problem. If it were true, then it might be ...

Re: [GENERAL] UTF-8

2006-10-13 Thread Martijn van Oosterhout
On Fri, Oct 13, 2006 at 12:04:02PM -0400, Tom Lane wrote: Tomi NA [EMAIL PROTECTED] writes: 2006/10/13, Martijn van Oosterhout kleptog@svana.org: Similarly, upper/lower are also supported, although postgresql doesn't take advantage of the system support in that case. I think this is the

Re: [GENERAL] UTF-8

2006-10-13 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: Characters haven't fitted in an unsigned char in a very long time. It's obviously bogus for any multibyte encoding (the code even says so). For such encodings you could use the system's towupper() (ANSI C/Unix98) which will work on any unicode

Re: [GENERAL] UTF-8

2006-10-13 Thread Martins Mihailovs
Martijn van Oosterhout wrote: On Thu, Oct 12, 2006 at 11:09:53PM +0200, Tomi NA wrote: 2006/10/12, Martijn van Oosterhout kleptog@svana.org: On Tue, Oct 10, 2006 at 11:49:06AM +0300, Martins Mihailovs wrote: There are some misunderstood. Im using Linux 2.6.16.4, postgresql 8.1.4, (there are

Re: [GENERAL] UTF-8

2006-10-12 Thread Martijn van Oosterhout
On Tue, Oct 10, 2006 at 11:49:06AM +0300, Martins Mihailovs wrote: There are some misunderstood. Im using Linux 2.6.16.4, postgresql 8.1.4, (there are one of locale: lv_LV.utf8, for Latvian language). But if I want do lower, then with standard latin symbols all is ok, but with others

Re: [GENERAL] UTF-8

2006-10-12 Thread Tomi NA
2006/10/12, Martijn van Oosterhout kleptog@svana.org: On Tue, Oct 10, 2006 at 11:49:06AM +0300, Martins Mihailovs wrote: There are some misunderstood. Im using Linux 2.6.16.4, postgresql 8.1.4, (there are one of locale: lv_LV.utf8, for Latvian language). But if I want do lower, then with

Re: [GENERAL] UTF-8

2006-10-11 Thread Martins Mihailovs
Martijn van Oosterhout wrote: On Fri, Oct 06, 2006 at 12:44:43PM +0300, Martins Mihailovs wrote: I would be a glad to hear your solutions, experience in web application with multi languages (searching with indexing, sorting and others problems with multi byte encoding). For developers: what

Re: [GENERAL] UTF-8

2006-10-09 Thread Martijn van Oosterhout
On Fri, Oct 06, 2006 at 12:44:43PM +0300, Martins Mihailovs wrote: I would be a glad to hear your solutions, experience in web application with multi languages (searching with indexing, sorting and others problems with multi byte encoding). For developers: what are your future plans about

[GENERAL] UTF-8

2006-10-06 Thread Martins Mihailovs
Hello! I'm using PgSQL for a 3 years for web applications, but not only. But the main problem is in encoding. My web applications are used by international (mostly 3 languages: latvian (LATIN7), english and russian). The best (mostly) solution is to use UTF-8, but there are a lot of

[GENERAL] UTF-8, upper() and Chinese characters yielding blank result

2006-07-27 Thread Scott Eade
While I could see various multibyte issues in the archives and in the TODO list, I couldn't spot this exact issue: I am working with a database that uses UNICODE encoding. I have a varchar column (col_x) that includes a mix of Chinese and regular ASCII characters. On PostgreSQL 7.4.13 (on

Re: [GENERAL] UTF-8, upper() and Chinese characters yielding blank result

2006-07-27 Thread Peter Eisentraut
Scott Eade wrote: The problem appears on PostgreSQL 8.0.7 (on WinXP) PostgreSQL 8.0 on Windows does not support UTF-8. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: explain analyze is your friend

Re: [GENERAL] UTF-8, upper() and Chinese characters yielding blank result

2006-07-27 Thread Martijn van Oosterhout
On Thu, Jul 27, 2006 at 07:22:17PM +0200, Peter Eisentraut wrote: Scott Eade wrote: The problem appears on PostgreSQL 8.0.7 (on WinXP) PostgreSQL 8.0 on Windows does not support UTF-8. In addition, PostgreSQL is totally reliant on the OS for upper/lower/collation support, so there is no way

[GENERAL] UTF-8 and stripping accents

2006-06-15 Thread Christopher Murtagh
Greetings folks, I'm trying to write a stored procedure that strips accents from UTF-8 encoded text. I saw a thread on this list discussing something very similar to this on April 8th, and used it to start. However, I'm getting odd behaviour. My stored procedure: CREATE OR REPLACE FUNCTION

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-06-01 Thread Rafal Pietrak
On Thu, 2006-06-01 at 02:00 +, Greg Sabino Mullane wrote: #!perl package testone; use DBI; printf "SQL_INTEGER is %d\n", SQL_INTEGER; package testtwo; use DBI qw(:sql_types); printf "SQL_INTEGER is %d\n", SQL_INTEGER; But this is not as bad as having to use DBD::Pg (or any other

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-31 Thread Rafal Pietrak
On Tue, 2006-05-30 at 22:47 +0200, Martijn van Oosterhout wrote: That's why bytea need special encoding to get around this check. But maybe you would know, why I should write: { pg_type => DBD::Pg::PG_BYTEA } instead of possibly more generic: { TYPE => SQL_BINARY } The latter

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-31 Thread Daniel Verite
Martijn van Oosterhout wrote: However, there is a solution: send the parameters separate from the query. In fact, postgres has been able to do that for a while now but not all interfaces have been made to use it. My guess is that those other databases you've used were already doing

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-31 Thread Martijn van Oosterhout
On Wed, May 31, 2006 at 11:31:28AM +0200, Daniel Verite wrote: Martijn van Oosterhout wrote: However, there is a solution: send the parameters separate from the query. In fact, postgres has been able to do that for a while now but not all interfaces have been made to use it. My guess

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-31 Thread Greg Sabino Mullane
Rafal Pietrak asked: 2. I admit that I should have spotted myself that the DBD::Pg::PG_BYTEA might not have been recognized without the use clause, but the driver itself understands pretty much of the underlying datatypes - I don't need to

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread SCassidy

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread Rafal Pietrak
On Tue, 2006-05-30 at 09:05 -0700, [EMAIL PROTECTED] wrote: Did you try escaping the data: my $rc=$sth->bind_param(1, escape_bytea($imgdata), { pg_type => DBD::Pg::PG_BYTEA }); No. But: $ ./test Undefined subroutine main::escape_bytea called at ./test line 34. Where can I find one?

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread SCassidy

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread Daniel Verite
Rafal Pietrak wrote: On Mon, 2006-05-29 at 14:01 +0200, Martijn van Oosterhout wrote: How come the bytearea is *interpreted* as having encoding? Actually, it's not the bytea type that is being interpreted, it's the string you're sending to the server that is. Before you

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread Rafal Pietrak
On Tue, 2006-05-30 at 20:12 +0200, Daniel Verite wrote: Rafal Pietrak wrote: Hmmm, despite initial euphoria, this doesn't actually work. Just an idea: make sure DBD::Pg::PG_BYTEA is defined. If not, you're just lacking a use DBD::Pg; and the result :) This time it's a hit. Thenx!

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-30 Thread Martijn van Oosterhout
On Tue, May 30, 2006 at 10:26:31PM +0200, Rafal Pietrak wrote: Now, this is probably not exactly the forum to discuss that, but: 1. I did quite a few scripts with DBI, not only for PostgreSQL in fact - scripts worked flawlessly between Oracle/Sybase and the old DBASE files, too. And I have

[GENERAL] UTF-8 context of BYTEA datatype??

2006-05-29 Thread Rafal Pietrak
Hi! Within a UTF-8 encoded database, I have a table: CREATE TABLE pics (id serial not null unique, img bytea); The table is originally initialized with a set of IDs. Then I'm using perl-script to insert appropriate images by means of UPDATEing rows: --within my script called

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-29 Thread Peter Eisentraut
On Monday, 29 May 2006 13:35, Rafal Pietrak wrote: How come the bytearea is *interpreted* as having encoding? If you pass data in text mode, all data is subject to encoding handling. If you don't want that, you need to use the binary mode. Or to put it the other way around: What column

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-29 Thread Martijn van Oosterhout
On Mon, May 29, 2006 at 01:35:58PM +0200, Rafal Pietrak wrote: The table is originally initialized with a set of IDs. Then I'm using perl-script to insert appropriate images by means of UPDATEing rows: --within my script called 'job'--- my $db =

Re: [GENERAL] UTF-8 context of BYTEA datatype??

2006-05-29 Thread Rafal Pietrak
On Mon, 2006-05-29 at 14:01 +0200, Martijn van Oosterhout wrote: How come the bytearea is *interpreted* as having encoding? Actually, it's not the bytea type that is being interpreted, it's the string you're sending to the server that is. Before you send bytea data in a query string, you

[GENERAL] utf-8 and cultural sensitive sorting

2005-07-12 Thread sknipe
Our product will be storing its character data in utf-8 format (unicode encoding). What is the best way to achieve cultural sensitive sorting using the utf-8 data? Is it possible to have the locale apply to a connection? If so, is the cultural sorting support mature in PostgreSQL? What type

Re: [GENERAL] utf-8 and cultural sensitive sorting

2005-07-12 Thread Richard Huxton
[EMAIL PROTECTED] wrote: Our product will be storing its character data in utf-8 format (unicode encoding). What is the best way to achieve cultural sensitive sorting using the utf-8 data? See below. Is it possible to have the locale apply to a connection? A locale applies to a whole

Re: [GENERAL] utf-8 and cultural sensitive sorting

2005-07-12 Thread Alex Stapleton
It depends what language you want to sort. Lots of languages do not have a sort alphabet. For example, Japanese. It can be quite difficult to sort unusual languages like this. I am not aware of any standard technique for sorting Japanese text other than keeping an arbitrarily sorted

Re: [GENERAL] utf-8 and cultural sensitive sorting

2005-07-12 Thread Tatsuo Ishii
It depends what language you want to sort. Lots of languages do not have a sort alphabet. For example, Japanese. It can be quite difficult to sort unusual languages like this. I am not aware of any standard technique for sorting Japanese text other than keeping an arbitrarily sorted

[GENERAL] UTF-8 and LC_CTYPE locale

2005-05-16 Thread Stefan Hans
Hi *, we are using PostgreSQL for data in different languages like English, German and French. The encoding and locale parameters on our OS (UTF-8 and en_US.UTF-8) had problems e.g. with German umlauts. After some tries we found encoding and locale parameters (LATIN1 and de_DE.iso88591) which

[GENERAL] UTF-8 and =, LIKE problems

2004-11-03 Thread Edmund Lian
I am running a web-based accounting package (SQL-Ledger) that supports multiple languages on PostgreSQL. When a database encoding is set to Unicode, multilingual operation is possible. However, when a user's input language is set to say English, and the user enters data such as 79, the data

Re: [GENERAL] UTF-8 and =, LIKE problems

2004-11-03 Thread Michael Glaesemann
On Nov 4, 2004, at 1:24 PM, Edmund Lian wrote: I am running a web-based accounting package (SQL-Ledger) that supports multiple languages on PostgreSQL. When a database encoding is set to Unicode, multilingual operation is possible. snip / Semantically, one might expect U+FF17 U+FF19 to be
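
A minimal Python sketch of the mismatch under discussion, assuming the stored value was the full-width digits U+FF17 U+FF19. They are different code points from ASCII "79", so plain equality fails, while Unicode compatibility normalization (NFKC) folds them to ASCII. Normalizing on input is one possible approach shown for illustration; the thread snippet does not prescribe it, and `fold_width` is a hypothetical helper name.

```python
import unicodedata

fullwidth = "\uff17\uff19"   # FULLWIDTH DIGIT SEVEN, FULLWIDTH DIGIT NINE
ascii_digits = "79"

def fold_width(text: str) -> str:
    """Apply NFKC normalization, which maps full-width forms to ASCII."""
    return unicodedata.normalize("NFKC", text)

same_before = fullwidth == ascii_digits             # compared as stored
same_after = fold_width(fullwidth) == ascii_digits  # compared after folding
```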

Re: [GENERAL] UTF-8 - ISO8859-1 conversion problem

2004-10-30 Thread Cott Lang
Thanks for the detailed reply, you've confirmed what I suspected. :) I guess I have some work to do! On Fri, 2004-10-29 at 10:19, J. Michael Crawford wrote: In my experience, there are just some characters that don't want to be converted, even if they appear to be part of the normal 8-bit

[GENERAL] UTF-8 - ISO8859-1 conversion problem

2004-10-29 Thread Cott Lang
ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 Running 7.4.5, I frequently get this error, and ONLY on this particular character despite seeing quite a bit of 8 bit. I don't really follow why it can't be converted, it's the same character (239) in both character sets. Databases are

Re: [GENERAL] UTF-8 - ISO8859-1 conversion problem

2004-10-29 Thread J. Michael Crawford
In my experience, there are just some characters that don't want to be converted, even if they appear to be part of the normal 8-bit character system. We went to Unicode databases to hold our Latin1 characters because of this. There was even a case where the client was cutting and pasting

Re: [GENERAL] UTF-8 - ISO8859-1 conversion problem

2004-10-29 Thread J. Michael Crawford
Correction: Four things that need to be done, THREE if you're not serving up html. Sorry for the editing error. - Mike At 01:19 PM 10/29/2004, J. Michael Crawford wrote: In my experience, there are just some characters that don't want to be converted, even if they appear to be part

Re: [GENERAL] UTF-8 - ISO8859-1 conversion problem

2004-10-29 Thread Ian Pilcher
Cott Lang wrote: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 Running 7.4.5, I frequently get this error, and ONLY on this particular character despite seeing quite a bit of 8 bit. I don't really follow why it can't be converted, it's the same character (239) in both character
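
The question assumes that character 239 is "the same" in both character sets. That holds for the code point but not for the bytes, as the Python sketch below shows. It does not diagnose the error in this thread; it only illustrates the distinction, and `is_valid_utf8` is a hypothetical helper.

```python
code_point = 0xEF                        # 239, LATIN SMALL LETTER I WITH DIAERESIS
ch = chr(code_point)

latin1_bytes = ch.encode("iso-8859-1")   # one byte, equal to the code point
utf8_bytes = ch.encode("utf-8")          # two bytes

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone 0xEF byte is the start of a three-byte UTF-8 sequence, so on its
# own it is not valid UTF-8 even though it is a complete Latin-1 character.
lone_byte_ok = is_valid_utf8(latin1_bytes)
```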

Re: [GENERAL] UTF-8 question.

2004-09-17 Thread Pierre-Frédéric Caillaud
= show client_encoding ; client_encoding - UNICODE (1 ligne) = select char_length('a'), bit_length('a'); char_length | bit_length -+ 1 | 8 (1 ligne) # that's an accented e = select char_length('é'), bit_length('é'); ; char_length |

[GENERAL] UTF-8 question.

2004-09-16 Thread Richard Connamacher
I'm new to PostgreSQL, and from the looks of it, it's a great database, and I'll be using more of it in the future. I had a quick question if anyone could clear this up. The documentation for PostgreSQL (version 7.1, the version this server is using) says that it supports multibyte character

Re: [GENERAL] UTF-8 question.

2004-09-16 Thread Dan Sugalski
At 8:39 PM -0400 9/16/04, Richard Connamacher wrote: I'm new to PostgreSQL, and from the looks of it, it's a great database, and I'll be using more of it in the future. I had a quick question if anyone could clear this up. The documentation for PostgreSQL (version 7.1, the version this server is

Re: [GENERAL] UTF-8 question.

2004-09-16 Thread Michael Glaesemann
On Sep 17, 2004, at 9:39 AM, Richard Connamacher wrote: UTF-8 is the 8-bit version of Unicode. The multibyte version of Unicode is UTF-16. UTF-8 encodes characters with varying numbers of bytes, not just 1 byte per character. IIRC, it's anywhere from 1 to 5 bytes, actually. PostgreSQL uses
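
The variable-width point can be checked directly with a short Python sketch. One note on the byte count recalled above: under the current UTF-8 definition (RFC 3629) a code point takes 1 to 4 bytes; longer forms existed only in the original design. The sample characters are arbitrary choices for illustration.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))

samples = [
    "a",           # U+0061, ASCII letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
    "\U0010FFFF",  # highest Unicode code point
]
lengths = [utf8_len(ch) for ch in samples]
```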

Re: [GENERAL] UTF-8 question.

2004-09-16 Thread Richard Connamacher
Thanks to both Dan Sugalski and Michael Glaesemann for answering my question. I probably should have realized that, while Latin letters are one byte, the fact that others are encoded into up to 5-byte groups qualifies it as a multi-byte encoding. I don't anticipate having very many non-latin

Re: [GENERAL] UTF-8 question.

2004-09-16 Thread Tom Lane
Richard Connamacher [EMAIL PROTECTED] writes: 7.1 may be prehistoric, but it's running on an off-site server that I'm renting, and this version came pre-installed. Since it's already there and working, I'd like to get familiar with it before I try to reinstall a newer version. I doubt I'd know
