>>> Asmus Freytag wrote
> TCHAR x = _T('x');
> TCHAR * x = _T("x");
>
> that is to wrap a string or character literal so that it can be used either
> as Unicode literal or as non-Unicode literal, depending on whether some
> global compile time flag (usually UNICODE or _UNICODE) is set or not.
>
> The usual way a _T macro is defined is something like:
>
> #ifdef UNICODE
> #define _T(x) L##x
> #else
> #define _T(x) x
> #endif
>
> That definition relies on the compiler to support L'x' or L"string" by
> using UTF-16.


This is what I am actually looking for. My ODBC application supports UTF-16,
i.e. 2-byte-wide characters, and it is completely oriented around the _T(x)
macro, as Asmus Freytag described.
I cannot get away with this compiler-dependent _T( ) macro, for historical
reasons and because of restrictions imposed by the underlying layers used by
the ODBC application (they always expect unsigned short text). So in my case
I need _T('x') and _T("x") to always be 16 bits wide. The most worrying part
is that the macro definition of _T(x) depends on the compiler supporting
L'x' or L"x", while I need my application to be portable to Unix platforms,
which can have varying behavior.
To get similar behavior from _T( ) for both character and string literals, I
tried something like this.


Case-1) Character-constant cases can be addressed by a macro:
#define __convert_to_integer(x) x
 
Case-2) String cases can be addressed by a function:
TCHAR *__copy_to_unicode( char *src )
{
    int dest_len = strlen(src);
    TCHAR *dest = malloc((dest_len + 1)*sizeof(TCHAR)); /* +1 for terminator */
    TCHAR *ptr = dest;
    while( *src && dest_len > 0 ) {
        *ptr++ = (TCHAR)(unsigned char)*src++;
        dest_len--;
    }
    *ptr = '\0';
    return dest;
}
Using these two, we then need to formulate a single macro for _T( ). I could
not find a good way to do this.

 
Sample C program:
#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strlen, strncpy, strcat */
typedef unsigned short TCHAR;

/* Constant cases can be addressed by following macro */
#define __convert_to_integer(x) x

/* String cases can be addressed by the following function */
TCHAR *__copy_to_unicode( char *src )
{
    int dest_len = strlen(src);
    TCHAR *dest = malloc((dest_len + 1)*sizeof(TCHAR)); /* +1 for terminator */
    TCHAR *ptr = dest;
    while( *src && dest_len > 0 ) {
        *ptr++ = (TCHAR)(unsigned char)*src++;
        dest_len--;
    }
    *ptr = '\0';
    return dest;
}
 
int main()
{
    typedef struct st_AlternateCol
    {
        TCHAR        *pszName;
        signed short  sType;
        int           ulLen;
        signed short  fNullable;
    } _ALT_COL;

    char *string = "SELECT * FROM DUAL";
    char src = 's';
    TCHAR *pSql;
    TCHAR ch;
    TCHAR *concatstr = malloc((strlen(string)*2)*sizeof(TCHAR));
    TCHAR *ternary;
      
    // Variable initialization
    pSql = __copy_to_unicode(string);
    ch = __convert_to_integer(src);

    // Conditional check
    if (*pSql == __convert_to_integer('S'))
        printf("string starts with letter S\n");
    else
        printf("string does not start with letter S\n");

    // As constant
    switch (*pSql) {
    case __convert_to_integer('S'):
        printf("matched with S\n");
        break;
    default:
        printf("did not match\n");
    }
 
    // Arguments to a function
    /* For Unicode strings we can't use the string.h functions; instead use
       functions from odbc.h such as M_FSTRCPYU. The part below is added only
       to check that the function works; the output will not be a Unicode
       string. */
    strncpy((char *)concatstr, (char *)__copy_to_unicode(string), strlen(string));
    strcat((char *)concatstr, (char *)pSql);
    printf("Concatenated string is: %s\n", (char *)concatstr);
 
    // Structure member initialization
    _ALT_COL AltDescCol[] =
    {
        { __copy_to_unicode("TABLE_CAT"),   12, 31L, 1 },
        { __copy_to_unicode("TABLE_SCHEM"), 12, 30L, 1 },
    };
 
              

    // Argument to the ternary operator
    ternary = (1 > 2) ? __copy_to_unicode("Greater")
                      : __copy_to_unicode("Less");
    printf("Ternary testing: %s\n", (char *)ternary);

    return 0;
}
I don't know if I have to do something like this in order to have
_T("x")/_T('x') represent UTF-16 characters, i.e. 2 bytes wide, but doing it
the above way is not practical.
I wanted to know if there is a better way to define _T('x')/_T("xyz") that is
compiler-independent and always yields 2-byte (UTF-16) wide characters.

Thanks in advance.
Sowmya.




________________________________
From: Asmus Freytag <[email protected]>
To: "Phillips, Addison" <[email protected]>
Cc: Doug Ewell <[email protected]>; sowmya satyanarayana 
<[email protected]>; [email protected]
Sent: Tue, 23 November, 2010 12:38:37  AM
Subject: Re: UNICODE version of _T(x) macro

On 11/22/2010 10:18 AM, Phillips, Addison wrote:
>> sowmya satyanarayana <sowmya underscore satyanarayana at yahoo dot com>
>> wrote:
>>
>>> Taking this, what is the best way to define a UNICODE version of the
>>> _T(x) macro, so that my strings will always be 2-byte wide characters?
>> Unicode characters aren't always 2 bytes wide.  Characters with values
>> of U+10000 and greater take two UTF-16 code units, and are thus 4 bytes
>> wide in UTF-16.
>> 
> Not exactly. The code units for UTF-16 are always 16 bits wide.
> Supplementary characters (those with code points >= U+10000) use a
> surrogate pair, which is two 16-bit code units. Most processing and string
> traversal is in terms of the 16-bit code units, with a special case for
> the surrogate pairs.
>
> It is very useful when discussing Unicode character encoding forms to
> distinguish between characters ("code points") and their in-memory
> representation ("code units"), rather than using non-specific terminology
> such as "character".
>
> If you want to use UTF-32, which uses 32-bit code units, one per code
> point, you can use a 32-bit data type instead. Those are always 4 bytes
> wide.

The question is relevant to the C and C++ languages.

What is being asked: which native data type do I use to make sure I end up
with 16-bit code units?

The usual way a _T macro is used is

TCHAR x = _T('x');
TCHAR * x = _T("x");

that is to wrap a string or character literal so that it can be used either as
Unicode literal or as non-Unicode literal, depending on whether some global
compile time flag (usually UNICODE or _UNICODE) is set or not.

The usual way a _T macro is defined is something like:

#ifdef UNICODE
#define _T(x) L##x
#else
#define _T(x) x
#endif

That definition relies on the compiler to support L'x' or L"string" by using
UTF-16.

A few years ago, there was a proposal to amend the C standard to have a way to 
ensure that this is the case in a cross platform way. I can't recall offhand 
what became of it.

A./
