I just found a comment saying that Windows UNICODE is UCS-2. What do you
think about the following Windows-specific code to convert the decoded
ANSI input to UTF-8:

  // Convert file from ANSI to Windows UNICODE (AKA UCS-2)
  MultiByteToWideChar(CP_ACP,0,....);

  // now convert from Windows UNICODE (AKA UCS-2) to UTF-8
  WideCharToMultiByte(CP_UTF8,0,....);


On Linux we could use iconv or something similar.
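
As a rough sketch of what the iconv route could look like (not code from ssphys): the helper name ToUTF8 and its fromCode parameter are made up for illustration, and the encoding names that iconv accepts depend on the platform's iconv implementation.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from the encoding named `fromCode`
// (for example "CP855", if the local iconv knows it) to UTF-8.
std::string ToUTF8 (const std::string& input, const char* fromCode)
{
  iconv_t cd = iconv_open ("UTF-8", fromCode);
  if (cd == (iconv_t) -1)
    throw std::runtime_error ("iconv_open failed");

  std::string output;
  char buffer[256];
  // iconv takes a non-const pointer but does not modify the input bytes
  char* pIn = const_cast<char*> (input.data ());
  size_t nInLeft = input.size ();

  while (nInLeft > 0)
  {
    char* pOut = buffer;
    size_t nOutLeft = sizeof (buffer);
    size_t rc = iconv (cd, &pIn, &nInLeft, &pOut, &nOutLeft);

    // keep whatever was converted in this round
    output.append (buffer, sizeof (buffer) - nOutLeft);

    // E2BIG only means the chunk buffer is full: loop again.
    // Anything else (invalid or truncated input) is a real error.
    if (rc == (size_t) -1 && errno != E2BIG)
    {
      iconv_close (cd);
      throw std::runtime_error ("iconv failed");
    }
  }

  iconv_close (cd);
  return output;
}
```

Converting in fixed-size chunks avoids having to guess the output size up front, since one input byte can become several UTF-8 bytes.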

Where are you proposing to put this? In the Perl code?

Nope, into ssphys. I just experimented with this on some Russian text:

Take the following function:

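#include <windows.h>
#include <cstring>
#include <string>
#include <vector>

std::string ACPToUTF8 (const char* pBuffer, int nLength = -1)
{
  // Convert from ACP to UTF-8 via the Windows UNICODE (AKA UCS-2)

  // With nLength == -1 the API would also count and convert the
  // terminating NUL, which would then end up inside the std::string.
  if (nLength < 0)
    nLength = (int) strlen (pBuffer);
  if (nLength == 0)
    return std::string ();

  int nWideLength = MultiByteToWideChar (CP_ACP, 0, pBuffer, nLength, NULL, 0);
  if (nWideLength <= 0)
    return std::string ();   // conversion failed
  std::vector<wchar_t> wideBuffer (nWideLength);
  MultiByteToWideChar (CP_ACP, 0, pBuffer, nLength, &wideBuffer[0], nWideLength);

  int nUtf8Length = WideCharToMultiByte (CP_UTF8, 0, &wideBuffer[0], nWideLength, NULL, 0, NULL, NULL);
  if (nUtf8Length <= 0)
    return std::string ();   // conversion failed
  std::vector<char> utf8Buffer (nUtf8Length);
  WideCharToMultiByte (CP_UTF8, 0, &wideBuffer[0], nWideLength, &utf8Buffer[0], nUtf8Length, NULL, NULL);

  // the vectors release their memory automatically, even on an exception
  return std::string (&utf8Buffer[0], nUtf8Length);
}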

I have Cyrillic text in code page 855. If I set the code page to 855 before calling this function, my text is converted to UTF-8. I have loaded the resulting XML file in a UTF-8-aware editor and the results looked good.
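
Checking by eye in an editor could be backed up by a small programmatic check that the output is at least well-formed UTF-8. The following is only a sketch; IsValidUTF8 is a made-up helper name, not something that exists in ssphys. It cannot tell whether the right source code page was used, only that the byte sequences are legal UTF-8.

```cpp
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogates and values above U+10FFFF.
bool IsValidUTF8 (const std::string& s)
{
  // smallest code point that legitimately needs 1, 2, 3 or 4 bytes
  static const unsigned long kMin[] = { 0x0, 0x80, 0x800, 0x10000 };

  size_t i = 0;
  while (i < s.size ())
  {
    unsigned char c = static_cast<unsigned char> (s[i]);
    size_t nExtra;        // number of continuation bytes expected
    unsigned long cp;     // code point being assembled

    if (c < 0x80)                { nExtra = 0; cp = c; }
    else if ((c & 0xE0) == 0xC0) { nExtra = 1; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { nExtra = 2; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { nExtra = 3; cp = c & 0x07; }
    else return false;    // stray continuation byte or invalid lead byte

    if (nExtra > s.size () - i - 1)
      return false;       // sequence is cut off at the end of the string

    for (size_t k = 1; k <= nExtra; ++k)
    {
      unsigned char cc = static_cast<unsigned char> (s[i + k]);
      if ((cc & 0xC0) != 0x80)
        return false;     // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }

    if (cp < kMin[nExtra])
      return false;       // overlong encoding
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;       // out of range or surrogate

    i += nExtra + 1;
  }
  return true;
}
```

Running the converted text through such a check before writing the XML would catch input that was never converted, for example raw code page bytes above 0x7F.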

dirk
_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user
