On Jun 8, 2009, at 2:23 PM, Klaus Berkling wrote:
Hi all. This seems like it should work, but it doesn't.

I truncate a string that may contain Japanese characters, purely for display purposes. Double-byte or multi-byte characters are split apart.

Results look like this:

お使いのコンピュータにDVDドライブが搭載れている かは�?[...]

Here is the code:

    public String stringWithNoHTML(String aStringWithHTML, int lengthTruncated) {
        String returnValue = null;

        if (aStringWithHTML != null && aStringWithHTML.length() > 0) {
            //StringBuffer textBlock = new StringBuffer(aStringWithHTML);
            StringBuffer textBlock = new StringBuffer();
            Pattern htmlTagPattern = Pattern.compile("<(.|\n|\r)+?>|&[a-zA-Z0-9]+;");
            Matcher lineBreakMatcher = htmlTagPattern.matcher(aStringWithHTML);
            boolean results = lineBreakMatcher.find();
            while (results) {
                lineBreakMatcher.appendReplacement(textBlock, " ");
                results = lineBreakMatcher.find();
            }
            lineBreakMatcher.appendTail(textBlock);

            if (lengthTruncated > 0 && textBlock.length() > SUMMARY_LENGTH) {
                try {
                    returnValue = new String(textBlock.toString().getBytes("UTF-8"), 0, lengthTruncated, "UTF-8");
                } catch (UnsupportedEncodingException ex) {
                    returnValue = null;
                }
                //returnValue = new String(textBlock.substring(0, lengthTruncated) + "...");
            } else
                returnValue = textBlock.toString();
        }
        return returnValue;
    }

The original string may contain single-byte characters as well. I expect the string to be truncated properly, without chopping off bytes of the characters. It works fine with single-byte characters.

Using

    returnValue = new String(textBlock.toString().getBytes("UTF-8"), 0, lengthTruncated, "UTF-8");

or

    returnValue = new String(textBlock.substring(0, lengthTruncated) + "...");

makes no difference. I also bypassed the regex pattern and still see the same problem. Files, components, class, etc. are in UTF-8.
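One possible contributor to the byte-based variant splitting characters: the third argument of new String(byte[], int, int, String) counts bytes, not characters, and each of these Japanese characters takes three bytes in UTF-8. The sketch below only illustrates that difference; it is not taken from the original post. It assumes Java 7 or later (for StandardCharsets), and ByteSliceDemo and firstBytes are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;

class ByteSliceDemo {
    // Hypothetical helper: decode only the first byteCount bytes of the
    // UTF-8 encoding of text, the way the byte-based truncation does.
    static String firstBytes(String text, int byteCount) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(byteCount, utf8.length);
        return new String(utf8, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "DVD" followed by four katakana characters (written as escapes
        // so the example does not depend on the source file encoding).
        String text = "DVD\u30C9\u30E9\u30A4\u30D6";

        System.out.println(text.length());                                // 7 chars
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 15 bytes

        // Four bytes is "DVD" plus one third of the first katakana; the
        // incomplete sequence is decoded as the replacement character U+FFFD.
        System.out.println(firstBytes(text, 4));

        // Slicing by char index keeps the katakana character whole.
        System.out.println(text.substring(0, 4));
    }
}
```

If the byte count happens to land on a character boundary (6 bytes in this example), the output looks correct, which can make the problem appear intermittent.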
(For the archive)

After a chat with the Java people at WWDC, this code seems to truncate properly:

    int correctLengthTrucated = lengthTruncated;
    while (correctLengthTrucated > 0)
        //if ( Character.isWhitespace(textBlock.charAt(correctLengthTrucated)) )
        if ( Character.isLetter(textBlock.charAt(correctLengthTrucated)) )
            break;
        else
            correctLengthTrucated--;
    returnValue = new String(textBlock.substring(0, correctLengthTrucated) + "...");

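For readers of the archive, here is an alternative sketch that truncates by Unicode code points instead of scanning backwards for a letter. It is not the method discussed at WWDC; SafeTruncate and truncate are hypothetical names, and the approach assumes that keeping whole code points is sufficient for display. It relies only on String.codePointCount and String.offsetByCodePoints, both available since Java 5.

```java
class SafeTruncate {
    // Hypothetical helper (not from the original post): keep at most
    // maxCodePoints Unicode code points and append "..." when text is cut.
    static String truncate(String text, int maxCodePoints) {
        if (text == null || maxCodePoints <= 0) {
            return text; // nothing to truncate, mirrors the lengthTruncated > 0 check
        }
        if (text.codePointCount(0, text.length()) <= maxCodePoints) {
            return text; // already short enough
        }
        // Char index just past the first maxCodePoints code points;
        // this never lands between the two halves of a surrogate pair.
        int end = text.offsetByCodePoints(0, maxCodePoints);
        return text.substring(0, end) + "...";
    }

    public static void main(String[] args) {
        // "DVD" plus four katakana characters, written as escapes.
        String text = "DVD\u30C9\u30E9\u30A4\u30D6";
        System.out.println(truncate(text, 4));

        // 'a', one supplementary character (a surrogate pair), 'b'.
        String pair = "a\uD83D\uDE00b";
        System.out.println(truncate(pair, 2).length()); // 6: 'a' + 2 chars + "..."
    }
}
```

This keeps surrogate pairs intact but does not treat combining sequences as a unit; java.text.BreakIterator.getCharacterInstance() may be a better fit if that matters.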
Thanks to all who helped.

kib

"Success is not final, failure is not fatal: it is the courage to continue that counts."
Winston Churchill

Klaus Berkling
Systems Administrator
DynEd International, Inc.
www.dyned.com | www.eskimo.com/~kiberkli
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com
This email sent to [email protected]
