DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=41455>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=41455

           Summary: TarEntry.java: getName does not provide for non-ASCII
                    encoded entry names
           Product: Ant
           Version: 1.7.0RC1
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core
        AssignedTo: dev@ant.apache.org
        ReportedBy: [EMAIL PROTECTED]


If a tar file contains entries whose names use a non-ASCII encoding, for
example 'Shift_JIS', then the TarEntry getName method does not return the
entry's filename correctly. The name is converted to ASCII instead. (The zip
classes work fine.)
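
To illustrate the failure mode, here is a minimal, self-contained Java sketch.
It assumes the name is built by casting each header byte to a char, which is an
assumption about the parsing code and is not shown in this report. The class
and method names (TarNameDecodingSketch, castBytesToChars, decode) are
hypothetical and are not part of Ant. The six bytes are the Shift_JIS encoding
of a three-character Japanese name.

```java
import java.nio.charset.Charset;

final class TarNameDecodingSketch {

    // Hypothetical stand-in for a byte-by-byte name parser:
    // every byte up to the first NUL is cast directly to a char.
    static String castBytesToChars(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            if (b == 0) {
                break;
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Charset-aware decoding of the same raw bytes.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Shift_JIS bytes for the name U+65E5 U+672C U+8A9E.
        byte[] raw = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, 0x7B,
                      (byte) 0x8C, (byte) 0xEA};
        String expected = "\u65E5\u672C\u8A9E";

        System.out.println("cast matches:   " + expected.equals(castBytesToChars(raw)));
        System.out.println("decode matches: " + expected.equals(decode(raw, "Shift_JIS")));
    }
}
```

In this sketch only the charset-aware decode recovers the three-character name.
The cast produces one character per byte, so the name cannot round-trip.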

One possible way to remove the limitation, and to not cause any incompatibility,
is to add a new method, getByteName, which returns the name of the entry as a
byte array. 

I have enclosed a patch to TarInputStream.java which adds the ability to set and
return the entry's name as a byte array. This solution is probably incomplete.
Ideally, the TarEntry class should have a method getByteName, but more
modifications are needed in order to do that.
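
As a rough sketch of how a caller might use such a byte array, the following
standalone Java example trims a fixed-size name field at the first NUL byte,
joins a name that arrives in several chunks, and decodes the result with a
charset chosen by the caller. The class ByteNameSketch and its methods are
hypothetical and do not use the Ant classes. The 100-byte field size is an
assumption intended to match TarConstants.NAMELEN.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class ByteNameSketch {

    // Assumed size of the name field in a classic tar header.
    static final int NAMELEN = 100;

    // Copy the bytes of a name field up to (not including) the first NUL.
    static byte[] trimAtNul(byte[] field, int length) {
        int end = 0;
        while (end < length && field[end] != 0) {
            end++;
        }
        return Arrays.copyOf(field, end);
    }

    // Return a new array holding appendTo followed by the first
    // bufLength bytes of buf. A null appendTo is treated as empty.
    static byte[] append(byte[] appendTo, byte[] buf, int bufLength) {
        if (appendTo == null) {
            return Arrays.copyOf(buf, bufLength);
        }
        byte[] result = Arrays.copyOf(appendTo, appendTo.length + bufLength);
        System.arraycopy(buf, 0, result, appendTo.length, bufLength);
        return result;
    }

    // Decode a raw entry name with the charset the archive was written in.
    static String decodeName(byte[] byteName, String charsetName) {
        byte[] trimmed = trimAtNul(byteName, byteName.length);
        return new String(trimmed, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Shift_JIS bytes for a three-character name, placed in a NUL-padded field.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, 0x7B,
                       (byte) 0x8C, (byte) 0xEA};
        byte[] field = new byte[NAMELEN];
        System.arraycopy(sjis, 0, field, 0, sjis.length);

        String name = decodeName(trimAtNul(field, NAMELEN), "Shift_JIS");
        System.out.println("decoded name length: " + name.length());
    }
}
```

The caller has to know or guess the archive's encoding, because the tar header
itself does not record one. That is why the sketch takes the charset name as a
parameter.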

I do not have good reproduction steps, but basically:

tar cvpf testfile.tar 'Shift_JIS filename'
tar xvpf testfile.tar

I do not see a way to attach a patch file, so I include the patch to
TarInputStream.java below:

--- TarInputStream.java.orig    2007-01-24 10:30:04.321949000 -0500
+++ TarInputStream.java 2007-01-24 16:45:38.477672000 -0500
@@ -19,8 +19,34 @@
 /*
  * This package is based on the work done by Timothy Gerard Endres
  * ([EMAIL PROTECTED]) to whom the Ant project is very grateful for his great code.
+ *
+ */
+
+/*
+ * The class is modified from the original to provide the ability
+ * to return the byte name for the entry.  The byte name array can
+ * be used to correctly construct a filename with an appropriate
+ * non-ASCII encoding.  The reason for the modifications is the
+ * TarInputStream class does not return non-ASCII names in the
+ * getName method.
+ *
+ * The method getNextEntry was modified from the original version to add
+ * extraction and setting of the protected variable byteName, the byte
+ * array name for the entry.
+ *
+ * New methods to process the byte name array have been added:
+ *    append        Append a byte buffer to the appendTo byte array.
+ *    parseByteName Parse the byte name from the header.
+ *    setByteName   Set this entry's byte name.
+ *    getByteName   Get this entry's byte name.
+ *
+ * The modifications were written by
+ * Kelly G. Luetkemeyer
+ * The MathWorks, Inc.
+ * 1/24/2007
  */

+
 package org.apache.tools.tar;

 import java.io.FilterInputStream;
@@ -42,6 +68,7 @@
     protected long entrySize;
     protected long entryOffset;
     protected byte[] readBuf;
+    protected byte[] byteName;
     protected TarBuffer buffer;
     protected TarEntry currEntry;

@@ -191,6 +218,8 @@
      * If there are no more entries in the archive, null will
      * be returned to indicate that the end of the archive has
      * been reached.
+     *
+     * The byteName is set from the TarEntry header buffer.
      *
      * @return The next TarEntry in the archive, or null.
      * @throws IOException on error
@@ -254,10 +283,17 @@
             StringBuffer longName = new StringBuffer();
             byte[] buf = new byte[256];
             int length = 0;
+            byte [] tmpname;
+            tmpname = null;
+
             while ((length = read(buf)) >= 0) {
                 longName.append(new String(buf, 0, length));
+                tmpname = append(tmpname, buf, length);
             }
+
             getNextEntry();
+            this.byteName = tmpname;
+
             if (this.currEntry == null) {
                 // Bugzilla: 40334
                 // Malformed tar file - long entry name not followed by entry
@@ -269,8 +305,9 @@
                 longName.deleteCharAt(longName.length() - 1);
             }
             this.currEntry.setName(longName.toString());
+        } else {
+            this.byteName = parseByteName(headerBuf, TarConstants.NAMELEN);
         }
-
         return this.currEntry;
     }

@@ -387,4 +424,66 @@
             out.write(buf, 0, numRead);
         }
     }
+   /**
+    * Append a byte buffer to the appendTo byte array.
+    *
+    * @param appendTo The byte buffer to append data to.
+    * @param buf The byte buffer containing the new data.
+    * @param buflength The number of bytes in buf to copy.
+    * @return The new array with appended data.
+    */
+   public byte [] append(byte[] appendTo, byte[] buf, int buflength) {
+
+      if (appendTo == null) {
+         byte [] results = new byte[buflength];
+         System.arraycopy(buf, 0, results, 0, buflength);
+         return results;
+
+      } else {
+         int length;
+         length = appendTo.length + buflength;
+         byte [] results = new byte[length];
+         System.arraycopy(appendTo, 0, results, 0, appendTo.length);
+         System.arraycopy(buf, 0, results, appendTo.length, buflength);
+         return results;
+      }
+   }
+
+   /**
+    * Parse the byte name from the header.
+    *
+    * @param header The TarEntry header array.
+    * @param length The number of bytes to parse.
+    * @return The byte array name.
+    */
+   public byte [] parseByteName(byte[] header, int length) {
+      int end;
+      for(end = 0; end < length; ++end) {
+         if (header[end] == 0) {
+            break;
+         }
+      }
+      byte [] results = new byte[end];
+      System.arraycopy(header, 0, results, 0, end);
+      return results;
+   }
+
+   /**
+    * Set this entry's byte name.
+    *
+    * @param name This entry's name as a byte array.
+    */
+   public void setByteName(byte [] name) {
+      this.byteName = name;
+   }
+
+   /**
+    * Get this entry's byte name.
+    *
+    * @return This entry's name as a byte array.
+    */
+   public byte[] getByteName() {
+      return this.byteName;
+   }
+
 }
