Here's some code to try in ResourcesMojo that attempts to tell binary files
from text files without looking at extensions...

Have a play, let me know if it is useful.

Regards,
John


    private void copyFile(File from, final File to, boolean filtering)
            throws IOException {
        FileUtils.FilterWrapper[] wrappers = null;
        if (filtering && isTextFile(from)) {

...



    /**
     * Attempt to determine if a file is text or binary by examining it for
     * a BOM or unprintable ASCII characters.
     * 
     * @param file a file to test
     * @return true if the file is a text file
     */
    private final boolean isTextFile(File file) {
        FileInputStream fis = null;
        try {
            fis = new FileInputStream(file);
            byte[] bom = new byte[4];
            int read = fis.read(bom);
            /* Try to detect these BOM formats, which all indicate a text file:
             * 
             * 00 00 FE FF UTF-32, big-endian 
             * FF FE 00 00 UTF-32, little-endian 
             * EF BB BF    UTF-8 
             * FE FF       UTF-16, big-endian 
             * FF FE       UTF-16, little-endian 
             */
            if (read == 4) {
                // Java bytes are signed, so constants above 0x7F need a (byte) cast
                if (bom[0] == 0x00 && bom[1] == 0x00
                        && bom[2] == (byte) 0xFE && bom[3] == (byte) 0xFF) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " UTF-32BE encoded");
                    }
                    return true;
                }
                if (bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE
                        && bom[2] == 0x00 && bom[3] == 0x00) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " UTF-32LE encoded");
                    }
                    return true;
                }
            }
            if (read >= 3) {
                if (bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB
                        && bom[2] == (byte) 0xBF) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " UTF-8 encoded");
                    }
                    return true;
                }
            }
            if (read >= 2) {
                if (bom[0] == (byte) 0xFE && bom[1] == (byte) 0xFF) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " UTF-16BE encoded");
                    }
                    return true;
                }
                if (bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " UTF-16LE encoded");
                    }
                    return true;
                }
            }

            /* Check some bytes for unprintable ASCII characters; if any are
             * found, this is probably a binary file.
             */
            for (int b = 0; b < read; b++) {
                if (isNotASCIIChar(bom[b] & 0xFF)) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " binary encoded");
                    }
                    return false;
                }
            }
            int inchar = -1, ccount = 0;
            while ((inchar = fis.read()) != -1 && ccount++ < 1024) {
                if (isNotASCIIChar(inchar)) {
                    if (getLog().isDebugEnabled()) {
                        getLog().debug(file.getAbsolutePath() + " binary encoded");
                    }
                    return false;
                }
            }
        } catch (IOException ex) {
            getLog().debug(ex);
        } finally {
            if (fis != null) {
                try {
                    fis.close();
                } catch (Exception ex) {
                    // ignore
                }
            }
        }
        if (getLog().isDebugEnabled()) {
            getLog().debug(file.getAbsolutePath() + " ASCII encoded");
        }
        return true;
    }
    
    private static final boolean isNotASCIIChar(int inchar) {
        // Control characters count as binary, except TAB, LF, FF, CR and SUB (Ctrl-Z)
        return inchar < 0
                || (inchar <= 0x1F && inchar != 0x09 && inchar != 0x0A
                        && inchar != 0x0C && inchar != 0x0D && inchar != 0x1A)
                || inchar == 0x7F;
    }
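
To try the BOM check outside the plugin, here is a minimal standalone sketch.
The class name BomSniffer and the method detectBom are made up for this
example and are not part of ResourcesMojo. It masks each byte with 0xFF
before comparing, because Java bytes are signed and a raw byte never equals
an int constant above 0x7F. It also tests UTF-32 before UTF-16, since the
UTF-32LE BOM begins with the UTF-16LE one.

```java
public class BomSniffer {

    /**
     * Returns the encoding implied by a leading BOM, or null if none is found.
     * Hypothetical helper for illustration only.
     *
     * @param head the first bytes of the file
     * @param read how many bytes of head are valid
     */
    public static String detectBom(byte[] head, int read) {
        // Mask to 0..255; bytes that were not read become -1 and match nothing.
        int b0 = read > 0 ? head[0] & 0xFF : -1;
        int b1 = read > 1 ? head[1] & 0xFF : -1;
        int b2 = read > 2 ? head[2] & 0xFF : -1;
        int b3 = read > 3 ? head[3] & 0xFF : -1;

        // UTF-32 first: FF FE 00 00 would otherwise be reported as UTF-16LE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h'};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, (byte) 'h', 0x00};
        byte[] plain = {(byte) 'h', (byte) 'i'};
        System.out.println(detectBom(utf8, utf8.length));       // UTF-8
        System.out.println(detectBom(utf16le, utf16le.length)); // UTF-16LE
        System.out.println(detectBom(plain, plain.length));     // null
    }
}
```

Returning the encoding name rather than a boolean is just a convenience for
the debug messages; isTextFile only needs to know whether the result is
non-null.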


