Reloading configuration when using inputstream resources results in 
org.xml.sax.SAXParseException
-------------------------------------------------------------------------------------------------

                 Key: HADOOP-7614
                 URL: https://issues.apache.org/jira/browse/HADOOP-7614
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.21.0
            Reporter: Ferdy
            Priority: Minor


When using an inputstream as a resource for configuration, reloading this 
configuration will throw the following exception:

Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Premature end of file.
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1576)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1445)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1381)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:569)
...
Caused by: org.xml.sax.SAXParseException: Premature end of file.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1504)
        ... 4 more
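
The "Premature end of file" message is what an XML parser reports when it is handed a stream that has already been read to the end. The sketch below reproduces that effect with the JDK parser alone, without Hadoop. The class name ExhaustedStreamDemo and the helper parses() are made up for illustration; it is only a guess at the mechanism behind the trace above, not code taken from Configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of Hadoop.
final class ExhaustedStreamDemo {

    /** Parses the stream as XML; returns false if the parser throws SAXParseException. */
    static boolean parses(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(in);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "<configuration></configuration>".getBytes(StandardCharsets.UTF_8));
        // The first parse consumes every byte of the stream.
        System.out.println("first parse ok:  " + parses(in));
        // The second parse sees an empty stream and fails.
        System.out.println("second parse ok: " + parses(in));
    }
}
```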

To reproduce, see the following test code:
    Configuration conf = new Configuration();
    ByteArrayInputStream bais = new ByteArrayInputStream("<configuration></configuration>".getBytes());
    conf.addResource(bais);
    // first lookup loads all resources, including the stream
    System.out.println(conf.get("blah"));
    // just add a named resource; it doesn't matter which one
    conf.addResource("core-site.xml");
    // second lookup reloads the resources and throws the exception above
    System.out.println(conf.get("blah"));

Allowing inputstream resources is flexible, but in cases such as this it can 
lead to difficult-to-debug problems.

What do you think is the best solution? We could:
A) reset the inputstream after it is read instead of closing it (but what to do 
when the stream does not support marking?)
B) leave it up to the client (for example, make sure you implement close() so 
that it resets the stream)
C) when reading the inputstream for the first time, cache or wrap the contents 
somehow so that they can be read multiple times (let's at least document it)
D) remove the inputstream method altogether
E) something else?
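
To make option C a bit more concrete, here is a minimal sketch of caching the stream contents on first read so they can be replayed on every reload. CachedStreamResource, readFully() and open() are hypothetical names invented for this example; nothing like this is claimed to exist in Configuration, and the whole resource is assumed to fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: buffers a stream once so it can be replayed.
final class CachedStreamResource {
    private final byte[] bytes;

    /** Reads the stream to the end exactly once and keeps the bytes. */
    CachedStreamResource(InputStream in) throws IOException {
        this.bytes = readFully(in);
    }

    /** Returns a fresh stream over the cached bytes; safe to call repeatedly. */
    InputStream open() {
        return new ByteArrayInputStream(bytes);
    }

    /** Copies everything remaining in the stream into a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        CachedStreamResource res = new CachedStreamResource(new ByteArrayInputStream(
                "<configuration></configuration>".getBytes(StandardCharsets.UTF_8)));
        // Both reads return the full document, because each open() starts from the cache.
        System.out.println(new String(readFully(res.open()), StandardCharsets.UTF_8));
        System.out.println(new String(readFully(res.open()), StandardCharsets.UTF_8));
    }
}
```

The trade-off is memory: the contents stay resident for as long as the configuration object holds the resource.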

For now I have attached a patch for solution A.
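
The attached patch is not reproduced in this message. Independently of it, the following is a rough sketch of what "reset instead of close" could look like using the standard mark()/reset() calls on java.io.InputStream. ResettableRead, prepare() and readAndReset() are hypothetical names, and wrapping a stream that lacks mark support in a BufferedInputStream is only one possible answer to the open question in option A.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop and not the attached patch.
final class ResettableRead {

    /**
     * Returns a stream that supports mark/reset and marks its current position.
     * readLimit is the maximum number of bytes that may be read before reset().
     */
    static InputStream prepare(InputStream in, int readLimit) {
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(readLimit);
        return s;
    }

    /** Reads the stream to the end as UTF-8 text, then rewinds it to the mark. */
    static String readAndReset(InputStream s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = s.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        s.reset(); // throws IOException if more than readLimit bytes were read
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream s = prepare(new ByteArrayInputStream(
                "<configuration></configuration>".getBytes(StandardCharsets.UTF_8)), 1024);
        // Each call sees the full content because the stream is rewound afterwards.
        System.out.println(readAndReset(s));
        System.out.println(readAndReset(s));
    }
}
```

Note that the BufferedInputStream fallback keeps the bytes read since the mark in memory, so for streams without mark support this ends up close to option C, with the extra constraint of a read limit chosen up front.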

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
