Hello Cazen,

This looks like an unintended side effect of closing the FileSystem object.  
Hadoop internally caches instances of the FileSystem class, and the same 
instance can be returned to multiple call sites.  Even after one call site 
closes it, other call sites may still hold a reference to that same FileSystem 
instance.  Closing the FileSystem instance makes it unusable for all of them.

HdfsAdmin#getInotifyEventStream is likely using the same FileSystem instance 
that your own FileSystem.get call returns.  By closing it (using 
try-with-resources), that FileSystem instance is made invalid for the 
subsequent calls to retrieve inotify events.

The FileSystem cache is a fairly common source of confusion.  However, its 
current behavior is considered by design.  For reasons of backward 
compatibility, we can't easily change its behavior to help with confusing 
situations like this.  (Sorry!)

A few suggestions to try:

1. Just don't close the FileSystem.  Even if you don't close it explicitly, it 
will be closed at process teardown via a shutdown hook.  This definitely looks 
wrong from a resource management perspective, but a lot of applications work 
this way.

2. Call FileSystem#newInstance instead of FileSystem#get.  The newInstance 
method is guaranteed to return an instance unique to that call site, not a 
shared instance potentially in use by other call sites.  If you use 
newInstance, then you must guarantee it gets closed to avoid a leak with a 
long-term impact.

3. You can disable the FileSystem cache for specific file system types by 
editing core-site.xml and setting property 
fs.<file system type>.impl.disable.cache to true, e.g. 
fs.hdfs.impl.disable.cache.  In general, disabling the cache is not desirable, 
because the performance benefits of the cache are noticeable.  Sometimes this 
is a helpful workaround for specific applications though.
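
To make the caching behavior and the effect of suggestion 2 concrete, here is 
a minimal, self-contained Java sketch.  It is a toy model and not Hadoop's 
actual implementation: FsCacheSketch, ToyFileSystem, holderSurvives and the 
toy:// URIs are all hypothetical names invented for illustration.  It only 
mimics the idea that a cached get hands out a shared instance while 
newInstance hands out a private one.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not Hadoop's actual implementation.
class FsCacheSketch {

    /** Stand-in for a file system client that is unusable once closed. */
    static final class ToyFileSystem implements Closeable {
        private boolean closed;

        String read(String path) throws IOException {
            if (closed) {
                throw new IOException("The client is stopped");
            }
            return "contents of " + path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** One shared instance per URI, handed to every caller of get(). */
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    /** Cached lookup: callers asking for the same URI receive the same object. */
    static synchronized ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, key -> new ToyFileSystem());
    }

    /** Uncached lookup: each call returns a private instance the caller must close. */
    static ToyFileSystem newInstance(String uri) {
        return new ToyFileSystem();
    }

    /**
     * Returns true if a long-lived holder of the cached instance can still
     * read after a short-lived caller has closed its own handle.
     */
    static boolean holderSurvives(String uri, boolean callerUsesNewInstance) {
        ToyFileSystem longLived = get(uri);
        try (ToyFileSystem shortLived =
                 callerUsesNewInstance ? newInstance(uri) : get(uri)) {
            shortLived.read("/cazen/test2.txt");
        } catch (IOException e) {
            return false;
        }
        try {
            longLived.read("/some/other/path");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("caller uses get:         holder survives = "
            + holderSurvives("toy://cluster-a", false));
        System.out.println("caller uses newInstance: holder survives = "
            + holderSurvives("toy://cluster-b", true));
    }
}
```

In this toy model, the long-lived holder plays the role of the inotify event 
stream and the short-lived caller plays the role of the file-reading code in 
the RENAME case.  In a real application, the corresponding change would be to 
obtain the short-lived handle from FileSystem#newInstance rather than 
FileSystem#get.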

--Chris Nauroth

From: Cazen Lee <cazen....@gmail.com>
Date: Thursday, April 28, 2016 at 5:53 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: [HDFS-inotify] "IOException: The client is stopped" after reading file


Good day, this is Cazen.
Could I kindly ask about a strange situation when reading a file in HDFS 
with inotify polling?

- Env : MacOS, EMR, Linux(standalone) - same problem
- Version : Hadoop 2.7.2

1. I would like to write code that reads a file under a particular location 
when it is created (using inotify).
    So I modified the sample code from "hdfs-inotify-example" on GitHub:
https://github.com/onefoursix/hdfs-inotify-example/blob/master/src/main/java/com/onefoursix/HdfsINotifyExample.java

2. I've changed the code to read the file and print its lines to the console 
when it is renamed:
https://github.com/onefoursix/hdfs-inotify-example/commit/82485881c5da85a46dd1741c2d8420c7c4e81f93

case RENAME:
    Event.RenameEvent renameEvent = (Event.RenameEvent) event;
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", defaultFS);
    System.out.println(renameEvent.getDstPath() + " " + inputPath.getPath());
    if (renameEvent.getDstPath().startsWith(inputPath.getPath())) {
        //Try to read file
        try (FileSystem fs = FileSystem.get(conf)) {
            Path filePath = new Path(defaultFS + renameEvent.getDstPath());
            BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(filePath)));
            String line;
            line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
            br.close();
        }
    }

3. It works, but I encounter an IOException in the next eventStream.take() 
after the file read. It doesn't happen if I do not read the file on HDFS.
-------------CODE-------------
DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
EventBatch batch = eventStream.take();

-------------LOG-------------
Cazens-MacBook-Pro:hdfs-inotify-example Cazen$ java -jar target/hdfs-inotify-example-uber.jar hdfs://localhost:8032/cazen/
lastReadTxid = 0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
TxId = 3134
event type = CREATE
  path = /cazen/test2.txt._COPYING_
  owner = Cazen
  ctime = 1461850245559
TxId = 3138
event type = CLOSE
TxId = 3139
event type = RENAME
/cazen/test2.txt /cazen/
--------------------File Start
Input File Text Sample LOL
--------------------File END
Exception in thread "main" java.io.IOException: The client is stopped
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1507)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.getEditsFromTxid(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1511)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getEditsFromTxid(Unknown Source)
at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:111)
at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.take(DFSInotifyEventInputStream.java:224)
at com.onefoursix.HdfsINotifyExample.main(HdfsINotifyExample.java:40)

It is possible that I have written the code incorrectly. If anyone already 
knows about this situation, could I ask the reason?
Any advice would be appreciated.
Thank you. Have a good day :)

--
cazen....@gmail.com
cazen....@samsung.com
http://www.cazen.co.kr
