Kaifeng Huang created FLINK-16355:
-------------------------------------

             Summary: Inconsistent library versions notice.
                 Key: FLINK-16355
                 URL: https://issues.apache.org/jira/browse/FLINK-16355
             Project: Flink
          Issue Type: Improvement
            Reporter: Kaifeng Huang
         Attachments: apache flink.pdf

Hi. I have implemented a tool to detect library version inconsistencies. Your 
project has 9 inconsistent libraries and 9 falsely consistent libraries.

Take org.apache.hadoop:hadoop-common for example: this library is declared as 
version 2.4.1 in flink-yarn-tests, as 3.1.0 in flink-filesystems/flink-s3-fs-base, 
as 2.7.5 in flink-table/flink-sql-client, and so on. Such version inconsistencies 
may cause unnecessary maintenance effort in the long run. For example, if two 
modules become inter-dependent, a library version conflict may arise. This has 
already become a common issue that hinders development progress, so a version 
harmonization is necessary.
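
For illustration, the divergence looks roughly like this in the individual 
module POMs (a minimal sketch; the module paths and versions are taken from the 
report above, while the surrounding POM structure is assumed):

{code:xml}
<!-- flink-yarn-tests/pom.xml (sketch) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.4.1</version>
</dependency>

<!-- flink-filesystems/flink-s3-fs-base/pom.xml (sketch) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.1.0</version>
</dependency>

<!-- flink-table/flink-sql-client/pom.xml (sketch) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.5</version>
</dependency>
{code}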

Assuming a version harmonization is applied, I calculated the cost of 
harmonizing to each of the higher versions, including the most up-to-date one. 
The cost refers to POM configuration changes and API invocation changes. Take 
org.apache.hadoop:hadoop-common for example: if we harmonize all the library 
versions to 3.1.3, the concern is how much the project code has to adapt to the 
newer library version.
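
One common way to harmonize is to pin the version once in the parent POM's 
<dependencyManagement> section so that all child modules inherit it (a minimal 
sketch, assuming a standard Maven multi-module layout; the property name 
hadoop.version is illustrative):

{code:xml}
<!-- parent pom.xml (sketch) -->
<properties>
  <!-- illustrative property name -->
  <hadoop.version>3.1.3</hadoop.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

Child modules then declare the dependency without a <version> element and pick 
up 3.1.3 automatically.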

The effort table below quantifies the harmonization cost, broken down by 
module. The columns give the number of library APIs and API calls (NA, NAC), 
deleted APIs and API calls (NDA, NDAC), and modified APIs and API calls 
(NMA, NMAC). Modified APIs are APIs whose call graph differs from that of the 
previous version. Take the first row for example: if the library is upgraded to 
version 3.1.3, then of the 103 APIs used in module 
flink-filesystems/flink-fs-hadoop-shaded, 0 are deleted in the recommended 
version (calling a deleted API throws a NoSuchMethodError at runtime unless the 
project is re-compiled), and 55 are regarded as modified, which could break the 
former API contract.
||Index||Module||NA(NAC)||NDA(NDAC)||NMA(NMAC)||
|1|flink-filesystems/flink-fs-hadoop-shaded|103(223)|0(0)|55(115)|
|2|flink-filesystems/flink-s3-fs-base|2(4)|0(0)|1(1)|
|3|flink-yarn-tests|0(0)|0(0)|0(0)|
|4|..|..|..|..|


We also provide a second table showing the files that may be affected by the 
library API changes, which could help to spot the concerned API usages and 
rerun the relevant test cases. The table is listed below.


||Module||File||Type||API||
|flink-filesystems/flink-s3-fs-base|flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableMultipartUploadFactory.java|modify|org.apache.hadoop.fs.Path.isAbsolute()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getDate()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getBuildVersion()|
|..|..|..|..|



As for false consistency, take log4j:log4j for example. The library is declared 
as version 1.2.17 in all modules, yet the declarations are written differently. 
Since the components evolve in parallel, if one declaration is updated while 
the others are not, the versions become inconsistent and the above-mentioned 
issues arise.
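
As a hypothetical illustration (the report does not show the exact 
declarations), the same 1.2.17 might be hard-coded in one module but resolved 
from a property in another; both resolve identically today, yet they can 
silently diverge when only one is updated:

{code:xml}
<!-- Module A: version hard-coded (illustrative) -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.17</version>
</dependency>

<!-- Module B: version resolved from a property (illustrative),
     assuming <log4j.version>1.2.17</log4j.version> is defined elsewhere -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>${log4j.version}</version>
</dependency>
{code}

Centralizing the version in a single <dependencyManagement> entry, or guarding 
the build with the maven-enforcer-plugin's dependencyConvergence rule, prevents 
this kind of silent divergence.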


If you are interested, you can find a more complete and detailed report in the 
attached PDF file.