I am fairly ignorant of the details of package management in Java (I usually write in Python, but the Beam Python SDK is not at the same level the Java one is). I am troubleshooting an issue specific to the DataflowRunner, and I decided to try upgrading Beam from 2.28.0 to 2.30.0. However, code that ran under 2.28.0 now throws a class-not-found exception when it attempts to write data to Parquet locally.
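To make the failure concrete, here is a minimal sketch of the kind of write involved; the schema, record contents, and output path are stand-ins rather than my actual code:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.AvroCoder;
    import org.apache.beam.sdk.io.FileIO;
    import org.apache.beam.sdk.io.parquet.ParquetIO;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;

    public class ParquetWriteSketch {
      public static void main(String[] args) {
        // Stand-in Avro schema; the real pipeline's schema is larger.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
                + "[{\"name\":\"id\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", "example");

        Pipeline p = Pipeline.create();
        p.apply(Create.of(record).withCoder(AvroCoder.of(GenericRecord.class, schema)))
            // ParquetIO.sink comes from beam-sdks-java-io-parquet, while
            // CompressionCodecName comes from parquet-hadoop -- which is,
            // as far as I can tell, where the Hadoop-side classes enter.
            .apply(FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema)
                    .withCompressionCodec(CompressionCodecName.SNAPPY))
                .to("/tmp/parquet-out/"));
        p.run().waitUntilFinish();
      }
    }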
My question is: when upgrading the Beam SDK, what is the expected way to find out that I am going to need additional dependencies, and what they are? I would assume there is a path that does not involve googling the classes the pipeline tries to call and adding dependencies until it stops complaining. Could someone more experienced tell me what that path is?

The specific error I am getting concerns some Hadoop class used by either ParquetIO or Snappy compression, but my question is more general: how do I know which packages and versions are intended to be used with the different Beam extensions?
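For reference, my current guess at the dependency declarations involved is below. The Hadoop artifact and version are assumptions on my part, which is exactly the kind of thing I am hoping there is a documented answer for:

    <!-- Beam core plus the Parquet extension -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-core</artifactId>
      <version>2.30.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-io-parquet</artifactId>
      <version>2.30.0</version>
    </dependency>
    <!-- My guess: ParquetIO expects the user to supply the Hadoop classes
         rather than pulling them in transitively. The artifact and version
         here are assumptions, not something I found documented. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.10.1</version>
    </dependency>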
Andrew Kettmann
DevOps Engineer