Hi Noam,

It looks like there are incompatible libraries in your dependencies. Could you share the pom.xml (or build.gradle) that produced the NoSuchMethodError?
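For comparison while you dig up the build file: a NoSuchMethodError on HCatUtil.getHiveMetastoreClient typically means the hive-hcatalog-core / hive-exec jars on the runtime classpath come from a different Hive release than the one the Beam hcatalog module was compiled against. Below is a rough, illustrative Maven layout only; the artifact ids are real, but the Hive version is an assumption on my side and has to match whatever your cluster actually runs:

    <!-- illustrative only; align the Hive version with your cluster's Hive -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-io-hcatalog</artifactId>
      <version>2.18.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive.hcatalog</groupId>
      <artifactId>hive-hcatalog-core</artifactId>
      <version>2.1.1</version> <!-- assumption: use your cluster's Hive version -->
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.1.1</version> <!-- assumption -->
    </dependency>

If the cluster already puts its own Hive jars on the classpath, aligning with those versions usually matters more than what is declared in the pom.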
On Wed, Feb 12, 2020 at 12:47 AM Gershi, Noam <noam.ger...@citi.com> wrote:
>
> I used HCatalogIO:
>
> Map<String, String> configProperties = new HashMap<>();
> configProperties.put("hive.metastore.uris", "....");
>
> pipeline.apply(HCatalogIO.read()
>         .withConfigProperties(configProperties)
>         .withDatabase(DB_NAME)
>         .withTable(TEST_TABLE))
>     .apply("to-string", MapElements.via(new SimpleFunction<HCatRecord, String>() {
>         @Override
>         public String apply(HCatRecord input) {
>             return input.toString();
>         }
>     }))
>     .apply(TextIO.write().to("my-logs/output.txt").withoutSharding());
>
> But I am getting this error:
>
> 20/02/11 07:49:24 INFO spark.SparkContext: Starting job: collect at BoundedDataset.java:93
> Exception in thread "dag-scheduler-event-loop" java.lang.NoSuchMethodError: org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(Lorg/apache/hadoop/hive/conf/HiveConf;)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
>     at org.apache.beam.sdk.io.hcatalog.HCatalogUtils.createMetaStoreClient(HCatalogUtils.java:42)
>     at org.apache.beam.sdk.io.hcatalog.HCatalogIO$BoundedHCatalogSource.getEstimatedSizeBytes(HCatalogIO.java:323)
>     at org.apache.beam.runners.spark.io.SourceRDD$Bounded.getPartitions(SourceRDD.java:104)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>     ... [repeated RDD.partitions / MapPartitionsRDD.getPartitions frames trimmed] ...
>     at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
>     at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:240)
>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:238)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
>     at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
>     at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
>     at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
>     at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> 20/02/11 07:52:54 INFO util.JsonImpl: Shutting down HTTP client
>
>
> From: Gershi, Noam [ICG-IT]
> Sent: Tuesday, February 11, 2020 11:08 AM
> To: user@beam.apache.org
> Subject: RE: Apache Beam with Hive
>
> Thanx.
> Looks like HCatalog could work.
> But is there an example with a 'SELECT' query?
>
> JdbcIO is probably not a good fit for me, since my Spark cluster is already configured to work with Hive. So when I code Spark pipelines I can write queries without having to supply a user/password. I would like to have something similar in Apache Beam.
>
> From: [gmail.com] utkarsh.kh...@gmail.com <utkarsh.kh...@gmail.com>
> Sent: Tuesday, February 11, 2020 10:26 AM
> To: user@beam.apache.org
> Subject: Re: Apache Beam with Hive
>
> While I've not created Hive tables using this, reading and writing worked very well using HCatalogIO.
>
> https://beam.apache.org/releases/javadoc/2.18.0/org/apache/beam/sdk/io/hcatalog/HCatalogIO.html
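Regarding the 'SELECT' question above: HCatalogIO does not execute SQL itself, so a SELECT is usually expressed as "read the table, then filter and project in the pipeline". Here is a minimal, untested sketch along those lines; withFilter(...) only prunes on partition columns, and the partition expression ds='2020-02-11' plus the field positions are made-up placeholders, while configProperties / DB_NAME / TEST_TABLE are the names from your snippet:

    import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.hive.hcatalog.data.HCatRecord;

    PCollection<String> selected =
        pipeline
            .apply(HCatalogIO.read()
                .withConfigProperties(configProperties)
                .withDatabase(DB_NAME)
                .withTable(TEST_TABLE)
                .withFilter("ds='2020-02-11'"))   // partition-level WHERE only
            .apply("where", Filter.by(
                (SerializableFunction<HCatRecord, Boolean>) r -> r.get(2) != null))  // row-level WHERE
            .apply("select", MapElements.via(new SimpleFunction<HCatRecord, String>() {
              @Override
              public String apply(HCatRecord r) {
                return r.get(0) + "," + r.get(1); // projection by field position
              }
            }));

Beam SQL (SqlTransform) would be another way to get real SQL, but it requires converting the records to Rows first; for a simple filter and projection the plain transforms above are usually enough.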
>
> On Tue, Feb 11, 2020 at 11:38 AM Jean-Baptiste Onofre <j...@nanthrax.net> wrote:
>
> Hi,
>
> If you are OK with using Hive via JDBC, you can use JdbcIO; see the example in the javadoc:
>
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L82
>
> Regards
> JB
>
> On 11 Feb 2020, at 07:03, Gershi, Noam <noam.ger...@citi.com> wrote:
>
> Hi,
>
> I am searching for a detailed example of how to use Apache Beam with Hive and/or HCatalog:
> creating tables, inserting data and fetching it…
>
> Noam Gershi, Software Developer
> ICG Technology – TLV Lab
> T: +972 (3) 7405718

--
Regards,
Tomo