I see the Beam dependency 2.15 is a bit outdated. Would you try 2.19.0?

On Thu, Feb 13, 2020 at 3:52 AM Gershi, Noam <noam.ger...@citi.com> wrote:
>
> Hi
> Pom attached.
>
> -----Original Message-----
> From: Tomo Suzuki <suzt...@google.com>
> Sent: Wednesday, February 12, 2020 4:56 PM
> To: user@beam.apache.org
> Subject: Re: Apache Beam with Hive
>
> Hi Noam,
>
> It seems incompatible libraries are in your dependencies.
> Would you share the pom.xml (or build.gradle) that produced the NoSuchMethodError?
>
> On Wed, Feb 12, 2020 at 12:47 AM Gershi, Noam <noam.ger...@citi.com> wrote:
> >
> > I used HCatalogIO:
> >
> > Map<String, String> configProperties = new HashMap<>();
> > configProperties.put("hive.metastore.uris", "....");
> >
> > pipeline.apply(HCatalogIO.read()
> >         .withConfigProperties(configProperties)
> >         .withDatabase(DB_NAME)
> >         .withTable(TEST_TABLE))
> >     .apply("to-string", MapElements.via(new SimpleFunction<HCatRecord, String>() {
> >         @Override
> >         public String apply(HCatRecord input) {
> >             return input.toString();
> >         }
> >     }))
> >     .apply(TextIO.write().to("my-logs/output.txt").withoutSharding());
> >
> > But I am getting this error:
> >
> > 20/02/11 07:49:24 INFO spark.SparkContext: Starting job: collect at BoundedDataset.java:93
> > Exception in thread "dag-scheduler-event-loop" java.lang.NoSuchMethodError:
> > org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(Lorg/apache/hadoop/hive/conf/HiveConf;)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
> >     at org.apache.beam.sdk.io.hcatalog.HCatalogUtils.createMetaStoreClient(HCatalogUtils.java:42)
> >     at org.apache.beam.sdk.io.hcatalog.HCatalogIO$BoundedHCatalogSource.getEstimatedSizeBytes(HCatalogIO.java:323)
> >     at org.apache.beam.runners.spark.io.SourceRDD$Bounded.getPartitions(SourceRDD.java:104)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> >     at scala.Option.getOrElse(Option.scala:121)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> >     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
> >     (the previous five frames repeat for each nested MapPartitionsRDD)
> >     at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
> >     at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
> >     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:240)
> >     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:238)
> >     at scala.Option.getOrElse(Option.scala:121)
> >     at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
> >     at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
> >     at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
> >     at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
> >     at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
> >     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> > 20/02/11 07:52:54 INFO util.JsonImpl: Shutting down HTTP client
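
A note on the error above: a NoSuchMethodError on HCatUtil.getHiveMetastoreClient almost always means the hive-* jars on the runtime classpath are a different version than the ones beam-sdks-java-io-hcatalog was compiled against. Below is a minimal sketch of the relevant pom.xml section; 2.19.0 follows the suggestion at the top of the thread, while the Hive version shown is only a placeholder (not taken from the attached pom) and should match the Hive your cluster actually runs. Running `mvn dependency:tree` helps spot where a conflicting hive-* jar sneaks in.

    <!-- Sketch only: versions are assumptions, not taken from the attached pom. -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-io-hcatalog</artifactId>
      <version>2.19.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive.hcatalog</groupId>
      <artifactId>hive-hcatalog-core</artifactId>
      <!-- Placeholder: must match the Hive version on your cluster. -->
      <version>2.3.6</version>
    </dependency>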
> >
> > From: Gershi, Noam [ICG-IT]
> > Sent: Tuesday, February 11, 2020 11:08 AM
> > To: user@beam.apache.org
> > Subject: RE: Apache Beam with Hive
> >
> > Thanks.
> > Looks like HCatalog could work.
> > But - is there an example with a 'SELECT' query?
> >
> > JdbcIO is probably not good for me, since my Spark cluster is already
> > configured to work with Hive. So when I code Spark pipelines, I can write
> > queries without needing to supply a user/password. I would like to have
> > something similar in Apache Beam.
> >
> > From: utkarsh.kh...@gmail.com <utkarsh.kh...@gmail.com>
> > Sent: Tuesday, February 11, 2020 10:26 AM
> > To: user@beam.apache.org
> > Subject: Re: Apache Beam with Hive
> >
> > While I've not created Hive tables using this, reading and writing worked
> > very well using HCatalogIO.
> >
> > https://beam.apache.org/releases/javadoc/2.18.0/org/apache/beam/sdk/io/hcatalog/HCatalogIO.html
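
On the 'SELECT' question above: HCatalogIO reads a table rather than running a query, so row-level selection is typically done with a transform in the pipeline. Below is a minimal, untested sketch along those lines; the metastore URI, database, table, column position, and filter value are all placeholders, not taken from the thread. HCatalogIO.Read also exposes a withFilter(...) option for partition-level filtering, which is worth checking in the javadoc linked above.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.hive.hcatalog.data.HCatRecord;

    public class HCatalogSelectLike {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Placeholder URI; use your cluster's hive.metastore.uris value.
        Map<String, String> configProperties = new HashMap<>();
        configProperties.put("hive.metastore.uris", "thrift://metastore-host:9083");

        pipeline
            .apply(HCatalogIO.read()
                .withConfigProperties(configProperties)
                .withDatabase("mydb")       // placeholder database
                .withTable("test_table"))   // placeholder table
            // Emulates "SELECT * ... WHERE <column 0> = 'ACTIVE'" by filtering
            // rows in the pipeline; column position and value are made up.
            .apply("where-clause", Filter.by((HCatRecord r) -> "ACTIVE".equals(r.get(0))))
            .apply("to-string", MapElements.via(new SimpleFunction<HCatRecord, String>() {
              @Override
              public String apply(HCatRecord input) {
                return input.toString();
              }
            }))
            .apply(TextIO.write().to("my-logs/output.txt").withoutSharding());

        pipeline.run().waitUntilFinish();
      }
    }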
> >
> > On Tue, Feb 11, 2020 at 11:38 AM Jean-Baptiste Onofre <j...@nanthrax.net> wrote:
> >
> > Hi,
> >
> > If you are OK to use Hive via JDBC, you can use JdbcIO and see the example
> > in the javadoc:
> >
> > https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L82
> >
> > Regards
> > JB
> >
> > On 11 Feb 2020 at 07:03, Gershi, Noam <noam.ger...@citi.com> wrote:
> >
> > Hi,
> >
> > I am searching for a detailed example of how to use Apache Beam with Hive
> > and/or HCatalog: creating tables, inserting data and fetching it…
> >
> > Noam Gershi, Software Developer
> > ICG Technology – TLV Lab
> > T: +972 (3) 7405718
>
> --
> Regards,
> Tomo

--
Regards,
Tomo
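
For completeness on the JdbcIO route JB suggested: a minimal, untested sketch using the HiveServer2 JDBC driver follows. The URL, database, and query are placeholders; it needs beam-sdks-java-io-jdbc and hive-jdbc on the classpath, and a kerberized cluster (as described above, where no user/password is supplied) would use a principal in the JDBC URL rather than withUsername/withPassword.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class HiveJdbcRead {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            .apply(JdbcIO.<String>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                    "org.apache.hive.jdbc.HiveDriver",        // HiveServer2 JDBC driver
                    "jdbc:hive2://hive-server:10000/mydb"))   // placeholder URL
                .withQuery("SELECT name FROM test_table")     // placeholder query
                .withRowMapper(rs -> rs.getString(1))         // first column as String
                .withCoder(StringUtf8Coder.of()))
            .apply(TextIO.write().to("my-logs/jdbc-output.txt").withoutSharding());

        pipeline.run().waitUntilFinish();
      }
    }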