Also, I found that the Parquet format does work. Along the way I looked at the FileSink's ParquetBulkFormat among the Flink streaming connectors.
The docs then mention ParquetAvroWriters — how should the corresponding Hive table be created for files written in that format? It seems a plain
`stored as parquet` table carries no Avro information at all.
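For what it's worth, the files produced by ParquetAvroWriters are ordinary Parquet files — the Avro schema is only used on the write path — so as far as I understand, a plain Parquet Hive table whose columns match the Avro record's fields should be able to read them; no Avro SerDe is required. A hypothetical sketch (table name, columns, and location are all made up):

```sql
-- Hypothetical table matching the fields of the Avro record written by
-- ParquetAvroWriters; the files on disk are standard Parquet.
CREATE EXTERNAL TABLE my_events (
  user_id BIGINT,
  event   STRING,
  ts      BIGINT
)
PARTITIONED BY (`day` STRING, `hour` STRING)
STORED AS PARQUET
LOCATION '/warehouse/my_events';

-- Make Hive aware of partitions already written by the FileSink:
MSCK REPAIR TABLE my_events;
```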

赵一旦 <hinobl...@gmail.com> wrote on Sunday, January 24, 2021 at 6:45 AM:

> Follow-up: (1) In Flink SQL queries, string-typed partition fields apparently must be quoted with '', otherwise nothing is matched? A filter like hour=02 as above directly leads to "no more split".
> (2) With that fixed, splits are now being produced, but an ORC-related error is thrown, and submitting the SQL makes the JM fail outright. JM log below:
>
> 2021-01-24 04:41:24,952 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler      [] - FATAL: 
> Thread 'flink-akka.actor.default-dispatcher-2' produced an uncaught 
> exception. Stopping the process...
>
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Failed to start the operator 
> coordinators
>         at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_251]
>         at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_251]
>         at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:722) 
> ~[?:1.8.0_251]
>         at 
> java.util.concurrent.CompletableFuture.uniRunStage(CompletableFuture.java:731)
>  ~[?:1.8.0_251]
>         at 
> java.util.concurrent.CompletableFuture.thenRun(CompletableFuture.java:2023) 
> ~[?:1.8.0_251]
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.resetAndStartScheduler(JobMaster.java:935)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:801)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.lambda$start$1(JobMaster.java:357)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleCallAsync(AkkaRpcActor.java:383)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:88)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
> [flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>  [flink-dists-extended_2.11-1.12.0.jar:?]
> Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to start the 
> operator coordinators
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startAllOperatorCoordinators(SchedulerBase.java:1100)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:567)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:944)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:719) 
> ~[?:1.8.0_251]
>         ... 27 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at 
> org.apache.hadoop.hive.common.ValidReadTxnList.readFromString(ValidReadTxnList.java:142)
>  ~[?:?]
>         at 
> org.apache.hadoop.hive.common.ValidReadTxnList.<init>(ValidReadTxnList.java:57)
>  ~[?:?]
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:421)
>  ~[?:?]
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:983)
>  ~[?:?]
>         at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>  ~[?:?]
>         at 
> org.apache.flink.connectors.hive.HiveSourceFileEnumerator.createInputSplits(HiveSourceFileEnumerator.java:86)
>  ~[?:?]
>         at 
> org.apache.flink.connectors.hive.HiveSourceFileEnumerator.enumerateSplits(HiveSourceFileEnumerator.java:57)
>  ~[?:?]
>         at 
> org.apache.flink.connector.file.src.AbstractFileSource.createEnumerator(AbstractFileSource.java:140)
>  ~[flink-table_2.11-1.12.0.jar:1.12.0]
>         at 
> org.apache.flink.connectors.hive.HiveSource.createEnumerator(HiveSource.java:115)
>  ~[?:?]
>         at 
> org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:119)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.applyCall(RecreateOnResetOperatorCoordinator.java:308)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.start(RecreateOnResetOperatorCoordinator.java:72)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:182)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startAllOperatorCoordinators(SchedulerBase.java:1094)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:567)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:944)
>  ~[flink-dists-extended_2.11-1.12.0.jar:?]
>         at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:719) 
> ~[?:1.8.0_251]
>         ... 27 more
> 2021-01-24 04:41:24,963 INFO  org.apache.flink.runtime.blob.BlobServer        
>              [] - Stopped BLOB server at 0.0.0.0:13146
>
>
>
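Regarding point (1) in the quoted message — the partition filter only matching when the value is quoted — the behavior can be illustrated with a hypothetical table (names are made up; `hour` is assumed to be a STRING partition column):

```sql
-- Unquoted, the literal is treated as a number; per the report above,
-- this matches no partitions ("no more split"):
SELECT * FROM my_table WHERE `hour` = 02;

-- Quoted as a string, it matches the partition directory hour=02:
SELECT * FROM my_table WHERE `hour` = '02';
```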
