subject:"Reading as Parquet a directory created by Spark Structured Streaming - problems"

Re: Reading as Parquet a directory created by Spark Structured Streaming - problems

2019-01-11 Thread Phillip Henry

Hi, Denis.

It should be a String. Even if it looks like a number when you do hadoop fs -ls ..., it's a String representation of a date/time.

Phillip

On Thu, Jan 10, 2019 at 2:00 PM ddebarbieux wrote:
> scala> spark.read.schema(StructType(Seq(StructField("_1",StringType,false),
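A minimal sketch of one way to keep the partition value a String when reading the whole directory. It assumes the numeric-looking directory names are being type-inferred by Spark's partition discovery, which the thread does not confirm. The object name and app name are hypothetical; the path is the elided one from the thread.

```scala
import org.apache.spark.sql.SparkSession

object PartitionAsStringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-as-string-sketch") // hypothetical name
      .getOrCreate()

    // With inference disabled, partition values such as _1=201812030900
    // are read as strings rather than being inferred as a numeric type.
    spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

    // Path as given (elided) in the thread.
    val df = spark.read.parquet("hdfs://---/MY_DIRECTORY")

    // If the assumption holds, _1 should be reported as a string column.
    df.printSchema()

    spark.stop()
  }
}
```

The setting applies to the session, so it has to be set before the read is issued.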

Re: Reading as Parquet a directory created by Spark Structured Streaming - problems

2019-01-10 Thread ddebarbieux

scala> spark.read.schema(StructType(Seq(StructField("_1",StringType,false), StructField("_2",StringType,true)))).parquet("hdfs://---/MY_DIRECTORY/*_1=201812030900*").show()
+----+--------------------+
|  _1|                  _2|
+----+--------------------+
|null|ba1ca2dc033440125...|
|null|ba1ca2dc033440125...|
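One possible explanation for the nulls in `_1`, offered as an assumption rather than a diagnosis from the thread: when the glob matches the partition directories themselves, Spark may not treat `_1=...` as a partition column, and the Parquet files do not contain `_1`. The sketch below passes the `basePath` read option so that partition discovery starts from the parent directory. The object name and app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReadWithBasePathSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-with-base-path-sketch") // hypothetical name
      .getOrCreate()

    // Same schema as in the message above: both columns are Strings.
    val schema = StructType(Seq(
      StructField("_1", StringType, nullable = false),
      StructField("_2", StringType, nullable = true)
    ))

    spark.read
      // Tell partition discovery where the partitioned table starts, so
      // the _1=... path segment is interpreted as a partition column.
      .option("basePath", "hdfs://---/MY_DIRECTORY")
      .schema(schema)
      .parquet("hdfs://---/MY_DIRECTORY/*_1=201812030900*")
      .show()

    spark.stop()
  }
}
```

If `_1` is still null with `basePath` set, the cause lies elsewhere and this assumption does not apply.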

Reading as Parquet a directory created by Spark Structured Streaming - problems

2019-01-09 Thread Phillip Henry

Hi, I write a stream of (String, String) tuples to HDFS partitioned by the first ("_1") member of the pair. Everything looks great when I list the directory via "hadoop fs -ls ...". However, when I try to read all the data as a single dataframe, I get unexpected results (see below). I notice