Hi Will,
Have you tried using S3 as the state store with the EMR option for faster
file sync enabled? There is also now the option of using FSx for Lustre.
Thanks and Regards,
Gourav Sengupta
On Wed, Jan 15, 2020 at 5:17 AM William Briggs wrote:
Hi all, I've got a problem that really has me stumped. I'm running a
Structured Streaming query that reads from Kafka, performs some
transformations and stateful aggregations (using flatMapGroupsWithState),
and outputs any updated aggregates to another Kafka topic.
I'm running this job using Spark
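For readers unfamiliar with this kind of pipeline, a minimal sketch of a Kafka-to-Kafka stateful aggregation with flatMapGroupsWithState follows. All names here (topics, broker address, record format, checkpoint path) are made-up placeholders, not Will's actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(key: String, value: Long)
case class Agg(key: String, total: Long)

val spark = SparkSession.builder.appName("stateful-agg").getOrCreate()
import spark.implicits._

// Read from Kafka; each record's value is assumed to be "key,value".
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // assumed address
  .option("subscribe", "input-topic")               // assumed topic
  .load()
  .selectExpr("CAST(value AS STRING) AS raw")
  .as[String]
  .map { raw =>
    val Array(k, v) = raw.split(",")
    Event(k, v.toLong)
  }

// Keep a running total per key; emit the updated aggregate each batch.
val updated = events
  .groupByKey(_.key)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout) {
    (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
      val total = state.getOption.getOrElse(0L) + rows.map(_.value).sum
      state.update(total)
      Iterator(Agg(key, total))
  }

// Write updated aggregates back to Kafka as "key,total" strings.
updated
  .map(a => s"${a.key},${a.total}")
  .toDF("value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/tmp/checkpoints") // assumed path
  .outputMode("update")
  .start()
```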
It only makes sense if the underlying file is also splittable, and even
then it doesn't really do anything for you if you don't explicitly tell
Spark about the split boundaries.
On Tue, Jan 14, 2020 at 7:36 PM Someshwar Kale wrote:
I would suggest using another compression technique that is splittable,
e.g. bzip2, LZO, or LZ4.
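To illustrate the difference: a large bzip2-compressed text file can be split across several partitions out of the box, whereas a gzip file of the same size yields a single partition. Paths here are assumptions:

```scala
// bzip2 is splittable, so Spark can divide one large .bz2 file across
// multiple partitions; gzip, in contrast, gives exactly one partition.
val df = spark.read.text("/data/large-file.txt.bz2") // assumed path
println(df.rdd.getNumPartitions)

// Writing with a splittable codec:
df.write
  .option("compression", "bzip2")
  .text("/data/out") // assumed path
```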
On Wed, Jan 15, 2020, 1:32 AM Enrico Minack wrote:
Hi,
Spark does not support 7z natively, but you can read any file in Spark:
import org.apache.spark.input.PortableDataStream
import spark.implicits._ // for .toDF

def read(stream: PortableDataStream): Iterator[String] =
  Seq(stream.getPath()).iterator

spark.sparkContext
  .binaryFiles("*.7z")
  .flatMap(file => read(file._2))
  .toDF("path")
  .show(false)
This scales with the
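The read function in Enrico's snippet only returns the file path. A sketch of a version that actually extracts text lines from each archive, assuming Apache Commons Compress (1.20 or later) is on the classpath and that each archive fits in executor memory:

```scala
import scala.io.Source
import org.apache.commons.compress.archivers.sevenz.SevenZFile
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel
import org.apache.spark.input.PortableDataStream

// Decompress every non-directory entry of the 7z archive and yield its
// text lines. stream.toArray() loads the whole archive into memory,
// which SevenZFile needs because 7z requires seekable input.
def read(stream: PortableDataStream): Iterator[String] = {
  val sevenZ = new SevenZFile(new SeekableInMemoryByteChannel(stream.toArray()))
  Iterator
    .continually(sevenZ.getNextEntry)
    .takeWhile(_ != null)
    .filterNot(_.isDirectory)
    .flatMap(entry => Source.fromInputStream(sevenZ.getInputStream(entry)).getLines())
}
```

This keeps the driver-side plumbing identical — only the per-file reader changes.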
Regards
Sanjiv Singh
Mob : +1 571-599-5236
Hello everyone!
I am trying to get data from a DB2 table whose columns have names with
non-ASCII (Cyrillic) symbols, but I get an error from the JDBC driver with
"SQLCODE=-206" (object-name IS NOT VALID IN THE CONTEXT WHERE IT IS
USED), and SQLERRMC consists of the name of this column and the added parts
";N*.N*" lik
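One possible workaround worth trying, sketched below: instead of letting Spark generate the column list from the table name, pass a subquery through the dbtable option with the identifiers explicitly quoted (delimited). Host, database, schema, table, and column names here are made-up placeholders:

```scala
// Select the columns yourself with quoted (delimited) identifiers, so
// the non-ASCII names reach DB2 exactly as written.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://host:50000/MYDB")          // assumed URL
  .option("dbtable", """(SELECT "КОЛОНКА" AS col1 FROM MYSCHEMA.MYTABLE)""")
  .option("user", "user")                               // assumed credentials
  .option("password", "password")
  .load()
```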