Hi,
I use the HCatalog Streaming Mutation API to write data to a Hive transactional
table, and then I use SparkSQL to read data from the Hive transactional
table. I get the right result.
However, SparkSQL takes more time to read the Hive ORC bucketed transactional
table, because SparkSQL
Hi Paras,
Check out the link: Spark Scala: DateDiff of two columns by hour or
minute
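For reference, the hour/minute arithmetic the linked question is about can be sketched in plain Scala with java.time (the object name and sample timestamps below are made up for illustration); in Spark SQL the usual approach is the same subtraction done on `unix_timestamp` values of the two columns:

```scala
import java.time.{Duration, LocalDateTime}

object TimestampDiff {
  // Whole-minute difference between two timestamps. In a Spark dataframe
  // the equivalent column expression is typically
  //   (unix_timestamp(col("end")) - unix_timestamp(col("start"))) / 60
  def minutesBetween(start: LocalDateTime, end: LocalDateTime): Long =
    Duration.between(start, end).toMinutes

  // Whole-hour difference, same idea divided by 3600.
  def hoursBetween(start: LocalDateTime, end: LocalDateTime): Long =
    Duration.between(start, end).toHours
}
```

For example, 09:00 to 11:30 on the same day gives 150 minutes and 2 whole hours.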
Hi Khaled,
I have attached the spark streaming config below in (a).
In the case of the 100 vcore run (see the initial email), I used 50 executors,
where each executor has 2 vcores and 3g memory. For the 70 vcore case, 35
executors; for the 80 vcore case, 40 executors.
In the yarn config (yarn-site.xml, (b)
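The 100 vcore sizing described above (50 executors x 2 vcores x 3g memory) would be expressed with the standard spark-submit flags roughly as follows; the application jar name is a placeholder:

```shell
# Sketch of the 100 vcore configuration: 50 executors x 2 cores = 100 vcores.
spark-submit \
  --master yarn \
  --num-executors 50 \
  --executor-cores 2 \
  --executor-memory 3g \
  your-streaming-app.jar   # placeholder jar name
```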
I just discovered https://issues.apache.org/jira/browse/SPARK-25738 with
some more testing. I only marked it as critical, but it seems pretty bad --
I'll defer to others' opinions
On Sat, Oct 13, 2018 at 4:15 PM Dongjoon Hyun
wrote:
> Yes. From my side, it's -1 for RC3.
>
> Bests,
> Dongjoon.
>
>
Hi Peter,
What parameters are you putting in your Spark streaming configuration? What
are you setting as the number of executor instances, and how many cores per
executor are you setting in your Spark job?
Best,
Khaled
On Mon, Oct 15, 2018 at 9:18 PM Peter Liu wrote:
> Hi there,
>
> I have a
Hi there,
I have a system with 80 vcores and a relatively light Spark streaming
workload. Overcommitting the vcore resource (i.e. > 80) in the config (see (a)
below) seems to help improve the average Spark batch time (see (b)
below).
Is there any best practice guideline on resource overcommit
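For what it's worth, overcommit of this kind is usually done by advertising more vcores to YARN than the node physically has. A yarn-site.xml sketch (the property name is the standard YARN one; the value 160 is only an example of a 2x overcommit on an 80 vcore box):

```xml
<!-- yarn-site.xml: advertise more vcores than physical cores (example value) -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>160</value>
</property>
```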
Hi Fokko
Spark fires it off for many other things. It does so for ML pipelines, and
it does make this information available for data frames.
We use S3 in this case; I just simplified the example. It is important to
know what process took what action. Only Spark knows this, and it does
supply this
Hi Bolke,
I would argue that Spark is not the right level of abstraction for doing
this. I would create a wrapper around the particular filesystem:
http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html
That way you can write a wrapper around the LocalFileSystem if data
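The wrapper idea can be illustrated in plain Scala with a made-up `Storage` trait rather than the real `org.apache.hadoop.fs.FileSystem` API (all names below are hypothetical): a decorator records every path that is read or written, which is exactly the lineage information being discussed.

```scala
import scala.collection.mutable.{ListBuffer, Map => MutMap}

// Hypothetical stand-in for a filesystem interface.
trait Storage {
  def read(path: String): String
  def write(path: String, data: String): Unit
}

// Simple in-memory backend for demonstration.
class InMemoryStorage extends Storage {
  private val files = MutMap.empty[String, String]
  def read(path: String): String = files(path)
  def write(path: String, data: String): Unit = files(path) = data
}

// Decorator that tracks lineage: every accessed path is recorded
// before delegating to the underlying storage.
class LineageStorage(underlying: Storage) extends Storage {
  val reads  = ListBuffer.empty[String]
  val writes = ListBuffer.empty[String]
  def read(path: String): String = { reads += path; underlying.read(path) }
  def write(path: String, data: String): Unit = { writes += path; underlying.write(path, data) }
}
```

With the real Hadoop API, the same pattern would wrap (or subclass) the concrete `FileSystem` implementation and delegate `open`/`create` while recording the paths.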
Hi,
Apologies upfront if this should have gone to user@, but it seems like a
developer question, so here goes.
We are trying to improve a listener to track lineage across our platform. This
requires tracking where data comes from and where it goes to. E.g.
sc.setLogLevel("INFO");
val data =
I realize it is unlikely all data will be local to tasks, so placement will
not be optimal and there will be some network traffic, but is this the same
as a shuffle?
In CoalesceRDD it shows a NarrowDependency, which I thought meant it could
be implemented without a shuffle.
On Mon, Oct 15, 2018
This is not fully correct. If you have fewer files, then you need to move some
data to other nodes, because not all the data is there for writing (this is even
the case for the same node, but then it is easier from a network perspective).
Hence a shuffle is needed.
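To make the narrow-dependency point concrete, here is a toy sketch (not Spark's actual PartitionCoalescer, and the partition contents are made up): each output "partition" is the union of whole input partitions, so individual records are never redistributed by key the way a shuffle would do, even though the merged data may still travel over the network to wherever the new task runs.

```scala
// Toy model of narrow-dependency coalesce: group whole parent partitions
// into fewer output partitions without splitting any of them apart.
object CoalesceSketch {
  def coalesce[T](partitions: Seq[Seq[T]], numPartitions: Int): Seq[Seq[T]] =
    partitions
      .grouped(math.ceil(partitions.size.toDouble / numPartitions).toInt)
      .map(_.flatten)   // each output partition is a union of parents
      .toSeq
}
```

For example, four parent partitions coalesced to two simply concatenates pairs of parents; a repartition, by contrast, would hash every record to a new partition.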
> Am 15.10.2018 um 05:04
Thanks John,
Actually, I need the full date and time difference, not just the date
difference, which I guess is not supported.
Let me know if it's possible, or if any UDF is available for the same.
Thanks and Regards,
Paras
From: John Zhuge
Sent: Friday, October 12, 2018
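A UDF may not be strictly necessary for the full breakdown: the difference can be decomposed into days, hours, and minutes from a single duration. A plain-Scala sketch (the object name and sample values are illustrative; in Spark the same decomposition can be applied to the `unix_timestamp` difference of the two columns):

```scala
import java.time.{Duration, LocalDateTime}

object FullDiff {
  // Break the difference between two timestamps into (days, hours, minutes).
  def fullDiff(start: LocalDateTime, end: LocalDateTime): (Long, Long, Long) = {
    val d = Duration.between(start, end)
    (d.toDays, d.toHours % 24, d.toMinutes % 60)
  }
}
```

For example, from 2018-10-10 08:00 to 2018-10-12 10:30 the result is 2 days, 2 hours, 30 minutes.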