date:20181008

Re: Coalesce behaviour

2018-10-08 Thread Koert Kuipers

although i personally would describe this as a bug, the answer will be that this is the intended behavior. the coalesce "infects" the shuffle before it, making a coalesce useless for reducing output files after a shuffle with many partitions, by design. your only option left is a repartition for

Re: Random sampling in tests

2018-10-08 Thread Dongjoon Hyun

Sean's approach looks much better to me (https://github.com/apache/spark/pull/22672). It achieves both contradictory goals simultaneously: keeping all test coverage and reducing the time from 2:31 to 0:24. Since we can remove test coverage anytime, can we proceed with Sean's non-intrusive

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Matt Cheah

Relying on kubectl exec may not be the best solution because clusters with locked down security will not grant users permissions to execute arbitrary code in pods. I can’t think of a great alternative right now but I wanted to bring this to our attention for the time being. -Matt Cheah

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Rob Vesse

Well yes. However the submission client is already able to monitor the driver pod status so can see when it is up and running. And couldn’t we potentially modify the K8S entry points e.g. KubernetesClientApplication that run inside the driver pods to wait for dependencies to be uploaded?

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Yinan Li

> You can do this manually yourself via kubectl cp so it should be possible to programmatically do this since it looks like this is just a tar piped into a kubectl exec. This would keep the relevant logic in the Kubernetes specific client which may/may not be desirable depending on whether we’re

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Marcelo Vanzin

On Mon, Oct 8, 2018 at 6:36 AM Rob Vesse wrote: > Since connectivity back to the client is a potential stumbling block for > cluster mode I wonder if it would be better to think in reverse i.e. rather > than having the driver pull from the client have the client push to the > driver pod? > >

Re: Random sampling in tests

2018-10-08 Thread Xiao Li

Yes. Testing all the timezones is not needed. Xiao On Mon, Oct 8, 2018 at 8:36 AM Maxim Gekk wrote: > Hi All, > > I believe we should also take into account what we test, for example, I > don't think it makes sense to check all timezones for JSON/CSV > functions/datasources because those

DataSourceV2 documentation & tutorial

2018-10-08 Thread assaf.mendelson

Hi all, I have been working on a legacy datasource integration with data source V2 for the last couple of weeks, including upgrading it to the Spark 2.4.0 RC. During this process I wrote a tutorial with an explanation of how to create a new datasource (it can be found in

Re: Random sampling in tests

2018-10-08 Thread Maxim Gekk

Hi All, I believe we should also take into account what we test. For example, I don't think it makes sense to check all timezones for JSON/CSV functions/datasources because those timezones are just passed to external libraries. So, the same code is involved in testing each out of 650

Re: Random sampling in tests

2018-10-08 Thread Sean Owen

If the problem is simply reducing the wall-clock time of tests, then even before we get to this question, I'm advocating: 1) try simple parallelization of tests within the suite. In this instance there's no reason not to test these in parallel and get an 8x or 16x speedup from cores. This assumes,
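The "simple parallelization" suggested above can be illustrated with a minimal Python sketch. It is not Spark's test code: `check_case` and `run_all` are hypothetical names, the check itself is a trivial stand-in, and the sketch assumes the cases are fully independent of one another. A thread pool is used only to show the structure; in CPython, CPU-bound checks would need a process pool to actually gain from multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def check_case(case):
    """Hypothetical stand-in for one independent test case.

    Returns (case, passed). The check here is trivial: the case name
    must survive an upper/lower round trip unchanged.
    """
    return case, case.upper().lower() == case


def run_all(cases, workers=8):
    """Run every case on a pool of workers and return the failing cases."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so failures are reported in order.
        results = list(pool.map(check_case, cases))
    return [case for case, passed in results if not passed]


if __name__ == "__main__":
    failures = run_all(["utc", "europe/rome", "asia/tokyo"])
    print("failures:", failures)
```

Every case still runs on every build, so coverage is unchanged; only the wall-clock time is affected.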

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido

Yes, I see. It makes sense. Thanks. On Mon, Oct 8, 2018 at 16:35, Reynold Xin wrote: > Marco - the issue is to reproduce. It is much more annoying for somebody > else who might not have touched this test case to be able to reproduce the > error, just given a timezone. It is much

Re: Random sampling in tests

2018-10-08 Thread Reynold Xin

Marco - the issue is to reproduce. It is much more annoying for somebody else who might not have touched this test case to be able to reproduce the error, just given a timezone. It is much easier to just follow some documentation saying "please run TEST_SEED=5 build/sbt ~ ". On Mon, Oct 8,
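The idea of replaying a failure from a logged seed can be sketched in a few lines of Python. This is an illustration, not Spark's implementation: the `TEST_SEED` variable name comes from the message above, while `pick_cases` and `ALL_CASES` are hypothetical, and the 650 figure is borrowed from the timezone count mentioned elsewhere in the thread.

```python
import os
import random

# Hypothetical case list; 650 mirrors the timezone count cited in the thread.
ALL_CASES = [f"case-{i}" for i in range(650)]


def pick_cases(cases, k, env=os.environ):
    """Choose k cases using a seed read from TEST_SEED, or a fresh one.

    Returns (seed, chosen_cases). The seed is always printed so that a
    failing run can be replayed by exporting the same TEST_SEED value.
    """
    seed_text = env.get("TEST_SEED")
    if seed_text is not None:
        seed = int(seed_text)
    else:
        seed = random.SystemRandom().randrange(2**32)
    print(f"TEST_SEED={seed}")
    rng = random.Random(seed)
    return seed, rng.sample(cases, k)


if __name__ == "__main__":
    seed, chosen = pick_cases(ALL_CASES, 10)
    print(chosen)
```

Running with the same `TEST_SEED` value selects the same subset, which is what makes a sampled failure reproducible for someone who has never touched the test.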

Re: Random sampling in tests

2018-10-08 Thread Marco Gaido

Hi all, thanks for bringing up the topic Sean. I agree too with Reynold's idea, but in the specific case, if there is an error the timezone is part of the error message. So we know exactly which timezone caused the failure. Hence I thought that logging the seed is not necessary, as we can

Re: Random sampling in tests

2018-10-08 Thread Xiao Li

For this specific case, I do not think we should test all the timezones. If this were fast, I would be fine leaving it unchanged. However, this is very slow. Thus, I even prefer reducing the tested timezones to a smaller number or just hardcoding some specific time zones. In general, I like Reynold’s

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-08 Thread Rob Vesse

Folks, thanks for all the great input. Responding to various points raised: Marcelo/Yinan/Felix – Yes, client mode will work. The main JAR will be automatically distributed and --jars/--files specified dependencies are also distributed though for --files user code needs to use the

Random sampling in tests

2018-10-08 Thread Sean Owen

Recently, I've seen 3 pull requests that try to speed up a test suite that tests a bunch of cases by randomly choosing different subsets of cases to test on each Jenkins run. There's disagreement about whether this is a good approach to improving test runtime. Here's a discussion on one that was

16 matches
