Hi Lucas,

Thanks for the detailed feedback, that's really useful!

I did suggest GitHub, but my colleague asked for an email.

You raise a good point about the grammar; I will rephrase it. I am more
than happy to merge the PR once you send it.

That said, I know you can write BDD tests using any framework, but I am a
lazy developer and would rather use the framework or library defaults to
make it easier for other devs to pick up.
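
For anyone following along, here is a rough sketch of what BDD semantics
can look like in a plain framework, using only Python's standard unittest
module rather than pytest-bdd. The extract_rows function and the order_id
field are hypothetical names made up for illustration; they are not taken
from the blog post or from any library.

```python
import json
import unittest


def extract_rows(raw_lines):
    """Hypothetical raw-extract step: parse one JSON object per non-blank line."""
    return [json.loads(line) for line in raw_lines if line.strip()]


class RawExtractScenario(unittest.TestCase):
    """Scenario: a raw extract keeps every input record."""

    # Each step is an ordinary method named after its Given/When/Then role.
    def given_two_raw_json_lines(self):
        self.raw_lines = ['{"order_id": 1}', '{"order_id": 2}', ""]

    def when_the_extract_runs(self):
        self.rows = extract_rows(self.raw_lines)

    def then_the_row_count_matches(self):
        self.assertEqual(len(self.rows), 2)

    def test_raw_extract_keeps_every_record(self):
        # The test body reads like the scenario it describes.
        self.given_two_raw_json_lines()
        self.when_the_extract_runs()
        self.then_the_row_count_matches()


def run_scenarios():
    """Run the scenario programmatically and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RawExtractScenario)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The trade-off is the one described above: this needs no extra dependency,
but the step wiring is hand-rolled, whereas a BDD library gives other devs
a convention they may already know.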

The number of rows is only a start, correct. We can add more tests to check
the transformed version, but I was going to cover that in a future part of
the series, since this one is mainly about raw extracts.
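
As a minimal sketch of going beyond a row count, the check below compares
each output row against an expected shape. It is plain Python rather than
Spark code, and EXPECTED_SCHEMA, schema_violations and the field names are
all assumptions for illustration; a real pipeline would run the equivalent
check against its own schema definition.

```python
# Hypothetical expected shape of one transformed row: field name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}


def schema_violations(row, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable problems found in one output row."""
    problems = []
    for field, expected_type in schema.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            actual = type(row[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Fields the schema does not know about are reported as well.
    for field in row:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


good_row = {"order_id": 1, "amount": 9.99, "currency": "GBP"}
bad_row = {"order_id": "1", "amount": 9.99, "extra": None}

assert schema_violations(good_row) == []
print(schema_violations(bad_row))
```

A test built on this can assert that every row yields an empty list, and
then compare a handful of rows against hand-written expected values.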

Thank you very much for the feedback; I will be sure to add it once I
have heard from more people.

Maybe we can create a gist of all this, or even a tiny book on best
practices, if people find it useful.

Looking forward to the PR!


On Sat, 29 Apr 2017 at 06:36, lucas.g...@gmail.com <lucas.g...@gmail.com>
wrote:

> Awesome, thanks.
> Just reading your post
> A few observations:
> 1) You're giving out Marius's email: "I have been lucky enough to
> build this pipeline with the amazing Marius Feteanu".  A LinkedIn or
> GitHub link might be more helpful.
> 2) "If you are in Pyspark world sadly Holden’s test base wont work so
> I suggest you check out Pytest and pytest-bdd.".  doesn't read well to
> me, on first read I was wondering if Spark-Test-Base wasn't available
> in python... It took me about 20 seconds to figure out that you
> probably meant it doesn't allow for direct BDD semantics.  My 2nd
> observation here is that BDD semantics can be aped in any given
> testing framework.  You just need to be flexible :)
> 3) You're doing a transformation (IE JSON input against a JSON
> schema).  You are testing for # of rows which is a good start.  But I
> don't think that really exercises a test against your JSON schema. I
> tend to view schema as the things that need the most rigorous testing
> (it's code after all).  IE I would want to confirm that the output
> matches the expected shape and values after being loaded against the
> schema.
> I saw a few minor spelling and grammatical issues as well.  I put a PR
> into your blog for them.  I won't be offended if you squish it :)
> I should be getting into our testing 'how-to' stuff this week.  I'll
> scrape our org specific stuff and put it up to github this week as
> well.  It'll be in python so maybe we'll get both use cases covered
> with examples :)
> G
> On 27 April 2017 at 03:46, Sam Elamin <hussam.ela...@gmail.com> wrote:
> > Hi
> >
> > @Lucas I certainly would love to write an integration testing library for
> > workflows, I have a few ideas I would love to share with others and they
> > are focused around Airflow since that is what we use
> >
> >
> > As promised here is the first blog post in a series of posts I hope to
> > write on how we build data pipelines
> >
> > Please feel free to retweet my original tweet and share because the more
> > ideas we have the better!
> >
> > Feedback is always welcome!
> >
> > Regards
> > Sam
> >
> > On Tue, Apr 25, 2017 at 10:32 PM, lucas.g...@gmail.com
> > <lucas.g...@gmail.com> wrote:
> >>
> >> Hi all, whoever (Sam I think) was going to do some work on doing a
> >> template testing pipeline.  I'd love to be involved, I have a current
> >> task in my day job (data engineer) to flesh out our testing how-to / best
> >> practices for Spark jobs and I think I'll be doing something very
> >> similar for the next week or 2.
> >>
> >> I'll scrape out what i have now in the next day or so and put it up in a
> >> gist that I can share too.
> >>
> >> G
> >>
> >> On 25 April 2017 at 13:04, Holden Karau <hol...@pigscanfly.ca> wrote:
> >>>
> >>> Urgh hangouts did something frustrating, updated link
> >>> https://hangouts.google.com/hangouts/_/ha6kusycp5fvzei2trhay4uhhqe
> >>>
> >>> On Mon, Apr 24, 2017 at 12:13 AM, Holden Karau <hol...@pigscanfly.ca>
> >>> wrote:
> >>>>
> >>>> The (tentative) link for those interested is
> >>>> https://hangouts.google.com/hangouts/_/oyjvcnffejcjhi6qazf3lysypue .
> >>>>
> >>>> On Mon, Apr 24, 2017 at 12:02 AM, Holden Karau <hol...@pigscanfly.ca>
> >>>> wrote:
> >>>>>
> >>>>> So 14 people have said they are available on Tuesday the 25th at 1PM
> >>>>> pacific so we will do this meeting then (
> >>>>> https://doodle.com/poll/69y6yab4pyf7u8bn ).
> >>>>>
> >>>>> Since hangouts tends to work ok on the Linux distro I'm running my
> >>>>> default is to host this as a "hangouts-on-air" unless there are
> >>>>> alternative ideas.
> >>>>>
> >>>>> I'll record the hangout and if it isn't terrible I'll post it for
> >>>>> those who weren't able to make it (and for next time I'll include more
> >>>>> European friendly time options - Doodle wouldn't let me update it once
> >>>>> posted).
> >>>>>
> >>>>> On Fri, Apr 14, 2017 at 11:17 AM, Holden Karau <hol...@pigscanfly.ca>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Spark Users (+ Some Spark Testing Devs on BCC),
> >>>>>>
> >>>>>> Awhile back on one of the many threads about testing in Spark there
> >>>>>> was some interest in having a chat about the state of Spark testing
> >>>>>> and what people want/need.
> >>>>>>
> >>>>>> So if you are interested in joining an online (with maybe an IRL
> >>>>>> component if enough people are SF based) chat about Spark testing
> >>>>>> please fill out this doodle - https://doodle.com/poll/69y6yab4pyf7u8bn
> >>>>>>
> >>>>>> I think reasonable topics of discussion could be:
> >>>>>>
> >>>>>> 1) What is the state of the different Spark testing libraries in the
> >>>>>> different core (Scala, Python, R, Java) and extended languages (C#,
> >>>>>> Javascript, etc.)?
> >>>>>> 2) How do we make these more easily discovered by users?
> >>>>>> 3) What are people looking for in their testing libraries that we
> >>>>>> are missing? (can be functionality, documentation, etc.)
> >>>>>> 4) Are there any examples of well tested open source Spark projects
> >>>>>> and where are they?
> >>>>>>
> >>>>>> If you have other topics that's awesome.
> >>>>>>
> >>>>>> To clarify, this is about libraries and best practices for people
> >>>>>> testing their Spark applications, and less about testing Spark's
> >>>>>> internals (although as illustrated by some of the libraries there is
> >>>>>> some strong overlap in what is required to make that work).
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Holden :)
> >>>>>>
> >>>>>> --
> >>>>>> Cell : 425-233-8271
> >>>>>> Twitter: https://twitter.com/holdenkarau
