
On 1/29/19, 3:11 PM, "Grant" <grant.xu...@gmail.com> wrote:

    We could have a SQL syntax checker using the existing parser logic.

    Once it detects a SQL expression with the DSL type "griffin-dsl", it
    could take the following steps:
    1. Attempt to delegate the execution of the rule to the "spark-sql" type
    directly. Whether the execution is successful or not, run step 2.
    2. Notify the user to use "spark-sql" in the future.

    We would keep the checker in the distribution only for several
    releases (say, 2 or 3), and then remove it.
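A minimal sketch of that delegate-and-warn flow, in Python for illustration only (Griffin itself is Scala, and the function name `check_and_delegate` is hypothetical, not an existing Griffin API):

```python
import warnings

def check_and_delegate(rule):
    """If a rule uses the deprecated "griffin-dsl" type, rewrite it to
    "spark-sql" for delegation and warn the user either way (steps 1-2)."""
    if rule.get("dsl.type") != "griffin-dsl":
        return rule  # nothing to migrate
    # Step 1: delegate to the "spark-sql" execution path by rewriting the
    # DSL type; the actual execution would happen downstream of this check.
    migrated = dict(rule)
    migrated["dsl.type"] = "spark-sql"
    # Step 2: notify the user regardless of whether execution succeeds.
    warnings.warn('"griffin-dsl" SQL-like rules are deprecated; '
                  'please use "spark-sql" instead.', DeprecationWarning)
    return migrated

rule = {"dsl.type": "griffin-dsl", "rule": "source.user_id = target.user_id"}
migrated = check_and_delegate(rule)
```

The original rule dict is left untouched, so a failed delegation attempt can still fall back to the legacy path.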

    Another thing I am thinking is that we should consider supporting UDFs
    provided by the end users.

    On Tue, Jan 29, 2019 at 5:35 PM Nick Sokolov <chemika...@gmail.com> wrote:

    > I think we need to maintain backward compatibility or provide easy
    > (automated?) migration -- otherwise existing users will be stuck in older
    > versions.
    >
    > On Tue, Jan 29, 2019 at 2:28 PM William Guo <gu...@apache.org> wrote:
    >
    > > Thanks Grant.
    > >
    > > I agree Griffin-DSL should leverage spark-sql for the SQL part, and
    > > Griffin-DSL should work as a DQ layer to assemble different dimensions,
    > > as MLlib does.
    > > Since we already have some experience in the data quality domain, it is
    > > now time for Griffin-DSL to evolve to the next level.
    > >
    > > Thanks,
    > > William
    > >
    > >
    > > On Wed, Jan 30, 2019 at 5:48 AM Grant <grant.xu...@gmail.com> wrote:
    > >
    > > > Hi all,
    > > >
    > > > I would suggest simplifying Griffin-DSL.
    > > >
    > > > Currently, Griffin supports three DSL types: spark-sql, griffin-dsl,
    > > > and df-ops. In this proposal, I only focus on the first two.
    > > >
    > > > Griffin-DSL is a SQL-like language that supports a wide range of
    > > > clauses, keywords, operators, etc., as Spark SQL does. The class
    > > > "GriffinDslParser" defines how to parse the SQL-like syntax. Actually,
    > > > Griffin-DSL's SQL-like syntax could be covered by Spark SQL
    > > > completely. Spark 2.0 substantially improved SQL functionality with
    > > > SQL:2003 support and can now run all 99 TPC-DS queries.
    > > >
    > > > So is it possible for Griffin-DSL to remove all SQL-like language
    > > > features? All rules which could be expressed in SQL would be
    > > > categorized as the "spark-sql" DSL type instead of "griffin-dsl". In
    > > > this case, we could simplify the implementation of Griffin-DSL.
    > > >
    > > > In my understanding, Griffin-DSL should consist of high-order
    > > > expressions, each of which represents a specific set of semantics.
    > > > Griffin-DSL would continue focusing on expressions with richer
    > > > semantics in the data exploration and wrangling area, and leave all
    > > > SQL-compatible expressions to Spark SQL. Griffin-DSL would still be
    > > > translated into Spark SQL when executed.
    > > >
    > > > Here is an example from the unit test
    > > > "_accuracy-batch-griffindsl.json":
    > > >
    > > > "evaluate.rule": {
    > > >     "rules": [
    > > >       {
    > > >         "dsl.type": "griffin-dsl",
    > > >         "dq.type": "accuracy",
    > > >         "out.dataframe.name": "accu",
    > > >         "rule": "source.user_id = target.user_id AND
    > > > upper(source.first_name) = upper(target.first_name) AND
    > source.last_name
    > > =
    > > > target.last_name AND source.address = target.address AND source.email 
=
    > > > target.email AND source.phone = target.phone AND source.post_code =
    > > > target.post_code",
    > > >         "details": {
    > > >           "source": "source",
    > > >           "target": "target",
    > > >           "miss": "miss_count",
    > > >           "total": "total_count",
    > > >           "matched": "matched_count"
    > > >         },
    > > >         "out":[
    > > >           {
    > > >             "type": "record",
    > > >             "name": "missRecords"
    > > >           }
    > > >         ]
    > > >       }
    > > >     ]
    > > >   }
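Conceptually, an accuracy rule like the one above counts how many source records have a matching target record, producing the miss/total/matched figures named under "details". A tiny Python sketch of that semantics (the join condition is simplified to user_id equality, and the function is illustrative, not Griffin's implementation):

```python
def accuracy_counts(source, target):
    """Count source rows with/without a matching target row (illustrative).

    Returns the three counters named in the rule's "details" section:
    total_count, matched_count, and miss_count.
    """
    target_keys = {row["user_id"] for row in target}
    total = len(source)
    matched = sum(1 for row in source if row["user_id"] in target_keys)
    return {"total_count": total,
            "matched_count": matched,
            "miss_count": total - matched}

source = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
target = [{"user_id": 1}, {"user_id": 3}]
counts = accuracy_counts(source, target)  # 2 matched, 1 missed out of 3
```

In Griffin this comparison is ultimately executed as a Spark SQL join over the source and target dataframes; the sketch only shows the counting semantics.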
    > > >
    > > >   If we move the SQL-like syntax out of Griffin-DSL, the preceding
    > > > example would take "dsl.type" as "spark-sql", and "rule" would
    > > > probably become a list of columns, or all columns by default.
    > > >
    > > >   Discussion is welcome.
    > > >
    > > > Grant
    > > >
    > >
    >
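If the SQL-like syntax were moved out as proposed in the quoted thread, the accuracy rule from the example might end up looking roughly like this. This is only a sketch of the idea: "compare.columns" is a hypothetical field name invented here for illustration, not an existing Griffin option.

```json
{
  "dsl.type": "spark-sql",
  "dq.type": "accuracy",
  "out.dataframe.name": "accu",
  "compare.columns": ["user_id", "first_name", "last_name",
                      "address", "email", "phone", "post_code"],
  "out": [
    { "type": "record", "name": "missRecords" }
  ]
}
```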

