Re: new feature for griffin COMPLETENESS dq type

2019-09-10 Thread Lionel Liu
Hi Zhao, Your requirement makes sense, that would be a common usage of COMPLETENESS cases. You can submit a JIRA ticket for Griffin community with the description: https://issues.apache.org/jira/browse/griffin, and then someone would pick the ticket and do the implementation. Thanks, Lionel On M

Re: [ANNOUNCE] New Committer: Wan Kun

2019-09-02 Thread Lionel Liu
Welcome Kun to Griffin community! Thanks, Lionel On 09/03/2019 10:21, Kevin Yao wrote: Welcome Kun to join Griffin. Thanks, kevin On Mon, Sep 2, 2019 at 9:28 AM Eugene Liu wrote: > Welcome Kun to formally join Apache Griffin! > > Making Griffin go ahead. > > Thx > Eugene >

Re: [ANNOUNCE] New Committer: Eric Wang

2019-08-22 Thread Lionel Liu
Great news, welcome Eric! Thanks, Lionel On 08/22/2019 14:22, Eugene Liu wrote: Welcome Eric to join! Eugene From: William Guo Sent: Wednesday, August 21, 2019 6:33 PM To: dev@griffin.apache.org Subject: [ANNOUNCE] New Committer: Eric Wang Hi all, The Proje

RE: RE: what is the measure sink records method used for

2019-08-21 Thread Lionel, Liu
would be thankful if any other sinks are contributed. Thanks Lionel, Liu From: 万昆 Sent: 2019-08-21 10:07 To: dev@griffin.apache.org Cc: d...@griffin.incubator.apache.org Subject: Re:RE: what is the measure sink records method used for Thanks, Lionel. 1. If the result records can only be sink

RE: what is the measure sink records method used for

2019-08-20 Thread Lionel, Liu
Lionel, Liu From: 万昆 Sent: 2019-08-20 12:30 To: d...@griffin.incubator.apache.org Subject: what is the measure sink records method used for Hi all, I don't know whether the sinkRecords method is used or not. Can anyone help me? Thanks In ElasticSearchSink def sinkRecords(records: RDD[String],

RE: Apache Griffin compatible with Hive 2.3.3?

2019-08-16 Thread Lionel, Liu
Hi, Not sure about that; maybe a new version of the Hive metastore service dependency is required to adapt to Hive 2.3. You can create a JIRA ticket (https://issues.apache.org/jira/projects/GRIFFIN/issues) for us and we’ll investigate, thank you. Thanks Lionel, Liu From

RE: Metrics not stored in ElasticSearch

2019-08-01 Thread Lionel, Liu
Lionel, Liu From: jose.martin_santacruz@boehringer-ingelheim.com Sent: 2019-08-01 17:06 To: dev@griffin.apache.org Subject: Metrics not stored in ElasticSearch Hello, We have installed Apache Griffin, but when jobs are executed from the UI we are not able to see the results because metrics are

Re:Apache Griffin without ElasticSearch

2019-07-24 Thread Lionel Liu
e the supported sink types in environment configuration, and enable the ones you like in dq job configuration. -- Regards, Lionel, Liu At 2019-07-24 20:04:08, jose.martin_santacruz@boehringer-ingelheim.com wrote: >Hello, > >Does anybody know if it is possible to work with Apache G

RE: Griffin storage requirements

2019-07-19 Thread Lionel, Liu
configured in the service, like MySQL, PostgreSQL, etc. The metrics, which are the calculation results of each submitted measure job instance, are stored in Elasticsearch by default; the sink type can also be configured as HDFS or any other sink you implement. Thanks Lionel, Liu

Re: Apache Griffin storage requirements

2019-07-17 Thread Lionel Liu
Hi, You are right, Griffin can persist metrics to different sinks like ES and HDFS, with the missing records in HDFS in accuracy measurements. The storage requirement depends on your data size: metrics are always small, but the missing records might be large if the accuracy is not good, up to the

RE: About apache griffin notification feature

2019-07-15 Thread Lionel, Liu
Hi Jerry, Griffin doesn’t support email notification by default; the feature could be integrated by leveraging Elasticsearch. Thanks Lionel, Liu From: Jerry Sent: 2019-07-15 10:34 To: dev Subject: About apache griffin notification feature Hi, I make a research for apache griffin recent

Re:Apache Griffin UI

2019-06-14 Thread Lionel Liu
Hi Jayashree, Griffin UI only reads the metrics from Elasticsearch by default, thus you need to enable the es sink in griffin jobs. -- Regards, Lionel, Liu At 2019-06-15 08:29:02, "Jayashree Mohanta" wrote: >Hi, > >Can you please tell me how to use Griffin

RE: [DISCUSS]Alternatives way to access hive metadata?

2019-06-13 Thread Lionel, Liu
, how to fetch the data schemas? Maybe something like schema management is what we need. Thanks Lionel, Liu From: Johnnie ZHANG Sent: 2019-06-12 14:56 To: dev@griffin.apache.org Subject: Re: [DISCUSS]Alternatives way to access hive metadata? Hi All, I think this is reasonable and it would be

Re: Griffin docker image cannot start on Ubuntu

2019-05-07 Thread Lionel Liu
Hi Qian, Thanks for your information, we will have a look at this. Thanks, Lionel On Tue, May 7, 2019 at 11:26 AM Qian Wang wrote: > Hi, > > When I followed the guide of > https://github.com/apache/griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md > to > deploy the griffin docker

Re: [VOTE] Release of Apache Griffin-0.5.0

2019-04-09 Thread Lionel Liu
+1 I verified the following - signature - hash - LICENSE - NOTICE - Compile and build - third party license Thanks Lionel On Mon, Apr 8, 2019 at 8:39 PM Kevin Yao wrote: > +1 > > I checked: > > - signature file and hash file correct > - LICENSE and NOTICE look good > - mvn apache-rat:che

RE: ElasticSearchSink modification question

2019-02-20 Thread Lionel, Liu
Yes, that’s a good feature. I think you can add it in all the sink types. Thanks Lionel, Liu From: Nick Sokolov Sent: 2019-02-20 1:11 To: dev@griffin.apache.org Subject: Re: ElasticSearchSink modification question Sounds interesting! Looks like it also allows to implement API to retrieve

Board Report of Griffin in February 2019

2019-02-12 Thread Lionel Liu
Hi Griffin's Folks, This is the draft of our board report, please comment if you have any questions. ## Description: - Apache Griffin is an open source Data Quality solution for Big Data, which supports both batch and streaming mode. It offers a unified process to measure your data quality from differen

RE: publish data to new Kafka topic

2019-02-10 Thread Lionel, Liu
che/griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/configuration/enums/SinkType.scala Thanks Lionel, Liu From: Jeff Zemerick Sent: 2019-02-10 23:56 To: dev@griffin.apache.org Subject: Re: publish data to new Kafka topic Yes, a "data filter" describes it well. I th

RE: publish data to new Kafka topic

2019-02-10 Thread Lionel, Liu
you data? Thanks Lionel, Liu From: Jeff Zemerick Sent: 2019-02-06 21:41 To: dev@griffin.apache.org Subject: publish data to new Kafka topic Hi Griffin devs, Continuing my email thread from users@ and to better clarify it, I have a Kafka topic with JSON data on it. I would like to perform quality

RE: Measure creation with DSL Type as "DF-OPS"

2019-02-10 Thread Lionel, Liu
That’s right, Griffin depends on the operations of spark sql, transferring a data frame into another, but for the operations which could not be covered by spark sql, some pre-defined “df-ops” could help on this. Users can implement their own “df-ops” for such specific operations. Thanks Lionel

Re: Simplify Griffin-DSL implementation

2019-01-29 Thread Lionel Liu
I agree with the separation of Griffin-DSL and Spark-SQL, I have some concerns and suggestions for detail: 1. The rule in the "accuracy" example above is only part of sql, not a complete sql, users would be confused if the "dsl.type" is set as "spark-sql". 2. The benefits to separate Griffin-DSL an

RE: ask for question

2019-01-23 Thread Lionel, Liu
Hi XiaoDong, Griffin doesn’t currently support alerting internally. We recommend users leverage the alert configuration in elasticsearch, which is the default metric storage in Griffin. Thanks Lionel, Liu From: 查晓东 Sent: 2019-01-23 22:02 To: dev Subject: ask for question hi The

Re: The ES metric data loss problem

2019-01-21 Thread Lionel Liu
> > In other words, I did not set the IP of es server in /etc/hosts on the > yarn server, so the metric save request could not reach the es server and > the metric data was lost > > On 01/17/2019 16:52,Lionel Liu > wrote: > > I'm not sure about it, maybe I'm

Re: What's the purpose of the checkpoint.dir parameter

2019-01-17 Thread Lionel Liu
Hi DaPeng, The parameter "checkpoint.dir" in spark configuration just works for spark streaming calculation in streaming mode, the directory will hold the spark checkpoint data for failure recovery of spark streaming applications. Thanks, Lionel On Thu, Jan 17, 2019 at 11:42 AM 大鹏 <18210146...@1

Re: The ES metric data loss problem

2019-01-17 Thread Lionel Liu
n. It should be that I did not configure es > service IP on yarn server, so there is no metric data in es > On 01/15/2019 09:34,Lionel Liu > wrote: > > I think you need to check the logs of livy and spark applications. > In livy log, you can find how many jobs are submitted to spark

Re: The ES metric data loss problem

2019-01-14 Thread Lionel Liu
points, if the job actually executed, to find the error message in the log, then we can get more information about this case. Thanks, Lionel On Mon, Jan 14, 2019 at 1:36 PM 大鹏 <18210146...@163.com> wrote: > > This is attachment > > On 01/14/2019 13:22,Lionel Liu >

Re: The ES metric data loss problem

2019-01-13 Thread Lionel Liu
Hi DaPeng, Griffin reads your data, executes the rule steps on the data, then persists the metrics. If there's any exception, such as data that cannot be found or an execution error, the rule step might fail, and the following steps will not succeed either. The metrics are collected after the last step, thus there

RE: Griffin - how to configure authentication for ElasticSearchentity/http request

2019-01-09 Thread Lionel, Liu
maybe it can help you. https://github.com/apache/griffin/pull/463 Thanks Lionel, Liu From: Prachi Kishore Hunnargikar Sent: 2019-01-09 16:22 To: dev@griffin.apache.org Cc: Selvaraj K Subject: RE: Griffin - how to configure authentication for ElasticSearchentity/http request Hello Griffin

Re: [RESULT][VOTE] Release Apache Griffin 0.4.0

2019-01-04 Thread Lionel Liu
votes. > > > > > > The tally is as follows. > > 4 binding +1s: > > * William Guo > > * Eugene Liu > > * Jason Liao > > * Lionel Liu > > > > > > No 0s or -1s. > > > > > > Therefore I am delighted to announ

[jira] [Commented] (GRIFFIN-223) "Post to livy error. 401 Credentials missing or Auth is not Basic"

2019-01-04 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734199#comment-16734199 ] Lionel Liu commented on GRIFFIN-223: Hi [~dna.dgw.engg], you can send emails to

RE: griffin technical exchange

2019-01-04 Thread Lionel, Liu
Hi Zhen, Sorry for that, maybe there’s some issue because I sent the email via my phone yesterday. Thanks Lionel, Liu From: Zhen Li Sent: 2019-01-04 11:13 To: Lionel Liu Cc: dev@griffin.apache.org Subject: Re: griffin technical exchange Hi Lionel, Excuse me, but I didn’t see the QR code in mail attachment

Re: griffin technical exchange

2019-01-03 Thread Lionel Liu
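Hi Dapeng, Thanks for your interest, we have a wechat group for griffin users, attachment is the QR code. Thanks, Lionel On 2019-01-03 11:37, 大鹏 wrote: Hi, I am an engineer from the architecture department of 神州优车. I am planning to use griffin to build a data quality monitoring platform. Since there is little material available online, is there a WeChat group or some other way to get in touch? I would like to learn and study it further.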

[jira] [Commented] (GRIFFIN-223) "Post to livy error. 401 Credentials missing or Auth is not Basic"

2019-01-03 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733043#comment-16733043 ] Lionel Liu commented on GRIFFIN-223: That's cool to be able to submit job

[jira] [Issue Comment Deleted] (GRIFFIN-223) "Post to livy error. 401 Credentials missing or Auth is not Basic"

2019-01-03 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lionel Liu updated GRIFFIN-223: --- Comment: was deleted (was: Hi [~dna.dgw.engg], I still think it is some issue about livy

[jira] [Commented] (GRIFFIN-223) "Post to livy error. 401 Credentials missing or Auth is not Basic"

2019-01-03 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733023#comment-16733023 ] Lionel Liu commented on GRIFFIN-223: Hi [~dna.dgw.engg], I still think it is

Re: [VOTE] Release of Apache Griffin-0.4.0

2018-12-28 Thread Lionel Liu
+1 I've checked: - CHANGES.txt updated - source-release.zip and pom files are listed - no md5 or sha1 files - LICENSE file is good - NOTICE file is good - signature file is good - hash file is good - licenses in file header check success - source compile success - third-party licenses are good PS

[jira] [Commented] (GRIFFIN-223) "Post to livy error. 401 Credentials missing or Auth is not Basic"

2018-12-27 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729637#comment-16729637 ] Lionel Liu commented on GRIFFIN-223: Hi [~dna.dgw.engg], for the livy error, w

[jira] [Resolved] (GRIFFIN-224) No JSON file was found in docker container.

2018-12-16 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lionel Liu resolved GRIFFIN-224. Resolution: Resolved > No JSON file was found in docker contai

[jira] [Commented] (GRIFFIN-224) No JSON file was found in docker container.

2018-12-16 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722426#comment-16722426 ] Lionel Liu commented on GRIFFIN-224: Yes, the file copy happens in the star

[jira] [Commented] (GRIFFIN-224) No JSON file was found in docker container.

2018-12-15 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722359#comment-16722359 ] Lionel Liu commented on GRIFFIN-224: There should be the json files on hdfs,

Re: Board Report of Griffin

2018-12-10 Thread Lionel Liu
LGTM Thanks, Lionel On Sun, Dec 9, 2018 at 12:25 PM William Guo wrote: > Hi All, > > Here is the draft of the board report this month (We need to submit it > to the board before 12 Dec). Please let me know if any thing is > missing or you have any questions about it. > > ## Description: > - Ap

Re: Accuracy measure fails on large dataset

2018-12-04 Thread Lionel Liu
nd after that job got succeeded. >> There are 23 columns in table and both source and target has same number >> of columns. >> >> But If I use data with all the columns, it’s failing. >> >> Thanks, >> Dhiren >> >> From: Lionel Liu >> Sent

RE: Accuracy measure fails on large dataset

2018-12-04 Thread Lionel, Liu
Hi Dhiren, Seems like the resources are not your limit. I suspect that there might be some data skew in your data; you can check this by monitoring the spark job running status. If there’s data skew, there’re several ways to fix it, you can also google for it. Thanks Lionel, Liu From: Dhiren

Re: Accuracy measure fails on large dataset

2018-12-03 Thread Lionel Liu
Hi Dhiren, How many resources are you using when submitting this job? For large scale data, the simple solution is to use more resources for the calculation job. For limited resources, a common solution is to partition your data by date or hour, to make it smaller in each partition, then you can calc

Re: [VOTE] Release of Apache Griffin-0.4.0-incubating [RC0]

2018-12-02 Thread Lionel Liu
+1 I've checked: - incubating in the name of release files - CHANGES.txt exists - signature verified - checksum good - LICENSE, NOTICE, DISCLAIMER are good - apache-rat:check SUCCESS for licenses in the head of each source file - source compile SUCCESS - Third-Party Licenses are good Since Griffi

[jira] [Assigned] (GRIFFIN-215) Accuracy Rate in measure-detail.component.html maybe wrong

2018-11-26 Thread Lionel Liu (JIRA)
[ https://issues.apache.org/jira/browse/GRIFFIN-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lionel Liu reassigned GRIFFIN-215: -- Assignee: Yuqin Xuan > Accuracy Rate in measure-detail.component.html maybe wr

Re: Graduated to TLP

2018-11-23 Thread Lionel Liu
Cool, we're stepping to a new journey. Congratulations to every contributor! Let's make it better. Thanks, Lionel On Fri, Nov 23, 2018 at 2:39 AM William Guo wrote: > Hi all, > > As you all probably are aware, this week ASF's board > of directors passed a resolution to establish a Griffin TLP.

Re: ASF Board Meeting Summary - November 21, 2018

2018-11-22 Thread Lionel Liu
Great news, it's a big step. Let's make it better. Thanks, Lionel On Thu, Nov 22, 2018 at 4:53 PM William Guo wrote: > Hi all. > > Status updated. > > Thanks, > William > > -- Forwarded message - > From: Phil Steitz > Date: Fri, Nov 23, 2018 at 1:03 AM > Subject: ASF Board Meet