Hi Zhao,
Your requirement makes sense; that would be a common use of the COMPLETENESS
case.
You can submit a JIRA ticket to the Griffin community with a description:
https://issues.apache.org/jira/browse/griffin. Someone will then pick up
the ticket and implement it.
Thanks,
Lionel
On M
Welcome Kun to Griffin community!
Thanks,
Lionel
On 09/03/2019 10:21, Kevin Yao wrote:
Welcome Kun to join Griffin.
Thanks,
kevin
On Mon, Sep 2, 2019 at 9:28 AM Eugene Liu wrote:
> Welcome Kun to formally join Apache Griffin!
>
> Making Griffin go ahead.
>
> Thx
> Eugene
>
Great news, welcome Eric!
Thanks,
Lionel
On 08/22/2019 14:22, Eugene Liu wrote:
Welcome Eric to join!
Eugene
From: William Guo
Sent: Wednesday, August 21, 2019 6:33 PM
To: dev@griffin.apache.org
Subject: [ANNOUNCE] New Committer: Eric Wang
Hi all,
The Proje
would be thankful
if any other sinks are contributed.
Thanks
Lionel, Liu
From: 万昆
Sent: August 21, 2019 10:07
To: dev@griffin.apache.org
Cc: d...@griffin.incubator.apache.org
Subject: Re:RE: what is the measure sink recoreds method used for
Thanks, Lionel.
1. If the result records can only be sink
Lionel, Liu
From: 万昆
Sent: August 20, 2019 12:30
To: d...@griffin.incubator.apache.org
Subject: what is the measure sink recoreds method used for
Hi all,
I don't know whether the sinkRecords method is used or not.
Can anyone help me?
Thanks
In ElasticSearchSink
def sinkRecords(records: RDD[String],
Hi,
Not sure about that; maybe a newer version of the Hive metastore service
dependency is required to work with Hive 2.3.
You can create a JIRA ticket
(https://issues.apache.org/jira/projects/GRIFFIN/issues) for us and we’ll
investigate. Thank you.
Thanks
Lionel, Liu
From
Lionel, Liu
From: jose.martin_santacruz@boehringer-ingelheim.com
Sent: August 1, 2019 17:06
To: dev@griffin.apache.org
Subject: Metrics not stored in ElasticSearch
Hello,
We have installed Apache Griffin, but when jobs are executed from the UI we are
not able to see the results because metrics are
e the supported sink types in environment configuration, and
enable the ones you like in dq job configuration.
--
Regards,
Lionel, Liu
At 2019-07-24 20:04:08, jose.martin_santacruz@boehringer-ingelheim.com
wrote:
>Hello,
>
>Does anybody know if it is possible to work with Apache G
configured in the service, like MySQL, PostgreSQL,
etc.
The metrics, which are the calculation results of each submitted measure job
instance, are stored in Elasticsearch by default; the sink type can also be
configured as HDFS or any other sink you implement.
Thanks
Lionel, Liu
Hi,
You are right, Griffin can persist metrics to different sinks like ES and HDFS,
with the missing records stored in HDFS for accuracy measurements. The storage
requirement depends on your data size: metrics are always small, but the missing
records might be large if the accuracy is not good, up to the
Hi Jerry,
Griffin doesn’t support email notification by default, but the feature could be
integrated by leveraging Elasticsearch.
Thanks
Lionel, Liu
From: Jerry
Sent: July 15, 2019 10:34
To: dev
Subject: About apache griffin notification feature
Hi,
I make a research for apache griffin recent
Hi Jayashree,
The Griffin UI only reads metrics from Elasticsearch by default, so you need
to enable the ES sink in your Griffin jobs.
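A minimal sketch of what enabling the ES sink in a job could look like, written in Python for illustration. The key names ("sinks", "ELASTICSEARCH", "CONSOLE"), the sample job, and the helper `enable_sink` are assumptions rather than something stated in this thread, so please check them against the configuration guide for your Griffin version.

```python
import json


def enable_sink(job_config: dict, sink_type: str = "ELASTICSEARCH") -> dict:
    """Return a copy of job_config with sink_type present in its sink list.

    Assumption: the job configuration holds enabled sinks as a list of
    type names under the key "sinks".
    """
    updated = dict(job_config)
    sinks = list(updated.get("sinks", []))
    if sink_type not in sinks:
        sinks.append(sink_type)
    updated["sinks"] = sinks
    return updated


# Hypothetical job configuration that only logs metrics to the console.
job = {"name": "accu_batch", "process.type": "batch", "sinks": ["CONSOLE"]}

if __name__ == "__main__":
    print(json.dumps(enable_sink(job), indent=2))
```

The helper leaves the input untouched and returns a new dict, so the original job file content can still be compared against the updated one before it is written back.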
--
Regards,
Lionel, Liu
At 2019-06-15 08:29:02, "Jayashree Mohanta" wrote:
>Hi,
>
>Can you please tell me how to use Griffin
, how to fetch
the data schemas?
Maybe something like schema management is what we need.
Thanks
Lionel, Liu
From: Johnnie ZHANG
Sent: June 12, 2019 14:56
To: dev@griffin.apache.org
Subject: Re: [DISCUSS]Alternatives way to access hive metadata?
Hi All,
I think this is reasonable and it would be
Hi Qian,
Thanks for your information, we will have a look at this.
Thanks,
Lionel
On Tue, May 7, 2019 at 11:26 AM Qian Wang wrote:
> Hi,
>
> When I followed the guide of
> https://github.com/apache/griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md
> to
> deploy the griffin docker
+1
I verified the following
- signature
- hash
- LICENSE
- NOTICE
- Compile and build
- third party license
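The hash check in the list above can be sketched in Python as follows. This assumes the release publishes a SHA-512 checksum file whose first whitespace-separated token is the hex digest; the helper names and file names are hypothetical, and some releases use a different checksum layout that this sketch would not parse.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact's digest with the published checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace and the file name.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

A call such as `checksum_matches("source-release.zip", "source-release.zip.sha512")` returns a boolean; signature verification is a separate step and is not covered by this sketch.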
Thanks
Lionel
On Mon, Apr 8, 2019 at 8:39 PM Kevin Yao wrote:
> +1
>
> I checked:
>
> - signature file and hash file correct
> - LICENSE and NOTICE look good
> - mvn apache-rat:che
Yes, that’s a good feature.
I think you can add it in all the sink types.
Thanks
Lionel, Liu
From: Nick Sokolov
Sent: February 20, 2019 1:11
To: dev@griffin.apache.org
Subject: Re: ElasticSearchSink modification question
Sounds interesting!
Looks like it also allows to implement API to retrieve
Hi Griffin's Folks,
This is the draft of our board report, please comment if you have any questions.
## Description:
- Apache Griffin is an open source Data Quality solution for Big Data,
which supports both batch and streaming mode. It offers a unified process
to measure your data quality from differen
che/griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/configuration/enums/SinkType.scala
Thanks
Lionel, Liu
From: Jeff Zemerick
Sent: February 10, 2019 23:56
To: dev@griffin.apache.org
Subject: Re: publish data to new Kafka topic
Yes, a "data filter" describes it well. I th
you data?
Thanks
Lionel, Liu
From: Jeff Zemerick
Sent: February 6, 2019 21:41
To: dev@griffin.apache.org
Subject: publish data to new Kafka topic
Hi Griffin devs,
Continuing my email thread from users@ and to better clarify it, I have a
Kafka topic with JSON data on it. I would like to perform quality
That’s right, Griffin relies on Spark SQL operations, which transform one
data frame into another. For operations that Spark SQL cannot cover, some
pre-defined “df-ops” can help.
Users can implement their own “df-ops” for such specific operations.
Thanks
Lionel
I agree with the separation of Griffin-DSL and Spark-SQL. I have some
concerns and suggestions on the details:
1. The rule in the "accuracy" example above is only part of a SQL statement,
not a complete one, so users would be confused if the "dsl.type" is set as
"spark-sql".
2. The benefits to separate Griffin-DSL an
Hi XiaoDong,
Griffin doesn’t currently support alerting internally. We recommend that users
leverage the alert configuration in Elasticsearch, which is the default metric
storage in Griffin.
Thanks
Lionel, Liu
From: 查晓东
Sent: January 23, 2019 22:02
To: dev
Subject: ask for question
hi
The
>
> In other words, I did not set the IP of es server in /etc/hosts on the
> yarn server, so the metric save request could not reach the es server and
> the metric data was lost
>
> On 01/17/2019 16:52,Lionel Liu
> wrote:
>
> I'm not sure about it, maybe I'm
Hi DaPeng,
The parameter "checkpoint.dir" in the Spark configuration only applies to Spark
Streaming calculation in streaming mode; the directory holds the Spark
checkpoint data for failure recovery of Spark Streaming applications.
Thanks,
Lionel
On Thu, Jan 17, 2019 at 11:42 AM 大鹏 <18210146...@1
n. It should be that I did not configure es
> service IP on yarn server, so there is no metric data in es
> On 01/15/2019 09:34,Lionel Liu
> wrote:
>
> I think you need to check the logs of livy and spark applications.
> In livy log, you can find how many jobs are submitted to spark
points, if the
job actually executed, to find the error message in the log, then we can
get more information about this case.
Thanks,
Lionel
On Mon, Jan 14, 2019 at 1:36 PM 大鹏 <18210146...@163.com> wrote:
>
> This is attachment
>
> On 01/14/2019 13:22,Lionel Liu
>
Hi DaPeng,
Griffin reads your data, executes the rule steps on the data, then persists
the metrics.
If there's any exception, such as data that cannot be found or an execution
error, the rule step might fail, and the following steps will not succeed
either. The metrics are collected after the last step, thus there
maybe it can help
you.
https://github.com/apache/griffin/pull/463
Thanks
Lionel, Liu
From: Prachi Kishore Hunnargikar
Sent: January 9, 2019 16:22
To: dev@griffin.apache.org
Cc: Selvaraj K
Subject: RE: Griffin - how to configure authentication for
ElasticSearchentity/http request
Hello Griffin
votes.
> >
> >
> > The tally is as follows.
> > 4 binding +1s:
> > * William Guo
> > * Eugene Liu
> > * Jason Liao
> > * Lionel Liu
> >
> >
> > No 0s or -1s.
> >
> >
> > Therefore I am delighted to announ
[
https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734199#comment-16734199
]
Lionel Liu commented on GRIFFIN-223:
Hi [~dna.dgw.engg], you can send emails to
Hi Zhen,
Sorry for that, maybe there’s some issue because I sent the email via my phone
yesterday.
Thanks
Lionel, Liu
From: Zhen Li
Sent: January 4, 2019 11:13
To: Lionel Liu
Cc: dev@griffin.apache.org
Subject: Re: griffin technical discussion
Hi Lionel,
Excuse me, but I didn’t see the QR code in mail attachment
Hi Dapeng,
Thanks for your interest. We have a WeChat group for Griffin users; the QR code
is attached.
Thanks,
Lionel
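On 01/03/2019 11:37, 大鹏 wrote:
Hi, I am an engineer from the architecture department at 神州优车. We are planning to use Griffin to build a data quality monitoring platform. Since there is little material available online, is there a WeChat group or some other way to get in touch? I would like to learn and study it further.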
[
https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733043#comment-16733043
]
Lionel Liu commented on GRIFFIN-223:
That's cool to be able to submit job
[
https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lionel Liu updated GRIFFIN-223:
---
Comment: was deleted
(was: Hi [~dna.dgw.engg], I still think it is some issue about livy
[
https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733023#comment-16733023
]
Lionel Liu commented on GRIFFIN-223:
Hi [~dna.dgw.engg], I still think it is
+1
I've checked:
- CHANGES.txt updated
- source-release.zip and pom files are listed
- no md5 or sha1 files
- LICENSE file is good
- NOTICE file is good
- signature file is good
- hash file is good
- licenses in file header check success
- source compile success
- third-party licenses are good
PS
[
https://issues.apache.org/jira/browse/GRIFFIN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729637#comment-16729637
]
Lionel Liu commented on GRIFFIN-223:
Hi [~dna.dgw.engg], for the livy error, w
[
https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lionel Liu resolved GRIFFIN-224.
Resolution: Resolved
> No JSON file was found in docker contai
[
https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722426#comment-16722426
]
Lionel Liu commented on GRIFFIN-224:
Yes, the file copy happens in the star
[
https://issues.apache.org/jira/browse/GRIFFIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722359#comment-16722359
]
Lionel Liu commented on GRIFFIN-224:
There should be the json files on hdfs,
LGTM
Thanks,
Lionel
On Sun, Dec 9, 2018 at 12:25 PM William Guo wrote:
> Hi All,
>
> Here is the draft of the board report this month (We need to submit it
> to the board before 12 Dec). Please let me know if anything is
> missing or you have any questions about it.
>
> ## Description:
> - Ap
nd after that job got succeeded.
>> There are 23 columns in table and both source and target has same number
>> of columns.
>>
>> But If I use data with all the columns, it’s failing.
>>
>> Thanks,
>> Dhiren
>>
>> From: Lionel Liu
>> Sent
Hi Dhiren,
It seems that resources are not your bottleneck.
I suspect there might be some data skew in your data; you can check this by
monitoring the running status of the Spark job.
If there’s data skew, there are several ways to fix it, and you can also google
for them.
Thanks
Lionel, Liu
From: Dhiren
Hi Dhiren,
How many resources are you using when you submit this job?
For large scale data, the simple solution is to use more resources for the
calculation job.
For limited resources, a common solution is to partition your data by date
or hour, to make it smaller in each partition, then you can calc
+1
I've checked:
- incubating in the name of release files
- CHANGES.txt exists
- signature verified
- checksum good
- LICENSE, NOTICE, DISCLAIMER are good
- apache-rat:check SUCCESS for licenses in the head of each source file
- source compile SUCCESS
- Third-Party Licenses are good
Since Griffi
[
https://issues.apache.org/jira/browse/GRIFFIN-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lionel Liu reassigned GRIFFIN-215:
--
Assignee: Yuqin Xuan
> Accuracy Rate in measure-detail.component.html maybe wr
Cool, we're starting a new journey. Congratulations to every
contributor! Let's make it better.
Thanks,
Lionel
On Fri, Nov 23, 2018 at 2:39 AM William Guo wrote:
> Hi all,
>
> As you all probably are aware, this week ASF's board
> of directors passed a resolution to establish a Griffin TLP.
Great news, it's a big step. Let's make it better.
Thanks,
Lionel
On Thu, Nov 22, 2018 at 4:53 PM William Guo wrote:
> Hi all.
>
> Status updated.
>
> Thanks,
> William
>
> -- Forwarded message -
> From: Phil Steitz
> Date: Fri, Nov 23, 2018 at 1:03 AM
> Subject: ASF Board Meet