Re: [DISCUSS] How to improve community activity?

2024-01-19 Thread Nick Sokolov
Do we roughly know current adoption numbers / artifact downloads, and understand what would improve them? I feel community activity (both conversations and contributions) is strongly tied to the usefulness of the project, and if there are any adoption blockers we should identify those and fix

Re: Apache Griffin connection to back end

2019-08-08 Thread Nick Sokolov
There is support for data source plugins. You can implement one and supply it as an extra jar in the Livy config - you just need to implement the interface and return a DataFrame created in the plugin. On Thu, Aug 8, 2019, 15:24 Qian Wang wrote: > Hi Rajiv, > > What kind of data source do you want to use? You can im
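As a rough sketch of how such a plugin might be wired into a data source definition: the connector `type`, the `class` key, and the class name `com.example.MyDataConnector` below are all assumptions for illustration (the exact keys depend on the Griffin version), not the authoritative schema:

```json
{
  "data.sources": [
    {
      "name": "src",
      "connectors": [
        {
          "type": "custom",
          "config": {
            "class": "com.example.MyDataConnector"
          }
        }
      ]
    }
  ]
}
```

The jar containing the plugin class would then be supplied to Spark via the extra-jars setting in the Livy configuration, as described above.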

Re: Griffin dashboard

2019-04-16 Thread Nick Sokolov
Unfortunately no, but it's possible to use third-party tools that work with Elasticsearch for that, for example Grafana. On Tue, Apr 16, 2019, 17:50 zhaorongsheng wrote: > Dear all, > Can the griffin dashboard show profiling results? > > Thanks!

Documentation for merge process

2019-03-13 Thread Nick Sokolov
Hi! Question to the committers: is there any documentation for 1) the approval process (how many approvals a PR should get to be merged), and 2) the process to merge a PR? There is a bunch of PRs that I think are ready to merge, and I'd like to follow the right process for them. Specifically: - https://

Re: Anomaly detection in Griffin

2019-03-12 Thread Nick Sokolov
From the git history of the Griffin repository, it does not look like it was ever fully there, at least in the observable part of the history. However it should be possible to hook available anomaly detection libraries into Griffin via a custom DSL implementation. On Mon, Mar 11, 2019, 15:39 Vissapragad

Re: Output in Griffin.

2019-03-09 Thread Nick Sokolov
Hi! As far as I understand, you want to save records from the job. To do that, you can use the "record" output provided in the "out" section of the rule. For spark-sql profiling, everything returned by
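A minimal sketch of such a rule: the "out" section and "record" type come from the message above, while the rule text and the names `null_emails` / `null_email_records` are made-up placeholders:

```json
{
  "dsl.type": "spark-sql",
  "out.dataframe.name": "null_emails",
  "rule": "SELECT * FROM src WHERE email IS NULL",
  "out": [
    { "type": "record", "name": "null_email_records" }
  ]
}
```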

Re: Apache Griffin inquiry

2019-03-09 Thread Nick Sokolov
For 1) and 2), alerting based on metrics and profiling graphs, Grafana works pretty well in my experience. The problem in 4) can be solved in several ways: - "predicates": it is possible to configure a data source with predicate logic, which will be checked before the job gets started. However there are fe
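A sketch of what such a predicate might look like on a data source connector. The `"predicates"` key is from the message above; the `file.exist` predicate type, the timestamp placeholder syntax, and all paths and table names are assumptions for illustration and may differ by Griffin version:

```json
{
  "name": "src",
  "connectors": [
    {
      "type": "hive",
      "config": { "database": "default", "table.name": "demo_src" },
      "predicates": [
        {
          "type": "file.exist",
          "config": {
            "root.path": "hdfs:///data/demo_src",
            "path": "/dt=#YYYYMMdd#/_DONE"
          }
        }
      ]
    }
  ]
}
```

The idea is that the job only starts once the `_DONE` marker file for the partition exists, so incomplete data is never measured.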

Re: Inquiry

2019-03-09 Thread Nick Sokolov
I think, if there is a clear use case, it makes sense to document it in Jira. For 1), it sounds like it requires some "state" to be preserved between job runs, something that is not available directly. This theoretically can be done with an elasticsearch input reading the latest previous result of the j

Re: JobController extension

2019-03-01 Thread Nick Sokolov
Getting a particular entity by id is a very natural part of any API, makes total sense to me. Together with GRIFFIN-229 it would allow waiting for job completion, unlocking lots of potential use cases. On Fri, Mar 1, 2019 at 7:01 AM Dmitry Ershov wrote: > Hi all, > > in addition to task Griffin-229

Re: ElasticSearchSink modification question

2019-02-19 Thread Nick Sokolov
Sounds interesting! Looks like it would also allow implementing an API to retrieve metric values for a particular job instance (by correlating results using this id). On Tue, Feb 19, 2019 at 8:45 AM Dmitry Ershov wrote: > Hi all, > > We are considering using Griffin in our project and facing a probl

Re: Measure creation with DSL Type as "DF-OPS"

2019-02-07 Thread Nick Sokolov
I did not see any documentation on it, but from the source code, it does some pre-defined transformation based on the "rule" parameter (from_json, clear, accuracy), with in.dataframe.name as input and out.dataframe.name as output. The transformations themselves are defined in DataFrameOps.scala
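Putting that together, a df-ops rule might look roughly like the fragment below. The keys `rule`, `in.dataframe.name`, and `out.dataframe.name` are taken from the description above; the dataframe names are made-up placeholders, and a given transformation may require additional parameters not shown here:

```json
{
  "dsl.type": "df-ops",
  "rule": "from_json",
  "in.dataframe.name": "raw_stream",
  "out.dataframe.name": "parsed_stream"
}
```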

Re: Simplify Griffin-DSL implementation

2019-01-29 Thread Nick Sokolov
I think we need to maintain backward compatibility or provide easy (automated?) migration -- otherwise existing users will be stuck in older versions. On Tue, Jan 29, 2019 at 2:28 PM William Guo wrote: > Thanks Grant. > > I agree Griffin-DSL should leverage spark-sql for sql part , and > Griffin

Re: No partition columns under Hive Table

2019-01-19 Thread Nick Sokolov
Hello Zhen, If the question is about the visibility of such a column in the UI for measure creation, this is very likely a bug in partition column handling, where the API reports the list of columns. In either case, regardless of what the UI shows, you can create measures on top of such columns by composing respe
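For example, a spark-sql rule can reference a partition column directly even if the UI doesn't list it. In this sketch the table `src`, the partition column `dt`, and the partition value are all hypothetical:

```json
{
  "dsl.type": "spark-sql",
  "out.dataframe.name": "partition_profile",
  "rule": "SELECT dt, COUNT(*) AS cnt FROM src WHERE dt = '20190119' GROUP BY dt"
}
```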

Re: api problem

2019-01-11 Thread Nick Sokolov
What's your request body? The easiest way would probably be to trace it to the message text returned. One unobvious API aspect: all enums should be capitalized (for example, "PROFILING" instead of "profiling"). On Fri, Jan 11, 2019, 11:59 Eugene Liu wrote: > Do you know why 40111 is returned after a job
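To illustrate the capitalization point only: in a request body along these lines, the enum values must be uppercase. The field names below are assumptions for illustration (check them against your Griffin version's API), while the `"PROFILING"` capitalization is the point from the message above:

```json
{
  "name": "demo_profiling_measure",
  "process.type": "BATCH",
  "dq.type": "PROFILING"
}
```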

Re: Griffin Integration With AWS S3

2019-01-09 Thread Nick Sokolov
Small side note: if you are storing data as hive tables backed by S3, and if you are running on a cluster already configured to work with your S3 buckets, no additional changes will be needed. I'm using Griffin on GCP in a similar situation, everything worked out of the box. On Wed, Jan 9, 2019, 10:23

Re: MySQL data source statistics

2019-01-07 Thread Nick Sokolov
There is no built-in JDBC source, but there is a plugin mechanism for data sources, so you can implement a custom one. Another alternative would be to use a third-party hive-to-jdbc adapter, like this: https://docs.qubole.com/en/latest/user-guide/hive/hive-connectors/JDBC-connector.html , to use stand

Re: [DISCUSS] hive server2 vs hive metastore

2018-12-08 Thread Nick Sokolov
Is that just for Service or for Measure as well? As far as I understand, Spark relies on direct metastore availability, and there is no good way around it for Measure. If that's just for Service, it might be tricky to parse out all the column metadata correctly, as there might be differences in the way

Re: Accuracy measure fails on large dataset

2018-12-04 Thread Nick Sokolov
The error looks like Spark is trying to do a broadcast join, but the data size for the broadcast is too large. It makes sense to try adjusting the Spark properties in Griffin to disable auto broadcast joins, or to adjust the broadcast join threshold. On Mon, Dec 3, 2018, 9:20 AM Dhiren Sangani wrote: > Hi Lionel, > > T
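`spark.sql.autoBroadcastJoinThreshold` is the standard Spark SQL property for this: setting it to `-1` disables automatic broadcast joins, or a byte value raises the threshold. Where exactly this block sits in the Griffin/Livy job configuration depends on your setup, so the `"sparkProperties"` key below is an assumption:

```json
"sparkProperties": {
  "spark.sql.autoBroadcastJoinThreshold": "-1"
}
```

Alternatively, a value like `"104857600"` (100 MB) would raise the threshold instead of disabling broadcast joins entirely.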