from:"fmcquillan99"

[GitHub] madlib issue #342: Minibatch Preprocessor for Deep learning

2018-12-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/342 https://issues.apache.org/jira/browse/MADLIB-1290 associated JIRA ---

[GitHub] madlib issue #329: Release/prep 1.15.1

2018-10-03 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/329 Comments on release notes: (1) MADLIB-1171 does not apply to AO tables, maybe a typo in this release note: "Build: Disable AppendOnly if available (MADLIB-1171, MADLIB

[GitHub] madlib issue #321: RF: Increase the dataset size of dev-check test

2018-09-21 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/321 in that case LGTM ---

[GitHub] madlib issue #321: RF: Increase the dataset size of dev-check test

2018-09-21 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/321 Does this fix the sporadic IC/DC issues that we have been seeing with RF? ---

[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-13 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/319 ok, thx LGTM ---

[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-13 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/319 @rahiyer doesn't this line mean vectorization is turned off for PG 9.x+ too? `#if PG_VERSION_NUM >= 9` ---
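For background on the question above: PostgreSQL defines `PG_VERSION_NUM` as a single integer. For 9.x and earlier it is encoded as major*10000 + minor*100 + patch (9.6.3 is 90603), and from version 10 on as major*10000 + minor (10.1 is 100001). A guard such as `PG_VERSION_NUM >= 9` therefore compares against the integer 9 and holds for every release. The Python sketch below mimics that encoding with a hypothetical helper, `pg_version_num`; it is an illustration, not code from MADlib or PostgreSQL.

```python
def pg_version_num(major: int, minor: int, patch: int = 0) -> int:
    """Mimic PostgreSQL's PG_VERSION_NUM encoding (hypothetical helper).

    Before 10: major*10000 + minor*100 + patch (9.6.3 -> 90603).
    10 and later: major*10000 + minor (10.1 -> 100001).
    """
    if major >= 10:
        return major * 10000 + minor
    return major * 10000 + minor * 100 + patch


for version in [(8, 4, 1), (9, 6, 3), (10, 1), (11, 0)]:
    num = pg_version_num(*version)
    # ">= 9" compares against the integer 9, not "version 9",
    # so it is satisfied by every real release; ">= 90000" is not.
    print(version, num, num >= 9, num >= 90000)
```

Comparing against 90000, by contrast, separates 8.x from 9.0 and later.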

[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-12 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/319 So this PR only affects GP 6+? That means GP 4.3.x, GP 5 and all supported PG versions will continue to work as is, and use Eigen vectorization if the underlying infrastructure supports it? ---

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 re-running the failed test, seems to pass now: ``` SELECT * FROM knn_result_list_neighbors ORDER BY id; ``` produces ``` id | data | k_nearest_neighbours

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what this is doing` because forcing all training data to be a single point means that the distance to all test poin

[GitHub] madlib issue #317: Fixed trailing whitespace in many sql_in files

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/317 then let's merge it ---

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 (1) expression for test data array: ``` DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data', -

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 I'm not sure what this is doing: ``` %%sql DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data',

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 load data: ``` DROP TABLE IF EXISTS knn_train_data; CREATE TABLE knn_train_data ( id integer, data integer

[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215462116 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name

[GitHub] madlib issue #317: Fixed trailing whitespace in many sql_in files

2018-09-05 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/317 Please add a hook to remove trailing whitespace in `*.sql_in` files automatically. I have changed my Sublime settings to remove trailing whitespace, but other people may edit docs
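The hook requested above could be as small as the following Python sketch. It is an assumption, not an existing MADlib script: it walks a directory tree, strips trailing spaces and tabs from every `*.sql_in` file, and leaves line endings untouched. Something like it could be called from a git pre-commit hook; the function names are hypothetical.

```python
import pathlib
import re
import sys

# Spaces or tabs that sit right before a line break or at the end of the text.
_TRAILING_WS = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line of text."""
    return _TRAILING_WS.sub("", text)


def clean_files(root: str = ".", pattern: str = "*.sql_in") -> list:
    """Rewrite files under root that match pattern; return the changed paths."""
    changed = []
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        # newline="" disables newline translation, so CRLF files stay CRLF.
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(cleaned)
            changed.append(str(path))
    return changed


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in clean_files(target):
        print("stripped trailing whitespace:", name)
```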

[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates

2018-09-04 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/313 is this ready to merge? ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 Thanks @njayaram2 for the clarification. ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 So this requires Alien, but we do not automatically download or bundle Alien, correct? ---

[GitHub] madlib pull request #:

2018-08-17 Thread fmcquillan99

Github user fmcquillan99 commented on the pull request: https://github.com/apache/madlib/commit/5e707f745c50343dd7395a3e8f86c04428210977#commitcomment-30142753 Also fixed some spacing issues ---

[GitHub] madlib issue #308: Release: Release Notes for v1.15

2018-08-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/308 LGTM ---

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-08-01 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 LGTM, here is an RF example: ``` SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC; am | feature | oob_var_importance | impurity_var_importance

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 thanks, that makes sense. I added a type casting example to the user docs. LGTM ---

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 Wondering about the order for varchar and text casting. For this data set: ``` DROP TABLE IF EXISTS golf CASCADE; CREATE TABLE golf ( id int, "OUTLOOK"

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 Where did we land on the boolean casting issue? Testing on Greenplum 5, I see: ``` (psycopg2.ProgrammingError) plpy.SPIError: ARRAY types boolean and text cannot be matched

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/298#discussion_r205310071 --- Diff: doc/mainpage.dox.in --- @@ -100,13 +86,14 @@ complete matrix stored as a distributed table. @defgroup grp_matrix Matrix

[GitHub] madlib issue #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/298 This should be ready to merge if it looks OK. I don't have any other 1.15 doc related items to deliver. ---

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-24 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/298 misc 1.15 user doc updates Added descriptions to the left panel for modules that were missing them. Fixed typos and formatting in various places. Cleaned up the main user doc page and removed links

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-23 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec and vec2cols, ordering has been fixed so new columns are always on the right of the source table columns in the output (if any). In cols2vec, casting seems OK now. I tested

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-20 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 I like this last suggestion from @iyerr3, that we report raw values for oob and impurity VI in the model output file. (OK to keep the shifted oob > 0 as we do now.) For the hel

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-20 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 (1) Now I think it is casting all numeric to DOUBLE and all non-numeric to TEXT? But if all the columns are INT, should not cast them to DOUBLE, rather should create an array of INTs
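The rule suggested in (1) can be sketched as a type-selection function. This Python sketch is hypothetical (the helper name and the exact type sets are assumptions, not how `cols2vec` is implemented): integer columns stay integers, mixed numeric columns widen to `double precision`, and anything else falls back to `text`.

```python
INTEGER_TYPES = {"smallint", "integer", "bigint"}
NUMERIC_TYPES = INTEGER_TYPES | {"real", "double precision", "numeric"}


def choose_array_type(column_types):
    """Pick an element type for packing several columns into one array.

    Hypothetical rule: all-integer input stays integer, mixed numeric input
    widens to double precision, and anything else becomes text.
    """
    types = {t.strip().lower() for t in column_types}
    if not types:
        raise ValueError("need at least one column type")
    if types <= INTEGER_TYPES:
        # Widen to bigint only when a bigint column is present.
        return "bigint[]" if "bigint" in types else "integer[]"
    if types <= NUMERIC_TYPES:
        return "double precision[]"
    return "text[]"


print(choose_array_type(["integer", "integer"]))
print(choose_array_type(["integer", "double precision"]))
print(choose_array_type(["integer", "text"]))
```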

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Another run I got ``` grp 0 grp1 31.01364943 31.6576 22.85881741

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Should impurity_var_importance always add up to 100? From the regression example in the user docs: ``` DROP TABLE IF EXISTS mt_imp_output; SELECT madlib.get_var_importance

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 After the above 2 issues I mentioned are fixed, I will have 1 more commit on user docs to this PR ---

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In vec2cols, ``` SELECT madlib.vec2cols( 'golf', -- source table 'vec2cols_result',

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec, For this table: ``` CREATE TABLE golf ( id integer NOT NULL, "OUTLOOK" text, temperature double precision, humidity double

[GitHub] madlib issue #289: RF: Add impurity variable importance

2018-07-17 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/289 ``` The model table produced by the training function contains the following columns: gid INTEGER. Group id that uniquely identifies a set of grouping column values. sample_id

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-12 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 user docs seem incomplete ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-12 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 Ah, I see. I think it is fine as you have put it. LGTM ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 looks like user docs lost the params description for dropcols() ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 There is a bit of inconsistency related to the last param `cols_to_drop` ``` SELECT madlib.dropcols( 'houses',

[GitHub] madlib issue #287: Fix incorrect dict expansion in table header

2018-07-11 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/287 This latest commit makes the following changes to user docs: 1) clarify cv for SVM and add user examples 2) clarify cv for elastic net and fix user examples 3) correct rmse calc

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 Updated my comment above to remove the rows processed and skipped. ---

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 Since we are writing out a summary table, may as well add more info in it. {code} A summary table named _summary is also created at the same time, which has the following columns

[GitHub] madlib pull request #288: Jira:1239: Converts features from multiple columns...

2018-07-05 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/288#discussion_r200510366 --- Diff: src/ports/postgres/modules/cols_vec/cols2vec.py_in --- @@ -0,0 +1,110 @@ +""" +@file cols2vec.py_in + +@b

[GitHub] madlib issue #270: Jira 1172

2018-05-31 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/270 @rahiyer @iyerr3 please review this PR. thanks ---

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-16 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 Thanks for the explanation. I pushed one additional small commit that changes the name of the module from "Pearson's Correlation" to "Covariance and Correlat

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-11 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 Also wondering what the nature of the testing is to ensure that covariance and correlation are being calculated properly with the added groups? ---

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-11 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 (1) ``` DROP TABLE IF EXISTS example_data_output, example_data_output_summary; SELECT madlib.correlation( 'example_data', 'exam

[GitHub] madlib issue #267: Multiple: Remove support for HAWQ from all modules

2018-05-04 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/267 There is some reference to HAWQ in https://github.com/apache/madlib/blob/master/ReadMe_Build.txt which I don't see removed in the PR. Otherwise seems OK though I did not do

[GitHub] madlib issue #265: Release: Add v1.14 release notes

2018-04-19 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/265 There are 33 JIRAs in 1.14 https://issues.apache.org/jira/projects/MADLIB/versions/12342305 but only about 26 or 27 JIRAs are listed in these release notes. If that is

[GitHub] madlib pull request #264: updated pagerank docs for PPR, minor formating and...

2018-04-17 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/264 updated pagerank docs for PPR, minor formating and such 1) minor formatting improvements 2) added a reference for PPR and changed the PageRank reference to point to the paper rather than Wikipedia You can merge

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 OK done now ---

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 4. Clarified grouping as per https://github.com/apache/madlib/pull/263 This is the final change, so you can review and merge if it looks good. ---

[GitHub] madlib issue #263: Bugfix/mlp minibatch grouping

2018-04-17 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/263 I tested this quite a bit and it seems to work nicely for me. LGTM ---

[GitHub] madlib issue #257: mini-batch user docs

2018-04-15 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 1) Updated minibatch docs to show use of encoding scalar integer dep var 2) Added minibatch examples and explanations to MLP 3) Reduced the number of redundant

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-10 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 LGTM Default selection looks reasonable: (0) data DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #255: MLP: Remove source table dependency for predicting regres...

2018-04-10 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/255 LGTM, see https://issues.apache.org/jira/browse/MADLIB-1223 for tests i ran ---

[GitHub] madlib pull request #257: mini-batch user docs

2018-04-06 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/257 mini-batch user docs This commit is for the preprocessor user docs. MLP user doc updates to follow in subsequent commit. Can someone please review this content? thx You can

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Oh I see, with the averaging approach: buffer_size = avg_num_rows_per_segment / num_segments = 21.5 / 2 = 10.75 and rounding up
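The arithmetic quoted above can be reproduced directly. This Python sketch uses the formula exactly as written in the comment (`avg_num_rows_per_segment / num_segments`, then rounding up); the function name is hypothetical, and the formula MADlib actually adopted in this PR may differ.

```python
import math


def default_buffer_size(avg_num_rows_per_segment: float, num_segments: int) -> int:
    """Apply the formula quoted in the comment, rounding up to a whole row count."""
    return math.ceil(avg_num_rows_per_segment / num_segments)


print(default_buffer_size(21.5, 2))
```

Rounding up rather than down ensures a fractional result never yields a buffer smaller than the computed average.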

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Is this expected behavior? The last group for NJ gets only 1 observation ``` DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-05 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 We seem to be computing the batch size using the master, but we should probably consider only the number of segments. ---

[GitHub] madlib issue #251: MLP: Simplify initialization of model coefficients

2018-04-04 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/251 Using the data set from http://madlib.apache.org/docs/latest/group__grp__nn.html#example the warm start seems to be functioning OK in the sense that it is picking up where it left off

[GitHub] madlib issue #250: MLP: Allow one-hot encoded dependent var for classificati...

2018-04-04 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/250 See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples showing this works for IGD and mini-batch. LGTM. I think you can go ahead and merge this PR to master ---

[GitHub] madlib pull request #252: leftover minor RF user doc update

2018-03-28 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/252 leftover minor RF user doc update A few remaining RF user doc changes I missed in https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834 Also added

[GitHub] madlib issue #246: DT and RF user doc updates

2018-03-26 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 https://issues.apache.org/jira/browse/MADLIB-1217 https://issues.apache.org/jira/browse/MADLIB-1218 https://issues.apache.org/jira/browse/MADLIB-1219 have all been fixed so I made

[GitHub] madlib issue #249: RF: Use NULL::integer[] when no continuous features

2018-03-26 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/249 See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my tests. LGTM ---

[GitHub] madlib issue #248: DT: Ensure proper quoting in grouping coalesce

2018-03-23 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/248 I checked against the examples in MADLIB-1217 and MADLIB-1218, and both work OK for me. So from the perspective of the functional fix, LGTM. Other

[GitHub] madlib issue #246: DT user doc updates

2018-03-22 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 @rahiyer RF docs ready for review too. ---

[GitHub] madlib issue #246: DT user doc updates

2018-03-22 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 Waiting on these bugs to be fixed before I can finish: https://issues.apache.org/jira/browse/MADLIB-1217 https://issues.apache.org/jira/browse/MADLIB-1218 https://issues.apache.org

[GitHub] madlib issue #242: PCA: Fix issue with text grouping col input

2018-03-21 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/242 LGTM, this can be merged ---

[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176152043 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -418,7 +468,10 @@ tree_predict(tree_model

[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176150844 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -127,7 +132,11 @@ tree_train( weights (optional

[GitHub] madlib pull request #246: DT user doc updates

2018-03-20 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/246 DT user doc updates @rahiyer please review DT user doc updates Will start working on RF in parallel. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-20 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/244#discussion_r175833176 --- Diff: src/ports/postgres/modules/graph/pagerank.sql_in --- @@ -120,6 +121,10 @@ distribution per group. When this value is NULL, no grouping is

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173254469 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173239594 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173238804 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172949439 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172922334 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921714 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921328 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920935 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920825 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920581 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -149,8 +149,10 @@ non-stratified, that is, the whole table is treated as a

[GitHub] madlib issue #238: MLP: Use array_upper to get the last array element

2018-02-22 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/238 LGTM ---

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-21 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- So

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-20 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 I was just testing 1.13 on Postgres 9.6 and found this error ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-16 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 Similarly ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- So

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-16 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- So

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-15 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/235#discussion_r168557191 --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in --- @@ -208,13 +208,26 @@ forest_train(training_table_name

[GitHub] madlib pull request #232: Multiple LDA improvements and fixes

2018-02-14 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/232#discussion_r168248554 --- Diff: src/ports/postgres/modules/lda/lda.sql_in --- @@ -182,324 +105,789 @@ lda_train( data_table, \b Arguments data_table

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-13 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/235 update KNN, DT and RF docs to match recent commits KNN * describe weighted average in more detail DT & RF * correct some doc errors and omissions * update example to

[GitHub] madlib issue #232: Multiple LDA improvements and fixes

2018-02-08 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/232 Functional tests of these 4 commits look fine to me. I added comments and examples in: MADLIB-1160 MADLIB-1201 Will create a PR for associated user doc changes shortly. ---

[GitHub] madlib issue #231: RF: Output non-negative importance values

2018-02-06 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/231 Does this mean, then, that all var importance values are >= 0 now, and that the largest positive value corresponds to the most "important" variable? Also, what is the rang

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-16 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Regarding (2) and (3) above, it looks like it does not fail with `'red:7, blue:7'`, but the MADlib convention is `'red=7, blue=7'`, so this needs to change to use `=`. (4)
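The `=` convention mentioned above amounts to parsing a comma-separated list of `key=value` pairs. The Python sketch below is a hypothetical helper, not MADlib's parser; it rejects the `'red:7, blue:7'` form instead of silently accepting it.

```python
def parse_class_sizes(spec: str) -> dict:
    """Parse 'key=value' pairs separated by commas, e.g. 'red=7, blue=7'."""
    sizes = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            # 'red:7' has no '=', so it is rejected rather than misread.
            raise ValueError(f"expected 'key=value', got {item!r}")
        sizes[key.strip()] = int(value)
    return sizes


print(parse_class_sizes("red=7, blue=7"))
```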

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Can you please double check that install checks are robust with respect to different Python rounding on different hardware? ---

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99

Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Started testing, some early observations: (1) class_size default should be "uniform", it seems to be set to "undersample" currently (2) ` SELECT

[GitHub] madlib pull request #222: minor update to summary() user docs

2018-01-02 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/222 minor update to summary() user docs to finish off https://issues.apache.org/jira/browse/MADLIB-1167 You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] madlib-site pull request #10: 1dot13 website updates

2017-12-29 Thread fmcquillan99

Github user fmcquillan99 closed the pull request at: https://github.com/apache/madlib-site/pull/10 ---

[GitHub] madlib pull request #220: Add more stats to summary function

2017-12-22 Thread fmcquillan99

Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/220#discussion_r158572631 --- Diff: src/ports/postgres/modules/summary/Summarizer.py_in --- @@ -199,6 +200,22 @@ class Summarizer: args['max_columns']


1 - 100 of 133 matches
