Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/342
https://issues.apache.org/jira/browse/MADLIB-1290
associated JIRA
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/329
Comments on release notes:
(1)
MADLIB-1171 does not apply to AO tables, maybe a typo in this release note:
"Build: Disable AppendOnly if available (MADLIB-1171, MADLIB
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/321
in that case LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/321
Does this fix the sporadic IC/DC issues that we have been seeing with RF?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/319
ok, thx
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/319
@rahiyer doesn't this line mean vectorization is turned off for PG 9.x+ too?
`#if PG_VERSION_NUM >= 9`
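For context on the question above: PostgreSQL encodes `PG_VERSION_NUM` as a multi-digit integer (for example 90603 for 9.6.3), so a comparison against a small literal such as 9 holds for every release. A minimal Python sketch of that arithmetic follows; `pg_version_num` is a hypothetical helper written for illustration and is not part of MADlib or PostgreSQL, and it only models the pre-10 numbering scheme.

```python
def pg_version_num(major, minor, patch=0):
    """Illustrative helper: mimic the pre-10 PG_VERSION_NUM encoding (9.6.3 -> 90603)."""
    return major * 10000 + minor * 100 + patch


for version in [(8, 4, 0), (9, 0, 0), (9, 6, 3)]:
    num = pg_version_num(*version)
    # Every encoded value is far larger than 9, so `num >= 9` cannot
    # tell releases apart, while `num >= 90000` selects 9.0 and later.
    print(version, num, num >= 9, num >= 90000)
```

Under this encoding, a check meant to select 9.x and later would compare against 90000 rather than 9.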
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/319
So this PR only affects GP 6+ ?
It means that GP 4.3.x, GP 5 and all supported PG versions will continue to
work as is, and use Eigen vectorization, if the underlying infra supports it?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
re-running the failed test, seems to pass now:
```
SELECT * FROM knn_result_list_neighbors ORDER BY id;
```
produces
```
id | data | k_nearest_neighbours
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what
this is doing` because forcing all training data to be a single point means
that the distance to all test poin
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/317
then let's merge it
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
(1)
expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
'knn_train_data', -
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
I'm not sure what this is doing:
```
%%sql
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
'knn_train_data',
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
load data:
```
DROP TABLE IF EXISTS knn_train_data;
CREATE TABLE knn_train_data (
id integer,
data integer
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/315#discussion_r215462116
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source,
point_column_name
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/317
Please add a hook to remove trailing white spaces in `*.sql_in` files
automatically.
I have changed my sublime settings to remove trailing white spaces, but
other people may edit docs
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/313
is this ready to merge?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/314
Thanks @njayaram2 for the clarification.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/314
So this requires Alien, but we do not automatically download or bundle
Alien, correct?
---
Github user fmcquillan99 commented on the pull request:
https://github.com/apache/madlib/commit/5e707f745c50343dd7395a3e8f86c04428210977#commitcomment-30142753
Also fixed some spacing issues
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/308
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
LGTM, here is an RF example:
```
SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC;
am | feature | oob_var_importance | impurity_var_importance
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
thanks, that makes sense.
I added a type casting example to the user docs.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
Wondering about order for varchar and text casting.
For this data set:
```
DROP TABLE IF EXISTS golf CASCADE;
CREATE TABLE golf (
id int,
"OUTLOOK"
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
Where did we land on the boolean casting issue? Testing on Greenplum 5, I
see:
```
(psycopg2.ProgrammingError) plpy.SPIError: ARRAY types boolean and text
cannot be matched
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/298#discussion_r205310071
--- Diff: doc/mainpage.dox.in ---
@@ -100,13 +86,14 @@ complete matrix stored as a distributed table.
@defgroup grp_matrix Matrix
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/298
This should be ready to merge if it looks OK. I don't have any other 1.15
doc related items to deliver.
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/298
misc 1.15 user doc updates
Added descriptions to left panel for modules that were missing.
Fixed types and formatting in various places.
Cleaned up main user doc page and removed links
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
In cols2vec and vec2cols, ordering has been fixed so new columns are always
on the right of the source table columns in the output (if any).
In cols2vec, casting seems OK now. I tested
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
I like this last suggestion from @iyerr3, that we report raw values for oob
and impurity VI in the model output file. (OK to keep the shifted oob > 0 as
we do now.)
For the hel
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
(1)
Now I think it is casting all numeric to DOUBLE and all non-numeric to TEXT?
But if all the columns are INT, should not cast them to DOUBLE, rather
should create an array of INTs
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
Another run I got
```
grp 0 grp1
31.01364943 31.6576
22.85881741
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
Should impurity_var_importance always add up to 100?
From the regression example in the user docs:
```
DROP TABLE IF EXISTS mt_imp_output;
SELECT madlib.get_var_importance
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
After the above 2 issues I mentioned are fixed, I will have 1 more commit
on user docs to this PR
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
In vec2cols,
```
SELECT madlib.vec2cols(
'golf', -- source table
'vec2cols_result',
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
In cols2vec,
For this table:
```
CREATE TABLE golf (
id integer NOT NULL,
"OUTLOOK" text,
temperature double precision,
humidity double
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/289
```
The model table produced by the training function contains the following
columns:
gid INTEGER. Group id that uniquely identifies a set of grouping column
values.
sample_id
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
user docs seem incomplete
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
ah, i see. I think it is fine as you have put it.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
looks like user docs lost the params description for dropcols()
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
There is a bit of inconsistency related to the last param `cols_to_drop`
```
SELECT madlib.dropcols(
'houses',
&
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/287
This latest commit makes the following changes to user docs:
1) clarify cv for SVM and add user examples
2) clarify cv for elastic net and fix user examples
3) correct rmse calc
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/288
update my comment above to remove the rows processed and skipped.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/288
Since we are writing out a summary table, may as well add more info in it.
{code}
A summary table named _summary is also created at the same time,
which has the following columns
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/288#discussion_r200510366
--- Diff: src/ports/postgres/modules/cols_vec/cols2vec.py_in ---
@@ -0,0 +1,110 @@
+"""
+@file cols2vec.py_in
+
+@b
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/270
@rahiyer @iyerr3 please review this PR.
thanks
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/269
Thanks for the explanation.
I pushed one additional small commit that changes the name of the module
from "Pearson's Correlation" to "Covariance and Correlat
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/269
Also wondering what the nature of the testing is to ensure that covariance
and correlation are being calculated properly with the added groups?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/269
(1)
```
DROP TABLE IF EXISTS example_data_output, example_data_output_summary;
SELECT madlib.correlation( 'example_data',
'exam
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/267
There is some reference to HAWQ in
https://github.com/apache/madlib/blob/master/ReadMe_Build.txt
which I don't see removed in the PR.
Otherwise seems OK though I did not do
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/265
there are 33 JIRAs in 1.14
https://issues.apache.org/jira/projects/MADLIB/versions/12342305
but only about 26 or 27 JIRAs listed in these release notes.
If that is
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/264
updated pagerank docs for PPR, minor formatting and such
1) minor formatting improvements
2) added reference for PPR and changed PR reference to paper and not
wikipedia
You can merge
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
OK done now
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
Main changes:
4. Clarified grouping as per
https://github.com/apache/madlib/pull/263
This is final change so you can review and merge if it looks good.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/263
I tested this quite a bit and it seems to work nicely for me.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
Main changes:
1) Updated minibatch docs to show use of encoding scalar integer dep var
2) Added minibatch examples and explanations to MLP
3) Reduced the number of redundant
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
LGTM
Default selection looks reasonable:
(0) data
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
id serial,
attributes numeric
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/255
LGTM, see https://issues.apache.org/jira/browse/MADLIB-1223 for tests i ran
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/257
mini-batch user docs
This commit is for the preprocessor user docs.
MLP user doc updates to follow in subsequent commit.
Can someone please review this content? thx
You can
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
Oh I see, with the averaging approach:
buffer_size = avg_num_rows_per_segment / num_segments
= 21.5 / 2
= 10.75
and rounding up
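The arithmetic quoted above can be checked with a short Python sketch. The two input values are the ones given in the comment; treating "rounding up" as a ceiling is an assumption, since the comment is cut off at that point, and this is not the preprocessor's actual code.

```python
import math

avg_num_rows_per_segment = 21.5  # value quoted in the comment above
num_segments = 2                 # value quoted in the comment above

# Averaging approach as described: divide, then round up (assumed ceiling).
buffer_size = avg_num_rows_per_segment / num_segments
rounded = math.ceil(buffer_size)
print(buffer_size, rounded)
```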
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
Is this expected behavior? last group for NJ gets only 1 observation
```
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
id serial,
attributes numeric
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
We seem to be computing batch size using master but prob should just
consider num segments.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/251
Using the data set from
http://madlib.apache.org/docs/latest/group__grp__nn.html#example
the warm start seems to be functioning OK in the sense that it is picking
up where it left off
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/250
See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples
showing this works for IGD and mini-batch
LGTM
I think u can go ahead and merge this PR to master
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/252
leftover minor RF user doc update
A few remaining RF user doc changes I missed in
https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834
Also added
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/246
https://issues.apache.org/jira/browse/MADLIB-1217
https://issues.apache.org/jira/browse/MADLIB-1218
https://issues.apache.org/jira/browse/MADLIB-1219
have all been fixed so I made
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/249
See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my
tests.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/248
I checked against the examples in
JIRA: MADLIB-1217
JIRA: MADLIB-1218
and both work OK for me.
So from the fix to the functionality perspective, LGTM.
Other
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/246
@rahiyer RF docs ready for review too.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/246
Waiting on these bugs to be fixed before I can finish:
https://issues.apache.org/jira/browse/MADLIB-1217
https://issues.apache.org/jira/browse/MADLIB-1218
https://issues.apache.org
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/242
LGTM, this can be merged
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/246#discussion_r176152043
--- Diff:
src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -418,7 +468,10 @@ tree_predict(tree_model
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/246#discussion_r176150844
--- Diff:
src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -127,7 +132,11 @@ tree_train(
weights (optional
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/246
DT user doc updates
@rahiyer please review DT user doc updates
Will start working on RF in parallel.
You can merge this pull request into a Git repository by running:
$ git pull
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/244#discussion_r175833176
--- Diff: src/ports/postgres/modules/graph/pagerank.sql_in ---
@@ -120,6 +121,10 @@ distribution per group. When this value is NULL, no
grouping is
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173254469
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173239594
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173238804
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172949439
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172922334
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172921714
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172921328
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920935
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920825
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920581
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -149,8 +149,10 @@ non-stratified, that is, the whole table is treated as
a
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/238
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables (
'abalone', -- So
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
was just testing 1.13 on postgres 9.6 and found this error
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
Similarly
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables (
'abalone', -- So
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables (
'abalone', -- So
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/235#discussion_r168557191
--- Diff:
src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in ---
@@ -208,13 +208,26 @@ forest_train(training_table_name
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/232#discussion_r168248554
--- Diff: src/ports/postgres/modules/lda/lda.sql_in ---
@@ -182,324 +105,789 @@ lda_train( data_table,
\b Arguments
data_table
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/235
update KNN, DT and RF docs to match recent commits
KNN
* describe weighted average in more detail
DT & RF
* correct some doc errors and omissions
* update example to
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/232
Functional tests of these 4 commits seem fine to me. I added comments and
examples in:
MADLIB-1160
MADLIB-1201
Will create a PR for associated user doc changes shortly.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/231
Does this mean, then, that all var importance values are >= 0 now, and that
the largest positive value corresponds to the most "important" variable?
Also, what is the rang
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Regarding (2) and (3) above, looks like it does not fail with `'red:7,
blue:7'` but the MADlib convention is 'red=7, blue=7' so need to change to use
`=`.
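To illustrate the `name=value` convention mentioned above, here is a minimal Python sketch of a strict parser. `parse_class_sizes` is a hypothetical helper written for this example; it is not the function used in the PR, and it assumes integer values only.

```python
def parse_class_sizes(spec):
    """Hypothetical parser for a 'name=value' list such as 'red=7, blue=7'."""
    sizes = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        if not sep:
            # Reject other separators such as ':' instead of accepting them silently.
            raise ValueError("expected 'name=value', got %r" % item.strip())
        sizes[name.strip()] = int(value)
    return sizes


print(parse_class_sizes("red=7, blue=7"))
```

With this approach a string such as `'red:7, blue:7'` raises a `ValueError` rather than being accepted.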
(4)
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Can you please double check that install checks are robust with respect to
different Python rounding on different hardware?
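One common way to make numeric checks tolerant of small floating-point differences is to compare within a tolerance instead of testing exact equality. The sketch below is illustrative only; `results_match` and its default tolerance are assumptions, not how the MADlib install checks are written.

```python
import math


def results_match(expected, actual, rel_tol=1e-6):
    """Illustrative check: compare two floats within a relative tolerance."""
    return math.isclose(expected, actual, rel_tol=rel_tol)


# Exact equality is fragile for floating-point results ...
print(0.1 + 0.2 == 0.3)
# ... while a tolerance-based comparison accepts the tiny difference.
print(results_match(0.1 + 0.2, 0.3))
```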
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Started testing, some early observations:
(1)
class_size default should be "uniform", it seems to be set to
"undersample" currently
(2)
`
SELECT
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/222
minor update to summary() user docs
to finish off
https://issues.apache.org/jira/browse/MADLIB-1167
You can merge this pull request into a Git repository by running:
$ git pull https
Github user fmcquillan99 closed the pull request at:
https://github.com/apache/madlib-site/pull/10
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/220#discussion_r158572631
--- Diff: src/ports/postgres/modules/summary/Summarizer.py_in ---
@@ -199,6 +200,22 @@ class Summarizer:
args['max_columns']