Alright, Frank. I see. Thank you very much for your reply!

On Sat, May 1, 2021 at 1:49 AM Frank McQuillan <[email protected]> wrote:
> Hi Nantia,
>
> You need to provide a regular postgres type vector for the parameter
> `independent_varname`. It does not support a run length encoded svec
> type unfortunately.
>
> Often I use the utility
> http://madlib.apache.org/docs/latest/group__grp__cols2vec.html
> in such cases, making use of '*' for the list_of_features parameter so I
> don't need to list all the columns out. And the
> list_of_features_to_exclude parameter may come in handy too.
>
> Frank
>
> ------------------------------
> *From:* Nantia Makrynioti <[email protected]>
> *Sent:* Friday, April 30, 2021 9:33 AM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: GLM with svec column in independent variables
>
> Hello Frank,
>
> Thanks a lot for your message and I'm sorry for my late response to this.
>
> So, if I have categorical features ending up in large vectors after
> one-hot encoding, is there a way to run glm without generating a huge
> denormalized representation of the features?
>
> Nantia
>
> On Fri, Apr 2, 2021 at 6:51 PM Frank McQuillan <[email protected]>
> wrote:
>
> Hi Nantia,
>
> I replied to this but somehow I don't think my response got to the mailing
> list.
>
> The GLM method
> http://madlib.apache.org/docs/latest/group__grp__glm.html
> does not support SVEC inputs for the parameter `independent_varname`.
> That parameter can be any expression that resolves to an array, as in the
> example from the user docs:
>
> SELECT glm('warpbreaks_dummy',
>            'glm_model',
>            'breaks',
>            'ARRAY[1.0,"wool_B","tension_M", "tension_H"]',
>            'family=poisson, link=log');
>
> Frank
>
> ------------------------------
> *From:* Nantia Makrynioti <[email protected]>
> *Sent:* Saturday, March 13, 2021 10:46 AM
> *To:* [email protected] <[email protected]>
> *Subject:* GLM with svec column in independent variables
>
> Hello,
>
> Is there a way to run the glm training function using a svec (sparse
> vector) column in the independent variables? I'm using the
> encode_categorical_variables function to transform a set of categorical
> features to a sparse vector for every tuple, but glm does not seem to
> accept this column as an independent variable.
>
> Thank you very much in advance,
> Nantia
