Re: [scikit-learn] Decision tree Visualization

2024-01-26 Thread Christian Braune
Hello Apoorva, have you tried this function: https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html ? It has a max_depth parameter, which might do just what you need. Have a nice weekend! Kulkarni, Apoorva wrote on Fri, 26 Jan 2024, 19:49: > Hello, > > For an academi

Re: [scikit-learn] level search traversal on binary decision regression tree with recursive calls returning wrong node order

2024-01-14 Thread Christian Braune
Hello Marc, you might want to look at the introductory algorithms and data structures course by Sedgewick. Your specific problem is discussed here: https://www.cs.princeton.edu/courses/archive/spring15/cos226/lectures/31ElementarySymbolTables+32BinarySearchTrees.pdf, p. 50/51 (slide 22 specifically).
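
For readers following the thread topic, here is a minimal sketch of a level-order (breadth-first) traversal that uses a queue instead of recursive calls. It is not taken from the linked slides. The two arrays are hand-written, hypothetical stand-ins for the `children_left` and `children_right` arrays of a fitted tree's `tree_` attribute, where -1 marks a missing child.

```python
from collections import deque

TREE_LEAF = -1  # scikit-learn uses -1 to mark "no child"


def level_order(children_left, children_right, root=0):
    """Return node ids level by level, left to right."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()  # FIFO: oldest discovered node first
        order.append(node)
        for child in (children_left[node], children_right[node]):
            if child != TREE_LEAF:
                queue.append(child)
    return order


# Hypothetical tree: 0 -> (1, 4), 1 -> (2, 3), 4 -> (5, 6)
children_left = [1, 2, -1, -1, 5, -1, -1]
children_right = [4, 3, -1, -1, 6, -1, -1]
visit_order = level_order(children_left, children_right)
print(visit_order)
```

A recursive depth-first walk visits a whole subtree before its sibling, which is one common reason the node order differs from the level-by-level order expected here.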

Re: [scikit-learn] How to extract subtree from a RegressionTree using the tree attribute?

2024-01-09 Thread Christian Braune
Hi Marc, a first observation: stack.get(0) returns but does NOT remove the first element from a list (even if you name it stack). If you want a stack, you need to use the pop method. See also here: https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-stacks Best regards Christ
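
A small sketch of the difference described above, assuming the "stack" is a plain Python list (the names and values are illustrative, not from Marc's code). Indexing only reads an element, while `pop` removes and returns it.

```python
stack = ["root", "left", "right"]

# Reading by index does NOT remove anything from the list.
peeked = stack[0]
size_after_peek = len(stack)

# pop() removes and returns the LAST element: this is stack (LIFO) behaviour.
popped = stack.pop()
remaining = list(stack)

print(peeked, size_after_peek, popped, remaining)
```

If first-in, first-out order is wanted instead, `collections.deque` with `popleft()` is the usual choice, since `list.pop(0)` has to shift every remaining element.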

Re: [scikit-learn] What are the stopwords used by CountVectorizer?

2020-01-27 Thread Christian Braune
Hi, https://github.com/scikit-learn/scikit-learn/blob/b194674c42d54b26137a456c510c5fdba1ba23e0/sklearn/feature_extraction/_stop_words.py Regards Christian Peng Yu wrote on Mon, 27 Jan 2020, 21:31: > Hi, > > I don't see what stopwords are used by CountVectorizer with > stop_wordsstring =

Re: [scikit-learn] Clustering Algorithm based on correlation distance

2019-09-03 Thread Christian Braune
Using correlation as a similarity measure leads to some problems with k-means (mainly because the arithmetic mean is not an estimator that can be used with correlation). If you properly normalize the correlation, DBSCAN might be an alternative. The minpts parameter will still have the same
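
One possible way to turn correlation into a distance, sketched with NumPy only. The choice d = 1 - r is an assumption made for illustration; the message does not say which normalization is meant. The data are made up.

```python
import numpy as np


def correlation_distance(X):
    """Pairwise distance 1 - r between the rows of X (assumed normalization)."""
    r = np.corrcoef(X)        # Pearson correlation between rows
    d = 1.0 - r               # r = 1 -> distance 0, r = -1 -> distance 2
    np.fill_diagonal(d, 0.0)  # remove rounding noise on the diagonal
    return np.clip(d, 0.0, 2.0)


# Made-up series: rows 0 and 1 move together, row 2 moves the opposite way.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
])
D = correlation_distance(X)
print(D.round(3))
```

A matrix like `D` could then be handed to `sklearn.cluster.DBSCAN` with `metric="precomputed"`; scikit-learn calls the minpts parameter `min_samples`.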

Re: [scikit-learn] fit before partial_fit ?

2019-06-09 Thread Christian Braune
The clusters produced by your examples are actually the same (despite the different labels). I'd guess that "fit" and "partial_fit" draw a different number of random numbers before actually assigning a label to the first (randomly drawn) sample from "x" (in your code). This is why the labeling is
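
To check that two labelings describe the same clusters up to a renaming of the labels, a plain-Python sketch like the following could be used. The function name and the example labels are hypothetical.

```python
def same_partition(labels_a, labels_b):
    """True if the two labelings group the samples identically."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in A must map to exactly one label in B, and vice versa.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Same grouping, different label names.
relabelled = same_partition([0, 0, 1, 1, 2], [2, 2, 0, 0, 1])
# Genuinely different grouping.
different = same_partition([0, 0, 1, 1], [0, 1, 1, 1])
print(relabelled, different)
```

scikit-learn's `sklearn.metrics.adjusted_rand_score` gives a graded version of the same comparison and returns 1.0 for identical partitions.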

Re: [scikit-learn] Can cluster help me to cluster data with length of continuous series?

2019-04-03 Thread Christian Braune
Hi, that does not really sound like a clustering problem but more like a preprocessing problem to me. For each item you want to calculate the length of the longest subsequence of "1"s. That could be done by a simple function and would create a new (one-dimensional) property for each of your items. You cou
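
Such a function might look like the sketch below, assuming each item is a sequence of 0/1 values and that "longest subsequence" means the longest run of consecutive 1s. The function name and sample data are illustrative.

```python
from itertools import groupby


def longest_run_of_ones(series):
    """Length of the longest run of consecutive 1s (0 if there is none)."""
    return max(
        (sum(1 for _ in group) for value, group in groupby(series) if value == 1),
        default=0,
    )


# One made-up series per item; the result is one new feature per item.
items = [
    [0, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0],
    [1, 1, 1, 1],
]
run_lengths = [longest_run_of_ones(item) for item in items]
print(run_lengths)
```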

Re: [scikit-learn] Jeff Levesque: association rules

2018-06-11 Thread Christian Braune
Hey, Christian Borgelt currently has several itemset mining algorithms online with a Python interface: http://borgelt.net/pyfim.html . Best regards, Chris Sebastian Raschka wrote on Mon, 11 June 2018 at 19:30: > Hi Jeff, > > had a similar question 1-2 years ago and ended up using Chris

Re: [scikit-learn] Getting the indexes of the data points after clustering using Kmeans

2018-02-21 Thread Christian Braune
Hi, if you have your original points stored in a numpy array, you can get all points from a cluster i by doing the following: cluster_points = points[kmeans.labels_ == i] "kmeans.labels_" contains a list of labels, one for each point. "kmeans.labels_ == i" creates a mask that selects only those points t
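
The masking step can be tried without fitting a model. In this sketch the `labels` array is a hand-written stand-in for `kmeans.labels_`, and the points are made up. `np.flatnonzero` is added here to recover the indexes the thread title asks about.

```python
import numpy as np

# Made-up data: four 2-D points and one cluster label per point.
points = np.array([[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9]])
labels = np.array([0, 1, 0, 1])  # stand-in for kmeans.labels_

i = 0
mask = labels == i                      # boolean array, True where label == i
cluster_points = points[mask]           # rows of `points` in cluster i
cluster_indices = np.flatnonzero(mask)  # positions of those rows in `points`

print(cluster_indices)
print(cluster_points)
```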