Dear all,

I am fitting a fairly trivial GMM with 2-10 components on 100 samples and 5 features in PySpark, and I observe that some of the log-likelihoods come out positive (see below). I don't understand how this is possible. Is this a bug or intended behaviour? Furthermore, for different seeds the log-likelihood for the same k sometimes even changes sign. Is this due to EM only converging to a local maximum?

Cheers and thanks for your help,

Simon


```
from pyspark.ml.clustering import GaussianMixture

for i in range(2, 10 + 1):
    km = GaussianMixture(tol=0.00001, maxIter=1000, k=i, seed=23)
    model = km.fit(df)
    print(i, model.summary.logLikelihood)

# output (seed=23):
# 2 -197.37852947736653
# 3 -129.9873268616941
# 4 252.856072127079
# 5 58.104854133211305
# 6 102.05184634221902
# 7 -438.69872950609897
# 8 -521.9157414809579
# 9 684.7223627089136
# 10 -596.7165760632951

for i in range(2, 10 + 1):
    km = GaussianMixture(tol=0.00001, maxIter=1000, k=i, seed=5)
    model = km.fit(df)
    print(i, model.summary.logLikelihood)

# output (seed=5):
# 2 -237.6569055995205
# 3 193.6716647064348
# 4 222.8175404052819
# 5 201.28821925102105
# 6 74.02720327261291
# 7 -540.8607659051879
# 8 144.837051544231
# 9 -507.48261722455305
# 10 -689.1844483249996
```
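
For reference, the log-likelihood of a continuous model is a sum of log-densities, and a density (unlike a probability) can exceed 1, so positive values are at least mathematically possible when components become very narrow. A minimal sketch in plain Python (no Spark involved), just evaluating the log-density of a tight univariate Gaussian:

```python
import math

# Log-density of a univariate Gaussian N(mu, sigma^2) evaluated at x.
# For a narrow component (small sigma), the density at the mean exceeds 1,
# so its log is positive -- and a sum of such terms can be positive too.
mu, sigma = 0.0, 0.01
x = 0.0
log_pdf = -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
print(log_pdf)  # about 3.686 -- positive
```

So a positive total log-likelihood on its own need not indicate a bug, though the sign flips across seeds are presumably a separate question about EM initialization.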
