Clustering

Unsupervised grouping of related items

KModes: Clustering of Categorical Data

KModes: Clustering of Categorical Data WITHOUT One-Hot encoding!

  • KMeans clustering uses Euclidean distance to find clusters, so you need to one-hot encode all categorical data first

  • In KModes clustering, you can directly cluster categorical data without one-hot encoding!

  • KModes measures distance as the number of positions at which two categorical vectors differ (simple matching dissimilarity), and at each clustering step it assigns every data point to the cluster whose mode is nearest.

  • The mode of a cluster is the most frequent value in each column among the cluster's members; it plays the role that the mean plays in KMeans

  • With the kmodes package, fitting a model takes one line of code!

  • The kmodes package also supports the K-Prototypes algorithm, which combines k-modes and k-means to cluster mixed categorical + numerical data.
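
To make the distance metric and the mode update concrete, here is a minimal pure-Python sketch of the idea. It is not the kmodes package's implementation: the helper names (`matching_dissimilarity`, `column_modes`, `assign`, `update_modes`), the toy records, and the naive initialisation are all assumptions made for illustration.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each column across the records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, modes):
    """Label each record with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]


def update_modes(records, labels, modes):
    """Recompute each cluster's mode; keep the old mode if a cluster is empty."""
    new_modes = []
    for k, old in enumerate(modes):
        members = [r for r, lab in zip(records, labels) if lab == k]
        new_modes.append(column_modes(members) if members else old)
    return new_modes


# Toy data invented for this example: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Naive initialisation (an assumption): use two of the records as starting modes.
modes = [records[0], records[3]]

# Alternate between assignment and mode update for a few rounds.
for _ in range(5):
    labels = assign(records, modes)
    modes = update_modes(records, labels, modes)

print(labels)
print(modes)
```

No encoding step is needed: the records stay as strings, and the only operations are equality checks and frequency counts.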

pip install kmodes

🌟 GitHub: https://github.com/nicodv/kmodes