Clustering
Unsupervised grouping of related items
KModes: Clustering of Categorical Data
KMeans uses Euclidean distance to find clusters, so you need to one-hot encode all categorical data before clustering.
KModes clusters categorical data directly, WITHOUT one-hot encoding!
As its distance metric, KModes counts the positions at which two categorical vectors hold different values. At each clustering step, every data point is assigned to the cluster whose mode is nearest under that metric.
A cluster's mode is the most frequent value in each column among the points in that cluster.
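To make the distance and the mode concrete, here is a minimal pure-Python sketch of one assignment-and-update step. The function names, the toy records, and the naive choice of initial modes are all assumptions made for illustration; this is not how the kmodes library is implemented internally.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent value of each column across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def assign(rows, modes):
    """Label each row with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(row, modes[k]))
        for row in rows
    ]


# Hypothetical toy data: (color, size, shape)
rows = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Naive initialisation for the sketch: take two records as starting modes.
modes = [rows[0], rows[3]]

# One clustering step: assign points, then recompute each cluster's mode.
labels = assign(rows, modes)
modes = [
    column_modes([row for row, label in zip(rows, labels) if label == k])
    for k in range(len(modes))
]

print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

A full run would repeat the assign and update steps until the labels stop changing.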
With the kmodes library, fitting the model and labeling the data takes one line of code!
It also supports the K-Prototypes algorithm for combining k-modes and k-means on mixed categorical + numerical data.
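The idea behind combining the two can be sketched as a single mixed distance: squared Euclidean distance on the numerical columns plus a weighted mismatch count on the categorical columns. The function below is a plain-Python illustration of that idea, not the library's code; the function name, the toy records, and the weight value are assumptions.

```python
def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma times
    the number of mismatching categorical values."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(x != y for x, y in zip(cat_a, cat_b))
    return numeric + gamma * categorical


# Hypothetical records: two numeric features and two categorical features.
d = mixed_distance(
    (1.0, 2.0), ("red", "small"),
    (2.0, 4.0), ("red", "large"),
    gamma=0.5,
)
print(d)  # 5.5  (numeric part 1 + 4 = 5, plus 0.5 * 1 mismatch)
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric distance. In the library, the algorithm is available as the `KPrototypes` class in `kmodes.kprototypes`, and the indices of the categorical columns are passed to `fit_predict` through the `categorical` argument.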
pip install kmodes
GitHub: https://github.com/nicodv/kmodes