A data cluster is a group of objects that are more similar to each other than to those in other groups. Learn here about K-means clustering technique and how to use it for (unsupervised) anomaly detection.
On November 4th and 5th, BigML joined the Qatar Computing Research Institute (QCRI), part of Hamad Bin Khalifa University, to bring a Machine Learning School to Doha, Qatar! We are very excited to have this opportunity to collaborate with QCRI.
During the conference, Dr. Sanjay Chawla discussed his algorithm for clustering with anomalies, k-means–. We thought it would be a fun exercise to implement a variation of it using our domain-specific language for automating Machine Learning workflows, WhizzML.
The usual process for the k-means– algorithm is as follows. It starts with some dataset, some number of clusters k, and some number of expected outliers l. It randomly picks k centroids, and assigns every point of the dataset to one of these centroids based on which one is closest. So far, it’s just like vanilla k-means. In vanilla k-means, you would now find…
View original post 602 more words