K-means — Finding Anomalies while Clustering

A data cluster is a group of objects that are more similar to each other than to those in other groups. Learn here about the K-means clustering technique and how to use it for (unsupervised) anomaly detection.

The Official Blog of BigML.com

On November 4th and 5th, BigML joined the Qatar Computing Research Institute (QCRI), part of Hamad Bin Khalifa University, to bring a Machine Learning School to Doha, Qatar! We are very excited to have this opportunity to collaborate with QCRI.

During the conference, Dr. Sanjay Chawla discussed his algorithm for clustering with anomalies, k-means--. We thought it would be a fun exercise to implement a variation of it using our domain-specific language for automating Machine Learning workflows, WhizzML.

Applying BigML to ML research

The Algorithm

The usual process for the k-means-- algorithm is as follows. It starts with some dataset, some number of clusters k, and some number of expected outliers l. It randomly picks k centroids, and assigns every point of the dataset to one of these centroids based on which one is closest. So far, it’s just like vanilla k-means. In vanilla k-means, you would now find…
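To make the opening steps concrete, here is a minimal Python sketch of only the part described above: picking k centroids at random and assigning every point to the closest one. It is not the WhizzML implementation from the original post. The function names `init_centroids` and `assign_to_nearest`, the toy dataset, the fixed seed, and the choice to draw the initial centroids from the dataset's own points are all assumptions made for illustration. The steps that follow the assignment are not covered here.

```python
import math
import random


def init_centroids(points, k, seed=0):
    """Pick k distinct points at random to serve as the initial centroids.

    Assumption: centroids are sampled from the dataset itself; the excerpt
    only says they are picked randomly.
    """
    rng = random.Random(seed)
    return rng.sample(points, k)


def assign_to_nearest(points, centroids):
    """Return, for each point, (index of the closest centroid, distance to it)."""
    assignments = []
    for p in points:
        # Euclidean distance from this point to every centroid.
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        assignments.append((nearest, dists[nearest]))
    return assignments


# Hypothetical toy dataset: two tight groups and one far-away point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (20.0, 20.0)]

centroids = init_centroids(points, k=2)
for point, (cluster, dist) in zip(points, assign_to_nearest(points, centroids)):
    print(point, "-> centroid", cluster, f"(distance {dist:.2f})")
```

Returning the distance alongside the cluster index is a design choice of this sketch: it makes it easy to see how well each point fits the centroid it was assigned to.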

