The Data Science of Hockey Scoring


Hockey is the coolest game on Earth. At least, that is what the NHL keeps reminding us. We hockey fans love smart team plays, great passes, powerful slapshots, and flashy body checks. And even some limited fighting… But most of all, we love goals! And we love and glorify the best goal scorers. If you are a hockey fan, you will understand exactly what I am talking about. This article is about those scorers.

Right now, the greatest goal scorer of all time is Wayne Gretzky, who set the record at 894 goals. The active players closest to him are Jaromir Jagr (766) and Alex Ovechkin (658). Jagr, a hero to many fans, is now 47 and semi-retired. Ovi is 33 and still going strong, but he likely has just a few years of high-scoring play left (I hope to be proven wrong here!).

They, along with a few others, have been the game's elite scorers, and we want more players like them in the future! But will we get them? Are we going to see many new superstar scorers capable of rivaling Gretzky, for example?

Unfortunately, it is unlikely unless the game changes. To make my point, I will use a bit of simple data analysis…

First, here is the list of the top 25 goal-scoring seasons in NHL history (source: NHL.com). Basically, these players led the league in goal scoring ahead of everyone else. Notice that the only active player on the list is Alex Ovechkin, with 65 goals.

Alex's best season, at 65 goals, places him 23rd in history. Notice that Wayne Gretzky is at the top of the list with an incredible 92 and 87 goals. Why do I say “incredible”? These numbers are unbelievably high by the standards of the modern hockey era. Think about it this way: last season, the entire Anaheim Ducks team scored 196 goals in 82 games, while Wayne Gretzky alone scored 92 goals in 80 games in the 1981-82 season.
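To put that comparison on a per-game footing, here is a minimal Python sketch. It uses only the four figures quoted above; the variable names are my own, and the output is a rough illustration rather than part of the original analysis.

```python
# Figures quoted in the text above.
gretzky_goals, gretzky_games = 92, 80   # Wayne Gretzky, 1981-82 season
ducks_goals, ducks_games = 196, 82      # entire Anaheim Ducks team, last season

# Goals per game for one player versus a whole team.
gretzky_rate = gretzky_goals / gretzky_games
ducks_rate = ducks_goals / ducks_games

# Gretzky's season total as a fraction of the team's season total.
share = gretzky_goals / ducks_goals

print(f"Gretzky: {gretzky_rate:.2f} goals per game")
print(f"Ducks (whole team): {ducks_rate:.2f} goals per game")
print(f"Gretzky's total as a share of the Ducks' total: {share:.0%}")
```

In other words, one player in 1981-82 produced nearly half of what a full modern roster managed over a slightly longer season.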


Data Science With Wine

My wife is an expert wine buyer, and every good bottle she brings home has a little story attached to it. Even though I occasionally (and secretly) don't enjoy the taste of some of those wines, I know they are all considered “high quality” and “very popular”. Some of them simply don't match my taste.

However, plenty of wines I've tried in the past were just plain bad. This made me think about the wine manufacturers: why do they even sell a particular (bad) wine? Can't they predict a customer's response by tasting their own wine, or by objectively measuring a few things about the wine's chemistry and physics?

Wine-making is a big industry, and quite a few studies and papers have already tried to answer this question. In fact, a quick online search turned up several data science studies addressing it, and many of them referred to the following paper:

  • P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

This particular publication came with a couple of datasets available for everyone to play around with, so I decided to use them for my next project. As an advisor to BigML, which is a leading analytics company, I wanted to analyze this wine quality data using their online platform and see 1) whether I could answer my question about predicting wine quality from some objective measurements and 2) how quickly this could be accomplished using the BigML online solution.

First, I downloaded the wine composition and quality assessment data from here: there are two datasets available, with 1599 entries for red wine and 4898 entries for white. Even though I prefer red wine, I decided to go with the larger dataset for my study.

The dataset included 11 wine features, such as residual sugar, density, pH, alcohol, and a few others (check the dataset if interested), plus one numerical value for the quality of each wine, expressed as a number between 0 (very bad) and 10 (excellent).

I felt that a regression analysis (with a numerical output in mind) would be too noisy and inaccurate, so I decided to simply split the dataset into two classes: bad wine (0-6) and good wine (7-10).

I used Excel® to do these initial data manipulations and then imported the dataset into the BigML online portal (a simple drag and drop). Notice in the picture below how convenient it is to see all the distributions for each data column and their descriptive statistics.
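The two-class split described above can be sketched in a few lines of Python. This is only an illustration of the labeling rule, not how the work was actually done; the function name, the threshold constant, and the sample scores are all made up for the example.

```python
from collections import Counter

# Assumption from the text: scores 7-10 are "good", scores 0-6 are "bad".
GOOD_THRESHOLD = 7


def label_quality(score: int) -> str:
    """Map a 0-10 quality score to a two-class label."""
    if not 0 <= score <= 10:
        raise ValueError(f"quality score out of range: {score}")
    return "good" if score >= GOOD_THRESHOLD else "bad"


# Hypothetical scores for illustration only (not taken from the dataset).
sample_scores = [5, 6, 7, 4, 8, 6, 9, 3]
labels = [label_quality(s) for s in sample_scores]

# Count how many wines fall into each class.
class_counts = Counter(labels)
print(class_counts)
```

Counting the labels right after the split is a quick way to see how balanced the two classes are before training a classifier.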


Artificial Intelligence (AI) vs. Machine Learning (ML) vs. Deep Learning

I already wrote a post on this subject (link), but I want to summarize it again, this time with an added timeline:

Deep learning is a sub-area of Machine Learning, which is a sub-area of Artificial Intelligence.

Also, here you can find the main definitions of Big Data Analytics, Machine Learning, and other terms.


Amazing Supercomputer Art – Part 2

The first post on this subject was focused on Cray supercomputers, which place beautiful images on the front to add an artistic touch to their technically-impressive machines.

In this (second) post, I will mostly address the “beauty through design” approach taken by Cray and a few other supercomputer makers.

Let’s start with Thinking Machines Corporation. Founded in 1983, it delivered some of the most advanced (for their time) and best-looking computers ever. A brief promotional video for its first models is available on YouTube.

Thinking Machines’ CM-5 supercomputer, also known as FROSTBURG, was installed at the US National Security Agency (NSA) in 1991 for code-breaking tasks and was operational until 1997:

NSA's Thinking Machines CM-5 supercomputer

No decorations, no frills. However, this machine remains one of the most futuristic-looking supercomputers ever. Its flashing, constantly changing red light panels showed processing node usage and were also used for diagnostics. In fact, this old supercomputer looks so good that it ended up in the movie Jurassic Park:

CM-5 supercomputer in Jurassic Park

To me, the CM-5 design actually looks inspired by the WOPR computer from WarGames (1983), which wasn’t a real computer, of course, but a realistic-looking movie prop:


Computer Humor


Which Countries are EU Contributors and Beneficiaries?

Interesting data…

https://www.statista.com/chart/18794/net-contributors-to-eu-budget/
