This weekend, the Columbus Blue Jackets played host to the inaugural CBJHAC, a hockey analytics conference featuring panels and presentations from NHL employees, data collection companies, and some of the internet’s most well-known hockey stats bloggers. The event was the brainchild of Jackets assistant GM Josh Flynn and The Athletic’s Alison Lukan. They did a tremendous job with organization and the set-up was highly professional. Alison kept things on schedule as the MC, and deserves her status as superstar of the stats community. Attendees I spoke with who have been to other hockey analytics conferences said this experience stood out as one of the best.
One major benefit of having the resources of a team is that they had a high quality live stream of all of Saturday’s presentations. You can watch them for yourself here:
I had three takeaways from the day: first, the hockey analytics community is highly collaborative. Many of the same names were credited in presentation after presentation, as they built their research on prior work of others. The second, related, takeaway is that while the research has come a long way, there is still so much left to uncover. Many presentations featured a discussion of “future work” at the end, where the work to answer one question led to several other questions that merit deeper consideration.
Finally, the conference has changed the way I watch hockey live. I noticed many different details while watching the Colorado game than I did the Detroit game the previous night. I found myself looking at things like special teams formations, goalie positioning, transition play, shift changes, etc.
Beyond the numbers, the event was a great chance to socialize and network with people from all over the continent who love hockey. All attendees got to attend the Colorado game, and there was much merriment at R Bar on Friday and Pins Mechanical on Saturday. I have to give a special shout-out to my SBN colleagues Sie from Fear the Fin, Bryan from On the Forecheck (who got credentials to cover the Colorado game; check out his Twitter for charts he made during the game and for tweets throughout the conference), and Kurt and Steph from Broad Street Hockey. Also, Cannon commenter the_tom_burns was there. We drank too much both nights, and in the comments he can point out anything I forget to mention below.
Here’s a full list of panels and presentations. It is worth your time to check out the video above to watch these for yourself (not all at once, it will take longer than watching The Irishman twice). Also, it’s worth checking the websites and Twitter feeds for each presenter, as most will share their slides and data. I’ve included some observations I made which I feel apply to the Blue Jackets.
Opening Remarks from Jarmo Kekalainen. The Jackets’ GM welcomed everyone with a brief statement, where he encouraged them to keep moving forward with their work, with the goal that eventually all “hockey people” will fully buy in to using analytics as a tool.
NHL Panel (Analysis Focus). Moderated by Craig Custance (The Athletic), featuring Richard Dry (Rangers), Arik Parnass (Avalanche), Zac Urback (Blue Jackets), and Tom Poraszka (Golden Knights). These men shed light on how NHL teams make use of advanced statistics. All agreed that analytics does not consist of stats alone. It must combine the numbers with observation from coaches and scouts, whether in person or via video review. They acknowledged that they do read the work of the independent researchers. Poraszka noted that it saves him time to read that work rather than have him and his team crunch the numbers themselves. Urback compared the publicly available data to watching grainy video of a game. The tracking data that the NHL will soon have will be more like watching the game in HD. The goal, he said, will be to improve data collection and processing so it matches the clarity of watching the game live.
When asked about intangibles like grit, heart, character, team chemistry, etc., Parnass made an interesting point that those elements do show themselves in the numbers, because those traits make players and teams better, so their numbers will be better.
Expected Goals in Hockey: A Review. Josh and Luke Younggren. The Younggren twins run the site Evolving Hockey. In this presentation they discussed the history of Expected Goals as a concept in hockey, going back to 2004. They created graphs that compared the various xG models, including their own. As we discussed in the comments last week, xG is a model which predicts the probability that a shot will become a goal. Instead of the traditional “home plate” between the goal and the faceoff dots, the heat map of the highest probability shots actually more closely resembles a baseball field.
We are specifically quite proud of these heatmaps showing how different xG models compare in terms of shot location/value. Not sure if this is something we can feasibly add for players/teams for other on-ice metrics, but we will definitely be looking into this. pic.twitter.com/VoS0CgVpx9— EvolvingWild (@EvolvingWild) February 9, 2020
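For anyone who wants to see what "the probability that a shot will become a goal" looks like in practice, here is a toy sketch. It is not the Evolving Hockey model or any other model shown in the talk. It assumes a simple logistic formula with only two inputs (distance and angle), and the coefficients are made up for illustration rather than fitted to real shots.

```python
import math

# Hypothetical coefficients chosen for illustration only; a real xG model
# fits these (and many more features) to years of shot data.
INTERCEPT = -1.0
PER_FOOT = -0.05      # each extra foot of distance lowers the log-odds
PER_DEGREE = -0.02    # each degree off the center line lowers the log-odds


def expected_goal_probability(x_ft: float, y_ft: float) -> float:
    """Toy xG for a shot at (x_ft, y_ft), with the net centered at (0, 0).

    x_ft is the distance out from the goal line and y_ft is the lateral
    offset from the middle of the ice, both in feet.
    """
    distance = math.hypot(x_ft, y_ft)
    angle = abs(math.degrees(math.atan2(y_ft, x_ft)))  # 0 = straight on
    log_odds = INTERCEPT + PER_FOOT * distance + PER_DEGREE * angle
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two made-up shots: one from the slot, one from the point.
slot_shot = expected_goal_probability(15.0, 0.0)
point_shot = expected_goal_probability(55.0, 20.0)
```

With these invented numbers the slot shot comes out far more dangerous than the point shot, which is the basic shape every xG model shares. Where the models differ is in which inputs they use and how the danger areas are drawn.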
The Anatomy of a Power Kill. Meghan Hall and Alison Lukan. The data for this talk came mostly from Hall, with Lukan providing Columbus-specific examples via video. (It became a theme of the conference that presenters would pander to the crowd with a CBJ example. I approved.) The Jackets were trend-setters of this concept, as evidenced by them setting an NHL record by getting nearly 25% of the unblocked shot attempts while shorthanded.
How do they do this? First, they set up in a triangle formation with 2D low and 1F above them, between the circles. Then the second forward floats around up top, aggressively seeking to disrupt the passing lanes or steal the puck. The two forwards can switch off as necessary as the puck moves in and out. The second key is that offensively-skilled players like Cam Atkinson and Josh Anderson get many PK shifts. Other teams have picked up on this, with guys like Mitch Marner and Sebastian Aho getting PK minutes despite not being known for their defense.
Enhancing Traditional Shift Analysis with Transitional Play Data. Alex Novet. This was easily one of the most entertaining presentations, with Novet displaying much wit and personality in addition to sharing his data clearly and concisely. He began with an example that I won’t spoil, but which pointed out an embarrassing moment for a much-hated player. (Skip to the 1:40:00 mark of the video above.)
His analysis used tracking data to see when players chose to end a shift. On average, shift changes occur two seconds after a zone entry. This makes sense, as it is safest to change when the puck is farthest from your net, and you have a higher chance of scoring when you have fresh players coming into the offensive zone with possession.
Defensemen were generally better at making timely changes than forwards, and the Jackets were in the bottom third of the league in making good changes (within three seconds of entry). Markus Hannikainen was the seventh-worst changer in the league. Bo Horvat was the worst.
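To make the "good change" idea concrete, here is a minimal sketch of how you might score it. It is not Novet's code. It assumes you already have two lists of game-clock timestamps (a player's shift changes and his team's zone entries), and the function names and sample numbers are hypothetical. The three-second window is the only thing taken from the talk.

```python
from bisect import bisect_right

GOOD_WINDOW_SECONDS = 3.0  # "within three seconds of entry", per the talk


def seconds_since_last_entry(change_time, entry_times):
    """Return seconds between a change and the most recent prior zone entry.

    entry_times must be sorted ascending. Returns None if no entry has
    happened yet.
    """
    idx = bisect_right(entry_times, change_time)
    if idx == 0:
        return None
    return change_time - entry_times[idx - 1]


def good_change_rate(change_times, entry_times):
    """Share of changes made within GOOD_WINDOW_SECONDS after an entry."""
    if not change_times:
        return 0.0
    good = 0
    for t in change_times:
        gap = seconds_since_last_entry(t, entry_times)
        if gap is not None and gap <= GOOD_WINDOW_SECONDS:
            good += 1
    return good / len(change_times)


# Made-up game-clock timestamps in seconds, purely for illustration.
entries = [12.0, 47.5, 90.0]
changes = [14.0, 60.0, 91.5, 5.0]
rate = good_change_rate(changes, entries)
```

In this invented example two of the four changes come shortly after an entry, so the player's good change rate is 50%. Ranking every player in the league by that rate would give you a leaderboard like the one Novet showed.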
A comprehensive analysis of pass difficulty, value and tendencies in ice hockey. David Yu. Yu works for Sportlogiq, one company that sells proprietary data to NHL teams. With their tracking data they were able to build a chart which shows the most common passing paths, and then heat maps for NHL teams which show which types of passes they make more or less often than average. They can see which players make and receive which passes most often, and that is linked to videos so you could see, for example, every pass through the neutral zone from Erik Karlsson to Evander Kane.
He pulled up the chart for the Blue Jackets. On the power play chart, it showed very infrequent passing in the offensive zone. That tracks with the eye test, doesn’t it? At even strength, however, the Jackets are shown to spend a lot of time below the goal, with passes to that area as well as passes from behind the goal line to the area in front of the crease.
Defending the blue line, the impact of a pass. Corey Sznajder. Quantifying zone exits and entries is not something that is currently provided in publicly available data. So it falls to someone to watch every game and make note of each entry and exit attempt. It’s a daunting task and Sznajder is the hero who does it. In this talk, he looked at the other side: which kinds of entries teams allow. Do defenders position themselves to stop a carry-in, which allows room for a pass? Or do they defend the pass and allow a carry-in instead? His data showed that defensemen are more likely to allow a controlled entry than forwards are. Usually when a forward is defending an entry, they have defensive help behind them so they can be more aggressive. If a defenseman is defending the entry, they are the last line of defense and so instead try to steer the player to the side and wait for help.
One recent development this year is that fewer defensemen are good at defending only the carry-in or only the pass. Most are either good at both, or bad at both.
Which League is Best? Using Paired Comparison Models to Estimate Hockey League Strength and Project Player Performance. Katerina Wu. This was one of my favorite presentations. Wu is still a college student (UNC) who performed this research at a data camp last summer. Previous NHL equivalency models looked exclusively at player points at their pre-NHL season and rookie NHL season. Her analysis went deeper by looking at transitions between all leagues (for example, SHL to AHL) and also considered factors like age, position, league scoring, etc.
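If "paired comparison model" is a new term, the classic example is the Bradley-Terry model, which turns a pile of head-to-head results into one strength number per competitor. The sketch below is only that textbook version, fitted with a simple iterative update. I am assuming, not reporting, that something in this family sits under Wu's work, and the league "results" here are invented. In this toy setup a league "wins" a comparison when a player who moved between the two leagues scored at a lower rate there.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate strengths from pairwise results via simple MM updates.

    wins[(a, b)] is the number of comparisons that a "won" against b.
    Returns a dict of strengths that sum to 1.
    """
    items = sorted({name for pair in wins for name in pair})
    strength = {name: 1.0 for name in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            total_wins = sum(w for (a, _), w in wins.items() if a == i)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {k: v / norm for k, v in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modelled chance that a comes out ahead of b in one comparison."""
    return strength[a] / (strength[a] + strength[b])


# Invented comparison counts, for illustration only.
toy_results = {
    ("NHL", "AHL"): 8, ("AHL", "NHL"): 2,
    ("AHL", "SHL"): 6, ("SHL", "AHL"): 4,
    ("NHL", "SHL"): 9, ("SHL", "NHL"): 1,
}
league_strength = fit_bradley_terry(toy_results)
```

The appeal of this approach is that leagues that rarely trade players directly can still be ranked against each other through the leagues they both connect to, which fits the "transitions between all leagues" idea described above.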
She revisited the debate between Pierre-Luc Dubois and Jesse Puljujarvi and her data revealed that PLD did, in fact, have a better 2015-16 season than Puljujarvi did. I wish I’d had this information back then, when I was one of the few not upset by the move.
Public prospect analysis lacks crucial nuance. Hannah Stuart. In the first of three “lightning” presentations, Stuart discussed not what we know about prospects, but how much we do NOT know. There is no reliable data for time on ice or shots, and even information about who is on the ice during a goal is not always accurate. Furthermore, people often lose sight of different schemes and roles in the NHL and lower leagues. That is, just because a player is not a first liner does not make him bad or not worth drafting.
A Data-Driven Model for Predicting NCAA D1 Hockey Game Outcomes. Nayan Patel. Patel is a video intern for the Jackets and a student at Ohio State. Inspired by models at Money Puck and The Athletic, Patel decided to build a similar one for college hockey. You can check out his daily predictions and other charts at HockeyU Analytics.
NHL-drafted skater development in NCAA hockey. Trevor Greissinger. Greissinger is a Michigan student who works as an analyst for the Wolverines hockey team. This presentation was interesting because his research failed to confirm his hypothesis that college players benefit from spending more seasons in college before signing their ELC. The data did not support that, though the sample also was not large. Still, the methodology was sound and it was good to hear him walk us through his process.
Extracting Player Tracking Data from Video Using Non-Stationary Cameras and a Combination of Computer Vision Techniques. Neil Johnson. Johnson is an OSU grad who works for ESPN. Unlike the upcoming player tracking data which uses stationary cameras installed in the arenas and chips in the puck and the players’ pads, Johnson’s system is able to provide player tracking data from the broadcast footage. Even when players are off-frame, the model can account for their likely movement between known positions. It’s still a work in progress, but the behind-the-scenes look is fascinating.
Q&A Panel. Moderated by Seth Partnow (The Athletic). Featuring Chris MacFarland (Avalanche, previously of the Blue Jackets), Josh Flynn (Blue Jackets), Chris Boucher (Sportlogiq), and Andrew Thomas (SMT, formerly of the Wild and War On Ice). Boucher represents Sportlogiq, which I mentioned earlier provides video-based tracking data to NHL teams. Thomas works for SMT, which is the company responsible for the upcoming chip-based, real-time player tracking data. They discussed the nature of their data, while MacFarland and Flynn discussed how teams can use that data. Thomas made the case that the tracking data should be made public, to increase the amount of research that can be done with it. He pointed out that SMT is primarily a broadcast company (they’re the ones that developed the first down line for football), so the tracking will enhance TV coverage. We saw some examples during the All-Star game.
Another factor they discussed is that the new data will start to fill in some large gaps in measuring defensive play. Defenders who don’t score as much are currently undervalued. Now teams can see more clearly who the shutdown players are, and then will have to decide how much that is worth in terms of contract value.
Goaltender Positioning & Applications. Cole Anderson. Anderson is another Sportlogiq employee, and his specialty is goaltender analysis. Here he presented data regarding goalie position and how that can be added to an Expected Goals model. For example, he used this play (Cam’s first 5v5 goal of the year!):
if you've ever wondered how the probability of goal might have changed if we knew where the goalie was relative to the Atkinson shot (or Dubois pass!) check out my talk at #CBJHAC tomorrow! then stay for the other, better ones pic.twitter.com/fsTKeuU4iD— Cole Anderson (@ice_cole_data) February 7, 2020
Most xG models would only give probability based on the shot type and Cam’s location. Some do consider cross-ice passes, which is a key contributing factor here. Anderson’s data would also show Georgiev’s positioning, which makes the shot much easier (above 80% probability per his model).
He showed a chart which plotted all goalies this year, with one axis being their average distance from the goal line when a shot is taken and the other being their average distance from the optimal angle to block the shot. Joonas Korpisalo was the farthest from the optimal angle, which shows that he’s an aggressive goalie, but also one at risk of being caught out of position.
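The two axes on that chart come down to simple geometry, and a rough sketch may help picture them. This is my own guess at the measurements, not Anderson's method. I am assuming the "optimal angle" is the straight line from the puck to the middle of the net, and the coordinates and sample shots are invented.

```python
import math


def goalie_depth_and_angle_error(shot_xy, goalie_xy):
    """Return (depth_ft, angle_error_deg) for one shot.

    Coordinates are in feet with the center of the goal line at (0, 0) and
    x increasing out toward center ice. The angle error is how far the
    goalie sits off the line from the net's center to the shooter.
    """
    shot_angle = math.atan2(shot_xy[1], shot_xy[0])
    goalie_angle = math.atan2(goalie_xy[1], goalie_xy[0])
    depth = goalie_xy[0]
    angle_error = abs(math.degrees(shot_angle - goalie_angle))
    return depth, angle_error


def average_profile(shots):
    """Average depth and angle error over (shot_xy, goalie_xy) pairs."""
    results = [goalie_depth_and_angle_error(s, g) for s, g in shots]
    depths = [d for d, _ in results]
    errors = [e for _, e in results]
    return sum(depths) / len(depths), sum(errors) / len(errors)


# Two made-up shots: the goalie is badly off-angle on the first,
# square to the shooter on the second.
sample_shots = [
    ((20.0, 20.0), (4.0, 0.0)),
    ((30.0, 0.0), (2.0, 0.0)),
]
avg_depth, avg_angle_error = average_profile(sample_shots)
```

Plot one point per goalie using those two averages and you get a chart in the spirit of the one from the talk: deep versus aggressive on one axis, square versus off-angle on the other.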
He spent some time discussing the change in Sergei Bobrovsky from last year to this year. Bob’s distance from the goal has decreased this year. It’s likely an adjustment to the play in front of him, as he is facing more one-timers, lateral passes, and high shot velocity.
NHL Panel (Management Focus). Moderated by Craig Custance (The Athletic). Featuring Bill Zito (Blue Jackets), Matt Cane (Devils), Ryan Miller (Blues), Alexandra Mandrycky (Seattle). The most interesting part of this panel was hearing from Mandrycky, who is helping to assemble a team - and a front office - from scratch in Seattle. Whereas the others on stage came into an existing system and have to work to integrate statistical concepts, Seattle can form an analytical identity from Day One. My favorite quote of hers was calling stats a “BS detector” when asked about evaluating potential coaches. If a coach says something is a strength of his system, the numbers from his previous stops can confirm or deny this.
Zito said that Kekalainen is always looking for the cutting edge in analysis. His questions to guide the analytics department are “why” and “what next.” He said that to this point goaltending evaluation has been the most heavily reliant on scouting. He gave credit to former goalie coach Ian Clark for having done the work to scout goalies like Elvis Merzlikins and Daniil Tarasov. He is eager for more data on goalies to enhance that scouting.
As for Torts, Zito said that he is receptive to data, and it has had an impact. Torts has become more liberal at pulling the goalie than he used to be. There was another late game strategy (he wouldn’t give more details than that) which Torts has adopted - in large part because the analysts described themselves as “99% certain” it would work. That confidence is key to persuading old school types to change their ways.
Miller gave a good example of how the data can lead to video, which can then lead to a necessary coaching correction. They were curious why a certain unnamed player was underperforming. They were able to isolate certain plays to examine, which showed that when chasing a loose puck the player would not turn his head to look behind him. They showed the clips to the coach, who then worked with the player. After that, the player’s numbers improved.
Cane embraced conflict between the coaches and the analysts. The ensuing discussion would lead to greater understanding on both sides about how they saw the game. It is important to establish some points of agreement first, however. That is how they build trust in each other.
Rush/Non-Rush contributions at a player level. Meghan Chayka. Chayka (whose brother, John, is GM of the Coyotes) co-founded the company Stathletes, which does video-based player tracking. She showed one particular stat they track, which looks at players creating opportunities off the rush. PLD was one of the top 20 players listed in even strength rush effects. (Artemi Panarin and Anthony Duclair were up there as well.) The most impressive player was Connor McDavid, who was twice as good as the next player. He’s so good that there is just a 1 in 5000 chance of there being another player as good. For reference, there have only been 7000 players in NHL history. McDavid may literally be one of a kind.
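A quick back-of-the-envelope check on those two numbers, under my own reading of them: I am assuming "1 in 5000" means any given player has a 1-in-5000 chance of being that good, that players are independent of each other, and that 7000 is close enough to the true head count. None of that was spelled out in the talk.

```python
RARITY = 1 / 5000          # chance quoted in the talk
PLAYERS_IN_HISTORY = 7000  # round figure quoted above

# Expected number of players at that level across NHL history.
expected_peers = PLAYERS_IN_HISTORY * RARITY

# Chance that nobody out of 7000 independent players reaches that level.
prob_no_peer = (1 - RARITY) ** PLAYERS_IN_HISTORY
```

Under those assumptions you would expect about 1.4 players of that caliber in league history, and there is roughly a one in four chance of seeing none at all. So "one of a kind" is not far off.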
Goal Sequencing and its contribution to Score Effects. Micah Blake McCurdy. Here at The Cannon, we’ve always been big fans of McCurdy and his site, HockeyViz. His graphs are frequently included in our articles. He was the hero of CBJHAC, not for his presentation (even though it was excellent) but for the harrowing journey it took to get from Halifax to Columbus. He left his home at 9 a.m. Atlantic time on Friday and didn’t get to Nationwide until after 2 p.m. Eastern on Saturday.
When I finally persuaded them to let me through (I showed them tweets from @AlisonL about how I was on the presentation schedule!!) a customer service agent radiod the just-closed flight and persuaded them to let me on. Then when we got to the gate, they changed their mind.— Micah Blake McCurdy (@IneffectiveMath) February 8, 2020
Ok one final tweet— Micah Blake McCurdy (@IneffectiveMath) February 8, 2020
Planes: three taken, out of seven booked
Then he delivered his talk while wearing a tiara. What a badass. (Did he keep wearing it at the game, and at the bar? Dear reader, yes he did.)
Micah Blake McCurdy @IneffectiveMath traveled through hell (Pittsburgh, and elsewhere) over 24 hours to get here, but it's worth it. Entertaining and informative presentation on score effects #CBJHAC pic.twitter.com/OxKGy6ozoU— The Cannon (@cbjcannon) February 8, 2020
We’ve long known that “score effects” exist. This concept holds that teams that are trailing will produce more shots than the leading team as the game goes on, in an attempt to erase the deficit. What McCurdy found was that this effect is driven not by the trailing team’s behavior but by the leading team’s.
If teams are tied in the third period, the “threat” goes way down because the teams are incentivized by the points system to play for the regulation tie. Regardless of score, the threat level is highest in the second period, due in part to the long change. There are big changes in the numbers immediately after the intermission, which makes sense because “the coaches just yelled at the players for 10 minutes.”
In the third period, the biggest boost to threat comes when the lead goes from +1 to +2. The least threat comes when it goes from -1 to tied (that is, work to salvage one point, then hold on to it for dear life). Generally there is an advantage to the team that scored most recently.
To learn more, watch the presentation above or read his article about it here:
Also, I've already written this talk up and you can read the first draft here: https://t.co/h4q3snzQom— Micah Blake McCurdy (@IneffectiveMath) February 8, 2020
(The final draft will have a raft of citations, there aren't any yet which is indefensible scholarship)
Data-Driven Story Telling. Alison Lukan. Lukan returned to do a second presentation. I’ve always felt that she is the best at making advanced statistical concepts accessible to the average hockey fan. This presentation revealed her philosophy and techniques that make her so good. My biggest takeaway: focus on one thing, but acknowledge that other factors exist. This approach keeps your work concise and clear, and also opens the door for further examination.
The conference then wrapped up with three quick “lightning” presentations:
A datavis-first approach to figuring out the world of hockey analytics. Bill Tran. Tran walked through the basics of choosing the right visualization to present your data.
Does Expansion Dilute the Talent Base? Nathan Gabay. This is a relevant topic again with Seattle joining the league in 2021. Gabay’s research showed that - defying expectation - the talent level in the league was not greatly affected by expansion. There is a small blip each year a new team entered, but it corrected quickly thereafter.
Impact of a sellout crowd on shots on goal totals. Phil Krumm. Another example of a presentation that focused on the methodology to answer a straightforward question. Again, the answer was that the expected effect did not exist. After joking that sellouts would produce more shots due to more fans shouting “SHOOT,” Krumm found that home teams get no boost in shots in front of sellouts, but away teams do see their shot rate decrease, and that is the factor that increases home team win probability.
I encourage you to look up all of these presenters and follow more of their work on any hockey topic that sounds interesting to you. If the Jackets host another conference in the future (I really hope they do), you should definitely consider attending.