Are Data Tools Gamifying Data?
It bugs me how many data tools today seem designed to gamify data.
Here’s the typical experience: These tools expose data scientists to hundreds of metrics and cuts. The idea is simple: more metrics = more opportunities to find patterns. But is 𝑚𝑜𝑟𝑒 really better?
I think there’s a better way.
Instead of throwing endless options at data scientists, tools should have an opinion. They should algorithmically identify the 𝑡𝑜𝑝 𝑝𝑎𝑡𝑡𝑒𝑟𝑛𝑠 that matter: the patterns I should actually care about. I don’t want to waste time going down rabbit holes. 🐇 🕳️
A Smarter Approach to Finding Patterns
This approach means taking an 𝐚𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦𝐢𝐜 𝐚𝐩𝐩𝐫𝐨𝐚𝐜𝐡 to finding patterns in data. For example, we can adapt machine learning methods to discover which splits or subgroups are meaningful.
One example of this approach is EconML from Microsoft Research. It’s one of GitHub’s top causal inference packages, and it helps automate the discovery of which patterns and splits matter most in data by estimating how the effect of an intervention varies across subgroups (it can even estimate effects at the level of the individual): https://www.microsoft.com/en-us/research/project/econml/
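To make the idea concrete, here is a toy sketch in Python. It is my own illustration, not EconML's API or algorithm. It assumes a randomized experiment where each row has a hypothetical `treated` flag and a numeric `outcome`. For each segment of a feature it estimates the treatment effect as a simple difference in means, then ranks features by how much that effect varies across their segments. A package like EconML replaces the naive difference in means with more robust estimators of heterogeneous effects, such as double machine learning and causal forests.

```python
from collections import defaultdict
from statistics import mean


def segment_effects(rows, feature):
    """Difference in mean outcome (treated minus control) within each value of `feature`.

    Assumes treatment was randomly assigned, so a difference in means is a fair estimate.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for r in rows:
        groups[r[feature]][r["treated"]].append(r["outcome"])
    effects = {}
    for value, arms in groups.items():
        if arms[True] and arms[False]:  # both arms are needed to estimate an effect
            effects[value] = mean(arms[True]) - mean(arms[False])
    return effects


def rank_features(rows, features):
    """Rank features by the spread (max minus min) of their segment-level effects.

    A large spread means the effect changes a lot across that split, so it is
    worth surfacing first; a small spread means the split adds little.
    """
    scores = []
    for f in features:
        effects = segment_effects(rows, f)
        if len(effects) >= 2:
            scores.append((f, max(effects.values()) - min(effects.values())))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical example data: the effect depends strongly on platform, barely on region.
example_rows = [
    {"platform": "ios", "region": "us", "treated": True, "outcome": 10},
    {"platform": "ios", "region": "eu", "treated": True, "outcome": 10},
    {"platform": "ios", "region": "us", "treated": False, "outcome": 2},
    {"platform": "ios", "region": "eu", "treated": False, "outcome": 2},
    {"platform": "android", "region": "us", "treated": True, "outcome": 4},
    {"platform": "android", "region": "eu", "treated": True, "outcome": 3},
    {"platform": "android", "region": "us", "treated": False, "outcome": 3},
    {"platform": "android", "region": "eu", "treated": False, "outcome": 3},
]

print(rank_features(example_rows, ["platform", "region"]))
# → [('platform', 7.5), ('region', 0.5)]
```

The ranking is the "opinion": instead of showing every cut, the tool surfaces platform first because that is where the effect actually changes. With real data you would also want minimum segment sizes and uncertainty estimates before trusting a split.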
Balancing Algorithms and Expertise
To be clear, I still believe data scientists should have access to the underlying data. There will always be unique patterns (like logging issues or edge cases) that only deep domain experts can uncover.
But this should be the 𝐬𝐞𝐜𝐨𝐧𝐝 𝐬𝐭𝐞𝐩, after an algorithm has done the heavy lifting of surfacing the most important patterns.