As a trained data scientist, I am very sensitive to metrics. Recently, I encountered several interesting ideas on measurement.
The latest paper on measuring AI performance is "Are Emergent Abilities of Large Language Models a Mirage?" (Schaeffer et al.). For background: AI performance improved drastically from GPT-2 to GPT-3, and the main difference between the two is model scale, which mostly means the number of parameters (GPT-2 has only 1.5 billion parameters, while GPT-3 is over 100 times bigger).
Though emergent properties [1] have been observed in large language models, this paper shows that “emergent abilities seem to appear only under metrics that nonlinearly or discontinuously scale any model’s per-token error rate.” When the authors switch to other metrics, the spike in performance no longer appears.
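To see how a metric alone can create a spike, here is a minimal Python sketch. It rests on a toy assumption that is mine, not the paper's experimental setup: every token is predicted correctly with the same probability, independently of the others. The function name `exact_match_accuracy` and the answer length of 20 tokens are hypothetical choices for illustration.

```python
# Toy model (an assumption for illustration, not the paper's experiments):
# every token is correct with the same probability, independently.


def exact_match_accuracy(per_token_accuracy: float, sequence_length: int) -> float:
    """Probability that every token in a sequence is correct."""
    return per_token_accuracy ** sequence_length


if __name__ == "__main__":
    sequence_length = 20  # hypothetical answer length in tokens
    for per_token_accuracy in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
        exact = exact_match_accuracy(per_token_accuracy, sequence_length)
        print(
            f"per-token accuracy {per_token_accuracy:.2f} "
            f"-> exact-match accuracy {exact:.4f}"
        )
```

Under this assumption, per-token accuracy climbs smoothly, while the all-or-nothing exact-match score stays near zero for a long time and then shoots up. The model improves gradually either way; only the metric makes it look sudden.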
What you measure is a very interesting concept. As Peter Drucker said, “[only] what gets measured, gets managed.” In The Art of Doing Science and Engineering, Hamming also said, “you get what you measure,” and gave this example:
In giving a final exam in a course, say in the calculus, I can get almost any distribution of grades I want. If I could make up an exam which was uniformly hard, then each student would tend either to get all the answers right or all wrong. Hence I will get a distribution of grades which peaks up at both ends. If, on the contrary, I asked a few easy questions, many moderately hard, and a few very hard ones, I would get the typical normal distribution; a few at each end and most of the grades in the middle.
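The exam example can be simulated with a rough Python sketch. Everything in it is an assumption for illustration: student ability is drawn from a standard normal distribution, the chance of answering a question correctly follows a logistic curve in (ability minus difficulty), and the `sharpness` value and the difficulty lists are invented. It is not Hamming's own model.

```python
import math
import random


def simulate_scores(difficulties, n_students, rng, sharpness=4.0):
    """Return one exam score per simulated student (assumed logistic model)."""
    scores = []
    for _ in range(n_students):
        ability = rng.gauss(0.0, 1.0)  # assumed ability distribution
        correct = 0
        for difficulty in difficulties:
            p_correct = 1.0 / (1.0 + math.exp(-sharpness * (ability - difficulty)))
            if rng.random() < p_correct:
                correct += 1
        scores.append(correct)
    return scores


def share_at_extremes(scores, n_questions):
    """Fraction of students scoring in the bottom or top 10% of the scale."""
    low, high = 0.1 * n_questions, 0.9 * n_questions
    return sum(1 for s in scores if s <= low or s >= high) / len(scores)


# Exam A: 20 questions of identical difficulty.
uniform_exam = [0.0] * 20
# Exam B: a few easy, many moderate, a few very hard questions.
mixed_exam = [-2.0] * 3 + [-1.0 + 2.0 * i / 13 for i in range(14)] + [2.0] * 3

if __name__ == "__main__":
    rng = random.Random(0)
    for name, exam in (("uniform", uniform_exam), ("mixed", mixed_exam)):
        scores = simulate_scores(exam, 2000, rng)
        print(name, "share at extremes:", round(share_at_extremes(scores, len(exam)), 2))
```

With these made-up settings, the uniformly hard exam pushes most students toward one end of the scale, while the mixed exam keeps most of them in the middle. The students are the same in both runs; only the measuring instrument changes.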
As you can probably see from all the examples, measurement is manipulable. You can always show crazy business growth to your boss if you know how to manipulate it well. That’s why most of my data science work is thinking about which metrics to use among so many, with an eye on integrity (you are using the right metrics), accuracy (the metric is measured and computed correctly), and relatedness (the business objective can be represented by the metrics). And this applies implicitly to life too.
When I was fired while a friend of mine easily got a return offer, I felt life was quite unfair. What was my measurement here? It seems to have been money, ease of conventional success, reputation, and life comfort. However, is that the north star metric for my career? Over time, the answer has become a strong no for me.
Nowadays, I resonate with what Paul Graham said:
I do make some amount of effort to focus on important topics. Many problems have a hard core at the center, surrounded by easier stuff at the edges. Working hard means aiming toward the center to the extent you can. Some days you may not be able to; some days you'll only be able to work on the easier, peripheral stuff. But you should always be aiming as close to the center as you can without stalling. — How to work hard
This is also how I decide what work to do and what job to take nowadays. Am I getting closer to the center of my great work? Or am I avoiding it because I’m afraid of failure, afraid of being poor, afraid of being undervalued, afraid of discomfort?
I’m not saying that we always need to walk in a straight line toward our great work, because that’s not possible. Life is messy and we are messy too. However, if we have a rough sense of direction, we can get closer. Let me show you with the drunken sailor problem.
It is well known the drunken sailor who staggers to the left or right with n independent random steps will, on the average, end up about √n steps from the origin. But if there is a pretty girl in one direction, then his steps will tend to go in that direction and he will go a distance proportional to n. In a lifetime of many, many independent choices, small and large, a career with a vision will get you a distance proportional to n, while no vision will get you only the distance √n. In a sense, the main difference between those who go far and those who do not is some people have a vision and the others do not and therefore can only react to the current events as they happen.
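The √n versus n claim is easy to check with a small Monte Carlo sketch in Python. The setup is my own simplification: a one-dimensional walk with steps of +1 or -1, where `p_right = 0.5` stands for no vision and `p_right = 0.6` is an arbitrary, hypothetical amount of bias toward the goal.

```python
import math
import random


def mean_distance(n_steps: int, p_right: float, n_walks: int, rng: random.Random) -> float:
    """Average absolute distance from the origin after n_steps of a 1-D walk."""
    total = 0
    for _ in range(n_walks):
        position = 0
        for _ in range(n_steps):
            # Step right with probability p_right, otherwise step left.
            position += 1 if rng.random() < p_right else -1
        total += abs(position)
    return total / n_walks


if __name__ == "__main__":
    rng = random.Random(0)
    for n_steps in (100, 400, 1600):
        no_vision = mean_distance(n_steps, 0.5, 300, rng)
        vision = mean_distance(n_steps, 0.6, 300, rng)
        print(
            f"n={n_steps}: sqrt(n)={math.sqrt(n_steps):.0f}, "
            f"no vision={no_vision:.1f}, vision={vision:.1f}"
        )
```

For the unbiased walk, the average distance grows like √n (about 0.8·√n in one dimension). With a 60/40 bias, the expected displacement is 0.2·n, so even a slight lean in one direction dominates once n gets large.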
If you are interested in how this plays out, check out these resources that I found: an interesting drunken sailor animation [2] and "your career is a random walk" [3]. The essence of the concept is that even though we are lost and constantly make stupid and inaccurate decisions in our messy lives, if we know what our north star is, we can still be biased toward where we want to be. So, what metric do I choose for my life?
I have two: one for my career and one for relationships.
For career: Get closer to working on my self-defined great work. I define great work as work that enables learning to be valued as an experience, not a transaction. [4]
For relationships: The ability to learn about love and experience love.
Why does this matter? For instance, my old metric for my romantic relationships was "to be a great partner." That was a bad metric because when I was treated miserably, I thought, "I'm still learning to be a great partner." So, I modified it to "be a great partner and find a great life partner." But that wasn't good enough either, because I experienced a lot of growth and love through heartbreak, while under this metric heartbreak is just a failure. That's why I refined it to "the ability to learn about love and experience love."
As for my career, as you can see, it is still very vague because I don't know much yet. However, no matter how lost I feel and how bleak the reality looks, with a rough idea of great work, I believe I will get proportionally closer as long as I keep that north star in my heart.
p.s.: Damn. Living like a drunken sailor, I feel so hopeful after writing this post. Thank you science and math!
[1] Emergent properties: The improvement in AI that comes from a bigger model scale cannot be predicted or expected merely by extrapolating the improvements achieved in smaller models. If you have studied complex systems, you will know this concept as emergent properties. To understand it, imagine training a smaller model that handles basic language tasks like text generation and translation. When you increase the model scale, it can all of a sudden understand text, for example summarizing specific scientific papers or generating human-like creative writing.
[2] Monte Carlo Experiments: "Drunken Sailor's" Random Walk
[3] React demonstration for biased random walk. Video explanation:
[4] I want to redefine why we learn. So far, most of us associate learning with assignments, grades, job searches, and promotions. Society views learning as a means to an end (survival, reputation, success, obligation). I think that's completely wrong. Learning is an experience worth pursuing, much like the way people go to an amusement park just for the experience.




