About
I'm Sam, currently an MS student studying Natural Language Processing (NLP) at UC Santa Cruz. Before that, I was a backend software engineer working in regenerative agriculture and electric micromobility at Regrow, Indigo, and Bird.
Over the last year of learning, I've kept up the habit of building an Obsidian knowledge base to organize concepts, but I never got around to crystallizing my thoughts into formal blog posts. Now that I'm at UCSC, I wanted this site to be somewhere I could point classmates to for more polished explanations of topics we cover in class. I want the site to be both a steadily improving resource on evergreen topics, for myself and others, and a place to share interesting, high-quality content as I come across it. Lastly, I hope it serves as evidence of passion when I interview for internships or full-time roles.
Why did I pick this name for the blog? In neural scaling laws, compute-optimality refers to how best to split a fixed training compute budget (FLOPs) between model size and dataset size. If your language model is too small relative to your compute budget, you're leaving performance on the table; if it's too large, you won't have enough compute left to train it properly. Likewise, if the dataset is too small you'll overfit, and if it's too large you waste compute on training examples you didn't need. For a given compute budget, there's a narrow band of model and dataset sizes that maximizes performance.

I think there's an analogy to be drawn between compute-optimality and the pursuit of self-education -- just as training runs have a fixed compute budget, we have finite time and mental energy for learning. And just like compute-optimal training, effective learning requires balancing multiple factors: What should we prioritize learning first? How much time do we spend building intuition? How deeply should we learn a concept? How much can we reasonably have on our plate while still making forward progress? How do we retain what we've learned? It's tricky -- the answers are a personal and ever-moving target as our knowledge grows and interests shift. I hope that building this blog helps me on my journey toward an optimal learning outcome, and that it can be a useful resource for others on theirs.
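If you'd like the scaling-law half of that made concrete, here's a back-of-envelope sketch in Python. It leans on two common approximations popularized by the Chinchilla paper (Hoffmann et al., 2022): training cost C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens, and roughly 20 tokens per parameter at the compute-optimal point. The function name and example budgets are mine, purely for illustration -- real scaling-law fits are more nuanced than this.

```python
"""Back-of-envelope Chinchilla-style compute-optimal allocation.

Assumes C ~= 6 * N * D training FLOPs and D ~= 20 * N tokens at the
compute-optimal point. Illustrative only, not a real scaling-law fit.
"""
import math


def optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):  # hypothetical training FLOP budgets
        n, d = optimal_allocation(budget)
        print(f"C = {budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Plugging in a Chinchilla-scale budget of roughly 6e23 FLOPs recovers the familiar operating point of about 70B parameters trained on about 1.4T tokens.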
I've been helped along my journey so far by the work of too many people to name, but I'd like to highlight a few individuals who've created educational content that I've found particularly helpful. I hope that my own writing can one day produce even a fraction of the value for others that these folks have provided to me:
- Andrej Karpathy on YouTube
- Lilian Weng at Lil'Log
- Nathan Lambert at Interconnects and The Retort
- Tim Scarfe and others at Machine Learning Street Talk
- Swyx and Alessio at Latent Space
- Jeremy Howard and Rachel Thomas at Fast.ai
- Grant Sanderson at 3Blue1Brown
- Josh Starmer at StatQuest
- Chris Manning at Stanford's CS 224N
- Chris Potts at Stanford's CS 224U
- Samuel Albanie on YouTube
- Letitia Parcalabescu at AI Coffee Break with Letitia
- The lovely people at Cohere's C4AI
- Ritvik Kharkar at Ritvikmath
- Cameron Wolfe at Deep (Learning) Focus
- Andrew Ng at Stanford and elsewhere
- Bai Li at EfficientNLP
- David Silver on YouTube
- Sasha Rush on YouTube
- Sebastian Raschka's blog
- Steve Brunton on YouTube
- DJ Rich at Mutual Information
- Luis Serrano at Serrano Academy
- And many more who have undoubtedly contributed to my understanding of the world, in ways too numerous to list!