About
I'm Sam, currently an MS student studying Natural Language Processing (NLP) at UC Santa Cruz. Before that, I was a backend software engineer working in regenerative agriculture and electric micromobility at Regrow, Indigo, and Bird.
Over the last year of learning, I've kept up the habit of building an Obsidian knowledge base to organize concepts, but I never got around to crystallizing my thoughts into formal blog posts. Now that I'm at UCSC, I wanted this site to be somewhere I could point classmates to for more polished explanations of topics we cover in class. I want the site to be both a steadily improving resource on evergreen topics, for myself and others, and a place to share interesting, high-quality content as I come across it. Lastly, I hope it serves as evidence of passion when I interview for internships or full-time roles.
Why did I pick this name for the blog? In neural scaling laws, compute-optimality refers to how best to split a fixed training compute budget (FLOPs) between model size and dataset size. If your language model is too small relative to your compute budget, you're leaving performance on the table; if it's too large, you won't have enough compute left to train it properly. Likewise, if the dataset is too small you'll overfit, and if it's too large you waste compute on training examples you didn't need. For a given compute budget, there's a narrow band of model and dataset sizes that maximizes performance.

I think there's an analogy to be drawn between compute-optimality and the pursuit of self-education -- just as training runs have a fixed compute budget, we have finite time and mental energy for learning. And just like compute-optimal training, effective learning requires balancing multiple factors: What should we prioritize learning first? How much time do we spend building intuition? How deeply should we learn a concept? How much can we reasonably have on our plate while still making forward progress? How do we retain what we've learned? It's tricky -- the answers are a personal and ever-moving target as our knowledge grows and interests shift. I hope that building this blog helps me on my journey toward an optimal learning outcome, and that it can be a useful resource for others on theirs.
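If you'd like the scaling-law half of that made concrete, here's a back-of-envelope sketch in Python. It leans on two common approximations popularized by the Chinchilla paper (Hoffmann et al., 2022): training cost C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens, and roughly 20 tokens per parameter at the compute-optimal point. The function name and example budgets are mine, purely for illustration -- real scaling-law fits are more nuanced than this.

```python
"""Back-of-envelope Chinchilla-style compute-optimal allocation.

Assumes C ~= 6 * N * D training FLOPs and D ~= 20 * N tokens at the
compute-optimal point. Illustrative only, not a real scaling-law fit.
"""
import math


def optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):  # hypothetical training FLOP budgets
        n, d = optimal_allocation(budget)
        print(f"C = {budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Plugging in a Chinchilla-scale budget of roughly 6e23 FLOPs recovers the familiar operating point of about 70B parameters trained on about 1.4T tokens.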
I've been helped along my journey so far by the work of too many people to name, but I'd like to highlight a few individuals who've created educational content that I've found particularly helpful. I hope that my own writing can one day produce even a fraction of the value for others that these folks have provided to me:
- Andrej Karpathy on YouTube
- Lilian Weng at Lil'Log
- Nathan Lambert at Interconnects and The Retort
- Tim Scarfe and others at Machine Learning Street Talk
- Swyx and Alessio at Latent Space
- Jeremy Howard and Rachel Thomas at Fast.ai
- Grant Sanderson at 3Blue1Brown
- Josh Starmer at StatQuest
- Chris Manning at Stanford's CS 224N
- Chris Potts at Stanford's CS 224U
- Samuel Albanie on YouTube
- Letitia Parcalabescu at AI Coffee Break with Letitia
- The lovely people at Cohere's C4AI
- Ritvik Kharkar at Ritvikmath
- Cameron Wolfe at Deep (Learning) Focus
- Andrew Ng at Stanford and elsewhere
- Bai Li at EfficientNLP
- David Silver on YouTube
- Sasha Rush on YouTube
- Sebastian Raschka's blog
- Steve Brunton on YouTube
- DJ Rich at Mutual Information
- Luis Serrano at Serrano Academy
- And many more who have undoubtedly contributed to my understanding of the world, in ways too numerous to list!