The A* and D* Algorithms: A Programmer's Guide to Life Decisions
def action_value(action):
    # f(n) = -hardness + expected regret; lower is better
    return -hardness(action) + expected_regret(action)

best_action = min(possible_actions, key=action_value)
Recently I was listening to a Y Combinator podcast on how to make the most of your 20s. The hosts all agreed that you should tackle hard problems while you're young. In fact, they encouraged people to “do the most hardcore thing early in your career, because you can always mellow out and take on less demanding work… but it’s very hard to go the other way around”. While I fundamentally agree with this, I think there’s more to it. More specifically, you need to factor in whether the difficult challenges you’re undertaking are actually going to be useful in your life. In other words, you want to minimise the regret of having taken such actions.
Anyways, this immediately reminds me of the A* algorithm.
For a given node \(n\), the A* algorithm minimises the function \(f(n) = g(n) + h(n)\), where \(g(n)\) is the cost of the path from the start node and \(h(n)\) is the estimated remaining cost to the goal according to a heuristic function \(h\). Coming back to the analogy, we can think of \(-g(n)\) as how hard an action is to complete, and of \(h(n)\) as the expected regret. In other words, our aim is to minimise the combination of how easy an action is to complete and how much we would regret taking it. If you prefer reading code to equations, I have attached a pythonesque version of the equation at the top of this post. Just as A* searches for the best path through a maze, we are searching for the best actions through life, trying to maximise personal growth (through hard actions) while minimising future regret. Following Y Combinator’s advice alone would be equivalent to using Dijkstra’s algorithm, which drops the heuristic and considers only \(g(n)\): solely doing hard things works, but it wastes time exploring directions that aren’t worth pursuing. Keeping only the regret-minimisation part that I added would be equivalent to using Greedy Best-First Search, which considers only \(h(n)\): we become too conservative and only take safe actions that we are guaranteed not to regret.
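To make the comparison concrete, here is a toy sketch (the actions and scores are invented for illustration) showing that all three strategies are the same `min` call with different keys:

```python
# Invented example values: hardness plays the role of -g(n),
# expected regret plays the role of the heuristic h(n).
hardness = {"safe routine": 1, "hard grind": 9, "bold bet": 7}
expected_regret = {"safe routine": 0, "hard grind": 6, "bold bet": 2}
actions = list(hardness)

# A*: balance both terms, f(n) = g(n) + h(n) = -hardness + regret
a_star = min(actions, key=lambda a: -hardness[a] + expected_regret[a])

# Dijkstra ("just do hard things"): drop the heuristic, keep only g(n)
dijkstra = min(actions, key=lambda a: -hardness[a])

# Greedy Best-First ("just avoid regret"): drop g(n), keep only h(n)
greedy = min(actions, key=lambda a: expected_regret[a])
```

With these numbers the three strategies genuinely disagree: A* picks the bold bet, Dijkstra picks the hardest grind regardless of regret, and Greedy Best-First retreats to the safe routine.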
Coming back to the A* algorithm, here’s where it gets interesting. Unlike A*, we can’t possibly evaluate \(f(n)\) for every action available to us – there are infinitely many paths our lives could take at any moment, and we shouldn’t always sit and think for 10 minutes about what action to take next. In fact, given an infinite action space, it is impossible to evaluate the value function for every possible action. Sometimes we need to decide quickly. Therefore, we need a tool to filter the action space so that we can focus solely on the choices that actually matter. In the computer science literature, this is referred to as pruning: a technique that reduces a search space by eliminating branches unlikely to lead to optimal solutions. A well-known rule people use to prune action branches is the Pareto Principle, also called the 80/20 rule, which states that roughly 80% of the results come from 20% of the actions. I personally don’t find this rule very useful, as deciding whether a given action belongs to the 20% that will produce 80% of the results is itself an incredibly hard problem. A tool that actually helps me prune actions is thinking about the opportunity cost of not taking the action. In other words, what am I giving up? I won’t expand on this much more, but I should stress that it doesn’t contradict the “do hard things” or “minimise regret” premises. In fact, I think it complements both. In terms of doing hard things, when you consider opportunity cost, you’re also evaluating what other challenging paths you might be giving up. In terms of regret minimisation, opportunity cost is more focused on “what else could I be doing instead?” as opposed to “will I regret doing this?”.
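A minimal sketch of that pruning idea, with a hypothetical `opportunity_cost` table standing in for the (much harder) real judgement:

```python
# Hypothetical opportunity-cost estimates: what each action forecloses.
opportunity_cost = {"doomscroll": 9, "side project": 3, "night course": 4}

def prune(actions, cost, budget):
    """Discard branches whose opportunity cost exceeds our budget,
    so the expensive scoring step runs on far fewer candidates."""
    return [a for a in actions if cost[a] <= budget]

shortlist = prune(["doomscroll", "side project", "night course"],
                  opportunity_cost, budget=5)
# "doomscroll" never even reaches the evaluation stage
```

The point of the sketch is the order of operations: the cheap filter runs first, and the costly \(f(n)\) evaluation only ever sees the shortlist.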
Another interesting aspect of thinking about life choices algorithmically is that, unlike the traditional scenarios where A* is applied, the value function of actions changes over time. Putting a number on how hard an action is and how much we will regret doing it is very subjective, and it constantly shifts as we grow and change. What seems incredibly hard today might feel very natural tomorrow, and what we think we’ll regret often surprises us positively. This is where Dynamic A*, or D*, comes in. While A* assumes a static environment where all costs are known upfront, D* was specifically designed for scenarios where new information is discovered during execution, and it updates only the relevant parts of the path. This is much like how we discover new opportunities and challenges as we walk through life without needing to completely reinvent ourselves.
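In code, that D*-flavoured idea might look something like this sketch (actions and values invented): cache the scores once, and when new information arrives, refresh only the entry it touches rather than rebuilding everything:

```python
# Invented example values for the analogy.
hardness = {"move abroad": 8, "learn piano": 4}
regret = {"move abroad": 2, "learn piano": 5}

# Cache f(n) = -hardness + regret for every action once.
scores = {a: -hardness[a] + regret[a] for a in hardness}

def update_hardness(action, new_value):
    """New information arrived: patch one cost and recompute only
    that entry, instead of re-scoring the whole action space."""
    hardness[action] = new_value
    scores[action] = -hardness[action] + regret[action]

update_hardness("move abroad", 3)  # it looked harder before a friend did it first
best_action = min(scores, key=scores.get)
```

This is only the spirit of D*, not the algorithm itself: real D* propagates cost changes through a graph, but the principle of repairing only what changed is the same.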
What makes A*, D*, and life decisions even more interesting is that we’re always working with approximations and incomplete information. In pathfinding algorithms, the heuristic function \(h(n)\) must be “admissible”, meaning it should never overestimate the true cost to the goal. But in life, our estimates of both hardness and regret are rarely so well-behaved. We make educated guesses about how difficult a career change might be, or how much we might regret not taking a certain opportunity, but these are inherently imperfect approximations. Maybe a better algorithm to use would be D* Lite (a popular variant of D*), which handles uncertainty by continuously updating its value function as new information becomes available. We too must regularly update our estimates of life’s “costs” and “benefits” as we gain experience. What we thought would be a minor career pivot may turn out to be a major challenge, and what seemed a priori like a risky decision may prove less scary than anticipated. The key insight is that while we can’t fully eliminate this uncertainty (just as no pathfinding algorithm can perfectly predict unknown terrain), we can and must build systems that gracefully adapt to changing environments. This is perhaps where the analogy between algorithms and life decisions is strongest: success doesn’t come from having perfect information, but from having robust methods for handling imperfect information and updating our beliefs as we go.
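One simple way to keep those estimates honest, sketched here with an invented scenario and an arbitrary learning rate, is to nudge each estimate toward what experience actually showed:

```python
def revise(estimate, observation, rate=0.3):
    """Move the current estimate part of the way toward the observation."""
    return estimate + rate * (observation - estimate)

pivot_hardness = 9.0                # "a career pivot will be brutal"
for observed in [6.0, 5.0, 4.0]:    # lived experience keeps disagreeing
    pivot_hardness = revise(pivot_hardness, observed)
# the estimate drifts down toward what we actually experienced
```

This is just an exponential moving average, not what D* Lite literally does, but it captures the same discipline: no single observation overturns a belief, yet repeated evidence steadily reshapes it.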