Rant: Oh, you blockhead!

This is a tangential rant about things I left out of the main post to keep it on-topic.
First, a positive note
Before we begin, I want to highlight how nice OpenAI's free token program is. If you share your API traffic with them, they'll give you 2.5 million tokens per day for their smaller model tiers (I used gpt-5-mini for the duration of the blockworld sims). I needed a LOT of sims to get any kind of generalizable result for the post, so after spreading the work out across a few days, this program saved me about $20 in tokens; I spent less than $0.50 in total. Very nice!
OpenAI's API typing
The typing system is bad. I was originally going to write something about how I bet the folks behind AWS's API finally got fired but fell upwards into a job writing OpenAI's API. But I did some investigating and noticed this gem at the top of the Python SDK's files:
# File generated from our OpenAPI spec by Stainless
Ah! Someone to blame! OpenAI outsourced their API SDKs to some codegen company that turns API specs into multi-language SDKs. It's a neat idea, but an autogenerated solution will always suffer from being a one-size-fits-all approximation of a reasonable API.
I could give a one-sided opinion about why having a module called openai.types.shared is a fundamentally bad idea, but this story tells it better: there's so much imported autogenerated bloat in the API typings that my language server can't keep up with it. When I open a file importing OpenAI's API, I'll frequently see the editor freak out about the API calls until I trigger some re-analysis of the file with a manual save. Nuts.
Stainless has a featured quote on their website about how much OpenAI loves using their product to generate their SDKs, so I suspect this won't be changing anytime soon.
OpenAI's function tool
By far my biggest struggle in making my Blockworld demo was figuring out how to expose an instance method (a method that operates on a class instance via self) to their API. After far too much tinkering, I'm convinced it can't be done without bypassing the function_tool helper they provide entirely. When OpenAI's function_tool is used, it generates a schema the LLM uses to invoke your function, based on the function's name, its arguments, and its docstring. This is a pretty straightforward idea that I implemented myself (against their Beta API) when I worked on the Wizaidry demo. But time has passed, and they now have a shiny new API with a helper that promises to make it incredibly simple to have an LLM invoke your Python functions locally. It just can't handle an instance method.
The issue is that function_tool reads the self argument and tries to add it to the schema, even though self isn't a primitive object the LLM can construct. And for the life of me, I could not figure out how to make this tool ignore the self parameter or bind it to the callable their API invokes. It just doesn't work.
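Here's a minimal sketch of the failing case (the tool body and signature are simplified from my actual demo):

import agents  # the openai-agents SDK

class OpenLoopPlanner:
    # What I wanted to write: the tool as a plain instance method.
    @agents.function_tool
    def create_plan(self, plan: list[str]) -> None:  # simplified signature
        """Create a plan of actions to sort the blocks."""
        self._plan = plan

# function_tool introspects the signature, sees `self`, and tries to put
# it in the JSON schema, but the model can't construct a planner object.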
Here are some ideas I exhausted:
- Passing an instance method anyways and hoping OpenAI filters it on their end.
- Using a lambda function. Even if you specify a name or spec override, the introspection will attempt to read the function name anyways, so an anonymous function fails.
- For similar reasons, a functools.partial fails, as it produces a callable but not a first-class function. (Both of these dead ends are sketched below.)
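To make those last two concrete, here's roughly what the dead ends looked like (simplified tool body; name_override is the override parameter function_tool accepts):

import functools
import agents

class OpenLoopPlanner:
    def create_plan(self, plan: list[str]) -> None:
        self._plan = plan

planner = OpenLoopPlanner()

# Dead end: even with a name override, the introspection still reads
# the function's own name, and an anonymous function trips it up.
agents.function_tool(lambda plan: planner.create_plan(plan), name_override="create_plan")

# Dead end: a functools.partial is a callable but not a first-class
# function, so the same introspection falls over on it.
agents.function_tool(functools.partial(OpenLoopPlanner.create_plan, planner))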
What you can do is bind some context at LLM invocation time for tool invocations to reference, so I ended up with a pattern where my function tools are all defined after the class definition and grossly access its private attributes:
import agents  # the openai-agents SDK

import environment  # my blockworld module (defines Action)


class OpenLoopPlanner:
    ...


# Module level, so function_tool never sees a `self` parameter.
@agents.function_tool
def create_plan(
    clz: agents.RunContextWrapper[OpenLoopPlanner], plan: list[environment.Action]
) -> None:
    """Create a plan of actions to sort the blocks.

    Args:
        plan: A list of actions to be taken.
    """
    clz.context._plan = plan
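For completeness, here's roughly how that context gets bound when the agent runs (the agent name and prompts are made-up stand-ins; Agent and Runner are from the Agents SDK):

planner = OpenLoopPlanner()
agent = agents.Agent(
    name="blockworld-planner",  # made-up name
    instructions="Plan how to sort the blocks.",  # stand-in prompt
    tools=[create_plan],
)
# context=planner is what clz.context resolves to inside create_plan.
result = agents.Runner.run_sync(agent, "Sort the blocks.", context=planner)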
In my view, binding an instance method is basic functionality that should absolutely work. But after reading through their docs and examples - again, more time than I wanted to spend on this - I'm convinced it just isn't designed for that.
Prompt hacking
I mentioned in the main post that I wanted to create short, simple, human-readable prompts. The counterexample is so long that I omitted it from the post, but here's what I mean when I say a lot of LLM-based planning methods rely on ridiculous prompt hacking:
## Section 4: COMBINATION GENERATION
### Step 1: objnode-CENTRIC COMBINATION ENGINE
- Generate up to k object-only combinations from the pool of valid objects.
- No target_area assignments yet.
- Each combination must contain no more than 4 objects.
- Prioritize logical groupings and auto-prune duplicates or redundant patterns.
### Step 2: POST-HOC TARGET_AREA ASSIGNMENT
- For each combination from Step 1:
  1. Per-object resolution
     - Select the highest-validity target_area option (per Section 2).
  2. Cross-combination locking
     - The first assignment chosen for an object -> target_area locks that mapping.
     - Subsequent combinations must reuse the same mapping.
### Step 3: CROSS-MATRIX VALIDATION
- Consistency Audit
  - Check that every object consistently uses the same target_area in all generated combinations.
- Failure Modes
  - If any target_area mismatch is detected, remove all conflicting combinations.
  - If an object conflict arises, revisit Section 4 Step 1 with penalty weighting.
#### Final Safeguards (Section 4)
1. Sequential Locking Protocol
   - The first valid combination's object -> target_area assignments bind subsequent combinations.
2. Retroactive Consistency
   - Any new combinations must respect existing locked mappings.
3. Combination Quarantine
   - Combinations involving any unvalidated object-target pair are kept aside until validated.
#### Example (Section 4)
[detailed example continues]
Can you imagine reading four sections of that and finding an answer that meets all of the constraints? If we wouldn't give it to a human, why give it to an AI and expect good results when the alternative is to simplify the prompt?