Using OpenAI's Codex to add a real feature to a real SaaS
Follow along as I use Codex to add a feature to my deep work timer app (and compare it to Cursor!)
Not long ago I shared a deep dive into how I used AI tools to build a simple SaaS app. Well, that app has been chugging along for a few weeks with a very simple feature set.
Since I use Deep Focus Timer every day, there’s one thing I’ve found lacking.
You have to pay attention to know when the timer ends.
So I decided I want to add two things:
The app should play a ding to audibly indicate when the timer ends
The app should show a simple browser notification to indicate when the timer ends
I figured this was a good opportunity to try out Codex!
What’s the point?
I’m a big believer that AI isn’t going to replace software engineers, but it is rapidly changing the profession.
I write this newsletter for other software engineers and tinkerers who are looking to ship more code in the pursuit of earning more at their job or simply launching a side project.
I write a lot about tools like Cursor, and I’m always experimenting with how AI tools can generate real economic value.
If you want to use AI to build better software (and earn more money), you’re in the right spot. If you subscribe, you can expect an article like this one in your inbox roughly twice a month (paid members get 2x the content and deeper dives).
What is Codex?
Codex was recently released in research preview by OpenAI.
Here’s how OpenAI describes it:
Codex is a cloud-based software engineering agent that can work on many tasks in parallel. Codex can perform tasks for you such as writing features, answering questions about your codebase, fixing bugs, and proposing pull requests for review; each task runs in its own cloud sandbox environment, preloaded with your repository.
I wasn’t too eager to try it out because I still really like having my hands in the codebase, but building a discrete feature like the one described above seemed like a really good chance to see what the hype is about.
Getting Codex set up
I’m a ChatGPT Plus subscriber already, so I get some “generous” amount of usage for free. Interestingly all I had to do was enable MFA on my ChatGPT account and then use OAuth to connect to my GitHub account.
From there I selected my repository and created a new environment.
Asking Codex to build my feature
I’m a little curious how well this is going to work, especially because I haven’t provided any secret keys that will likely be needed to “manually test” the application.
I’m starting by providing some basic and explicit instructions:
As soon as I hit “Code”, the task went into what looks like a queue as the only item. Interestingly enough, I also got a notification on my phone indicating the task had started, much like I do for Deep Research tasks.
I clicked into the task to see that the agent was searching around the codebase.
Now I gather that the big unlock with using an async agent like this is you don’t have to babysit, so I left to go eat some lunch.
When I came back, I was greeted with a summary and a diff. I’m not really a TypeScript guy so I’m going to have to run this to test it out.
Creating the pull request failed initially, presumably because the diff includes a .wav file :(
It did however give me an option to “copy for git apply”, so I did that, checked out a new branch locally, then pasted. That seemed to do the trick.
Debugging Codex’s work
Funny enough, it didn’t work. Like it didn’t work at all.
I ran the timer on localhost and when the time ran out I got neither a notification nor a ding. The JS console gave me a useful error on the notification front, so I reprompted in the existing Codex session with:
Neither the ding sound nor the notification work when the timer hits 0. The js console did however give this error: "The Notification permission may only be requested from inside a short running user-generated event handler." Fix both please.
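For anyone who hasn’t hit that error before: browsers will only show the notification permission prompt in response to a user action like a click, so asking for it when the timer hits 0 is too late. Here’s a minimal sketch of the general shape of a fix, not the actual code from my app. All the names (`NotificationLike`, `requestPermissionOnClick`, `handleTimerEnd`, the `/ding.wav` path) are hypothetical, and the browser pieces are passed in so the logic can run outside a browser.

```typescript
// Hypothetical names throughout; a sketch, not the app's real code.
type NotifyPermission = "default" | "granted" | "denied";

// The slice of the browser's Notification API this sketch relies on.
interface NotificationLike {
  permission: NotifyPermission;
  requestPermission(): Promise<NotifyPermission>;
}

// Call this from a click handler (e.g. the Start button), not from the
// timer callback, so the browser treats it as user-initiated.
function requestPermissionOnClick(api: NotificationLike): Promise<NotifyPermission> {
  if (api.permission !== "default") {
    // Already granted or denied: asking again would not show a prompt.
    return Promise.resolve(api.permission);
  }
  return api.requestPermission();
}

// Side effects are injected so this can be exercised without a browser.
interface TimerEndEffects {
  playDing(): void;
  notify(title: string): void;
}

// Runs when the countdown reaches 0.
function handleTimerEnd(permission: NotifyPermission, effects: TimerEndEffects): void {
  effects.playDing();
  if (permission === "granted") {
    effects.notify("Timer finished");
  }
}

// Possible browser wiring (assumes a start button and a /ding.wav asset):
//   startButton.addEventListener("click", () => {
//     void requestPermissionOnClick(Notification);
//   });
//   handleTimerEnd(Notification.permission, {
//     playDing: () => { void new Audio("/ding.wav").play().catch(() => {}); },
//     notify: (title) => { new Notification(title); },
//   });
```

The design choice here is simply to split “ask for permission” (on click) from “use the permission” (when the timer ends), which is the split the error message is asking for.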
Watching the agent start digging, I noticed it was using npm while my project uses yarn, so I killed the execution early and clarified. Even still, the agent didn’t seem to be too concerned with getting the feature working but instead seemed to care a lot about linting. wtf??
I think that the agent is somewhat handicapped by not being able to truly run the app and “use” it. But I’m scratching my head here because Cursor’s agent doesn’t “use” the app either. You can tell that it doesn’t really know how to test what’s going on 🤦
This is somewhat solvable (more on that later), but I expected better performance out of the box.
I applied the next set of changes locally, but still no luck. In 3 attempts, Codex failed at both adding a browser notification and the sound for the timer.
Fixing it with Cursor
At this point I wanted to see if it was just a skill issue on my part, so I dropped all the changes, downloaded a royalty-free “ding” sound, and dropped the same prompt into Cursor with the model set to o3 (to be fair!).
It made some changes, then I could tell the browser console still had the notification permissions error, so I pasted that into the Cursor chat. It followed up with a small change, and I could tell right away that we were making progress!
I granted the permission, then waited out the 5 minute timer. When the timer ran out, the ding played but no notification! Still, so much more progress in so much less time than using Codex.
Getting even more out of Codex
To be honest, I’m not super impressed with this first use. I think Cursor’s agent clearly outperforms it on this task.
There are certainly some things I expect to do next to make working with agentic tools like this easier. The first is writing a proper test suite. I think one of the best ways to improve your experience coding with agents is to give them something they can use in a feedback loop to see if they’re making progress.
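To make that concrete, here’s the kind of thing I have in mind: pull the timer math into pure functions and give the agent cheap checks to run. This is a sketch with made-up names (`remainingMs`, `formatClock`, `check`), not code from my repo, and a real project would more likely use a test runner than a hand-rolled `check` helper.

```typescript
// Hypothetical helpers; the real app's timer logic may look different.

// Milliseconds left on a timer, clamped so it never goes negative.
function remainingMs(startedAt: number, durationMs: number, now: number): number {
  return Math.max(0, durationMs - (now - startedAt));
}

// Format milliseconds as MM:SS, rounding up so the display shows 00:01
// until the timer truly reaches zero.
function formatClock(ms: number): string {
  const totalSeconds = Math.ceil(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return pad(minutes) + ":" + pad(seconds);
}

// Tiny stand-in for a test runner: throw on the first failed expectation.
function check(label: string, actual: unknown, expected: unknown): void {
  if (actual !== expected) {
    throw new Error(label + ": expected " + String(expected) + ", got " + String(actual));
  }
}

const FIVE_MINUTES = 5 * 60 * 1000;
check("full time at start", remainingMs(1000, FIVE_MINUTES, 1000), FIVE_MINUTES);
check("zero at the deadline", remainingMs(1000, FIVE_MINUTES, 1000 + FIVE_MINUTES), 0);
check("never negative", remainingMs(1000, FIVE_MINUTES, 999999), 0);
check("clock at start", formatClock(FIVE_MINUTES), "05:00");
check("clock rounds up", formatClock(1500), "00:02");
```

If any check fails the script throws, which is exactly the kind of pass/fail signal an agent can loop on instead of guessing.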
The next is proper linting, which I thought I already had set up.
Finally, I really want to get both the tests and linter running in CI (maybe just GitHub Actions) so that I can fire off agents like Codex and open PRs with confidence.
What do you think? Have you used Codex for anything interesting?