Cursor's Debug Mode Is Arguably Its Best Feature
Back in December, Cursor announced "Debug Mode". The way it works is fairly straightforward:
- User says "I want to fix this bug!!!"
- Cursor prompts the model to come up with several hypotheses for why the bug exists
- Cursor prompts the model to add a bunch of instrumentation (in the form of HTTP log requests) to better understand the various hypotheses
- In certain programming languages and environments, Cursor writes to files instead of using HTTP logs
- Cursor spins up an HTTP server to listen to these "logs"
- Finally, Cursor asks the user to reproduce the bug
Then, as the user reproduces the bug manually, the Cursor agent can "listen in" and see which code paths are being followed, along with the values of the variables and timestamps it previously decided to track through the "logs".
(In the Debug Mode UI, it's very easy for a user to say "Cursor, you've fixed my bug!", which causes the agent to clean up all of its logs.)
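Cursor hasn't published the exact shape of this instrumentation, but conceptually each injected "log" is just a small HTTP request fired from an interesting code path. Here's a minimal sketch of what that might look like in a TypeScript codebase — the debugLog helper, the port, the endpoint, and the payload shape are all my own guesses for illustration, not Cursor's actual implementation:

```ts
// Hypothetical instrumentation helper an agent could inject.
// The endpoint, port, and payload shape are assumptions for illustration.
async function debugLog(hypothesis: string, data: Record<string, unknown>) {
  try {
    await fetch("http://127.0.0.1:4999/debug-log", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ hypothesis, data, ts: Date.now() }),
    });
  } catch {
    // Never let instrumentation break the app being debugged.
  }
}

// The kind of call the agent might sprinkle into a suspect code path:
function applyDiscount(cart: { total: number }, discount: number) {
  void debugLog("H2: discount applied twice", { total: cart.total, discount });
  return cart.total - discount;
}
```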
But all of this is done with HTTP-based logs, not with any "fancy" instrumentation such as trying to leverage LSPs or other classic debugger features. And that's why it's so genius: it works for basically any programming language, and in basically any environment, provided the user can reproduce the bug locally or via Remote SSH.
I have used Debug Mode basically every day since it came out. And yes, my current role involves more bug fixing than is probably average for the industry, but I still think this feature puts Cursor a level above other AI coding agents.
You can technically do what Cursor does with other coding agents, and I know there are some skills out there to facilitate doing this kind of thing with other products. But Cursor has it as a native feature in the product, with a great UI/UX to boot!
Anyway, this feature makes me constantly think about the future of coding with agents.
I routinely see people trying to fix bugs with AI without Debug Mode (99% of bugs fixed with LLMs in the world today are fixed without Cursor's Debug Mode). And sure, most of those fixes are totally fine, but many of them are "sloppy": the model comes up with a mediocre fix, and we merge it because we're happy the bug is gone.
Cursor's "Debug Mode" could arguably have been called "Instrumentation Mode". All it does is make the agent aware of the actual runtime characteristics of the code by allowing it to instrument the code and listen to the different code execution paths. Moreover, this is all done with textual logs, and LLMs are really good at parsing text.
Because of this, the bug fixes Cursor comes up with in Debug Mode are much higher quality than the ones it produces without any instrumentation at all. I have even used Debug Mode to fix bugs in frontend<>backend (client<>server) architectures, where it adds logs to both the frontend and the backend in order to understand the order of events causing a specific bug. It really does feel magical.
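To make that "order of events" point concrete, here is a toy version of the kind of collector that could sit on the other end, merging events POSTed by both the frontend and the backend into a single time-ordered stream for the agent to read. Again, this is purely illustrative; I don't know how Cursor's actual listener is implemented.

```ts
import { createServer } from "node:http";

// Toy log collector: both the frontend and the backend POST their debug
// events here, and the merged, time-ordered stream is what the agent reads.
const events: Array<{ source: string; message: string; ts: number }> = [];

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    // Assumed payload: { source: "frontend" | "backend", message, ts }
    const event = JSON.parse(body);
    events.push(event);
    // Print the cross-stack timeline as it builds up.
    console.log(`[${new Date(event.ts).toISOString()}] ${event.source}: ${event.message}`);
    res.writeHead(204);
    res.end();
  });
}).listen(4999, () => console.log("debug log collector listening on :4999"));
```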
Another example of the prowess of LLMs with textual logs as context is the MCP servers for observability tools such as Datadog and Sentry. At Cursor, we use these a lot, and they make the agent insanely more likely to root out bugs and fix them.
But how do we get this in the hands of more people?
Most of Cursor's users are not using Debug Mode. It requires that the user:
- Can reproduce the bug somehow
- Knows about the feature (by far, the hardest problem to solve here!)
- Decides to use it ahead of trying to fix a bug
The Cursor product can sometimes recommend switching to "Debug Mode", but it is not easy to guess when the product should do that and when it shouldn't.
This leads me to what I think the future looks like...
What if coding agents were always instrumenting the code they're working on? And then we'd just have to remove these logs before sending a PR (or maybe even keep them in the code, but not make them visible to humans)? That would be very interesting to try out.
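One low-tech way to make that work would be to tag every injected line with a sentinel comment and strip those lines in a pre-commit hook. A rough sketch, where the "// agent-debug" marker is purely my own convention and not anything Cursor actually does:

```ts
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical marker the agent would append to every instrumentation line.
const MARKER = "// agent-debug";

// Remove all tagged instrumentation lines from a file before it goes into a PR.
export function stripDebugLines(path: string): void {
  const kept = readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => !line.includes(MARKER));
  writeFileSync(path, kept.join("\n"));
}
```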
And what if the models themselves were trained to do this kind of thing more often? If we can get this kind of "debugging" behavior into the weights, then maybe we can get LLMs to rely more on humans for testing things, which will lead to them generating better code.
If you prompt it right, Cursor's Debug Mode can try to reproduce bugs on its own (e.g., if you have an npm run test command that reproduces the bug). Unfortunately, though, the problem with this type of feature is that it not only requires a human in the loop, it requires that the human be very engaged. For that reason alone, it will be hard for a feature like this to find true "PMF" (Product-Market Fit), because I believe most code written from today onwards will be written by humans who are not particularly "engaged".
Anyway, one can dream. As someone who loves coding agents and doesn't write a single line of code by hand anymore, I do want to be "engaged" most of the time. And I think features like Cursor's Debug Mode are extremely underused by most engineers today, so let's spread the word!