How does this differ from cursor cloud agents where I can hook up MCPs, etc and even launch the agent in my own cloud to connect directly to internal hosts like dbs?
Thanks. Yeah, Cursor / Claude Code + MCP is powerful. We differentiate on two fronts, mainly:
1) Greater accuracy with our specialized tools: Most MCP tools let agents query raw data or run *QL queries - at the scale of telemetry data, this overwhelms context windows. Raw data also isn't great for reasoning - we've designed our tools so models get data in the right format, enriched with statistical summaries, baselines, and correlation data, and LLMs can focus on reasoning (rough sketch of the idea after this list).
2) Product UX: You'll also find that text-based outputs from general-purpose agents aren't sufficient for this task - our notebook UX gives you a way to visualize the underlying data so you can review the findings and build trust in the AI.
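To make point 1 concrete, here's a rough, hypothetical sketch of what "statistical summaries instead of raw telemetry" can look like - this is illustrative Python, not our actual implementation, and the metric names and numbers are made up:

```python
# Illustrative only: collapse a raw metric series into a compact summary
# (mean, max, baseline, deviation) before it ever reaches the LLM's context.
from statistics import mean, stdev

def summarize_series(name, points, baseline_points):
    """Reduce a raw metric series to a small, LLM-friendly summary dict."""
    cur_mean = mean(points)
    base_mean = mean(baseline_points)
    base_std = stdev(baseline_points) if len(baseline_points) > 1 else 0.0
    deviation = round((cur_mean - base_mean) / base_std, 1) if base_std else None
    return {
        "metric": name,
        "window_mean": round(cur_mean, 2),
        "window_max": max(points),
        "baseline_mean": round(base_mean, 2),
        "deviation_sigmas": deviation,
        "points_summarized": len(points),
    }

# Thousands of raw samples become one small dict the model can reason over.
print(summarize_series(
    "p99_latency_ms",
    points=[850, 910, 1240, 1380],
    baseline_points=[180, 195, 210, 200, 190],
))
```

The agent then reasons over a handful of these summaries instead of pages of raw query output.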
To be clear, are the main differentiators basically better built-in MCPs and better UX? Not knocking, just trying to understand the differences.
I have had incredible success debugging issues by just hooking up the Datadog MCP and giving agents access to it. Claude/Cursor don't seem to have any issues pulling in the raw data they need in amounts that don't overload their context.
Do you consider this a tool to be used in addition to something like Cursor cloud agents, or to replace them?
For the debugging workflow you described, we would be a standalone replacement for Cursor or other agents. We don't yet write code, so we can't replace your Cursor agents entirely.
Re: differentiation - yes, faster, more accurate, and more consistent. Partially because of better tools and UX, and partially because we anchor on runbooks. On-call engineers can quickly see which steps the AI ran, what it found at each step, and the time series graph that supports each finding.
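To illustrate the runbook anchoring (hypothetical structure, not our actual data model), think of each investigation as a list of runbook steps, each paired with what the agent found and the query behind the graph that backs it up:

```python
# Hypothetical sketch: pair every runbook step the agent ran with its finding
# and the evidence an on-call engineer can review in the notebook.
from dataclasses import dataclass, field

@dataclass
class StepResult:
    step: str            # the runbook step the agent executed
    finding: str         # what the agent concluded from that step
    evidence_query: str  # query behind the supporting time series graph

@dataclass
class Investigation:
    alert: str
    steps: list[StepResult] = field(default_factory=list)

    def report(self) -> str:
        lines = [f"Alert: {self.alert}"]
        for i, r in enumerate(self.steps, 1):
            lines.append(f"{i}. {r.step} -> {r.finding} (evidence: {r.evidence_query})")
        return "\n".join(lines)

inv = Investigation(alert="checkout-service p99 latency spike")
inv.steps.append(StepResult(
    step="Check error rate on downstream payments API",
    finding="5xx rate up ~8x starting 14:02 UTC",
    evidence_query="sum:payments.http.5xx{env:prod}.as_rate()",  # made-up query
))
print(inv.report())
```

The point is that the output maps one-to-one onto steps an on-call engineer already recognizes, rather than a free-form wall of agent text.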
Interesting that you have had great success with Datadog MCP. Do you mainly look at logs?
They claim a 12-point lead (from 36% to 48%) over Opus 4.6 in an RCA benchmark: https://www.relvy.ai/blog/relvy-improves-claude-accuracy-by-...
heh, I was just about to post the following on your previous comment re: reproducible benchmark results. Thanks for posting the blog.
With the Docker images we offer, people can in theory re-run the benchmark themselves with our agent. But we should document that and make it easier.
At the end of the day, you really would have to evaluate on your own production alerts. Hopefully the easy install + setup helps.
Woohoo!!! Congrats on the big launch y'all
Congrats on the launch! I dig the concept, seems like a good tool :)
Thank you :)