Cursor Introduces Composer 2.5

throwaw12 · 2026-05-19T07:57:24 1779177444

> Composer 2.5 is built on the same open-source checkpoint as Composer 2, Moonshot's Kimi K2.5.

Really nice to see they're giving credit to the company and I am optimistic Kimi K open models soon will outperform Opus models

vessenes · 2026-05-19T10:59:55 1779188395

Sounds like it's the last Kimi-line model at Cursor? As expected they say they'll be training a larger model on the SpaceX infrastructure, or have already started most likely.

I'm very curious to read about the Composer 3 architecture when it comes out. More frontier coding models are a good thing, especially if they diversify into different strengths/weaknesses.

bfeynman · 2026-05-19T16:15:14 1779207314

That only seems plausible if whatever corpse of xAI is around is giving them engineering time. I don't know if they hired a bunch of ex-frontier-lab staff, but it's unlikely they have the technical capability to train their own frontier models, especially the pretraining. Because the thing is, if it's not competitive with claude/codex it will be panned.

vessenes · 2026-05-19T22:00:03 1779228003

Hmm, I read the situation a little differently. Grok is not a slouchy model. It’s not the best, but it’s not the worst. X currently has one source of proprietary data, Twitter, and grok is by far the best at all the things you might imagine there - today’s zeitgeist, who’s saying what, current news, etc.

Cursor adds in a large corpus of proprietary coding data — I think this is actually fairly hard to acquire right now, because claude and codex are so good.

I bet there’s enough talent at the Grok team to work with the cursor team and data to get something good out the door.

That said, I don’t track Grok’s engineering leads — I’m not sure who’s currently around, and who is not.

ccimmergreen · 2026-05-20T00:09:24 1779235764

Unlikely, given that large swathes of talent have already left xAI, ostensibly due to poor leadership management. Simply throwing money in to build the biggest datacenters in the world doesn't do much good without bright minds to back it up. https://www.fastcompany.com/91531084/inside-the-xai-exodus

vessenes · 2026-05-20T11:48:02 1779277682

Be careful taking the headlines at face value - that list of people leaving was mostly product and redundant senior execs to my eyes, post spacex merger. You’d expect those folks to be asked to leave as part of a re-org in any event. I don’t think it’s dispositive one way or the other on the tech org.

ccimmergreen · 2026-05-21T00:58:12 1779325092

You are wrong, they were not redundant execs.

They were world-class senior developers and AI engineers most renowned in the AI research communities (e.g. Jimmy Ba the legend, Christian Szegedy, Igor Babuschkin, Greg Yang), poached from other companies to join xAI and they were getting very high salaries.

The mass exodus has been happening way before spacex merger though.

vessenes · 2026-05-21T10:23:12 1779358992

Interesting. Agreed that’s a significant list.

Post Model 3 launch, Tesla had a number of senior folks leave almost immediately. My read at that time was they had hit or exceeded pareto-optimal on the suffering:wealth scale: Tesla was clearly going to make it, and they had already vested 90% of the value they’d receive from Tesla ownership. Why go suffer through the massive build out?

And in fact, in that era, Tesla did bring in a bunch of auto industry types to help scale, who as it happens also certainly did very well, but order of magnitude less well than the early peeps.

There might be some similar economics here: change of control will often fully vest early founders. Combined with incoming SX IPO, these guys are done financially — as in, already multibillionaires pre-IPO. You’d have to want to stay and the company would have to really want you to stay as well before it made economic sense to re-up.

People say a lot of things about working for Elon; things like “hardest work I ever did,” and “he made me extremely rich”, but you don’t read “that was easy” very often.

I have no idea if there’s enough talent right now at xAI to go build a foundation model, but in the immortal words of Carl Icahn: “don’t bet against Elon”

zxspectrum1982 · 2026-05-20T07:17:46 1779261466

There's been also a lot of good talent joining xAI lately.

scosman · 2026-05-19T12:01:21 1779192081

> I am optimistic Kimi K open models soon will outperform Opus models

Hard to outperform the model you distill...

nl · 2026-05-19T12:53:27 1779195207

Most of the performance on coding comes from RL, not distillation.

Distillation helps with world knowledge and things like that.

Bolwin · 2026-05-19T21:01:54 1779224514

They're not distilled. Stop spreading Anthropic's misuse of the term.

They do use it for synthetic data/judging though, so yes, hard to outperform.

Not that they need to, if they can basically match it for a fifth of the price.

intrasight · 2026-05-19T12:17:28 1779193048

Is that true? If the distillation is not lossy and the model runs much faster due to less resource consumption, then it may outperform.

mwigdahl · 2026-05-19T12:20:58 1779193258

One of those conditionals is a pretty huge assumption.

intrasight · 2026-05-19T19:07:05 1779217625

It's an assumption and it can be tested

howdareme9 · 2026-05-19T08:34:33 1779179673

Only because last time they tried to hide it lol

trymas · 2026-05-19T10:40:56 1779187256

Yes and if I remember the drama correctly - Kimi's license or terms of use says that for commercial use cases (or was it user count?) - you must declare credit to Moonshot and Kimi.

Lennie · 2026-05-19T10:56:24 1779188184

It's important to mention: they were compliant, because they trained the model at an AI hosting provider that had a partnership with Moonshot AI, but Moonshot didn't know Cursor was a customer.

Aurornis · 2026-05-19T14:07:34 1779199654

This was misinformed Twitter and Reddit drama.

They had properly licensed it and were complying with the terms of the license.

davidatbu · 2026-05-19T17:29:32 1779211772

Note that something that helped the misinformation was that, on Twitter, there were Kimi employees expressing their surprise that the base model was Kimi K2.5, and their indignation that Cursor didn't credit Kimi. They later deleted their tweets (what I infer from that is that some employees were not aware of some pre-existing agreement or understanding between Cursor and Kimi until the drama happened).

maxdo · 2026-05-19T10:59:53 1779188393

How can distilled opus become better than the original? There are a number of reports, including from Anthropic, that the kimi team was participating in fraudulent activities

throwa356262 · 2026-05-19T11:41:51 1779190911

Do we know the "fraudulent" requests really came from moonshot engineers and not a QA team running a ton of benchmarks against other models?

I feel distilling something as big as Opus would require many many more samples, but I don't really know much about this subject

maxdo · 2026-05-19T16:40:39 1779208839

sure, sounds like QA lol

Scale: Over 3.4 million exchanges

The operation targeted:

- Agentic reasoning and tool use

- Coding and data analysis

- Computer-use agent development

- Computer vision

Moonshot (Kimi models) employed hundreds of fraudulent accounts spanning multiple access pathways. Varied account types made the campaign harder to detect as a coordinated operation. We attributed the campaign through request metadata, which matched the public profiles of senior Moonshot staff. In a later phase, Moonshot used a more targeted approach, attempting to extract and reconstruct Claude’s reasoning traces.

ta20240528 · 2026-05-19T17:06:16 1779210376

And when you hear unsubstantiated rumours* that say Anthropic has been sending exchanges to, say, Alibaba's Qwen, will you also conclude the same about the entire US AI industry?

I doubt it.

* publish the logs.

ifwinterco · 2026-05-19T17:21:05 1779211265

Even if it's true, it's not like US AI companies can complain, given their entire business is based on ripping off text without attribution

maxdo · 2026-05-20T03:04:53 1779246293

chinese ai is not doing the same? or they don't parse?

they do except they also send thousands of sex-spies to do espionage of this kind on the scale.

ifwinterco · 2026-05-20T05:08:49 1779253729

Of course they’re also doing this, my point is this is a grubby business where ethics went out of the window a long time ago.

If you’re playing this game in 2026 you know the rules - anything goes

ta20240528 · 2026-05-22T09:10:47 1779441047

"they also send thousands of sex-spies"

Could they send one (or two) my way?

goyozi · 2026-05-19T06:40:12 1779172812

I kind of want to try it, to see if and how far they can take an open model and improve it but I really don’t miss the Cursor user experience. Constant UI changes, half-baked features, smaller and smaller limits, useless AI change attribution; I think I’ll wait for others to report if it’s any good.

whywhywhywhy · 2026-05-19T09:17:14 1779182234

Noticed recently they keep opening their “Agents” window when the project was last opened in the VSCode fork window in the hopes I’ll just continue working in that when the UI is totally different and missing things I need.

For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving, changing iconography and flipping switches of the UI.

It’s clearly being ran by someone who comes from a social app or sales app growth hacking background.

znpy · 2026-05-20T09:40:12 1779270012

> It’s clearly being ran by someone who comes from a social app or sales app growth hacking background.

I fixed that by using cursor the agent but not the UI.

I'm just running cursor in GNU Emacs via agent-shell (https://github.com/xenodium/agent-shell). Their cli client (aptly named "agent") supports ACP (agent client protocol) so the UI can be skipped altogether.

I know this sounds like a meme ("use x in emacs") but at this point at the very least i can keep my workflows and my UI all the same and focus on my work rather than "where did $company put $feature this month".

dmix · 2026-05-19T12:10:07 1779192607

I’ve personally never experienced that issue with Cursor. I never use the agents window and it always shows me the editor.

whywhywhywhy · 2026-05-19T15:54:27 1779206067

You're not in the A/B test. I've never opened the agents window consensually.

SebastianKra · 2026-05-19T17:52:32 1779213152

It seems obvious that they plan to eventually drop VSCode. I'd be willing to take them up on that offer. Their agent window is genuinely better as a starting point.

What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursors own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.

rubyn00bie · 2026-05-19T07:56:08 1779177368

Damn do I feel the UI changes being a pain point.

It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it, some sort of command, isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.

They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).

One thing that surprises me about this whole segment is that JetBrains hasn’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.

[1] it’s insane I have to answer questions in a tiny input box I cannot resize or adjust the size of. Let alone the fact the text area I input prompts into cannot be resized. Truly feels like the UI/UX is done by people without any experience.

animuchan · 2026-05-19T09:51:48 1779184308

> Truly feels like the UI/UX is done by people

To me it feels like it's done entirely by an LLM, starting from the product vision.

omederos · 2026-05-19T14:06:22 1779199582

Use their cli?

https://cursor.com/docs/cli/installation

znpy · 2026-05-20T09:40:55 1779270055

I use it via the gnu emacs integration :P

https://github.com/xenodium/agent-shell

kilroy123 · 2026-05-19T11:10:17 1779189017

I 100% agree. It's soooo buggy.

I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.

smnscu · 2026-05-19T16:06:21 1779206781

That's me with Google Antigravity. Switching back to vscode was such a breath of fresh air. Porting over my (extensive) settings/extensions/keyboard shortcuts was extremely easy too (just ask the agent to do it), and now I can use both Copilot models and Claude Code easily. More to your point though, the speed and stability is incomparable. I can't remember having many issues with Cursor last year when I used it at my last job, but still, vscode has been surprisingly pleasant for agentic use.

tomasz-tomczyk · 2026-05-19T09:15:06 1779182106

Yeah I have a soft spot for Cursor because it was my first tool that unlocked huge productivity with AI, but I avoid doing anything there now.

Should try their CLI!

Aurornis · 2026-05-19T16:04:41 1779206681

I try it from time to time and feel the same way. Some people I know really like it but I can’t tell if that’s because it’s good or just because it’s what they’ve become familiar with and they don’t like to change tools. Cursor had a good head start and a lot of early PR.

epolanski · 2026-05-19T07:48:53 1779176933

Good point.

One of the things I've come to appreciate about the cli tools like Codex or Claude is that the interface is so limited that every feature they release is still limited and constrained to the same UX limitations, whereas those "funkier" IDEs change from month to month giving me further fatigue.

fjdjshsh · 2026-05-19T18:19:36 1779214776

I've had good experiences with Cursor so far and it's my main IDE. I've noticed some UI changes, but I've switched fast and they didn't bug me

indiantinker · 2026-05-19T11:36:14 1779190574

I agree. I quit cursor and replaced it with conductor and a mix of Claude Code / Codex / Copilot and I don't miss it as such. Maybe one day I will come back.

ttouch · 2026-05-19T10:36:50 1779187010

you can use either the cursor cli and/or zed editor with cursor as the underlying provider with ACP (agent client protocol)

presentation · 2026-05-19T11:19:32 1779189572

Tried that, it just seemed way dumber this way unfortunately. And the zed UI provided 0 visibility whenever it was doing tool calls, and it kept running sleep 30 calls because it couldn’t figure out how to see the results of its own tool calls for some reason.

jstummbillig · 2026-05-19T07:54:59 1779177299

Isn't there a cli version of cursor by now?

yourboirusty · 2026-05-19T09:46:17 1779183977

It's a bit better than the VSCode fork, but still much worse than competition:

- lags constantly,

- if you type while it's generating you'll get missed inputs,

- 'plan mode' doesn't clear context before starting work,

- you can't directly edit the plan, you can only ask the bot to do it,

- you can't immediately whitelist commands, only accept once or allow all.

vorticalbox · 2026-05-19T08:08:57 1779178137

Yes

https://cursor.com/cli

asar · 2026-05-18T17:40:06 1779126006

The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

onlyrealcuzzo · 2026-05-18T18:47:31 1779130051

> Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

Impressive, yes. But they still don't have a moat...

infecto · 2026-05-18T22:26:04 1779143164

I am not sure we should dismiss what they have today. Nobody has yet come close with a full package ide that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.

virgilp · 2026-05-19T08:49:59 1779180599

If we ignore cost (which is kinda hard to ignore), I feel Codex kinda' does it for me. Sure it's not really an editor but I find I don't need that _that much_ and it's easy to launch an external editor (they actually have the feature).

The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.

chillfox · 2026-05-19T08:27:36 1779179256

Have you tried Zed?

I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.

Anyway, would love to see a comparison from someone who has used a recent version of each.

turastory · 2026-05-19T08:49:26 1779180566

A few years ago I tried Zed when it was still pretty early, but eventually settled on Cursor. I gave Zed another shot a few days ago because Cursor’s worktree support still feels pretty weak.

In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.

Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.

So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.

chillfox · 2026-05-19T14:43:50 1779201830

Interesting, is it that the tab completion is giving better results, or how it works is better?

ramses0 · 2026-05-19T15:15:51 1779203751

The tab completion is "faster than vim" from a long-time vimmer. It's at the point where a lot of times i'll lead with the comment instead of the code:

    # now take the list and sort by x.lastName
    <tab>

...and it'll "do the thing" (w/ type hints, its own comments, etc). Obviously in this very simple, understandable, completely contrived example, it's "trivial" (but 3 years ago would have seemed like magic), but it'll also pick up on "continuation / more of the same" type edits. A comment like `# use random_utility to call the api and only accept matches which supplement addresses that have already been found` will (usually) autocomplete all the gobbledy-gook w.r.t. tokens, URL's, function names, etc. so it's effectively an "automatic omni-complete with simplistic post-processing"
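For readers who haven't used comment-led completion, here is a minimal sketch of what the sort-by-last-name example above might expand to. The `Person` class, the `people` list, and the exact completion are hypothetical, invented for illustration; this is not output captured from Cursor.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record type; the field names only mirror the comment above.
    firstName: str
    lastName: str


people = [
    Person("Ada", "Lovelace"),
    Person("Alan", "Turing"),
    Person("Grace", "Hopper"),
]

# now take the list and sort by x.lastName
# (the line below is the kind of thing a completion might fill in)
people_sorted = sorted(people, key=lambda x: x.lastName)
```

`sorted` returns a new list and leaves `people` untouched, which is usually what you want when the completion is inserted mid-function.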

Example #2: I was just fixing some vibe-coded slop, where it was taking `click.echo( some_api.whatever_endpoint() )` and the "slop" portion was literally emitting: `str('{ "A": 1, "B": 2 }')` and that function call was emitting it directly.

On the command line, I was doing `blah whatever-endpoint --something | jq '.'` and got tired of the JQ thing, so I'm like: "I'll just use `json.dumps(...,indent=2)`", but lo and behold, I'm getting a dumb JSON string literal, not a pretty printed object shape.

I start typing `json.loads(` to move from "str()" to "dict()" ... and it autocompletes the whole scenario (on that line), then I move to `def some_other_endpoint` and it basically has that same edit queued up. (ie: it "knows" what i'm about to do).
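The string-vs-dict problem described above can be reproduced in a few lines. This is a sketch under assumptions: `whatever_endpoint` is a hypothetical stand-in for the real API call, hard-coded to return the JSON text from the comment.

```python
import json


def whatever_endpoint() -> str:
    # Hypothetical stand-in: returns JSON as a str, not a dict.
    return '{ "A": 1, "B": 2 }'


raw = whatever_endpoint()

# Dumping the str directly just re-encodes it as a single JSON string literal,
# so indent=2 has nothing to indent.
still_a_string = json.dumps(raw, indent=2)

# Parsing first yields a dict, which json.dumps can then pretty-print.
pretty = json.dumps(json.loads(raw), indent=2)

print(still_a_string)
print(pretty)
```

The first `print` shows one quoted, escaped string; the second shows the indented object, which is what piping through `jq '.'` was doing on the command line.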

...so overall, "faster than vim", even with high skill bar for repetition, motion, macros, sed-style edits, etc. You can't beat: "<tab>", especially when it's lightly intelligent (ie: knows when/what/str/int, adapts to different function calls, etc).

nl · 2026-05-19T13:11:27 1779196287

I've tried Zed and really didn't like it.

I like VS Code with the Claude Plugin, and sometimes with the Codex Plugin

infecto · 2026-05-19T14:06:10 1779199570

Tried it and it’s fine but the AI integration is not tight enough for me.

jmcqk6 · 2026-05-19T17:51:31 1779213091

I've been using cursor for over a year for my personal projects. At work, I use Claude Code, and so I've been wondering if I'm missing something in the other agents.

Over the last week, I tried out two other agents on my personal projects: dirac and forgecode, after seeing impressive results from both of them on terminal bench.

After a good amount of testing, and over $100 in open router spend, I'm back to cursor.

I really liked forgecode the best, and it feels better than claude code, but cursor definitely feels best to me. Composer 2.5 is fast and effective, and it makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things, and the feedback loop was so slow.

The previous version of composer was also much faster, and it makes a difference. Maybe people like context switching, but I prefer to stay focussed on the task in front of me, and I'm reviewing the code carefully.

I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.

alach11 · 2026-05-18T21:40:07 1779140407

Isn't a large user base and the data collected from those users a moat of sorts?

onlyrealcuzzo · 2026-05-18T22:06:01 1779141961

A moat is when you have something other's can't easily get.

Every MAG 7 / FAANG company already has more users and more data...

That's not a moat.

That's traction.

LinXitoW · 2026-05-19T12:34:07 1779194047

They don't have the same quality and kind of data. For example, Claude Code might have general conversation flow data for implementing feature X, but Cursor has users' individual editing actions AND the chat flow. Which line did the user manually edit after the agent did its thing? What's the commit message (if done manually)? Stuff like that is worth its weight in gold.

wilg · 2026-05-19T05:46:15 1779169575

That's not X.

That's Y.

uxcolumbo · 2026-05-19T07:51:21 1779177081

Been a bit out of the loop.

What's wrong with using very short sentences like 'That's not X. That's Y.'?

arcanemachiner · 2026-05-19T08:13:15 1779178395

Commonly used phrase by LLMs. Gives people slop vibes these days.

Kiro · 2026-05-19T10:13:36 1779185616

"It's not X, it's Y" is a good way to illustrate a point. Same goes for many other common LLM phrases. It's used because it's effective.

monsieurbanana · 2026-05-19T13:28:12 1779197292

Huh. I associate it with LinkedIn slop, which is probably 100% ai nowadays but they certainly didn't wait for llms.

AussieWog93 · 2026-05-18T21:46:54 1779140814

Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!

NitpickLawyer · 2026-05-19T06:34:06 1779172446

> Early attention engineering when humans were still in the loop

Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.

kkukshtel · 2026-05-18T19:42:43 1779133363

And it's still just a vscode fork

icemelt8 · 2026-05-19T07:49:26 1779176966

Cursor 3 is a complete rewrite, it's no longer a fork.

gkbrk · 2026-05-19T12:17:57 1779193077

It's still a VSCode fork. Even Cursor's own About window tells you it's VSCode.

  Cursor
  Version: 3.4.20
  VSCode Version: 1.105.1

muhfournik · 2026-05-19T12:59:54 1779195594

I believe the agent view is a complete rewrite, and maybe the other parts but not the editor itself

antirez · 2026-05-19T06:19:27 1779171567

How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be generalist, there is a tension between being good at one thing and the other, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open weight model like most could do, and the RL story could be partially a marketing move to call it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.

Lionga · 2026-05-18T18:10:43 1779127843

They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.

GenerWork · 2026-05-18T18:21:53 1779128513

I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.

Squarex · 2026-05-19T05:46:00 1779169560

All enterprises I know use GitHub copilot as they already have Office, Teams, … wonder how will it change with the recent pricing changes

pjmlp · 2026-05-19T08:17:50 1779178670

I can tell my company wants nothing to do with them.

kvetching · 2026-05-19T00:09:44 1779149384

Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI

esafak · 2026-05-19T05:21:01 1779168061

Why not? That makes no sense to me.

kilroy123 · 2026-05-19T11:17:57 1779189477

I think it's going to be brutal for them to compete with OpenAI and Anthropic.

I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.

For that same $200 a month, I could use claude code and basically never hit usage limits.

I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.

liuliu · 2026-05-18T18:42:03 1779129723

Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.

wg0 · 2026-05-19T05:36:15 1779168975

This was the only way forward.

the_duke · 2026-05-19T07:14:51 1779174891

In my opinion cursor actually has one of the best harnesses again at the moment.

make3 · 2026-05-19T07:46:53 1779176813

why is that part impressive specifically? they got purchased by SpaceX, they have access to infinite compute and cash now.

& now they're still losing all of their users to Claude Code and Codex.

DeathArrow · 2026-05-19T09:10:26 1779181826

>& now they're still losing all of their users to Claude Code and Codex.

Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code.

It's not like Cursor harness is the best out there.

And even if I want to edit the code, I don't need to run the agent harness in an IDE.

wmichelin · 2026-05-20T04:44:13 1779252253

Not a cursor shill by any means, I do use it at work but that's because it's what they pay for.

But Cursor has a CLI harness.

make3 · 2026-05-19T11:17:56 1779189476

these are in the trillion parameters range, not sure it's actually that cheap to have at a reasonable speed without quality degradation & without like.. your own DGX B200

DeathArrow · 2026-05-19T12:25:12 1779193512

I didn't say to run them at home. There are some cheap coding plans that get you plenty of usage for the Chinese models.

DeathArrow · 2026-05-19T08:33:46 1779179626

>Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

With so much money and computing from SpaceX, it's not so impressive.

farco12 · 2026-05-19T17:20:12 1779211212

One would hope the vscode fork with a $50B valuation and no moat, would wisely spend the money they raised to build a moat.

whywhywhywhy · 2026-05-18T18:37:46 1779129466

It's still a VsCode fork just now with a Kimi fine tune and still no moat...

I won't debate that it turns out none of this mattered when it came to being a successful company though and kinda makes anyone who tried to roll their own instead of fork look a little silly.

hkleppe · 2026-05-19T06:29:37 1779172177

"No moat", well...

How I see this is that it's so important to bundle the model with the right tooling.

Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).

So Cursor, IMO, has put itself in a strong position by having both a solid IDE __and__ a solid+cost efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks

aurareturn · 2026-05-18T18:41:30 1779129690

I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.

enraged_camel · 2026-05-18T18:47:07 1779130027

They didn't say it's a new model... in fact they said exactly what you just said.

memoryleakgame · 2026-05-18T21:37:23 1779140243

If these benches from their site hold up (they likely won't)

Wouldn't this compress ai revenue like 15x quickly

If they really have a 4.7 opus high equivalent at 1/16 the cost wouldn't this significantly affect all the current capex and planning

Maybe they are getting elon to cover cost

vessenes · 2026-05-19T11:07:38 1779188858

It's worth being specific:

"Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.

"Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.

"Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.

infecto · 2026-05-18T22:31:31 1779143491

The way I have read their benchmark results is that they trained a model to work insanely well in their coding workflow. It’s not a general purpose model.

One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.

romanovcode · 2026-05-19T08:21:18 1779178878

The problem with this is that we do not know the actual cost. For all we know they might be pulling an Anthropic. Subsidizing costs to get users, then increasing them later on.

yorwba · 2026-05-19T09:51:28 1779184288

They're offering a model based on Kimi K2.5 for $0.50/M input and $2.50/M output while the cheapest third-party provider on OpenRouter charges $0.40/M input and $1.90/M output https://openrouter.ai/moonshotai/kimi-k2.5 Those third-party providers have little incentive to subsidize their customers, so Cursor probably has a margin >20% on their inference cost.

The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."

So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
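
The margin estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes the cheapest OpenRouter price is an upper bound on Cursor's own serving cost; the 10:1 input-to-output token mix is a hypothetical, not a measured figure.

```python
def gross_margin(price, cost):
    """Fraction of the sale price left after paying the assumed cost."""
    return (price - cost) / price


# Prices in $ per million tokens, as quoted in the comment above.
cursor_price = {"input": 0.50, "output": 2.50}
third_party_price = {"input": 0.40, "output": 1.90}  # treated as a cost ceiling

per_token_type = {
    kind: gross_margin(cursor_price[kind], third_party_price[kind])
    for kind in cursor_price
}


def blended_margin(input_tokens, output_tokens):
    """Margin for a workload with the given token mix (any consistent unit)."""
    revenue = input_tokens * cursor_price["input"] + output_tokens * cursor_price["output"]
    cost = input_tokens * third_party_price["input"] + output_tokens * third_party_price["output"]
    return (revenue - cost) / revenue


print(per_token_type)          # margin on input and on output separately
print(blended_margin(10, 1))   # hypothetical 10:1 input:output mix
```

Input tokens come out at a 20% margin and output tokens at 24%, so any mix lands between the two. That only covers inference; training cost is not in this calculation at all.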

zackify · 2026-05-18T22:21:26 1779142886

this thing is so awesome on fast mode, so far i am impressed, some of its observations feel similar to opus.

i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha

2001zhaozhao · 2026-05-18T22:38:10 1779143890

> compress ai revenue like 15x

that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token

smallnamespace · 2026-05-19T07:41:45 1779176505

AI revenue has been going up while the cost per token has been rapidly falling. The Jevons paradox applies here. The cheaper software is, the more software is written. There is not a finite demand for software.

rafaelmn · 2026-05-19T07:50:57 1779177057

> AI revenue has been going up while the cost per token has been rapidly falling

Every model release now has been straight price increases since what GPT 4 ? When was the last time a new flagship model decreased prices compared to the previous one ?

jstummbillig · 2026-05-19T08:54:31 1779180871

1. GPT 4 has gotten 6x cheaper over its evolution (from initial release to Turbo to 4o). Maybe you meant "Only since 4o and only since its final release". Alas.

2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.

dktp · 2026-05-19T08:44:06 1779180246

Opus 4.5 became significantly cheaper directly per token

rafaelmn · 2026-05-19T08:47:11 1779180431

You are right I forgot about that ! I think my point still stands - price per token is not decreasing for frontier capabilities, in fact it's increasing.

radu_floricica · 2026-05-19T12:19:15 1779193155

This only means the frontier is growing faster than the price is decreasing. It's just the sum of two separate tendencies, and has little predictive value. TBH, I'm ok with this tradeoff - higher capability at slightly higher cost is perfectly fine.

baq · 2026-05-19T07:59:49 1779177589

token efficiency

chillfox · 2026-05-19T08:15:33 1779178533

Not seeing that either, tried really using Opus 4.7 today, and it ended up at $50 for the same kind of thing that came out to $25 last week with Opus 4.6.

baq · 2026-05-19T08:28:04 1779179284

each model is different and nothing should be taken for granted, run your evals for your use cases. I'm not using Opus 4.7 for almost anything. I've seen very good improvements in GPTs since 5.2 and Opus 4.5 to 4.6 was quite an upgrade.

wesammikhail · 2026-05-19T08:54:34 1779180874

Models consume more tokens than ever for the same tasks.

vb-8448 · 2026-05-19T13:57:02 1779199022

I, and I guess basically everyone here, don't have access to OAI or Anthropic books, and it's really difficult to disprove your statements but:

- AI revenue going up & cost/token are not related metrics, at least not in the way you are assuming

- basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dropping subscription-based solutions in favour of pay-per-use. If cost/token was falling, we would see quite the opposite.

lompad · 2026-05-19T09:05:04 1779181504

This is conjecture. There is a reason both openai and anthropic refuse to comment on inference costs. If it were falling so much, they would use it to brag. I really don't understand why so many people keep repeating it without any actual data for the frontier models.

Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.

We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?

epolanski · 2026-05-19T07:49:28 1779176968

I'm not sure that's the case, it seems like bringing capabilities up and costs down merely serves to induce more demand.

rcleveng · 2026-05-19T14:51:59 1779202319

I have to say the new model is quite good at the basics, I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately.

At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.

Support - This is pretty much non-existent, it's community support or sales support.

Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.

Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but claude or copilot both find real things)

App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.

I love that they are running super fast, it's just hard when many of the basics break or don't work.

khazhoux · 2026-05-19T15:10:53 1779203453

> I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately

Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?

Seems like Linear just hopped on the buzzword hype train at the exact right moment...

dbalatero · 2026-05-19T15:22:28 1779204148

> Seems like Linear just hopped on the buzzword hype train at the exact right moment...

I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.

rcleveng · 2026-05-19T16:05:20 1779206720

Yup, honestly a google spreadsheet could probably do it as well.

I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.

Also assigning directly to cursor or codex, that's how I handle the easier tasks.

We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough

brunooliv · 2026-05-19T11:19:55 1779189595

Any reason why they indexed on Kimi K2.5 model? I have tried many open-source ones in Opencode, and, in my experience (standard backend development, Java, Python, Spring, etc) Qwen3.6 is SO MUCH BETTER that it's shocking. Kimi can't even get most tool calling arguments right.

CuriouslyC · 2026-05-19T11:47:58 1779191278

There's a lead time on models, and there's some tuning gotchas they probably already figured out with Kimi, so they weren't ready to just drop everything and switch. I'm sure they will switch models eventually.

roflcopter69 · 2026-05-19T12:21:18 1779193278

I recommend reading the entire article

  Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute.
  With Colossus 2's million H100-equivalents and our combined data and training techniques, we expect this to be a major leap in model capability.

grim_io · 2026-05-19T14:26:10 1779200770

I guess this will largely decide if xai is going to pay 60 or 10 billion, depending on the success of the new coding model.

KaoruAoiShiho · 2026-05-19T12:24:14 1779193454

Kimi 2.5 has the best long context. For raw coding benchmark scores you can just post train on top of it with more specialized data. 2.5 is kinda old, 2.6 is the current release which is exactly just that and catches up to the frontier in most aspects.

Bombthecat · 2026-05-19T11:25:13 1779189913

Cheaper to run?

steviedotboston · 2026-05-19T13:38:12 1779197892

It's very confusing that they use the same name as the very well known PHP package manager, composer

https://getcomposer.org/

wesammikhail · 2026-05-19T13:41:22 1779198082

I don't know what it is with product names these days. Antigravity, Antimatter, Composer, Clay, Ramp, Bolt, etc.

You'd think the founders would Google for naming conflict before choosing a name.

varun_ch · 2026-05-19T13:47:36 1779198456

I genuinely wonder if consulting LLMs for naming advice could be an explanation.

They certainly wouldn’t be great at coming up with new words for a product name.

dewey · 2026-05-19T17:56:25 1779213385

Naming issues are as old as time. Apple Computer vs. Apple Records comes to mind as a popular example.

PUSH_AX · 2026-05-18T17:58:44 1779127124

They set themselves up for flak when they use whatever these evals are… they did the same for composer 2 which was evaled in close competition with frontier models, spoiler alert, it wasn’t even close in practice.

So now 2.5 is supposed to compete with opus 4.7? Sure…

jmcqk6 · 2026-05-19T17:53:44 1779213224

That does not match my experience. Composer 2 was fantastic for my uses, and I hit Composer 2.5 with some very difficult things last night, which it handled fast and effectively. I don't really care about benchmarks. I care about practice, and in practice, it's been very very good for me.

tuo-lei · 2026-05-18T18:46:38 1779129998

they say it themselves in the post - behavior dimensions "not well captured by existing benchmarks". that was the exact problem with composer 2. not dumber on individual tasks, just bad at session-level decisions like when to stop editing, how much context to carry forward, when to re-read a file vs assume. you don't catch any of that in an isolated eval.

infecto · 2026-05-18T22:28:30 1779143310

As I have said before in prior composer threads. The proof is in the usage. I am inclined to somewhat believe the results as I use composer and also take the results for the given context. It’s not a general purpose sota model. It’s a model that runs inexpensively in their coding workflow that is creating results similar to opus or gpt.

criemen · 2026-05-18T18:20:45 1779128445

Well is that a statement about the quality of Opus 4.7 or about Composer 2.5? :P

jtwaleson · 2026-05-18T18:56:43 1779130603

Ok this might be weird but I've moved everyone in my 4 person team to our team plan and costs seem to have sky rocketed compared to the individual plans. Where before most people spent 20-100 USD, now the total bill is more like 1k USD. I haven't gone into the details but it feels like I'm being scammed.

mohsen1 · 2026-05-19T07:59:04 1779177544

We moved off Cursor and onto Codex + Claude Code. Cost went from multiple thousand per engineer per month to about $500

zackify · 2026-05-19T13:01:53 1779195713

Best deal currently:

Cursor team, Codex team, Claude team

Swap between the models when limited.

I am saving our company a lot of money vs Claude enterprise usage cost

skeptic_ai · 2026-05-19T08:31:20 1779179480

I did some monitoring. 15 accounts, 300 million input tokens and 200k output tokens took the 5h quota to 0 in 7 hours. 4 parallel tasks.

I think 300 million is too low. For reference before I could do more than 1 billion on same conditions.

DedlySnek · 2026-05-19T05:56:08 1779170168

My company is shifting us from Cursor to Claude due to increased costs.

danbrooks · 2026-05-18T19:28:43 1779132523

Check which model you're using.

The fast version of composer is the default now (which costs ~x3 as much).

infecto · 2026-05-18T22:29:46 1779143386

Keep in mind I believe there is a larger buffer given to personal plans. If they have 50% extra with the personal plan you now only get 25%.

PUSH_AX · 2026-05-18T19:02:10 1779130930

My cursor costs sky rocketed recently too

chemex · 2026-05-19T18:14:06 1779214446

I've been using Claude Code as my daily driver on a React Native + iOS codebase for the last few months. The thing that surprised me wasn't quality differences on individual edits — those are pretty close once you control for harness wiring — but how differently I'd ended up structuring my workflow around each style of tool.

Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.

Two things I'd watch on Composer 2.5 specifically:

1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.

2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.

The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.

chis · 2026-05-19T19:40:53 1779219653

AI slop detected, you're under arrest

everfrustrated · 2026-05-18T17:47:27 1779126447

Full details https://cursor.com/blog/composer-2-5

dang · 2026-05-19T05:35:45 1779168945

Thanks! Link belatedly changed above.

wunderlotus · 2026-05-19T18:05:00 1779213900

I love Cursor as a tool, but I'm skeptical bc:

1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].

2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. but KWs are obviously not the target users and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.

[1] https://cursor.com/blog/cursorbench#building-cursorbench

[2] https://cursor.com/blog/composer-2-technical-report#performa...

[3] https://cursor.com/blog/composer-2-5

zurfer · 2026-05-19T09:49:28 1779184168

Kudos to the team. Please consider making the model available via API!

bg24 · 2026-05-19T10:13:35 1779185615

They shipped an SDK recently. https://cursor.com/blog/typescript-sdk

enraged_camel · 2026-05-19T13:32:56 1779197576

I tested it yesterday. It is pretty bad. Just like with Composer 2, it's fast, but quality is nowhere near what Cursor claims with their benchmarks. It is not even at Opus 4.5 level.

I gave it a mix of refactoring tasks and new feature tasks. For each one, I had it write a plan, then I had Codex review it. Codex found major issues with every plan: patterns that don't match the rest of the code base, hallucinated variable/function names, and even outright bugs in the way the plan was written. I fed the feedback to Composer 2. After it made the changes and implemented the revised plan, I had Codex and Opus 4.7 do code reviews, and once again both of them found major bugs.

Overall it was a very frustrating experience. I feel like I wasted a whole day. Which is sad, as I have been looking for an excuse to come back to Cursor. But as things stand, Codex + CC combo cannot be beat, not just in terms of price but also quality.

granzymes · 2026-05-18T21:28:47 1779139727

Surprised this got pushed off the front page so quickly! It’s exciting to see what the Cursor team has been able to do with significantly fewer resources than the frontier labs.

I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.

dang · 2026-05-19T05:15:51 1779167751

It set off the flamewar detector, a.k.a. the overheated discussion detector. We'll turn that off.

granzymes · 2026-05-19T05:29:47 1779168587

Thanks, dang! The blog post[1] might be a better source than the twitter thread. Also I regret my typo above (lab -> labs) but too late now!

[1] https://cursor.com/blog/composer-2-5

dang · 2026-05-19T05:34:23 1779168863

Thanks! I had been just about to add that maybe the link wasn't the most informative. We've switched it now from https://twitter.com/cursor_ai/status/2056415413077233983.

As for the typo, s's are cheap and I've added one :)

ChrisArchitect · 2026-05-18T18:22:18 1779128538

Non-x link: https://cursor.com/blog/composer-2-5 (https://news.ycombinator.com/item?id=48182126)

DeathArrow · 2026-05-19T09:16:33 1779182193

I think anybody will be much better by acquiring a coding plan from Kimi.com and using Kimi K2.6, with whatever harness they like, including Claude Code, instead of paying more for Cursor's version of Kimi K2.5.

m_mueller · 2026-05-19T06:23:09 1779171789

It's a bit confusing to me why they'd make this 'fast' version the default, as it appears to be much more expensive than Composer 2. Wasn't it supposed to be a very cheap alternative to SOTA models?

mrklol · 2026-05-19T07:22:15 1779175335

Isn’t it a really cheap alternative to sota models (according to benchmarks)?

ryanshrott · 2026-05-19T21:20:12 1779225612

The cost claim is the easy part to sell. The real test is whether it stays useful in ugly codebases, long files, and repos with a bunch of half-broken conventions. That’s where these assistants usually fall apart, even when the benchmark numbers look great.

machiaweliczny · 2026-05-19T13:09:13 1779196153

Tested and it's good. Fast version is bad though. I like the planning model in Cursor in that it works more like a human-written design doc instead of a too-detailed AI plan. Seems like this is more responsible for the results than the model, but still, on fast it failed while on normal I got good results.

luodaint · 2026-05-19T10:53:55 1779188035

Benchmarks measure turn-level capabilities: you feed a task into the system and then grade the result. Capability for production-level usage concerns session-level decision making: does the agent know when to stop editing, retain the right amount of context, or go back and reread the file if the state has changed?

This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.

Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.

0fes911 · 2026-05-19T11:01:59 1779188519

I found composer 2 pretty good as a subagent delegating tasks like auditing for bugs after finishing implementation, but hopefully composer 2.5 will be more reliable so it can be used to implement and execute long running tasks.

WhitneyLand · 2026-05-19T13:13:51 1779196431

Say what you want about Cursor but they don’t lack for ambition.

Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.

And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.

Not for the faint of heart.

pdq · 2026-05-19T14:55:16 1779202516

Why is this comment upvoted?

It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").

Aurornis · 2026-05-19T16:06:18 1779206778

Good catch. I didn’t even notice it at first, but the hallucinations on top of cliches gives it away.

The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.

WhitneyLand · 2026-05-19T20:25:29 1779222329

Please see reply to your other comment on this thread.

WhitneyLand · 2026-05-19T20:14:36 1779221676

I wrote this 100% off the top of my head on my phone while eating a sandwich.

Ffs.

edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).

Aurornis · 2026-05-19T14:06:00 1779199560

EDIT: As others have pointed out, the comment above contains hallucinations (Like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.

Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.

My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.

WhitneyLand · 2026-05-19T20:23:56 1779222236

So, you’re wrong on two counts.

1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.

2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it which is not on me.

https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to...

Aurornis · 2026-05-19T22:04:06 1779228246

Quoting directly from your comment:

> They’ve been highly successful so far. Raised $50B,

They have not raised $50B. The article you linked says they're raising $2B, not $50B.

The valuation is not the amount raised.

WhitneyLand · 2026-05-20T01:52:40 1779241960

So I made a mistake reading the article? So what?

The point is you made two brigade style comments about my posts sounding suspiciously like an LLM and having hallucinations.

Neither turned out to be true and I think a better response would concede the point.

It may be more helpful for us to stick together as humans since we can’t always recognize each other so easily anymore.

Survey8430 · 2026-05-20T06:57:20 1779260240

What do you mean neither turned out to be true?

Your comment DOES sound like an LLM and it DOES have hallucinations!

Please make your humanness more recognizable next time, don't waste readers' time with posh fanboying and lazy fact checking.

adamkeys · 2026-05-19T14:37:19 1779201439

Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.

whs · 2026-05-19T15:42:24 1779205344

My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.

If Cursor (and every other commercial VSCode fork) didn't use the MS extension store in the beginning and violate the TOS, this might not have happened.

chrisrickard · 2026-05-19T14:57:02 1779202622

Cursor 3 is a full rewrite. No VS Code

causal · 2026-05-19T13:36:03 1779197763

Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.

highfrequency · 2026-05-19T14:54:54 1779202494

> now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.

See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6

didroe · 2026-05-19T13:54:49 1779198889

They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.

worldsavior · 2026-05-19T13:50:04 1779198604

Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode only adding a chat interface, isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, but no.

infecto · 2026-05-19T15:42:30 1779205350

This is rsync all over again. Go create it yourself if you think it’s just a simple extension.

worldsavior · 2026-05-20T07:13:17 1779261197

You're right, I regret I didn't have the sense to do the same as them at the time.

infecto · 2026-05-20T11:46:04 1779277564

Nope you are blowing hot air. Take it elsewhere.

worldsavior · 2026-05-20T15:55:06 1779292506

You can take yourself elsewhere. Good luck.

infecto · 2026-05-20T18:03:13 1779300193

Less hot air and more substance please. It’s easy to deconstruct a company as an arm chair quarterback. It’s much harder to build a viable one. Until you have something constructive, kick rocks. Hot air is boring.

I realize you’re a troll account but at least be a fun troll.

worldsavior · 2026-05-21T11:05:52 1779361552

I think that the product is easy to build, that's what I think because in my gathered experience it's easy. What more do you want?

This is the last time I'm responding. Good luck on whatever journey you're on. I'm sure it's an interesting journey since you're having realizations about troll accounts, very interesting.

dtagames · 2026-05-19T14:45:26 1779201926

As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.

Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.

benmusch · 2026-05-19T15:22:00 1779204120

they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.

Survey8430 · 2026-05-19T15:43:56 1779205436

AI comment... BOO!

jorl17 · 2026-05-19T12:08:15 1779192495

I want to like composer, but I just can't.

- Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.

- It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.

I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.

Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.

As a GPT model would say:

   "Small wrinkle: the production-ready benchmark results were tainted by real-world data points. I've assimilated the inconsistencies and added guardrails so that v2 has the right shape for future evaluations."

sofumel · 2026-05-20T11:31:06 1779276666

I'm currently using Claude Code, but should I cancel it at the next renewal and switch to Composer 2.5?

sergiotapia · 2026-05-18T18:04:50 1779127490

Congratulations on the launch! I'm interested in trying Cursor, but it's confusing which plan I should buy. What does the $20 Pro plan get me in usage if I only use Composer 2.5? How fast is the model?

darkwi11ow · 2026-05-18T18:19:13 1779128353

I've used the $20 plan on a daily basis for more than a year now and have yet to exhaust the limit. The plan includes $20 in API costs for non-Cursor premium models and $20 for the Composer and Auto models provided by Cursor themselves.

That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.

People who are aiming to not read code any more might find this $20 plan lacking, but for my needs it fits perfectly.

kaizoku156 · 2026-05-18T18:47:43 1779130063

The limits are probably even higher than that; I seem to get about $100+ of usage on Composer and about $45-50 on non-Composer models.

uf00lme · 2026-05-19T06:01:08 1779170468

I wonder why they didn't train off Kimi 2.6. I hope it is because they already had a good base and not because they messed up that relationship.

NitpickLawyer · 2026-05-19T08:07:21 1779178041

> and not that they messed up that relationship.

There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some Kimi-related devs.

re-thc · 2026-05-19T06:32:12 1779172332

That's 3.0

bingud · 2026-05-19T08:56:15 1779180975

Seems like a promising and useful model, but it's probably scary how much customer data they fed into it to reach this performance.

vanuatu · 2026-05-18T18:39:18 1779129558

It's always great that more companies are throwing their hat in the ring, especially focusing on value (latency + intelligence + cost)