Simon Tatham, author of PuTTY, has quite a detailed blog post [0] on using C++20's coroutine system. And yep, it's a lot to do on your own; C++26 really ought to give us some pre-built templates/patterns/scaffolds.
[0] https://web.archive.org/web/20260105235513/https://www.chiar...
People love to complain about Rust async-await being too complicated, but somehow C++ manages to be even worse. C++ never disappoints!
I find C++ coroutines to be well-designed. Most of the complexity is intrinsic because it tries to be un-opinionated. It allows precise control and customization of almost every conceivable coroutine behavior while still adhering to the principle of zero-cost abstractions.
Most people would prefer opinionated libraries that allow them to not think about the design tradeoffs. The core implementation is targeted at efficient creation of opinionated abstractions rather than providing one. This is the right choice. Every opinionated abstraction is going to be poor for some applications.
I don’t know if the language is yours, but I think the wording and its intended meaning (the sentence starting with ‘The core implementation…’) may be one of the most concise statements of my personal programming language design ethos. I’m jealous that I didn’t come up with it. I will certainly credit you when I steal it for my WIP language.
I will be adding the following to my “Primary Design Criteria” list: The core design and implementation of any language feature is explicitly targeted at the efficient creation of opinionated, composable abstractions rather than providing those abstractions at the language level.
C++ standards follow a tick-tock schedule for complex features.
For the `tick`, the core language gets an un-opinionated iteration of the feature that is meant for compiler developers and library writers to play with. (This is why we sometimes see production compilers lagging behind in features).
For the `tock`, we try to get the standard library improved with these features to a realistic extent, and also fix wrinkles in the primary idea.
This avoids the standard library having to rely on any compiler magic (languages like Swift are notorious for this), so in practice all libraries can leverage the language to the same extent.
This pattern has been broken in a few instances (std::initializer_list), and those have been widely considered to have been missteps.
Regarding your mention of compiler magic and Swift, I don’t know much about the language, but I have read a handful of discussions/blogs about the compiler and the techniques used for its implementation. One of the purported benefits/points of pride for Swift that stood out to me and I still remember was something to the effect of Swift being fundamentally against features/abstractions/‘things’ being built in. In particular they claimed the example of Swift not having any literal types (ints, sized ints, bools, etc) “built in” to the compiler but were defined in the language.
I don’t doubt your point (I know enough about Swift’s generic resolution crapshow during semantic analysis to be justified in assuming the worst), but can you think of any areas worth looking into for expansion on the compiler magic issues?
I have a near reflexive revulsion for the kinds of non-composability and destruction of principled, theoretically sound language design that tends to come from compiler magic and shortcuts, so always looking for more reading to enrage myself.
> literal types (ints, sized ints, bools, etc) “built in” to the compiler but were defined in the language.
This is actually a good example by itself.
Int is defined in swift with Builtin.int64 IIRC. That is not part of the swift language.
Comment was deleted :(
Not really: thanks to C++'s unsafe-first approach, workarounds like Pin aren't required.
Additionally, for those with .NET background, C++ co-routines are pretty much inspired by how they work in .NET/C#, naturally with the added hurdle there isn't a GC, and there is some memory management to take into account.
Also, even if it takes some time across ISO working processes, there is still a goal to have some of these capabilities in the standard library, which in Rust's case means "use tokio" instead.
async is simply a difficult problem, and I think we'll find irreducible complexity there. Sometimes you are just doing 2 or 3 things at once and you need a hand-written state machine with good unit tests around it. Sometimes you can't just glue 3 happy paths together into CSP and call it a day.
Using structured concurrency [1] as introduced in Python Trio [2] genuinely does help write much simpler concurrent code.
Also, as noted in that Simon Tatham article, Python makes choices at the language level that you have to fuss over yourself in C++. Given how different Trio is from asyncio (the async library in Python's standard library), it seems to me that making some of those basic choices wasn't actually that restrictive, so I'd guess that a lot of C++'s async complexity isn't that necessary for the problem.
[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
After I wrote the comment below I realized that it really is just ‘um, actually…’ about discussing using concurrency vs implementing it. It’s probably not needed, but I do like my wording so I’m posting it for personal posterity.
In the context of an article about C++’s coroutines for building concurrency I think structured concurrency is out of scope. Structured concurrency is an effective and, reasonably, efficient idiom for handling a substantial percentage of concurrent workloads (which in light of your parent’s comment is probably why you brought up structured concurrency as a solution); however, C++ coroutines are pitched several levels of abstraction below where structured concurrency is implemented.
Additionally, there are the implementation requirements for Trio-style structured concurrency to function. I’m almost certain a garbage collector is not required, so that probably isn’t an issue, but the implementation of the nurseries and the associated memory management are independent pieces that C++ will almost certainly never impose as a base requirement to have concurrency. There are also some pretty effective cancellation strategies presumed in Trio which would also have to be positioned as requirements.
Not really a critique on the idiom, but I think it’s worth mentioning that a higher level solution is not always applicable given a lower level language feature’s expected usage. Particularly where implementing concurrency, as in the C++ coroutines, versus using concurrency, as in Trio.
Comment was deleted :(
Python's stdlib now supports structured concurrency via task groups[1], inspired by Trio's nurseries[2].
Good point. I did carefully say that Trio "introduced" structured concurrency, partly due to this (and also other languages that now use it e.g. Swift, Kotlin).
I will say that it's still not as nice as using Trio. Partly that's because it has edge-triggered cancellation (calling task.cancel() injects a single cancellation exception) rather than Trio's level-triggered cancellation (once a scope is cancelled, including the scope implicit in a nursery, it stays cancelled so future async calls all throw Cancelled unless shielded). The interaction between asyncio TaskGroup and its older task API is also really awkward (how do I update the task's cancelled count if an unrelated task I'm waiting on throws Cancelled?). But it's a huge improvement if you're forced to use asyncio.
It's quite simple in Golang.
Golang has a GC and that makes a lot of things easier.
Languages like Swift do manage to make it much simpler. The culture guiding Rust design pretty clearly treats complexity as a goal.
C++ is great, coroutines are not. Neither of these are good ways to handle concurrency. You really need a more generalized graph and to minimize threads and context switching. You can't do more than the number of logical cores on a CPU anyway.
See also C++ coroutines resources (posts, research, software, talks): https://gist.github.com/MattPD/9b55db49537a90545a90447392ad3...
For a layperson it's clear that it should be either "Writings" and "Talks", or "Readings" and "Listenings", but C++ proficiency is in an inverse relation with being apt at taxonomy, it looks like.
Thanks for the list.
You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly. It's a matter of saving a few registers and switching the stack pointer, minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:
1. C++20 coros are stackless, in the general case every async "function call" heap allocates.
2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.
3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, require the STL, they're very heavy in debug builds, and you'll end up with template hell to do something as simple as Promise.all
> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly
I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
> Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon,
They are actively working on it for their VS2026 C++ compiler. I think since 2017 or so they've kept up with C++ standards reasonably? I'm not a heavy C++ guy, so maybe I'm wrong, but my understanding is they match the standards.
Boost has stackful coroutines. They also used to be in posix (makecontext).
> I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C.
These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h and it will turn out roughly the same performance on important architectures as what everyone used to do with custom assembly. And you already have fiber functions as part of the Windows API to trampoline.
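A minimal sketch of that approach (glibc; note POSIX marks these functions obsolescent, and on macOS they need _XOPEN_SOURCE):

    #include <cstdio>
    #include <ucontext.h>

    static ucontext_t main_ctx, coro_ctx;
    static char coro_stack[64 * 1024];
    static int value;

    // Coroutine body: publishes a value, then swaps back to main ("yield").
    static void coro_fn() {
        for (int i = 1; i <= 3; i++) {
            value = i;
            swapcontext(&coro_ctx, &main_ctx);
        }
        value = 0;  // signal completion
    }

    int main() {
        getcontext(&coro_ctx);
        coro_ctx.uc_stack.ss_sp = coro_stack;
        coro_ctx.uc_stack.ss_size = sizeof coro_stack;
        coro_ctx.uc_link = &main_ctx;  // where control goes when coro_fn returns
        makecontext(&coro_ctx, coro_fn, 0);

        swapcontext(&main_ctx, &coro_ctx);  // "resume"
        while (value != 0) {
            std::printf("%d\n", value);
            swapcontext(&main_ctx, &coro_ctx);
        }
    }

As noted elsewhere in the thread, each swapcontext also saves/restores the signal mask via a syscall, which is where most of its overhead comes from.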
I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).
Unfortunately swapcontext requires saving and restoring the signal mask, which, at least on Linux, requires a syscall, so it is going to be at least a hundred times slower than a hand-rolled implementation.
Also, although not likely to be removed anytime soon from existing systems, POSIX has declared the context API obsolescent a while ago (it might actually no longer be part of the standard).
Stackful coroutines also can't be used to "send" a coroutine to a worker thread, because the compiler might save the address of a thread local variable across the thread switch (happened in QEMU).
Yes I know, GCC has a long standing bug open on the issue :(.
Signal mask? What century are we in?
It can be safely ignored for the vast majority of apps. If you're using multithreading (quite likely if you're doing coroutines), then signals are not a good fit anyway.
Aside from the fact that the signal mask is still relevant in 2026 and even for multithreaded programs, that doesn't have anything to do with the fact that POSIX requires swapcontext to preserve it.
In most cases you're already using signalfd in places where libdex runs.
Looking at the repo, it falls back to Windows fibers on Windows/ARM. If you'd like a coroutine with more backends, I'm a fan of libco: https://github.com/higan-emu/libco/ which has assembly backends for x86, amd64, ppc, ppc-64, arm, and arm64 (and falls back to setjmp on POSIX platforms and fibers on Windows). Obviously the real solution would be for the C or C++ committees to add stackful coroutines to the standard, but unless that happens I would rather give up support for hppa or alpha or 8-bit AVR or whatever than not be able to use stackful coroutines.
A proposal to add stackful coroutines has been around forever and gets updated at every single mailing. Unfortunately the authors don't really have backing from any major company.
There is no "Linux/ARM[64]". But there are "Raspberry Pi" and "RISC-V". I don't know such OSes, to be honest :-)
This support table is a complete mess. And saying "most platforms are supported" is too optimistic or even cocky.
I think what they meant is that that's what it takes to add coroutine support to a C/C++ program. Adding it to, say, Java or C# is much more involved.
Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.
Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.
(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
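The trick looks roughly like this (macro names here are made up for illustration, not the real protothreads API; persistent state must live in the passed-in struct, since locals don't survive a yield):

    #include <cstdio>

    // CO_BEGIN opens a switch on the saved position; CO_YIELD records the
    // current line, returns, and plants a case label so the next call
    // resumes right here (a Duff's-device-style jump into the loop body).
    #define CO_BEGIN(state) switch (*(state)) { case 0:
    #define CO_YIELD(state) do { *(state) = __LINE__; return 1; case __LINE__:; } while (0)
    #define CO_END(state)   } *(state) = -1; return 0

    int counter(int* state, int* out) {
        CO_BEGIN(state);
        *out = 1;
        CO_YIELD(state);
        *out = 2;
        CO_YIELD(state);
        CO_END(state);
    }

    int main() {
        int state = 0, out = 0;
        while (counter(&state, &out))
            std::printf("%d\n", out);
    }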
Wouldn't that be stackless (shared stack)?
Correct; stackless. I misspoke.
You can do a lot of horrible things with setjmp and friends. I actually implemented some exception throw/catch macros using them (which did work) for a compiler that didn't support real C++ exceptions. Thank god we never used them in production code.
This would be about 32 years ago - I don't like thinking about that ...
GCC still uses sj/lj by default on some targets to implement exceptions.
setjmp + longjump + sigaltstack is indeed the old trick.
C++ destructors and exception safety will likely wreak havoc with any "simple" assembly/longjmp-based solution, unless severely constraining what types you can use within the coroutines.
Not really. I've done it years ago. The one restriction for code inside the coroutine is that it mustn't catch (...). You solve destruction by distinguishing whether a coroutine is paused in the middle of execution or if it finished running. When the coroutine is about to be destructed you run it one last time and throw a special exception, triggering destruction of all RAII objects, which you catch at the coroutine entry point.
Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap. You can restrict the exception types however you want. I chose to support only subclasses of std::exception and handle anything else as an unknown exception.
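For reference, the standard machinery for ferrying an exception across that gap is std::exception_ptr; a minimal sketch (not the parent's actual implementation):

    #include <cstdio>
    #include <exception>
    #include <stdexcept>

    // Stand-in for one step of a coroutine body: catch whatever escaped
    // and park it so it can travel across the suspend boundary.
    std::exception_ptr run_step() {
        try {
            throw std::runtime_error("failed inside coroutine");
        } catch (...) {
            return std::current_exception();
        }
    }

    int main() {
        std::exception_ptr ep = run_step();
        try {
            if (ep) std::rethrow_exception(ep);  // rethrow at the caller
        } catch (const std::exception& e) {
            std::printf("caller saw: %s\n", e.what());
        }
    }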
> Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap
This is also how dotnet handles it, and you can choose whether to rethrow at the caller site, inspect the exception manually, or run a continuation on exception.
> mustn't catch (...)
You could use the same trick used by glibc to implement unstoppable exceptions for POSIX cancellation: the exception rethrows itself from its destructor.
Thanks, that's interesting.
> every async "function call" heap allocates.
> require the STL
That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.
One can define:
void *operator new(size_t sz, Foo &foo)
in the coro's promise type, and this:
- removes the implicitly-defined operator new
- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined
Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.
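A minimal sketch of that promise-scoped operator new, with an illustrative Arena bump allocator standing in for "no heap available" (C++20; all names here are made up):

    #include <coroutine>
    #include <cstddef>
    #include <cstdio>

    // A fixed buffer standing in for an environment without a heap.
    struct Arena {
        alignas(std::max_align_t) std::byte buf[1024];
        std::size_t used = 0;
    };

    struct Task {
        struct promise_type {
            // Called instead of the global operator new because the
            // coroutine's first parameter is an Arena&.
            void* operator new(std::size_t sz, Arena& a) {
                std::printf("allocating %zu bytes from arena\n", sz);
                void* p = a.buf + a.used;
                a.used += sz;
                return p;
            }
            // Matching delete; the frame lives in the arena, nothing to free.
            void operator delete(void*) noexcept {}

            Task get_return_object() { return {}; }
            std::suspend_never initial_suspend() { return {}; }
            std::suspend_never final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
    };

    // The signature is forced: the promise's operator new(size_t, Arena&)
    // only resolves if the coroutine takes an Arena& first.
    Task hello(Arena&) {
        std::printf("running without global new\n");
        co_return;
    }

    int main() {
        Arena arena;
        hello(arena);
    }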
Yes, green threads ("stackful coroutines") are more straightforward to use, however:
- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)
- they are very ABI dependent. Among the "few registers" one has to save are FPU registers, which may not even exist in the case of older Arm architectures or under codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame, too
Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)
A stackful coroutine implementation has to save exactly the same registers that a stackless one has to: the live ones at the suspension point.
A pure library implementation that relies on normal function call semantics obviously needs to conservatively save at least all callee-save registers, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.
Ideally the compiler would provide a built-in, but even, for example, an implementation using GCC inline ASM with proper clobbers can do significantly better.
Stackful makes for cute demos, but you need huge per-thread stacks if you actually end up calling into Linux libc, which tends to assume typical OS thread stack sizes (8MB). (I don't disagree that some of the other tradeoffs are nice, and I have no love for C++20 coroutines myself.)
Actually you don't even need ASM at all. Just smart use of compiler built-ins to make it truly portable. See my composable continuation implementation: https://godbolt.org/z/zf8Kj33nY
As an ex-gamedev, suspend/resume stackful coroutines were too heavy to have several thousand of them running during a game loop for our game. At the time we used GameMonkey Script: https://github.com/publicrepo/gmscript
That was over 20 years ago. No idea what the current hotness is.
Several thousand? What were you using them for? Coroutines' main utility is that they let you write complex code that pauses and still looks sensible, so for games, you'd typically put stuff like the behavior of an NPC in a coroutine. If you have thousands of things to put each in its own coroutine, they must have been really, really simple stuff. At that point, the cost of context switching can become significant.
> the cost of context switching can become significant.
Which is why a solution with no (or very tiny) context switching is preferred over one that's heavy to switch.
> they must have been really, really simple stuff
Yes, because they were low-overhead it was trivial to start them for all kinds of tiny things.
A much nicer code base to study is: https://swtch.com/libtask/
The stack save/restore happens in: https://swtch.com/libtask/asm.S
Single OS thread only, FWIW (no M:N scheduling). And like any stackful implementation, requires relatively huge stack allocations if you actually call into stdlib, particularly things like getaddrinfo().
Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.
> that must in turn be backed by a multithreaded event loop to be useful
Why? You can just as well execute all your coroutines on a single thread. Many networking applications are doing fine with just a single ASIO thread.
Another example: you could write game behavior in C++ coroutines and schedule them on the thread that handles the game logic. If you want to wait for N seconds inside the coroutine, just yield it as a number. When the scheduler resumes a coroutine, it receives the delta time and then reschedules the coroutine accordingly. This is also a common technique in music programming languages to implement musical sequencing (e.g. SuperCollider)
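A rough sketch of that pattern with C++20 coroutines: the behavior co_yields a delay, and a fixed-timestep loop resumes it once enough simulated time has passed (all names here are illustrative):

    #include <coroutine>
    #include <cstdio>
    #include <exception>

    // Minimal "behavior" coroutine: co_yield a delay in seconds; the
    // scheduler resumes it once that much simulated time has elapsed.
    struct Behavior {
        struct promise_type {
            double wait = 0.0;
            Behavior get_return_object() {
                return {std::coroutine_handle<promise_type>::from_promise(*this)};
            }
            std::suspend_always initial_suspend() { return {}; }
            std::suspend_always final_suspend() noexcept { return {}; }
            std::suspend_always yield_value(double seconds) {
                wait = seconds;
                return {};
            }
            void return_void() {}
            void unhandled_exception() { std::terminate(); }
        };
        std::coroutine_handle<promise_type> h;
        ~Behavior() { if (h) h.destroy(); }
    };

    Behavior blink() {
        for (int i = 0; i < 3; i++) {
            std::printf("blink %d\n", i);
            co_yield 0.5;  // "sleep" half a second of game time
        }
    }

    int main() {
        Behavior b = blink();
        double until = 0.0, now = 0.0;
        const double dt = 0.25;  // fixed timestep per frame
        while (!b.h.done()) {
            now += dt;  // each loop iteration is one "frame"
            if (now >= until) {
                b.h.resume();
                until = now + b.h.promise().wait;
            }
        }
    }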
Much of the original motivation for async was for single threaded event loops. Node and Python, for example. In C# it was partly motivated by the way Windows handles a "UI thread": if you're using the native Windows controls, you can only do so from one thread. There's quite a bit of machinery in there (ConfigureAwait) to control whether your async routine is run on the UI thread or on a different worker pool thread.
In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.
> but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that.
You can call a function that makes use of coroutines without worrying about it. That's the core intent of the design.
That is, if you currently use some blocking socket library, we could replace the implementation of that with coroutine based sockets, and everything should still work without other code changes.
ASIO is also available outside of boost! https://github.com/chriskohlhoff/asio
For anyone wondering; this isn't a hack, that's the same library, just as good, just without boost dependencies.
Thanks for pointing this out! This may not be obvious to everybody.
Also, this is not some random GitHub Repo, Chris Kohlhoff is the developer of ASIO :)
They don't need a multithreaded event loop. Single-threaded schedulers cover plenty of game-style work without hauling in Boost, and the uglier part is that async colors the API surface and control flow in ways that make refactors annoying and big legacy codebases harder to reason about.
> From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful
Multithreaded? Nope. You can do C++ coroutines just fine in a single-threaded context.
Event loop? Only if you're wanting to do IO in your coroutines and not block other coroutines while waiting for that IO to finish.
> most people end up using coroutines with something like boost::asio
Sure. But you don't have to. Asio is available without the kitchen sink: https://think-async.com/Asio/
Coroutines are actually really approachable. You don't need boost::asio, but it certainly makes it a lot easier.
I recommend watching Daniela Engert's 2022 presentation, Contemporary C++ in Action: https://www.youtube.com/watch?v=yUIFdL3D0Vk
I use asio at work for coroutines. It's one of the most opaque libraries I've ever used. The docs are awful and impenetrable.
The most helpful resource about it is a guy on stackoverflow (sehe). No idea how to get help once SO has closed.
Ask Claude Code to write a manual for it.
> turns it into some sort of ugly state machine
Why are people afraid of state machines? There's been sooo much effort spent on hiding them from the programmer...
They're essentially callable, stateful, structured gotos. Difficult to understand for the uninitiated.
For example, generators. Also known as semicoroutines.
https://langdev.stackexchange.com/a/834
This:

    generator fib() {
        a, b = 1, 2
        while (a < 100) {
            b, a = a, a + b
            yield a
        }
        yield a - 1
    }

Becomes this:

    struct fibState {
        int a;
        int b;
        int position;
    };

    int fib(struct fibState *state) {
        switch (state->position) {
        case 0:
            state->a = 1; state->b = 2;
            while (state->a < 100) {
                state->a = state->a + state->b;
                state->b = state->a - state->b;
                // switching the context
                state->position = 1;
                return state->a;
        case 1:;
            }
            state->position = 2;
            return state->a - 1;
        case 2:
            state->position = -1;
        }
        return 0;
    }
The ugly state machine example presented in the article is also a manual implementation of a generator. It's as palatable to the normal programmer as raw compiler output. Being written in C++ makes it even uglier and more complicated. The programming language I made is a concrete example of what programming these things manually is like. I had to write every primitive as a state machine just like the one above.
https://www.matheusmoreira.com/articles/delimited-continuati...
What you've given is an example of how to implement a coroutine though.
Not of how to write a state machine based application without hiding the state machine behind abstractions.
Because they are unstructured and non-modular. And yes, graphical notation sucks.
More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.
It can easily and often does lead to messy rube goldberg machines.
There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.
This is what Rich Hickey (Clojure author) has termed “place oriented programming”: the focus is on mutating memory addresses and having to synchronize everything, while failing to model time as a first-class concept.
I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.
> I’m not aware of any general purpose programming language that successfully models time explicitly
Step 1, solve "time" for general computing.
The difficulty here is that our periods are local out of both necessity and desire; we don't fail to model time as a first class concept, we bring time-as-first-class with us and then attempt to merge our perspectives with varying degrees of success.
We're trying to rectify the observations of Zeno, a professional turtle hunter, and a track coach with a stopwatch when each one has their own functional definition of time driven by intent.
These timing additions to a language are also at the core of imperative synchronous programming languages like Esterel, Céu, or Blech.
> There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
Sounds interesting. If it's not too much of an effort, could you dig up a reference?
You're in luck - it's the first talk at this link, "The Polling Problem": https://www.gdcvault.com/play/1018040/Architecture-Tricks-Ma...
Mind you my memory may have distorted it a little beyond what it was, but it's loosely on the topic!
[dead]
As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.
That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.
(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)
If you need to implement an async state machine, couldn't that just as easily be done with std::future? How do coroutines make this cleaner/better?
std::future doesn't give you a state machine. You get the building blocks you have to assemble into one manually. Coroutines give you the same building blocks but let the compiler do the assembly, making the suspension points visible in the source while hiding the mechanical boilerplate.
This is why coroutine-based frameworks (e.g., C++20 coroutines with cppcoro) have largely superseded future-chaining for async state machine work — the generated code is often equivalent, but the source code is dramatically cleaner and closer to the synchronous equivalent.
(me: ex-Visual Studio dev who worked extensively on our C++ coroutine implementation)
It doesn't seem like a clear win to me. The only "assembly" required with std::future is creating the associated promise and using it to signal when that async step is done, and the upside is a nice readable linear flow, as well as ease of integration (just create a thread to run the state machine function if want multiple in parallel).
With the coroutine approach using yield, doesn't that mean the caller needs to decide when to call it again? With the std::future approach, it's event-driven: the promise being set signals that the state/step has completed.
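For concreteness, the single promise-signaled step being described looks roughly like this (in a real program the promise would be set from a worker thread or completion handler rather than inline):

    #include <cstdio>
    #include <future>

    int main() {
        std::promise<int> p;
        std::future<int> f = p.get_future();

        // The async "step" completes: whoever owns the promise
        // publishes the result...
        p.set_value(42);

        // ...and the consumer blocks on the future until that happens.
        std::printf("got %d\n", f.get());
    }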
You are describing a single async step, not a state machine. "Create a promise, set it when done", that's one state. A real async state machine has N states with transitions, branching, error handling, and cleanup between them.
> "The only 'assembly' required is creating the associated promise"
Again, that is only true for one step. For a state machine with N states you need explicit state enums or a long chain of .then() continuations. You also need to manage the shared state across continuations (normally on the heap), handle manual error propagation across each boundary, and handle the cancellation tokens.
You only get a "nice readable linear flow" using std::future when 1) using a blocking .get() on a thread, or 2) .then() chaining, which isn't "nice" by any means.
Lastly, you seem to be conflating a co_yield (generator, pull-based) with co_await (event-driven, push-based). With co_await, the coroutine is resumed by whoever completes the awaitable.
But what do I know... I only worked on implementing coroutines in cl.exe for 4 years. ;-)
I only mentioned co_yield() since that's what the article was (ab)using, although perhaps justifiably so. It seems the coroutine support was added to C++ in a very flexible way, but so low level as to be daunting/inconvenient to use. It needs to have more high level facilities (like Generators) built on top.
What I was thinking of as a state machine using std::future was a single-function state machine: using switch (state) to dispatch to state-specific async ops via std::future, waiting for completion, then selecting the next state.
> as to be daunting/inconvenient to use
I don't even know how to respond to that. How in the world are you using C++ professionally if you think coroutines are "daunting"? No one uses C++ for it's "convenience" factor. We use it for the power and control it affords.
> What I was thinking of as a state machine with using std::future was a single function state machine, using switch (state) to the state specific dispatch of asynch ops using std::future, wait for completion then select next state.
Uh huh. What about error propagation and all the other very real issues I mentioned that you are just ignoring? Why not just let the compiler do all the work the way it was spec'ed and implemented?
So it's meant to be inconvenient, and that's the only right and proper way?!
Sounds more like punishment than software design, but each to their own.
I get what you’re saying, but you kicked off this thread like an expert — even though you knew you were talking to someone who helped build the very thing you’re critiquing.
It’s pretty clear you’ve never built a production-grade async state machine.
C++ is designed to provide the plumbing, not the kitchen sink. It’s a language for building abstractions, not handing them to you — though in practice, there’s a rich ecosystem if you’d rather reuse than reinvent.
That flexibility comes at the cost of convenience, which is why most new engineers don’t start with C++.
What you call “intimidating,” I call powerful. If coroutines throw you off, you’re probably using the wrong language.
Last thought — when you run into someone who’s built the tools you rely on, ask them questions instead of trying to lecture them. I would have been more than happy to work through a pedagogical solution with you.
/ignored
> It’s pretty clear you’ve never built a production-grade async state machine.
Haha .. you have no idea.
FWIW I've built frameworks exactly for that, and it's highly likely that you've unwittingly used one of them.
Uh huh. The person who gets confused by how co_await actually works and thinks that coroutines are "intimidating" wrote frameworks that I would have used to build our C++ compiler. Do you not understand that cl.exe doesn't use external frameworks? lmfao
I said used, as in used in your everyday life, by interacting with computer systems whose backend implementations you are blissfully ignorant of.
But yeah, if you want to win arguments, then arguing against yourself and your own hallucinations is, in your case, the best way to go.
I feel like that's really overselling coroutines -- there's still a TON of boilerplate.
My response specifically addressed the question of why you might choose one option over the other.
Do you believe that std::future is the better option?
This is one reason why I built coroutines into my game programming language Easel (https://easel.games). I think they let you keep the flow of the code matching the flow of your logic (top-to-bottom), rather than jumping around, and so I think they are a great tool for high-level programming. The main thing is stopping the coroutines when the entity dies, and in Easel that is done by implying ownership from the context they are created in. It is quite a cool way of coding I think, avoids the state machines like the OP stated, keeps everything straightforward step-by-step and so all the code feels more natural in my opinion. In Easel they are called behaviors if anyone is interested in more detail: https://easel.games/docs/learn/language/behaviors
Looking at C++ made me understand the point of Rust.
Always jarring to see how Unity is stuck on an ancient version of C#. The use of IEnumerable as a "generator" mechanic is quite a good hack though.
Unity is currently on C# 9 and that IEnumerable trick is no longer needed in new codebases. async is properly supported.
Thankfully they are actively working towards upgrading, Unity 6.8 (they're currently on 6.4) is supposed to move fully towards CoreCLR, and removing Mono. We'll then finally be able to move to C# 14 (from C# 9, which came out in 2020), as well as use newer .NET functionality.
https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...
One annoying piece of Unity's CoreCLR plan is there is no plan to upgrade IL2CPP (Unity's AOT compiler) to use a better garbage collector. It will continue to use Boehm GC, which is so much worse for games.
Why wouldn't they use the GC that comes with the dotnet AOT runtime?
Probably because the AOT runtime doesn't run on game consoles, straight out of the box.
Capcom has their own fork of .NET for the Playstation, for example.
I don't know what kind of GC they implemented.
They just haven't announced any plans to do so yet. They might one day.
They will not be using .NET AOT probably ever though. Unity's AOT basically supports full C# (reflection etc) while .NET opted to restrict it and lean more on generated code.
For several years now I've been wondering if it will ever happen.
Well, this is at least an update from earlier this month with a clear roadmap of how they're going to get there. There's hope!
>The use of IEnumerable as a "generator" mechanic is quite a good hack though.
Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?
It feels hacky because you have to (had to?) use it as the async/await tool, and because of that, the types you're generating and how they're handled are a huge mess.
Really you're generating the vague concept of a yield instruction but you can return other coroutines that are implicitly run and nest your execution... Because of this you can't wait less than a frame so things are often needlessly complicated and slow.
It's like using a key to jam a door shut. Sure a key is for keeping doors closed but...
Not that ancient, they just haven't bothered to update their coroutine mechanism to async/await. The Stride engine does it with their own scheduler, for example.
Edit: Nevermind, they eventually bothered.
Unity has async too [1]. It's just that in a rare display of sanity they chose to not deprecate the IEnumerator stuff.
[1] https://docs.unity3d.com/6000.3/Documentation/ScriptReferenc...
Oh I totally missed this, thanks! I was overly confident they wouldn't have bothered, given how long it was taking. The last time I used Unity was 2022.3, which was apparently the last version without Awaitable.
It's ancient. The latest version of Unity only partially supports C# 9. We're up to C# 14 now. But that's just the language version. The Mono runtime is only equivalent to .NET Framework 4.8, so all of the standard library improvements since .NET (Core) are missing. Not directly related to age, but its performance is also significantly worse than .NET. And Unity's garbage collector is worse than the default one in Mono.
The runtime is absolutely ancient, but I think the version number says more about C#'s churn than about how outdated the language version is. Take my opinion on C# with a grain of salt, though, I was an F#-er until the increasing interop pains forced me to drop it.
There were also a lot of performance improvements to .NET over the last few years.
IIRC generators and coroutines are equivalent, in the sense that you can implement one with the other.
Generators are a subset of coroutines that only yield data in one direction. Full coroutines can also receive more input from the caller at every yield point.
Not too different from C++'s iterator interface for generators, I guess.
Stackless coroutines in C when? As an embedded dev, I miss them deeply. Certainly not enough RAM to give a separate stack for everything and rewriting every async call as a callback sequence sucks.
I've been doing a lot of work with ECS/Dots recently and once I wrapped my head around it - amazing.
I recall working on a few VR projects - where it's imperative that you keep that framerate solid or risk making the user physically sick - this is where really began using coroutines for instantiating large volumes of objects and so on (and avoiding framerate stutter).
ECS/Dots & the burst compiler makes all of this unnecessary and the performance is nothing short of incredible.
Coroutines generally imply some sort of magic to me.
I would just go straight to tbb and concurrent_unordered_map!
The challenge of parallelism does not come from how to make things parallel, but how you share memory:
How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.
My challenge right now is how to make the JVM fast on native memory. The options: 1) rewrite my own JVM, or 2) use the buffer-and-offset structure Oracle still ships but has deprecated and is encouraging people not to use.
We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?
> C# (already has it but is terrible to write native/VM code for?)
What do you mean here? Do you mean hand-writing MSIL or native interop (pinvoke) or something else?
No, I meant this, but for C# it's a whole lot more complex:
> some sort of magic to me.
Your stack is on the heap and it contains an instruction pointer to jump to for resume.
As I mentioned on the Reddit thread,
This is quite understandable when you know the history behind how C++ coroutines came to be.
They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.
Thus, if one knows how the .NET compiler and runtime magic works, including custom awaitable types, there will be some common bridges to how C++ coroutines ended up looking.
Coroutines is just a way to write continuations in an imperative style and with more overhead.
I never understood the value. Just use lambdas/callbacks.
> Just use lambdas/callbacks
"Just" is doing a lot of work there. I've use callback-based async frameworks in C++ in the past, and it turns into pure hell very fast. Async programming is, basically, state machines all the way down, and doing it explicitly is not nice. And trying to debug the damn thing is a miserable experience
You can embed the state in your lambda context; it really isn't as difficult as people claim.
The author just chose to write it as a state machine, but you don't have to. Write it in whatever style helps you reach correctness.
You still need the state and the dispatcher, even if the former is a little more hidden in the implicit closure type.
Not necessarily. A coroutine encapsulates the entire state machine, which might be a PITA to implement otherwise. Say, if I have a stateful network connection that requires initialization and periodic encryption secret renewal, a coroutine implementation would be much slimmer than that of a state machine with explicit states.
> Just use lambdas/callbacks.
Lol, no thanks. People are using coroutines exactly to avoid callback hell. I have rewritten my own C++ ASIO networking code from callback to coroutines (asio::awaitable) and the difference is night and day!
I'll take the bait. Here's a coroutine:

```
waitFrames(5); // wait 5 frames
fireProjectile();
waitFrames(15);
turnLeft(-30/*deg*/, 120); // turn left over 120 frames
waitFrames(10);
fireProjectile();
// spin and shoot
for (i of range(0, 360, 60)) {
    turnRight(60, 90); // turn 60 degrees over 90 frames
    fireProjectile();
}
```
10 lines and I get behavior over time. What would your non-coroutine solution look like?
Given a coroutine body

```
int f() {
    a;
    co_yield r;
    b;
    co_return r2;
}
```

this transforms into

```
auto f(auto then) {
    a;
    return then(r, [&]() {
        b;
        return then(r2);
    });
}
```
You can easily extend this to arbitrarily complex statements. The main thing is that obviously, you have to worry about the capture lifetime yourself (coroutines allocate a frame separate from the stack), and the syntax causes nesting for every statement (but you can avoid that using operator overloading, like C++26/29 does for executors)
How is this better than the equivalent coroutine code? I don't see any upsides from a user's perspective.
> The main thing is that obviously, you have to worry about the capture lifetime yourself
This is a big deal! The fact that the coroutine frame is kept alive and your state can just stay in local variables is one of the main selling points. I experienced this first-hand when I rewrote callback-style C++ ASIO code to the new coroutine style. No more [self=shared_from_this()] and other shenanigans!
Using shared_ptr everywhere is an antipattern.
The whole point of controlling the capture is controlling the memory layout, which is what C++ is all about.
Even with Asio, you don't really have to do this. It's just the style the examples follow, and Asio itself isn't necessarily the best design.
Isn't this basically what javascript went through with Promise chaining "callback hell" that was cleaned up with async/await (and esbuild can still desugar the latter down to the former)
This is literally what coroutines are, syntactic sugar to generate nested lambdas.
Except in C++ this removes a fair amount of control given how low-level it is.
You can structure coroutines with a context so the runtime has an idea when it can drop them or cancel them. Really nice if you have things like game objects with their own lifecycles.
For simple callback hell, not so much.
Did you read the article? As the author says, it becomes a state machine hell very quickly beyond very simple examples.
I just don’t agree that it always becomes a state machine hell. I even did this in C++03 code before lambdas. And honestly, because it was easy to write careless spaghetti code, it required a lot more upfront thought into code organization than just creating lambdas willy-nilly. The resulting code is verbose, but then again C++ itself is a fairly verbose language.
The value is fewer indirect function calls and heap allocations (so less overhead than callbacks), and well-defined tasks that you can select/join/cancel.
The Unity editor does not let you examine the state hidden in your closures or coroutines. (And the Mono debugger is a steaming pile of shit.)
Just put your state in visible instance variables of your objects, and then you will actually be able to see and even edit what state your program is in. Stop doing things that make debugging difficult and frustratingly opaque.
Use Rider or Visual Studio. Debugging coroutines should be easy. You just can't step over any yield points so you need to break after execution is resumed. It's mildly tedious but far from impossible.
In Haskell this technique has been called ‘reinversion of control’: http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-o...
>> To misquote Kennedy, “we chose to focus coroutines on generator in C++23, not because it is hard, but because it is easy”.
Appreciate this humor -- absurd, tasteful.
I don't know, I'm not convinced with this argument.
The "ugly" version with the switch seems much preferable to me. It's simple, works, has way less moving parts and does not require complex machinery to be built into the language. I'm open to being convinced otherwise but as it stands I'm not seeing any horrible problems with it.
Switch is fine until you hit five or six states with cleanup in each branch. Then it's just a worse version of what coroutines give you for free.
Most game engines seem to have some coroutine kludge.
The 'primitive' SCUMM language used for writing adventure games like Maniac Mansion had coroutines; an ill-fated attempt to convert to Python was hampered by Python (at the time) having no support for yield.
I did not know that, that's neat. Are there any blog posts or articles that go deeper into this?
That's a great idea.
I do not find so-called "green threads" useful at all. In my opinion, except for some very esoteric cases, they serve no purpose in "native" languages that have full access to all OS threading and IO facilities. They're useful only in "deficient" environments, like the inherently single-threaded request handlers of NodeJS.
Yeah I agree that user threads are overused by programmers. For most situations, using an OS thread is going to be far easier to work with. People like to cite the drawback that context switching overhead becomes a problem when you have thousands of threads, but the reality is that most people are not writing software that needs to handle many thousands of users all at once. Using green threads to handle such large scale instead of OS threads is a prime example of YAGNI.
Besides context switching, another issue with OS threads is that you need to carefully use synchronization primitives and watch for missing barriers. Green threads are a bit more forgiving.
No serious devs even uses Unity coroutines. Terrible control flow and perf. Fine for small projects on PC.
Echoing the thoughts of the only current sibling comment: lots of "serious" developers (way to gatekeep here) definitely use coroutines, when they make sense. As mentioned, it's one of the best ways to have something update each frame for a short period of time, then neatly go away when it's not needed anymore. Very often, the tiny performance hit you take is completely outweighed by the maintainability/convenience.
...and then crash when any object it was using gets deleted while it's still running, like when the game changes scenes, but it becomes a manual, error-prone process to track down and stop all the coroutines holding on to references, that costs much more effort than it saves.
I've been a serious Unity developer for 16 years, and I avoid coroutines like the plague, just like other architectural mistakes like stringly typed SendMessage, or UnityScript.
Unity coroutines are a huge pain in the ass, and a lazy undisciplined way to do things that are easy to do without them, using conventional portable programming techniques that make it possible to prevent edge conditions where things fall through the cracks and get forgotten, where references outlive the objects they depend on ("fire-and-forget" gatling foot-guns).
Coroutines are great -- right up until they aren’t.
They give you "nice linear code" by quietly turning control flow into a distributed state machine you no longer control. Then the object gets destroyed, the coroutine keeps running, and now you’re debugging a null ref 200 frames later in a different scene with an obfuscated call stack and no ownership.
"Just stop your coroutines" sounds good until you realize there’s no coherent ownership model. Who owns it? The MonoBehaviour? The caller? The scene? Every object it has a reference to? The thing it captured three yields ago? The cure is so much worse than the disease.
Meanwhile: No static guarantees about lifetime. No structured cancellation. Hidden allocation/GC from yield instructions. Execution split across frames with implicit state you can’t inspect.
Unity has a wonderful editor that lets you inspect and edit the state of the entire world: EXCEPT FOR COROUTINES! If you put your state into an object instead of local variables in a coroutine, you can actually see the state in the editor.
All of this to avoid writing a small explicit state machine or update loop -- Unity ALREADY has Update and FixedUpdate just for that: use those.
Coroutines aren’t "cleaner" -- they just defer the mess until it’s harder to reason about.
If you can't handle state machines, then you're even less equipped to handle coroutines.
Never had a crash from that. When the GameObject is destroyed, the coroutine is gone. If you're using a coroutine to manage something outside the scope of the GameObject itself, that's a problem with your own design, not the coroutine itself.
It'd be like complaining about arrays being bad because if you pass a pointer to another object, nuke the original array, then try to access the data, it'll cause an error. That's kind of... your own fault? Got to manage your data better.
Unity's own developers use them for engine code. To claim it's just something for noobs is a bit of an interesting take, since, well, the engine developers are clearly using them and I doubt they're Unity noobs. They made the engine.
So if you need to conditionally tick something or you want to wait for an effect to finish, etc., you're using Update() with if() statements?
The same code in a coroutine hits the same lifecycle failures as Update() anyway. You don't gain any safety by moving it to Update().
> No structured cancellation.
Call StopCoroutine with the Coroutine object returned by StartCoroutine. Of course you can just pass around a cancellation token type thing as well.
> Hidden allocation/GC from yield instructions.
Hidden how? You're calling `new` or you're not.
Instead of fighting them, you should just learn how to use coroutines. They're a lot nicer than complicated logic in Update().
Enjoy shipping console titles that run at a constant 60 fps with no GC.
Again, fine for pet projects on PC :)
I'll continue to bite... What AAA 60+fps mobile game written Unity without coroutines are you referring to?
There's exactly 0 (zero) AAA games made with unity so it's going to be tough. They're all a terrible lag fest no matter how they're implemented
I dunno, I've worked on some pretty big projects that have used lots of coroutines, and it's pretty easy to avoid all of the footguns.
I'm not advocating for the ubiquitous use of coroutines (there's a time and place), but they're like anything else: if you don't know what you're doing, you'll misuse them and cause problems. If you RTFM and understand how they work, you won't have any issues.
They're a crutch for people who don't know what they're doing, so of course they invite a whole host of problems that are harder to solve than doing it right in the first place.
If you strictly require people to know exactly what they're doing and always RTFM and perfectly understand how everything works, then they already know well enough to avoid coroutines and SendMessage and UnityEvents and other footguns in the first place.
It's much easier and more efficient to avoid all of the footguns when you simply don't use any of the footguns.
> Who owns it? The MonoBehaviour? The caller? The thing it captured three yields ago?
The monobehavior that invoked the routine owns it and is capable of cancelling it at typical lifecycle boundaries.
This is not a hill I would die on. There's a lot of other battles to fight when shipping a game.
And then you're bending over backwards and have made so much more busy work for yourself than you would have if you'd just done it the normal way, in which all your state would be explicitly visible and auditable in the editor.
The biggest reason for using Unity is its editor. Don't do things that make the editor useless, and are invisible to it.
The problem with coroutines is that they generate invisible errors you end up shipping and fighting long after you shipped your game, because they're so hard to track down and reproduce and diagnose.
Sure you can push out fixes and updates on Steam, but how about shipping games that don't crash mysteriously and unpredictably in the first place?
In all of my years of professional game dev, I can verify that this is not even remotely true. They're used basically everywhere. They're very common when you need something to update for a set period of time but managing the state outside a very local context would just make the code a mess.
Unity's own documentation for changing scenes uses coroutines
Just out of interest, how many serious unity devs have you talked to?
I've talked to some non-serious unity devs, like Peter Molyneux...
https://news.ycombinator.com/item?id=47110605
>1h 48m 06s, with arms spread out like Jesus H Christ on a crucifix: "Because we can dynamically put on ANY surface of the cube ANY image we like. So THAT's how we're going to surprise the world, is by giving clues about what's in the middle later on."
https://youtu.be/24AY4fJ66xA?t=6486
Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Moo!
https://www.gamedeveloper.com/design/the-i-curiosity-i-exper...
>"I'm jealous that [Molyneux] made a more boring clicking game than I did." -Ian Bogost
>"I also think Curiosity was brilliant and inspired. But that doesn't make it any less selfish or brazen. Curiosity was not an experiment. 'Experiment' is a rhetorical ruse meant to distract you from the fact that it's promotional." -Ian Bogost
Molyneux is obviously a well-known gamedev figure, but he's always been much more on the design side than programming side, as opposed to someone like Carmack or even J Blow. I wouldn't take his opinions on minutiae like coroutines as authoritative.
Lol. I met him at a Unity Conference in Amsterdam I think it was 2013? To be honest I was a bit star struck, but then I saw the scandal of that click click click cube... something about careful about meeting your heroes...