XState

My current side project, Discord Plays Pokemon, has a lot of dependencies. The application streams video through Discord, which presents a challenge: Discord does not provide APIs for streaming video, and I didn’t want to reverse-engineer the client. Instead, I automated Discord’s web application using Selenium, which can programmatically stream video to a specific voice channel. This has worked very well so far: users can play real-time games of Pokemon with each other using nothing but Discord’s text chat!

A screenshot of Discord text chat with two users inputting commands

Two users can input commands at the same time. D means to simulate a down-button press once, and 10r means to simulate a right-button press ten times.

If the command is valid, the bot will react to the message with a 👍 once the command is applied to the game.
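The command syntax can be sketched as a small parser. This is a hypothetical illustration, not the bot’s actual code: parseCommand and the letter-to-button table are assumptions. The only grammar taken from the examples above is an optional repeat count followed by a single button letter, matched case-insensitively because both D and 10r appear. A message that fails to parse would simply get no 👍 reaction.

```typescript
// Hypothetical sketch of a parser for chat commands such as "D" or "10r".
// Assumption: a command is an optional repeat count followed by one button letter.
type Button = "up" | "down" | "left" | "right" | "a" | "b";

interface Command {
  button: Button;
  times: number;
}

// Hypothetical mapping from command letters to emulator buttons.
const BUTTONS: { [letter: string]: Button | undefined } = {
  u: "up",
  d: "down",
  l: "left",
  r: "right",
  a: "a",
  b: "b",
};

// Returns the parsed command, or null if the message is not a valid command.
function parseCommand(message: string): Command | null {
  const match = /^(\d*)([a-z])$/.exec(message.trim().toLowerCase());
  if (match === null) {
    return null; // Free-form chat, not a command.
  }
  const button = BUTTONS[match[2]];
  if (button === undefined) {
    return null; // Unknown button letter.
  }
  // An omitted count means a single press.
  const times = match[1] === "" ? 1 : Number(match[1]);
  if (times < 1) {
    return null; // "0r" would press nothing, so reject it.
  }
  return { button, times };
}

parseCommand("D"); // returns { button: "down", times: 1 }
parseCommand("10r"); // returns { button: "right", times: 10 }
parseCommand("hello"); // returns null
```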

I brute-forced a lot of this code to get the bot working quickly. It worked, but adding new features was not easy. For example, I wanted the bot to stream only when a user was in the voice channel, which requires tracking the bot’s state. Was the bot able to log in to Discord? Is the bot streaming or not? Has there been an error? How do I switch between browser tabs, given that one tab runs Discord and another runs the browser-based emulator?

Switching between tabs was an easy problem to fix: I created two instances of Firefox, one for Discord and another for the emulator. This eliminated a whole class of errors at a slight performance cost.

A screenshot of Discord streaming Pokemon
Video is streamed in real-time with instant feedback for the applied inputs.

Tracking that state by hand, however, was not something I wanted to do; there are a lot of subtle edge cases I didn’t want to deal with. State machines seemed like a good fit, but I had never used them in TypeScript.
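To see why a state machine fits, here is a minimal hand-rolled one in plain TypeScript, without XState. The state names borrow from the XState machine shown later in this post, but the logging_in state, the event names, and the transition function are hypothetical. Instead of separate flags for “logged in”, “streaming”, and “errored”, the bot has exactly one current state, and a pure function decides what each event does to it.

```typescript
// Hypothetical hand-rolled state machine (no XState) for the streaming bot.
type StreamState = "logging_in" | "is_ready" | "starting_stream" | "streaming" | "is_error";

type StreamEvent =
  | { type: "logged_in" }
  | { type: "start_stream" }
  | { type: "stream_started" }
  | { type: "end_stream" }
  | { type: "failed"; error: string };

// Pure transition function: current state + event -> next state.
// Events a state does not handle leave it unchanged, so the bot can only
// reach "streaming" by passing through "is_ready" and "starting_stream".
function transition(state: StreamState, event: StreamEvent): StreamState {
  if (event.type === "failed") {
    return "is_error"; // Any failure, from any state, lands in the error state.
  }
  switch (state) {
    case "logging_in":
      return event.type === "logged_in" ? "is_ready" : state;
    case "is_ready":
      return event.type === "start_stream" ? "starting_stream" : state;
    case "starting_stream":
      return event.type === "stream_started" ? "streaming" : state;
    case "streaming":
      return event.type === "end_stream" ? "is_ready" : state;
    case "is_error":
      return state;
  }
}

// Replaying a sequence of events is just a fold over the transition function.
const history: StreamEvent[] = [
  { type: "start_stream" }, // Ignored: the bot has not logged in yet.
  { type: "logged_in" },
  { type: "start_stream" },
  { type: "stream_started" },
];
const finalState = history.reduce<StreamState>(transition, "logging_in");
console.log(finalState); // prints: streaming
```

Because every question from the list above is answered by one value, impossible combinations such as “streaming but not logged in” cannot be represented at all.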

I found the XState project and immediately fell in love. The project is incredibly polished and has excellent support for VS Code and TypeScript. I ported my old code to a state machine, although it took some time to understand the concepts XState introduces.

A screenshot of VS Code with a code pane to the left and a state machine diagram to the right.
XState integrates well with VS Code.

I was surprised at how helpful the XState VS Code extension was. It renders a diagram of my state machine, and I can simulate state transitions to understand what the machine will do.

The XState VS Code extension lets you step through your state transitions.

Beyond the extension, the library itself is well-documented. Porting my old code was simple because XState integrates well with promises.

Here’s the state from my machine that starts a Discord video stream. The function in src is invoked when the machine enters the starting_stream state. If the returned promise resolves, the onDone transition moves the machine to the streaming state; if it rejects, onError logs the error and moves the machine to is_error.

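```typescript
starting_stream: {
  invoke: {
    // Called when the machine enters starting_stream; the returned promise
    // decides which transition fires next.
    src: async ({ driver }, _event) => {
      await joinVoiceChat(driver);
      return await shareScreen(driver);
    },
    // The promise resolved: the stream is live.
    onDone: {
      target: "streaming",
    },
    // The promise rejected: log the error and enter the error state.
    onError: {
      target: "is_error",
      actions: (_context, event) => {
        console.error(event);
      },
    },
  },
},
```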
The state for starting a Discord stream.
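To make the promise semantics concrete, here is a deliberately simplified model in plain TypeScript. It is not XState’s implementation or API; runInvoke, InvokeConfig, startingStream, and the stub helpers are hypothetical names. It captures only the rule described above: a resolved promise selects the onDone target, and a rejected one selects the onError target.

```typescript
// Simplified model of a promise-based invoke. Hypothetical names throughout;
// this illustrates the semantics and is not how XState is implemented.
interface InvokeConfig {
  src: () => Promise<unknown>; // Work to run when the state is entered.
  onDone: string; // Next state if the promise resolves.
  onError: string; // Next state if the promise rejects.
}

// Runs the invoked promise and resolves to the name of the next state.
async function runInvoke(config: InvokeConfig): Promise<string> {
  try {
    await config.src();
    return config.onDone;
  } catch (error) {
    // Mirrors the onError action in the snippet above, which logs the error.
    console.error(error);
    return config.onError;
  }
}

// Hypothetical stand-ins for the Selenium helpers in the snippet above.
async function joinVoiceChat(): Promise<void> {}
async function shareScreen(shouldFail: boolean): Promise<void> {
  if (shouldFail) {
    throw new Error("could not share screen");
  }
}

// Builds the starting_stream invocation, optionally with a failing helper.
function startingStream(shouldFail: boolean): InvokeConfig {
  return {
    src: async () => {
      await joinVoiceChat();
      await shareScreen(shouldFail);
    },
    onDone: "streaming",
    onError: "is_error",
  };
}

runInvoke(startingStream(false)).then((next) => console.log(next)); // streaming
runInvoke(startingStream(true)).then((next) => console.log(next)); // is_error
```

The real library also passes the resolved value or the error along with the done and error events, which this model leaves out.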

I even wrote some quick unit tests to make sure it works. This test was much easier to write than it would have been without a state machine.

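```typescript
import { interpret } from "xstate";
import { streamMachine } from "./streamMachine"; // hypothetical module path

test("able to reach the streaming state", (done) => {
  const actor = interpret(streamMachine).onTransition((state) => {
    if (state.matches("is_ready")) {
      actor.send({ type: "start_stream" });
    }
    // Matches the onDone target of the starting_stream state.
    if (state.matches("streaming")) {
      actor.stop();
      done();
    }
  });
  actor.start();
});
```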
Unit testing a state machine is straightforward. This would’ve been a lot more code without XState!

Wiring the machine into the rest of the application wasn’t hard, either. The code below makes the bot join the voice channel and stream only when people are in it.

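```typescript
import { interpret } from "xstate";
// streamMachine and handleChannelUpdate are defined elsewhere in the bot.

const stream = interpret(streamMachine);
stream.start();

// Stream while anyone is in the voice channel; stop when it empties.
handleChannelUpdate(async (channel_count) => {
  if (channel_count > 0) {
    stream.send({ type: "start_stream" });
  } else {
    stream.send({ type: "end_stream" });
  }
});
```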
A complex set of interactions becomes a few lines of code.
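With the snippet above, every channel update sends an event, even when a second user joins a channel that is already occupied. XState machines ignore events that the current state has no transition for, so this is usually harmless, but if you want events to reflect only real changes, one option is an edge-triggered helper. This is a hypothetical sketch: createOccupancyTracker is an invented name, and it assumes the callback receives the current member count and that the channel starts empty.

```typescript
// Hypothetical helper: turns raw member counts into stream events, emitting one
// only when the voice channel switches between empty and occupied.
type StreamEvent = { type: "start_stream" } | { type: "end_stream" };

function createOccupancyTracker(): (count: number) => StreamEvent | null {
  // Assumes the channel starts empty and the bot is not streaming.
  let occupied = false;
  return (count: number): StreamEvent | null => {
    const nowOccupied = count > 0;
    if (nowOccupied === occupied) {
      return null; // No change, e.g. a second user joined an occupied channel.
    }
    occupied = nowOccupied;
    return nowOccupied ? { type: "start_stream" } : { type: "end_stream" };
  };
}

// Usage: only non-null results would be forwarded with stream.send(event).
const toEvent = createOccupancyTracker();
const events = [0, 1, 2, 1, 0].map((count) => toEvent(count));
// events: [null, { type: "start_stream" }, null, null, { type: "end_stream" }]
```

Keeping the edge detection in a small pure closure leaves the machine itself unchanged and makes the filtering easy to test on its own.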

Overall, XState feels like a huge win. I’m more confident about how the bot interacts with Selenium and Discord, and I hope to move more of the application to XState, which should make it much easier to add new input methods and notification systems.