August 3, 2022

Building a Toy Project

I learned a whole bunch building Check Run Reporter that I'd like to share, but even if I wanted to make Check Run Reporter open source, it's more complex than would be helpful. Instead, I'm planning to build a toy project to show off a bunch of the things I figured out, including:

basic repository setup covering things like eslint, persistent vs ephemeral artifacts (e.g., check in compiled build tools, but not production artifacts), typechecking, etc.
how to set up Buildkite
how to set up GitHub Actions?
automated CloudFormation deploys using multiple AWS accounts
separate dev/test/prod subdomains
docs-first API design using Apiary and Dredd
how to interact with DynamoDB
GraphQL
DynamoDB for User Sessions
AWS Pinpoint

I'm also planning to use this project to learn some things, like:

how to build a WebSocket API
using WebSockets with Remix
how to use AWS EventBridge for more than just scheduled tasks
Cognito for authentication
Authzed for authorization

So, now that I've laid out those goals, what the hell am I building? I'm going to build a clone of the NIST Randomness Beacon. To be clear, I don't intend to provide the cryptographic or randomness guarantees that the original claims. I'm just looking for an example project that's well defined, not too complex but not too trivial, and includes realtime data.

To cover a few of the requirements above, I'll tweak the NIST project slightly. Specifically, historical values will only be available to authenticated users, and values will be available using an RSS feed in addition to the WebSocket API.

So, what will I actually build?

A REST API matching the NIST Randomness Beacon API
A GraphQL API based on the NIST Randomness Beacon API
A Remix-based Web App using GraphQL Subscriptions for live data

Designs

Before starting a project like this (i.e., a project where I know the complete feature set in advance), I like to design the API, data models, and permissions up front.

API Blueprint

Even though API Blueprint is maintained by a mostly-dead subsidiary of Oracle, it's still the most pleasant way to design a documentation-first API.

GraphQL Data Schema?

This is an idea I've been noodling on for a while, though it may end up being more complicated than it's worth for this project. DynamoDB is effectively schemaless. You can use tools like dynamodb-toolbox to add runtime enforcement to your data models, but I've been kinda underwhelmed by its type safety and I found it doesn't do the "least surprising thing" in a lot of cases. I'd like to use a combination of GraphQL schema, custom directives, and GraphQL codegen to design my data schema in GraphQL and then generate typesafe clients for loading that data. Notably, I don't necessarily intend to access DynamoDB with a GraphQL client; I just want to use the schema language to define my data models.
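To make the idea concrete, here is a minimal sketch of the kind of TypeScript that codegen might emit for one model. Everything in it is an assumption for illustration: the `Pulse` model and its fields, the `@model` and `@key` directive names mentioned in the comment, and the `PULSE#` key format. None of it is a decided design.

```typescript
// Hypothetical output for a schema type along the lines of:
//   type Pulse @model { pulseIndex: Int! @key, timeStamp: String!, outputValue: String! }
// The directive names above are made up for this sketch.
interface Pulse {
  pulseIndex: number;
  timeStamp: string;
  outputValue: string;
}

// Shape of the item as it might be stored in a single DynamoDB table.
interface PulseItem extends Pulse {
  pk: string;
  sk: string;
}

// Convert a typed model into a storable item (the key format is an assumption).
function toItem(pulse: Pulse): PulseItem {
  return { pk: `PULSE#${pulse.pulseIndex}`, sk: "PULSE", ...pulse };
}

// Validate an untyped item read back from the table before trusting its shape.
function fromItem(item: Record<string, unknown>): Pulse {
  const { pulseIndex, timeStamp, outputValue } = item;
  if (
    typeof pulseIndex !== "number" ||
    typeof timeStamp !== "string" ||
    typeof outputValue !== "string"
  ) {
    throw new TypeError("item does not match the Pulse model");
  }
  return { pulseIndex, timeStamp, outputValue };
}
```

The point of generating both directions is that the compile-time type and the runtime check come from the same schema, so they can't drift apart.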

GraphQL Public Schema

Separate from the data schema, I want to provide a GraphQL API alongside the REST API. Mostly, I want the GraphQL API so it can do the heavy lifting on WebSockets.

Authzed Permissions

Authzed is based on Google's Zanzibar paper, providing fine-grained authorization as a service. You provide it with a config file explaining your user and data relationships and send it facts about your data and your users whenever those facts change (and since outages happen, it's also good to send those facts as a daily sync job). Since it's receiving all this information upfront, it can materialize most of what it needs to know so it can provide really tight SLAs at runtime.

When someone tries to access data, Authzed can quickly determine if that user is authorized to access that data with just the user id, data id, and, potentially, the subcomponent of that data that the user is trying to access.
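The write-facts-then-check flow can be sketched with a tiny in-memory stand-in. This is not Authzed's API or schema language; the `InMemoryAuthorizer` class, the `read` permission, and the `pulse_archive:history` resource name are all hypothetical and only illustrate the shape of the idea.

```typescript
// A relationship "fact" in the Zanzibar style: <resource>#<relation>@<subject>.
interface Relationship {
  resource: string; // e.g. "pulse_archive:history" (hypothetical)
  relation: string; // e.g. "viewer"
  subject: string; // e.g. "user:alice"
}

// Which relations grant a permission. Hypothetical rule: viewers and owners can "read".
const permissions: Record<string, string[]> = {
  read: ["viewer", "owner"],
};

class InMemoryAuthorizer {
  private facts = new Set<string>();

  private static key(r: Relationship): string {
    return `${r.resource}#${r.relation}@${r.subject}`;
  }

  // Called whenever a fact changes (and again from a periodic full sync).
  write(r: Relationship): void {
    this.facts.add(InMemoryAuthorizer.key(r));
  }

  // The runtime check needs only the subject, the permission, and the resource.
  check(subject: string, permission: string, resource: string): boolean {
    const relations = permissions[permission] ?? [];
    return relations.some((relation) =>
      this.facts.has(InMemoryAuthorizer.key({ resource, relation, subject })),
    );
  }
}
```

Because the facts are stored ahead of time, the check itself is just a lookup, which is roughly why a service built this way can answer quickly at request time.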

Roadmap

Obviously, I can't build all that in one shot, and I'd like to write separate blog posts going into detail on each step.

I hope to build things in roughly the order below, but as we get closer to the end of this roadmap, we get more into stuff that's new to me, so there are definitely some unknowns in how this'll all work. In any case, these are probably the blog posts I eventually hope to write on this topic.

Repository Setup and Minimal CI tooling

I've found npm workspaces to be really helpful for projects that deploy Lambdas, so I'll first set up a basic monorepo with things like eslint and typechecking, and then I'll set up Buildkite CI.

I'll be using Buildkite instead of GitHub Actions because it'll be deploying into AWS. Since Buildkite build agents run in my AWS accounts, it's much easier to configure secure deployments. With GitHub Actions, unless you write a lot of custom automation (or rely on questionable third parties), you need to add AWS credentials to your GitHub org or repository secrets, which just gets tedious.

Trivial Ping Endpoint and Multi Account Deployment with DNS

Next, we'll create a trivial ping endpoint based on API Gateway and Lambda. We won't really ever use this endpoint in production (though we'll deploy it), but it's really useful for confirming that the rest of our tooling (transpilation, permissions, deployment, testing) works.
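A ping handler along these lines could be as small as the sketch below. The `ProxyResult` interface is a local stand-in so the example has no dependencies (a real project would more likely use the types from `@types/aws-lambda`), and the `{ ok: true }` response body is an assumption.

```typescript
// Minimal local stand-in for the API Gateway proxy response shape.
interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical handler for GET /ping: no inputs, no dependencies, always succeeds.
export async function handler(): Promise<ProxyResult> {
  return {
    statusCode: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ ok: true }),
  };
}
```

Since the handler can't fail on its own, any failure when calling it points at the build, permissions, or deployment rather than at application code.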

This'll be one of the spots where I experiment with something new. I'd really like to work out how to use delegation zones to have dev, test, and prod hostnames so that each account can have real DNS records.

Apiary and Dredd

Apiary has been on life support for a few years now and Dredd is, admittedly, not the most helpful test runner for communicating failures, but API Blueprint still seems to be the nicest way to initially design an API, and Dredd seems like really the only test runner out there that can read an API design and check if it works.

Typesafe APIs

So far, our API is pretty trivial, but it's going to have real models at some point and I don't like multiple sources of truth for model shapes. Since I'll be defining the API using OpenAPI imported into CloudFormation to specify the app's endpoints, it'd be nice to extract the endpoints' request and response models from there for use within our app.
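As a rough sketch of what a single source of truth could look like in code, assume some generator has turned the OpenAPI document into a `Paths` interface. Here that interface is written by hand, and the paths, models, and `respond` helper are all hypothetical.

```typescript
// Hand-written stand-in for types that would be generated from the OpenAPI document.
interface Paths {
  "/ping": {
    get: { response: { ok: boolean } };
  };
  "/pulse/{index}": {
    get: { response: { pulseIndex: number; outputValue: string } };
  };
}

// Look up the response model for a path, so handlers can't drift from the spec.
type ResponseOf<P extends keyof Paths> = Paths[P]["get"]["response"];

interface TypedResult {
  route: string;
  statusCode: number;
  body: string;
}

// Build a response whose body must match the model declared for that path.
function respond<P extends keyof Paths>(path: P, body: ResponseOf<P>): TypedResult {
  return { route: path, statusCode: 200, body: JSON.stringify(body) };
}

// Compiles: the body matches the "/ping" model.
const pingResult = respond("/ping", { ok: true });
// Would not compile: respond("/ping", { pulseIndex: 1 });
```

With this arrangement, changing a model in the API definition and regenerating the types turns every out-of-date handler into a compile error.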

OpenTelemetry

OpenTelemetry is a really great idea with a kinda not good SDK. There are a bunch of things that should happen automatically but don't, so I'll show how to write the wrappers that actually make it work as expected, and I'll also show how to actually get it working with Lambda (which took a lot more trial and error than it should have).

Authorization using Authzed

I've been eyeing Authzed for a while now. It's a rule-based authorization framework and hosted service based on Google's Zanzibar system.

Event driven data updates

I'll demonstrate how to execute Lambdas on a schedule using EventBridge.
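On the Lambda side, a scheduled invocation might be handled roughly as sketched below. The `ScheduledEvent` interface is a local stand-in covering only the fields the sketch reads, and the `Pulse` shape, `makeHandler`, `randomHex`, and `save` are hypothetical names. The random source and the storage call are injected so the logic can be exercised without AWS.

```typescript
// Local stand-in for the parts of an EventBridge scheduled event used here.
interface ScheduledEvent {
  "detail-type": string; // "Scheduled Event" for schedule rules
  time: string; // ISO 8601 timestamp of the scheduled invocation
}

// Hypothetical record produced on each tick.
interface Pulse {
  timeStamp: string;
  outputValue: string;
}

// Build the handler around injected dependencies.
// In Lambda, `randomHex` might be `() => randomBytes(64).toString("hex")`
// using `randomBytes` from `node:crypto`, and `save` might write to DynamoDB.
function makeHandler(
  randomHex: () => string,
  save: (pulse: Pulse) => Promise<void>,
) {
  return async (event: ScheduledEvent): Promise<Pulse> => {
    const pulse: Pulse = { timeStamp: event.time, outputValue: randomHex() };
    await save(pulse);
    return pulse;
  };
}
```

Using the event's `time` field instead of the current clock keeps each record tied to the schedule tick that produced it, even if the invocation starts a little late.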

API Implementation with Authorization

I'll build out the REST API modeled on the NIST Randomness Beacon spec, using a custom authorizer to work out who the current user is and ask Authzed what they can do.
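The authorizer's job could be sketched like this, assuming an API Gateway token-style Lambda authorizer that returns an IAM policy. The event and result interfaces are local stand-ins, and `identify`, `can`, the `read` permission, and the `pulse_archive:history` resource are hypothetical placeholders for the real identity lookup and the real Authzed call.

```typescript
// Local stand-ins for the token authorizer's input and output shapes.
interface AuthorizerEvent {
  authorizationToken?: string;
  methodArn: string;
}

interface AuthorizerResult {
  principalId: string;
  policyDocument: {
    Version: "2012-10-17";
    Statement: {
      Action: "execute-api:Invoke";
      Effect: "Allow" | "Deny";
      Resource: string;
    }[];
  };
}

// Hypothetical dependencies: who is this token, and what may they do?
type Identify = (token: string) => Promise<string | undefined>;
type Can = (userId: string, permission: string, resource: string) => Promise<boolean>;

function makeAuthorizer(identify: Identify, can: Can) {
  return async (event: AuthorizerEvent): Promise<AuthorizerResult> => {
    const userId = event.authorizationToken
      ? await identify(event.authorizationToken)
      : undefined;
    // Deny by default: only a known user with the permission is allowed through.
    const allowed =
      userId !== undefined && (await can(userId, "read", "pulse_archive:history"));
    return {
      principalId: userId ?? "anonymous",
      policyDocument: {
        Version: "2012-10-17",
        Statement: [
          {
            Action: "execute-api:Invoke",
            Effect: allowed ? "Allow" : "Deny",
            Resource: event.methodArn,
          },
        ],
      },
    };
  };
}
```

Keeping the identity lookup and the permission check as separate injected functions means the same authorizer can be tested with stubs and later wired to the real services.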

GraphQL Implementation

I'll build a GraphQL API similar to the REST API, but with subscription support so we can send live updates to our eventual web app.

Web App

I'll build a web app using Remix and all of the above to demonstrate the REST API, the GraphQL API, and WebSocket-based GraphQL subscriptions.