decoders

Building your own

Learn how to define custom decoders using .refine() and .transform().

When defining a new decoder, it's important to understand the difference between which values it accepts and which values it returns. In many cases these are the same, which can make it hard to notice that there is a difference at all. For example, the string decoder accepts strings, and also returns (those same) strings.

This isn't always automatically the case, though. Some random examples:

| This decoder... | ...accepts... | ...but returns... | ...so its type is |
|---|---|---|---|
| string | strings | strings | Decoder<string> |
| email | strings | strings | Decoder<string> |
| number | numbers | numbers | Decoder<number> |
| integer | numbers | numbers | Decoder<number> |
| isoDate | strings | Date instances | Decoder<Date> |
| url | strings | URL instances | Decoder<URL> |
| truthy | anything! | booleans | Decoder<boolean> |

From the type definition, you can always tell what the decoder will return. You cannot tell from the type what input values it will accept. You'll need to read the documentation or look at the implementation to know which values are going to get accepted or rejected.
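To make the accepts/returns distinction concrete, here is a simplified model in plain TypeScript. It is not how the library is implemented; `DecodeFn`, `toDate`, and `toBoolean` are hypothetical names used only for illustration. Both functions take `unknown`, so only the return type is visible in the signature.

```typescript
// Hypothetical stand-in for a decoder: takes any input, returns a typed value or throws.
type DecodeFn<T> = (input: unknown) => T;

// Accepts: date strings. Returns: Date instances.
const toDate: DecodeFn<Date> = (input) => {
  if (typeof input !== 'string') throw new Error('Must be string');
  const date = new Date(input);
  if (Number.isNaN(date.getTime())) throw new Error('Must be a valid date string');
  return date;
};

// Accepts: anything. Returns: booleans.
const toBoolean: DecodeFn<boolean> = (input) => Boolean(input);

const parsed = toDate('2022-02-01T08:12:29Z'); // a Date instance
const flag = toBoolean('non-empty'); // true
```

The signatures say `Date` and `boolean`, but you have to read each function body to learn which inputs are accepted. The same holds for real decoders.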

Defining a new decoder

The easiest way to define a new decoder is to define it in terms of an existing one that already accepts (at least) all of the values you want your new decoder to accept, and then narrow it down.

The tl;dr is:

  • Start from an existing decoder that already accepts (at least) all inputs that you want to accept
  • Optionally, narrow down what will get accepted by adding extra criteria with .refine()
  • Optionally, to change what your custom decoder returns, use .transform()
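The three steps above can be sketched in a single chain. This is a minimal example that assumes the `decoders` package is installed; the `hexColor` decoder and its error message are hypothetical and not part of the library.

```typescript
import { string } from 'decoders';

// Hypothetical example: accept hex colors such as "#FF8800", return them lowercased.
const hexColor = string
  // Step 2: narrow what is accepted to strings that look like "#rrggbb"
  .refine((s) => /^#[0-9a-f]{6}$/i.test(s), 'Must be hex color')
  // Step 3: change what is returned by normalizing to lowercase
  .transform((s) => s.toLowerCase());

const color = hexColor.verify('#FF8800'); // accepted and normalized
const rejected = hexColor.value('red'); // .value() returns undefined instead of throwing
```

Here `.verify()` throws on rejected input, while `.value()` is used for the failing case so the sketch runs to completion.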

Now, let's build a few custom decoders to illustrate the above.

Example 1: a "max length" string

To build a decoder that accepts strings up to a maximum number of characters, use the built-in sized() helper:

sized(string, { max: 280 }).verify(input)

Result for a few sample inputs:

"Hello world!"

"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..." [truncated]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Too long, must be at most 280 chars

123
^^^ Must be string

The sized() helper also supports min, or an exact size.

Example 2: a truncating "max length" string

The example above will reject strings that are too long. Suppose you don't want to reject those strings, but instead just chop them off at the given max length. In that case, you don't want to change which values it accepts, only which values it returns. You do that by adding a .transform() function:

function truncated(size) {
  return string.transform((s) => s.substring(0, size));
}

truncated(23).verify(input)

Result for a few sample inputs:

"hello"

"long strings will trunc"

123
^^^ Must be string

Compared to example 1, you can see how a string that exceeds the maximum length now gets accepted (but is returned in truncated form).

Example 3: Accepting Wordle words

Suppose you want to build your own Wordle clone. At the boundary of your program, you'll want to enforce that every incoming word has exactly 5 letters, all of them alphabetical.

We want to build this as a decoder that:

  • Accepts strings containing exactly 5 alphabetical characters
  • Returns those in uppercased form (even when not inputted as such)

To define the acceptance criteria, we'll use a regex() decoder. This will enforce that only alphabetical chars are used, and that there are exactly 5 of them (it's important that the regex pattern is anchored using ^ and $ for that). Also, we'll accept case-insensitive input with the i flag.

Then, we'll transform any accepted words to uppercase automatically.

const wordle =
  regex(/^[a-z]{5}$/i, 'Must be 5-letter word')
    .transform((s) => s.toUpperCase());

wordle.verify(input)

Result for a few sample inputs:

"SWEET"

"SPACE"

"Sp@cE"
^^^^^^^ Must be 5-letter word

"hi"
^^^^ Must be 5-letter word
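To see why the anchors matter, the following plain TypeScript sketch compares the pattern used above with a hypothetical unanchored variant. It uses only the built-in RegExp, so it says nothing about the decoders library itself.

```typescript
const anchored = /^[a-z]{5}$/i; // the pattern from the wordle decoder
const unanchored = /[a-z]{5}/i; // hypothetical variant without ^ and $

const results = {
  exact: anchored.test('sweet'), // true: exactly 5 letters
  mixedCase: anchored.test('SpAcE'), // true: the i flag ignores case
  tooLong: anchored.test('sweets'), // false: 6 letters
  tooLongUnanchored: unanchored.test('sweets'), // true: matches "sweet" inside "sweets"
  symbol: anchored.test('Sp@cE'), // false: "@" is not a letter
};
```

Without the anchors, any input that merely contains 5 consecutive letters would pass, so the length check would no longer be exact.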

Example 4: Making a transformation reusable

The wordle example above will uppercase the output before returning it. Suppose that you want to do the same for other string decoders as well. Do you just stick .transform() after each of those?

Instead, you can define a higher-order decoder that works for any string decoder (aka Decoder<string>). Simply write it as a function that takes a decoder and returns a new one:

function lower(decoder) {
  return decoder.transform((s) => s.toLowerCase());
}

function upper(decoder) {
  return decoder.transform((s) => s.toUpperCase());
}

Then you can reuse them anywhere in your decoders.

const tag = regex(/^\w+$/, 'Must be single word');

const thing = object({
  email: lower(email),
  labels: array(upper(tag)),
});

thing.verify(input)

Result for a few sample inputs:

{ "email": "user@example.org", "labels": [ "EASY" ] }

{
  "email": "invalid@email",
           ^^^^^^^^^^^^^^^ Must be email
  "labels": [
    "ok",
  ],
}

Example 5: Sanitizing messy inputs

While I would not recommend going overboard with this, you can perform light parsing to clean up messy inputs. For example, if you have to handle messy data from an incoming webhook that you have no control over, you can use a decoder at the boundary to not only validate those inputs, but also to tidy things up in the same pass.

For example, suppose you have an incoming webhook that looks like this:

{
  "events": [
    { "id": 1, "created_at": "2022-02-01T08:12:29Z", "labels": "urgent, delayed" },
    { "id": 2, "created_at": null, "labels": "" },
    { "id": 3, "labels": null }
  ]
}

Suppose we want to clean up some data on the way in:

  1. We store all ids as strings internally, so we'll want to transform those numeric IDs
  2. The created_at field (an ISO8601-formatted string) is sometimes null and sometimes missing completely. When it's missing, we'll want to treat it as if it was null
  3. The labels field represents a list of tags that we'll want to treat as structured data, so we'll want to convert these values to an array of strings

Handling the id field

We'll want to look at the id field as containing an "ID" data type, not a number. We can define an id decoder for this:

const id: Decoder<string> = either(
  string,
  positiveInteger.transform((n) => String(n)),
);

This decoder will keep working if the vendor ever switches to string IDs in the future.

Handling the created_at field

We'll want to look at the created_at field as a Date | null value. So let's use the following decoder:

nullish(isoDate, null);

Wait, why not use nullable(isoDate) here?! The reason is the third event in the example. Because the field can legally be missing, we'll have to explicitly accept both undefined and null inputs. That's what the nullish() decoder does! Its second argument is a convenience default value that nullish values will get normalized to.
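The difference between a null field and a missing field is easy to see in plain TypeScript. The sketch below parses a shortened version of the webhook payload; the `RawEvent` type is a hypothetical name, and the `?? null` step only imitates the normalization described above rather than calling the library.

```typescript
// Hypothetical shape of the raw events, before any decoding
type RawEvent = { id: number; created_at?: string | null };

const events: RawEvent[] = JSON.parse(
  '[{ "id": 2, "created_at": null }, { "id": 3 }]',
);

const explicitNull = events[0].created_at; // null: the key is present
const missing = events[1].created_at; // undefined: the key is absent

// Normalize both cases to null, like the default value passed to nullish()
const normalized = events.map((e) => e.created_at ?? null);
```

A decoder that only accepts null would reject the event with id 3, because reading its created_at yields undefined.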

Handling the labels field

We'll want to look at the labels field as an array of strings, but we're given a string. (A potentially null or empty string, even.)

We can build a comma-separated helper decoder like so:

// Trim each item so that "urgent, delayed" yields ['urgent', 'delayed']
const commaSeparated = string.transform((s) => s.split(',').map((t) => t.trim()).filter(Boolean));
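The splitting logic can be checked on its own, without the library. In this sketch, `splitCommaSeparated` is a hypothetical helper that mirrors the kind of function passed to .transform(); it trims each item and drops empty ones.

```typescript
// Hypothetical helper mirroring the kind of function passed to .transform()
function splitCommaSeparated(s: string): string[] {
  return s
    .split(',')
    .map((part) => part.trim()) // remove whitespace around each item
    .filter(Boolean); // drop empty items
}

const naive = ''.split(','); // [''], one empty string rather than an empty array
const labels = splitCommaSeparated('urgent, delayed'); // ['urgent', 'delayed']
const none = splitCommaSeparated(''); // []
const sloppy = splitCommaSeparated('a,,b, '); // ['a', 'b']
```

The `naive` line shows why the filter step is needed: splitting an empty string yields one empty item, not an empty list.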

Putting it all together

Using the helper decoders defined above, we can put it all together this way:

const eventsDecoder = object({
  events: array(
    object({
      id,
      created_at: nullish(isoDate, null),
      labels: nullable(commaSeparated, []),
    }),
  ),
});

eventsDecoder.verify(/* JSON example from above */);
// => {
//   events: [
//     {
//       id: '1',
//       created_at: new Date('2022-02-01T08:12:29Z'),
//       labels: ['urgent', 'delayed'],
//     },
//     { id: '2', created_at: null, labels: [] },
//     { id: '3', created_at: null, labels: [] },
//   ],
// }

A note on naming

To make decoders maximally reusable, avoid naming them after the field they're used for. Think of a decoder as the description of a data type, and name it accordingly. That's why the decoder in the example above is called commaSeparated and not labelsDecoder or something like that! The id decoder is named id because it makes sense to think of it as an ID data type. That the field also happens to be named id is a coincidence.

Keep edge cases outside your decoders

Try to keep nulls, undefineds, or other edge cases outside of the decoder as much as possible, and instead wrap them in nullable()s where you use them in your call sites - typically in those big object() decoders.

Take the example above. It would be easy to let the commaSeparated decoder be infected with the null-case and handle it too. But this is less composable. Keeping the null-case outside of that decoder makes it a smaller, and thus a more reusable, building block. It's cheap to wrap it in a nullable() where you put it all together.
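As a sketch of what this buys you, the same building block can be used at two call sites with different null handling. This assumes the `decoders` package is installed; the `article` decoder and its `tags` field are hypothetical and only serve as a second call site.

```typescript
import { nullable, object, string } from 'decoders';

// Small building block: knows about comma-separated strings, nothing about null
const commaSeparated = string.transform((s) =>
  s.split(',').map((part) => part.trim()).filter(Boolean),
);

// Call site 1: this field may be null, so the null case is handled here
const event = object({ labels: nullable(commaSeparated, []) });

// Call site 2 (hypothetical): this field is always a string, so no wrapper is needed
const article = object({ tags: commaSeparated });

const a = event.verify({ labels: null }); // labels falls back to []
const b = article.verify({ tags: 'news,tech' }); // tags becomes ['news', 'tech']
```

If commaSeparated handled null itself, the second call site would silently accept null as well, which may not be what you want there.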
