Building your own custom decoders
When defining new decoders, it’s important to understand the difference between what values it accepts vs which values it returns. In many cases these are the same. This can make it confusing to notice there even is a difference. For example, the string
decoder accepts strings, and also returns (those same) strings.
This isn’t always automatically the case, though. Some random examples:
This decoder… | …accepts | …but returns | …so its type is |
---|---|---|---|
string | strings | strings | Decoder<string> |
email | strings | strings | Decoder<string> |
number | numbers | numbers | Decoder<number> |
integer | numbers | numbers | Decoder<number> |
iso8601 | strings | Date instances | Decoder<Date> |
url | strings | URL instances | Decoder<URL> |
truthy | anything! | booleans | Decoder<boolean> |
From the type definition, you can always tell what the decoder will return. You cannot tell from the type what input values it will accept. You’ll need to read the documentation or look at the implementation to know which values are going to get accepted or rejected.
Defining a new decoder
The easiest way to define a new decoder, is to define it in terms of an existing one that already accepts (at least) all of the values you want your new decoder to accept, and then narrow it down.
The tl;dr is:
- Start from an existing decoder that already accepts (at least) all inputs that you want to be accepting
- Optionally, narrow down what will get accepted by adding extra criteria with
.refine()
- Optionally, to change what your custom decoder returns, use
.transform()
Now, let’s build a few custom decoders to illustrate the above.
Example 1: a “max length” string
To build a decoder accepting strings up until a maximum number of characters, we can start from the base string
decoder. After all, we’re only interested in accepting string values.
What we want to do is putting an extra acceptance predicate on these string inputs. We use .refine()
for that.
const maxLength = string.refine(
(s) => s.length <= 20,
'Too long. Must be at most 20 characters',
);
We can generalize this by making this a function:
function maxLength(size: number): Decoder<string> {
return string.refine(
(s) => s.length <= size,
`Too long. Must be at most ${size} characters`,
);
}
const max20 = maxLength(20);
const max50 = maxLength(50);
...
Notice how this decoder works when you pass it arbitrary inputs:
const max20 = maxLength(20);
// 👍
max20.verify('hi'); // 'hi'
// 👎
max20.verify('Lorem ipsum dolor sit amet.'); // throws "Too long. Must be at most 20 characters"
max20.verify(123); // throws "Must be string"
Note that you get the rejection of numbers and other non-string inputs for free if you build your decoder this way. The "Must be string"
guarantee and error message is provided by the base string
decoder.
Example 2: a truncating “max length” string
The example above will reject strings that are too long. Suppose you don’t want to reject those strings, but instead just chop them off at the given max length. In that case, you want to not change what values it accepts, but what values it returns. You do that by adding a .transform()
function:
function truncated(size: number): Decoder<string> {
return string.transform((s) => s.substring(0, size));
}
const str20 = truncated(20);
// 👍
str20.verify('hi'); // 'hi'
str20.verify('Lorem ipsum dolor sit amet.'); // 'Lorem ipsum dolor si'
// 👎
str20.verify(123); // throws "Must be string"
Compared to example 1, you can see how the “lorem ipsum” string now gets accepted (but returned in truncated form).
Example 3: Accepting Wordle words
Suppose you want to build your own Wordle clone, because that’s what the world needs. At the boundary of your program, you’ll want to enforce that these are 5-letter worlds with only alphabetical letters.
We want to build this as a decoder that:
- Accepts strings containing exactly 5 alphabetical characters
- Return those in uppercased form (even when not inputted as such)
To define the acceptance criteria, we’ll use a regex()
decoder. This will enforce that only alphabetical chars are used, and that there are exactly 5 of them (it’s important that the regex pattern is anchored using ^
and $
for that). Also, we’ll accept case-insensitive input with the i
flag.
Then, we’ll transform any accepted words to uppercase automatically.
const wordle = regex(/^[a-z]{5}$/i, 'Must be 5-letter word').transform((s) =>
s.toUpperCase(),
);
// 👍
wordle.verify('Sweet'); // 'SWEET'
wordle.verify('space'); // 'SPACE'
// 👎
wordle.verify('Hey!!'); // throws "Must be 5-letter word"
wordle.verify('hi'); // throws "Must be 5-letter word"
wordle.verify(123); // throws "Must be string"
Example 4: Making a transformation reusable
The wordle example above will uppercase the output before returning it. Suppose that you want to use that on other string decoders as well. Do you all just stick .transform()
after those?
You can define this as a higher-order decoder which works for any string decoder (aka Decoder<string>
). Simply define this as a function if you want to make writing these easier:
function uppercase(decoder: Decoder<string>): Decoder<string> {
return decoder.transform((s) => s.toUpperCase());
}
import { email, regex, string } from 'decoders';
// These now all magically work
uppercase(string);
uppercase(regex(/^\w+$/, 'Must be single word'));
uppercase(email);
// Using this on non-string decoders makes no sense though
uppercase(number); // TypeError
uppercase(array(number)); // TypeError
This may make your object definitions super readable:
const tag = regex(/^\w+$/, 'Must be single word');
const thing = object({
email: uppercase(email),
labels: array(uppercase(tag)),
});
thing.verify({ email: 'user@example.org', labels: ['easy'] });
// => { email: 'USER@EXAMPLE.ORG', labels: ['EASY'] })
Example 5: Sanitizing messy inputs
While I would not recommend going overboard with this, you can perform light parsing to clean up messy inputs. For example, if you have to handle messy data from an incoming webhook that you have no control over, you can use a decoder at the boundary to not only validate those inputs, but also to tidy things up in the same pass.
For example, suppose you have an incoming webhook that looks like this:
{
"events": [
{ "id": 1, "created_at": "2022-02-01T08:12:29Z", "labels": "urgent, delayed" },
{ "id": 2, "created_at": null, "labels": "" },
{ "id": 3, "labels": null }
]
}
Suppose we want to to clean up some data on the way in:
- We store all
id
s as strings internally, so we’ll want to transform those numeric IDs - The
created_at
field (an ISO8601-formatted string) is sometimesnull
and sometimes missing completely. When it’s missing, we’ll want to treat it as if it wasnull
- The
labels
argument represents a list of tags we’ll want to treat as structural data, so we’ll want to convert these values to an array of strings
Handling the id
field
We’ll want to look at the id
field as containing an “ID” data type, not a number. We can define an id
decoder for this:
const id: Decoder<string> = either(
string,
positiveInteger.transform((n) => String(n)),
);
This decoder will play nicely if ever this vendor will switch to id
strings in the future.
Handling the created_at
field
We’ll want to look at the created_at
field as a Date | null
value. So let’s use the following decoder:
nullish(iso8601, null);
Wait, why not use nullable(iso8601)
here?! The reason is the third event in the example. Because the field can legally be missing, we’ll have to explicitly accept both undefined
and null
inputs. That’s what the nullish()
decoder does! Its second argument is a convenience default value that nullish values will get normalized to.
Handling the labels
field
We’ll want to look at the labels
field as an array of strings, but we’re given a string. (A potentially null
or empty string, even.)
We can build a comma-separated helper decoder like so:
const commaSeparated = string.transform((s) => s.split(',').filter(Boolean));
Putting it all together
Using the helper decoders defined above, we can put it all together this way:
const eventsDecoder = object({
events: array(
object({
id,
created_at: nullish(iso8601, null),
labels: nullable(commaSeparated, []),
}),
),
});
eventsDecoder.verify(... /* JSON example from above */);
// => {
// events: [
// {
// id: '1',
// created_at: new Date('2022-02-01T08:12:29Z'),
// labels: ['urgent', 'delayed'],
// },
// { id: '2', created_at: null, labels: [] },
// { id: '3', created_at: null, labels: [] },
// ],
// }
A note on naming
To make decoders maximally useful, refrain from naming decoders after the field they’re used for. Think of a decoder as the description of a data type, and name them accordingly. That’s why the decoder in the example above is called commaSeparated
and not labelsDecoder
or something like that! In the case of the id
field, it happens to be also named id
because it makes sense to think of it as an ID data type. That the field also happens to be named id
is a coincidence.
Keep edge cases outside your decoders
Try to keep null
s, undefined
s, or other edge cases outside of the decoder as much as possible, and instead wrap them in nullable()
s where you use them in your call sites—typically in those big object()
decoders.
Take the example above. It would be easy to let the commaSeparated
decoder be infected with the null
-case and handle it too. But this is less composable. Keeping the null
-case outside of that decoder makes it a smaller, and thus a more reusable, building block. It’s cheap to wrap it in a nullable
where you put it all together.