# Endpoints
| Method | URL Pattern | Handler | Action |
|--------|-----------------|---------------------|--------------------------------------|
| GET | /v1/healthcheck | healthCheckHandler | Show application information |
| GET | /v1/movies | listMoviesHandler | Show the details of all movies |
| POST | /v1/movies | createMovieHandler | Create a new movie |
| GET | /v1/movies/:id | showMovieHandler | Show the details of a specific movie |
| PUT | /v1/movies/:id | editMovieHandler | Edit the details of a specific movie |
| DELETE | /v1/movies/:id | deleteMovieHandler | Delete a specific movie |
# Installation
## Launch API
`go run ./cmd/api`
If you want, you can also verify that the command-line flags are working correctly by specifying alternative **port** and **env** values when starting the application.
When you do this, you should see the contents of the log message change accordingly. For example:
`go run ./cmd/api -port=3030 -env=production`
```
time=2025-10-10T11:08:00.000+02:00 level=INFO msg="starting server" addr=:3030 env=production
```
## Test endpoints
`curl -i localhost:4000/v1/healthcheck`
The *-i* flag in the command above instructs curl to display the HTTP response headers as well as the response body.
### Result
```
HTTP/1.1 200 OK
Date: Mon, 05 Apr 2021 17:46:14 GMT
Content-Length: 58
Content-Type: text/plain; charset=utf-8

status: available
environment: development
version: 1.0.0
```
## API Versioning
There are two common approaches to doing this:
1. By prefixing all URLs with your API version, like **/v1/healthcheck** or **/v2/healthcheck**
2. By using custom **Accept** and **Content-Type** headers on requests and responses to convey the API version, like **Accept: application/vnd.greenlight-v1**
From an HTTP semantics point of view, using headers to convey the API version is the 'purer' approach. But from a user-experience point of view, using a URL prefix is arguably better. It makes it possible for developers to see which version of the API is being used at a glance, and it also means that the API can still be explored using a regular web browser (which is harder if custom headers are required).
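As a sketch of the URL-prefix approach - assuming the **httprouter** package, which the **:id** patterns in the endpoints table suggest - versioned routes might be registered like this (the **application** struct here is a stand-in for the real one):
```go
package main

import (
	"net/http"

	"github.com/julienschmidt/httprouter"
)

type application struct{} // stand-in for the real application struct

func (app *application) healthCheckHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("status: available"))
}

// routes registers every endpoint under the /v1 prefix, so the API
// version is visible at a glance in each URL.
func (app *application) routes() http.Handler {
	router := httprouter.New()
	router.HandlerFunc(http.MethodGet, "/v1/healthcheck", app.healthCheckHandler)
	return router
}
```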
## SQL Migrations
The first thing we need to do is generate a pair of _migration files_ using the **migrate create** command:
```bash
migrate create -seq -ext=.sql -dir=./migrations create_movies_table
```
In this command:
- The **-seq** flag indicates that we want to use sequential numbering like **0001, 0002, ...** for the migration files (instead of a Unix timestamp, which is the default).
- The **-ext** flag indicates that we want to give the migration files the extension **.sql**.
- The **-dir** flag indicates that we want to store the migration files in the **./migrations** directory (which will be created automatically if it doesn't already exist).
- The name **create_movies_table** is a descriptive label that we give the migration files to signify their contents.
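Assuming the tool's default naming scheme, this should generate two empty files in the **./migrations** directory, ready for the up and down SQL:
```
./migrations/000001_create_movies_table.up.sql
./migrations/000001_create_movies_table.down.sql
```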
### Executing the migrations
```bash
migrate -path=./migrations -database=$GREENLIGHT_DB_DSN up
```
---
Note: You may get the error **error: pq: permission denied for schema public...** when running this command. This is because PostgreSQL (version 15 and later) revokes the **CREATE** permission on the **public** schema from all users except the database owner.
To get around this, set the database owner to the **greenlight** user:
```sql
ALTER DATABASE greenlight OWNER TO greenlight;
```
If that still doesn't work, try explicitly granting **CREATE** on the **public** schema to the **greenlight** user:
```sql
GRANT CREATE ON SCHEMA public TO greenlight;
```
---
The **schema_migrations** table is automatically generated by the **migrate** tool and used to keep track of which migrations have been applied.
```
greenlight=> SELECT * FROM schema_migrations;
 version | dirty
---------+-------
       2 | f
(1 row)
```
The **version** column here indicates that our migration files up to (and including) number **2** in the sequence have been executed against the database. The value of the **dirty** column is **false**, which indicates that the migration files were cleanly executed _without any errors_ and the SQL statements they contain were successfully applied in _full_.
You can run the **\d** meta command on the **movies** table to see the structure of the table and confirm the **CHECK** constraints were created correctly.
### Migrating to a specific version
As an alternative to looking at the **schema_migrations** table, if you want to see which migration version your database is currently on, you can run the **migrate** tool's **version** command like so:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN version
2
```
You can also migrate up or down to a specific version by using the **goto** command:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN goto 1
```
### Executing down migrations
You can use the **down** command to roll back by a specific number of migrations. For example, to roll back the _most recent migration_, you would run:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN down 1
```
Generally, prefer the **goto** command for roll-backs (it's more explicit about the target version) and reserve the **down** command for rolling back _all migrations_, like so:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN down
Are you sure you want to apply all down migrations? [y/N]
y
Applying all down migrations
2/d create_bar_table (39.38729ms)
1/d create_foo_table (59.29829ms)
```
Another variant of this is the **drop** command, which will remove all tables from the database including the **schema_migrations** table - but the database itself will remain, [along with anything else that has been created](https://github.com/golang-migrate/migrate/issues/193) like sequences and enums. Because of this, using **drop** can leave your database in a messy and unknown state, and it's generally better to stick with the **down** command if you want to roll back everything.
### Fixing errors in SQL migrations
When you run a migration that contains an error, all SQL statements up to the erroneous one will be applied, and then the **migrate** tool will exit with a message describing the error, similar to this:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN up
1/u create_foo_table (39.38729ms)
2/u create_bar_table (78.29829ms)
error: migration failed: syntax error at end of input in line 0: CREATE TABLE (details: pq syntax error at end of input)
```
If the migration file which failed contained multiple SQL statements, then it's possible that the migration file was **partially** applied before the error was encountered. In turn, this means that the database is in an unknown state as far as the **migrate** tool is concerned.
Accordingly, the **version** column in the **schema_migrations** table will contain the number of the failed migration, and the **dirty** column will be set to **true**. At this point, if you run another migration (**even a "down" migration**) you will get an error message similar to this:
```bash
Dirty database version {X}. Fix and force version.
```
What you need to do is investigate the original error and figure out if the migration file which failed was partially applied. If it was, then you need to manually roll back the partially applied migration.
Once that's done, you must also 'force' the version number in the **schema_migrations** table to the correct value. For example, to force the database version number to **1** you should use the **force** command like so:
```bash
$ migrate -path=./migrations -database=$EXAMPLE_DSN force 1
```
Once you force the version, the database is considered 'clean' and you should be able to run migrations again without any problem.
### Remote migration files
The **migrate** tool also supports reading migration files from remote sources, including Amazon S3 and GitHub repositories. For example:
```bash
$ migrate -source="s3://<bucket>/<path>" -database=$EXAMPLE_DSN up
$ migrate -source="github://owner/repo/path#ref" -database=$EXAMPLE_DSN up
$ migrate -source="github://user:personal-access-token@owner/repo/path#ref" -database=$EXAMPLE_DSN up
```
More information about this functionality and a full list of the supported remote resources can be [found here](https://github.com/golang-migrate/migrate#migration-sources).
### $N notation
A nice feature of the PostgreSQL placeholder parameter **$N** notation is that you can use the same parameter value in multiple places in your SQL statement. For example, it's perfectly acceptable to write code like:
```go
// This SQL statement uses the $1 parameter twice, and the value `123` will be used in both locations where $1 appears
stmt := "UPDATE foo SET bar = $1 + $2 WHERE bar = $1"
_, err := db.Exec(stmt, 123, 456)
if err != nil {
// ...
}
```
### Executing multiple statements
Occasionally, you might find yourself in the position where you want to execute more than one SQL statement in the same database call, like this:
```go
stmt := `
UPDATE foo SET bar = true;
UPDATE foo SET bar = false;`
_, err := db.Exec(stmt)
if err != nil {
// ...
}
```
Having multiple statements in the same call is supported by the **pq** driver, _so long as the statements do not contain any placeholder parameters_. If they do contain placeholder parameters, then you'll receive the following error message at runtime:
```bash
pq: cannot insert multiple commands into a prepared statement
```
To work around this, you will need to either split out the statements into separate database calls, or if that's not possible, you can create a [custom function](https://www.postgresql.org/docs/current/xfunc-sql.html) in PostgreSQL which acts as a wrapper around the multiple SQL statements that you want to run.
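For instance, here's a minimal sketch of the first workaround: splitting the statements into separate calls, wrapped in a transaction so they still succeed or fail together (the table and column names are illustrative):
```go
// Run the two parameterized statements as separate Exec() calls inside
// a single transaction, so they still apply as a unit.
tx, err := db.Begin()
if err != nil {
	// ...
}
defer tx.Rollback() // a no-op if the transaction was already committed

_, err = tx.Exec("UPDATE foo SET bar = $1 WHERE id = $2", true, 1)
if err != nil {
	// ...
}

_, err = tx.Exec("UPDATE foo SET bar = $1 WHERE id = $2", false, 2)
if err != nil {
	// ...
}

err = tx.Commit()
if err != nil {
	// ...
}
```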
### Why not use an unsigned integer for the movie ID?
The first reason is that PostgreSQL doesn't have unsigned integers. Instead, it's best to align the integer types based on the following table:
| PostgreSQL type | Go type |
|-----------------------|-----------------------------------------------------|
| smallint, smallserial | int16 (-32768 to 32767) |
| integer, serial | int32 (-2147483648 to 2147483647) |
| bigint, bigserial | int64 (-9223372036854775808 to 9223372036854775807) |
The second reason is that Go's **database/sql** package doesn't actually support any integer values greater than 9223372036854775807 (the maximum value for an **int64**). It's possible that a **uint64** value could be greater than this, which would in turn lead to Go generating a runtime error similar to this:
```bash
sql: converting argument $1 type: uint64 values with high bit set are not supported.
```
By sticking with an **int64** in our Go code, we eliminate the risk of ever encountering this error.
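For completeness, a contrived sketch (the query is illustrative) of how the error above can be triggered:
```go
// A uint64 with its high bit set cannot be represented as an int64,
// so database/sql rejects it at runtime. Requires the "math" import.
var id uint64 = math.MaxUint64

_, err := db.Exec("DELETE FROM movies WHERE id = $1", id)
// err: sql: converting argument $1 type: uint64 values with high bit set are not supported
```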
## Additional Information
### How different Go Types are encoded
The following table summarizes how different Go types are mapped to JSON data types during encoding:
| Go type | JSON type |
|---------------------------------------------------|----------------------------|
| bool | JSON boolean |
| string | JSON string |
| int*, uint*, float*, rune | JSON number |
| array, slice | JSON array |
| struct, map | JSON object |
| nil pointers, interface values, slices, maps, etc | JSON null |
| chan, func, complex* | Not supported |
| time.Time | RFC3339-format JSON string |
| []byte | Base64-encoded JSON string |
The last two of these are special cases which deserve a bit more explanation:
- Go **time.Time** values (which are actually a struct behind the scenes) will be encoded as a JSON string in RFC 3339 format like **"2020-11-08T06:27:59+01:00"**, rather than as a JSON object.
- A **[]byte** slice will be encoded as a base64-encoded JSON string, rather than as a JSON array. So, for example, a byte slice of **[]byte{'h','e','l','l','o'}** would appear as **"aGVsbG8="** in the JSON output. The base64 encoding uses padding and the standard character set.
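A small self-contained sketch of these two special cases:
```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

func main() {
	// time.Time encodes as an RFC 3339 JSON string; []byte encodes as a
	// base64 JSON string (standard character set, with padding).
	data := struct {
		CreatedAt time.Time `json:"created_at"`
		Hash      []byte    `json:"hash"`
	}{
		CreatedAt: time.Date(2020, 11, 8, 6, 27, 59, 0, time.FixedZone("CET", 3600)),
		Hash:      []byte("hello"),
	}

	js, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(js))
	// Output: {"created_at":"2020-11-08T06:27:59+01:00","hash":"aGVsbG8="}
}
```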
A few other important things to mention:
- Encoding of nested objects is supported. So, for example, if you have a slice of structs in Go, it will encode to an *array of objects* in JSON.
- Channels, functions and **complex** number types cannot be encoded. If you try to do so, you'll get a **json.UnsupportedTypeError** error at runtime.
- Any pointer values will encode as *the value pointed to*.
### Enveloping responses
The data of the endpoint **/v1/movies/123** is nested under the key "movie", rather than being the top-level JSON object itself.
Enveloping response data like this isn't strictly necessary, and whether you choose to do so is partly a matter of style and taste. But there are a few tangible benefits:
1. Including a key name (like "movie") at the top-level of the JSON helps make the response more self-documenting. For any humans who see the response out of context, it is a bit easier to understand what the data relates to.
2. It reduces the risk of errors on the client side, because it's harder to accidentally process one response thinking that it is something different. To get at the data, a client must explicitly reference it via the "movie" key.
3. If we always envelope the data returned by our API, then we mitigate a security vulnerability in older browsers which can arise if you return a JSON array as a response.
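A minimal sketch of the pattern, assuming a **Movie** struct already exists; the **envelope** type and the **writeMovie()** helper are illustrative, not part of any library:
```go
// envelope nests a response payload under a descriptive top-level key.
type envelope map[string]any

func writeMovie(w http.ResponseWriter, movie Movie) error {
	// Encode {"movie": {...}} rather than the bare movie object.
	js, err := json.Marshal(envelope{"movie": movie})
	if err != nil {
		return err
	}

	w.Header().Set("Content-Type", "application/json")
	w.Write(js)
	return nil
}
```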
### Advanced JSON Customization
_When Go is encoding a particular type to JSON, it looks to see if the type has a **MarshalJSON()** method implemented on it. If it has, then Go will call this method to determine how to encode it._
Strictly speaking, when Go is encoding a particular type to JSON, it looks to see if the type satisfies the **json.Marshaler** interface, which looks like this:
```go
type Marshaler interface {
	MarshalJSON() ([]byte, error)
}
```
If the type does satisfy the interface, then Go will call its **MarshalJSON()** method and use the []byte slice that it returns as the encoded JSON value.
If the type doesn't have a **MarshalJSON()** method, then Go will fall back to trying to encode it to JSON based on its own internal set of rules.
So, if we want to customize how something is encoded, all we need to do is implement a **MarshalJSON()** method on it which returns a _custom JSON representation of itself_ in a **[]byte** slice.
An example is available here : **internal/data/runtime.go**
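As a hedged sketch of what that file might contain: a **Runtime** type (a movie's running time in minutes) that encodes itself as a string like **"102 mins"** instead of a bare JSON number:
```go
type Runtime int32

// MarshalJSON satisfies the json.Marshaler interface and returns a
// custom JSON representation such as "102 mins".
func (r Runtime) MarshalJSON() ([]byte, error) {
	jsonValue := fmt.Sprintf("%d mins", r)

	// Wrap the value in double quotes so it's a valid *JSON string*.
	quotedJSONValue := strconv.Quote(jsonValue)

	return []byte(quotedJSONValue), nil
}
```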
### Supported destination types
It's important to mention that certain JSON types can only be successfully decoded to certain Go types. For example, if you have the JSON string **"foo"** it can be decoded into a Go **string**, but trying to decode it into a Go **int** or **bool** will result in an error at runtime.
The following table shows the supported target decode destinations for the different JSON types:
| JSON type | Supported Go types |
|--------------|---------------------------|
| JSON boolean | bool |
| JSON string | string |
| JSON number | int*, uint*, float*, rune |
| JSON array | array, slice |
| JSON object | struct, map |
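For example, decoding a JSON string into a Go **int** fails at runtime with a **\*json.UnmarshalTypeError**:
```go
var n int

err := json.Unmarshal([]byte(`"foo"`), &n)
fmt.Println(err)
// json: cannot unmarshal string into Go value of type int
```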
### Triaging the Decode error
The **Decode()** method could potentially return the following five types of error:
- **json.SyntaxError** : There is a syntax problem with the JSON being decoded.
- **io.ErrUnexpectedEOF** : There is a syntax problem with the JSON being decoded.
- **json.UnmarshalTypeError** : A JSON value is not appropriate for the destination Go type.
- **json.InvalidUnmarshalError** : The decode destination is not valid (usually because it is not a pointer). This is actually a problem with our application code, not the JSON itself.
- **io.EOF** : The JSON being decoded is empty.
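A hedged sketch of how a **readJSON()**-style helper might triage these (the error messages are illustrative):
```go
err := json.NewDecoder(r.Body).Decode(dst)
if err != nil {
	var syntaxError *json.SyntaxError
	var unmarshalTypeError *json.UnmarshalTypeError
	var invalidUnmarshalError *json.InvalidUnmarshalError

	switch {
	// json.SyntaxError and io.ErrUnexpectedEOF both indicate a syntax
	// problem with the JSON itself.
	case errors.As(err, &syntaxError):
		return fmt.Errorf("body contains badly-formed JSON (at character %d)", syntaxError.Offset)
	case errors.Is(err, io.ErrUnexpectedEOF):
		return errors.New("body contains badly-formed JSON")

	// A JSON value was the wrong type for the destination.
	case errors.As(err, &unmarshalTypeError):
		return fmt.Errorf("body contains incorrect JSON type (at character %d)", unmarshalTypeError.Offset)

	// The body was empty.
	case errors.Is(err, io.EOF):
		return errors.New("body must not be empty")

	// An invalid decode destination is a bug in our code, not a problem
	// with the client's JSON - so panic (see 'Panicking vs returning
	// errors' below).
	case errors.As(err, &invalidUnmarshalError):
		panic(err)

	default:
		return err
	}
}
```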
### System-generated error responses
In certain scenarios Go's **http.Server** may still automatically generate and send plain-text HTTP responses. These scenarios include when:
- The HTTP request specifies an unsupported HTTP protocol version.
- The HTTP request contains a missing or invalid **Host** header, or multiple **Host** headers.
- The HTTP request contains an empty **Content-Length** header.
- The HTTP request contains an unsupported **Transfer-Encoding** header.
- The size of the HTTP request headers exceeds the server's **MaxHeaderBytes** setting.
- The client makes an HTTP request to an HTTPS server.
For example, if we try sending a request with an invalid **Host** header value, we will get a response like this:
```bash
$ curl -i -H "Host: こんにちは" http://localhost:4000/v1/healthcheck
HTTP/1.1 400 Bad Request: malformed Host header
Content-Type: text/plain; charset=utf-8
Connection: close

400 Bad Request: malformed Host header
```
Unfortunately, these responses are hard-coded into the Go standard library, and there's nothing we can do to customize them to use JSON instead.
But while this is something to be aware of, it's not necessarily something to worry about. In a production environment it's relatively unlikely that well-behaved, non-malicious clients would trigger these responses anyway, and we shouldn't be overly concerned if bad clients are sometimes sent a plain-text response instead of JSON.
### Panic recovery in other goroutines
It's important to realize that our middleware will only recover panics that happen in the _same goroutine that executed the **recoverPanic()** middleware_.
If, for example, you have a handler which spins up another goroutine (e.g. to do some background processing), then any panics that happen in the background goroutine will not be recovered - not by the **recoverPanic()** middleware... and not by the panic recovery built into **http.Server**. These panics will cause your application to exit and bring down the server.
So, if you are spinning up additional goroutines from within your handlers and there is any chance of a panic, you **must make sure** that you recover any panics from within those goroutines too.
A demonstration will follow when we use a background goroutine to send welcome emails to our API users.
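In the meantime, here's a minimal sketch of the pattern, assuming the application struct carries a structured logger (the **background()** helper is illustrative):
```go
// background runs fn in its own goroutine, recovering any panic so it
// can't bring the whole server down.
func (app *application) background(fn func()) {
	go func() {
		// This deferred call runs inside the background goroutine, so it
		// catches panics that recoverPanic() would never see.
		defer func() {
			if err := recover(); err != nil {
				app.logger.Error(fmt.Sprintf("%v", err))
			}
		}()

		fn()
	}()
}
```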
### Panicking vs returning errors
The decision to panic in the **readJSON()** helper if we get a **json.InvalidUnmarshalError** error isn't taken lightly. It's generally considered best practice in Go to return your errors and handle them gracefully.
But - _in some specific circumstances_ - it can be OK to panic. And you shouldn't be too dogmatic about _not panicking_ when it makes sense to.
It's helpful here to distinguish between the two classes of error that your application might encounter.
The first class of errors are _expected errors_ that may occur during normal operation. Some examples of expected errors are those caused by a database query timeout, a network resource being unavailable, or bad user input. These errors don't necessarily mean there is a problem with your program itself - in fact, they're often caused by things outside the control of your program. Almost all the time, it's good practice to return these kinds of errors and handle them gracefully.
The other class of errors are _unexpected errors_. These are errors which should not happen during normal operation, and if they do it is probably the result of a developer mistake or a logical error in your codebase. These errors are truly exceptional, and using _panic_ in these circumstances is more widely accepted. In fact, the Go standard library frequently does this when you make a logical error or try to use the language features in an unintended way - such as when trying to access an out-of-bounds index in a slice, or trying to close an already closed channel.
I'd recommend trying to return and gracefully handle unexpected errors in most cases. The exception to this is when _returning the error_ adds an unacceptable amount of error handling to the rest of your codebase.
The 'Go by Example' page on panics summarizes all of this quite nicely:
_A panic typically means something went unexpectedly wrong. Mostly we use it to fail fast on errors that shouldn't occur during normal operation and that we aren't prepared to handle gracefully._
### Performance
**json.MarshalIndent()** takes 65% longer to run and uses around 30% more memory than **json.Marshal()**, as well as making two more heap allocations. Those figures will change depending on what you're encoding, but they're fairly indicative of the performance impact.
For most applications this performance difference simply isn't something that you need to worry about. In real terms, we're talking about a few thousandths of a millisecond - and the improved readability of responses is probably worth this trade-off.
But if your API is operating in a very resource-constrained environment, or needs to manage extremely high levels of traffic, then this is worth being aware of, and you may prefer to stick with using **json.Marshal()** instead.
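If you want to measure the difference for your own payloads, a simple pair of benchmarks along these lines would do (**testMovie** is an assumed fixture); run them with **go test -bench=. -benchmem**:
```go
func BenchmarkMarshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		json.Marshal(testMovie)
	}
}

func BenchmarkMarshalIndent(b *testing.B) {
	for i := 0; i < b.N; i++ {
		json.MarshalIndent(testMovie, "", "\t")
	}
}
```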
#### Optimizing PostgreSQL settings
The default settings that PostgreSQL ships with are quite conservative, and you can often improve the performance of your database by tweaking the values in your **postgresql.conf** file.
You can check where your **postgresql.conf** file lives by running the **SHOW config_file;** query via **psql**:
```bash
$ sudo -u postgres psql -c 'SHOW config_file;'
               config_file
-----------------------------------------
 /etc/postgresql/15/main/postgresql.conf
(1 row)
```
To read more about PostgreSQL optimization:
https://www.enterprisedb.com/postgres-tutorials/how-tune-postgresql-memory
To generate suggested values based on your available system hardware, you can use :
https://pgtune.leopard.in.ua
#### Configuring the Database Connection Pool
1. You should explicitly set a **MaxOpenConns** value. This should be comfortably below any hard limits on the number of connections imposed by your database and infrastructure, and you may also want to consider keeping it fairly low to act as a rudimentary throttle.
For this project we'll set a **MaxOpenConns** limit of 25 connections. This is a reasonable starting point for small-to-medium web applications and APIs, but ideally you should tweak this value for your hardware depending on the results of benchmarking and load-testing.
2. In general, higher **MaxOpenConns** and **MaxIdleConns** values will lead to better performance. But the returns are diminishing, and you should be aware that having a too-large idle connection pool (with connections that are not frequently re-used) can actually lead to reduced performance and unnecessary resource consumption.
Because **MaxIdleConns** should always be less than or equal to **MaxOpenConns**, we'll also limit **MaxIdleConns** to 25 connections for this project.
3. To mitigate the risk from point 2 above, you should generally set a **ConnMaxIdleTime** value to remove idle connections that haven't been used for a long time. In this project we'll set a **ConnMaxIdleTime** duration of 15 minutes.
4. It's probably OK to leave **ConnMaxLifetime** as unlimited, unless your database imposes a hard limit on connection lifetime, or you need it specifically to facilitate something like gracefully swapping databases. Neither of those things apply in this project, so we'll leave this as the default unlimited setting.
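A minimal sketch of applying these four settings when opening the pool (the **dsn** variable is illustrative):
```go
db, err := sql.Open("postgres", dsn)
if err != nil {
	return nil, err
}

db.SetMaxOpenConns(25)                  // point 1
db.SetMaxIdleConns(25)                  // point 2
db.SetConnMaxIdleTime(15 * time.Minute) // point 3
// Point 4: leave ConnMaxLifetime at its default (unlimited).
```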