Writing a CGI Script in Kotlin

...or more accurately, What Not to Do if You Want to Write CGI Scripts

Who?

Hi, I'm sschr15, a Kotlin fanatic and a member of this public circle.

Why?

Kotlin has a scripting feature. CGI scripts are scripts. A fantastic person is hosting a server that supports CGI scripts.

Therefore, I should write a CGI script in Kotlin and make it work™.

How?

The big part: how was this done?

Part 1: The Basics

The CGI setup provided by this particular server is rather friendly: it supports any file marked executable in a user's public_html/cgi-bin directory. Many others in this cyberspace use this to write CGI scripts in Scheme, Bash, and possibly other languages. As the avid Kotlin fan I am, I then wanted to write a CGI script using Kotlin.

The "Easy" Way

Modern versions of Kotlin can directly run any script ending in .main.kts without any additional setup. This makes it really easy to launch a Kotlin script from the command line:

kotlin my_script.main.kts

To make this easier, Kotlin's parser for .main.kts scripts will ignore a shebang line at the beginning, meaning this is a valid script!

#!/usr/bin/env kotlin
println("Hello, world!")

This is a great way to write scripts, but it is a little problematic for CGI scripts for a couple of reasons:

This requires not only Java to be installed, but also a standalone Kotlin installation. This may not be a problem for most people, but it is for this group, whose server runs NixOS without much disk space to spare. A full Kotlin standalone installation is rather large, especially if native compilation is included in the installation.
Kotlin is slow. Builds run through a build system avoid a large portion of this by using a daemon that keeps the JVM and Kotlin's compiler running in the background. Running a Kotlin script directly means each launch pays the full JVM startup time plus compilation. When CGI scripts are expected to be speedy, this is a problem.
The server uses NixOS, and Kotlin on NixOS is rather odd: it launches a launcher which launches a launcher for launching a loader (loading Kotlin). fcgiwrap, the CGI runner used on this server, doesn't like this and fails to receive any output from the script.

Given all these issues, how in the world is one supposed to use Kotlin, a very slow-to-compile language without a speedy interpreter replacement, to write a CGI script?

Part 2: The Requirements

With all this said, the best option is clearly to write an entire library designed specifically for CGI scripts (though it works just as well in non-CGI scenarios). Such a library should have a few things:

It needs to be responsive. Any time a script is requested, it should immediately run with little to no startup time.
It should be extensible. Individual scripts should be able to add arbitrary dependencies for additional capabilities.
It should be easy to use. The library should have a minimal API that is easy to understand.
It needs to be (relatively) small. There's no need to have a full native system for simple scripting tasks.

Through about two weeks of work, I created a library that does a pretty decent job of achieving these goals.

Part 3: The Library

cgi-kts is a library that provides a minimal DSL for creating CGI responses as well as a framework for compiling and evaluating scripts.

The Daemon

The library's primary entrypoint is DaemonMain, which runs a daemon that will precompile scripts and evaluate them based on socket communications. It goes through a few steps while running:

Upon startup, it launches a coroutine that repeatedly checks in a given directory for scripts. If a script is found but has not been compiled or has been modified, it will compile the script. If a script that previously was compiled is no longer found, it will remove the compilation of the script.
- Compilation results are serialized to disk on each compilation to allow for quicker startup times. Despite the caching, the current code will recompile scripts shortly after loading.
While running, it listens for requests on a given Unix socket. When a request is received, it goes through a series of steps:
1. The requester sends sixteen bytes to represent the script name. It is one of:
  - The full script name¹ if it is sixteen bytes or fewer, or
  - The first nine bytes of the script name, code point U+0002, and the hash code of the script name if it is seventeen bytes or more. Any unused bytes are zero, and a seventeenth byte that is always zero follows, meant for easy implementation in C-like languages.
2. The daemon finds the script in its cache. If the script is not found, it sends an error message back to the requester.
3. The script is evaluated. If requested, additional communication may occur to send environment variables from the requester to the script:
  1. The script requests an environment variable.
  2. The daemon sends the name of the environment variable.
  3. The requester sends the length of the environment variable, or -1 if it doesn’t exist.
  4. If the variable exists, the requester sends the value of the variable. No null terminator is sent alongside the value.
4. The script completes evaluation with response data. The daemon sends the response data to the requester.
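
As a rough illustration of step 1, here is how a requester written in Kotlin might build the script name field. This is only a sketch: the function name encodeScriptName is made up for this example, and it assumes the name is UTF-8 and that the hash code is String.hashCode() written as four big-endian bytes. The description above doesn't pin down either detail, so check the real requester before relying on this.

```kotlin
import java.nio.ByteBuffer

// Field layout from the protocol description: sixteen name bytes plus a
// seventeenth byte that is always zero.
const val NAME_BYTES = 16
const val FIELD_BYTES = NAME_BYTES + 1

// ASSUMPTIONS (not confirmed by the post): names are UTF-8, and the hash is
// String.hashCode() stored as four big-endian bytes right after U+0002.
fun encodeScriptName(name: String): ByteArray {
    val raw = name.toByteArray(Charsets.UTF_8)
    // A fresh ByteArray is zero-filled, so padding and the final zero byte are already in place.
    val field = ByteArray(FIELD_BYTES)
    if (raw.size <= NAME_BYTES) {
        // Short name: copy it as-is.
        raw.copyInto(field)
    } else {
        // Long name: first nine bytes, then U+0002, then the hash code.
        raw.copyInto(field, destinationOffset = 0, startIndex = 0, endIndex = 9)
        field[9] = 0x02
        ByteBuffer.allocate(4).putInt(name.hashCode()).array()
            .copyInto(field, destinationOffset = 10)
    }
    return field
}

fun main() {
    // Print both cases as hex so the layout is visible.
    for (name in listOf("hello", "a_rather_long_script_name")) {
        println(encodeScriptName(name).joinToString(" ") { "%02x".format(it) })
    }
}
```

With these assumptions a long name uses fourteen of the sixteen bytes, leaving two zero bytes before the final zero.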

¹ The script name is the name of the file, without the `.cgi.kts` extension used to identify the script.
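
The directory check the daemon performs on startup boils down to comparing what is on disk with what has been compiled. Here is a minimal sketch of that decision, not the library's actual code: ScanPlan and planScan are hypothetical names, and using last-modified timestamps to detect changes is an assumption.

```kotlin
// Hypothetical result type: which script names to (re)compile and which to drop.
data class ScanPlan(val toCompile: Set<String>, val toRemove: Set<String>)

// Extension used to identify scripts, as described in the footnote.
const val SCRIPT_EXTENSION = ".cgi.kts"

// filesOnDisk: file name -> last-modified timestamp seen in the current scan.
// compiled:    script name -> timestamp of the source that was last compiled.
fun planScan(filesOnDisk: Map<String, Long>, compiled: Map<String, Long>): ScanPlan {
    // Keep only script files and strip the extension to get script names.
    val scripts = filesOnDisk
        .filterKeys { it.endsWith(SCRIPT_EXTENSION) }
        .mapKeys { it.key.removeSuffix(SCRIPT_EXTENSION) }
    // New scripts have no cached timestamp; modified ones have a different one.
    val toCompile = scripts.filter { (name, modified) -> compiled[name] != modified }.keys
    // Anything compiled that no longer exists on disk gets removed.
    val toRemove = compiled.keys - scripts.keys
    return ScanPlan(toCompile, toRemove)
}

fun main() {
    val disk = mapOf("index.cgi.kts" to 200L, "new.cgi.kts" to 50L, "notes.txt" to 10L)
    val cache = mapOf("index" to 100L, "gone" to 75L)
    println(planScan(disk, cache))
}
```

Keeping the decision as a pure function over two maps means the repeating coroutine only has to list the directory, call it, and act on the result.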

The Requester

Within the repository, there is a small C implementation which can communicate with the daemon. It gets compiled with a SOCKET_LOCATION macro to specify the location of the Unix socket. If also compiled with RUN_AS_MAIN, it will take one argument as the name of the script to run. If not, it will use its own name as the script to run.

Any implementation can work with the daemon, as long as it follows the protocol described in the previous section.
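
For example, the environment variable exchange is simple enough to sketch in Kotlin instead of C. The wire format for the length isn't spelled out above, so this assumes a four-byte big-endian signed integer and a UTF-8 value. The function names encodeEnvReply and decodeEnvReply are invented for this illustration.

```kotlin
import java.nio.ByteBuffer

// Requester side: build the reply to "send me this environment variable".
// ASSUMPTION: the length is a four-byte big-endian signed int; the post only
// says "the length ... or -1 if it doesn't exist".
fun encodeEnvReply(value: String?): ByteArray {
    // Missing variable: send -1 and nothing else.
    if (value == null) return ByteBuffer.allocate(4).putInt(-1).array()
    val bytes = value.toByteArray(Charsets.UTF_8)
    // Existing variable: length, then the raw value with no null terminator.
    return ByteBuffer.allocate(4 + bytes.size).putInt(bytes.size).put(bytes).array()
}

// Daemon side: turn the reply back into a nullable string.
fun decodeEnvReply(reply: ByteArray): String? {
    val buffer = ByteBuffer.wrap(reply)
    val length = buffer.getInt()
    if (length < 0) return null
    val bytes = ByteArray(length)
    buffer.get(bytes)
    return String(bytes, Charsets.UTF_8)
}

fun main() {
    // In a real requester the value would come from System.getenv(name).
    println(decodeEnvReply(encodeEnvReply("GET")))
    println(decodeEnvReply(encodeEnvReply(null)))
}
```

Sending the length first is what lets the value go without a null terminator: the reader knows exactly how many bytes to expect.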

The "DSL"

Scripts written to interact with the library have a few utilities available:

respond(HttpStatusCode, ResponseBuilder.() -> Unit): This function is the primary way to respond to a request. It takes an HttpStatusCode and a block to create the response.
- html(HTML.() -> Unit): Within the block, this will create an HTML response. This sets the Content-Type and Content-Length headers automatically.
- headers(HeaderBuilder.() -> Unit): Within the block, this will set headers for the response.
  - cookie(...): Within a headers block, this will add a Set-Cookie header to the response. It has parameters for each cookie field.
- text(String): In case the response shouldn't be HTML, this will allow any custom response. This will not set the Content-Type header, but will still set the Content-Length header.
respond(String): In case the standard HTTP response isn't desired, another response can be sent with this function.
stop(): This function intentionally throws to halt the script before it reaches its end. This is useful for stopping the script if an error occurs. Be sure to send some kind of response first, as not doing so will cause the daemon to crash.
A few standard CGI environment variables can be read directly:
- requestUri: The URI requested by the client.
- requestType: The request method (GET, POST, etc.).
- queryString: The query string of the request.
  - query: a map of the query string, with keys and values URL-decoded.
- userAgent: The user agent of the client.
requestEnvironmentVariable(String): This function will request an environment variable from the requester. It will return null if the variable doesn’t exist.
- Note that calling System.getenv(...) may give different results, as the daemon is not necessarily run by the same user that handles CGI requests.
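
To make the query map a bit more concrete, here is one way a URL-decoded map could be built from a query string. This is a standalone sketch, not the library's implementation: parseQuery is a hypothetical name, and letting the last value win for repeated keys is a choice made for this example.

```kotlin
import java.net.URLDecoder

// Split "a=1&b=two+words" into a map with URL-decoded keys and values.
fun parseQuery(queryString: String): Map<String, String> =
    queryString.split("&")
        .filter { it.isNotEmpty() } // tolerate an empty string or stray "&"
        .associate { pair ->
            val key = pair.substringBefore("=")
            // A key without "=" gets an empty value.
            val value = pair.substringAfter("=", missingDelimiterValue = "")
            URLDecoder.decode(key, "UTF-8") to URLDecoder.decode(value, "UTF-8")
        }

fun main() {
    println(parseQuery("name=Kotlin+fan&lang=kt%2Fjvm&flag"))
}
```

URLDecoder handles both percent-escapes and "+" as a space, which matches how HTML forms encode query strings.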

Huh?

So, what's the point of all this? Why go through all this trouble to write a CGI script in Kotlin?

No reason.

In case you're curious, the source code for the CGI script running this page (and my other blog post pages if they ever come about) is available on the site.