Installing on Docker

Docker is an application that treats a whole Linux machine, including its operating system and installed applications, as a computer-within-a-computer, called a “container.” Containers are similar to virtual machines in many respects. They are typically used for “shipping” applications. Instead of installing an application on a server directly, you can run the application in a container. This way, the application runs bundled with all of the operating system software that it needs. Installing applications is quicker, simpler, and less error-prone. There is virtually no performance degradation.

Docker is a good platform for trying out docassemble for the first time. It is also ideal in a production environment.

Since the docassemble application depends on so many different component parts, including a web server, SQL server, Redis server, distributed task queue, background task system, scheduled task system, and other components, running it inside of a Docker container is convenient. When all of these components are running inside of a “container,” you don’t have to do the work of installing and maintaining these components.

As much as Docker simplifies the process of installing docassemble, it takes some time to understand the concepts behind “running,” “stopping,” and “starting” containers.

Docker can also be used to deploy even the most complex docassemble installations. For example, Kubernetes or Amazon’s EC2 Container Service can be used to maintain a cluster of docassemble web server instances, created from Docker images, that communicate with a central server. For information about how to install docassemble in a multi-server arrangement, see the scalability section.

Docker is a complex and powerful tool, and the docassemble documentation is not a substitute for Docker documentation. If you are new to Docker, you should learn about Docker by reading tutorials or watching videos. Here is a brief cheat sheet based on loose real-world analogies:

  • Doing docker run is analogous to getting a Windows installation DVD, installing it on a computer with an empty hard drive, and then booting the computer for the first time.
  • Doing docker pull is analogous to going to a store and obtaining a Windows installation DVD.
  • Doing docker stop is analogous to turning off a computer (and forcibly unplugging it after a certain number of seconds after you initiate the shutdown from the Windows “start” menu).
  • Doing docker start is analogous to turning on a computer that already has Windows installed on it.
  • Doing docker rm is analogous to tossing a computer into a trash incinerator.
  • Doing docker rmi is analogous to tossing a Windows installation DVD into a trash incinerator.
  • Doing docker exec is analogous to sitting down at your computer and opening up PowerShell.
  • Doing docker ps is analogous to walking around your house and making a list of your computers.
  • Doing docker volume is analogous to doing things with USB drives.
  • Doing docker build is analogous to creating a Windows installation DVD based on the Windows source code.

In these analogies, a Docker “image” is analogous to a Windows installation DVD, a Docker “container” is analogous to a particular computer that runs Windows, and a Docker “volume” is (very loosely) analogous to a USB drive.

Choosing where to run Docker

Docker can be run on a Windows PC, a Mac, an on-site Linux machine, or a Linux-based virtual machine in the cloud. Since docassemble is a web application, the ideal platform is a Linux virtual machine in the cloud.

You can test out docassemble on a PC or a Mac, but for serious, long-term deployment, it is worthwhile to run it in the cloud, or on a dedicated on-premises server. Running Docker on a machine that shuts down or restarts frequently could lead to database corruption. Also, if you are using Docker Desktop, docassemble will run very slowly if you do not have an amd64-based processor. If your processor is an Apple M1 chip, or another ARM-based microprocessor, you should build the image first and then docker run it.

If you have never deployed a Linux-based virtual machine in the cloud before, this might be a good opportunity to learn. The ability to use virtual machines on a cloud provider like Amazon Web Services or Microsoft Azure is a valuable and transferable skill. Learning how to do cloud computing is beyond the scope of this documentation, but there are many guides on the internet. The basic steps of running Docker in the cloud are:

  1. Create an account with a cloud computing provider.
  2. Start a sufficiently powerful virtual machine that runs some flavor of Linux.
  3. Connect to the virtual machine using SSH in order to control it using a command line. This can be a complicated step because most providers use certificates rather than passwords for authentication.
  4. Install Docker on the virtual machine.

There are also methods of controlling cloud computing resources from a local command line, where you type special commands to deploy Docker containers. These can be very useful, but they tend to be more complicated to use than the basic Docker command line.

Installing Docker

First, make sure you are running Docker on a computer or virtual computer with at least 4GB of memory and 20GB of hard drive space. The docassemble installation will use up about 10GB of space, and you should always have at least 10GB free when you are running docassemble.
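The minimums above can be checked from a command line before you install anything. The following is a minimal sketch for Linux only; the function names are hypothetical, and the 3500 MB memory threshold is an assumption (a machine sold as “4GB” usually reports a little less than 4096 MB).

```shell
# Minimal sketch: compare a machine against the minimums stated above.
# Linux only: total_memory_mb reads /proc/meminfo.

# meets_minimum ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (whole numbers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total memory in MB; the MemTotal value in /proc/meminfo is reported in kB.
total_memory_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Free space in GB on the filesystem that holds the given path (default: /).
disk_free_gb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper that prints a warning for each minimum that is not met.
check_docassemble_host() {
    # 3500 MB is an assumed, approximate stand-in for "4GB of memory".
    meets_minimum "$(total_memory_mb)" 3500 || echo "Warning: less than about 4 GB of memory"
    meets_minimum "$(disk_free_gb "${1:-/}")" 10 || echo "Warning: less than 10 GB of free disk space"
}
```

Run check_docassemble_host with no arguments to check the root filesystem, or pass the path where Docker stores its data on your system. Before the first installation you will want roughly 20GB free, since the installation itself uses about 10GB.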

If you have a Windows PC, follow the Docker installation instructions for Windows. You will need administrator access on your PC in order to install (or upgrade) Docker.

If you have a Mac, follow the Docker installation instructions for OS X.

On Ubuntu (assuming username ubuntu):

sudo apt -y update
sudo apt -y install docker.io
sudo usermod -a -G docker ubuntu

On Amazon Linux (assuming the username ec2-user):

sudo yum -y update
sudo yum -y install docker
sudo usermod -a -G docker ec2-user

The usermod line allows the non-root user to run Docker. You may need to log out and log back in again for this new user permission to take effect. On some distributions, the docker group is not created by the installation process, so you will need to manually create it by running sudo groupadd docker before you run the usermod command.
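The group check and the usermod step can be combined so that the group is only created when it is missing. This is a sketch, not part of the official installation steps; the function name ensure_docker_group is hypothetical, and it assumes that getent and sudo are available on your distribution.

```shell
# Sketch: create the docker group only if it does not exist yet, then add a
# user to it. With no argument, the current user is added.
ensure_docker_group() {
    user="${1:-$(id -un)}"
    # getent exits with a non-zero status when the group is not defined.
    getent group docker >/dev/null || sudo groupadd docker
    sudo usermod -a -G docker "$user"
}
```

For example, ensure_docker_group ubuntu on Ubuntu, or ensure_docker_group ec2-user on Amazon Linux. After logging out and back in, id -nG should include docker in its output.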

Docker will probably start automatically after it is installed. On Linux, you may need to do sudo /etc/init.d/docker start, sudo systemctl start docker, or sudo service docker start.

The operating system that runs inside of the docassemble Docker container is Ubuntu 22.04. This is a fairly recent version of Ubuntu. When using Docker, it is recommended that you run a recent version of Docker and its dependencies (containerd and runC). Ubuntu 22 and Debian 11 are known to work well. (If you run Docker on Mac or Windows, it will likely start a virtual machine and then deploy the docassemble Docker container inside that virtual machine; the operating system of that virtual machine, which is likely a flavor of Linux, should be recent.) You may encounter difficult-to-diagnose problems if docassemble’s OS and software do not fully function inside the host operating system.

Quick start

If you just want to test out docassemble for the first time, follow the instructions in this section, and you’ll get docassemble up and running quickly in a Docker container, whether you are using a laptop or AWS.

However, you should think of this as an educational exercise; don’t start using the container for serious development work. For a serious implementation, you should deploy docassemble on a server (in the cloud or on-premises), and go through additional setup steps, such as configuring HTTPS for encryption and data storage for the safe, long-term storage of development data and user data.

Starting

Once Docker is installed, you can install and run docassemble from the command line.

To get a command line on Windows, run Windows PowerShell.

To get a command line on a Mac, launch the Terminal application.

To get a command line on a virtual machine in the cloud, follow your provider’s instructions for using SSH to connect to your machine.

From the command line, simply type in:

docker run -d -p 80:80 -p 443:443 --stop-timeout 600 jhpyle/docassemble

The docker run command will download and run docassemble, making the application available on the standard HTTP port (port 80) of your machine.

It will take several minutes for docassemble to download, and once the docker run command finishes, docassemble will start to run. After a few minutes, you can point your web browser to the hostname of the machine that is running Docker. If you are running Docker on your own computer, this address is probably http://localhost.

Note that the docassemble web interface is not available immediately after docker run is invoked. The server needs time to boot and initialize. On EC2, this process takes about one minute forty seconds, and it might be slower on other platforms. If you want to investigate what is happening on the server, see the troubleshooting section. (If you have an existing configuration in data storage, the boot process will take even longer because your software and databases will need to be copied from data storage and restored on the server.)

If you are running Docker on AWS, the address of the server will be something like http://ec2-52-38-111-32.us-west-2.compute.amazonaws.com (check your EC2 configuration for the hostname). On AWS, you will need a Security Group that opens HTTP (port 80) to the outside world in order to allow web browsers to connect to your EC2 instance.

Using the web browser, you can log in using the default username (“[email protected]”) and password (“password”), and make changes to the configuration from the menu. You should also go to User List from the menu, click “Edit” next to the [email protected] user, and change that e-mail address to an actual e-mail address you can access.

In the docker run command, the -d flag means that the container will run in the background.

The -p flag maps a port on the host machine to a port on the Docker container. In this example, port 80 on the host machine will map to port 80 within the Docker container. If you are already using port 80 on the host machine, you could use -p 8080:80, and then port 8080 on the host machine would be passed through to port 80 on the Docker container.
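To make the port mapping concrete, here is a small sketch that wraps the docker run command so that the host port can be chosen. The function name run_docassemble is hypothetical, and for brevity the sketch maps only the HTTP port, not port 443.

```shell
# run_docassemble [HOST_PORT]: start a container with HOST_PORT on the host
# mapped to port 80 inside the container (default: 80).
run_docassemble() {
    host_port="${1:-80}"
    case "$host_port" in
        ''|*[!0-9]*)
            echo "Port must be a number: $host_port" >&2
            return 2
            ;;
    esac
    # The part before the colon is the host port; the part after it is the
    # port inside the container.
    docker run -d -p "${host_port}:80" --stop-timeout 600 jhpyle/docassemble
}
```

After run_docassemble 8080 on your own computer, the address to open in the browser would be http://localhost:8080 rather than http://localhost.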

The jhpyle/docassemble tag refers to a Docker image that is hosted on Docker Hub. The image is about 4GB in size, and when it runs, the container uses about 10GB of hard drive space. The jhpyle/docassemble image is based on the “master” branch of the docassemble repository on GitHub. It is rebuilt every time the minor version of docassemble increases.

Shutting down

You can shut down the container by running:

docker stop -t 600 <containerid>

By default, Docker gives containers ten seconds to shut down before forcibly shutting them down. Usually, ten seconds is plenty of time, but if the server is slow, docassemble might take a little longer than ten seconds to shut down. To be on the safe side, give the container plenty of time to shut down gracefully. The -t 600 means that Docker will wait up to ten minutes before forcibly shutting down the container. It will probably take no more than 15 seconds for the docker stop command to complete, although it can take as long as a minute to stop a container if you are using Azure Blob Storage.

It is very important to avoid a forced shutdown of docassemble. The container runs a PostgreSQL server (unless configured to use an external SQL server), and the data files of the server may become corrupted if PostgreSQL is not gracefully shut down. To facilitate data storage (more on this later), docassemble backs up your data during the shutdown process and restores from that backup during the initialization process. If the shutdown process is interrupted, your data may be left in an inconsistent state and there may be errors during later initialization.

To see a list of stopped containers, run docker ps -a. To remove a container, run docker rm <containerid>.
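If you do not want to copy the container ID by hand, the ID can be looked up by image name. The following sketch uses the ancestor filter of docker ps; the function names are hypothetical, and the filter may not match a container that was created from an older copy of the image, so check the result against docker ps.

```shell
# Print the IDs of running containers that were created from the
# jhpyle/docassemble image.
docassemble_container_ids() {
    docker ps -q --filter "ancestor=jhpyle/docassemble"
}

# Gracefully stop each of those containers, allowing up to ten minutes each.
stop_docassemble() {
    for cid in $(docassemble_container_ids); do
        docker stop -t 600 "$cid"
    done
}
```

Running stop_docassemble is then equivalent to running docker stop -t 600 on each ID that docassemble_container_ids prints.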

Restarting the container after a shutdown

If you have shut down a Docker container using docker stop -t 600, you can start the container again:

docker start <containerid>

Overview of the Docker container

There are a variety of ways to deploy docassemble with Docker, but this subsection will give an overview of the most common way, which is to use a single Docker container hosted on a cloud provider.

When you run docker run on the “image” jhpyle/docassemble, Docker will go onto the internet, download (“pull”) the jhpyle/docassemble image, create a new container using that image, and then “start” that container. However, first it will check to see if a copy of the jhpyle/docassemble image has already been downloaded, and if there is a copy already downloaded, it will create the container using that copy. This is important to keep in mind; when you run docker run, you might be thinking you will always get the most recent version, but that is not the case. (See upgrading, below, for more information.)
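To see whether docker run would reuse an existing local copy, you can inspect the image and then pull explicitly. This is a sketch with a hypothetical function name; pulling a newer image does not change containers that already exist, so it is not a substitute for the upgrade procedure.

```shell
# Show when the local copy of the image was built (if there is one), then
# download the current image from Docker Hub.
refresh_docassemble_image() {
    image="jhpyle/docassemble"
    docker image inspect --format '{{.Created}}' "$image" 2>/dev/null \
        || echo "No local copy of $image yet"
    docker pull "$image"
}
```

A docker run issued after refresh_docassemble_image will create its container from the copy that was just pulled.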

When the docassemble container starts, it runs one command:

/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf

(This is specified in the Dockerfile, if you are curious.)

This command starts an application called Supervisor. Supervisor is a “process control system” that starts up the various applications that docassemble uses, including:

  • A web server, NGINX, which is called nginx within the Supervisor configuration.
  • An application server, uWSGI, called uwsgi.
  • A background task system, Celery, consisting of two processes, celery and celerysingle.
  • A scheduled task runner, called cron.
  • A SQL server, PostgreSQL, called postgres.
  • A distributed task queue system, RabbitMQ, called rabbitmq.
  • An in-memory data structure store, Redis, called redis.
  • A watchdog daemon that looks for out-of-control processes and kills them, called watchdog.
  • A WebSocket server that supports the live help functionality, called websockets.
  • A unoconv server, called unoconv.

In addition to starting background tasks, Supervisor coordinates the running of ad-hoc tasks, including:

  • A bare-bones web server called nascent that runs during the initialization process, so that the application responds on port 80.
  • A script called sync that consolidates log files in one place, to support the Logs interface.
  • A script called reset that restarts the server.
  • A script called update that installs and upgrades the Python packages on the system.

There is also a Supervisor service called syslogng, which is dormant on a single-server system. (The syslog-ng application is used in the multi-server arrangement to consolidate log files from multiple machines.)

NGINX is used by default, but it is possible (mostly for backwards compatibility reasons) to run Apache instead of NGINX. For this reason, there is a service called apache2, which is defined in the configuration but does not run unless DAWEBSERVER is set to apache.

Finally, there is a service called initialize, which runs automatically when Supervisor starts. This is a shell script that initializes the server and starts the other services in the correct order.

Running without an internet connection

If you wish to run docassemble without a connection to the internet, it should work. Some features will be unavailable, of course, such as features that interact with GitHub and Google Cloud.

If the server will not have access to the internet, you may wish to set DAALLOWUPDATES to false so that docassemble will not try to run apt -q -y update during the initialization process. However, even if you don’t change DAALLOWUPDATES, the docassemble container should still start properly, because if apt cannot find a server it will fail and move on.

Troubleshooting

If you are having trouble with your docassemble server, do not assume that “turning it off and turning it on again” is a solution that will fix whatever problems you are having. Maybe that is true with some systems, but it is not true with Linux or docassemble. In fact, if you are new to docassemble, “turning it off and turning it on again” may make your problems much worse. Instead of forcibly rebooting your system and hoping for the best, learn how to access log files and uncover evidence about why your system is not working as it should. (This section explains how.) If you would like to be able to “pull the plug” on your docassemble system without negative repercussions, you can, if you first configure an external SQL server, an external Redis server, and a cloud-based persistent storage system. But until you have an external SQL server, an external Redis server, and a cloud-based persistent storage system, you need to be extremely careful about how you shut down your Docker container. (See the section on shutting down to learn why.)

Normally, you will not need to access the running container in order to get docassemble to work, and all the log files you need will be available from Logs in the web browser. However, you might want or need to gain access to the running container in some circumstances.

To do so, find out the ID of the running container by doing docker ps. You will see output like the following:

CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
e4fa52ba540e  jhpyle/docassemble  "/usr/bin/supervisord" ...

The ID is in the first column. Then run:

docker exec -t -i e4fa52ba540e /bin/bash

using your own ID in place of e4fa52ba540e. This will give you a bash command prompt within the running container.

The first thing to check when you connect to a container is:

supervisorctl status

The output should be something like:

apache2                          STOPPED   Not started
celery                           RUNNING   pid 8539, uptime 6:52:59
celerysingle                     RUNNING   pid 8547, uptime 6:52:52
cron                             RUNNING   pid 1442, uptime 20 days, 20:58:38
exim4                            RUNNING   pid 1452, uptime 20 days, 20:58:37
initialize                       RUNNING   pid 7, uptime 20 days, 21:00:05
nascent                          STOPPED   Aug 14 04:20 PM
nginx                            RUNNING   pid 7679, uptime 6:54:28
postgres                         RUNNING   pid 321, uptime 20 days, 21:00:02
rabbitmq                         RUNNING   pid 8410, uptime 6:53:02
redis                            RUNNING   pid 479, uptime 20 days, 20:59:46
reset                            EXITED    Sep 04 06:26 AM
sync                             STOPPED   Not started
syslogng                         STOPPED   Not started
update                           STOPPED   Not started
uwsgi                            RUNNING   pid 880, uptime 20 days, 20:59:10
watchdog                         RUNNING   pid 9, uptime 20 days, 21:00:05
websockets                       RUNNING   pid 8590, uptime 6:52:42

If you are running docassemble in a single-server arrangement, the processes that should be “RUNNING” include celery, celerysingle, cron, exim4, initialize, nginx, postgres, rabbitmq, redis, uwsgi, watchdog, and websockets.

Supervisor is the application that orchestrates the various services that are necessary for the server to start up and operate. It creates various log files in the /var/log/supervisor directory on the server. For example, these files show the log for the initialize process, which is responsible for starting the server:

  • /var/log/supervisor/initialize-stderr---supervisor-*.log
  • /var/log/supervisor/initialize-stdout---supervisor-*.log
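You can also read the end of the initialize error log from the host without opening a shell in the container. The sketch below is one way to do that; the function name show_initialize_log is hypothetical. The wildcard is quoted so that it is expanded inside the container, where the log file actually lives.

```shell
# show_initialize_log CONTAINER_ID [LINES]: print the last LINES lines
# (default: 50) of the initialize error log inside the given container.
show_initialize_log() {
    cid="$1"
    lines="${2:-50}"
    if [ -z "$cid" ]; then
        echo "usage: show_initialize_log CONTAINER_ID [LINES]" >&2
        return 2
    fi
    # sh -c runs inside the container, so the * is matched against files there.
    docker exec "$cid" sh -c "tail -n $lines /var/log/supervisor/initialize-stderr---supervisor-*.log"
}
```

For example, show_initialize_log e4fa52ba540e 100, using your own container ID.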

Other log files on the container that you might wish to check, in declining order of importance, are:

  • /usr/share/docassemble/log/docassemble.log (log for the web application)
  • /usr/share/docassemble/log/worker.log (log for background processes)
  • /usr/share/docassemble/log/uwsgi.log (log for the core of the web application)
  • /var/log/nginx/error.log (log for the web server)
  • /var/log/supervisor/postgres-stderr---supervisor-*.log (log for the SQL server)
  • Other files in /var/log/supervisor/ (logs for other services)
  • /usr/share/docassemble/log/websockets.log (log for parts of the live help feature)
  • /var/spool/mail/mail (log for scheduled tasks, generated by cron)
  • /tmp/flask.log (log used by Flask in rare situations)

To navigate through the directories on the system, use cd to change your current directory and ls to list the files in a directory. To view the contents of a file, use a command such as:

less /usr/share/docassemble/log/docassemble.log

Inside the less program, you can type spacebar to go to the next page, G to go to the end of the file, 1G to go to the start of the file, and q to quit.

Enter exit to leave the container and get back to your standard command prompt.

If supervisorctl status shows that the initialize service is in EXITED or FAILED status, then there should be an error message in the file /var/log/supervisor/initialize-stderr---supervisor-*.log indicating what went wrong that prevented docassemble from initializing. You will need to fix that problem, then type exit to leave the container, and then restart your container by doing docker stop -t 600 <containerid> followed by docker start <containerid>.

If initialize is RUNNING but celery is not RUNNING, and nascent is still RUNNING, then your server is still in the process of starting up. If it is taking a really long time to start up, check the above log files to see where in the process it is getting stuck.
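The rules in the two paragraphs above can be expressed as a small filter over the output of supervisorctl status. This is a rough sketch: the function name startup_state and the labels it prints are hypothetical, and treating “initialize and celery are both RUNNING” as “up” is a simplifying assumption, not an official health check. FATAL is included because it is one of Supervisor's own process states.

```shell
# Read `supervisorctl status` output on standard input and print one of:
# failed, up, starting, unknown.
startup_state() {
    awk '
        $1 == "initialize" { init = $2 }
        $1 == "celery"     { celery = $2 }
        $1 == "nascent"    { nascent = $2 }
        END {
            if (init == "EXITED" || init == "FAILED" || init == "FATAL") {
                print "failed"      # look in the initialize error log
            } else if (init == "RUNNING" && celery == "RUNNING") {
                print "up"          # assumption: startup has finished
            } else if (init == "RUNNING" && nascent == "RUNNING") {
                print "starting"    # still booting; wait and check again
            } else {
                print "unknown"
            }
        }'
}
```

From the host, it could be used as docker exec <containerid> supervisorctl status | startup_state.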

If you get a “server error” in your web browser when trying to access docassemble, there should be an error message in /usr/share/docassemble/log/uwsgi.log. If you see a message about a “blueprint’s name collision,” this is almost always not the real error; you need to scroll up through several error messages to find the actual error. When the web application crashes, the error that initiated the crash causes other errors inside of the code of the Flask framework, and a “blueprint’s name collision” error is typically the last error to be recorded in the error log.

If you encounter a problem with upgrading or installing packages, check /usr/share/docassemble/log/worker.log. This is the error log for the Celery background process system. A Celery background task controls the upgrading and installation of packages, so if you get an error during upgrading or installation of packages, make sure to check here first.

If you need to change the Configuration but you cannot use the web interface to do so because your container failed to start, or the web application does not work, you can edit the Configuration manually. The main configuration file is located at /usr/share/docassemble/config/config.yml.

Because of the way that data storage works, however, you need to be careful about editing the Configuration file directly. If you are using S3 or Azure Blob Storage, then during the container initialization process, the file will be overwritten with the copy of config.yml that is stored in the cloud. If you are not using cloud-based data storage, then when a container safely shuts down, /usr/share/docassemble/config/config.yml will be copied to /usr/share/docassemble/backup/config.yml, and when a container starts up, /usr/share/docassemble/backup/config.yml will be copied to /usr/share/docassemble/config/config.yml, overwriting the existing contents. This is part of the operation of the data storage feature; it makes it possible for you to remove a container and docker run a new one while retaining all of your data.

If you are using S3 or Azure Blob Storage, then you should docker stop -t 600 the container, then edit the config.yml file through the cloud service web interface (usually by downloading, editing, and uploading), and then docker start the container again.

If you are not using S3 or Azure Blob Storage, then you can edit the Configuration file using an editor like nano. If the status of initialize is RUNNING, edit the /usr/share/docassemble/config/config.yml file, and then do supervisorctl start reset to restart the docassemble services so that they use the new Configuration. When the container stops, it will safely shut down, and /usr/share/docassemble/config/config.yml will be backed up to /usr/share/docassemble/backup/config.yml. If you are using persistent volumes, the backup folder will be in the Docker volume that will persist even if you docker rm the container. If the status of initialize is FAILED or EXITED, then this backup process will not take place; in that case, you should make your changes to /usr/share/docassemble/backup/config.yml, and then restart your container by doing docker stop -t 600 followed by docker start.
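The choice of which file to edit depends only on the status of initialize, so it can be written down as a lookup. The sketch below applies only when S3 and Azure Blob Storage are not in use; the function name config_path_for_status is hypothetical, and handling Supervisor's FATAL state the same way as FAILED is an assumption.

```shell
# config_path_for_status STATUS: print the copy of config.yml that should be
# edited for a given status of the initialize service.
config_path_for_status() {
    case "$1" in
        RUNNING)
            echo /usr/share/docassemble/config/config.yml
            ;;
        FAILED|EXITED|FATAL)
            # No backup takes place on shutdown in this state, so edit the
            # backup copy that will be restored at the next start.
            echo /usr/share/docassemble/backup/config.yml
            ;;
        *)
            echo "Unrecognized status: $1" >&2
            return 1
            ;;
    esac
}
```

Inside the container, the status can be obtained with supervisorctl status initialize | awk '{ print $2 }', and the printed path can then be opened in nano.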

If you need to make manual changes to the installation of Python packages, note that docassemble’s Python code is installed in a Python virtual environment in which all of the files are readable and writable by the www-data user. The virtual environment is located at /usr/share/docassemble/local3.8/. Thus, installing Python packages through Ubuntu’s apt utility will not actually make that Python code available to docassemble. Before using pip, you need to first change the user to www-data, and then switch into the appropriate Python virtual environment.

su www-data
source /usr/share/docassemble/local3.8/bin/activate

Note that if you want to install a new version of a Python package that may already be installed, you will want to use the --upgrade and --force-reinstall parameters.

pip install --upgrade --force-reinstall azure-storage

To stop using the Python virtual environment, type the command deactivate. To stop being the www-data user, type the command exit.

Services other than NGINX and uWSGI are an important part of docassemble’s operations. For example, the upgrading and installation of Python packages takes place in a background process operated by the celery service. In addition, the live help feature uses a service called websockets. The nginx, uwsgi, celery, and websockets services all need to be restarted every time there is a change to the Configuration or a change to Python code. To restart all of the services at once, you can do:

supervisorctl start reset

However, if the uwsgi process has crashed, then you need to do:

supervisorctl restart uwsgi
supervisorctl start reset

You need to manually restart the uwsgi process here because the reset process uses an optimized method of refreshing the application server. This usually works well when you make Configuration and Python code changes, but if uWSGI has crashed, supervisorctl start reset will not bring it back to life.
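The two cases can be folded into one helper that restarts uwsgi only when it is not running. This is a sketch to be run inside the container; the function name reset_docassemble is hypothetical, and treating any status other than RUNNING as “crashed” is a simplifying assumption.

```shell
# Restart uwsgi first if Supervisor does not report it as RUNNING, then run
# the reset service so that all services pick up the changes.
reset_docassemble() {
    # The second column of `supervisorctl status` output is the process state.
    state=$(supervisorctl status uwsgi | awk '{ print $2 }')
    if [ "$state" != "RUNNING" ]; then
        supervisorctl restart uwsgi
    fi
    supervisorctl start reset
}
```

When uwsgi is healthy, reset_docassemble does the same thing as supervisorctl start reset on its own.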

If you want to access the Redis data, do docker exec to get inside the container and then run redis-cli (assuming that your Redis server is the default local Redis server). Note that docassemble uses several of the Redis databases. If you do redis-cli -n 1 (the default), you will access the database used on a system level. If you do redis-cli -n 2, you will access the database used by DARedis.

Unless you specify a different SQL server, the PostgreSQL data for your docassemble server is inside the docassemble database running on the Docker container. The default username is docassemble and the default password is abc123. After doing docker exec to get inside the container, run:

psql -U docassemble -d docassemble -h localhost -W

When prompted, enter password abc123.

For more information about troubleshooting docassemble, see the debugging subsection of the installation section.

Configuration options

In the example above, we started docassemble with docker run -d -p 80:80 -p 443:443 --stop-timeout 600 jhpyle/docassemble. This command will cause docassemble to use default values for all configuration options. You can also communicate specific configuration options to the container.

The recommended way to do this is to create a text file called env.list in the current working directory containing environment variable definitions in standard shell script format. For example:

DAHOSTNAME=docassemble.example.com
USEHTTPS=true
USELETSENCRYPT=true
[email protected]

Then, you can pass these environment variables to the container using the docker run command:

docker run --env-file=env.list -d -p 80:80 -p 443:443 --stop-timeout 600 jhpyle/docassemble

These configuration options will cause NGINX to use docassemble.example.com as the server name and use HTTPS with certificates hosted on Let’s Encrypt. (The flag -p 443:443 is included so that the HTTPS port is exposed.)

If you want your server to be able to accept incoming e-mails, you will need to add -p 25:25 in order to open port 25. See the e-mailing the interview section for information about configuring your server to receive e-mails.

A template for the env.list file is included in the distribution.

When running docassemble in ECS, environment variables like these are specified in JSON text that is entered into the web interface. (See the scalability section for more information about using ECS.)

In your env.list file, you can set a variety of options. These options are case sensitive, so you need to literally specify true or false, because True and False will not work.
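Because a capitalized True or False is easy to type by accident, it can help to scan env.list before running docker run. The following is a minimal sketch with a hypothetical function name; it only looks at values that are some capitalization of true or false and does not validate anything else in the file.

```shell
# check_env_list [FILE]: print (with line numbers) every line whose value is
# "true" or "false" written with any capital letters. Succeeds when no such
# line is found. FILE defaults to env.list in the current directory.
check_env_list() {
    file="${1:-env.list}"
    if [ ! -r "$file" ]; then
        echo "Cannot read $file" >&2
        return 2
    fi
    # First grep: any capitalization of true/false. Second grep: drop the
    # lines that are already correctly lowercase.
    if grep -n -i -E '^[A-Za-z0-9_]+=(true|false)$' "$file" | grep -v -E '=(true|false)$'; then
        return 1
    fi
    return 0
}
```

For a file containing USEHTTPS=True on its second line, check_env_list prints 2:USEHTTPS=True and returns a non-zero status.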

The following two options are specific to the particular server being started (which, in a multi-server arrangement, will vary from server to server).

  • CONTAINERROLE: either all or a colon-separated list of services (e.g. web:celery, sql:log:redis, etc.) that should be started by the server. Only set the CONTAINERROLE if you are using a multi-server arrangement; the default is all. The available options are:
    • all: the Docker container will run all of the services of docassemble on a single container.
    • web: The Docker container will serve as a web server.
    • celery: The Docker container will serve as a Celery node.
    • sql: The Docker container will run the central PostgreSQL service.
    • cron: The Docker container will run scheduled tasks and other necessary tasks, such as updating SQL tables.
    • redis: The Docker container will run the central Redis service.
    • rabbitmq: The Docker container will run the central RabbitMQ service.
    • log: The Docker container will run the central log aggregation service.
    • mail: The Docker container will run Exim in order to accept e-mails.
  • SERVERHOSTNAME: In a multi-server arrangement, all docassemble application servers need to be able to communicate with each other using port 9001 (the supervisor port). All application servers “register” with the central SQL server. When they register, they each provide their hostname; that is, the hostname at which the specific individual application server can be found. Then, when an application server wants to send a message to the other application servers, the application server can query the central SQL server to get a list of hostnames of other application servers. This is necessary so that any one application server can send a signal to the other application servers to install a new package or a new version of a package, so that all servers are running the same software. If you are running docassemble in a multi-server arrangement, and you are starting an application server, set SERVERHOSTNAME to the hostname with which other application servers can find that server. Note that you do not need to worry about setting SERVERHOSTNAME if you are using EC2, because Docker containers running on EC2 can discover their actual hostnames by querying a specific IP address.
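Since CONTAINERROLE is either all or a colon-separated list drawn from a fixed set of names, a value can be checked before it is put into env.list. This is a sketch under the assumption that the names listed above are the complete set; the function name valid_container_role is hypothetical.

```shell
# valid_container_role VALUE: succeeds when VALUE is "all" or a
# colon-separated list of the role names listed above.
# The body runs in a subshell so that the IFS change does not leak out.
valid_container_role() (
    if [ -z "$1" ]; then
        exit 1
    fi
    if [ "$1" = "all" ]; then
        exit 0
    fi
    IFS=:
    set -f    # do not expand wildcard characters while splitting
    for role in $1; do
        case "$role" in
            web|celery|sql|cron|redis|rabbitmq|log|mail) ;;
            *) exit 1 ;;
        esac
    done
    exit 0
)
```

For example, valid_container_role web:celery and valid_container_role sql:log:redis succeed, while valid_container_role web:database fails.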

The other options you can set in env.list are global for your entire docassemble installation, rather than specific to the server being started.

The following options, if you choose to set them, need to be set using Docker environment variables at the time of the initial docker run. The values are needed immediately when the container first starts, so they cannot be set through a config.yml file. Setting these options is not required; these options are used to provide increased security within a multi-server arrangement, in which servers send each other commands over port 9001.

  • DASUPERVISORUSERNAME: the username that should be used when communicating with supervisor over port 9001.
  • DASUPERVISORPASSWORD: the password that should be used when communicating with supervisor over port 9001.

These variables will be populated in the Configuration under the supervisor directive.
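
Because these two values have to be present at the initial docker run, one approach is to generate them once and append them to env.list before the container is first created. The following is a minimal sketch, not part of docassemble itself: the function name add_supervisor_credentials and the default username dasupervisor are hypothetical, and the password is simply 32 random hexadecimal characters.

```shell
# add_supervisor_credentials FILE [USERNAME]
# Appends DASUPERVISORUSERNAME and DASUPERVISORPASSWORD to FILE.
# The default username "dasupervisor" is a placeholder, not a docassemble default.
add_supervisor_credentials() {
    env_file="$1"
    username="${2:-dasupervisor}"
    # 16 random bytes rendered as 32 hexadecimal characters.
    password="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
    printf 'DASUPERVISORUSERNAME=%s\n' "$username" >> "$env_file"
    printf 'DASUPERVISORPASSWORD=%s\n' "$password" >> "$env_file"
}
```

For example, add_supervisor_credentials env.list would append both lines to an existing env.list file. In a multi-server arrangement, the same pair of values would presumably need to be passed to every server.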

The following options indicate where an existing configuration file can be found on S3 or Azure blob storage. If a configuration file exists in the cloud at the indicated location, that configuration file will be used to set the configuration of your docassemble installation. If no configuration file yet exists in the cloud at the indicated location, docassemble will create an initial configuration file and store it in the indicated location.

  • S3ENABLE: Set this to true if you are using S3 (or an S3-compatible object storage service) as a repository for uploaded files, Playground files, the configuration file, and other information. This environment variable, along with others that begin with S3, populates values in the s3 section of the initial configuration file. If this is unset, but S3BUCKET is set, it will be assumed to be true.
  • S3BUCKET: If you are using S3, set this to the bucket name. Note that docassemble will not create the bucket for you. You will need to create it for yourself beforehand. The bucket should be empty.
  • S3ACCESSKEY: If you are using S3, set this to the S3 access key. You can ignore this environment variable if you are using EC2 with an IAM role that allows access to your S3 bucket.
  • S3SECRETACCESSKEY: If you are using S3, set this to the S3 access secret. You can ignore this environment variable if you are using EC2 with an IAM role that allows access to your S3 bucket.
  • S3REGION: If you are using S3, set this to the region you are using (e.g., us-west-1, us-west-2, ca-central-1).
  • S3ENDPOINTURL: If you are using an S3-compatible object storage service, set S3ENDPOINTURL to the URL of the service (e.g., https://mys3service.com).
  • S3_SSE_ALGORITHM: the server-side encryption algorithm used (e.g., AES256, aws:kms). This should only be specified if the S3 bucket uses server-side encryption.
  • S3_SSE_CUSTOMER_ALGORITHM: the server-side encryption algorithm used (e.g., AES256, aws:kms). This should only be specified if the S3 bucket uses server-side encryption and you are passing an S3_SSE_CUSTOMER_KEY.
  • S3_SSE_CUSTOMER_KEY: the encryption key used when encrypting data. This should only be specified if the S3 bucket uses server-side encryption and you have specified an S3_SSE_CUSTOMER_ALGORITHM.
  • S3_SSE_KMS_KEY_ID: the AWS KMS key ID to use for object encryption. This should only be specified if the S3 bucket uses server-side encryption.
  • AZUREENABLE: Set this to true if you are using Azure blob storage as a repository for uploaded files, Playground files, the configuration file, and other information. This environment variable, along with others that begin with AZURE, populates values in the azure section of the configuration file. If this is unset, but AZUREACCOUNTNAME, AZUREACCOUNTKEY, and AZURECONTAINER are set, it will be assumed to be true.
  • AZURECONTAINER: If you are using Azure blob storage, set this to the container name. Note that docassemble will not create the container for you. You will need to create it for yourself beforehand.
  • AZUREACCOUNTNAME: If you are using Azure blob storage, set this to the account name.
  • AZUREACCOUNTKEY: If you are using Azure blob storage, set this to the account key.
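
The defaulting rules for S3ENABLE and AZUREENABLE can be sketched as a small shell function. This is an illustration of the rules as documented here, not the code that docassemble actually runs at startup; the function name cloud_backend is hypothetical. It also reflects the documented behavior that S3 is used when both services are enabled.

```shell
# cloud_backend: print "s3", "azure", or "none" according to the documented
# rules, based on the environment variables currently set.
cloud_backend() {
    s3="${S3ENABLE:-}"
    azure="${AZUREENABLE:-}"
    # S3ENABLE unset but S3BUCKET set: assumed to be true.
    if [ -z "$s3" ] && [ -n "${S3BUCKET:-}" ]; then
        s3=true
    fi
    # AZUREENABLE unset but all three Azure values set: assumed to be true.
    if [ -z "$azure" ] && [ -n "${AZUREACCOUNTNAME:-}" ] \
        && [ -n "${AZUREACCOUNTKEY:-}" ] && [ -n "${AZURECONTAINER:-}" ]; then
        azure=true
    fi
    if [ "$s3" = "true" ]; then
        echo s3        # S3 is used if both are enabled
    elif [ "$azure" = "true" ]; then
        echo azure
    else
        echo none
    fi
}
```

A quick check such as S3BUCKET=my-bucket cloud_backend can help confirm which storage service a given set of variables would select before you start a container.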

The options listed below are “setup” parameters that are useful for pre-populating a fresh configuration with particular values. These environment variables are effective only during an initial run of the Docker container, when a configuration file does not already exist.

If you are using persistent volumes, or you have set the options above for S3/Azure blob storage and a configuration file exists in your cloud storage, the values in that stored configuration file will, by default, take precedence over any values you specify in env.list. If you are using S3/Azure blob storage, you can edit these configuration files in the cloud and then stop and start your container for the new configuration to take effect.

  • DAWEBSERVER: This can be set either to nginx (the default) or apache. See the web server configuration directive.
  • DBHOST: The hostname of the PostgreSQL server. Leave undefined or set to null in order to use the PostgreSQL server on the same host. This environment variable, along with others that begin with DB, populates values in the db section of the configuration file. If you are using a managed SQL database service, set DBHOST to the hostname of the database service. If you are using PostgreSQL and the database referenced by DBNAME does not exist on the SQL server, the Docker startup process will attempt to use the DBUSER and DBPASSWORD credentials to create the database. Otherwise, you need to make sure a database with the name DBNAME exists before docassemble starts.
  • DBNAME: The name of the database. The default is docassemble.
  • DBUSER: The username for connecting to the PostgreSQL server. The default is docassemble.
  • DBPASSWORD: The password for connecting to the SQL server. The default is abc123. The password cannot contain the character #.
  • DBPREFIX: This sets the prefix for the database specifier. The default is postgresql+psycopg2://. This corresponds with the prefix of the db configuration directive.
  • DBPORT: This sets the port that docassemble will use to access the SQL server. If you are using the default port for your database backend, you do not need to set this.
  • DBTABLEPREFIX: This allows multiple separate docassemble implementations to share the same SQL database. The value is a prefix to be added to each table in the database.
  • DBBACKUP: Set this to false if you are using an off-site PostgreSQL DBHOST and you do not want the database to be backed up by the daily cron job. This is important if the off-site SQL database is large compared to the available disk space on the server. The default value is true.
  • DBSSLMODE: This is relevant if you have a PostgreSQL database and you have an SSL certificate for it. This sets the sslmode parameter. For more information, see the documentation for the db section of the Configuration.
  • DBSSLCERT: This is relevant if you have a PostgreSQL database and you have an SSL certificate for it. This is the name of a certificate file. For more information, see the documentation for the db section of the Configuration.
  • DBSSLKEY: This is relevant if you have a PostgreSQL database and you have an SSL certificate for it. This is the name of a certificate key file. For more information, see the documentation for the db section of the Configuration.
  • DBSSLROOTCERT: This is relevant if you have a PostgreSQL database and you have an SSL certificate for it. This is the name of a root certificate file. For more information, see the documentation for the db section of the Configuration.
  • DASQLPING: If your docassemble server runs in an environment in which persistent SQL connections will periodically be severed, you can set DASQLPING to true in order to avoid errors. There is an overhead cost to using this, so only enable this if you get SQL errors when trying to connect after a period of inactivity. The default is false. See the sql ping configuration directive.
  • EC2: Set this to true if you are running Docker on EC2. This tells docassemble that it can use an EC2-specific method of determining the hostname of the server on which it is running. See the ec2 configuration directive.
  • COLLECTSTATISTICS: Set this to true if you want the server to use Redis to track the number of interview sessions initiated. See the collect statistics configuration directive.
  • KUBERNETES: Set this to true if you are running inside Kubernetes. This tells docassemble that it can use the IP address of the Pod in place of the hostname. See the kubernetes configuration directive.
  • USEHTTPS: Set this to true if you would like docassemble to communicate with the browser using encryption. Read the HTTPS section for more information. Defaults to false. See the use https configuration directive. Do not set this to true if you are using a proxy server that forwards non-encrypted HTTP to your server; in that case, see the BEHINDHTTPSLOADBALANCER variable below.
  • DAHOSTNAME: Set this to the hostname by which web browsers can find docassemble. This is necessary for HTTPS to function. See the external hostname configuration directive.
  • USELETSENCRYPT: Set this to true if you are using Let’s Encrypt. The default is false. See the use lets encrypt configuration directive.
  • LETSENCRYPTEMAIL: Set this to the e-mail address you use with Let’s Encrypt. See the lets encrypt email configuration directive.
  • LOGSERVER: This is used in the multi-server arrangement where there is a separate server for collecting log messages. The default is none, which causes the server to run Syslog-ng. See the log server configuration directive.
  • REDIS: If you are running docassemble in a multi-server arrangement, set this to redis://thehostname where thehostname is the host name at which the Redis server can be accessed. See the redis configuration directive.
  • RABBITMQ: If you are running docassemble in a multi-server arrangement, set this to the URL at which the RabbitMQ server can be accessed, in the form pyamqp://user@rabbitmqserver.local// or pyamqp://user:password@rabbitmqserver.local//. Note that RabbitMQ is very particular about hostnames. If the RabbitMQ server is running on a machine on which the command hostname -s evaluates to rabbitmqserver.local, then your application servers will need to use rabbitmqserver.local as the hostname in the RABBITMQ URL, even if other names resolve to the same IP address. Note that if you run docassemble using the instructions in the scalability section, you may not need to worry about setting RABBITMQ. See the rabbitmq configuration directive.
  • DACELERYWORKERS: By default, the number of Celery workers is based on the number of CPUs on the machine. If you want to set a different value, set DACELERYWORKERS to an integer greater than or equal to 1. See the celery processes configuration directive.
  • SERVERADMIN: If your docassemble web server generates an error, the error message will contain an e-mail address that the user can contact for help. This e-mail address defaults to webmaster@localhost. You can set this e-mail address by setting the SERVERADMIN environment variable to the e-mail address you want to use. See the server administrator email configuration directive.
  • POSTURLROOT: If users access docassemble at https://docassemble.example.com/da, set POSTURLROOT to /da/. The trailing slash is important. If users access docassemble at https://docassemble.example.com, you can ignore this. The default value is /. See the root configuration directive.
  • BEHINDHTTPSLOADBALANCER: Set this to true if you are using a load balancer or proxy server that accepts connections in HTTPS and forwards them to your server or servers as HTTP. This lets docassemble know that when it forms URLs, it should use the https scheme even though requests appear to be coming in as HTTP requests, and when it sends cookies, it should set the secure flag on the cookies. You also need to make sure that your proxy server is setting the X-Forwarded-* HTTP headers when it passes HTTP requests to your server or servers. See the behind https load balancer configuration directive for more information.
  • XSENDFILE: Set this to false if the X-Sendfile header is not functional in your configuration for whatever reason. See the xsendfile configuration directive.
  • DAALLOWUPDATES: Set this to false if you want to disable the updating of software through the user interface. See the allow updates configuration directive.
  • DAUPDATEONSTART: Set this to false if you do not want the container to update its software using pip when it starts up. Set DAUPDATEONSTART to initial if you want the container to update its software during the first docker run, but not on every docker start. See the update on start configuration directive.
  • DAROOTOWNED: Set this to true if you are setting DAALLOWUPDATES=false and DAENABLEPLAYGROUND=false and you also want to take the extra step of making the directories containing code owned by root so that the web server user cannot modify them.
  • DAALLOWCONFIGURATIONEDITING: Set this to false to prevent the editing of the Configuration. See the allow configuration editing configuration directive.
  • DAENABLEPLAYGROUND: Set this to false to disable the Playground on the server. See the enable playground directive.
  • DAALLOWLOGVIEWING: Set this to false to prevent administrators and developers from viewing the system logs by going to Logs on the menu. By default, administrators and developers can access Logs.
  • DADEBUG: Set this to false if you want the server to be in production mode rather than developer mode. This will also disable access to example and demonstration interviews in the docassemble.base and docassemble.demo packages. See the debug and allow demo configuration directives.
  • TIMEZONE: You can use this to set the time zone of the server. The value of the variable is stored in /etc/timezone and dpkg-reconfigure -f noninteractive tzdata is run in order to set the system time zone. The default is America/New_York. See the timezone configuration directive.
  • LOCALE: You can use this to enable a locale on the server. The value needs to match an entry in /etc/locale.gen on Ubuntu. These are the locale values that Ubuntu/Debian recognizes. When the server starts, the value of LOCALE is appended to /etc/locale.gen and locale-gen and update-locale are run. The default is en_US.UTF-8 UTF-8. See the os locale configuration directive.
  • OTHERLOCALES: You can use this to set up other locales on the system besides the default locale. Set this to a comma separated list of locales. The values need to match entries in /etc/locale.gen on Ubuntu. See the other os locales configuration directive.
  • PACKAGES: If your interviews use code that depends on certain Ubuntu packages being installed, you can provide a comma-separated list of Ubuntu packages in the PACKAGES environment variable. The packages will be installed when the container is started. See the debian packages configuration directive.
  • PYTHONPACKAGES: If you want to install certain Python packages during the container start process, you can provide a comma-separated list of packages in the PYTHONPACKAGES environment variable. See the python packages configuration directive.
  • DASECRETKEY: The secret key for protecting against cross-site request forgery. See the secretkey configuration directive. If DASECRETKEY is not set, a random secret key will be generated.
  • DABACKUPDAYS: The number of days backups should be kept. The default is 14. See the backup days configuration directive.
  • DAEXPOSEWEBSOCKETS: You may need to set this to true if you are operating a Docker container behind a reverse proxy and you want to use the WebSocket-based live help features. See the expose websockets configuration directive.
  • DAWEBSOCKETSIP: You can set this if you need to manually specify the address on which the websockets service runs. See the websockets ip configuration directive.
  • DAWEBSOCKETSPORT: You can set this if you need to manually specify the port on which the websockets service runs. See the websockets port configuration directive.
  • PORT: By default, if you are not using HTTPS, the docassemble web application runs on port 80. When running Docker, you can map any port on the host to port 80 in the container. However, if you are using a system like Heroku which expects the Docker container to use the PORT environment variable, you can set PORT in your env.list file. See the http port configuration directive.
  • USEMINIO: Set this to true if you are setting S3ENDPOINTURL to point to MinIO and you would like the bucket to be created when the container starts. See the use minio configuration directive.
  • USECLOUDURLS: Set this to false if you are using cloud storage but you do not want URLs for files to point directly to the cloud storage provider. See the use cloud urls configuration directive.
  • DASTABLEVERSION: Set this to true if you want docassemble to stay on version 1.0.x. This is the stable branch of the GitHub repository, which only receives bug fixes and security updates. See the stable version configuration directive.
  • DASSLPROTOCOLS: This indicates the SSL protocols that NGINX should accept. The default is TLSv1.2. You might want to set it to TLSv1 TLSv1.1 TLSv1.2 if you need to support older browsers. The value is passed directly to the NGINX directive ssl_protocols. See the nginx ssl protocols configuration directive.
  • PIPINDEXURL: This controls the package index that pip uses. See the pip index url configuration directive.
  • PIPEXTRAINDEXURLS: This controls the extra package index sites that pip uses. See the pip extra index urls configuration directive.
  • ENVIRONMENT_TAKES_PRECEDENCE: It was noted above that once the configuration file is located in the persistent volume, S3, or Azure blob storage, the values in that configuration file will take precedence over any values specified in Docker environment variables. This is the default behavior; the Docker environment variables are useful for 1) telling the server where to find an existing configuration file; and 2) if no configuration file exists already, pre-populating the initial configuration file. However, if you set ENVIRONMENT_TAKES_PRECEDENCE to true, then docassemble will override values in the configuration file with the values of Docker environment variables if they conflict. Note that the YAML of the configuration file will not be altered; you will still see the same YAML when you go to edit the Configuration. However, internally, docassemble will override those values with the values of the Docker environment variables. Since it can be confusing to have dueling sources of configuration values, it is encouraged that you update the YAML of your Configuration to align with the values in your Docker environment. The ENVIRONMENT_TAKES_PRECEDENCE option is primarily used in the Kubernetes/Helm environment, where there are some Docker environment variables that cannot be known in advance.
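
A few of the constraints mentioned in the list above (DBPASSWORD cannot contain #, POSTURLROOT needs a trailing slash, and HTTPS needs DAHOSTNAME) are easy to get wrong in an env.list file. The following is a minimal sketch of a pre-flight check; the function names get_env_value and check_env_list are hypothetical and are not part of docassemble, and the check covers only these three rules.

```shell
# get_env_value FILE KEY: print the value of KEY in FILE.
# If KEY appears more than once, the last line is used (an arbitrary choice
# made for this sketch).
get_env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# check_env_list FILE: print one message per problem found; return 1 if any.
check_env_list() {
    file="$1"
    status=0
    case "$(get_env_value "$file" DBPASSWORD)" in
        *'#'*) echo "DBPASSWORD must not contain #"; status=1 ;;
    esac
    root="$(get_env_value "$file" POSTURLROOT)"
    if [ -n "$root" ]; then
        case "$root" in
            */) ;;
            *) echo "POSTURLROOT must end with /"; status=1 ;;
        esac
    fi
    if [ "$(get_env_value "$file" USEHTTPS)" = "true" ] \
        && [ -z "$(get_env_value "$file" DAHOSTNAME)" ]; then
        echo "USEHTTPS=true requires DAHOSTNAME"; status=1
    fi
    return "$status"
}
```

Running check_env_list env.list before docker run would print any of these problems and return a non-zero status if one is found.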

Changing the configuration

If you already have an existing docassemble installation and you want to run a new Docker container using it, but you want to change the configuration of the container, there are some things you will need to keep in mind.

The existing configuration file takes precedence over the environment variables that you set using Docker.

If you want to change the configuration, and the server is running, you can edit the configuration using the web interface.

If the server is not running, and you are using persistent volumes, you can use docker volume inspect to find the location of the persistent volume.

When docassemble starts up on a Docker container, it:

When docassemble stops, it saves the configuration file, a backup of the PostgreSQL database, and backups of the Let’s Encrypt configuration. If you are using persistent volumes, the information will be stored there. If you are using S3 or Azure blob storage, the information will be stored in the cloud.

When docassemble starts again, it will retrieve the configuration file, the backup of the PostgreSQL database, and backups of the Let’s Encrypt configuration from storage and use them for the container.

Suppose you have an existing installation that uses HTTPS and Let’s Encrypt, but you want to change the DAHOSTNAME. You will need to delete the saved configuration before running a new container. First, shut down the machine with docker stop -t 600. Then, if you are using S3, you can go to the S3 Console and delete the “letsencrypt.tar.gz” file. If you are using Azure blob storage, you can go to the Azure Portal and delete the “letsencrypt.tar.gz” file.

Also, if a configuration file exists on S3/Azure blob storage (config.yml) or in a persistent volume, then the values in that configuration will take precedence over the corresponding environment variables that are passed to Docker. Once a configuration file exists, you should make changes to the configuration file rather than passing environment variables to Docker. However, if your configuration is on S3/Azure blob storage, you will at least need to pass sufficient access keys (e.g., S3BUCKET, AZURECONTAINER, etc.) to access that storage; otherwise your container will not know where to find the configuration.
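
The precedence rule can be sketched as a small shell function. This is only a model of the documented behavior, including the ENVIRONMENT_TAKES_PRECEDENCE option listed among the setup parameters; the function name effective_value is hypothetical, and an empty first argument is used as a simplification to represent the case where no configuration file exists yet.

```shell
# effective_value STORED_VALUE ENV_VALUE
# STORED_VALUE is the value in the saved configuration file; pass an empty
# string to model the case where no configuration file exists yet.
effective_value() {
    stored="$1"
    from_env="$2"
    if [ -z "$stored" ]; then
        # Initial run: the environment variable pre-populates the configuration.
        printf '%s\n' "$from_env"
    elif [ "${ENVIRONMENT_TAKES_PRECEDENCE:-false}" = "true" ] && [ -n "$from_env" ]; then
        # The override applies internally; the stored YAML is not rewritten.
        printf '%s\n' "$from_env"
    else
        # Default behavior: the stored configuration wins.
        printf '%s\n' "$stored"
    fi
}
```

For example, with a stored timezone of America/New_York and TIMEZONE=America/Chicago in the environment, the stored value is used unless ENVIRONMENT_TAKES_PRECEDENCE is true.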

Also, there are some environment variables that do not exist in the configuration file because they are specific to the individual server being started. These include the CONTAINERROLE and SERVERHOSTNAME environment variables.

Data storage

Docker containers are volatile. They are designed to be run, turned off, and destroyed. When using Docker, the best way to upgrade docassemble to a new version is to destroy and rebuild your containers.

But what about your data? If you run docassemble, you are accumulating valuable data in SQL, in files, and in Redis. If your data are stored on the Docker container, they will be destroyed by docker rm.

There are two ways around this problem. The first, and preferable, solution is to use an object storage service. The standard-setting object storage service is Amazon Web Services’s S3. If you use AWS, you can create an S3 bucket for your data, and then when you launch your docassemble container, set the S3BUCKET, S3ACCESSKEY, S3SECRETACCESSKEY, and S3REGION environment variables.

If you don’t want to use Amazon Web Services, you can use an S3-compatible object storage service by setting S3ENDPOINTURL to the URL of the service, along with the S3BUCKET, S3ACCESSKEY, and S3SECRETACCESSKEY environment variables. There are S3-compatible object storage services available for Google Cloud, Wasabi, Linode, Vultr, Digital Ocean, IBM Cloud, Oracle Cloud, Scaleway, Exoscale, and others. If you are operating an on-premises server, you can deploy MinIO (MinIO is configured by default if you deploy docassemble with Kubernetes) or Rook.

In addition to S3 and S3-compatible object storage, docassemble supports Azure blob storage. You can create a blob storage container inside Microsoft Azure and then when you launch your container, you set the AZUREACCOUNTNAME, AZUREACCOUNTKEY, and AZURECONTAINER environment variables.

When docker stop -t 600 is run, docassemble will back up the SQL database, the Redis database, the configuration, and your uploaded files to the S3 bucket or blob storage container. Then, when you issue a docker run command with environment variables pointing docassemble to your S3 bucket/Azure blob storage resource, docassemble will restore from the backup. You can docker rm your container and your data will persist in the cloud.

The second method of persistent storage is to use persistent volumes, which is a feature of Docker. This will store the data in directories on the Docker host, so that when you destroy the container, these directories will be untouched, and when you start up a new container, it will use the saved directories. This feature is only available if you are running docassemble in a single-server configuration.

These two options are explained in the following subsections.

If you are operating a development server in a single-server configuration, and you will be using the Playground, using persistent volumes is recommended. When a development server uses S3, every time a Playground file is accessed, it must be copied from S3 to the server. This negatively impacts performance.

Using S3 or S3-compatible

To use S3 (or an S3-compatible service) for persistent storage, you need to obtain credentials and create a bucket.

If you want to use Amazon Web Services, first sign up for an AWS account, go to the S3 Console, click “Create Bucket,” and pick a name. If your site is at docassemble.example.com, a good name for the bucket is docassemble-example-com. (Client software will have trouble accessing your bucket if it contains . characters.) Under “Region,” pick the region nearest you. Then you need to obtain an access key and a secret access key for S3. To obtain these credentials, go to the IAM Console and create a user with “programmatic access.” Under “Attach existing policies directly,” find the policy called AmazonS3FullAccess and attach it to the user.

When you run your docassemble Docker container, set the configuration options S3BUCKET, S3ACCESSKEY, S3SECRETACCESSKEY, and S3REGION. For example, you might use an env.list file such as:

DAHOSTNAME=interviews.example.com
S3ENABLE=true
S3BUCKET=interviews-example-com
S3ACCESSKEY=YERWERGDFSGERGSDFGSW
S3SECRETACCESSKEY=WERWR36dddeg3udjfRT1+rweRTHRTookiMVASDAS
S3REGION=us-east-2
TIMEZONE=America/New_York
USEHTTPS=true
EC2=true
USELETSENCRYPT=true
LETSENCRYPTEMAIL=admin@example.com

Note that if you run docassemble on EC2, you can launch your EC2 instances with an IAM role that allows docassemble to access an S3 bucket without the necessity of setting S3ACCESSKEY and S3SECRETACCESSKEY. In this case, the only environment variable you need to pass is S3BUCKET.

If you are using an S3-compatible object storage service, you will need to set S3ENDPOINTURL to the URL endpoint of your service, which you can find in the service’s documentation or in your account settings. You likely will not need to set S3REGION unless the service supports the “region” concept.
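
As an illustration, an env.list file for an S3-compatible service could be written as follows. Every value is a placeholder (the endpoint URL reuses the example given for S3ENDPOINTURL), and S3REGION is left out on the assumption that the service does not use regions.

```shell
# Write an env.list for an S3-compatible service. All values are placeholders.
cat > env.list <<'EOF'
DAHOSTNAME=interviews.example.com
S3ENABLE=true
S3BUCKET=interviews-example-com
S3ACCESSKEY=YOURACCESSKEY
S3SECRETACCESSKEY=YOURSECRETACCESSKEY
S3ENDPOINTURL=https://mys3service.com
TIMEZONE=America/New_York
EOF
```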

These secret access keys will become available to all developers who use your docassemble server, since they are in the configuration file.

If you are using AWS and you want to limit access to a particular bucket, you do not have to use the AmazonS3FullAccess policy when obtaining S3 credentials. Instead, you can create your own policy with the following definition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::docassemble-example-com"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::docassemble-example-com/*"
            ]
        }
    ]
}

Replace docassemble-example-com in the above text with the name of your S3 bucket.
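
To avoid editing the bucket name by hand in two places, the policy can be generated with a small shell function. This is a convenience sketch; the function name s3_policy is hypothetical, and the output is the same policy shown above with the bucket name substituted.

```shell
# s3_policy BUCKET: print the bucket-limited policy with BUCKET substituted.
s3_policy() {
    bucket="$1"
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${bucket}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${bucket}/*"
            ]
        }
    ]
}
EOF
}
```

For example, s3_policy interviews-example-com > policy.json writes the policy to a file, which you could then paste into the IAM Console or, if you use the AWS command line tools, pass to aws iam create-policy with --policy-document file://policy.json and a policy name of your choosing.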

When setting up your S3 bucket, you should consider setting up a backup mechanism using the S3 provider’s site. When uploaded files are stored in S3, docassemble does not include those files in its rotating backups.

Using Microsoft Azure

Using Microsoft Azure is very similar to using S3. From the Azure Portal dashboard, search for “Storage accounts” in the “Resources.” Click “Add” to create a new storage account. Under “Account kind,” choose “BlobStorage.” Under “Access tier,” you can choose either “Cool” or “Hot,” but you may have to pay more for “Hot.”

Once the storage account is created, go into your “Blobs” service in the storage account and click “+ Container” to add a new container. Set the “Access type” to “Private.” The name of the container corresponds with the AZURECONTAINER environment variable. Back at the storage account, click “Access keys.” The “Storage account name” corresponds with the environment variable AZUREACCOUNTNAME. The “key1” corresponds with the AZUREACCOUNTKEY environment variable. (You can also use “key2.”) For example, you might use an env.list file such as:

DAHOSTNAME=interviews.example.com
AZUREENABLE=true
AZUREACCOUNTNAME=exampledotcom
AZUREACCOUNTKEY=98f89asdfjwew/YosdfojweafASDFErgergDFGsergagaweWRTIQgqERGs243rergE4534tERgEFBDRGferEEB==
AZURECONTAINER=interviews-example-com
TIMEZONE=America/New_York
USEHTTPS=true
USELETSENCRYPT=true
LETSENCRYPTEMAIL=admin@example.com

If you enable both S3 and Azure blob storage, only S3 will be used.

When setting up Azure blob storage, you should consider setting up a backup mechanism in Azure. When uploaded files are stored in Azure blob storage, docassemble does not include those files in its rotating backups.

Using persistent volumes

To run docassemble in a single-server arrangement in such a way that the configuration, the Playground files, the uploaded files, and other data persist after the Docker container is removed or updated, run the image as follows:

docker run --env-file=env.list \
-v dabackup:/usr/share/docassemble/backup \
-d -p 80:80 -p 443:443 --stop-timeout 600 \
jhpyle/docassemble

where --env-file=env.list is an optional parameter that refers to a file env.list containing environment variables for the configuration. A template for the env.list file is included in the distribution.

An advantage of using persistent volumes is that you can completely replace the docassemble container and rebuild it from scratch, and when you run the jhpyle/docassemble image again, docassemble will keep running where it left off.

If you are using HTTPS with your own certificates (as opposed to using Let’s Encrypt), or you need to provide other SSL certificates to docassemble (for example, for PostgreSQL and/or Redis encryption) you can use a persistent volume to provide the certificates to the Docker container; add -v dacerts:/usr/share/docassemble/certs to your docker run command. For more information on creating a persistent volume for SSL certificates, see below.

To see what volumes exist on your Docker system, you can run:

docker volume ls

A volume will be created when you run docker run with the -v option. For example, the docker run command above specified -v dabackup:/usr/share/docassemble/backup. If a volume called dabackup does not already exist, it will be created, and the jhpyle/docassemble container will initialize its contents. The dabackup volume will be associated with the directory /usr/share/docassemble/backup inside the container. The docker volume ls command lists volumes, not the files inside them; to list the files inside of a volume, you can mount the volume in a temporary container and run ls there. docker cp can be used to copy files from the host to the /usr/share/docassemble/backup folder of a container.

You might want to initialize a volume before starting your docassemble server, for example in order to provide certificates to docassemble. For example, to create a volume called dacerts, you can docker run the minimal busybox container with the volume mounted to /data:

docker run -v dacerts:/data --name deleteme busybox true

Then you can use docker cp to copy files to it:

docker cp redis.crt deleteme:/data/
docker cp redis.key deleteme:/data/

Then you can delete the BusyBox container, as it is no longer needed. (The volume dacerts will not be deleted.)

docker rm deleteme

Now that the dacerts volume has already been created on the host, if you do:

docker run --env-file=env.list \
-v dabackup:/usr/share/docassemble/backup \
-v dacerts:/usr/share/docassemble/certs \
-d -p 80:80 -p 443:443 --stop-timeout 600 \
jhpyle/docassemble

then the directory /usr/share/docassemble/certs will already be prepopulated when docassemble starts.

If you want to see the files in the dacerts volume, you can mount it in a temporary BusyBox container and list the directory:

docker run --rm -v dacerts:/data busybox ls -l /data

If you want to delete the dacerts volume, do:

docker volume rm dacerts

To delete all of the volumes that are not in use by any container, do:

docker volume rm $(docker volume ls -qf dangling=true)

Docker volumes are powerful but complicated. If you want to use them, you can read about them in the Docker documentation and other places on the internet.

It is recommended that you do not create Docker volumes for directories other than /usr/share/docassemble/certs and /usr/share/docassemble/backup. If you try to mount other directories as volumes, you might experience hard-to-debug problems. Depending on the architecture, the directories might cause the container to malfunction because certain file system features that the software depends on are not available. If you create a volume for a directory, your directory will supplant the directory that is present in the Docker image, so unless you populate the directory with the same files that the image provides, your docassemble server may be non-functional.

Ultimately, the better data storage solution is to use cloud storage (S3, Azure blob storage) because:

  1. S3 and Azure blob storage make scaling easier. They are the “cloud” way of storing persistent data, at least until cloud-based network file systems become more robust.
  2. It is easier to upgrade your virtual machines to the latest software and operating system if you can just destroy them and recreate them, rather than running update scripts. If your persistent data is stored in the cloud, you can destroy and recreate virtual machines at will, without ever having to worry about copying your data on and off the machines.

However, you can get around the second problem by using docker volume create to put your Docker volume on a separate drive. That way, you could remove the virtual machine that runs the application, along with its primary drive, without affecting the drive with the docassemble data.
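One way to do this is sketched below, assuming the separate drive is already mounted on the host at a path such as /mnt/dadata (a hypothetical location). The sketch relies on the bind options of Docker's built-in local volume driver; the function name create_bound_volume and the DRY_RUN switch are illustrative only.

```shell
# Create a named volume whose data lives in a directory on another drive.
# Assumption: the drive is already mounted on the host (e.g. at /mnt/dadata).
# Set DRY_RUN=1 to print the docker command instead of running it.
create_bound_volume() {
    vol_name="$1"
    vol_dir="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker volume create --driver local --opt type=none --opt o=bind --opt device=$vol_dir $vol_name"
        return 0
    fi
    # A bind-backed volume needs its target directory to exist already.
    mkdir -p "$vol_dir" || return 1
    docker volume create --driver local \
        --opt type=none --opt o=bind --opt device="$vol_dir" "$vol_name"
}

# Usage (hypothetical path):
#   create_bound_volume dabackup /mnt/dadata/dabackup
```

A volume created this way can then be passed to docker run as -v dabackup:/usr/share/docassemble/backup in the usual manner.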

Recovery from backup files

When you are using data storage, you can do docker stop -t 600 on a container, followed by docker rm, and then re-run your original docker run command. When the system starts again, it will be in the same place it was before, with the same uploaded files and the same SQL database.

During the docker stop process, application data are saved into files and directories in the data storage area. During docker run (and docker start as well), application data are restored from the data storage area before the server attempts to start. Included in the application data is the list of Python packages installed on your system; when the server starts, pip will be used to install the same list of packages.

This backup-on-shutdown/restore-on-startup feature is very powerful because it means you can shut down, delete your Docker container, pull a new Docker image, and then re-run docker run, and all of your application data and Python packages will be restored. Between the old Docker image and the new Docker image, the versions of the operating system, PostgreSQL, and Python might have changed, but the restore process will adjust for this.

However, if your server has an unsafe shutdown, the files in the data storage area might be corrupted. They might also be missing or very old (dating from the last time there was a safe shutdown). If this happens, not all is lost, because you can restore from one of the daily backup snapshots.

If you are using S3 or Azure blob storage, the files and directories that are saved during the shutdown process are:

  • postgres - a folder containing a “dump” of each database hosted by the PostgreSQL server. Usually the operative file is called docassemble, for the database called docassemble. If you point your server to an external database using the db section of your Configuration, this is not applicable. The backup file will exist, but it will be an empty database.
  • redis.rdb - a file containing a backup of the Redis database. If you point your server to an external Redis database using a redis directive in your Configuration, this is not applicable. The redis.rdb file will exist, but it will be an empty database.
  • log - a folder containing docassemble log files.
  • nginxlogs - a folder containing the logs for NGINX. If you are using Apache, the relevant folder is apachelogs. This is not applicable unless the CONTAINERROLE is all.

The files folder, the config.yml file, and the letsencrypt.tar.gz file (if Let’s Encrypt is used) are important for restoring the system on startup, but they are always up-to-date; they are not copied from the server during the shutdown process. So even if you have an unsafe shutdown, you will have up-to-date versions of files, config.yml, and letsencrypt.tar.gz.

If you are not using S3 or Azure blob storage, then during a safe shutdown process, application data is saved into the following files and folders:

  • /usr/share/docassemble/backup/postgres - a folder containing a “dump” of each database hosted by the PostgreSQL server. Usually the operative file is called docassemble, for the database called docassemble. If you point your server to an external database using the db section of your Configuration, this is not applicable. The backup file will exist, but it will be an empty database.
  • /usr/share/docassemble/backup/redis.rdb - a file containing a backup of the Redis database. If you point your server to an external Redis database using a redis directive in your Configuration, this is not applicable. The redis.rdb file will exist, but it will be an empty database.
  • /usr/share/docassemble/backup/files - a directory containing all of the stored files in your system (document uploads, assembled documents, ZIP files for installed packages, etc.). If backup file storage in the Configuration is set to false, then this will not exist.
  • /usr/share/docassemble/backup/log - a folder containing docassemble log files.
  • /usr/share/docassemble/backup/nginxlogs - a folder containing the logs for NGINX. If you are using Apache, the relevant folder is /usr/share/docassemble/backup/apachelogs. This is not applicable unless the CONTAINERROLE is all.
  • /usr/share/docassemble/backup/config/config.yml - a file containing the Configuration of your system.

The file /usr/share/docassemble/backup/letsencrypt.tar.gz is important for restoring the system (if Let’s Encrypt is used), but it is always up-to-date; it is not copied from the server during the shutdown process.

Whenever a docassemble container starts up, the PostgreSQL database in postgres/docassemble is used to restore docassemble’s SQL database. The redis.rdb file is used to restore the Redis database. These files are created during the shutdown process. It is important that the shutdown process happens gracefully, because otherwise these files will not be complete.

As protection against the risk of an unsafe shutdown (as well as the risk of the accidental deletion of data), docassemble maintains a daily rotating backup. The daily backup is created whenever the daily cron job runs (which is typically around 6:00 in the morning).

If you are using S3 or Azure blob storage, these backups are in the backup folder in the cloud storage. There is a subfolder in the backup folder for each container that has used the cloud storage area. The subfolder names come from the internal hostnames of containers. In a multi-server arrangement, you will see several subfolders. You may also see several subfolders if you have called docker run multiple times. Within a subfolder for a container, there are subfolders for each day for which there is a backup. The folders are in the format MM-DD where MM is the month and DD is the day of the month. If you want to restore your system to a snapshot of where it was when a daily backup was made, you will need to shut down your server(s) with docker stop -t 600 if it is still running. Then you will need to copy files from the daily backup location to the places where they will be used when the system starts up again. In particular, you will copy the following out of the daily backup folder:

  • config/config.yml to config.yml in the root of the cloud storage.
  • files to files in the root of the cloud storage.
  • postgres to postgres in the root of the cloud storage.
  • redis.rdb to redis.rdb in the root of the cloud storage.
  • log to log in the root of the cloud storage.

Copying log is optional. The contents of log files are not critical to the functionality of the systems.

Note that when you are using S3 or Azure blob storage, the stored files (uploaded files, assembled documents) are already in the cloud, so they are not backed up to the backup folder. If you want a backup mechanism for these files, you can enable one through the S3 or Azure blob storage site.

If you are not using S3 or Azure blob storage, the disaster recovery backup files are in folders named /usr/share/docassemble/backup/MM-DD where MM is the month and DD is the day the backup was made. If you want to restore your system to a snapshot of where it was when a daily backup was made, you will first need to shut down your server with docker stop -t 600 if it is still running. Then you will need to copy files from the daily backup location to the places where they will be used when the system starts up again. In particular, you will copy the following out of the daily backup folder:

  • the config/config.yml file to /usr/share/docassemble/backup/config.yml
  • the files folder to /usr/share/docassemble/backup/files
  • the postgres folder to /usr/share/docassemble/backup/postgres
  • the redis.rdb file to /usr/share/docassemble/backup/redis.rdb
  • the log folder to /usr/share/docassemble/backup/log

Copying log is optional. The contents of log files are not critical to the functionality of the systems.
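The copying steps for the non-cloud case can be sketched as a small shell function. This is only an illustration: the name restore_snapshot is hypothetical, and the path you pass as the backup root is an assumption about your setup (on the host, the location of a named volume is reported by docker volume inspect). The destinations follow the list above.

```shell
# Copy one daily snapshot back into the locations read at startup.
# $1: backup root (inside the container: /usr/share/docassemble/backup)
# $2: snapshot folder name in MM-DD form, e.g. 03-15
# Run this only while the docassemble container is stopped.
restore_snapshot() {
    rs_root="$1"
    rs_snap="$1/$2"
    if [ ! -d "$rs_snap" ]; then
        echo "no snapshot at $rs_snap" >&2
        return 1
    fi
    cp "$rs_snap/config/config.yml" "$rs_root/config.yml" || return 1
    cp "$rs_snap/redis.rdb" "$rs_root/redis.rdb" || return 1
    # Replace each folder wholesale so no stale files survive the restore.
    # Folders missing from the snapshot (log is optional) are left alone.
    for rs_dir in files postgres log; do
        if [ -d "$rs_snap/$rs_dir" ]; then
            rm -rf "$rs_root/$rs_dir"
            cp -Rp "$rs_snap/$rs_dir" "$rs_root/$rs_dir" || return 1
        fi
    done
}

# Usage (hypothetical path and date):
#   restore_snapshot /usr/share/docassemble/backup 03-15
```

Removing each destination folder before copying is a design choice of this sketch; it keeps files that were created after the snapshot from mixing with the restored data.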

After copying these files into place, you can start your server(s) with docker run (using the same parameters you originally used) or docker start.

Multi-server arrangement

Services on different machines

The docassemble application consists of several services, some of which are singular and some of which can be plural.

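The singular services include:

  • SQL server
  • Redis server
  • RabbitMQ server
  • the docassemble log message aggregator

The (potentially) plural services include:

  • docassemble web servers
  • Celery nodes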

The docassemble Docker container will run any subset of these six services, depending on the value of the environment variable CONTAINERROLE, which is passed to the container at startup. In a single-server arrangement (CONTAINERROLE = all, or left undefined), the container runs all of the services (except the log message aggregator, which is not necessary in the case of a single-server arrangement).

You can run docassemble in a multi-server arrangement using Docker by running the docassemble image on different hosts using different configuration options.

In a multi-server arrangement, you can have one machine run SQL, another machine run Redis and RabbitMQ, and any number of machines run web servers and Celery nodes. You can decide how to allocate services to different machines. For example, you might want to run central tasks on a powerful server, while running many web servers on less powerful machines.

Since the SQL, Redis, and RabbitMQ services are standard services, they do not have to be run from docassemble Docker containers. For example, if you are already running a SQL server, a Redis server, and a RabbitMQ server, you could just point docassemble to those resources.

To change the SQL server that docassemble uses, edit the DBHOST, DBNAME, DBUSER, DBPASSWORD, DBPREFIX, DBPORT, and DBTABLEPREFIX configuration options.

To change the Redis server that docassemble uses, edit the REDIS configuration option.

To change the RabbitMQ server that docassemble uses, edit the RABBITMQ configuration option.

If you are only using a single Docker container to run the docassemble web application and Celery, then even if you are using an external SQL server, external Redis server, and external RabbitMQ server, you can keep the CONTAINERROLE as all (or undefined).

Port opening

Note that for every service that a Docker container provides, appropriate ports need to be forwarded from the Docker host machine to the container.

For example:

docker run \
--env CONTAINERROLE=sql:redis \
...
-d -p 5432:5432 -p 6379:6379 -p 9001:9001 \
--stop-timeout 600 jhpyle/docassemble

docker run \
--env CONTAINERROLE=web:celery \
...
-d -p 80:80 -p 443:443 -p 9001:9001 \
--stop-timeout 600 jhpyle/docassemble

Note that Docker will fail if any of these ports is already in use. For example, many Linux distributions run a mail transport agent on port 25 by default; you will have to stop that service in order to start Docker with -p 25:25. On Amazon Linux, you may need to run:

sudo /etc/init.d/sendmail stop
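As a rough aid for working out which -p options go with which roles, the following sketch prints them for a given CONTAINERROLE value. The function name ports_for_roles is hypothetical, and the role-to-port table is an assumption inferred from the examples in this section and from the standard ports of the underlying services; it is not a list taken from docassemble itself.

```shell
# Print the -p options for a colon-separated CONTAINERROLE value.
# The role-to-port table below is an assumption; adjust it to your setup.
ports_for_roles() {
    pf_out=""
    for pf_role in $(echo "$1" | tr ':' ' '); do
        case "$pf_role" in
            sql)      pf_ports="5432" ;;
            redis)    pf_ports="6379" ;;
            rabbitmq) pf_ports="4369 5671 5672 25672" ;;
            mail)     pf_ports="25" ;;
            log)      pf_ports="514" ;;
            web)      pf_ports="80 443" ;;
            *)        pf_ports="" ;;   # e.g. celery, cron: nothing published
        esac
        for pf_p in $pf_ports; do
            pf_out="$pf_out -p $pf_p:$pf_p"
        done
    done
    # Every example in this section also publishes port 9001.
    pf_out="$pf_out -p 9001:9001"
    echo "${pf_out# }"
}
```

For instance, ports_for_roles sql:redis prints -p 5432:5432 -p 6379:6379 -p 9001:9001, which matches the first docker run example above.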

File sharing

If you run multiple docassemble Docker containers on different machines, the containers will need to have a way to share files with one another.

One way to share files among containers is to make /usr/share/docassemble/ a persistent volume on a network file system. This directory contains the configuration, SSL certificates, Python virtual environment, and uploaded files. However, network file systems present problems.

A preferable way to share files is with Amazon S3 or Azure blob storage, which docassemble supports. See the using S3 and using Azure blob storage sections for instructions on setting this up.

Configuration file

Note that when you use the cloud (S3 or Azure blob storage) for data storage, docassemble will copy the config.yml file out of the cloud on startup, and save config.yml to the cloud whenever the configuration is modified.

This means that as long as there is a config.yml file in the cloud with the configuration you want, you can start docassemble containers without specifying a lot of configuration options; you simply have to refer to your cloud storage bucket/container, and docassemble will take it from there. For example, to run a central server, you can do:

docker run \
--env CONTAINERROLE=sql:redis:rabbitmq:log:cron:mail \
--env S3BUCKET=docassemble-example-com \
--env S3ACCESSKEY=FWIEJFIJIDGISEJFWOEF \
--env S3SECRETACCESSKEY=RGERG34eeeg3agwetTR0+wewWAWEFererNRERERG \
-d -p 80:8080 -p 25:25 -p 5432:5432 -p 514:514 \
-p 6379:6379 -p 4369:4369 -p 5671:5671 \
-p 5672:5672 -p 25672:25672 -p 9001:9001 \
--stop-timeout 600 jhpyle/docassemble

To run an application server, you can do:

docker run \
--env CONTAINERROLE=web:celery \
--env S3BUCKET=docassemble-example-com \
--env S3ACCESSKEY=FWIEJFIJIDGISEJFWOEF \
--env S3SECRETACCESSKEY=RGERG34eeeg3agwetTR0+wewWAWEFererNRERERG \
-d -p 80:80 -p 443:443 -p 9001:9001 \
--stop-timeout 600 jhpyle/docassemble

Encrypting communications

Using HTTPS

If you are running docassemble on EC2, the easiest way to enable HTTPS support is to set up an Application Load Balancer that accepts connections in HTTPS format and forwards them to the web servers in HTTP format. In this configuration Amazon takes care of creating and hosting the necessary SSL certificates.

If you are not using a load balancer, you can use HTTPS either by setting up Let’s Encrypt or by providing your own certificates.

With Let’s Encrypt

If you are running docassemble in a single-server arrangement, or in a multi-server arrangement with only one web server, you can use Let’s Encrypt to enable HTTPS. If you have more than one web server, you can enable encryption without Let’s Encrypt by installing your own certificates.

To use Let’s Encrypt, set the following environment variables in your task definition or env.list file:

  • USELETSENCRYPT: set this to true.
  • LETSENCRYPTEMAIL: Let’s Encrypt requires an e-mail address, which it will use to get in touch with you about renewing the SSL certificates.
  • DAHOSTNAME: set this to the hostname that users will use to get to the web application. Let’s Encrypt needs this in order to verify that you have access to the host.
  • USEHTTPS: set this to true.

For example, your env.list may look like:

CONTAINERROLE=all
USEHTTPS=true
DAHOSTNAME=docassemble.example.com
USELETSENCRYPT=true
LETSENCRYPTEMAIL=<your e-mail address>
TIMEZONE=America/New_York
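Before starting the container, it can be handy to confirm that the env.list file actually sets the four variables listed above. The following is a minimal sketch; the function name check_letsencrypt_env is hypothetical, and the check looks only at the file's text, not at whether the hostname or e-mail address is valid.

```shell
# Check that an env.list file sets the four Let's Encrypt variables.
# Prints one line per problem and returns non-zero if any are found.
check_letsencrypt_env() {
    ce_file="$1"
    ce_status=0
    for ce_var in USELETSENCRYPT USEHTTPS; do
        if ! grep -q "^$ce_var=true\$" "$ce_file"; then
            echo "$ce_var must be set to true"
            ce_status=1
        fi
    done
    for ce_var in LETSENCRYPTEMAIL DAHOSTNAME; do
        # Require a non-empty value after the equals sign.
        if ! grep -q "^$ce_var=." "$ce_file"; then
            echo "$ce_var is missing or empty"
            ce_status=1
        fi
    done
    return $ce_status
}

# Usage:
#   check_letsencrypt_env env.list && echo "env.list looks complete"
```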

The first time the server is started, the letsencrypt utility will be run, which will change the NGINX configuration in order to use the appropriate SSL certificates. When the server is later restarted, the letsencrypt renew command will be run, which will refresh the certificates if they are within 30 days of expiring.

In addition, a script will run on a weekly basis to attempt to renew the certificates.

If you are using a multi-server arrangement with a single web server, you need to run the cron role on the same server that runs the web role. If you use the e-mail receiving feature with TLS encryption, the mail role also has to share the server with the web and cron roles.

Using your own certificates

If you do not want to use Let’s Encrypt to support HTTPS, or you have other SSL certificates that your server needs to use, you can pass your certificates to docassemble.

Using your own SSL certificates with Docker requires that your SSL certificates reside within each container. There are several ways to get your certificates into the container:

  • Use S3 or Azure blob storage and upload the certificates to certs/ in your bucket/container.
  • Build your own private image in which your SSL certificates are placed in Docker/ssl. During the build process, these files will be copied into /usr/share/docassemble/certs.
  • Use persistent volumes and copy the SSL certificate files into the volume for /usr/share/docassemble/certs before starting the container.

The default NGINX configuration file expects SSL certificates to be located in the following files:

ssl_certificate /etc/ssl/docassemble/nginx.crt;
ssl_certificate_key /etc/ssl/docassemble/nginx.key;

The meaning of these files is as follows:

  • nginx.crt: this file is generated by your certificate authority when you submit a certificate signing request.
  • nginx.key: this file is generated at the time you create your certificate signing request.
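For readers who have not created a certificate signing request before, the following sketch shows one common way to do it with the openssl command-line tool, along with a check that a certificate returned by the certificate authority belongs to the key. The function names, the 2048-bit RSA key size, and the nginx.csr filename are illustrative choices, not requirements of docassemble.

```shell
# Create nginx.key and nginx.csr in the current directory for hostname $1.
# The key size and the .csr filename are illustrative choices.
make_csr() {
    openssl req -new -newkey rsa:2048 -nodes \
        -keyout nginx.key -out nginx.csr -subj "/CN=$1"
}

# Succeed only if certificate $1 and private key $2 share a public key.
cert_matches_key() {
    cm_cert=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    cm_key=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cm_cert" = "$cm_key" ]
}

# Usage (hypothetical hostname):
#   make_csr docassemble.example.com
#   cert_matches_key nginx.crt nginx.key && echo "certificate matches key"
```

Running the second check before copying files into place can catch the case where a certificate was issued for a different key.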

Other certificate files that docassemble uses include:

  • exim.crt - certificate for the Exim mail daemon
  • exim.key - certificate key for Exim mail daemon
  • redis.crt - certificate for an external Redis server
  • redis.key - certificate key for an external Redis server
  • redis_ca.crt - certificate authority certificate for an external Redis server

In addition, if your db configuration refers to an ssl cert, ssl key, or ssl root cert, these need to be the names of files that are present in certificate storage.

In order to make sure that these certificates are replicated on every web server, the supervisor will run the docassemble.webapp.install_certs module before starting the web server.

If you are using S3 or Azure blob storage, this module will copy the files from the certs/ prefix in your bucket/container to /etc/ssl/docassemble. You can use the S3 Console or the Azure Portal to create a folder called certs and upload your certificate files into that folder.

If you are not using S3 or Azure blob storage, the docassemble.webapp.install_certs module will copy the files from /usr/share/docassemble/certs to /etc/ssl/docassemble.

There are two ways that you can put your own certificate files into the /usr/share/docassemble/certs directory. The first way is to create your own Docker image of docassemble and put your certificates into the Docker/ssl directory. The contents of this directory are copied into /usr/share/docassemble/certs during the build process.

The second way is to use persistent volumes. If you have a persistent volume for the directory /usr/share/docassemble/certs, you can copy the SSL certificate files into that directory before starting the container.

If you are starting a new server using a persistent volume, you can set up your own certificates as follows.

Create an env.list file like the following:

DAHOSTNAME=myserver.example.com
TIMEZONE=America/New_York
USEHTTPS=true

(Of course, you may also need additional environment variables, such as EC2, depending on your setup.)

Create a Docker volume called dacerts using a temporary container based on the minimal BusyBox image:

docker run -v dacerts:/data --name deleteme busybox true

Now copy your SSL certificate files to the volume.

docker cp nginx.crt deleteme:/data/
docker cp nginx.key deleteme:/data/

You may want to copy other certificates as well, for example for PostgreSQL or Redis.

Now delete the BusyBox container. (Your volume will not be deleted.)

docker rm deleteme

Now start your docassemble container using the dacerts volume mounted to /usr/share/docassemble/certs:

docker run \
  --stop-timeout 600 \
  --env-file=env.list \
  -v dabackup:/usr/share/docassemble/backup \
  -v dacerts:/usr/share/docassemble/certs \
  -d -p 80:80 -p 443:443 jhpyle/docassemble

When it comes time to update your NGINX certificate files, save the new certificates as nginx.crt and nginx.key, and then do:

docker cp nginx.crt a3970318cb38:/usr/share/docassemble/certs/
docker cp nginx.key a3970318cb38:/usr/share/docassemble/certs/

replacing a3970318cb38 with whatever the ID or name of your container is.

Then restart your container:

docker stop -t 600 a3970318cb38
docker start a3970318cb38

This last step is important; the location /usr/share/docassemble/certs/ is not a working directory, but a staging area. If the server is running, changing the files in that directory will not change the certificates that docassemble uses. You need to stop and start the container for docassemble.webapp.install_certs to copy the files to the correct working directories and for the services to restart using the new certificates.
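To know when it is time to update the files, you can ask openssl whether a certificate is close to expiring. The sketch below is only an illustration; the function name cert_expires_soon and the 30-day default are assumptions of the example.

```shell
# Succeed if the certificate in $1 expires within $2 days (default 30).
# Returns 2 if the file does not exist.
cert_expires_soon() {
    [ -f "$1" ] || return 2
    cx_seconds=$(( ${2:-30} * 86400 ))
    # -checkend exits 0 when the certificate will still be valid after
    # that many seconds, so the result is inverted here.
    ! openssl x509 -in "$1" -noout -checkend "$cx_seconds" >/dev/null
}

# Usage:
#   cert_expires_soon nginx.crt 30 && echo "time to renew nginx.crt"
```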

If you want to use different filesystem or cloud locations, the docassemble.webapp.install_certs module can be configured to use different locations. See the configuration variable certs.

Using TLS for incoming e-mail

If you use the e-mail receiving feature, you can use TLS to encrypt incoming e-mail communications. By default, docassemble will install self-signed certificates into the Exim configuration, but for best results you should use certificates that match your incoming mail domain.

If you are using Let’s Encrypt to obtain your HTTPS certificates in a single-server arrangement, then docassemble will use your Let’s Encrypt certificates for Exim.

However, if you are running your mail server as part of a dedicated backend server that does not include web, you will need to create and install your own certificates. In addition, if your incoming mail domain is different from your external hostname (DAHOSTNAME), then you will also need to install your own certificates.

The process of installing your own Exim certificates is very similar to the process of installing HTTPS certificates.

If you are using S3 or Azure blob storage, copy your certificate and private key to the certs folder of your S3 bucket or Azure blob storage container, using the filenames exim.crt and exim.key, respectively.

If you are not using S3 or Azure blob storage, save these files as:

  • /usr/share/docassemble/certs/exim.crt (certificate)
  • /usr/share/docassemble/certs/exim.key (private key)

On startup, docassemble.webapp.install_certs will copy these files into the appropriate location (/etc/exim4) with the appropriate ownership and permissions.

Using a web server and a reverse proxy

Instead of having users access your docassemble interviews at https://da.foobar.com, where the DNS for da.foobar.com points to your docassemble server, and SSL certificates are obtained by your docassemble container, you can have your users access your docassemble interviews at https://foobar.com/da, where the DNS for foobar.com points to a web server you operate, and that web server acts as a go-between between the user’s web browser and the docassemble server. The docassemble server may operate on the same machine that runs your web server, or on a different machine. The machine that operates the docassemble server does not have to be exposed to the internet; it might be on a local network, so long as the web server can access it.

You should use this deployment strategy if you wish to embed a docassemble interview into another site using an <iframe>. In the past, using an <iframe> was a convenient way to allow HTML content from a different server to appear inside your server. However, in recent years, web browsers have become more restrictive about Cross-Origin Resource Sharing. Browsers like Safari will block <iframe> content that stores information in the user’s browser if the URL of the <iframe> uses a hostname that is different from the hostname in the browser location bar.

The following example illustrates how to do this. Your situation will probably be different, but this example will still help you figure out how to configure your system.

Example using NGINX

The example will demonstrate how to run docassemble using Docker on an Ubuntu 22.04 server running in the cloud. The machine that runs Docker will also run the NGINX web server. NGINX will be configured to use encryption and it will listen on ports 80 and 443. The web server will be accessible at https://justice.example.com and will serve resources other than docassemble. The docassemble resources will be accessible at https://justice.example.com/da. Docker will run on the machine and will listen on ports 8080 and 8050. The web server will accept HTTPS requests at /da and forward them as HTTP requests to port 8080. The SSL certificate will be installed on the Ubuntu server, and the Docker container will run an HTTP server. Docker will be controlled by the user account ubuntu, which is assumed to have sudo privileges.

This example uses only one machine, but if you want to have a separate machine for your web server and a separate machine for running docassemble, it is easy to set that up.

If you want to follow along with this example, make sure that you have purchased a domain name from a domain registrar and you have set up a CNAME record or an A record in your DNS configuration that associates a hostname with your server. Also make sure that the firewall protecting the machine has ports 80 (HTTP) and 443 (HTTPS) open.

In this example, we own the example.com domain and we have set up an A record in our DNS configuration that associates justice.example.com with the IP address of our server.

First, let’s install NGINX, Let’s Encrypt (the certbot utility), and Docker on the Ubuntu server. We use an SSH client to log in to the server as the user ubuntu. Then we run some commands on the command line:

sudo apt -y update
sudo apt -y install snapd nginx docker.io
sudo snap install --classic certbot
sudo usermod -a -G docker ubuntu

The last command changes the user privileges of the ubuntu user. For these changes to take effect, you need to log out and log in again. (E.g., exit the ssh session and start a new one.)

At this point, your web server should be running and should be visible from the internet. In our example, we can visit http://justice.example.com and we are greeted by a page that says “Welcome to nginx!”

This is good, but we want to access our server using https://, not http://. We will encounter a lot of problems if our connection runs on http://. In order to enable HTTPS, we can run certbot. certbot is an application that automates the process of obtaining SSL certificates from Let’s Encrypt and modifying the web server configuration files so that they use these new certificates.

To run certbot, do:

sudo certbot --nginx

Answer all of the prompts that appear. It is particularly important that you provide the correct domain name. In our example, we entered justice.example.com.

Note that if you are using AWS and you are given a hostname such as ec2-54-213-142-150.us-west-2.compute.amazonaws.com, certbot will not issue a certificate for this hostname. You must purchase a real hostname of your own from a domain name registrar.

It is also important that you provide a good e-mail address to certbot. You will get an e-mail from Let’s Encrypt if your certificate is about to expire.

Upon completion, certbot shows the message “Congratulations! You have successfully enabled HTTPS on https://justice.example.com.”

Now, if you visit your web site again, you will see it redirects your browser to the https:// version of the site. In the browser you can see a padlock next to the location bar, indicating that the web site uses encryption.

Now that you have your web server running, you can install docassemble. (In this example, we will install it on the same server that is running NGINX, but you can also use a different machine.) First you need to create a short text file called env.list that contains some configuration options for docassemble.

nano env.list

Set the contents of env.list to:

BEHINDHTTPSLOADBALANCER=true
POSTURLROOT=/da/
DAEXPOSEWEBSOCKETS=true

Inside of nano, you can save the file by typing Ctrl-s and exit by typing Ctrl-x.

The POSTURLROOT variable, which is set to /da/, indicates the path after the domain at which docassemble can be accessed. The NGINX web server will be able to provide other resources at other paths, but /da/ will be reserved for the exclusive use of docassemble. The beginning slash and the trailing slash are both necessary.
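Because a missing slash is an easy mistake to make, a deployment script could check the value before writing it to env.list. The following is a minimal sketch; the function name valid_posturlroot is hypothetical, and accepting a bare / (no sub-path at all) is an assumption of the example.

```shell
# Succeed only if the value starts and ends with a slash.
valid_posturlroot() {
    case "$1" in
        /)   return 0 ;;   # bare root path (assumed acceptable)
        /*/) return 0 ;;   # e.g. /da/
        *)   return 1 ;;   # e.g. /da or da/
    esac
}

# Usage:
#   valid_posturlroot "/da/" || echo "POSTURLROOT needs both slashes"
```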

Setting DAEXPOSEWEBSOCKETS to true means that the WebSocket server running inside the container (the supervisor process called websockets) will expose port 5000 to the external IP address rather than port 5000 of 127.0.0.1, so that the web server on the host can act as a proxy server for it.

Now, let’s download, install, and run docassemble.

docker run --env-file=env.list -v dabackup:/usr/share/docassemble/backup -d -p 8080:80 -p 8050:5000 jhpyle/docassemble

The option -p 8080:80 means that port 8080 on the Ubuntu machine will be mapped to port 80 within the Docker container. The option -p 8050:5000 means that the web sockets port of the container should be accessible on port 8050 of the host, so that the web server on the host can tunnel traffic to it directly. Note that ports 8080 and 8050 are not available from the internet (unless you configured your firewall to allow such access); what is important is that they are available to the NGINX web server that is running on the host.

Now, let’s edit the NGINX configuration so that the docassemble application is accessible through the NGINX web server.

sudo nano /etc/nginx/sites-available/default

Scroll down to the second server { configuration. Look for a line that looks like this:

server_name justice.example.com; # managed by Certbot

After this line, put in the following:

location /da/ws {
    include proxy_params;
    proxy_pass http://localhost:8050;
}

location /da/ws/socket.io {
    include proxy_params;
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_pass http://localhost:8050/socket.io;
}

location /da {
    include proxy_params;
    proxy_pass http://localhost:8080;
}

The last location configuration is the most important setting. The others support websockets connections, which support the Live Help feature.

Next, restart NGINX so that it uses the new configuration.

sudo systemctl restart nginx
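A syntax error in the edited file can prevent NGINX from starting again. If you prefer a more cautious restart, you can make it conditional on NGINX's own configuration test (nginx -t), as in this sketch. The function name safe_restart_nginx is hypothetical.

```shell
# Restart NGINX only if its configuration files pass the built-in test.
safe_restart_nginx() {
    if sudo nginx -t; then
        sudo systemctl restart nginx
    else
        echo "configuration test failed; NGINX was not restarted" >&2
        return 1
    fi
}

# Usage:
#   safe_restart_nginx
```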

Now, we can access the docassemble server at https://justice.example.com/da.

Next, we can build a web site (using non-docassemble tools) and operate it on the NGINX web server running at https://justice.example.com. Any URL that does not start with /da will be handled by NGINX in the ordinary fashion.

If we wanted to embed a docassemble interview into a page of this web site using an <iframe>, there would be no CORS issues because from the web browser’s perspective, docassemble is just another page on the https://justice.example.com web site.

Example using Apache

If you prefer to use the Apache web server instead of NGINX, you can follow the above procedure, but instead of installing NGINX, install Apache and run sudo certbot --apache.

Enable the following Apache modules.

sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod proxy_wstunnel
sudo a2enmod headers
sudo a2enmod rewrite

Edit the configuration file inside of /etc/apache2/sites-enabled so that it contains:

RewriteEngine On
RewriteCond %{REQUEST_URI}    ^/da/ws/socket.io     [NC]
RewriteCond %{QUERY_STRING}   transport=websocket   [NC]
RewriteRule /da/ws/(.*)  ws://localhost:8050/$1    [P,L]

ProxyPass /da/ws/ http://localhost:8050/
ProxyPassReverse /da/ws/ http://localhost:8050/

ProxyPass "/da"  "http://localhost:8080/da"
ProxyPassReverse "/da"  "http://localhost:8080/da"
RequestHeader set X-Forwarded-Proto "https"

Then restart the Apache server so that it uses the new configuration.

sudo systemctl restart apache2

Creating your own Docker image

To create your own Docker image, first make sure git is installed. If you are using Docker Desktop on a Windows PC or a Mac, you may find that git is already installed, or the instructions may explain how to run git using a Docker container.

If you are using Linux, installing git will look something like:

sudo apt -y install git

or

sudo yum -y install git

Then download docassemble, which consists of two repositories:

git clone https://github.com/jhpyle/docassemble-os
git clone https://github.com/jhpyle/docassemble

Each of these repositories contains a Dockerfile. The jhpyle/docassemble repository depends on jhpyle/docassemble-os. (The jhpyle/docassemble-os repository is separate because it contains operating system files and takes a long time to build. If you are on the amd64 platform and you want to modify the Docker image, you can download the jhpyle/docassemble-os image and then docker build the jhpyle/docassemble repository only.)

To make changes to the operating system or the operating system packages that are installed inside the container, edit the Dockerfile in the jhpyle/docassemble-os repository.

To make changes to the configuration of the docassemble application, edit the following files in the jhpyle/docassemble repository:

To build the image, run:

cd docassemble-os
docker build -t jhpyle/docassemble-os .
cd ../docassemble
docker build -t jhpyle/docassemble .

You can then run your image:

docker run -d -p 80:80 -p 443:443 --stop-timeout 600 jhpyle/docassemble

Or push it to Docker Hub:

docker tag jhpyle/docassemble yourdockerhubusername/mydocassemble
docker push yourdockerhubusername/mydocassemble

ARM support

Using docassemble on the ARM architecture is considered experimental. The images on Docker Hub are amd64-only, so if you want to run docassemble on ARM, you will need to use docker build to build the jhpyle/docassemble-os and jhpyle/docassemble images. The known issues with ARM compatibility are:

  • The DAGoogleAPI object cannot be used because the dependency package it relies on causes a C memory allocation error to be raised.
  • Google Chrome is not installed if the architecture is ARM, so you cannot use headless Chrome for web browser automation.
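
Before deciding whether to pull from Docker Hub or build locally, it can help to check which architecture the host reports. The following is a minimal sketch and not part of docassemble. The function name needs_local_build is hypothetical, and the sketch assumes that uname -m reports aarch64, arm64, or a string starting with armv on ARM hosts.

```shell
# Hypothetical helper: succeed (exit status 0) if the given machine string
# looks like ARM, meaning the images would have to be built with docker build.
needs_local_build() {
    case "$1" in
        aarch64|arm64|armv*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: pass in the output of uname -m.
if needs_local_build "$(uname -m)"; then
    echo "ARM host: build docassemble-os and docassemble locally"
else
    echo "Not an ARM host: on amd64 the Docker Hub images can be used"
fi
```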

Upgrading docassemble when using Docker

New versions of the docassemble software are published frequently. Most changes only affect the Python code. You can upgrade the docassemble Python packages by going to “Package Management” from the menu and clicking the “Upgrade” button.

However, sometimes a “system upgrade” is necessary. This can happen when changes are made to docassemble’s underlying operating system files, or new versions of Python packages become incompatible with old versions of operating system files. Performing a “system upgrade” requires stopping your docassemble container and running docker run with a new version of the docassemble image in order to start a new container based on the new image.

Doing a system upgrade is only safe if you are already using a form of data storage. If you aren’t using a Docker volume, S3, or Azure blob storage, then your data will be lost if you attempt a system upgrade.
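
One way to see whether a container has a Docker volume attached is to ask docker inspect for its mounts. This is a rough sketch rather than an official check. The function names list_mounts and has_volume are hypothetical, and a container that uses S3 or Azure blob storage may legitimately show no volume at all, so an empty result does not by itself mean that data storage is missing.

```shell
# Hypothetical helper: print one "NAME DESTINATION" line for each mount of
# the container whose ID or name is given as $1. NAME is empty for bind mounts.
list_mounts() {
    docker inspect -f '{{range .Mounts}}{{println .Name .Destination}}{{end}}' "$1"
}

# Hypothetical helper: succeed if container $1 has a volume named $2.
has_volume() {
    list_mounts "$1" | awk -v name="$2" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (the container ID is only an example):
#   has_volume 45034cf698b1 dabackup && echo "volume dabackup is attached"
```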

Overview of a system upgrade

The basic steps of a system upgrade on a server are:

  1. Safely shut down the docassemble server using docker stop. This will save your SQL database, Redis database, and files to data storage.
  2. Free up disk space by using docker rm and docker rmi to delete copies of the old docassemble containers and images.
  3. Start a new container from the latest docassemble image using the same docker run command you ran when you created the original container.

When the new container starts up, it will retrieve the SQL database, Redis database, and files from data storage and restore them into the new container. Python packages you had installed on your old container will be installed during the startup process. Your new docassemble container will work just like the old one, except its operating system will be upgraded and the docassemble software will be fully upgraded.

Docker tools that are helpful when doing a system upgrade

If you are going to perform a system upgrade, it is important that you understand some things about how Docker works.

Docker saves every image it uses to disk. So if you ran docker run two years ago, Docker downloaded the image jhpyle/docassemble and stored a copy of it. If you ran docker run today on the jhpyle/docassemble image, Docker would use the downloaded image from two years ago instead of downloading the latest image from Docker Hub.

The docassemble images take up a lot of disk space. One of the easiest ways to run out of disk space when using docassemble is to download too many copies of the jhpyle/docassemble image without deleting old ones.

docassemble containers also take up disk space. If you have old unused containers on your server, they will take up disk space and also prevent you from deleting the images from which they were created.

As long as you know that your data are backed up to data storage, you can clear up disk space and you don’t need to worry about losing your data. If you are using S3 or Azure blob storage as your data storage method, there is little to worry about; you can even delete the host computer without worrying about losing your data. However, if you are using a Docker volume for data storage, you need to be careful not to delete your volume (typically called dabackup) or the host computer. If you want to switch to a different host computer, you can copy the volume from one computer to another.

Here are some Docker commands you might want to use while doing a system upgrade:

  • docker ps will list all of the containers that are currently running.
  • docker stop -t 600 45034cf698b1 will stop the container with container ID 45034cf698b1 and it will give it ten minutes (600 seconds) to shut down safely.
  • docker ps -a will list all of the containers on the server, whether they are running or not.
  • docker rm 45034cf698b1 will delete the container with container ID 45034cf698b1 if the container is stopped.
  • docker images will list the docker images that Docker has downloaded.
  • docker rmi 22cd380ad224 will delete the image with the image ID 22cd380ad224 so long as no existing containers depend on the image.
  • docker volume ls will list the Docker volumes that exist on the system.
  • docker system df will report how much disk space Docker is using.

Stopping the Docker container

The first step of doing a system upgrade is to stop the Docker container. Use docker ps to determine the container ID or name (e.g., 45034cf698b1), and then use docker stop -t 600 45034cf698b1 to stop the container. The -t 600 indicates that Docker should wait 600 seconds, or ten minutes, for the container to safely stop.

It is very important that the shutdown is given enough time to complete, because part of the shutdown process involves dumping the SQL database, saving the Redis database, and copying files to data storage. Shutdown will be quicker if you are using S3/Azure blob storage, because in that case your files are already located in data storage and don’t need to be copied there. Shutdown will be even quicker if you are using an external SQL server and an external Redis server, in which case those databases do not need to be dumped out to data storage.

Doing docker stop -t 600 gives the machine ten minutes to shut down, which is probably more than enough time. If you think it actually took 10 minutes for the docker stop command to complete, then you should do docker ps -a to see what the exit status was. If it says something like Exited (0) About a minute ago, then that is good, because the exit status was 0. If the exit status was something else, that means Docker had to forcibly kill the container, which may have interrupted the container’s backup process. In that case, you should start the container again, wait for it to boot, and then do the shutdown again, but give it even more time to shut down. Or, you may need to investigate why it took so long to shut down in the first place.
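
The exit status can also be read directly instead of being picked out of the docker ps -a listing. The sketch below assumes the container has already been stopped. The function name exited_cleanly is hypothetical; it relies on docker inspect reporting the exit code in the State.ExitCode field.

```shell
# Hypothetical helper: succeed only if the stopped container $1 exited with
# status 0, meaning its shutdown finished before Docker had to kill it.
# Returns 2 if docker inspect itself fails (for example, unknown container).
exited_cleanly() {
    code=$(docker inspect -f '{{.State.ExitCode}}' "$1") || return 2
    [ "$code" = "0" ]
}

# Usage (the container ID is only an example):
#   docker stop -t 600 45034cf698b1
#   if exited_cleanly 45034cf698b1; then
#       echo "shutdown completed normally"
#   else
#       echo "shutdown did not exit with status 0; do not delete the container yet"
#   fi
```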

If you are using a Docker volume for data storage, it is important to make sure that the backup procedures that were carried out when you did docker stop did not exhaust your disk space. If you have voluminous generated and uploaded files, you will need disk space to support multiple copies of the files. If the process of backing up the files exhausts disk space, then your data storage will not contain all of your files.

You can docker exec into your container and do:

df -h

This will give a report like:

Filesystem      Size  Used Avail Use% Mounted on
overlay          78G   28G   50G  36% /
tmpfs            64M     0   64M   0% /dev
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
shm              64M   24K   64M   1% /dev/shm
/dev/vda1        78G   28G   50G  36% /etc/hosts
tmpfs           2.0G     0  2.0G   0% /proc/acpi
tmpfs           2.0G     0  2.0G   0% /proc/scsi
tmpfs           2.0G     0  2.0G   0% /sys/firmware

The important line is the one that relates to the filesystem mounted on /. The Avail column indicates how much free space is available.
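
If you only want that one figure, the report can be filtered. This is a small sketch that assumes the df output has the column layout shown above, with the mount point in the last column and Avail in the fourth. The function name avail_on_root is hypothetical.

```shell
# Hypothetical helper: read df output on standard input and print the
# Avail column of the filesystem whose mount point is exactly "/".
avail_on_root() {
    awk '$NF == "/" { print $4 }'
}

# Usage inside the container:
#   df -h | avail_on_root
```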

To find out how much space your uploaded and generated files are using, you can do:

du -hs /usr/share/docassemble/files

This will report something like:

4.1G	/usr/share/docassemble/files

If you are using S3 or Azure blob storage, files are not stored in /usr/share/docassemble/files; disk space is likely only going to be an issue if you are using a Docker volume for data storage.
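
The two measurements can be compared to get a rough idea of whether a backup might exhaust the disk. The sketch below is illustrative only. The function name has_backup_headroom is hypothetical, and the default of two extra copies is an assumption made for the example, not a figure taken from docassemble.

```shell
# Hypothetical helper: succeed if avail_kb ($2) can hold `copies` ($3,
# default 2) additional copies of a file tree that occupies used_kb ($1).
# The default of 2 is an assumption for illustration only.
has_backup_headroom() {
    used_kb=$1
    avail_kb=$2
    copies=${3:-2}
    [ "$avail_kb" -ge $((used_kb * copies)) ]
}

# Usage inside the container (sizes in kilobytes):
#   used=$(du -sk /usr/share/docassemble/files | awk '{ print $1 }')
#   avail=$(df -Pk / | awk 'NR == 2 { print $4 }')
#   has_backup_headroom "$used" "$avail" || echo "disk space may run out during backup"
```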

The following three lines will stop all containers, remove all containers, and then remove all of the images that Docker created during the build process.

docker stop -t 600 $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

While these commands are helpful, you should understand what they are doing before you run them. The first command stops all the containers. It uses the output of docker ps -a -q to get a list of the container IDs of all of the containers on the system, and passes that list to docker stop. The second command deletes all of the containers on the system. The third command deletes all the images on the server. This last command frees up the most disk space. It is usually necessary to remove the containers first (the docker rm line), as the containers depend on the images.

Note that if you try to run these commands, you might get an error like "docker stop" requires at least 1 argument. This is harmless; it happens when the part inside $() does not return any output.
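
If you would rather avoid that error message, the list of IDs can be checked before it is passed along. This is a minimal sketch; the function name stop_all_containers is hypothetical.

```shell
# Hypothetical helper: stop every container with a ten-minute timeout, but
# do nothing (and print no Docker error) when there are no containers.
stop_all_containers() {
    ids=$(docker ps -a -q)
    if [ -z "$ids" ]; then
        echo "no containers to stop"
        return 0
    fi
    # $ids is left unquoted on purpose so that each ID is a separate argument.
    docker stop -t 600 $ids
}

# Usage:
#   stop_all_containers
```

The same guard could be applied to the docker rm and docker rmi lines.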

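Thus, so long as you are using data storage, and you aren’t running any applications other than docassemble using Docker, it is recommended that you perform a system upgrade by running the three commands above and then starting a new container with the same docker run command you used to create the original container.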

If you don’t know what you are doing, do not follow the instructions above and instead get help, or spend time learning how Docker works before you attempt this. Doing docker rm permanently deletes your server, so it’s not something you should be doing unless you know for a fact that your data are backed up in data storage.

If your host OS is old, you may want to upgrade your host OS and Docker itself while the docassemble server is stopped. You may also want to migrate to a new host.

If you want to move your docassemble server to a new host, and you are using a Docker volume for data storage, then the steps are:

  1. On the existing host, safely shut down the docassemble server using docker stop.
  2. Create a new host.
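  3. Transfer the Docker volume to the new host using a command like the following, where cert.rsa stands for your SSH private key file and user@new-host.example.com stands for your login on the new host:
docker run --rm -v dabackup:/from alpine ash -c "cd /from ; tar -cf - . " | ssh -i cert.rsa user@new-host.example.com 'docker run --rm -i -v dabackup:/to alpine ash -c "cd /to ; tar -xpvf - " '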
  4. Start a new container from the latest docassemble image using docker run.

Installing an earlier version of docassemble when using Docker

When you do docker run or docker pull, the only image available on Docker Hub is the “latest” image. To install a version based on an earlier version of docassemble, you can make your own image using git.

git clone https://github.com/jhpyle/docassemble
cd docassemble
git checkout v0.3.21
docker build -t yourname/mydocassemble .
cd ..
docker run -d -p 80:80 -p 443:443 --stop-timeout 600 yourname/mydocassemble

The docker run command that you use may have other options; this is simply an illustration of creating an image called yourname/mydocassemble and then creating a container from it using docker run.

Starting with version 0.5, the docassemble image is split into two parts. The jhpyle/docassemble image uses jhpyle/docassemble-os as a base image. The jhpyle/docassemble-os image consists of the underlying Ubuntu operating system with required Ubuntu packages installed. The jhpyle/docassemble-os image is updated much less frequently than the jhpyle/docassemble image. If you want to build your own version of jhpyle/docassemble-os, you can do so by running:

git clone https://github.com/jhpyle/docassemble-os
cd docassemble-os
docker build -t jhpyle/docassemble-os .
cd ..

The docassemble-os repository consists of a Dockerfile only. Note that the first line of the Dockerfile in the docassemble repository is:

FROM jhpyle/docassemble-os

Thus, the jhpyle/docassemble image incorporates by reference the jhpyle/docassemble-os base image. The docker build command above overwrites the jhpyle/docassemble-os image that is stored on your local machine. If you want, you can edit the Dockerfile before building your custom jhpyle/docassemble version so that it references a different base image.
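
Editing the FROM line can be done by hand or scripted. The following is a sketch under the assumption, stated above, that the Dockerfile begins with a FROM line. The function name set_base_image and the image name yourname/my-os are hypothetical placeholders.

```shell
# Hypothetical helper: rewrite the first FROM line of the Dockerfile $1 so
# that it names the base image $2, leaving every other line unchanged.
set_base_image() {
    tmp=$(mktemp) || return 1
    awk -v img="$2" '
        !done && /^FROM / { print "FROM " img; done = 1; next }
        { print }
    ' "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Usage, run from inside the docassemble repository:
#   set_base_image Dockerfile yourname/my-os
#   docker build -t yourname/mydocassemble .
```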