Installing on Docker

Docker is a good platform for trying out docassemble for the first time. It is also ideal in a production environment. Amazon’s EC2 Container Service can be used to maintain a cluster of docassemble web server instances, created from Docker images, that communicate with a central server. For information about how to install docassemble in a multi-server arrangement on EC2 Container Service (“ECS”), see the scalability section.

Installing Docker

First, make sure you are running Docker on a computer or virtual computer with at least 1GB of memory and 20GB of hard drive space.

If you have a Windows PC, follow the Docker installation instructions for Windows. You will need administrator access on your PC in order to install (or upgrade) Docker.

If you have a Mac, follow the Docker installation instructions for OS X.

On Ubuntu (assuming the username ubuntu):

sudo apt-get -y update
sudo apt-get -y install docker.io
sudo usermod -a -G docker ubuntu

On Amazon Linux (assuming the username ec2-user):

sudo yum -y update
sudo yum -y install docker
sudo usermod -a -G docker ec2-user

The usermod line allows the non-root user to run Docker. You may need to log out and log back in again for this new user permission to take effect.

Docker will probably start automatically after it is installed. On Linux, you may need to run sudo /etc/init.d/docker start, sudo systemctl start docker, or sudo service docker start.

Quick start

Once Docker is installed, you can install and run docassemble from the command line.

To get a command line on Windows, run Windows PowerShell.

To get a command line on a Mac, launch the Terminal application.

Starting

From the command line, simply type in:

docker run -d -p 80:80 jhpyle/docassemble

The docker run command will download and run docassemble, making the application available on the standard HTTP port (port 80) of your machine.

It will take several minutes for docassemble to download, and once the docker run command finishes, docassemble will start to run. After a few minutes, you can point your web browser to the hostname of the machine that is running Docker. If you are running Docker on your own computer, this address is probably http://localhost.

If you are running Docker on AWS, the address will be something like http://ec2-52-38-111-32.us-west-2.compute.amazonaws.com (check your EC2 configuration for the hostname). On AWS, you will need a Security Group that opens HTTP (port 80) to the outside world in order to allow web browsers to connect to your EC2 instance.
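Rather than guessing when startup has finished, you can poll the server from the command line. The following is a minimal sketch, assuming curl is installed on the host. The function name wait_for_http and its defaults (60 tries, 10 seconds apart) are illustrative choices and are not part of docassemble.

```shell
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Polls URL until it answers with a successful HTTP status.
# Hypothetical helper; assumes curl is available on the host.
wait_for_http() {
  url=$1
  tries=${2:-60}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: quiet, -f: treat HTTP errors (>= 400) as failure, -o: discard the body
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      echo "$url is responding"
      return 0
    fi
    i=$((i + 1))
    # Sleep between attempts, but not after the last one
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

For a container running on your own computer, you would call it as wait_for_http http://localhost and open the browser once it reports that the address is responding.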

Using the web browser, you can log in using the default username (“admin@admin.com”) and password (“password”), and make changes to the configuration from the menu.

In the docker run command, the -d flag means that the container will run in the background.

The -p flag maps a port on the host machine to a port on the Docker container. In this example, port 80 on the host machine will map to port 80 within the Docker container. If you are already using port 80 on the host machine, you could use -p 8080:80, and then port 8080 on the host machine would be passed through to port 80 on the Docker container.

The jhpyle/docassemble tag refers to a Docker image that is hosted on Docker Hub. The image is about 2GB in size, and when it runs, the container uses about 10GB of hard drive space. The jhpyle/docassemble image is based on the “master” branch of the docassemble repository on GitHub. It is rebuilt every time the minor version of docassemble increases.

Shutting down

Shut down the container by running:

docker stop -t 60 <containerid>

By default, Docker gives containers ten seconds to shut down before forcibly shutting them down. Usually, ten seconds is plenty of time, but if the server is slow, docassemble might take a little longer than ten seconds to shut down. To be on the safe side, give the container plenty of time to shut down gracefully. The -t 60 means that Docker will wait up to 60 seconds before forcibly shutting down the container. It will probably take no more than 15 seconds for the docker stop command to complete.

It is very important to avoid a forced shutdown of docassemble. The container runs a PostgreSQL server, and the data files of the server may become corrupted if PostgreSQL is not gracefully shut down. To facilitate data storage (more on this later), docassemble backs up your data during the shutdown process and restores from that backup during the initialization process. If the shutdown process is interrupted, your data may be left in an inconsistent state and there may be errors during later initialization.

To see a list of stopped containers, run docker ps -a. To remove a container, run docker rm <containerid>.
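If you would rather not copy the container ID by hand, docker ps can filter running containers by the image they were created from. The sketch below assumes your containers were started from the jhpyle/docassemble image; the function name stop_docassemble is hypothetical.

```shell
# Gracefully stop every running container created from the jhpyle/docassemble image.
# stop_docassemble is a hypothetical helper name, not a docassemble command.
stop_docassemble() {
  # -q prints only container IDs; the ancestor filter matches on the image
  ids=$(docker ps -q --filter ancestor=jhpyle/docassemble)
  if [ -z "$ids" ]; then
    echo "no running docassemble container found" >&2
    return 1
  fi
  for id in $ids; do
    # Allow up to 60 seconds for the backup and PostgreSQL shutdown to finish
    docker stop -t 60 "$id"
  done
}
```

Each container is stopped in turn with the same 60-second grace period described above.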

Troubleshooting

You should not need to access the running container in order to get docassemble to work, and all the log files you need will hopefully be available from “Logs” in the web browser. However, you might want to gain access to the running container for some reason.

To do so, find out the ID of the running container by running docker ps. You will see output like the following:

CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
e4fa52ba540e  jhpyle/docassemble  "/usr/bin/supervisord" ...

The ID is in the first column. Then run:

docker exec -t -i e4fa52ba540e /bin/bash

using your own ID in place of e4fa52ba540e. This will give you a command prompt within the running container.

The first thing to check when you connect to a container is:

supervisorctl status

The output should be something like:

apache2                          RUNNING   pid 1088, uptime 0:14:15
celery                           RUNNING   pid 1865, uptime 0:14:45
cron                             STOPPED   Not started
initialize                       RUNNING   pid 1014, uptime 0:14:53
postgres                         RUNNING   pid 1020, uptime 0:14:42
rabbitmq                         RUNNING   pid 1045, uptime 0:14:38
redis                            RUNNING   pid 1067, uptime 0:14:35
reset                            STOPPED   Not started
sync                             EXITED    Dec 22 07:22 AM
syslogng                         STOPPED   Not started
update                           STOPPED   Not started
watchdog                         RUNNING   pid 1013, uptime 0:14:53
websockets                       RUNNING   pid 1904, uptime 0:14:40

If you are running docassemble in a single-server arrangement, the processes that should be “RUNNING” include apache2, celery, initialize, postgres, rabbitmq, redis, watchdog, and websockets.
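Comparing the status output against that list by eye is easy to get wrong, so a small filter can do it for you. This is a sketch only: check_services is a hypothetical helper, and it assumes the output format shown above, with the process name in the first column and the state in the second.

```shell
# check_services: reads `supervisorctl status` output on stdin and reports any
# process from the single-server list that is not in the RUNNING state.
# Hypothetical helper; exits non-zero if anything is missing or not RUNNING.
check_services() {
  awk -v expected="apache2 celery initialize postgres rabbitmq redis watchdog websockets" '
    BEGIN { n = split(expected, names, " ") }
    NF >= 2 { state[$1] = $2 }   # column 1 is the process name, column 2 is the state
    END {
      bad = 0
      for (i = 1; i <= n; i++) {
        name = names[i]
        if (!(name in state)) { print name ": not listed"; bad = 1 }
        else if (state[name] != "RUNNING") { print name ": " state[name]; bad = 1 }
      }
      exit bad
    }'
}
```

From the host, you could run docker exec e4fa52ba540e supervisorctl status | check_services, using your own ID in place of e4fa52ba540e. No output means every expected process is running.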

Log files on the container that you might wish to check include:

  • /var/log/supervisor/initialize-stderr---supervisor-*.log
  • /var/log/supervisor/postgres-stderr---supervisor-*.log
  • Other files in /var/log/supervisor/
  • /var/log/apache2/error.log
  • /usr/share/docassemble/log/docassemble.log

Enter exit to leave the container and get back to your standard command prompt.

Configuration options

In the example above, we started docassemble with docker run -d -p 80:80 jhpyle/docassemble. This command will cause docassemble to use default values for all configuration options. You can also communicate specific configuration options to the container.

The recommended way to do this is to create a text file called env.list in the current working directory containing environment variable definitions in standard shell script format. For example:

DAHOSTNAME=docassemble.example.com
USEHTTPS=true
USELETSENCRYPT=true
LETSENCRYPTEMAIL=admin@example.com

Then, you can pass these environment variables to the container using the docker run command:

docker run --env-file=env.list -d -p 80:80 -p 443:443 jhpyle/docassemble

These configuration options will cause the Apache configuration file to use docassemble.example.com as the ServerName and use HTTPS with certificates issued by Let’s Encrypt. (The directive -p 443:443 is included so that the HTTPS port is exposed.)

A template for the env.list file is included in the distribution.

When running docassemble in ECS, environment variables like these are specified in JSON text that is entered into the web interface. (See the scalability section for more information about using ECS.)

In your env.list file, you can set a variety of options.

The following two options are specific to the particular server being started (which, in a multi-server arrangement, will vary from server to server).

  • CONTAINERROLE: either all or a colon-separated list of services (e.g. web:celery, sql:log:redis, etc.) that should be started by the server. The available options are:
    • all: The Docker container will run all of the services of docassemble on a single container.
    • web: The Docker container will serve as a web server.
    • celery: The Docker container will serve as a Celery node.
    • sql: The Docker container will run the central PostgreSQL service.
    • redis: The Docker container will run the central Redis service.
    • rabbitmq: The Docker container will run the central RabbitMQ service.
    • log: The Docker container will run the central log aggregation service.
    • mail: The Docker container will run Exim in order to accept e-mails.
  • SERVERHOSTNAME: In a multi-server arrangement, all docassemble application servers need to be able to communicate with each other using port 9001 (the supervisor port). All application servers “register” with the central SQL server. When they register, they each provide their hostname; that is, the hostname at which the specific individual application server can be found. Then, when an application server wants to send a message to the other application servers, the application server can query the central SQL server to get a list of hostnames of other application servers. This is necessary so that any one application server can send a signal to the other application servers to install a new package or a new version of a package, so that all servers are running the same software. If you are running docassemble in a multi-server arrangement, and you are starting an application server, set SERVERHOSTNAME to the hostname with which other application servers can find that server. Note that you do not need to worry about setting SERVERHOSTNAME if you are using EC2, because Docker containers running on EC2 can discover their actual hostnames by querying a specific IP address.

The other options you can set in env.list are global for your entire docassemble installation, rather than specific to the server being started.

The following nine options indicate where an existing configuration file can be found on S3 or Azure blob storage. If a configuration file exists in the cloud at the indicated location, that configuration file will be used to set the configuration of your docassemble installation. If no configuration file yet exists in the cloud at the indicated location, docassemble will create an initial configuration file and store it in the indicated location.

  • S3ENABLE: Set this to True if you are using S3 as a repository for uploaded files, Playground files, the configuration file, and other information. This environment variable, along with others that begin with S3, populates values in the s3 section of the initial configuration file. If this is unset, but S3BUCKET is set, it will be assumed to be True.
  • S3BUCKET: If you are using S3, set this to the bucket name. Note that docassemble will not create the bucket for you. You will need to create it for yourself beforehand. The bucket should be empty.
  • S3ACCESSKEY: If you are using S3, set this to the S3 access key. You can ignore this environment variable if you are using EC2 with an IAM role that allows access to your S3 bucket.
  • S3SECRETACCESSKEY: If you are using S3, set this to the S3 access secret. You can ignore this environment variable if you are using EC2 with an IAM role that allows access to your S3 bucket.
  • S3REGION: If you are using S3, set this to the region you are using (e.g., us-west-1, us-west-2, ca-central-1).
  • AZUREENABLE: Set this to True if you are using Azure blob storage as a repository for uploaded files, Playground files, the configuration file, and other information. This environment variable, along with others that begin with AZURE, populates values in the azure section of the configuration file. If this is unset, but AZUREACCOUNTNAME, AZUREACCOUNTKEY, and AZURECONTAINER are set, it will be assumed to be True.
  • AZURECONTAINER: If you are using Azure blob storage, set this to the container name. Note that docassemble will not create the container for you. You will need to create it for yourself beforehand.
  • AZUREACCOUNTNAME: If you are using Azure blob storage, set this to the account name.
  • AZUREACCOUNTKEY: If you are using Azure blob storage, set this to the account key.
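As an illustration of the S3 options, the following sketch writes an env.list that points docassemble at a bucket. Every value is a placeholder: the bucket name and the dummy keys are the sample values used elsewhere on this page, and the region is one of the examples listed above.

```shell
# Write an env.list that points docassemble at an S3 bucket.
# All values are placeholders; substitute your own bucket, keys, and region.
cat > env.list <<'EOF'
S3ENABLE=True
S3BUCKET=docassemble-example-com
S3ACCESSKEY=FWIEJFIJIDGISEJFWOEF
S3SECRETACCESSKEY=RGERG34eeeg3agwetTR0+wewWAWEFererNRERERG
S3REGION=us-west-2
EOF

# The file contains credentials, so restrict it to your own user
chmod 600 env.list
```

The file is then passed to the container with --env-file=env.list, as in the docker run example above. On EC2 with an IAM role that allows access to the bucket, the two key lines can be left out.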

The following options are useful for pre-populating a fresh configuration with particular values. These environment variables are effective only during an initial run of the Docker container, when a configuration file does not already exist. If you are using persistent volumes, or you have set the options above for S3/Azure blob storage and a configuration file exists in your cloud storage, the values in that configuration file will take precedence over any values you specify in env.list.

  • DBHOST: The hostname of the PostgreSQL server. Keep undefined or set to null in order to use the PostgreSQL server on the same host. This environment variable, along with others that begin with DB, populates values in the db section of the configuration file.
  • DBNAME: The name of the PostgreSQL database. The default is docassemble.
  • DBUSER: The username for connecting to the PostgreSQL server. The default is docassemble.
  • DBPASSWORD: The password for connecting to the SQL server. The default is abc123. The password cannot contain the character #.
  • DBPREFIX: This sets the prefix for the database specifier. The default is postgresql+psycopg2://.
  • DBPORT: This sets the port that docassemble will use to access the SQL server. If you are using the default port for your database backend, you do not need to set this.
  • DBTABLEPREFIX: This allows multiple separate docassemble implementations to share the same SQL database. The value is a prefix to be added to each table in the database.
  • EC2: Set this to True if you are running Docker on EC2. This tells docassemble that it can use an EC2-specific method of determining the hostname of the server on which it is running. See the ec2 configuration directive.
  • USEHTTPS: Set this to True if you would like docassemble to communicate with the browser using encryption. Read the HTTPS section for more information. Defaults to False. See the use https configuration directive.
  • DAHOSTNAME: Set this to the hostname by which web browsers can find docassemble. This is necessary for HTTPS to function. See the external hostname configuration directive.
  • USELETSENCRYPT: Set this to True if you are using Let’s Encrypt. The default is False. See the use lets encrypt configuration directive.
  • LETSENCRYPTEMAIL: Set this to the e-mail address you use with Let’s Encrypt. See the lets encrypt email configuration directive.
  • LOGSERVER: This is used in the multi-server arrangement where there is a separate server for collecting log messages. The default is none, which causes the server to run Syslog-ng. See the log server configuration directive.
  • REDIS: If you are running docassemble in a multi-server arrangement, set this to the host name at which the Redis server can be accessed. See the redis configuration directive.
  • RABBITMQ: If you are running docassemble in a multi-server arrangement, set this to the host name at which the RabbitMQ server can be accessed. Note that RabbitMQ is very particular about hostnames. If the RabbitMQ server is running on a machine on which the hostname command evaluates to abc, then your application servers will need to set RABBITMQ to abc and nothing else. It is up to you to make sure that abc resolves to an IP address. Note that if you run docassemble using the instructions in the scalability section, you do not need to worry about this. See the rabbitmq configuration directive.
  • URLROOT: If users access docassemble at https://docassemble.example.com, set URLROOT to https://docassemble.example.com. See the url root configuration directive.
  • POSTURLROOT: If users access docassemble at https://docassemble.example.com/da, set POSTURLROOT to /da/. The trailing slash is important. If users access docassemble at https://docassemble.example.com, you can ignore this. The default value is /. See the root configuration directive.
  • BEHINDHTTPSLOADBALANCER: Set this to True if a load balancer is in use and the load balancer accepts connections in HTTPS but forwards them to web servers as HTTP. This lets docassemble know that when it forms URLs, it should use the https scheme even though requests appear to be coming in as HTTP requests. See the behind https load balancer configuration directive.
  • TIMEZONE: You can use this to set the time zone of the server. The value of the variable is stored in /etc/timezone and dpkg-reconfigure -f noninteractive tzdata is run in order to set the system time zone. The default is America/New_York. See the timezone configuration directive.
  • LOCALE: You can use this to enable a locale on the server. When the server starts, the value of LOCALE is appended to /etc/locale.gen and locale-gen and update-locale are run. The default is en_US.UTF-8 UTF-8. See the os locale configuration directive.
  • OTHERLOCALES: You can use this to set up other locales on the system besides the default locale. Set this to a comma-separated list of locales. The values need to match entries in Debian’s /etc/locale.gen. See the other os locales configuration directive.
  • PACKAGES: If your interviews use code that depends on certain Debian packages being installed, you can provide a comma-separated list of Debian packages in the PACKAGES environment variable. The packages will be installed when the image is run. See the debian packages configuration directive.
  • DASECRETKEY: The secret key for protecting against cross-site request forgery. See the secret key configuration directive. If DASECRETKEY is not set, a random secret key will be generated.
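A few of the constraints mentioned in the list above can be checked before a container is launched: DBPASSWORD cannot contain #, POSTURLROOT needs a trailing slash, and HTTPS needs DAHOSTNAME. The following is a minimal sketch of such a check. The helper names env_value and check_env_list are hypothetical, and the check assumes one NAME=value definition per line.

```shell
# env_value FILE NAME: print the value assigned to NAME in FILE (last one wins)
env_value() { sed -n "s/^$2=//p" "$1" | tail -n 1; }

# check_env_list FILE
# Hypothetical pre-flight check; prints one message per problem found
# and returns non-zero if there is any.
check_env_list() {
  file=$1
  problems=0

  # DBPASSWORD cannot contain the character #
  case "$(env_value "$file" DBPASSWORD)" in
    *'#'*) echo "DBPASSWORD must not contain #"; problems=1 ;;
  esac

  # POSTURLROOT, if set, needs a trailing slash
  case "$(env_value "$file" POSTURLROOT)" in
    ''|*/) ;;
    *) echo "POSTURLROOT should end with /"; problems=1 ;;
  esac

  # HTTPS needs DAHOSTNAME to be set
  case "$(env_value "$file" USEHTTPS)" in
    [Tt]rue)
      if [ -z "$(env_value "$file" DAHOSTNAME)" ]; then
        echo "USEHTTPS is set but DAHOSTNAME is empty"; problems=1
      fi
      ;;
  esac

  return "$problems"
}
```

Running check_env_list env.list before docker run prints nothing when the file passes these three checks. It does not validate any of the other options.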

Changing the configuration

If you already have an existing docassemble installation and you want to run a new Docker container using it, but you want to change the configuration of the container, there are some things you will need to keep in mind.

The existing configuration file takes precedence over the environment variables that you set using Docker.

If you want to change the configuration, and the server is running, you can edit the configuration using the web interface.

If the server is not running, and you are using persistent volumes, you can use docker volume inspect to find the location of the persistent volume, and find and edit the saved configuration file there.

When docassemble starts up on a Docker container, it:

  • Creates a configuration file from a template, using environment variables for initial values, if a configuration file does not already exist.
  • Initializes a PostgreSQL database, if one is not already initialized.
  • Configures the Apache configuration, if one is not already configured.
  • Runs Let’s Encrypt if the configuration indicates that Let’s Encrypt should be used, and Let’s Encrypt has not yet been configured.

When docassemble stops, it saves the configuration file, a backup of the PostgreSQL database, and backups of the Apache and Let’s Encrypt configuration. If you are using persistent volumes, the information will be stored there. If you are using S3 or Azure blob storage, the information will be stored in the cloud.

When docassemble starts again, it will retrieve the configuration file, the backup of the PostgreSQL database, and backups of the Apache and Let’s Encrypt configuration from storage and use them for the container.

Suppose you have an existing installation that uses HTTPS and Let’s Encrypt, but you want to change the DAHOSTNAME. The Apache configuration files will not be overwritten if they already exist when a new container starts up. So if you had been using Let’s Encrypt, but then you decide to change the DAHOSTNAME, you will need to delete the saved configuration before running a new container. If you are using S3, you can go to the S3 Console and delete the “Apache” folder and the “letsencrypt.tar.gz” file. If you are using Azure blob storage, you can go to the Azure Portal and delete the “Apache” folder and the “letsencrypt.tar.gz” file.

Also, if a configuration file exists on S3/Azure blob storage (config.yml) or in a persistent volume, then the values in that configuration will take precedence over the corresponding environment variables that are passed to Docker. Once a configuration file exists, you should make changes to the configuration file rather than passing environment variables to Docker. However, if your configuration is on S3/Azure blob storage, you will at least need to pass sufficient access keys (e.g., S3BUCKET, AZURECONTAINER, etc.) to access that storage; otherwise your container will not know where to find the configuration.

Also, there are some environment variables that do not exist in the configuration file because they are specific to the individual server being started. These include the CONTAINERROLE and SERVERHOSTNAME environment variables.

Data storage

Docker containers are volatile. They are designed to be run, turned off, and destroyed. When using Docker, the best way to upgrade docassemble to a new version is to destroy and rebuild your containers.

But what about your data? If you run docassemble, you are accumulating valuable data in SQL, in files, and in Redis. If your data are stored on the Docker container, they will be destroyed by docker rm.

There are two ways around this problem. The first, and preferable, solution is to get an account on Amazon Web Services (AWS) or Microsoft Azure. If you use AWS, create an S3 bucket for your data, and then when you launch your container, set the S3BUCKET and associated environment variables. If you use Microsoft Azure, create an Azure blob storage resource, and a blob storage container within it, and then when you launch your container, set the AZUREACCOUNTNAME, AZUREACCOUNTKEY, and AZURECONTAINER environment variables. When docker stop is run, docassemble will back up the SQL database, the Redis database, the configuration, and your uploaded files to your S3 bucket or blob storage container. Then, when you issue a docker run command with environment variables pointing docassemble to your S3 bucket/Azure blob storage resource, docassemble will restore from the backup. You can docker rm your container and your data will persist in the cloud.

The second way is to use persistent volumes, which are a feature of Docker. This will store the data in directories on the Docker host, so that when you destroy the container, these directories will be untouched, and when you start up a new container, it will use the saved directories.

These two options are explained in the following subsections.

Using S3

To use S3 for persistent storage, sign up with Amazon Web Services, go to the S3 Console, click “Create Bucket,” and pick a name. If your site is at docassemble.example.com, a good name for the bucket is docassemble-example-com. (Client software will have trouble accessing your bucket if its name contains . characters.) Under “Region,” pick the region nearest you.

Then you need to obtain an access key and a secret access key for S3. To obtain these credentials, go to the IAM Console and create a user with “programmatic access.” Under “Attach existing policies directly,” find the policy called AmazonS3FullAccess and attach it to the user.

When you run a docassemble Docker container, set the configuration options S3BUCKET, S3ACCESSKEY, and S3SECRETACCESSKEY.

Note that if you run docassemble on EC2, you can launch your EC2 instances with an IAM role that allows docassemble to access an S3 bucket without the necessity of setting S3ACCESSKEY and S3SECRETACCESSKEY. In this case, the only environment variable you need to pass is S3BUCKET.

These secret access keys will become available to all developers who use your docassemble server, since they are in the configuration file. If you want to limit access to a particular bucket, you do not have to use the AmazonS3FullAccess policy when obtaining S3 credentials. Instead, you can create your own policy with the following definition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::docassemble-example-com"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::docassemble-example-com/*"
            ]
        }
    ]
}

Replace docassemble-example-com in the above text with the name of your S3 bucket.
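To avoid editing the bucket name in two places by hand, the substitution can be scripted. The sketch below prints the same policy with a bucket name of your choosing filled in; make_s3_policy is a hypothetical helper name, and the JSON is the policy from this section.

```shell
# make_s3_policy BUCKET
# Prints the bucket-restricted policy from this section with BUCKET substituted in.
# Hypothetical helper name; only the bucket name varies.
make_s3_policy() {
  bucket=$1
  cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::$bucket"]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::$bucket/*"]
        }
    ]
}
EOF
}
```

For example, make_s3_policy docassemble-example-com > policy.json writes the policy to a file whose contents you can paste into the IAM Console when creating the policy.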

Using Microsoft Azure

Using Microsoft Azure is very similar to using S3. From the Azure Portal dashboard, search for “Storage accounts” in “Resources.” Click “Add” to create a new storage account. Under “Deployment model,” choose “Resource manager.” Under “Account kind,” choose “Blob storage.” Under “Performance,” choose an “Access tier”: either “Cool” or “Hot,” but you may have to pay more for “Hot.”

Once the storage account is created, go into it and click “+ Container” to add a new container. Set the “Access type” to “Private.” The name of the container corresponds with the AZURECONTAINER environment variable. Back at the storage account, click “Access keys.” The “Storage account name” corresponds with the environment variable AZUREACCOUNTNAME. The “key1” corresponds with the AZUREACCOUNTKEY environment variable. (You can also use “key2.”)

If you enable both S3 and Azure blob storage, only S3 will be used.

Using persistent volumes

To run docassemble in a single-server arrangement in such a way that the configuration, the Playground files, the uploaded files, and other data persist after the Docker container is removed or updated, run the image as follows:

docker run --env-file=env.list \
-v dabackup:/usr/share/docassemble/backup \
-d -p 80:80 -p 443:443 jhpyle/docassemble

where --env-file=env.list is an optional parameter that refers to a file env.list containing environment variables for the configuration. A template for the env.list file is included in the distribution.

An advantage of using persistent volumes is that you can completely replace the docassemble container and rebuild it from scratch, and when you run the jhpyle/docassemble image again, docassemble will keep running where it left off.
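Put together, replacing a container while keeping its data might look like the following sketch. It assumes the old container was started with the docker run command above, including the dabackup volume and an env.list file; the function name upgrade_docassemble is hypothetical.

```shell
# upgrade_docassemble CONTAINERID
# Sketch of replacing a container while keeping data in the dabackup volume.
# Hypothetical helper; drop --env-file=env.list if you do not use an env.list.
upgrade_docassemble() {
  old=$1
  # Stop gracefully so the shutdown backup is written to the volume
  docker stop -t 60 "$old" || return 1
  docker rm "$old" || return 1
  # Fetch the latest image; docker run would otherwise reuse the local copy
  docker pull jhpyle/docassemble || return 1
  docker run --env-file=env.list \
    -v dabackup:/usr/share/docassemble/backup \
    -d -p 80:80 -p 443:443 jhpyle/docassemble
}
```

Each step stops the sequence if it fails, so the old container is not removed unless it shut down cleanly.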

If you are using HTTPS with your own certificates (as opposed to using Let’s Encrypt), you can use a persistent volume to provide the certificates to the Docker container. Just add -v dacerts:/usr/share/docassemble/certs to your docker run command.

To see what volumes exist on your Docker system, you can run:

docker volume ls

Docker volumes are actual directories on the file system. To find the path of a given volume, use docker volume inspect:

docker volume inspect dabackup

For example, if you are using HTTPS with your own certificates, and you need to update the certificates your server should use, you can find the path where the dacerts volume lives (docker volume inspect dacerts), copy your certificates to that path (cp mycertificate.key /var/lib/docker/volumes/dacerts/_data/docassemble.key), and then stop the container (docker stop -t 60 <containerid>) and start it again (docker start <containerid>).
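The lookup and the copy can be combined so that the volume path does not have to be typed by hand. This is a sketch under the assumption that the volume is named dacerts, as above; install_cert_key is a hypothetical helper name.

```shell
# install_cert_key KEYFILE
# Copies KEYFILE into the dacerts volume as docassemble.key.
# Hypothetical helper; the volume directory is usually owned by root,
# so you may need to run it from a root shell.
install_cert_key() {
  src=$1
  # Ask Docker where the volume lives instead of hard-coding the path
  dir=$(docker volume inspect --format '{{ .Mountpoint }}' dacerts) || return 1
  cp "$src" "$dir/docassemble.key"
}
```

After copying, stop and start the container as described above so that the new file is picked up.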

To delete all of the volumes that are not in use by any container, do:

docker volume rm $(docker volume ls -qf dangling=true)

Ultimately, the better data storage solution is to use cloud storage (S3, Azure blob storage) because:

  1. S3 and Azure blob storage make scaling easier. They are the “cloud” way of storing persistent data, at least until cloud-based network file systems become more robust.
  2. It is easier to upgrade your virtual machines to the latest software and operating system if you can just destroy them and recreate them, rather than running update scripts. If your persistent data is stored in the cloud, you can destroy and recreate virtual machines at will, without ever having to worry about copying your data on and off the machines.

Multi-server arrangement

Services on different machines

The docassemble application consists of several services, some of which are singular and some of which can be plural.

The singular services include:

  • SQL
  • Redis
  • RabbitMQ
  • The log message aggregator

The (potentially) plural services include:

  • Web servers
  • Celery nodes

The docassemble Docker container will run any subset of these six services, depending on the value of the environment variable CONTAINERROLE, which is passed to the container at startup. In a single-server arrangement (CONTAINERROLE = all, or left undefined), the container runs all of the services (except the log message aggregator, which is not necessary in the case of a single-server arrangement).

You can run docassemble in a multi-server arrangement using Docker by running the docassemble image on different hosts using different configuration options.

In a multi-server arrangement, you can have one machine run SQL, another machine run Redis and RabbitMQ, and any number of machines run web servers and Celery nodes. You can decide how to allocate services to different machines. For example, you might want to run central tasks on a powerful server, while running many web servers on less powerful machines.

Since the SQL, Redis, and RabbitMQ services are standard services, they do not have to be run from docassemble Docker containers. For example, if you are already running a SQL server, a Redis server, and a RabbitMQ server, you could just point docassemble to those resources.

To change the SQL server that docassemble uses, edit the DBHOST, DBNAME, DBUSER, DBPASSWORD, DBPREFIX, DBPORT, and DBTABLEPREFIX configuration options.

To change the Redis server that docassemble uses, edit the REDIS configuration option.

To change the RabbitMQ server that docassemble uses, edit the RABBITMQ configuration option.

Port opening

Note that for every service that a Docker container provides, appropriate ports need to be forwarded from the Docker host machine to the container.

For example:

docker run \
-e CONTAINERROLE=sql:redis \
...
-d -p 5432:5432 -p 6379:6379 -p 9001:9001 \
jhpyle/docassemble

docker run \
-e CONTAINERROLE=web:celery \
...
-d -p 80:80 -p 443:443 -p 9001:9001 \
jhpyle/docassemble

Note that Docker will fail if any of these ports is already in use. For example, many Linux distributions run a mail transport agent on port 25 by default; you will have to stop that service in order to start Docker with -p 25:25. For example, on Amazon Linux you may need to run:

sudo /etc/init.d/sendmail stop

File sharing

If you run multiple docassemble Docker containers on different machines, the containers will need to have a way to share files with one another.

One way to share files among containers is to make /usr/share/docassemble/ a persistent volume on a network file system. This directory contains the configuration, SSL certificates, Python virtual environment, and uploaded files. However, network file systems present problems.

A preferable way to share files is with Amazon S3 or Azure blob storage, which docassemble supports. See the using S3 and using Azure blob storage sections for instructions on setting this up.

Configuration file

Note that when you use the cloud (S3 or Azure blob storage) for data storage, docassemble will copy the config.yml file out of the cloud on startup, and save config.yml to the cloud whenever the configuration is modified.

This means that as long as there is a config.yml file in the cloud with the configuration you want, you can start docassemble containers without specifying a lot of configuration options; you simply have to refer to your cloud storage bucket/container, and docassemble will take it from there. For example, to run a central server, you can do:

docker run \
-e CONTAINERROLE=sql:redis:rabbitmq:log:cron:mail \
-e S3BUCKET=docassemble-example-com \
-e S3ACCESSKEY=FWIEJFIJIDGISEJFWOEF \
-e S3SECRETACCESSKEY=RGERG34eeeg3agwetTR0+wewWAWEFererNRERERG \
-d -p 80:8080 -p 25:25 -p 5432:5432 -p 514:514 \
-p 6379:6379 -p 4369:4369 -p 5671:5671 \
-p 5672:5672 -p 25672:25672 -p 9001:9001 \
jhpyle/docassemble

To run an application server, you can do:

docker run \
-e CONTAINERROLE=web:celery \
-e S3BUCKET=docassemble-example-com \
-e S3ACCESSKEY=FWIEJFIJIDGISEJFWOEF \
-e S3SECRETACCESSKEY=RGERG34eeeg3agwetTR0+wewWAWEFererNRERERG \
-d -p 80:80 -p 443:443 -p 9001:9001 \
jhpyle/docassemble

Encrypting communications

Using HTTPS

If you are running docassemble on EC2, the easiest way to enable HTTPS support is to set up an Application Load Balancer that accepts connections in HTTPS format and forwards them to the web servers in HTTP format. In this configuration Amazon takes care of creating and hosting the necessary SSL certificates.

If you are not using a load balancer, you can use HTTPS either by setting up Let’s Encrypt or by providing your own certificates.

With Let’s Encrypt

If you are running docassemble in a single-server arrangement, or in a multi-server arrangement with only one web server, you can use Let’s Encrypt to enable HTTPS. If you have more than one web server, you can enable encryption without Let’s Encrypt by installing your own certificates.

To use Let’s Encrypt, set the following environment variables in your task definition or env.list file:

  • USELETSENCRYPT: set this to True.
  • LETSENCRYPTEMAIL: Let’s Encrypt requires an e-mail address, which it will use to get in touch with you about renewing the SSL certificates.
  • DAHOSTNAME: set this to the hostname that users will use to get to the web application. Let’s Encrypt needs this in order to verify that you have access to the host.
  • USEHTTPS: set this to True.

For example, your env.list may look like:

CONTAINERROLE=all
USEHTTPS=true
DAHOSTNAME=docassemble.example.com
USELETSENCRYPT=true
LETSENCRYPTEMAIL=admin@example.com
TIMEZONE=America/New_York
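
Because Let’s Encrypt needs all four variables listed above, a quick check of env.list before starting the container can catch an omission. This is only a sketch; the function name missing_letsencrypt_vars is hypothetical and not part of docassemble.

```shell
# Print each required Let's Encrypt variable that has no NAME=value line
# in the given env file; return non-zero if any are missing.
missing_letsencrypt_vars() {
    file="$1"
    rc=0
    for name in USELETSENCRYPT LETSENCRYPTEMAIL DAHOSTNAME USEHTTPS; do
        if ! grep -q "^${name}=." "$file"; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, running missing_letsencrypt_vars env.list prints nothing and succeeds when all four variables are set, so it can be chained with && in front of a docker run --env-file=env.list command.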

The first time the server is started, the letsencrypt utility will be run, which will change the Apache configuration in order to use the appropriate SSL certificates. When the server is later restarted, the letsencrypt renew command will be run, which will refresh the certificates if they are within 30 days of expiring.

In addition, a script will run on a weekly basis to attempt to renew the certificates.

If you are using a multi-server arrangement with a single web server, you need to run the cron role on the same server that runs the web role. If you use the e-mail receiving feature with TLS encryption, the mail role also has to share the server with the web and cron roles.

Without Let’s Encrypt

Using your own SSL certificates with Docker requires that your SSL certificates reside within each container. There are several ways to accomplish this:

  • Use S3 or Azure blob storage and upload the certificates to your bucket/container.
  • Build your own private image in which your SSL certificates are placed in Docker/ssl/apache.key, Docker/ssl/apache.crt, and Docker/ssl/apache.ca.pem. During the build process, these files will be copied into /usr/share/docassemble/certs.
  • Use persistent volumes and copy the SSL certificate files (apache.key, apache.crt, and apache.ca.pem) into the volume for /usr/share/docassemble/certs before starting the container.

The default Apache configuration file expects SSL certificates to be located in the following files:

SSLCertificateFile /etc/ssl/docassemble/apache.crt
SSLCertificateKeyFile /etc/ssl/docassemble/apache.key 
SSLCertificateChainFile /etc/ssl/docassemble/apache.ca.pem

The meaning of these files is as follows:

  • apache.crt: this file is generated by your certificate authority when you submit a certificate signing request.
  • apache.key: this file is generated at the time you create your certificate signing request.
  • apache.ca.pem: this file is generated by your certificate authority. It is variously known as the “chain file” or the “root bundle.”

In order to make sure that these files are replicated on every web server, the supervisor will run the docassemble.webapp.install_certs module before starting the web server.

If you are using S3 or Azure blob storage, this module will copy the files from the certs/ prefix in your bucket/container to /etc/ssl/docassemble. You can use the S3 Console or the Azure Portal to create a folder called certs and upload your certificate files into that folder.

If you are not using S3 or Azure blob storage, the docassemble.webapp.install_certs module will copy the files from /usr/share/docassemble/certs to /etc/ssl/docassemble.

There are two ways that you can put your own certificate files into the /usr/share/docassemble/certs directory. The first way is to create your own Docker image of docassemble and put your certificates into the Docker/ssl directory. The contents of this directory are copied into /usr/share/docassemble/certs during the build process.

The second way is to use persistent volumes. If you have a persistent volume called certs for the directory /usr/share/docassemble/certs, then you can run docker volume inspect certs to get the directory on the Docker host corresponding to this directory, and you can copy the SSL certificate files into that directory before starting the container.

Note that the files need to be called apache.crt, apache.key, and apache.ca.pem, because this is what the standard web server configuration expects.
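
Since a misnamed file is easy to overlook, the naming requirement can be checked with a short script before the container is started. The sketch below is illustrative; check_cert_names is a hypothetical helper, and the directory you pass to it is whichever location holds your certificates.

```shell
# Report which of the three expected certificate files are absent from
# the given directory; return non-zero if any are missing.
check_cert_names() {
    dir="$1"
    missing=0
    for name in apache.crt apache.key apache.ca.pem; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

When using a persistent volume, the directory argument would be the host path reported by docker volume inspect certs.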

If you want to use different filesystem or cloud locations, the docassemble.webapp.install_certs module can be configured to use them. See the configuration variables certs and cert install directory.

Using TLS for incoming e-mail

If you use the e-mail receiving feature, you can use TLS to encrypt incoming e-mail communications. By default, docassemble will install self-signed certificates into the Exim configuration, but for best results you should use certificates that match your incoming mail domain.

If you are using Let’s Encrypt to obtain your HTTPS certificates in a single-server arrangement, then docassemble will use your Let’s Encrypt certificates for Exim.

However, if you are running your mail server as part of a dedicated backend server that does not include web, you will need to create and install your own certificates. In addition, if your incoming mail domain is different from your external hostname (DAHOSTNAME), then you will also need to install your own certificates.

The process of installing your own Exim certificates is very similar to the process of installing HTTPS certificates.

If you are using S3 or Azure blob storage, copy your certificate and private key to the certs folder of your S3 bucket or Azure blob storage container, using the filenames exim.crt and exim.key, respectively.

If you are not using S3 or Azure blob storage, save these files as:

  • /usr/share/docassemble/certs/exim.crt (certificate)
  • /usr/share/docassemble/certs/exim.key (private key)

On startup, docassemble.webapp.install_certs will copy these files into the appropriate location (/etc/exim4) with the appropriate ownership and permissions.

Installing on a machine already using a web server

The simplest way to run docassemble is to give it a dedicated machine and run Docker on ports 80 and 443. However, if you have an existing machine that is already running a web server, it is possible to run docassemble on that machine using Docker. You can configure the web server on that machine to forward traffic to the Docker container.

The following example illustrates how to do this. Your situation will probably be different, but this example will still help you figure out how to configure your system.

In this example, we will show how to run docassemble using Docker on an Ubuntu 16.04 server. The machine will run a web server using encryption. The web server will be accessible at https://justice.example.com and will serve resources other than docassemble. The docassemble resources will be accessible at https://justice.example.com/da. Docker will run on the machine and will listen on port 8080. The web server will accept HTTPS requests at /da and forward them as HTTP requests to port 8080. The SSL certificate will be installed on the Ubuntu server, and the Docker container will run an HTTP server. Docker will be controlled by the user account ubuntu, which is assumed to have sudo privileges.

First, let’s install Apache, Let’s Encrypt (the certbot utility), and Docker. We log in to the server as ubuntu and do:

sudo add-apt-repository ppa:certbot/certbot
sudo apt-get -y update
sudo apt-get -y install apache2 python-certbot-apache docker.io 
sudo usermod -a -G docker ubuntu

The last command changes the user privileges of the ubuntu user. For these changes to take effect, we need to log out and log in again. (E.g., exit the ssh session and start a new one.)

Before we run certbot, let’s add some basic information to the Apache configuration. Edit the standard HTTP configuration:

sudo vi /etc/apache2/sites-available/000-default.conf

and add the following lines, replacing any other lines that begin with ServerName or ServerAdmin.

ServerName justice.example.com
ServerAdmin admin@example.com

We then do the same with the HTTPS configuration.

sudo vi /etc/apache2/sites-available/default-ssl.conf

Now, let’s enable some necessary components of Apache:

sudo a2ensite default-ssl
sudo a2enmod proxy_http
sudo a2enmod headers
sudo service apache2 reload

Then, we need to make sure that justice.example.com is directed to this server. This may require going to our DNS provider and adding a CNAME record or an A record.

We can test whether everything is working by going to http://justice.example.com in a web browser. We should see the default Apache page.

Once that is done, we are ready to run certbot.

sudo certbot --apache

When certbot asks “Please choose whether HTTPS access is required or optional,” we will select “Secure - Make all requests redirect to secure HTTPS access.” This will cause all HTTP traffic to be forwarded to HTTPS.

The certbot command will obtain certificates and modify the Apache configuration. A cron job will be installed that takes care of renewing the certificates.

We can test whether the SSL certificates work by going to https://justice.example.com. We should see the default Apache page again, but with encryption this time.

Now we are ready to run docassemble. First we need to create a short text file called env.list that contains some configuration options for docassemble.

vi env.list

Set the contents of env.list to:

BEHINDHTTPSLOADBALANCER=true
DAHOSTNAME=justice.example.com
URLROOT=https://justice.example.com
POSTURLROOT=/da/

The POSTURLROOT directive, which is set to /da/, indicates the path after the domain at which docassemble can be accessed. The Apache web server will be able to provide other resources at other paths, but /da/ will be reserved for the exclusive use of docassemble. The beginning slash and the trailing slash are both necessary.
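
The slash requirement is easy to get wrong, so a small check can be useful. This is a sketch only; valid_posturlroot is a hypothetical helper, and treating a bare / as valid is an assumption made here.

```shell
# Succeed only if the value both begins and ends with a slash.
valid_posturlroot() {
    case "$1" in
        /*/) return 0 ;;   # e.g. /da/
        /)   return 0 ;;   # a bare slash (assumed acceptable)
        *)   return 1 ;;   # e.g. /da or da/
    esac
}

valid_posturlroot "/da/" && echo "POSTURLROOT looks fine"
```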

Now, let’s download, install, and run docassemble.

docker run --env-file=env.list -d -p 8080:80 jhpyle/docassemble

The option -p 8080:80 means that port 8080 on the Ubuntu machine will be mapped to port 80 within the Docker container.

Now that docassemble is running, let’s configure Apache on the Ubuntu machine to forward requests to docassemble. Edit the HTTPS configuration file:

sudo vi /etc/apache2/sites-available/default-ssl.conf

Add the following lines within the <VirtualHost> area:

ProxyPass "/da"  "http://localhost:8080/da"
ProxyPassReverse "/da"  "http://localhost:8080/da"
RequestHeader set X-Forwarded-Proto "https"

Then restart the web server:

sudo service apache2 restart

Now, when we go to https://justice.example.com/da, we will see the docassemble demo interview.

Creating your own Docker image

To create your own Docker image, first make sure git is installed:

sudo apt-get -y install git

or

sudo yum -y install git

Then download docassemble:

git clone https://github.com/jhpyle/docassemble

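To make changes to the configuration of the docassemble application that will be installed in the image, edit the relevant files in the cloned repository before building. For example, SSL certificates placed in the Docker/ssl directory are copied into /usr/share/docassemble/certs during the build process.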

To build the image, run:

cd docassemble
docker build -t yourdockerhubusername/mydocassemble .

You can then run your image:

docker run -d -p 80:80 -p 443:443 -p 25:25 -p 9001:9001 yourdockerhubusername/mydocassemble

Or push it to Docker Hub:

docker push yourdockerhubusername/mydocassemble

Upgrading docassemble when using Docker

As new versions of docassemble become available, you can obtain the latest version by running:

docker pull jhpyle/docassemble

Then, subsequent docker run commands will use the latest docassemble image.

When you are using Docker to run docassemble, you can upgrade docassemble to the newest version simply by running docker pull jhpyle/docassemble, then running docker stop and docker rm on your docassemble container, followed by docker run. Note, however, that docker rm will delete all of the data on the server unless you are using a data storage system such as S3, Azure blob storage, or persistent volumes.
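
The upgrade sequence can be sketched as a small script. By default it only prints the commands, because docker rm is destructive; setting DRY_RUN=0 would execute them. The container name mydocassemble and the port options are placeholders, and the final docker run should repeat whatever options you originally used.

```shell
# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pull the new image, then stop, remove, and recreate the container.
# $1 is the name or ID of the existing container.
upgrade() {
    run docker pull jhpyle/docassemble &&
    run docker stop "$1" &&
    run docker rm "$1" &&
    run docker run -d -p 80:80 -p 443:443 jhpyle/docassemble
}

upgrade mydocassemble
```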

Cleaning up after Docker

If you run docker pull to retrieve new versions of docassemble, or you build your own docassemble images more than once, you may find your disk space being used up. The full docassemble image is about 4GB in size, and whenever you run docker pull or build a new image, a new image is created – the old image is not overwritten.

The following three lines will stop all containers, remove all containers, and then remove all untagged images (those listed as <none>), such as the intermediate images that Docker creates during the build process.

docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images | grep "^<none>" | awk '{print $3}')

The last line, which deletes images, frees up the most disk space. However, it may be necessary to remove the containers first, as the containers depend on the images.
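
To see what the image-removal pipeline selects without deleting anything, it can be run against canned text. The listing below is made-up sample output in the general shape of docker images, not real data, and the function names are illustrative.

```shell
# Hypothetical sample of `docker images` output (IDs are invented).
sample_output() {
cat <<'EOF'
REPOSITORY           TAG      IMAGE ID       CREATED        SIZE
jhpyle/docassemble   latest   1a2b3c4d5e6f   2 days ago     4.1GB
<none>               <none>   0f9e8d7c6b5a   3 weeks ago    4.0GB
<none>               <none>   123456789abc   2 months ago   3.9GB
EOF
}

# Keep only untagged rows and print their third field, the image ID.
# The awk program is single-quoted so the shell does not expand $3.
untagged_ids() {
    grep '^<none>' | awk '{print $3}'
}

sample_output | untagged_ids
```

On a real host, docker images | untagged_ids would list the IDs that docker rmi would receive. Newer Docker versions also provide docker image prune, which removes dangling images directly.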