docassemble can easily be scaled in the cloud. A cluster of web servers can serve responses to client browsers, while communicating with centralized services. The limit to scalability will then be a question of how responsive a single SQL server can be and how responsive a single Redis server can be.
Multi-server configuration on EC2 Container Service
If you want to run docassemble in a scalable multi-server arrangement, there are a variety of ways to do so, but the following is the recommended way to get started. It takes advantage of the many features of Amazon Web Services (AWS).
In this example, one server will perform central functions (SQL, Redis, RabbitMQ, and log message aggregation) and two separate servers will act as application servers. The application servers will act as web servers, receiving requests that are distributed by a load balancer. These servers will also operate as Celery nodes, running background processes for docassemble interviews. All three of the servers will be EC2 virtual machines situated within a private network on AWS called a Virtual Private Cloud (VPC).
Users will point their browsers at a URL like
https://docassemble.example.com, where the DNS entry for
docassemble.example.com is a CNAME that points to an
Application Load Balancer (at an address like
Application Load Balancer will take care of generating and serving
SSL certificates for the encrypted connection.
The Application Load Balancer will convert the HTTPS requests into HTTP requests and send them to the web servers. From the perspective of the web servers, all incoming requests will use the HTTP scheme. But from the perspective of the user’s web browser, the web site uses the HTTPS scheme.
Within the VPC, the web servers will communicate with the central server using a variety of TCP ports.
The three servers will all be part of an Auto Scaling Group. This means that as more people use your site, you can expand beyond three virtual machines to any number of virtual machines. The new virtual machines that are created will act as web servers, and the Application Load Balancer will distribute the traffic among them. If traffic dies down, some of those virtual machines can be turned off.
The “backend” service consists of a single “task,” where the task is
defined as a single Docker “container” running the “image”
jhpyle/docassemble with the environment variable
sql:redis:rabbitmq:log:cron:mail. You will ask for one of these services
to run (i.e. the “desired count” of this service is set to 1).
The “app” service consists of a single “task,” where the task is
defined as a single Docker “container” running the “image”
jhpyle/docassemble with the environment variable
web:celery. You will ask for two of these services to run
(i.e. the “desired count” of this service is set to 2). (You will be
able to change this count in the future.)
The result of this configuration will be that three EC2 virtual machines will exist, all of which will be running Docker. Each virtual machine will “run” the same Docker container, except that the “backend” instance will run with different environment variables than the “app” instances.
To shutdown your docassemble setup, you would first edit the “app” service and set the “count” to zero. Then, after a few minutes, the Docker containers will stop. Once the “app” services are no longer running, you will then do the same with the “backend” service. Then, if you want to turn off your EC2 virtual machines, you would edit the Auto Scaling Group and set the desired number of instances to zero.
You can then restart your docassemble system and it will pick up exactly where it left off. This is because docassemble will back up SQL, Redis, and other information to S3 (or Azure blob storage if you prefer) when the containers shut down, and restore from the backups when they start up again. To restart, you would edit the Auto Scaling Group to set the desired number of instances to 3. When they are up and running, you would then update the “backend” service and set the “desired count” to 1. Once that service is up and running, you would update the “app” service and set the “desired count” to 2.
The following instructions will guide you through the process of using AWS to set up a docassemble installation. AWS itself is beyond the scope of this documentation. If you have never used AWS before, you are encouraged to consult the AWS documentation.
These instructions assume that you own a domain name and can add
DNS entries for the domain. (If you do not have a domain name,
you can purchase one from a site like GoDaddy or Google Domains.)
In this example, we will use
docassemble.example.com as the example
hostname, but you will need to replace this with whatever hostname you
are going to use.
First, sign up for an Amazon Web Services account.
Log in to the AWS Console. If you have not used ECS before,
navigate to the service called “EC2 Container Service” and click “Get
Started” to follow the steps of the introductory “wizard.” Check the
option for “deploy a sample application onto an Amazon ECS Cluster.”
Follow the default selections. The wizard will create a cluster
called “default,” an IAM role called
ecsInstanceRole, and several
resources, including a CloudFormation stack, an Internet Gateway,
VPC, a Route Table, a VPC Attached Gateway, two subnets, a
Security Group for an Elastic Load Balancer, a Security Group
for ECS instances, a Launch Configuration, and an
Auto Scaling Group.
The resources the wizard creates are helpful for educational purposes, but in the long term they are not needed and should be deleted. The running EC2 instance may also cost you money. To delete the unnecessary resources:
- Go to the ECS Console, find the service “sample-webapp,” update it, and set the “Number of tasks” to 0. Then you can delete “sample-webapp.” Also go into “Task Definitions” and delete the one Task Definition there.
- Go to the EC2 Console, go into the “Instances” section, and change the “Instance State” of the one running instance to “Terminate.”
- Then go into the “Auto Scaling Groups” section and delete the one Auto Scaling Group.
- Then go into the “Launch Configurations” section and delete the one Launch Configuration.
- Then go to the VPC Console, select the VPC for which “Default VPC” is “No,” and delete it. (Be careful not to delete the default VPC!)
Next, we need to create an S3 “bucket” in which
your docassemble system can store files, backups, and other
things. To do this, go to the S3 Console, click “Create Bucket,”
and pick a name. If your site is at docassemble.example.com, a good
name for the bucket is
docassemble-example-com. (Client software
will have trouble accessing your bucket if it contains
characters.) Under “Region,” pick the region nearest you.
Next, we need to create an IAM role for the EC2 instances that will run docassemble. This role will empower the server to operate on ECS and to access the S3 “bucket” you created. To do this, go to the IAM Console and go to “Roles” -> “Create New Role.” Call your new role “docassembleInstanceRole” or some other name of your choosing. Under “AWS Service Roles,” select “Amazon EC2 Role for EC2 Container Service.” On the “Attach Policy” screen, check the checkbox next to “AmazonEC2ContainerServiceforEC2Role,” and continue on to create the role.
Once the role is created, go into the “Inline Policies” area of the
role and create a new inline policy. Select “Custom Policy,” enter a
“Policy Name” like “S3DocassembleExampleCom,” and set the “Policy
Document” to the following (substituting the name of the bucket you
created in place of
docassemble-example-com in the two places it
(Note that instead of creating an inline policy that limits access to a particular bucket, you could attach the “AmazonS3FullAccess” policy to the role and give EC2 instances full access to all of your account’s S3 buckets.)
This inline policy allows any application running on a machine given
docassembleInstanceRole role to access your S3 bucket. This
saves you from having to include long and complicated secret keys in
the environment variables you pass to docassemble Docker
containers. All of the EC2 instances you create in later steps will
be given the
docassembleInstanceRole IAM role.
Go to VPC Console and inspect the description of the default VPC.
(If you have more than one VPC, look for the VPC where “Default
VPC” is “Yes.”) Make a note of the VPC CIDR address for the VPC,
which will be something like
Go to the EC2 Console and set up a new Security Group called
docassembleSg with two rules. One rule should allow traffic of
“Type” SSH from “Source” Anywhere. The other rule should allow “All
traffic” from “Source” Custom IP, where the address is the CIDR
address you noted. This will provide the “firewall rules” for your
servers so that you can connect to them via SSH and so that they can
communicate among each other.
Go to the “Launch Configuration” section of the EC2 Console and
create a new Launch Configuration called
docassembleLc. When it
asks what AMI you want to use, go to “Community AMIs,” search for
the keyword “ecs-optimized” and pick the most recent AMI that comes
up. The AMIs will not be listed in any useful order, so you have to
look carefully at the names, which contain the dates the AMIs were
created. There is also a page on Amazon’s web site that lists the
ECS-optimized AMIs by region. As of this writing, the most recent
ECS-optimzed AMI is “amzn-ami-2017.09.c-amazon-ecs-optimized.” If
you use a different AMI, make sure that you give the machine at least
20GB of storage. The ECS-optimized AMIs provide 30GB of storage in
total, which should be sufficient.
In the “Choose Instance Type” section, select an instance type that has at least 2GB of RAM (e.g., t2.small).
In the “Configure details” section, set the “IAM role” of the
Launch Configuration to
docassembleInstanceRole, the IAM role
you created earlier. open the “Advanced Details” and add the
following as “User data”:
In the “Configure Security Group” section, set the security group to
docassembleSg, the Security Group you created earlier.
Then go to the “Auto Scaling Groups” section of the EC2 Console and
create a new Auto Scaling Group called
docassembleAsg. Connect it
docassembleLc, the Launch Configuration you just created.
Use a fixed number of instances without scaling policies that respond
to CloudWatch alarms. Set the desired number of instances to 3.
Once the Auto Scaling Group is saved, AWS should start running
three EC2 instances. Since you chose an ECS-optimized AMI, the
instances should automatically register with your ECS cluster.
The next step is to create an Application Load Balancer. The load balancer will accept HTTPS requests from the outside world and forward them to the application servers as HTTP requests. It will forward two types of requests: regular HTTP requests and WebSocket requests. (The live help features of docassemble use the WebSocket protocol for real time communication between operators and users.) The Application Load Balancer needs to treat WebSocket requests differently. While regular HTTP requests can be forwarded randomly to any application server, WebSocket requests for a given session need to be forwarded to the same server every time.
To set up the Application Load Balancer, first you need to create a
Security Group that will allow communication from the outside world
through HTTPS. Go to the “Security Groups” section of the
EC2 Console and create a new Security Group. Set the “Security
group name” to
docassembleLbSg and set the “Description” to
“docassemble load balancer security group.” Attach it to your default
VPC. Add two “Inbound” rules to allow HTTP and HTTPS traffic from
anywhere. For the first rule, set “Type” to HTTPS and “Source” to
“Anywhere.” For the second rule, set “Type” to HTTP and “Source” to
Then go to the “Load Balancing” -> “Target Groups” section of the
EC2 Console and create a new “Target Group.” Set the “Target group
web, set the “Protocol” to HTTP, set the “Port” to 80, and
set the “VPC” to your default VPC. Under “Health check settings,”
set the “Protocol” to HTTP and the “Path” to
“health check” is the load balancer’s way of telling whether a web
server is healthy. The path
/health_check on a docassemble web
server is a page that responds with a simple “OK.” (All the load
balancer cares about is whether the page returns an
HTTP success code or not.) Under “Advanced health check settings,”
set the “Healthy threshold” to 10, “Unhealthy threshold” to 2,
“Timeout” to 10 seconds, “Interval” to 120 seconds, and keep other
settings at their defaults. Then click “Create.”
Once that “Target Group” is created, create a second “Target Group”
websocket with the same settings. Then, once the
“Target Group” is created, do Actions -> Edit Attributes on it, and
under “Stickiness,” select “Enable load balancer generated cookie
stickiness.” Keep other settings at their defaults. Then click
Finally, create a third “Target Group” called
the “Protocol” to HTTP, set the “Port” to 8081, and set the “VPC” to
your default VPC. Under “Health check settings,” set the “Protocol”
to HTTP and the “Path” to
/health_check. Under “Advanced health
check settings,” change “Success codes” from 200 to 301. This is
because all requests to this target group should respond with an
HTTP redirect response, the code for which is 301. Then click
The purpose of the
http-redirect target group is very limited: it
will forward any HTTP requests to a special port on your
docassemble web server, which will return a special “redirect”
response that causes the user’s browser to be redirected back to the
HTTPS listener on your load balancer.
Then go to the “Load Balancers” section of the EC2 Console and create a “Load Balancer.” Select “Application Load Balancer” as the type of load balancer.
On the “Configure Load Balancer” page, set the name to
docassembleLb. Under “Listeners,” keep the HTTP listener listening
to port 80 and click “Add listener” to add a second listener. Set the
“Load Balancer Protocol” to HTTPS and set the “Load Balancer Port” to
Under “Availability Zones,” make sure your default VPC is selected. Then select all of the “available subnets” by clicking the plus buttons next to each one. (If it gives you any trouble about adding subnets, just add as many subnets as it will let you add.)
On the “Configure Security Settings” page, it will ask about SSL certificates and security policies. If you have never been here before, accept all of the defaults. This should result in Amazon creating an SSL certificate for you. If you have created certificates in the past, either select the certificate you want to use, or click “Request a new certificate from ACM.”
On the “Configure Routing” page, set “Target group” to “Existing
target group” and select
http-redirect as the “Name” of the Target
Group. The “Protocol” should be HTTP and the Port should be 8081.
Skip past the “Register Targets” page. On the “Review” page, click “Create” to create the Load Balancer.
Note that load balancers, like EC2 instances, will cost you money even you are not using them. The cost is $5 per month or more. So if you are not using your load balancer, delete it. You do not need to delete the other resources you created (e.g., target groups, security groups) because they do not cost money to maintain.
docassembleLb load balancer is created, you need to make a
few manual changes to it.
In the “Load Balancers” section, select the
balancer, and open the “Listeners” tab. Click “Edit” next to the HTTP
listener. Set the “Default target group” to “http-redirect,” if it is
not selected already.
Once those changes are saved, edit the “rules” for the HTTPS listener.
There will be one default rule set up, and it will incorrectly say
that requests should be routed to the
http-redirect target group. This
is the proper setting for HTTP (port 80), but not for HTTPS (port
443), so you need to change it. Click the button to edit rules, and
edit the default rule. Change it so that it forwards to the
target group. Then press “Update” to save your changes.
We need to make one more change on this screen. Click the button to
add a new rule. Make it the first rule in the list of rules.
Construct the new rule so that it says, in effect, “if the path is
/ws/*, forward the request to the
websocket target group.” Then
click “Save.” Your rules should look like this (keeping in mind that
AWS might have changed its user interface since the time this
documentation was written).
docassembleLb load balancer will listen to port 443 and act
on requests according to two “Rules.” The first rule says that if the
path of the HTTPS request starts with
/ws/, which is
docassemble’s path for WebSocket communication, then the request
will be forwarded using the
websocket “Target Group,” which has the
“stickiness” feature enabled. The second rule says that all other
traffic will use the
web “Target Group,” for which “stickiness” is
not enabled. Your
docassembleLb load balancer will also listen to
port 80 and forward all those requests to port 8081 on your web
servers, which will respond by redirecting the user to port 443 of
your load balancer.
Finally, go to the “Description” tab for the
balancer and make note of the “DNS name.” It will be something like
Edit the DNS configuration of your domain and create a CNAME that
maps the DNS name that your users will use (e.g.,
docassemble.example.com) to the DNS name of the load balancer
will allow your users to connect to the load balancer.
Then go to the ECS Console.
CONTAINERROLE variable indicates that this server will serve the
functions of SQL, Redis, RabbitMQ, and log message aggregation.
DAHOSTNAME variable indicates the hostname at which the user
will access the docassemble application. The
instructs docassemble that it will be running on EC2, which
means that the server needs to use a special method of obtaining its
own fully-qualified domain name.
Then, check the ECS Console and look at the
There should be three container instances available. (If not,
something is wrong with the Auto Scaling Group you created, or
perhaps the instances are still starting up.)
Then, create a Service called
backend that uses the task definition
you just created. Set the number of tasks to 1. Do not choose an
Elastic Load Balancer. This will cause the docassemble Docker
image to be installed on one of the container instances. It will
take some time for the image to download and start. In the meantime,
the ECS Console should show one “pending task.”
Create a Service called
app that uses the task definition
Set the “Number of tasks” to 2. Under “Elastic load balancing,” click
the “Configure ELB” button. Set “ELB type” to “Elastic Load
Balancer.” The “IAM role for service” should be
IAM role that ECS creates for you automatically. Set “ELB Name”
docassembleLb, the name of the Application Load Balancer you
created earlier. Under “Container to load balance,” select
“app:80:80” and click “Add to ELB.” Set the “Target group name” to
web, the “Target Group” you created earlier. Then click “Save.”
Note that the
app task definition is much briefer than the
task definition. This is because the
backend service will save a
configuration file in the S3 bucket. All that the
needs to do is retrieve that configuration file. Note also that it is
not necessary to include any secret keys in the JSON configuration.
This is because the Launch Configuration of your virtual machines
includes the “IAM Role” of
docassembleInstanceRole; the virtual
machines themselves are authorized to access the S3 bucket.
Just one more thing needs to be done to make the docassemble
server fully functional: you need to associate the
http-redirect “Target Groups” with the same EC2 instances that are
associated with the
web “Target Group.” (Unfortunately, this is not
something that the ECS system can do automatically yet.) To fix
this, go to the EC2 Console, go to the “Target Groups” section,
web Target Group, go to the “Targets” tab, and note the
Instance IDs of the “Registered Instances.” Now de-select
websocket, and click the “Edit” button within the “Targets”
tab. On the “Register and deregister instances” page that appears,
select the instances you just noted and click the “Add to registered”
button. Then do the same with
To shut down a multi-server docassemble configuration:
#. Go to the ECS Console. Go into the
default cluster, and update
the “app” ECS service. Set “Number of tasks” to zero. This will
gracefully stop the Docker containers that serve in the role of
#. Wait until the tasks have stopped.
#. Update the “backend” ECS service. Set “Number of tasks” to zero.
This will stop the Docker container that provides centralized
#. Wait until the task has stopped.
#. The docassemble application is now fully shut down. However,
you are still spending money because there are EC2 instances
running. To shut down these instances, go to the EC2 Console and
edit the Auto Scaling Group you created (
the desired number of instances to 0. This will cause the three
instances you created to shut down.
#. Now you are no longer paying money to keep instances running, but
you are still paying to keep the load balancer alive (approximately
$5 per month). To avoid this cost, go to the “Load Balancers”
section of the EC2 Console and delete the
#. If you do not care about retaining the database or configuration of
the docassemble application you just shut down, go to the
S3 Console and delete the bucket you created. Note that the cost
of maintaining data in S3 is minimal as long as the volume of
data is low. The data that docassemble stores in S3 will
only be significant if you have a lot of uploaded files or a lot of
static files in your Playground.
After you have shut down these resources, many other AWS resources that you created will still exist, but you do not need to delete them to avoid incurring costs. “Target groups” and “Launch Configurations,” for example, are just configuration data; AWS does not need to reserve computing power or IP addresses to maintain them in your account.
Controlling AWS from the command line
In the docassemble GitHub repository, there is a command-line
da-cli (which is short for “docassemble command
line interface”) that you can use to manage your multi-server
configuration on AWS.
Here are some ways that the command-line application can be used:
./da-cli start_up 2- bring up two
appservices and one
backendservice after bringing up three EC2 instances with the
docassembleAsgAuto Scaling Group. It also registers the appropriate instances for the
./da-cli shut_down- bring the count of the
backendservices down to zero, then bring the
docassembleAsgAuto Scaling Group count down to zero.
./da-cli shutdown_unused_ec2_instances- finds EC2 instances in the
docassembleAsgAuto Scaling Group that are not being used to host a service, and terminates them.
./da-cli update_ec2_instances 4- increases the
docassembleAsgAuto Scaling Group count to 4 and waits for the instances to become available.
./da-cli connect_string app- prints a command that you can use to SSH to each EC2 instance running the service
./da-cli fix_web_sockets- registers target instances for the
./da-cli update_desired_count app 4- increases the desired count for the
appservice to 4.
If your AWS resources go by names different from those in the above
setup instructions, you can easily edit the
da-cli script, which is
a simple Python module. You might also want to extend the
functionality of the script; anything that can be done in the AWS
web interface can be done using the boto3 library.
Single-server configuration on EC2 Container Service
If you want, you can use EC2 to run docassemble in a single-server arrangement. (It is not particularly “scalable” to do so, but you might find it convenient.)
To do so, follow the instructions above for the multi-server configuration, but skip the creation of the S3 bucket, IAM role, Application Load Balancer, Target Groups, Auto Scaling Group, Task Definitions, and the ECS services.
Instead, just start an EC2 instance using an ECS-optimized AMI with a
Security Group that allows incoming connections over HTTP (port
80). Then go to the ECS Console and set up a “service” called
docassemble (or whatever name you want to give it) that runs a
single “task” with a Task Definition like the following.
To start up the server, you would configure the ECS service to start one task. To shut down the server, you would reduce the number of tasks to 0.
A more complicated example of a configuration is the following, which assumes that you have a domain name that is mapped to your EC2 instance, that you want to use HTTPS, and you have set up the S3 bucket and IAM role so that your server can use S3 persistent storage.
For more information about configuring your server, see the section on Docker.
How it works
Each docassemble application server reads its configuration from
/usr/share/docassemble/config/config.yml. The default configuration for
connecting to the central SQL database is:
This will cause docassemble to connect to PostgreSQL on the
local machine as the user “docassemble” and open the database
“docassemble.” This configuration can be modified to connect to a
remote server. If you use a remote SQL server that is not a Docker
container running the
sql, then you need to make
sure that you create the user and database first, and give permission
to the user to create tables in the database.
By default, the Redis server is assumed to be found at
redis://localhost and the RabbitMQ server at
pyamqp://[email protected]//, where
the value of
socket.gethostname(). However, you can modify the
locations of the Redis server and RabbitMQ server in the
configuration using the
Log file aggregation
In a multi-server configuration, log files can be centralized and
aggregated by using Syslog-ng on the application servers to forward
local log files to port 514 on a central server, which in turn runs
Syslog-ng to listen to port 514 and write what it hears to files in
/usr/share/docassemble/log. The hostname of this central server is
set using the
log server directive in the configuration:
If this directive is set, docassemble will write its own log
messages to port 514 on the central server rather than appending to
log server is set, the “Logs” page of the web interface will
call http://log.example.local:8080/ to get a list of available log
files, and will retrieve the content of files by accessing URLs like
The following files make this possible:
Docker/cgi-bin/index.sh- CGI script that should be copied to
Docker/config/docassemble-log.conf.dist- template for an Apache site configuration file that connects port 8080 to the
index.shCGI script above.
Docker/docassemble-syslog-ng.conf- on application servers, this is copied into
/etc/syslog-ng/conf.d/docassemble. It depends on an environment variable
LOGSERVERwhich should be set to the hostname of the central log server (e.g.,
log.example.local). It causes log messages to be forwarded to port 514 on the central log server. Note that Syslog-ng needs to run in an environment where the
LOGSERVERenvironment variable is defined, or the file needs to be edited to include the hostname explicitly.
Docker/syslog-ng.conf- on the central log server, this is copied to
/etc/syslog-ng/syslog-ng.conf. It causes Syslog-ng to listen to port 514 and copy messages to files in
Auto-discovery of services
If any of the following configuration directives are
undefined, and S3/Azure blob storage is enabled, then a
docassemble application server will try to “autodiscover” the
hostname of the service.
If a key is defined, docassemble will assume that the value is a
hostname, and will set the corresponding configuration variable
appropriately (e.g., by adding a
redis:// prefix to the Redis
The Docker initialization script runs the
docassemble.webapp.cloud_register module, which writes the
hostname to the appropriate S3 keys/Azure blob storage objects
depending on the value of the environment variable
Configuring a cluster of docassemble servers requires centralizing
the location of uploaded files, by using an Amazon S3 bucket (the
s3 configuration setting), using an Azure blob storage container
azure configuration setting), or by making the uploaded file
directory a network drive mount (the
The default location of uploaded user files is defined by the
uploads configuration setting:
The web server will restart, and re-read its Python source code, if
the modification time on the WSGI file,
/usr/share/docassemble/webapp/docassemble.wsgi, is changed. The
path of the WSGI file is defined in the Apache configuration and in
the docassemble configuration file:
EC2 hostname discovery
If you are using Amazon EC2, set the following in the configuration:
This can also be set with the
EC2 environment variable if
This setting is necessary so that the server can correctly identify its own hostname.
To start a postgresql server in an Amazon Linux instance:
The “createuser” line will ask for a password; enter something like
Now that the server is running, you can set up web servers to communicate with the server by setting something like the following in the configuration:
If you installed the web servers before installing the SQL server,
note that the
create_tables module will need to be run in order to
create the database tables that docassemble expects.
If you are launching docassemble within EC2, note that Amazon does not allow e-mail to be sent directly from SMTP servers operating within an EC2 instance unless you obtain special permission. Therefore, you may wish to use an SMTP server on the internet with which you can connect through SMTP Authentication.
For more information on setting up e-mail sending, see the installation section.
Using S3 without passing access keys in the configuration
If you are running docassemble on an EC2 instance, or on a
Docker container within an EC2 instance, you can set up the
instance with an IAM role that allows access S3 without supplying
credentials. In this case, the configuration in docassemble does
not include an
access_key_id or a
Instructions on how to set this up are available above.
With one central server and two application servers, all three of which are t2.small instances running on Amazon Web Services, docassemble can handle 100 concurrent requests without errors or significant slowdown:
You can expect each application server to maintain 11 concurrent connections to PostgreSQL:
- 5 connections for the web server
- 5 connections for the WebSocket server
- 1 connnection for the Celery server
This assumes that the server has one CPU. The Celery server will start one process per CPU. So if the server has two CPUs, the total number of PostgreSQL connections will be 12.