Installing CKAN with Docker Compose¶
This chapter is a tutorial on how to install the latest CKAN (master or any stable version) with Docker Compose. The scenario shown here is one of many possible scenarios and environments in which CKAN can be used with Docker.
This chapter aims to provide a simple, yet fully customizable deployment - easier to configure than a source install, more customizable than a package install.
The discussed setup can be useful as a development / staging environment; additional care has to be taken to use this setup in production.
Some design decisions are opinionated (see notes), which does not mean that the alternatives are any worse. Some decisions may or may not be suitable for production scenarios, e.g. the use of CKAN master. Notably, this tutorial does not use Docker Swarm; additional steps may need to be taken to adapt the setup to use Docker Swarm.
This tutorial was tested on Ubuntu 14.04 LTS and Ubuntu 16.04 LTS, respectively. The hosts can be local environments or cloud VMs. It is assumed that the user has direct access (via terminal / ssh) to the systems and root permissions.
Using a cloud based VM, external storage volumes are cheaper than VMs and easy to backup.
In our use case, we use a cloud based VM with 16 GB storage, have mounted a 100 GB btrfs-formatted
external storage volume, and symlinked
/var/lib/docker to the external volume.
This allows us to store the bulky and/or precious cargo – Docker images, Docker data volumes
containing the CKAN databases, filestore, and config – on a cheaper service.
On the other hand, a snapshotting filesystem like btrfs is ideal for rolling backups.
The same cost consideration might apply to other cloud-based providers.
This setup stores data in named volumes, mapped to folder locations which can be
networked or local storage.
An alternative would be to re-write
docker-compose.yml to map local storage
directly, bypassing named volumes. Both solutions will save data to a specified location.
Further reading: Docker Volumes.
Docker is installed system-wide following the official Docker CE installation guidelines.
To verify a successful Docker installation, run
docker run hello-world.
docker version should output versions for client and server.
- Docker Compose
Docker Compose is installed system-wide following the official Docker Compose installation guidelines. Alternatively, Docker Compose can be installed inside a virtualenv, which would be entirely separate from the virtualenv used inside the CKAN container, and would need to be activated before running Docker Compose commands.
To verify a successful Docker Compose installation, run
- CKAN source
Clone CKAN into a directory of your choice:
cd /path/to/my/projects git clone https://github.com/ckan/ckan.git
This will use the latest CKAN master, which may not be stable enough for production use. To use a stable version, checkout the respective tag, e.g.:
git checkout tags/ckan-2.8.0
2. Build Docker images¶
In this step we will build the Docker images and create Docker data volumes with user-defined, sensitive settings (e.g. database passwords).
- Sensitive settings and environment variables
contrib/docker/.env and follow instructions
within to set passwords and other sensitive or user-defined variables.
The defaults will work fine in a development environment on Linux. For Windows and OSX, the CKAN_SITE_URL must be updated.
- Build the images
Inside the CKAN directory:
cd contrib/docker docker-compose up -d --build
For the remainder of this chapter, we assume that
docker-compose commands are all run inside
.env are located.
On first runs, the postgres container could need longer to initialize the database cluster than the ckan container will wait for. This time span depends heavily on available system resources. If the CKAN logs show problems connecting to the database, restart the ckan container a few times:
docker-compose restart ckan docker ps | grep ckan docker-compose logs -f ckan
Earlier versions of
ckan-entrypoint.sh used to wait and ping the db container
using detailed db credentials (host, port, user, password).
While this delay sometimes worked, it also obfuscated other possible problems.
However, the db cluster needs to initialize only once, and starts up quickly in subsequent
runs. This setup chose the very opinionated option of doing away with the delay altogether in
favour of failing early.
After this step, CKAN should be running at
There should be five containers running (
ckan: CKAN with standard extensions
db: CKAN’s database, later also running CKAN’s datastore database
redis: A pre-built Redis image.
solr: A pre-built SolR image set up for CKAN.
datapusher: A pre-built CKAN Datapusher image.
There should be four named Docker volumes (
docker volume ls | grep docker). They will be
prefixed with the Docker Compose project name (default:
docker or value of host environment
docker_ckan_config: home of production.ini
docker_ckan_home: home of ckan venv and source, later also additional CKAN extensions
docker_ckan_storage: home of CKAN’s filestore (resource files)
docker_pg_data: home of the database files for CKAN’s default and datastore databases
The location of these named volumes needs to be backed up in a production environment. To migrate CKAN data between different hosts, simply transfer the content of the named volumes. A detailed use case of data transfer will be discussed in step 5.
- Convenience: paths to named volumes
The files inside named volumes reside on a long-ish path on the host.
Purely for convenience, we’ll define environment variables for these paths.
We’ll use a prefix
VOL_ to avoid overriding variables in
# Find the path to a named volume docker volume inspect docker_ckan_home | jq -c '. | .Mountpoint' # "/var/lib/docker/volumes/docker_ckan_config/_data" export VOL_CKAN_HOME=`docker volume inspect docker_ckan_home | jq -r -c '. | .Mountpoint'` echo $VOL_CKAN_HOME export VOL_CKAN_CONFIG=`docker volume inspect docker_ckan_config | jq -r -c '. | .Mountpoint'` echo $VOL_CKAN_CONFIG export VOL_CKAN_STORAGE=`docker volume inspect docker_ckan_storage | jq -r -c '. | .Mountpoint'` echo $VOL_CKAN_STORAGE
We won’t need to access files inside
docker_pg_data directly, so we’ll skip creating the shortcut.
As shown further below, we can use
psql from inside the
ckan container to run commands
against the database and import / export files from
3. Datastore and datapusher¶
To enable the datastore, the datastore database and database users have to be created before
enabling the datastore and datapusher settings in the
- Create and configure datastore database
With running CKAN containers, execute the built-in setup scripts against the
docker exec -it db psql -U ckan -f 00_create_datastore.sh docker exec ckan /usr/local/bin/ckan-paster --plugin=ckan datastore set-permissions -c /etc/ckan/production.ini | docker exec -i db psql -U ckan
The first script will create the datastore database and the datastore readonly user in the
container. The second script is the output of
paster ckan set-permissions - however,
as this output can change in future versions of CKAN, we set the permissions directly.
The effect of these scripts is persisted in the named volume
We re-use the already privileged default user of the CKAN database as read/write user
for the datastore. The database user (
ckan) is hard-coded, the password is supplied through
A new user
datastore_ro is created (and also hard-coded) as readonly user with password
Hard-coding the database table and usernames allows to prepare the set-permissions SQL script,
while not exposing sensitive information to the world outside the Docker host environment.
After this step, the datastore database is ready to be enabled in the
- Enable datastore and datapusher in
production.ini (note: requires sudo):
sudo vim $VOL_CKAN_CONFIG/production.ini
datastore datapusher to
ckan.plugins and enable the datapusher option
The remaining settings required for datastore and datapusher are already taken care of:
/var/lib/ckan) is hard-coded in
Dockerfile. This path is hard-coded as it remains internal to the containers, and changing it would have no effect on the host system.
ckan.datastore.write_url = postgresql://ckan:POSTGRES_PASSWORD@db/datastoreand
ckan.datastore.read_url = postgresql://datastore:DATASTORE_READONLY_PASSWORD@db/datastoreare provided by
ckan container to apply changes to the
docker-compose restart ckan
Now the datastore API should return content when visiting:
4. Create CKAN admin user¶
With all images up and running, create the CKAN admin user (johndoe in this example):
docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan sysadmin -c /etc/ckan/production.ini add johndoe
Now you should be able to login to the new, empty CKAN. The admin user’s API key will be instrumental in tranferring data from other instances.
5. Migrate data¶
This section illustrates the data migration from an existing CKAN instance
into our new Docker Compose CKAN instance assuming direct (ssh) access to
- Transfer resource files
Assuming the CKAN storage directory on
SOURCE_CKAN is located at
resource files and uploaded images in
storage), we’ll simply
SOURCE_CKAN’s storage directory into the named volume
sudo rsync -Pavvr [email protected]_CKAN:/path/to/files/ $VOL_CKAN_STORAGE
- Transfer users
Users could be exported using the python package
ckanapi, but their password hashes will be
excluded. To transfer users preserving their passwords, we need to dump and restore the
On source CKAN host with access to source db
ckan_default, export the
pg_dump -h CKAN_DBHOST -P CKAN_DBPORT -U CKAN_DBUSER -a -O -t user -f user.sql ckan_default
On the target host, make
user.sql accessible to the source CKAN container.
Transfer user.sql into the named volume
chown it to the docker user:
rsync -Pavvr [email protected]:/path/to/user.sql $VOL_CKAN_HOME/venv/src # $VOL_CKAN_HOME is owned by the user "ckan" (UID 900) as created in the CKAN Dockerfile sudo ls -l $VOL_CKAN_HOME # drwxr-xr-x 1 900 900 62 Jul 17 16:13 venv # Chown user.sql to the owner of $CKAN_HOME (ckan, UID 900) sudo chown 900:900 $VOL_CKAN_HOME/venv/src/user.sql
Now the file
user.sql is accessible from within the
docker exec -it ckan /bin/bash -c "export TERM=xterm; exec bash" [email protected]:/$ psql -U ckan -h db -f $CKAN_VENV/src/user.sql
- Export and upload groups, orgs, datasets
Using the python package
ckanapi we will dump orgs, groups and datasets from the source CKAN
instance, then use
ckanapi to load the exported data into the target instance.
The datapusher will automatically ingest CSV resources into the datastore.
- Rebuild search index
Trigger a Solr index rebuild:
docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan search-index rebuild -c /etc/ckan/production.ini
6. Add extensions¶
There are two scenarios to add extensions:
- Maintainers of production instances need extensions to be part of the
ckanimage and an easy way to enable them in the
production.ini. Automating the installation of existing extensions (without needing to change their source) requires customizing CKAN’s
Dockerfileand scripted post-processing of the
- Developers need to read, modify and use version control on the extensions’ source. This adds additional steps to the maintainers’ workflow.
For maintainers, the process is in summary:
- Run a bash shell inside the running
ckancontainer, download and install extension. Alternatively, add a
pip installstep for the extension into a custom CKAN Dockerfile.
ckanservice, read logs.
- Download and install extension from inside
The process is very similar to installing extensions in a source install. The only difference is that the installation steps will happen inside the running container, and they will use the virtualenv created inside the ckan image by CKAN’s Dockerfile.
The downloaded and installed files will be persisted in the named volume
# Enter the running ckan container: docker exec -it ckan /bin/bash -c "export TERM=xterm; exec bash" # Inside the running container, activate the virtualenv source $CKAN_VENV/bin/activate && cd $CKAN_VENV/src/ # Option 1: From source git clone https://github.com/ckan/ckanext-geoview.git cd ckanext-geoview pip install -r pip-requirements.txt python setup.py install python setup.py develop cd .. # Option 2: Pip install from GitHub pip install -e "git+https://github.com/ckan/ckanext-showcase.git#egg=ckanext-showcase" # Option 3: Pip install from PyPi pip install ckanext-envvars # exit the ckan container: exit
Some extensions require database upgrades, often through paster scripts. E.g., ckanext-spatial:
# Enter the running ckan container: docker exec -it ckan /bin/bash -c "export TERM=xterm; exec bash" # Inside the running ckan container source $CKAN_VENV/bin/activate && cd $CKAN_VENV/src/ git clone https://github.com/ckan/ckanext-spatial.git cd ckanext-spatial pip install -r pip-requirements.txt python setup.py install && python setup.py develop exit # On the host docker exec -it db psql -U ckan -f 20_postgis_permissions.sql docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/production.ini sudo vim $VOL_CKAN_CONFIG/production.ini # Inside production.ini, add to [plugins]: spatial_metadata spatial_query ckanext.spatial.search_backend = solr
- Modify CKAN config
Follow the respective extension’s instructions to set CKAN config variables:
sudo vim $VOL_CKAN_CONFIG/production.ini
Demonstrate how to set
production.ini settings from environment variables using
- Reload and debug
docker-compose restart ckan docker-compose logs ckan
- Develop extensions: modify source, install, use version control
While maintainers will prefer to use stable versions of existing extensions, developers of extensions will need access to the extensions’ source, and be able to use version control.
The use of Docker and the inherent encapsulation of files and permissions makes the development of extensions harder than a CKAN source install.
Firstly, the absence of private SSH keys inside Docker containers will make interacting with GitHub a lot harder. On the other hand, two-factor authentication on GitHub breaks BasicAuth (HTTPS, username and password) and requires a “personal access token” in place of the password.
To use version control from inside the Docker container:
- Clone the HTTPS version of the GitHub repo.
- On GitHub, create a personal access token with “full control of private repositories”.
- Copy the token code and use as password when running
Secondly, the persisted extension source at
VOL_CKAN_HOME is owned by the CKAN container’s
docker user (UID 900) and therefore not writeable to the developer’s host user account by
default. There are various workarounds. The extension source can be accessed from both outside and
inside the container.
Option 1: Accessing the source from inside the container:
docker exec -it ckan /bin/bash -c "export TERM=xterm; exec bash" source $CKAN_VENV/bin/activate && cd $CKAN_VENV/src/ # ... work on extensions, use version control ... # in extension folder: python setup.py install exit # ... edit extension settings in production.ini and restart ckan container sudo vim $VOL_CKAN_CONFIG/production.ini docker-compose restart ckan
Option 2: Accessing the source from outside the container using
sudo vim $VOL_CKAN_CONFIG/production.ini sudo vim $VOL_CKAN_HOME/venv/src/ckanext-datawagovautheme/ckanext/datawagovautheme/templates/package/search.html
Option 3: The Ubuntu package
bindfs makes the write-protected volumes accessible to a system
sudo apt-get install bindfs mkdir ~/VOL_CKAN_HOME sudo chown -R `whoami`:docker $VOL_CKAN_HOME sudo bindfs --map=900/`whoami` $VOL_CKAN_HOME ~/VOL_CKAN_HOME cd ~/VOL_CKAN_HOME/venv/src # Do this with your own extension fork # Assumption: the host user running git clone (you) has write access to the repository git clone https://github.com/parksandwildlife/ckanext-datawagovautheme.git # ... change files, use version control...
Changes in HTML templates and CSS will be visible right away.
For changes in code, we’ll need to unmount the directory, change ownership back to the
user, and follow the previous steps to
python setup.py install and
pip install -r requirements.txt from within the running container, modify the
and restart the container:
sudo umount ~/VOL_CKAN_HOME sudo chown -R 900:900 $VOL_CKAN_HOME # Follow steps a-c
Mounting host folders as volumes instead of using named volumes may result in a simpler development workflow. However, named volumes are Docker’s canonical way to persist data. The steps shown above are only some of several possible approaches.
7. Environment variables¶
Sensitive settings can be managed in (at least) two ways, either as environment variables, or as
This section illustrates the use of environment variables provided by the Docker Compose
This section is targeted at CKAN maintainers seeking a deeper understanding of variables,
and at CKAN developers seeking to factor out settings as new
Variable substitution propagates as follows:
.env.templateholds the defaults and the usage instructions for variables.
- The maintainer copies
.env.templateand modifies it following the instructions.
- Docker Compose interpolates variables in
- Docker Compose can pass on these variables to the containers as build time variables (when building the images) and / or as run time variables (when running the containers).
ckan-entrypoint.shhas access to all run time variables of the
ckan-entrypoint.shinjects environment variables (e.g.
CKAN_SQLALCHEMY_URL) into the running
ckancontainer, overriding the CKAN config variables from
See Configuration Options for a list of environment variables
CKAN_SQLALCHEMY_URL) which CKAN will accept to override
After adding new or changing existing
.env variables, locally built images and volumes may
need to be dropped and rebuilt. Otherwise, docker will re-use cached images with old or missing
docker-compose down docker-compose up -d --build # if that didn't work, try: docker rmi $(docker images -q -f dangling=true) docker-compose up -d --build # if that didn't work, try: docker rmi $(docker images -q -f dangling=true) docker volume prune docker-compose up -d --build
Removing named volumes will destroy data.
docker volume prune will delete any volumes not attached to a running(!) container.
Backup all data before doing this in a production setting.
8. Steps towards production¶
As mentioned above, some design decisions may not be suitable for a production setup.
A possible path towards a production-ready environment is:
- Use the above setup to build docker images.
- Add and configure extensions.
- Make sure that no sensitive settings are hard-coded inside the images.
- Push the images to a docker repository.
- Create a separate “production”
docker-compose.ymlwhich uses the custom built images.
- Run the “production”
docker-compose.ymlon the production server with appropriate settings.
- Transfer production data into the new server as described above using volume orchestration tools or transferring files directly.
- Bonus: contribute a write-up of working production setups to the CKAN documentation.