Option 1: Package Installation

This section describes how to install CKAN from packages. This is the recommended and by far the easiest way to install CKAN.

Package install requires you to use Ubuntu 10.04: either locally, through a virtual machine or Amazon EC2. Your options are as follows:

  • Using Ubuntu 10.04 directly.
  • Option A: Using VirtualBox. This is suitable if you want to host your CKAN instance on a machine running any other OS.
  • Option B: Using Amazon EC2. This is suitable if you want to host your CKAN instance in the cloud, on a ready-made Ubuntu OS.

Note

We recommend you use package installation unless you are a core CKAN developer or have no access to Ubuntu 10.04 through any of the methods above, in which case, you should use Option 2: Install from Source.

For support during installation, please contact the ckan-dev mailing list.

Prepare your System

CKAN runs on Ubuntu 10.04. If you are already using Ubuntu 10.04, you can continue straight to Run the Package Installer.
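If you’re not sure which release a machine is running, a quick check like the one below can tell you before you start. This is only a sketch: is_ubuntu_1004 is a hypothetical helper name, and it assumes the release details are recorded in /etc/lsb-release, which is where Ubuntu normally keeps them.

```shell
# Succeed only if an lsb-release style file describes Ubuntu 10.04.
# The file path is optional and defaults to /etc/lsb-release.
# Example: is_ubuntu_1004 && echo "Ready for the package install"
is_ubuntu_1004() (
    release_file="${1:-/etc/lsb-release}"
    # -x matches whole lines only, -q suppresses output.
    grep -qx 'DISTRIB_ID=Ubuntu' "$release_file" 2>/dev/null &&
    grep -qx 'DISTRIB_RELEASE=10\.04' "$release_file" 2>/dev/null
)
```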

However, if you’re not, you can use VirtualBox to set up an Ubuntu VM on Windows, Linux, Macintosh or Solaris. Alternatively, you can use an Amazon EC2 instance.

Option A: Using VirtualBox

This option is suitable if you want to install CKAN on a machine running an OS other than Ubuntu 10.04. VirtualBox lets you set up a virtual machine to run Ubuntu 10.04.

Pre-requisites and Downloads

First, check your machine meets the pre-requisites for VirtualBox. These include a fairly recent processor and some spare memory.

Then download the installation files.

Install VirtualBox

Note

This tutorial is for a Mac, but you can find instructions for installing VirtualBox on any OS in the VirtualBox Manual.

To install, double-click on the VirtualBox installer:

The VirtualBox installer - getting started

Click Continue to begin the installation process. Enter your password when required, and wait for the installation to finish.

Create Your Virtual Machine

Go to Applications and open VirtualBox, then click New:

The VirtualBox installer - the New Virtual Machine Wizard

Give your VM a name - we’ll call ours ubuntu_ckan. Under OS Type, choose Linux and Ubuntu (32 or 64-bit).

The VirtualBox installer - choosing your operating system

Leave the memory size as 512MB, and choose Create new hard disk (be aware that for production use you should probably allow 1.5GB RAM). This will open a new wizard:

The VirtualBox installer - creating a new hard disk

You can leave the defaults unchanged here too - click Continue, and then Done, and Done again, to create a new VM.

Next, choose your VM from the left-hand menu, and click Start:

Starting your new VM

This will open the First Run Wizard:

The VirtualBox First Run Wizard

After clicking Continue, you’ll see Select Installation Media. This is where we need to tell our VM to boot from Ubuntu. Click on the file icon, and find your Ubuntu .iso file:

When you get to Select Installation Media, choose your Ubuntu .iso file

Click Done, wait for a few seconds, and you will see your Ubuntu VM booting.

Set Up Ubuntu

During boot, you will be asked if you want to try Ubuntu, or install it. Choose Install Ubuntu:

Booting Ubuntu - choose the Install Ubuntu option

You can then follow the usual Ubuntu installation process.

After Ubuntu is installed, from the main menu, choose System > Administration > Update Manager. You’ll be asked if you want to install updates - say yes.

When all the updates have been downloaded and installed, you’ll be prompted to reboot Ubuntu.

At this point, you can proceed to Run the Package Installer.

Option B: Using Amazon EC2

If you prefer to run your CKAN package install in the cloud, you can use an Amazon EC2 instance, which is a fairly cheap and lightweight way to set up a server.

Create an Amazon Account

If you don’t already have an Amazon AWS account you’ll need to create one first. You can create an Amazon AWS account for EC2 here.

Configure EC2

Once you have an EC2 account, you’ll need to configure settings for your CKAN instance.

Start by logging into your Amazon AWS Console and click on the EC2 tab.

Select the region you want to run your CKAN instance in - the security group you set up is region-specific. In this tutorial, we use EU West, so it will be easier to follow if you do too.

_images/1.png

Set up a Security Group

Click the Security Groups link in the My Resources section in the right-hand side of the dashboard.

_images/2.png

Create a security group called web_test that gives access to ports 22, 80 and 5000 as shown below. This is needed so that you’ll actually be able to access your server once it is created. You can’t change these settings once the instance is running, so you need to do so now.

_images/3a.png _images/3b.png

Create a Keypair

Now create a new keypair ckan_test to access your instance:

_images/4.png

When you click Create, your browser will prompt you to save a keypair called ckan_test.pem:

_images/5.png

In this tutorial, we save the keypair in ~/Downloads/ckan_test.pem, but you should save it somewhere safe.

Note

If you plan to boot your EC2 instance from the command line, you need to remember where you’ve put this file.

Boot the EC2 Image

CKAN requires Ubuntu 10.04 to run (either the i386 or amd64 architectures). Luckily Canonical provide a range of suitable images.

The cheapest EC2 instance is the micro one, but that isn’t very powerful, so in this tutorial, we’ll use the 32-bit small version.

We’re in eu-west-1 and we’ll use an instance-only image (i.e. all the data will be lost when you shut it down) so we need the ami-3693a542 AMI.

Note

There are more recent Ubuntu images at http://cloud.ubuntu.com/ami/ but we need the older 10.04 LTS release.

At this point, you can either boot this image from the AWS console or launch it from the command line.

Option 1: Boot the EC2 Image AMI via the AWS Console

From the EC2 dashboard, choose Launch instance >:

Choose launch instance from the EC2 dashboard

Now work through the wizard as shown in the following screenshots.

In the first step search for ami-3693a542 and select it from the results (it may take a few seconds for Amazon to find it).

Warning

No image other than ami-3693a542 will work with CKAN.

Search for image ami-3693a542

You can keep the defaults for all of the following screens:

Keep the defaults while setting up your instance

Choose the web_test security group you created earlier:

Choose the web_test security group you created earlier

Then finish the wizard:

Finish the wizard

Finally click the View your instances on the Instances page link:

View your instance

After a few seconds you’ll see your instance has booted. Now skip to Log in to the Instance.

Option 2: Boot the EC2 Image AMI from the Command Line

[You can skip this section if you’ve just booted from the AWS console and go straight to Log in to the Instance]

To boot from the command line you still need the same information, but you enter it all in a single command.

Install The EC2 Tools Locally

If you are on Linux, you can just install the tools like this:

sudo apt-get install ec2-ami-tools
sudo apt-get install ec2-api-tools

If you are on Windows or Mac you’ll need to download them from the Amazon website.

Once the software is installed you can use the files you’ve just downloaded to create your instance.

Get Security Certificates

Next click on the Account link, right at the top of the screen, and you’ll see this screen:

The Account screen

From this screen choose Security Credentials from the left hand side. Once the page has loaded scroll down and you’ll see the Access Credentials section. Click on the X.509 Certificate tab:

The Access Credentials screen

Here you’ll be able to create an X.509 certificate and private key.

Tip

You can only have two X.509 certificates at any given time, so you might need to inactivate an old one first and then delete it before you are allowed to create a new one, as in the screenshot above.

Once you click the Create New Certificate link you get a popup which allows you to download the certificate and private key - do this. Once again, ours are in ~/Downloads, but you should save them somewhere safe.

Download your certificate

Tip

Amazon will only give you a private key file once when you create it so although you can always go back to get a copy of the certificate, you can only get the private key once. Make sure you save it in a safe place.

You now have:

  • Your private key (pk-[ID].pem)
  • Your certificate file (cert-[ID].pem)
  • Your new keypair (ckan_test.pem)

The private key and the certificate files have the same name in the ID part.
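Because the two files are only useful as a pair, it can be worth confirming that the ID parts really do match before passing them to the EC2 tools. The following is a small sketch under the naming scheme described above (pk-[ID].pem and cert-[ID].pem); credential_id and credentials_match are hypothetical helper names, not part of the EC2 tools.

```shell
# Print the ID part of a file named pk-<ID>.pem or cert-<ID>.pem.
credential_id() (
    name=$(basename "$1" .pem)   # strip the directory and the .pem suffix
    printf '%s\n' "${name#*-}"   # strip everything up to the first hyphen
)

# Succeed only if the private key and certificate share the same ID.
# Example: credentials_match ~/Downloads/pk-*.pem ~/Downloads/cert-*.pem
credentials_match() {
    [ "$(credential_id "$1")" = "$(credential_id "$2")" ]
}
```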

Create an Ubuntu Instance

Once the tools are installed, run this command:

ec2-run-instances ami-3693a542 --instance-type m1.small --region eu-west-1 --group web_test \
    --key ckan_test \
    --private-key ~/Downloads/pk-[ID].pem \
    --cert ~/Downloads/cert-[ID].pem

Note

The --key argument is the name of the keypair (ckan_test), not the keypair file itself (ckan_test.pem).

Warning

Amazon charge you for a minimum of one hour usage, so you shouldn’t create and destroy lots of EC2 instances unless you want to be charged a lot.

Log in to the Instance

Once your instance has booted, you will need to find out its public DNS. Give the instance a second or two to load, then browse to the running instance in the AWS console. If you tick your instance you’ll be able to find the public DNS by scrolling down to the bottom of the Description tab.

Find the public DNS

Here you can see that our public DNS is ec2-79-125-86-107.eu-west-1.compute.amazonaws.com. The private DNS only works from other EC2 instances so isn’t any use to us.

Once you’ve found your instance’s public DNS, ensure the key has the correct permissions:

chmod 0600 ~/Downloads/ckan_test.pem
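The permissions matter because ssh refuses to use a private key file that other users can read. If you want to confirm the change took effect, a check along these lines should work; key_is_private is a hypothetical helper name, and the sketch reads the mode string from ls so that it does not depend on a particular version of stat.

```shell
# Succeed only if the file is a regular file readable and writable by its
# owner and by nobody else (mode 0600, shown by ls as -rw-------).
# Example: key_is_private ~/Downloads/ckan_test.pem || echo "Fix the key permissions"
key_is_private() {
    # cut keeps the first ten characters: the file type plus nine mode bits.
    [ "$(ls -ld "$1" 2>/dev/null | cut -c1-10)" = "-rw-------" ]
}
```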

You can then log in like this:

ssh -i ~/Downloads/ckan_test.pem ubuntu@ec2-79-125-86-107.eu-west-1.compute.amazonaws.com

The first time you connect you’ll see a prompt like this. Answer yes:

RSA key fingerprint is 6c:7e:8d:a6:a5:49:75:4d:9e:05:2e:50:26:c9:4a:71.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-79-125-86-107.eu-west-1.compute.amazonaws.com,79.125.86.107' (RSA) to the list of known hosts.

When you log in you’ll see a welcome message. You can now proceed to Run the Package Installer.

Note

If this is a test install of CKAN, when you have finished using CKAN, you can shut down your EC2 instance through the AWS console.

Warning

Shutting down your EC2 instance will lose all your data. Also, Amazon charge you for a minimum usage of one hour, so don’t create and destroy lots of EC2 instances unless you want to be charged a lot!

Run the Package Installer

On your Ubuntu 10.04 system, open a terminal and run these commands to prepare your system:

sudo apt-get update
sudo apt-get install -y wget
echo "deb http://apt.okfn.org/ckan-1.5 lucid universe" | sudo tee /etc/apt/sources.list.d/okfn.list
wget -qO- "http://apt.okfn.org/packages_public.key" | sudo apt-key add -
sudo apt-get update

Now you are ready to install. If you already have PostgreSQL and Solr instances set up on a different server that you want to use, you don’t need to install postgresql-8.4 and solr-jetty locally. In most cases you’ll want CKAN, PostgreSQL and Solr all running on the same server, so run:

sudo apt-get install -y ckan postgresql-8.4 solr-jetty

The install will whirr away. With ckan, postgresql-8.4 and solr-jetty chosen, over 180MB of packages will be downloaded (on a clean install). This will take a few minutes, then towards the end you’ll see this:

Setting up solr-jetty (1.4.0+ds1-1ubuntu1) ...
 * Not starting jetty - edit /etc/default/jetty and change NO_START to be 0 (or comment it out).

If you’ve installed solr-jetty locally you’ll also need to configure your local Solr server for use with CKAN. You can do so like this:

sudo ckan-setup-solr

This changes the Solr schema to support CKAN, sets Solr to start automatically and then starts Solr. You shouldn’t be using the Solr instance for anything apart from CKAN because the command above modifies its schema.
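If Solr does not come up after a reboot, one thing to check is whether Jetty is still disabled. The sketch below assumes the switch is the NO_START setting in /etc/default/jetty, as the installer message above indicates; jetty_start_enabled is a hypothetical helper name and not something the CKAN packages provide.

```shell
# Succeed if the Jetty defaults file allows Jetty to start: NO_START is
# either set to 0 or not set at all (for example, commented out).
# The file path is optional and defaults to /etc/default/jetty.
# Example: jetty_start_enabled || echo "Jetty is disabled in /etc/default/jetty"
jetty_start_enabled() (
    defaults_file="${1:-/etc/default/jetty}"
    [ -r "$defaults_file" ] || exit 1
    # Take the last uncommented NO_START assignment, dropping any quotes.
    value=$(sed -n 's/^[[:space:]]*NO_START=\([^[:space:]#]*\).*/\1/p' "$defaults_file" |
        tail -n 1 | tr -d '"')
    [ -z "$value" ] || [ "$value" = "0" ]
)
```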

You can now create CKAN instances as you please using the ckan-create-instance command. It takes these arguments:

Instance name

This should be a short, letters-only string representing the name of the CKAN instance. It is used (amongst other things) as the basis for:

  • The directory structure of the instance in /var/lib/ckan, /var/log/ckan, /etc/ckan and elsewhere
  • The name of the PostgreSQL database to use
  • The name of the Solr core to use
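Since the name ends up in directory, database and Solr core names, it is worth rejecting anything other than letters before you run the command. A minimal sketch, assuming "letter only" means ASCII letters; valid_instance_name is a hypothetical helper, and no length limit is enforced because none is stated here.

```shell
# Succeed only if the name is non-empty and made up solely of ASCII letters.
# Example: valid_instance_name std && echo "Name is fine"
valid_instance_name() {
    case "$1" in
        ''|*[!A-Za-z]*) return 1 ;;   # empty, or contains a non-letter
        *) return 0 ;;
    esac
}
```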

Instance Hostname/domain name

The hostname that this CKAN instance will be hosted at. It is used in the Apache virtual host configuration in /etc/apache2/sites-available/<INSTANCE_NAME>.common so that Apache can resolve requests directly to CKAN.

If you are using Amazon EC2, you will use the public DNS of your server as this argument. These look something like ec2-46-51-149-132.eu-west-1.compute.amazonaws.com. If you are using a VM, this will be the hostname of the VM you have configured in your /etc/hosts file.

If you install more than one CKAN instance you’ll need to set different hostnames for each. If you ever want to change the hostname CKAN responds on you can do so by editing /etc/apache2/sites-available/<INSTANCE_NAME>.common and restarting apache with sudo /etc/init.d/apache2 restart.
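When you rely on /etc/hosts rather than DNS, a missing or commented-out entry is an easy mistake to make. The following sketch checks for one; host_in_hosts_file is a hypothetical helper name, and it only looks at the hosts file, so it says nothing about names that resolve through DNS.

```shell
# Succeed if the hostname is listed on an uncommented line of a hosts file.
# The second argument is optional and defaults to /etc/hosts.
# Example: host_in_hosts_file default.vm.buildkit || echo "Add it to /etc/hosts"
host_in_hosts_file() (
    wanted="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v name="$wanted" '
        {
            sub(/#.*/, "")                 # drop comments, whole-line or trailing
            for (i = 2; i <= NF; i++)      # field 1 is the IP address
                if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
)
```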

Local PostgreSQL support ("yes" or "no")

If you specify "yes", CKAN will also set up a local database user and database and create its tables, populating them as necessary and saving the database password in the config file. You would normally say "yes" unless you plan to use CKAN with a PostgreSQL on a remote machine.

If you choose "no" as the third parameter, the install command will not set up or configure the PostgreSQL database for CKAN, and you’ll need to perform any database creation and setup steps yourself.

For production use the second argument above is usually the domain name of the CKAN instance, but in our case we are testing, so we’ll use the default hostname buildkit sets up for the server, which is default.vm.buildkit (this is automatically added to your host machine’s /etc/hosts when the VM is started so that it will resolve from your host machine - for more complex setups you’ll have to set up DNS entries instead).

Create a new instance like this:

sudo ckan-create-instance std default.vm.buildkit yes

You’ll need to specify a new instance name and different hostname for each CKAN instance you set up.

Don’t worry about warnings you see like this during the creation process, they are harmless:

/usr/lib/pymodules/python2.6/ckan/sqlalchemy/engine/reflection.py:46: SAWarning: Did not recognize type 'tsvector' of column 'search_vector' ret = fn(self, con, *args, **kw)

You can now access your CKAN instance from your host machine as http://default.vm.buildkit/

Tip

If you get taken straight to a login screen it is a sign that the PostgreSQL database initialisation may not have run. Try running:

INSTANCE=std
sudo paster --plugin=ckan db init --config=/etc/ckan/${INSTANCE}/${INSTANCE}.ini

If you specified "no" as the third argument to ckan-create-instance, you’ll need to specify database and Solr settings in /etc/ckan/std/std.ini. Until you do, you’ll see an “Internal Server Error” from Apache. You can always investigate such errors by looking in the Apache and CKAN logs for that instance.

Sometimes things don’t go as planned so let’s look at some of the log files.

This is the CKAN log information (leading data stripped for clarity):

$ sudo -u ckanstd tail -f /var/log/ckan/std/std.log
WARNI [vdm] Skipping adding property Package.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property PackageTag.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property Group.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property PackageGroup.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property GroupExtra.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property PackageExtra.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property Resource.all_revisions_unordered to revisioned object
WARNI [vdm] Skipping adding property ResourceGroup.resources_all to revisioned object

No error here, so let’s look in the Apache log (leading data stripped again) for the case where we chose "no" to PostgreSQL installation:

$ tail -f /var/log/apache2/std.error.log
    self.connection = self.__connect()
  File "/usr/lib/pymodules/python2.6/ckan/sqlalchemy/pool.py", line 319, in __connect
    connection = self.__pool._creator()
  File "/usr/lib/pymodules/python2.6/ckan/sqlalchemy/engine/strategies.py", line 82, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib/pymodules/python2.6/ckan/sqlalchemy/engine/default.py", line 249, in connect
    return self.dbapi.connect(*cargs, **cparams)
OperationalError: (OperationalError) FATAL:  password authentication failed for user "ckanuser"
FATAL:  password authentication failed for user "ckanuser"
 None None

There’s the problem. If you don’t choose "yes" to install PostgreSQL, you need to set up the sqlalchemy.url option in the config file manually. Edit it to set the correct settings:

sudo -u ckanstd vi /etc/ckan/std/std.ini

Notice how you have to make changes to CKAN config files and view CKAN log files using the username set up for your CKAN user.
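When you are chasing a database connection error it helps to see exactly what the config file currently says, without exposing the password if you share the output on the mailing list. The sketch below assumes the option is written on one line as sqlalchemy.url = dialect://user:password@host/database, which is the usual SQLAlchemy URL form; show_db_url is a hypothetical helper name.

```shell
# Print the sqlalchemy.url value from a CKAN config file with the password
# replaced by asterisks. Commented-out lines are ignored.
# Example: show_db_url /etc/ckan/std/std.ini
show_db_url() {
    # First sed: keep only the value of uncommented sqlalchemy.url lines.
    # Second sed: mask the text between "user:" and "@".
    sed -n 's/^[[:space:]]*sqlalchemy\.url[[:space:]]*=[[:space:]]*//p' "$1" |
        sed 's#\(://[^:/@]*:\)[^@]*@#\1****@#'
}
```

Reading the file needs the same access as editing it, so run the helper from a shell that is allowed to read the instance’s config file.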

Each instance you create has its own virtualenv at /var/lib/ckan/std/pyenv, into which you can install extensions, and its own system user, in this case ckanstd. Any time you make changes to the virtualenv, make sure you are running as the correct user, otherwise Apache might not be able to load CKAN. For example, to install a CKAN extension you might run:

sudo -u ckanstd /var/lib/ckan/std/pyenv/bin/pip install <name-of-extension>

You can now configure your instance by editing /etc/ckan/std/std.ini:

sudo -u ckanstd vi /etc/ckan/std/std.ini

After any change you can touch the wsgi.py to tell Apache’s mod_wsgi that it needs to take notice of the change for future requests:

sudo touch /var/lib/ckan/std/wsgi.py

Or you can of course do a full restart if you prefer:

sudo /etc/init.d/apache2 restart

Caution

CKAN has etag caching enabled by default which encourages your browser to cache the homepage and all the dataset pages. This means that if you change CKAN’s configuration you’ll need to do a ‘force refresh’ by pressing Shift + Ctrl + F5 together or Shift + Ctrl + R (depending on browser) before you’ll see the change.

One of the first options worth setting is ckan.site_description. The text you set there appears in the banner at the top of your CKAN instance’s pages.
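If you would rather script this kind of change than open an editor, something like the following could do it. This is a sketch, not a CKAN tool: ini_set is a hypothetical helper, it assumes the option already appears uncommented on a single "key = value" line, and it deliberately fails rather than appending a missing option, since a line added at the end of the file could land in the wrong section.

```shell
# Replace the value of the first uncommented "KEY = ..." line in an
# ini-style file. The original is kept alongside as FILE.bak.
# Fails (non-zero) if the key is not found; the file is then left unchanged.
# Example: ini_set /etc/ckan/std/std.ini ckan.site_description "Our data hub"
ini_set() (
    file="$1"; key="$2"; value="$3"
    cp -p "$file" "$file.bak" || exit 1
    # Writing back into the existing file keeps its owner and permissions.
    awk -v key="$key" -v value="$value" '
        !replaced && index($0, "=") {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", k)               # ignore spacing around the key
            if (k == key) { print key " = " value; replaced = 1; next }
        }
        { print }
        END { exit (replaced ? 0 : 1) }
    ' "$file.bak" > "$file"
)
```

The helper needs write access to both the config file and its directory, and values containing backslashes are not handled, so treat it as a starting point only.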

You can enable and disable particular CKAN instances by running:

sudo a2ensite std
sudo /etc/init.d/apache2 reload

or:

sudo a2dissite std
sudo /etc/init.d/apache2 reload

respectively.
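To see whether an instance is currently enabled, you can look for the link that a2ensite creates. This sketch assumes the standard Debian/Ubuntu Apache layout, where enabled sites are linked into /etc/apache2/sites-enabled under the same name you pass to a2ensite; site_enabled is a hypothetical helper name.

```shell
# Succeed if the named Apache site is present in the sites-enabled directory.
# The second argument is optional and defaults to /etc/apache2/sites-enabled.
# Example: site_enabled std && echo "std is enabled"
site_enabled() {
    site_path="${2:-/etc/apache2/sites-enabled}/$1"
    # -L also catches a link whose target has since been removed.
    [ -e "$site_path" ] || [ -L "$site_path" ]
}
```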

Now you should be up and running. Don’t forget that there is a help page for dealing with Common error messages.

Visit your CKAN instance - either at your Amazon EC2 hostname, or at http://default.vm.buildkit/ on your host PC or virtual machine. You’ll be redirected to the login screen because you won’t have set up any permissions yet, so the welcome screen will look something like this.

_images/9.png

You can now proceed to Post-Installation Setup.

Warning

If you use the ckan-create-instance command to create more than one instance there are a couple of things you need to be aware of. Firstly, you need to change the Apache configurations to put mod_wsgi into daemon mode and secondly you need to watch your Solr search index carefully to make sure that the different instances are not over-writing each other’s data.

To change the Apache configuration uncomment the following lines for each instance in /etc/apache2/sites-available/std.common and make sure ${INSTANCE} is replaced with your instance name:

# Deploy as a daemon (avoids conflicts between CKAN instances)
# WSGIDaemonProcess ${INSTANCE} display-name=${INSTANCE} processes=4 threads=15 maximum-requests=10000
# WSGIProcessGroup ${INSTANCE}

If you don’t do this and you install different versions of the same Python packages into the different pyenvs in /var/lib/ckan for each instance, there is a chance the CKAN instances might use the wrong package.
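If you have several instances to convert, the edit can be scripted. The following is a sketch under the assumption that the commented lines look exactly like the ones shown above; enable_wsgi_daemon is a hypothetical helper that works as a filter, so nothing is changed until you review the output and copy it into place yourself.

```shell
# Read an Apache site configuration on stdin and write it to stdout with the
# WSGIDaemonProcess and WSGIProcessGroup lines uncommented and the
# ${INSTANCE} placeholder on those lines replaced by the instance name.
# Example:
#   enable_wsgi_daemon std < /etc/apache2/sites-available/std.common > /tmp/std.common
enable_wsgi_daemon() {
    awk -v name="$1" '
        /^[ \t]*#[ \t]*WSGI(DaemonProcess|ProcessGroup)[ \t]/ {
            sub(/#[ \t]*/, "")                 # drop the comment marker
            gsub(/\$\{INSTANCE\}/, name)       # fill in the instance name
        }
        { print }
    '
}
```

Other comment lines, including the "Deploy as a daemon" note, pass through untouched. Reload Apache after installing the edited file.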

The CKAN team have also recently had difficulties with CKAN instances writing over each other’s Solr search indexes. These have been documented in ticket #1430. If you run into the same problems send an email to ckan-dev.

CKAN packaging is well tested and reliable with single-instance CKAN installs. Multi-instance support is newer and, whilst we believe it will work well, hasn’t had the same degree of testing. If you hit any problems with multi-instance installs, do let us know and we’ll help you fix them.