The following sections describe features (both core features and built-in
extensions) that come with CKAN, including how to enable, set up and configure
each feature or extension.
Beyond these core features, further features can be added to CKAN by
downloading and installing external extensions. A good place to find extensions
is the list of extensions on the CKAN wiki.
Command Line Interface
Most common CKAN administration tasks can be carried out from the command line
on the server that CKAN is installed on, using the paster command.
If you have trouble running paster commands, see
Troubleshooting Paster Commands below.
Note
Before running a CKAN paster command, you have to activate your CKAN
virtualenv and change to the ckan directory, for example:
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckan
To run a paster command without activating the virtualenv first, you have
to give the full path to the paster script within the virtualenv, for example:
/usr/lib/ckan/default/bin/paster --plugin=ckan user list -c /etc/ckan/default/development.ini
To run a paster command without changing to the ckan directory first, add
the --plugin=ckan option to the command. For example:
paster --plugin=ckan user list -c /etc/ckan/default/development.ini
In the example commands below, we assume you’re running the commands with
your virtualenv activated and from your ckan directory.
The general form of a CKAN paster command is:
paster command --config=/etc/ckan/default/development.ini
The --config option tells CKAN where to find your config file, which it
reads, for example, to know which database it should use. As you’ll see in the
examples below, this option can be given as -c for short.
Replace command with the name of the CKAN command that you wish
to execute. Most commands have their own subcommands and options. For example,
to print out a list of all of your CKAN site’s users, run:
paster user list -c /etc/ckan/default/development.ini
(Here user is the name of the CKAN command you’re running, and list is
a subcommand of user.)
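The general form above can be captured in a small helper. The following is a minimal standalone Python 3 sketch: the function name paster_argv is hypothetical and not part of CKAN, and it only builds the argument list. Running the result (for example with subprocess.run) is left to the caller.

```python
import shlex


def paster_argv(command, *args, config, plugin=None, paster="paster"):
    """Build the argument list for a CKAN paster command.

    Hypothetical helper, not part of CKAN. `paster` may be the full path
    to the script inside the virtualenv if the virtualenv is not activated.
    """
    argv = [paster]
    if plugin is not None:
        # Needed when the command is not run from the ckan directory.
        argv.append("--plugin=" + plugin)
    argv.append(command)
    argv.extend(args)          # subcommand and its arguments, if any
    argv.extend(["-c", config])
    return argv


argv = paster_argv("user", "list", config="/etc/ckan/default/development.ini")
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(a) for a in argv))
```

Passing plugin="ckan" and a full path for paster reproduces the variants described in the note above.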
For a list of all available commands, simply run paster on its own with no
command, or see Paster Commands Reference. In this case we don’t need the
-c option, since we’re only asking CKAN to print out information about
commands, not to actually do anything with our CKAN site:
paster
Each command has its own help text, which tells you what subcommands and
options it has (if any). To print out a command’s help text, run the command
with the --help option, for example:
paster user --help
Troubleshooting Paster Commands
Virtualenv not activated, or not in ckan dir
Most errors with paster commands can be solved by remembering to activate
your virtual environment and change to the ckan directory before running
the command:
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckan
Error messages such as the following are usually caused by forgetting to do
this:
- Command ‘foo’ not known (where foo is the name of the command you
tried to run)
- The program ‘paster’ is currently not installed
- Command not found: paster
- ImportError: No module named fanstatic (or other ImportErrors)
Running paster commands provided by extensions
If you’re trying to run a CKAN command provided by an extension that you’ve
installed and you’re getting an error like Command ‘foo’ not known even
though you’ve activated your virtualenv and changed to the ckan directory, this
is because you need to run the extension’s paster commands from the extension’s
source directory, not CKAN’s source directory. For example:
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckanext-spatial
paster foo -c /etc/ckan/default/development.ini
This should not be necessary when using the pre-installed extensions that come
with CKAN.
Alternatively, you can give the extension’s name using the --plugin option,
for example:
paster --plugin=ckanext-foo foo -c /etc/ckan/default/development.ini
Todo
Running a paster shell with paster --plugin=pylons shell -c ...
Useful for development?
Wrong config file path
- AssertionError: Config filename development.ini does not exist
  This means you forgot to give the --config or -c option to tell CKAN
  where to find your config file. (CKAN looks for a config file named
  development.ini in your current working directory by default.)
- ConfigParser.MissingSectionHeaderError: File contains no section headers
  This happens if the config file that you gave with the -c or --config
  option is badly formatted, or if you gave the wrong filename.
- IOError: [Errno 2] No such file or directory: ‘...’
  This means you gave the wrong path to the --config or -c option
  (you gave a path to a file that doesn’t exist).
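The error cases above can be checked before running a command. The following is a minimal standalone Python 3 sketch of such a pre-flight check. The function check_config and its return strings are hypothetical, and the check for an [app:main] section is an assumption based on the config layout mentioned elsewhere in this document. It is not how CKAN itself reports these errors.

```python
import configparser
import os


def check_config(path):
    """Return a short diagnosis for a CKAN config file path.

    Hypothetical helper that mirrors the error cases listed above.
    """
    if not os.path.isfile(path):
        # Corresponds to the "does not exist" / "No such file" errors.
        return "missing: no such file"
    # strict=False tolerates duplicate options; Raw avoids interpolation.
    parser = configparser.RawConfigParser(strict=False)
    try:
        with open(path) as f:
            parser.read_file(f)
    except configparser.MissingSectionHeaderError:
        return "malformed: file contains no section headers"
    if not parser.has_section("app:main"):
        # Probably not a CKAN config file (wrong filename given).
        return "suspicious: no [app:main] section"
    return "ok"
```

A path that yields anything other than "ok" is worth fixing before passing it to the -c option.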
Paster Commands Reference
The following paster commands are supported by CKAN:
celeryd          | Control celery daemon.
check-po-files   | Check po files for common mistakes
color            | Create or remove a color scheme.
create-test-data | Create test data in the database.
dataset          | Manage datasets.
datastore        | Perform commands to set up the datastore.
db               | Perform various tasks on the database.
front-end-build  | Creates and minifies css and JavaScript files
less             | Compile all root less documents into their CSS counterparts
minify           | Create minified versions of the given Javascript and CSS files.
notify           | Send out modification notifications.
plugin-info      | Provide info on installed plugins.
profile          | Code speed profiler
ratings          | Manage the ratings stored in the db
rdf-export       | Export active datasets as RDF.
search-index     | Creates a search index for all datasets
sysadmin         | Gives sysadmin rights to a named user.
tracking         | Update tracking statistics.
trans            | Translation helper functions
user             | Manage users.
celeryd: Control celery daemon
Usage:
celeryd <run> - run the celery daemon
celeryd run concurrency - run the celery daemon with
argument 'concurrency'
celeryd view - view all tasks in the queue
celeryd clean - delete all tasks in the queue
check-po-files: Check po files for common mistakes
Usage:
check-po-files [options] [FILE] ...
color: Create or remove a color scheme
After running this command, you’ll need to regenerate the CSS files. See ‘less: Compile all root less documents into their CSS counterparts’ for details.
Usage:
color - creates a random color scheme
color clear - clears any color scheme
color <'HEX'> - uses as base color eg '#ff00ff' must be quoted.
color <VALUE> - a float between 0.0 and 1.0 used as base hue
color <COLOR_NAME> - html color name used for base color eg lightblue
create-test-data: Create test data
As the name suggests, this command lets you load test data when first setting up CKAN. See Creating Test Data for details.
dataset: Manage datasets
Usage:
dataset DATASET_NAME|ID - shows dataset properties
dataset show DATASET_NAME|ID - shows dataset properties
dataset list - lists datasets
dataset delete [DATASET_NAME|ID] - changes dataset state to 'deleted'
dataset purge [DATASET_NAME|ID] - removes dataset from db entirely
db: Manage databases
Lets you initialise, upgrade, and dump the CKAN database.
Initialization
Before you can run CKAN for the first time, you need to run db init to
initialize your database:
paster db init -c /etc/ckan/default/production.ini
If you forget to do this you’ll see this error message in your web browser:
503 Service Unavailable: This site is currently off-line. Database is not
initialised.
Cleaning
You can delete everything in the CKAN database, including the tables, to start
from scratch:
Warning
This will delete all data from your CKAN database!
paster db clean -c /etc/ckan/default/production.ini
After cleaning the db you must do a db init or db load before CKAN will
work again.
Dumping and Loading databases to/from a file
You can ‘dump’ (save) the exact state of the database to a file on disk and at
a later point ‘load’ (restore) it again.
Tip
You can also dump the database from one CKAN instance, and then load it into
another CKAN instance on the same or another machine. This will even work if
the CKAN instance you dumped the database from is an older version of CKAN
than the one you load it into: the database will be automatically upgraded
during the load command. (But you cannot load a database from a newer
version of CKAN into an older version of CKAN.)
To export a dump of your CKAN database:
paster db dump -c /etc/ckan/default/production.ini my_database_dump.sql
To load it in again, you first have to clean the database (this will delete all
data in the database!) and then load the file:
paster db clean -c /etc/ckan/default/production.ini
paster db load -c /etc/ckan/default/production.ini my_database_dump.sql
Exporting Datasets to JSON or CSV
You can export all of your CKAN site’s datasets from your database to a JSON file
using the db simple-dump-json command:
paster db simple-dump-json -c /etc/ckan/default/production.ini my_datasets.json
To export the datasets in CSV format instead, use db simple-dump-csv:
paster db simple-dump-csv -c /etc/ckan/default/production.ini my_datasets.csv
This is useful for creating a simple public listing of the datasets, with no user
information. Some simple additions to the Apache config can serve the dump
files to users in a directory listing. To do this, add these lines to your
Apache virtual host config file (e.g. /etc/apache2/sites-available/ckan_default):
Alias /dump/ /home/okfn/var/srvc/ckan.net/dumps/
# Disable the mod_python handler for static files
<Location /dump>
SetHandler None
Options +Indexes
</Location>
Warning
Don’t serve an SQL dump of your database (created using the paster db dump
command), as those contain private user information such as email
addresses and API keys.
Exporting User Accounts to CSV
You can export all of your CKAN site’s user accounts from your database to a CSV file
using the db user-dump-csv command:
paster db user-dump-csv -c /etc/ckan/default/production.ini my_database_users.csv
front-end-build: Creates and minifies css and JavaScript files
Usage:
less: Compile all root less documents into their CSS counterparts
Usage:
minify: Create minified versions of the given Javascript and CSS files
Usage:
paster minify [--clean] PATH
For example:
paster minify ckan/public/base
paster minify ckan/public/base/css/*.css
paster minify ckan/public/base/css/red.css
If the --clean option is provided, any minified files will be removed.
notify: Send out modification notifications
Usage:
notify replay - send out modification signals. In "replay" mode,
an update signal is sent for each dataset in the database.
plugin-info: Provide info on installed plugins
As the name suggests, this command shows you the installed plugins, their descriptions, and which interfaces they implement.
profile: Code speed profiler
Provide a CKAN URL and the profiler will make the request and record how long each function call took in a file that can be read
by runsnakerun.
Usage:
The result is saved in profile.data.search. To view the profile in runsnakerun:
runsnakerun ckan.data.search.profile
You may need to install the cProfile Python module.
ratings: Manage dataset ratings
Manages the ratings stored in the database, and can be used to count ratings, remove all ratings, or remove only anonymous ratings.
For example, to remove anonymous ratings from the database:
paster --plugin=ckan ratings clean-anonymous --config=/etc/ckan/std/std.ini
rdf-export: Export datasets as RDF
This command dumps out all currently active datasets as RDF into the specified folder:
paster rdf-export /path/to/store/output
search-index: Rebuild search index
Rebuilds the search index. This is useful to prevent search indexes from getting out of sync with the main database.
For example:
paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini
This default behaviour will clear the index and rebuild it with all datasets. If you want to rebuild it for only
one dataset, you can provide a dataset name:
paster --plugin=ckan search-index rebuild test-dataset-name --config=/etc/ckan/std/std.ini
Alternatively, you can use the -o or --only-missing option to only reindex datasets which are not
already indexed:
paster --plugin=ckan search-index rebuild -o --config=/etc/ckan/std/std.ini
If you don’t want to rebuild the whole index, but just refresh it, use the -r or --refresh option. This
won’t clear the index before it starts rebuilding it:
paster --plugin=ckan search-index rebuild -r --config=/etc/ckan/std/std.ini
There is also an option available which works like the refresh option but tries to use all processor cores on the
computer to reindex faster:
paster --plugin=ckan search-index rebuild_fast --config=/etc/ckan/std/std.ini
There are other search related commands, mostly useful for debugging purposes:
search-index check - checks for datasets not indexed
search-index show DATASET_NAME - shows index of a dataset
search-index clear [DATASET_NAME] - clears the search index for the provided dataset or for the whole ckan instance
sysadmin: Give sysadmin rights
Gives sysadmin rights to a named user. This means the user can perform any action on any object.
For example, to make a user called ‘admin’ into a sysadmin:
paster --plugin=ckan sysadmin add admin --config=/etc/ckan/std/std.ini
tracking: Update tracking statistics
Usage:
tracking update [start_date] - update tracking stats
tracking export FILE [start_date] - export tracking stats to a csv file
trans: Translation helper functions
Usage:
trans js - generate the javascript translations
trans mangle - mangle the zh_TW translations for testing
user: Create and manage users
Lets you create, remove, list and manage users.
For example, to create a new user called ‘admin’:
paster --plugin=ckan user add admin --config=/etc/ckan/std/std.ini
To delete the ‘admin’ user:
paster --plugin=ckan user remove admin --config=/etc/ckan/std/std.ini
Authorization
Changed in version 2.0: Previous versions of CKAN used a different authorization system.
CKAN’s authorization system controls which users are allowed to carry out which
actions on the site. All actions that users can carry out on a CKAN site are
controlled by the authorization system. For example, the authorization system
controls who can register new user accounts, delete user accounts, or create,
edit and delete datasets, groups and organizations.
Authorization in CKAN can be controlled in three ways:
- Organizations
- Configuration file options
- Extensions
The following sections explain each of the three methods in turn.
Note
An organization admin in CKAN is an administrator of a particular
organization within the site, with control over that organization and its
members and datasets. A sysadmin is an administrator of the site itself.
Sysadmins can always do everything, including adding, editing and deleting
datasets, organizations and groups, regardless of the organization roles and
configuration options described below.
Organizations
Organizations are the primary way to control who can see, create and update
datasets in CKAN. Each dataset can belong to a single organization, and each
organization controls access to its datasets.
Datasets can be marked as public or private. Public datasets are visible to
everyone. Private datasets can only be seen by logged-in users who are members
of the dataset’s organization. Private datasets are not shown in general
dataset searches but are shown in dataset searches within the organization.
When a user joins an organization, an organization admin gives them one of
three roles: member, editor or admin.
An organization admin can:
- View the organization’s private datasets
- Add new datasets to the organization
- Edit or delete any of the organization’s datasets
- Make datasets public or private
- Add users to the organization, and choose whether to make the new user a
member, editor or admin
- Change the role of any user in the organization, including other admin users
- Remove members, editors or other admins from the organization
- Edit the organization itself (for example: change the organization’s title,
description or image)
- Delete the organization
An editor can:
- View the organization’s private datasets
- Add new datasets to the organization
- Edit or delete any of the organization’s datasets
A member can:
- View the organization’s private datasets
When a user creates a new organization, they automatically become the first
admin of that organization.
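The three roles above nest: an editor can do everything a member can, and an admin can do everything an editor can. The following minimal standalone Python 3 sketch models the lists above as sets. The action names and the can function are illustrative labels chosen for this example, not CKAN API identifiers, and this is not CKAN’s own implementation.

```python
# Permissions per organization role, transcribed from the lists above.
# The action names are illustrative labels, not CKAN identifiers.
MEMBER = {"view_private_datasets"}
EDITOR = MEMBER | {"add_dataset", "edit_dataset", "delete_dataset"}
ADMIN = EDITOR | {
    "set_dataset_visibility",
    "add_user",
    "change_user_role",
    "remove_user",
    "edit_organization",
    "delete_organization",
}
ROLE_PERMISSIONS = {"member": MEMBER, "editor": EDITOR, "admin": ADMIN}


def can(role, action, is_sysadmin=False):
    """Return True if a user with `role` in an organization may do `action`."""
    if is_sysadmin:
        # Sysadmins can always do everything, whatever their role.
        return True
    # Users with no role in the organization get no permissions here.
    return action in ROLE_PERMISSIONS.get(role, set())


print(can("editor", "add_dataset"))   # editors may add datasets
print(can("member", "add_dataset"))   # members may only view
```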
Configuration File Options
The following configuration file options can be used to customize CKAN’s
authorization behavior:
ckan.auth.anon_create_dataset
Example:
ckan.auth.anon_create_dataset = False
Default value: False
Allow users to create datasets without registering and logging in.
ckan.auth.create_unowned_dataset
Example:
ckan.auth.create_unowned_dataset = False
Default value: True
Allow the creation of datasets not owned by any organization.
ckan.auth.create_dataset_if_not_in_organization
Example:
ckan.auth.create_dataset_if_not_in_organization = False
Default value: True
Allow users who are not members of any organization to create datasets.
create_unowned_dataset must also be True, otherwise setting
create_dataset_if_not_in_organization to True is meaningless.
ckan.auth.user_create_groups
Example:
ckan.auth.user_create_groups = False
Default value: True
Allow users to create groups.
ckan.auth.user_create_organizations
Example:
ckan.auth.user_create_organizations = False
Default value: True
Allow users to create organizations.
ckan.auth.user_delete_groups
Example:
ckan.auth.user_delete_groups = False
Default value: True
Allow users to delete groups.
ckan.auth.user_delete_organizations
Example:
ckan.auth.user_delete_organizations = False
Default value: True
Allow users to delete organizations.
ckan.auth.create_user_via_api
Example:
ckan.auth.create_user_via_api = False
Default value: False
Allow new user accounts to be created via the API.
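The interaction between create_unowned_dataset and create_dataset_if_not_in_organization can be illustrated with a short standalone Python 3 sketch. It reads the two options from config text, falls back to the default values listed above, and reports the effective result. The function name is hypothetical, the options are assumed to live in the [app:main] section, and this is a simplified model rather than CKAN’s authorization code.

```python
import configparser

# Default values as listed in the option descriptions above.
DEFAULTS = {
    "ckan.auth.create_unowned_dataset": "True",
    "ckan.auth.create_dataset_if_not_in_organization": "True",
}


def non_member_may_create_dataset(ini_text, section="app:main"):
    """Return True if users outside any organization may create datasets.

    Simplified model: both options must be true, because a user who is in
    no organization can only create a dataset that no organization owns.
    """
    parser = configparser.RawConfigParser(defaults=DEFAULTS)
    parser.read_string(ini_text)
    unowned = parser.getboolean(section, "ckan.auth.create_unowned_dataset")
    not_in_org = parser.getboolean(
        section, "ckan.auth.create_dataset_if_not_in_organization")
    return unowned and not_in_org


config = "[app:main]\nckan.auth.create_unowned_dataset = False\n"
# The second option still defaults to True, but has no effect here.
print(non_member_may_create_dataset(config))
```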
Extensions
CKAN allows extensions to change the authorization rules used. Please see
individual extensions for details.
Todo
Insert cross-reference to IAuthFunctions docs.
Data Viewer
The CKAN resource page can contain a preview of the resource’s data.
This works by either:
- Embedding the data into the page, either directly or by loading the data
in an iframe.
- Using a custom widget (such as Recline) to view the data.
Which of these is used generally depends on the type of resource being
viewed: images are embedded directly, unstructured or plain text files are
loaded in an iframe, and more complex data types need a custom widget.
The data preview functionality that is provided by CKAN is described in
the following sections:
These sections list the resource formats that each extension can preview and
provide instructions for how to enable each extension.
It is also possible for developers to create new extensions that can preview
different types of resources.
For more information on this topic see
Writing Extensions.
Viewing images and text files
Configuration required: None.
Images and text files (that match one of the file types given below) will be
previewed automatically by default.
Resource formats: images, plain text (details below).
By default, the following types of resources will be embedded directly into
the resource read page:
The types of resources that are embedded directly can be specified in the
CKAN config file. See ckan.preview.direct for more information.
The following types of resources will be loaded in an iframe if there is no
extension that can preview these types:
plain
txt
html
htm
xml
rdf+xml
owl+xml
n3
n-triples
turtle
atom
rss
The types of resources that are loaded in an iframe can be specified in the
CKAN config file. See ckan.preview.loadable for more information.
Note that these documents are loaded directly by the browser, so the
way in which they are shown may vary. If you want to ensure, for instance, that
XML-based documents are previewed correctly, have a look at Viewing highlighted XML, JSON and plain text data.
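The choice between direct embedding, an iframe and a preview extension can be sketched as a lookup on the resource format. The following standalone Python 3 example is a simplified model, not CKAN’s code. The LOADABLE set is the iframe list given above. The DIRECT set is an assumed example of image formats (the actual list comes from ckan.preview.direct), and lowercasing the format before comparing is also an assumption.

```python
# Formats loaded in an iframe by default, as listed above.
LOADABLE = {"plain", "txt", "html", "htm", "xml", "rdf+xml", "owl+xml",
            "n3", "n-triples", "turtle", "atom", "rss"}
# Assumed example image formats; the real list is set by ckan.preview.direct.
DIRECT = {"png", "jpg", "gif"}


def preview_mode(resource_format, direct=DIRECT, loadable=LOADABLE):
    """Classify how a resource of the given format would be previewed."""
    fmt = resource_format.strip().lower()  # assumption: case-insensitive
    if fmt in direct:
        return "direct"      # embedded straight into the resource page
    if fmt in loadable:
        return "iframe"      # loaded in an iframe
    return "extension"       # needs a preview extension or custom widget


for fmt in ("PNG", "rdf+xml", "csv"):
    print(fmt, preview_mode(fmt))
```

In this model a preview extension that handles a loadable format would take precedence over the iframe, as described above, but that check is left out to keep the sketch short.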
Viewing structured data: the Data Explorer
New in version 2.0: the recline_preview extension is new in CKAN 2.0.
Configuration required: The recline_preview extension must be added to
ckan.plugins in your CKAN configuration file.
This extension is part of CKAN and so does not need to be installed separately.
Resource formats: DataStore, csv, xls.
Structured data can be previewed using the Recline Data Explorer.
The Data Explorer provides a rich, queryable view of the data, and allows the
data to be filtered, graphed and mapped.
To be viewed, the data must either be:
- In the CKAN DataStore.
  This is the recommended way to preview structured data.
Or:
- In csv or xls format.
  In this case, CKAN will first have to try to convert the file into a more
  structured format by using the Dataproxy.
  This is an automatic process that does not require any additional
  configuration.
  However, as the resource must be downloaded by the Dataproxy service and
  then analysed before it is viewed, this option is generally slower and less
  reliable than viewing data that is in the DataStore.
Viewing highlighted XML, JSON and plain text data
Configuration required: The text_preview extension must be added to
ckan.plugins in your CKAN configuration file.
This extension is part of CKAN and does not need to be installed
separately.
Resource formats:
- json, gjson, geojson (can be configured by setting ckan.preview.json_formats)
- jsonp (can be configured by setting ckan.preview.jsonp_formats)
- xml, rdf, rdf+xml, owl+xml, atom, rss (can be configured by setting ckan.preview.xml_formats)
- text/plain, txt, plain (can be configured by setting ckan.preview.text_formats)
The text_preview extension provides previews of many file types that have
been added to a CKAN instance. To view the data, the resource format must be
set to one of the resource formats listed above (case insensitive).
See also
The resourceproxy extension
If you want to preview linked-to text files (and not only files that have
been uploaded to CKAN) you need to enable the resource_proxy extension
as well.
Viewing PDF documents
Configuration required: The pdf_preview extension must be added to
ckan.plugins in your CKAN configuration file. This extension is part of
CKAN and does not need to be installed separately.
Resource formats: pdf, x-pdf, acrobat, vnd.pdf.
The pdf_preview extension provides previews of any pdf documents that
have been added to a CKAN instance. This extension uses Mozilla’s pdf.js library.
See also
The resourceproxy extension
If you want to preview linked-to PDF files (and not only files that have
been uploaded to CKAN) you need to enable the resource_proxy extension
as well.
Viewing remote resources: the resource proxy
Configuration required: The resource_proxy extension must be added to
ckan.plugins in your CKAN configuration file.
This extension is part of CKAN and so does not need to be installed separately.
This extension must be enabled if you wish to preview resources that are on a
different domain. If it is not enabled, files such as PDF or JSON documents
that are hosted on www.example.com while CKAN is on www.ckan.org cannot be
previewed by any extension.
Previewing is prevented by the same origin policy, which
prevents files from different domains (different origins) from being loaded
into browsers. This extension gets around the same origin policy by making
all files appear to be served from the same domain (same origin) that
CKAN is on (e.g. www.ckan.org).
If you are writing a custom preview extension that requires resources to be
proxied, you need to replace the URL that is used to load the file. This can
be done using the function
ckanext.resourceproxy.plugin.get_proxified_resource_url().
To find out whether the resource proxy is enabled, check
ckan.resource_proxy_enabled from the config. You can find a complete example
in the CKAN source.
Embedding Previews In Other Web Pages
Changed in version 2.0: The URL that is used to obtain the contents of the resource preview has
changed from /dataset/{name}/resource/{resource_id}/embed
to /dataset/{name}/resource/{resource_id}/preview.
For each resource, the preview content can be viewed at
/dataset/{dataset id}/resource/{resource id}/preview.
The preview content can therefore be embedded in other web pages by loading
the contents of this URL in an iframe.
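As an illustration of the embedding described above, the following standalone Python 3 sketch builds the preview URL and wraps it in an iframe tag. The function name preview_iframe, the example site URL and the default width and height are assumptions made for this example. Only the URL pattern comes from the text above.

```python
from html import escape
from urllib.parse import quote


def preview_iframe(site_url, dataset, resource_id, width=700, height=400):
    """Return an <iframe> tag that embeds a resource preview.

    Hypothetical helper; `dataset` is the dataset name or id.
    """
    src = "{}/dataset/{}/resource/{}/preview".format(
        site_url.rstrip("/"),
        quote(dataset, safe=""),       # percent-encode path segments
        quote(resource_id, safe=""),
    )
    # Escape the URL for safe use inside an HTML attribute.
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(src, quote=True), width, height)


print(preview_iframe("http://demo.example.org/", "my-dataset", "abc-123"))
```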
FileStore and File Uploads
CKAN allows users to upload files directly to file storage either on the local
file system or to online ‘cloud’ storage like Amazon S3 or Google Storage. The
uploaded files will be stored in the configured location.
Set up the FileStore with Local File Storage
To set up CKAN’s FileStore with local file storage:
Create the directory where CKAN will store uploaded files:
sudo mkdir -p /var/lib/ckan/default
Add the following lines to your CKAN config file, after the [app:main] line:
ofs.impl = pairtree
ofs.storage_dir = /var/lib/ckan/default
Set the permissions of the storage_dir. For example, if you’re running
CKAN with Apache, then Apache’s user (www-data on Ubuntu) must have
read, write and execute permissions for the storage_dir:
sudo chown www-data /var/lib/ckan/default
sudo chmod u+rwx /var/lib/ckan/default
Make sure you’ve set ckan.site_url in your config file.
Restart your web server, for example to restart Apache:
sudo service apache2 reload
Set up the FileStore with Cloud Storage
Important: you must install the boto library for cloud storage to function.
In your config, for Google:
## OFS configuration
ofs.impl = google
ofs.gs_access_key_id = GOOG....
ofs.gs_secret_access_key = ....
For S3:
## OFS configuration
ofs.impl = s3
ofs.aws_access_key_id = ....
ofs.aws_secret_access_key = ....
FileStore Web Interface
Uploading files to storage is integrated directly into the Dataset creation
and editing system, with files being associated with Resources.
FileStore API
CKAN’s FileStore API lets you upload files to CKAN’s FileStore. If you’re
looking for an example, ckanclient contains
Python code for uploading a file to CKAN using the FileStore API.
FileStore Request Authentication API
Provides credentials for doing operations on storage directly from a client.
Warning
This API is currently disabled and will likely be deprecated.
Use the form authentication instead.
The API is at:
/api/storage/auth/request/{label}
Provide authentication information for a request so a client can
interact with backend storage directly:
:param label: label.
:param kwargs: sent either via query string (for GET) or as a json-encoded
dict (for POST). Interpreted as HTTP headers for the request plus an
(optional) method parameter (being the HTTP method).
Examples of headers are:
Content-Type
Content-Encoding (optional)
Content-Length
Content-MD5
Expect (should be '100-Continue')
:return: a JSON hash containing various attributes, including a
headers dictionary containing an Authorization field which is valid for
15 minutes.
DataStore Extension
Todo
What features does the datastore actually provide that users care about?
Why would they want to use it?
- API for reading, writing data without downloading, uploading entire file
- Enables Recline previews
- API for searching data, including search across resources
The CKAN DataStore provides a database for structured storage of data together
with a powerful Web-accessible Data API, all seamlessly integrated into the CKAN
interface and authorization system. At the same time, we kept the layer between the
underlying database and the user as thin as possible.
The DataStore is distinct from but complementary to the FileStore (see
FileStore and File Uploads). In contrast to the FileStore, which provides ‘blob’
storage of whole files with no way to access or query parts of that file, the
DataStore is like a database in which individual data elements are accessible
and queryable. To illustrate this distinction, consider storing a spreadsheet
file like a CSV or Excel document. In the FileStore this file would be stored
directly. To access it you would download the file as a whole. By contrast, if
the spreadsheet data is stored in the DataStore, one would be able to access
individual spreadsheet rows via a simple web API, as well as being able to make
queries over the spreadsheet contents.
Note
The DataStore requires PostgreSQL 9.0 or later. It is possible to use the
DataStore on versions prior to 9.0 (for example 8.4). However,
datastore_search_sql() will not be available and the set-up is slightly
different. Make sure you read
Legacy mode: use the DataStore with old PostgreSQL versions for more details.
Warning
The DataStore does not support hiding resources in a private dataset.
1. Enable the plugin
Add the datastore plugin to the ckan.plugins setting in your CKAN config file.
2. Set-up the database
Warning
Make sure that you follow the steps in Set Permissions below correctly. Wrong settings could lead to serious security issues.
The DataStore requires a separate PostgreSQL database to save the resources to.
List existing databases:
Check that the encoding of the databases is UTF8; if not, internationalisation may be a problem. Since changing the encoding of PostgreSQL may mean deleting existing databases, it is suggested that this is fixed before continuing with the DataStore setup.
Create users and databases
Tip
If your CKAN database and DataStore databases are on different servers, then
you need to create a new database user on the server where the DataStore
database will be created. As in Installing CKAN from Source we’ll name the
database user ckan_default:
sudo -u postgres createuser -S -D -R -P -l ckan_default
Create a database user called datastore_default. This user will be given
read-only access to your DataStore database in the Set Permissions step
below:
sudo -u postgres createuser -S -D -R -P -l datastore_default
Create the database (owned by ckan_default), which we’ll call
datastore_default:
sudo -u postgres createdb -O ckan_default datastore_default -E utf-8
Set URLs
Now, uncomment the ckan.datastore.write_url and
ckan.datastore.read_url lines in your CKAN config file and edit them
if necessary, for example:
ckan.datastore.write_url = postgresql://ckan_default:pass@localhost/datastore_default
ckan.datastore.read_url = postgresql://datastore_default:pass@localhost/datastore_default
Replace pass with the passwords you created for your ckan_default and
datastore_default database users.
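Mistakes in these two URLs are easy to make, so a quick sanity check can help. The following standalone Python 3 sketch parses both URLs and reports obvious problems. The function check_datastore_urls and the specific checks are hypothetical and based only on the set-up described above (same database, a separate read-only user, and the pass placeholder replaced). CKAN does not ship this helper.

```python
from urllib.parse import urlsplit


def check_datastore_urls(write_url, read_url):
    """Return a list of problems found in the two DataStore URLs."""
    w, r = urlsplit(write_url), urlsplit(read_url)
    problems = []
    # Both URLs should point to the same server and database.
    if (w.hostname, w.port, w.path) != (r.hostname, r.port, r.path):
        problems.append("write_url and read_url point to different databases")
    # The read URL should use the separate read-only user.
    if w.username == r.username:
        problems.append("read_url should use a separate read-only user")
    # The example value "pass" is a placeholder that must be replaced.
    for name, url in (("write_url", w), ("read_url", r)):
        if url.password in (None, "", "pass"):
            problems.append(name + " still has a placeholder or empty password")
    return problems


for problem in check_datastore_urls(
        "postgresql://ckan_default:pass@localhost/datastore_default",
        "postgresql://datastore_default:pass@localhost/datastore_default"):
    print(problem)
```

An empty list means none of these particular problems were found. It does not prove that the database permissions are correct.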
Set Permissions
Once the DataStore database and the users are created, the permissions on the DataStore and CKAN database have to be set. Since there are different set-ups, there are different ways of setting the permissions. Only one of the options should be used.
Option 1: Paster command
This option is preferred if CKAN and PostgreSQL are on the same server.
To set the permissions, use this paster command after you’ve set the database URLs (make sure to have your virtualenv activated):
paster datastore set-permissions postgres -c /etc/ckan/default/development.ini
The postgres in this command should be the name of a postgres
user with permission to create new tables and users, grant permissions, etc.
Typically this user is called “postgres”. See paster datastore
set-permissions -h.
Option 3: SQL script
This option is for more complex set-ups and requires understanding of SQL and PostgreSQL.
Copy the set_permissions.sql file to the server that the database runs on. Make sure you set all variables in the file correctly and comment out the parts that are not needed for your set-up. Then, run the script:
sudo -u postgres psql postgres -f set_permissions.sql
3. Test the set-up
The DataStore is now set up. To test the set-up, (re)start CKAN and run the
following command to list all resources that are in the DataStore:
curl -X GET "http://127.0.0.1:5000/api/3/action/datastore_search?resource_id=_table_metadata"
This should return a JSON page without errors.
To test whether the set-up allows writing, you can create a new resource in
the DataStore. To do so, run the following command:
curl -X POST http://127.0.0.1:5000/api/3/action/datastore_create -H "Authorization: {YOUR-API-KEY}" -d '{"resource_id": "{RESOURCE-ID}", "fields": [ {"id": "a"}, {"id": "b"} ], "records": [ { "a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"} ]}'
Replace {YOUR-API-KEY}
with a valid API key and {RESOURCE-ID}
with a
resource id of an existing CKAN resource.
A table named after the resource id should have been created on your DataStore
database. Visiting this URL should return a response from the DataStore with
the records inserted above:
http://127.0.0.1:5000/api/3/action/datastore_search?resource_id={RESOURCE-ID}
You can now delete the DataStore table with:
curl -X POST http://127.0.0.1:5000/api/3/action/datastore_delete -H "Authorization: {YOUR-API-KEY}" -d '{"resource_id": "{RESOURCE-ID}"}'
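The write test above can also be scripted. The following is a minimal sketch, not part of CKAN: it uses only the Python 3 standard library, the helper name build_action_request is hypothetical, and the API key and resource id are the same placeholders as in the curl commands. It builds the datastore_create request without sending it.

```python
import json
import urllib.request


def build_action_request(base_url, action, api_key, payload):
    """Build (but do not send) a POST request for a CKAN API action."""
    url = "{0}/api/3/action/{1}".format(base_url.rstrip("/"), action)
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # The API key goes in the Authorization header, as in the curl example.
    request.add_header("Authorization", api_key)
    return request


payload = {
    "resource_id": "{RESOURCE-ID}",
    "fields": [{"id": "a"}, {"id": "b"}],
    "records": [{"a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"}],
}
request = build_action_request(
    "http://127.0.0.1:5000", "datastore_create", "{YOUR-API-KEY}", payload
)
print(request.full_url)
# To send it against a running CKAN instance:
# response = urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, header and body before touching a live site.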
To find out more about the DataStore API, see The DataStore API.
Legacy mode: use the DataStore with old PostgreSQL versions
Tip
The legacy mode can also be used to simplify the set-up since it does not require you to set the permissions or create a separate user.
The DataStore can be used with a PostgreSQL version prior to 9.0 in legacy mode. Due to the lack of some functionality, the datastore_search_sql()
and consequently the HTSQL Support cannot be used. To enable the legacy mode, remove the declaration of the ckan.datastore.read_url
.
The set-up for legacy mode is analogous to the normal set-up as described above with a few changes and consists of the following steps:
- Enable the plugin
- The legacy mode is enabled by not setting the
ckan.datastore.read_url
- Set-Up the database
- Create a separate database
- Create a write user on the DataStore database (optional since the CKAN user can be used)
- Test the set-up
There is no need for a read-only user or special permissions. Therefore the legacy mode can be used for simple set-ups as well.
The DataStore API allows tabular data to be stored inside CKAN quickly and
easily. Each resource in a CKAN instance can have an associated DataStore
table. The API for using the DataStore is outlined below.
Making a DataStore API Request
Making a DataStore API request is the same as making an Action API request: you
post a JSON dictionary in an HTTP POST request to an API URL, and the API also
returns its response in a JSON dictionary. See The CKAN API for details.
API Reference
Note
Lists can always be expressed in different ways. It is possible to use lists, comma separated strings or single items. These are valid lists: ['foo', 'bar']
, 'foo, bar'
, "foo", "bar"
and 'foo'
. Additionally, there are several ways to define a boolean value. True
, on
and 1
are all valid boolean values.
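To make the accepted spellings concrete, here is a small client-side sketch in Python. It is not the DataStore's own parsing code: the helper names as_list and as_bool are hypothetical, and treating every value other than True, on and 1 as false is an assumption.

```python
def as_list(value):
    """Return value as a list of stripped strings.

    Accepts a real list, a comma separated string or a single item.
    """
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value]
    return [part.strip() for part in str(value).split(",") if part.strip()]


def as_bool(value):
    """Return True for the spellings listed in the note: True, on and 1."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "on", "1")


print(as_list("foo, bar"))
print(as_bool("on"))
```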
Download resource as CSV
A DataStore resource can be downloaded in the CSV file format from {CKAN-URL}/datastore/dump/{RESOURCE-ID}
.
Fields
Fields define the column names and the type of the data in a column. A field is defined as follows:
{
"id": # a string which defines the column name
"type": # the data type for the column
}
Field types are optional and will be guessed by the DataStore from the provided data. However, setting the types ensures that future inserts will not fail because of wrong types. See Field types for details on which types are valid.
Example:
[
{
"id": "foo",
"type": "int4"
},
{
"id": "bar"
# type is optional
}
]
Records
A record is the data to be inserted in a table and is defined as follows:
{
"<id>": # data to be set
# .. more data
}
Example:
[
{
"foo": 100,
"bar": "Here's some text"
},
{
"foo": 42
}
]
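Fields and records are sent together, so it can help to check on the client that every record key matches a declared field before calling the API. The sketch below is only an illustration: the function name undeclared_columns is hypothetical, and how the DataStore itself treats undeclared columns is not specified here.

```python
def undeclared_columns(fields, records):
    """Return the sorted record keys that no field definition declares."""
    declared = {field["id"] for field in fields}
    used = set()
    for record in records:
        used.update(record.keys())
    return sorted(used - declared)


# The field and record examples from the text above.
fields = [{"id": "foo", "type": "int4"}, {"id": "bar"}]
records = [{"foo": 100, "bar": "Here's some text"}, {"foo": 42}]
print(undeclared_columns(fields, records))
```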
Field types
The DataStore supports all types supported by PostgreSQL as well as a few additions. A list of the PostgreSQL types can be found in the type section of the documentation. Below you can find a list of the most common data types. The json
type has been added as a storage for nested data.
In addition to the types listed below, you can also use array types. They are defined by prepending a _
or appending []
or [n]
where n denotes the length of the array. An arbitrarily long array of integers would be defined as int[]
.
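As a small illustration of the array notation, the following Python sketch builds the type names described above. The helper names are hypothetical, and the code only mirrors the spellings given in the text; it does not check that the base type exists in PostgreSQL.

```python
def array_type(base_type, length=None):
    """Return the suffix form: base[] or, with a length, base[n]."""
    if length is None:
        return "{0}[]".format(base_type)
    return "{0}[{1}]".format(base_type, length)


def array_type_prefixed(base_type):
    """Return the underscore-prefixed form: _base."""
    return "_{0}".format(base_type)


# A field definition for an arbitrarily long array of integers.
field = {"id": "scores", "type": array_type("int")}
print(field)
```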
- text
- Arbitrary text data, e.g.
Here's some text
.
- json
- Arbitrary nested JSON data, e.g.
{"foo": 42, "bar": [1, 2, 3]}
.
Please note that this type is a custom type that is wrapped by the DataStore.
- date
- Date without time, e.g.
2012-5-25
.
- time
- Time without date, e.g.
12:42
.
- timestamp
- Date and time, e.g.
2012-10-01T02:43Z
.
- int
- Integer numbers, e.g.
42
, 7
.
- float
- Floats, e.g.
1.61803
.
- bool
- Boolean values, e.g.
true
, 0
You can find more information about the formatting of dates in the date/time types section of the PostgreSQL documentation.
Resource aliases
A resource in the DataStore can have multiple aliases that are easier to remember than the resource id. Aliases can be created and edited with the datastore_create()
API endpoint. All aliases can be found in a special view called _table_metadata
. See Internal structure of the database for full reference.
HTSQL Support
The ckanext-htsql extension adds an API action that allows a user to search data in a resource using the HTSQL query expression language. Please refer to the extension documentation to know more.
Comparison of different querying methods
The DataStore supports querying with multiple API endpoints. They are similar but support different features. The following list gives an overview of the different methods.
| | datastore_search() | datastore_search_sql() | HTSQL |
| Ease of use | Easy | Complex | Medium |
| Flexibility | Low | High | Medium |
| Query language | Custom (JSON) | SQL | HTSQL |
| Join resources | No | Yes | No |
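To illustrate the difference in query language, the sketch below builds the same query twice in Python: once as a JSON body for datastore_search() and once as SQL for datastore_search_sql(). The helper names are hypothetical, and the parameter names (resource_id, filters, limit, sql) reflect my understanding of the DataStore API, so check them against the API reference before relying on them.

```python
import json


def search_query(resource_id, filters=None, limit=None):
    """Build the JSON body for a datastore_search() call."""
    query = {"resource_id": resource_id}
    if filters:
        query["filters"] = filters
    if limit is not None:
        query["limit"] = limit
    return query


def sql_query(sql):
    """Build the JSON body for a datastore_search_sql() call."""
    return {"sql": sql}


resource_id = "{RESOURCE-ID}"  # placeholder for a real resource id
simple = search_query(resource_id, filters={"b": "xyz"}, limit=5)
advanced = sql_query(
    'SELECT * FROM "{0}" WHERE b = \'xyz\' LIMIT 5'.format(resource_id)
)
print(json.dumps(simple, sort_keys=True))
print(json.dumps(advanced, sort_keys=True))
```

The SQL form is more verbose for a simple filter, but it is the only one of the two that could express a join across resources.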
Internal structure of the database
The DataStore is a thin layer on top of a PostgreSQL database. Each DataStore resource belongs to a CKAN resource. The name of a table in the DataStore is always the resource id of the CKAN resource for the data.
As explained in Resource aliases, a resource can have mnemonic aliases which are stored as views in the database.
All aliases (views) and resources (tables, that is, relations) of the DataStore can be found in a special view called _table_metadata
. To access the list, open http://{YOUR-CKAN-INSTALLATION}/api/3/action/datastore_search?resource_id=_table_metadata
.
_table_metadata
has the following fields:
- _id
- Unique key of the relation in
_table_metadata
.
- alias_of
- Name of the relation that this alias points to. This field is
null
if and only if the name is not an alias.
- name
- Contains the name of the alias if alias_of is not null. Otherwise, this is the resource id of the CKAN resource for the DataStore resource.
- oid
- The PostgreSQL object ID of the table that belongs to name.
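Given rows with these fields, resolving aliases is a simple lookup. The Python sketch below assumes the rows have already been fetched and decoded into dictionaries; the function name resolve_aliases is hypothetical and the sample rows (ids, names, oids) are invented for illustration.

```python
def resolve_aliases(rows):
    """Map each alias name to the name of the relation it points to."""
    return {
        row["name"]: row["alias_of"]
        for row in rows
        if row.get("alias_of") is not None
    }


# Invented rows in the shape described above.
rows = [
    {"_id": "1", "name": "5f38da22-7d55-4312-81ce-17f1a9e84788",
     "alias_of": None, "oid": 1001},
    {"_id": "2", "name": "gold_prices",
     "alias_of": "5f38da22-7d55-4312-81ce-17f1a9e84788", "oid": 1002},
]
print(resolve_aliases(rows))
```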
Often, one wants data that is added to CKAN (whether it is linked to or
uploaded to the FileStore) to be automatically added to the
DataStore. This requires some processing, to extract the data from your files
and to add it to the DataStore in the format the DataStore can handle.
This task of automatically parsing and then adding data to the DataStore is
performed by a DataStorer, a queue process that runs asynchronously and can be
triggered by uploads or other activities. The DataStorer is an extension and can
be found, along with installation instructions, at: https://github.com/okfn/ckanext-datastorer
Apps & Ideas
Since 1.7 CKAN has a feature called Apps & Ideas which allows users to provide information on apps, ideas, visualizations, articles, etc. that are related to a specific dataset. Once created, these items are shown against the dataset and also on the apps dashboard, which lets users filter the results by popularity, by type, or by the date when the items were created.
This feature is enabled by default but can be disabled using the ckan.dataset.show_apps_ideas setting to hide the tab on the dataset pages.
Tag Vocabularies
CKAN sites can have tag vocabularies, which are a way of grouping related
tags together into custom fields.
For example, if you were making a site for music datasets, you might use a tag
vocabulary to add two fields Genre and Composer to your site’s datasets,
where each dataset can have one of the values Avant-Garde, Country or
Jazz in its genre field, and one of the values Beethoven, Wagner, or
Tchaikovsky in its composer field. In this example, genre and composer would
be vocabularies and the values would be tags:
- Vocabulary: Genre
- Tag: Avant-Garde
- Tag: Country
- Tag: Jazz
- Vocabulary: Composer
- Tag: Beethoven
- Tag: Wagner
- Tag: Tchaikovsky
Of course, you could just add Avant-Garde, Beethoven, etc. to datasets as normal
CKAN tags, but using tag vocabularies lets you define Avant-Garde, Country and
Jazz as genres and Beethoven, Wagner and Tchaikovsky as composers, and lets you
enforce restrictions such as that each dataset must have a genre and a
composer, and that no dataset can have two genres or two composers, etc.
Another example use-case for tag vocabularies would be to add a Country Code
field to datasets defining the geographical coverage of the dataset, where each
dataset is assigned a country code such as en, fr, de, etc. See
ckanext/example_idatasetform
for a working example implementation of
country codes as a tag vocabulary.
Properties of Tag Vocabularies
- A CKAN website can have any number of vocabularies.
- Each vocabulary has an ID and name.
- Each tag either belongs to a vocabulary, or can be a free tag that doesn’t
belong to any vocabulary (i.e. a normal CKAN tag).
- A dataset can have more than one tag from the same vocabulary, and can have tags from more than one vocabulary.
Using Vocabularies
To add a tag vocabulary to a site, a CKAN sysadmin must:
- Call the
vocabulary_create()
action of the CKAN API to create the
vocabulary and tags. See The CKAN API.
- Implement an
IDatasetForm
plugin to add a new field for the tag
vocabulary to the dataset schema. See Writing Extensions.
- Provide custom dataset templates to display the new field to users when
adding, updating or viewing datasets in the CKAN web interface.
See Theming.
See ckanext/example_idatasetform
for a working example of these steps.
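As a rough sketch of the first step, the request body for vocabulary_create() could be built like this in Python. The helper name vocabulary_payload is hypothetical, and the payload shape (a name plus a list of tag dictionaries) reflects my understanding of the action, so confirm it against The CKAN API before use.

```python
def vocabulary_payload(name, tag_names):
    """Build the JSON body for a vocabulary_create() call."""
    return {"name": name, "tags": [{"name": tag} for tag in tag_names]}


# The Genre vocabulary from the music example above.
genres = vocabulary_payload("Genre", ["Avant-Garde", "Country", "Jazz"])
print(genres)
```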
Linked Data and RDF
CKAN has extensive support for linked data and RDF. In particular, there is
complete and functional mapping of the CKAN dataset schema to linked data
formats.
Enabling and Configuring Linked Data Support
In CKAN <= 1.6 please install the RDF extension: https://github.com/okfn/ckanext-rdf
In CKAN >= 1.7, basic RDF support will be available directly in core.
Configuration
When using the built-in RDF support (CKAN >= 1.7) there is no configuration required. By default requests for RDF data will return the RDF generated from the built-in ‘packages/read.rdf’ template, which can be overridden using the extra-templates directive.
Accessing Linked Data
To access linked data versions, access the CKAN API in the usual way but
set the Accept header to the format you would like to be returned. For
example:
curl -L -H "Accept: application/rdf+xml" http://thedatahub.org/dataset/gold-prices
curl -L -H "Accept: text/n3" http://thedatahub.org/dataset/gold-prices
An alternative method of retrieving the data is to add .rdf or .n3 to the name of the dataset to download:
curl -L http://thedatahub.org/dataset/gold-prices.rdf
curl -L http://thedatahub.org/dataset/gold-prices.n3
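Both methods can be expressed in a few lines of Python. This is a sketch using only the standard library; the function names and the RDF_FORMATS mapping are hypothetical, and the requests are built but not sent.

```python
import urllib.request

# Format suffix -> MIME type, taken from the curl examples above.
RDF_FORMATS = {"rdf": "application/rdf+xml", "n3": "text/n3"}


def linked_data_request(dataset_url, fmt="rdf"):
    """Build a request that asks for a format via the Accept header."""
    request = urllib.request.Request(dataset_url)
    request.add_header("Accept", RDF_FORMATS[fmt])
    return request


def linked_data_url(dataset_url, fmt="rdf"):
    """Build the URL that selects a format by file extension."""
    return "{0}.{1}".format(dataset_url, fmt)


url = "http://thedatahub.org/dataset/gold-prices"
print(linked_data_request(url, "n3").get_header("Accept"))
print(linked_data_url(url, "rdf"))
```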
Schema Mapping
There are various vocabularies that can be used for describing datasets:
- Dublin Core: these terms are the best known and most basic. Dublin Core terms include the class dct:Dataset.
- DCAT - vocabulary for catalogues of datasets
- VoID - vocabulary of interlinked datasets. Specifically designed for describing RDF datasets, so it is a good fit except that it is focused on RDF
- SCOVO: this is more oriented to statistical datasets but has a scovo:Dataset class.
At present, CKAN uses mostly DCAT and Dublin Core.
An example schema might look like:
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcat="http://www.w3.org/ns/dcat#"
xmlns:dct="http://purl.org/dc/terms/">
<dcat:Dataset rdf:about="http://127.0.0.1:5000/dataset/worldwide-shark-attacks">
<owl:sameAs rdf:resource="urn:uuid:424bdc8c-038d-4b44-8f1d-01227e920b69"></owl:sameAs>
<dct:description>Shark attacks worldwide</dct:description>
<dcat:keyword>sharks</dcat:keyword>
<dcat:keyword>worldwide</dcat:keyword>
<foaf:homepage rdf:resource="http://127.0.0.1:5000/dataset/worldwide-shark-attacks"></foaf:homepage>
<rdfs:label>worldwide-shark-attacks</rdfs:label>
<dct:identifier>worldwide-shark-attacks</dct:identifier>
<dct:title>Worldwide Shark Attacks</dct:title>
<dcat:distribution>
<dcat:Distribution>
<dcat:accessURL rdf:resource="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=worldwide_shark_attacks&amp;query=select+*+from+`Europe`&amp;apikey="></dcat:accessURL>
</dcat:Distribution>
</dcat:distribution>
<dcat:distribution>
<dcat:Distribution>
<dcat:accessURL rdf:resource="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=worldwide_shark_attacks&amp;query=select+*+from+`Australia`&amp;apikey="></dcat:accessURL>
</dcat:Distribution>
</dcat:distribution>
<dct:creator>
<rdf:Description>
<foaf:name>Ross</foaf:name>
<foaf:mbox rdf:resource="mailto:ross.jones@okfn.org"></foaf:mbox>
</rdf:Description>
</dct:creator>
<dct:contributor>
<rdf:Description>
<foaf:name>Ross</foaf:name>
<foaf:mbox rdf:resource="mailto:ross.jones@okfn.org"></foaf:mbox>
</rdf:Description>
</dct:contributor>
<dct:rights rdf:resource="http://www.opendefinition.org/licenses/odc-pddl"></dct:rights>
</dcat:Dataset>
</rdf:RDF>
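A consumer of this output can read it with any XML or RDF library. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the example; the function name summarise and the SAMPLE constant are hypothetical, and a dedicated RDF library would be a more robust choice for real data.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Shortened copy of the example document above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://127.0.0.1:5000/dataset/worldwide-shark-attacks">
    <dct:title>Worldwide Shark Attacks</dct:title>
    <dcat:keyword>sharks</dcat:keyword>
    <dcat:keyword>worldwide</dcat:keyword>
  </dcat:Dataset>
</rdf:RDF>"""


def summarise(rdf_xml):
    """Extract the title and keywords of the first dcat:Dataset."""
    root = ET.fromstring(rdf_xml)
    dataset = root.find("dcat:Dataset", NAMESPACES)
    keywords = [k.text for k in dataset.findall("dcat:keyword", NAMESPACES)]
    return {
        "title": dataset.findtext("dct:title", namespaces=NAMESPACES),
        "keywords": keywords,
    }


print(summarise(SAMPLE))
```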
Background Tasks
CKAN allows you to create tasks that run in the ‘background’, that is
asynchronously and without blocking the main application (these tasks can also
be automatically retried in the case of transient failures). Such tasks can be
created in Extensions or in core CKAN.
Background tasks can be essential to providing certain kinds of functionality,
for example:
- Creating webhooks that notify other services when certain changes occur (for
example a dataset is updated)
- Performing processing or validation on data (as done by the Archiver and
DataStorer Extensions)
Enabling Background Tasks
Managing and running background tasks requires a job queue, and CKAN uses celery
(plus the CKAN database) for this purpose. Thus, to use background tasks you
need to install and run celery.
Installation of celery will normally be taken care of by whichever component
or extension utilizes it so we skip that here.
To run the celery daemon you have two options:
In a development setup you can just use paster. This can be done as simply
as:
paster celeryd
This only works if you have a development.ini file in the ckan root directory.
In production, the daemon should be run with a different ini file and be run
as an init script. The simplest way to do this is to install supervisor:
apt-get install supervisor
Use this file as a template and copy it to /etc/supervisor/conf.d
:
https://github.com/okfn/ckan/blob/master/ckan/config/celery-supervisor.conf
Alternatively, you can run:
paster celeryd --config=/path/to/file.ini
Writing Background Tasks
These instructions should show you how to write a background task and how to
call it from inside CKAN or another extension using celery.
Examples
Here are some existing real examples of writing CKAN tasks:
Setup
An entry point is required inside the setup.py
for your extension, and so
you should add something resembling the following that points to a function in
a module. In this case the function is called task_imports in the
ckanext.NAME.celery_import
module:
entry_points = """
[ckan.celery_task]
tasks = ckanext.NAME.celery_import:task_imports
"""
The function, in this case task_imports
should be a function that returns
fully qualified module paths to modules that contain the defined task (see the
next section). In this case we will put all of our tasks in a file called
tasks.py
and so task_imports
should be in a file called
ckanext/NAME/celery_import.py
:
def task_imports():
    return ['ckanext.NAME.tasks']
This returns an iterable of all of the places to look to find tasks. In this
example we are only putting them in one place.
Implementing the tasks
The most straightforward way of defining tasks in our tasks.py
module, is
to use the decorators provided by celery. These decorators make it easy to just
define a function and then give it a name and make it accessible to celery.
Make sure you import celery from ckan.lib.celery_app:
from ckan.lib.celery_app import celery
Implement your function, specifying the arguments you wish it to take. For our
sample we will use a simple echo task that will print out its argument to the
console:
def echo(message):
    print(message)
Next it is important to decorate your function with the celery task decorator.
You should give the task a name, which is used later on when calling the task:
@celery.task(name="NAME.echofunction")
def echo(message):
    print(message)
That’s it, your function is ready to be run asynchronously outside of the main
execution of the CKAN app. Next you should make sure you run python setup.py
develop
in your extensions folder and then go to your CKAN installation
folder (normally pyenv/src/ckan/) to run the following command:
paster celeryd
Once you have done this your task name NAME.echofunction
should appear in
the list of tasks loaded. If it is there then you are all set and ready to go.
If not then you should try the following to try and resolve the problem:
- Make sure the entry point is defined correctly in your
setup.py
and that
you have executed python setup.py develop
- Check that your task_imports function returns an iterable with valid module
names in
- Ensure that the decorator marks the functions (if there is more than one
decorator, make sure the celery.task is the first one - which means it will
execute last).
- If none of the above helps, go into #ckan on irc.freenode.net where there
should be people who can help you resolve your issue.
Calling the task
Now that the task is defined, and has been loaded by celery it is ready to be
called. To call a background task you need to know only the name of the task
and the arguments that it expects, and you must provide a task id:
import uuid
from ckan.lib.celery_app import celery
celery.send_task("NAME.echofunction", args=["Hello World"], task_id=str(uuid.uuid4()))
After executing this code you should see the message printed in the console
where you ran paster celeryd
.
Retrying on errors
Should your task fail to complete because of a transient error, it is possible
to ask celery to retry the task, after some period of time. The default wait
before retrying is three minutes, but you can optionally specify this in the
call to retry via the countdown parameter, and you can also specify the
exception that triggered the failure. For our example the call to retry would
look like the following - note that it calls the function name, not the task
name given in the decorator:
try:
    # ... some work that may fail, for example an HTTP request
    pass
except Exception as e:
    # Retry again in 2 minutes
    echo.retry(args=(message,), exc=e, countdown=120, max_retries=10)
If you don’t want to wait a period of time you can use the eta datetime
parameter to specify an explicit time to run the task (e.g. 9AM tomorrow).
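For the "9AM tomorrow" case, the datetime can be computed with the standard library. This is a sketch only: the function name nine_am_tomorrow is hypothetical, and it works with naive datetimes, so how the value lines up with the time zone your celery workers use is an assumption you should check.

```python
import datetime


def nine_am_tomorrow(now):
    """Return 9:00 on the day after the given datetime."""
    tomorrow = now.date() + datetime.timedelta(days=1)
    return datetime.datetime.combine(tomorrow, datetime.time(9, 0))


eta = nine_am_tomorrow(datetime.datetime.now())
print(eta.isoformat())
```

The result could then be passed as eta=eta in the call to retry, in place of the countdown argument.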
Email Notifications
CKAN can send email notifications to users, for example when a user has new
activities on her dashboard. Once email notifications have been enabled by a
site admin, each user of a CKAN site can turn email notifications on or off for
herself by logging in and editing her user preferences. To enable email
notifications for a CKAN site, a sysadmin must:
Set up a cron job or other scheduled job on a server to call CKAN’s
send_email_notifications
API action at regular intervals (e.g. hourly)
and send any pending email notifications to users.
On most UNIX systems you can set up a cron job by running crontab -e
in a
shell to edit your crontab file, and adding a line to the file to specify
the new job. For more information run man crontab
in a shell.
CKAN API actions can be called via the paster post
command, which
simulates an HTTP-request. For example, here is a crontab line to send out
CKAN email notifications hourly:
@hourly echo '{}' | /usr/lib/ckan/bin/paster --plugin=ckan post -c /etc/ckan/production.ini /api/action/send_email_notifications > /dev/null
The @hourly
can be replaced with @daily
, @weekly
or @monthly
.
Warning
CKAN will not send email notifications for events older than the
time period specified by the ckan.email_notifications_since
config
setting (default: 2 days), so your cron job should run more frequently
than this. @hourly
and @daily
are good choices.
Note
Since send_email_notifications
is an API action, it can be called from
a machine other than the server on which CKAN is running, simply by
POSTing an HTTP request to the CKAN API (you must be a sysadmin to call
this particular API action). See The CKAN API.
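A remote call of this kind might look as follows in Python. The sketch builds the request without sending it; the function name notification_request is hypothetical, the site URL is the example used in this section, the API key is a placeholder for a sysadmin's key, and the empty JSON body mirrors the echo '{}' in the crontab line.

```python
import urllib.request


def notification_request(site_url, api_key):
    """Build (but do not send) the POST that triggers pending emails."""
    url = site_url.rstrip("/") + "/api/action/send_email_notifications"
    request = urllib.request.Request(url, data=b"{}", method="POST")
    # Only sysadmins may call this action, so a sysadmin API key is needed.
    request.add_header("Authorization", api_key)
    return request


request = notification_request("http://publicdata.eu", "{YOUR-API-KEY}")
print(request.full_url)
# To send it: urllib.request.urlopen(request)
```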
CKAN will not send out any email notifications, nor show the email
notifications preference to users, unless the
ckan.activity_streams_email_notifications option is set to True
, so
put this line in the [app:main]
section of your CKAN config file:
ckan.activity_streams_email_notifications = True
Make sure that ckan.site_url is set correctly in the [app:main]
section of your CKAN configuration file. This is used to generate links in
the bodies of the notification emails. For example:
ckan.site_url = http://publicdata.eu
Make sure that smtp.mail_from is set correctly in the [app:main]
section of your CKAN configuration file. This is the email address that
CKAN’s email notifications will appear to come from. For example:
smtp.mail_from = mailman@publicdata.eu
This is combined with your ckan.site_title to form the From:
header
of the emails that are sent, for example:
From: PublicData.eu <mailman@publicdata.eu>
If you do not have an SMTP server running locally on the machine that hosts
your CKAN instance, you can change the Email Settings to send email via an
external SMTP server. For example, these settings in the [app:main]
section of your configuration file will send emails using a gmail account
(not recommended for production websites!):
smtp.server = smtp.gmail.com:587
smtp.starttls = True
smtp.user = your_username@gmail.com
smtp.password = your_gmail_password
smtp.mail_from = your_username@gmail.com
For the new configuration to take effect you need to restart the web server.
For example, if you are using Apache on Ubuntu, run this command in a
shell:
sudo service apache2 reload
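Before restarting, it can be worth checking that the settings described above are all present. The Python sketch below reads an [app:main] section with the standard configparser module; the function name, the REQUIRED tuple and the sample text are hypothetical, and a real CKAN config file contains many more sections and options.

```python
import configparser

# Options this section says must be set for email notifications.
REQUIRED = (
    "ckan.activity_streams_email_notifications",
    "ckan.site_url",
    "smtp.mail_from",
)

SAMPLE_CONFIG = """
[app:main]
ckan.activity_streams_email_notifications = True
ckan.site_url = http://publicdata.eu
smtp.mail_from = mailman@publicdata.eu
"""


def missing_email_settings(config_text):
    """Return the required options that [app:main] does not define."""
    # Interpolation is disabled so values containing % are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["app:main"]
    return [key for key in REQUIRED if key not in section]


print(missing_email_settings(SAMPLE_CONFIG))
```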
Page View Tracking
CKAN can track visits to pages of your site and use this tracking data to:
- Sort datasets by popularity
- Highlight popular datasets and resources
- Show view counts next to datasets and resources
- Show a list of the most popular datasets
- Export page-view data to a CSV file
Enabling Page View Tracking
To enable page view tracking:
Set ckan.tracking_enabled to true in the [app:main]
section of your
CKAN configuration file (e.g. development.ini
or production.ini
):
[app:main]
ckan.tracking_enabled = true
Save the file and restart your web server. CKAN will now record raw page
view tracking data in your CKAN database as pages are viewed.
Set up a cron job to update the tracking summary data.
For operations based on the tracking data CKAN uses a summarised version of
the data, not the raw tracking data that is recorded “live” as page views
happen. The paster tracking update
and paster search-index rebuild
commands need to be run periodically to update this tracking summary data.
You can set up a cron job to run these commands. On most UNIX systems you can
set up a cron job by running crontab -e
in a shell to edit your crontab
file, and adding a line to the file to specify the new job. For more
information run man crontab
in a shell. For example, here is a crontab
line to update the tracking data and rebuild the search index hourly:
@hourly /usr/lib/ckan/bin/paster --plugin=ckan tracking update -c /etc/ckan/production.ini && /usr/lib/ckan/bin/paster --plugin=ckan search-index rebuild -r -c /etc/ckan/production.ini
Replace /usr/lib/ckan/bin/
with the path to the bin
directory of the
virtualenv that you’ve installed CKAN into, and replace /etc/ckan/production.ini
with the path to your CKAN configuration file.
The @hourly
can be replaced with @daily
, @weekly
or
@monthly
.
Retrieving Tracking Data
Tracking summary data for datasets and resources is available in the dataset
and resource dictionaries returned by, for example, the package_show()
API:
"tracking_summary": {
"recent": 5,
"total": 15
},
This can be used, for example, by custom templates to show the number of views
next to datasets and resources. A dataset or resource’s recent
count is
its number of views in the last 14 days, the total
count is all of its
tracked views (including recent ones).
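Reading the counts from a dataset or resource dictionary is straightforward. The sketch below is illustrative: the function name view_counts is hypothetical, the sample dictionary is invented, and falling back to zero when tracking_summary is absent is an assumption about how you would want to treat untracked items.

```python
def view_counts(item):
    """Return (recent, total) views for a dataset or resource dict."""
    summary = item.get("tracking_summary") or {}
    return summary.get("recent", 0), summary.get("total", 0)


# Invented dataset dict carrying the tracking_summary shown above.
dataset = {"name": "gold-prices", "tracking_summary": {"recent": 5, "total": 15}}
recent, total = view_counts(dataset)
print("{0} recent views, {1} total".format(recent, total))
```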
You can also export tracking data for all datasets to a CSV file using the
paster tracking export
command. For details, run paster tracking -h
.
Note
Repeatedly visiting the same page will not increase the page’s view count!
Page view counting is limited to one view per user per page per day.
Sorting Datasets by Popularity
Once you’ve enabled page view tracking on your CKAN site, you can view datasets
most-popular-first by selecting Popular
from the Order by:
dropdown on
the dataset search page:
The datasets are sorted by their number of recent views.
You can retrieve datasets most-popular-first from the
CKAN API by passing 'sort': 'views_recent desc'
to the
package_search()
action. This could be used, for example, by a custom
template to show a list of the most popular datasets on the site’s front page.
Tip
You can also sort datasets by total views rather than recent views. Pass
'sort': 'views_total desc'
to the package_search()
API, or use the
URL /dataset?q=&sort=views_total+desc
in the web interface.
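The sort value is the same in both places; only the encoding differs. As a sketch (the helper names are hypothetical), the web URL and the package_search() parameters could be built in Python like this:

```python
from urllib.parse import urlencode


def dataset_search_url(sort, query=""):
    """Build the web interface URL for a sorted dataset search."""
    return "/dataset?" + urlencode([("q", query), ("sort", sort)])


def package_search_params(sort, query=""):
    """Build the parameter dictionary for a package_search() call."""
    return {"q": query, "sort": sort}


print(dataset_search_url("views_total desc"))
print(package_search_params("views_recent desc"))
```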
Highlighting Popular Datasets and Resources
Once you’ve enabled page view tracking on your CKAN site, popular datasets and
resources (those with more than 10 views) will be highlighted with a “popular”
badge and a tooltip showing the number of views:
Multilingual Extension
For translating CKAN’s web interface see Internationalize CKAN. In addition to user interface internationalization, a CKAN administrator can also enter translations into CKAN’s database for terms that may appear in the contents of datasets, groups or tags created by users. When a user is viewing the CKAN site, if the translation terms database contains a translation in the user’s language for the name or description of a dataset or resource, the name of a tag or group, etc. then the translated term will be shown to the user in place of the original.
Setup and Configuration
By default term translations are disabled. To enable them, you have to specify the multilingual plugins using the ckan.plugins
setting in your CKAN configuration file, for example:
# List the names of CKAN extensions to activate.
ckan.plugins = multilingual_dataset multilingual_group multilingual_tag
Of course, you won’t see any terms getting translated until you load some term translations into the database. You can do this using the term_translation_update
and term_translation_update_many
actions of the CKAN API. See The CKAN API for more details.
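As an illustration, the request bodies for these two actions could be assembled as follows in Python. The helper names are hypothetical, and the key names (term, term_translation, lang_code, and data for the bulk action) reflect my understanding of the API, so verify them against The CKAN API. The German translation is just sample data.

```python
def term_translation(term, translation, lang_code):
    """Build the body for a term_translation_update call."""
    return {
        "term": term,
        "term_translation": translation,
        "lang_code": lang_code,
    }


def term_translations_many(translations):
    """Build the body for a term_translation_update_many call."""
    return {"data": list(translations)}


one = term_translation("Jazz", "Jazzmusik", "de")
many = term_translations_many([one])
print(many)
```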
Loading Test Translations
If you want to quickly test the term translation feature without having to provide your own translations, you can load CKAN’s test translations into the database by running this command from your shell:
paster --plugin=ckan create-test-data translations
See Command Line Interface for more details.
Testing The Multilingual Extension
If you have a source installation of CKAN you can test the multilingual extension by running the tests located in ckanext/multilingual/tests
. You must first install the packages needed for running CKAN tests into your virtual environment, and then run this command from your shell:
nosetests --ckan ckanext/multilingual/tests
See Installing Additional Dependencies for more information.
Stats Extension
CKAN’s stats extension analyzes your CKAN database and displays several tables
and graphs with statistics about your site, including:
- Total number of datasets
- Dataset revisions per week
- Top-rated datasets
- Most-edited Datasets
- Largest groups
- Top tags
- Users owning most datasets
Enabling the Stats Extension
To enable the stats extension, add stats
to the ckan.plugins option
in your CKAN config file, for example:
ckan.plugins = stats
If you also set the ckanext.stats.cache_enabled option to true
, CKAN
will cache the stats for one day instead of calculating them each time a user
visits the stats page.
Viewing the Statistics
To view the statistics reported by the stats extension, visit the /stats
page, for example: http://demo.ckan.org/stats