The DataStore API

The DataStore API allows tabular data to be stored inside CKAN quickly and easily. Each resource in a CKAN instance can have an associated DataStore table. The API for using the DataStore is outlined below.

Making a DataStore API Request

Making a DataStore API request is the same as making an Action API request: you post a JSON dictionary in an HTTP POST request to an API URL, and the API returns its response in a JSON dictionary. See The Action API for details.
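
As a minimal sketch of the request described above, the following builds (but does not send) such a POST request with the Python standard library. The site URL, the API key and the helper name build_action_request are hypothetical, and passing the key in the Authorization header is an assumption about how the instance is configured.

```python
import json
import urllib.request

# Hypothetical values: replace with your own CKAN site and API key.
CKAN_URL = "http://demo.ckan.example"
API_KEY = "my-api-key"


def build_action_request(action, data_dict, ckan_url=CKAN_URL, api_key=API_KEY):
    """Build (but do not send) an HTTP POST request for an API action."""
    url = "{}/api/3/action/{}".format(ckan_url.rstrip("/"), action)
    body = json.dumps(data_dict).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API key is passed in the Authorization header.
        "Authorization": api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_action_request(
    "datastore_search", {"resource_id": "my-resource-id", "limit": 5}
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen() would send it; the response body is the JSON dictionary returned by the API.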

API Reference

Note

Lists can be expressed in several ways: as a list, as a comma separated string or as a single item. These are valid lists: ['foo', 'bar'], 'foo, bar', "foo", "bar" and 'foo'. Likewise, there are several ways to define a boolean value: True, on and 1 are all valid boolean values.
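
To make the equivalence of these forms concrete, here is a rough sketch of how such inputs could be normalized on the client side. The helpers as_list and as_bool are hypothetical and are not the DataStore's own parsing code; in particular, which spellings count as false is an assumption.

```python
def as_list(value):
    """Normalize a list, a comma separated string or a single item to a list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    if isinstance(value, str):
        # 'foo, bar' becomes ['foo', 'bar']; 'foo' becomes ['foo'].
        return [item.strip() for item in value.split(",") if item.strip()]
    return [value]


def as_bool(value):
    """Interpret True, 'on' and 1 (and, by assumption, 'true' and '1') as true."""
    if isinstance(value, str):
        return value.strip().lower() in ("true", "on", "1")
    return bool(value)
```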

Note

The table structure of the DataStore is explained in Internal structure of the database.

ckanext.datastore.logic.action.datastore_create(context, data_dict)

Adds a new table to the datastore.

The datastore_create action allows a user to post JSON data to be stored against a resource. This endpoint also supports altering tables, aliases and indexes, as well as bulk insertion. It can be called multiple times: first to insert the initial data, and later to insert more data, add fields, or change the aliases, indexes and primary keys.

See Fields and Records for details on how to lay out records.

Parameters:
  • resource_id (string) – resource id that the data is going to be stored against.
  • aliases (list or comma separated string) – names for read only aliases of the resource.
  • fields (list of dictionaries) – fields/columns and their extra metadata.
  • records (list of dictionaries) – the data, e.g. [{"dob": "2005", "some_stuff": ["a", "b"]}]
  • primary_key (list or comma separated string) – fields that represent a unique key
  • indexes (list or comma separated string) – indexes on table

Please note that setting the aliases, indexes or primary_key replaces the existing aliases or constraints. Setting records appends the provided records to the resource.

Results:

Returns: The newly created data object.
Return type: dictionary
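
The parameters above might be combined as in the following sketch. The resource id, alias and column names are hypothetical; the dictionaries are the JSON bodies that would be posted to datastore_create.

```python
import json

# First call: define the table, its key and indexes, and insert initial rows.
# All ids and names here are hypothetical.
create_payload = {
    "resource_id": "my-resource-id",
    "aliases": ["my_alias"],
    "fields": [{"id": "name", "type": "text"}, {"id": "dob", "type": "date"}],
    "records": [{"name": "fred", "dob": "2005-01-01"}],
    "primary_key": ["name"],
    "indexes": "name, dob",  # a comma separated string is also a valid list
}

# Later call: only records are sent, so they are appended. Aliases, indexes
# and primary_key are left out because setting them replaces the existing ones.
append_payload = {
    "resource_id": "my-resource-id",
    "records": [{"name": "wilma", "dob": "2006-03-15"}],
}

print(json.dumps(create_payload, indent=2))
```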

ckanext.datastore.logic.action.datastore_upsert(context, data_dict)

Updates or inserts into a table in the datastore.

The datastore_upsert API action allows a user to add or edit records in an existing DataStore resource. In order for the upsert and update methods to work, a unique key has to be defined via the datastore_create action. The available methods are:

upsert
Update if record with same key already exists, otherwise insert. Requires unique key.
insert
Insert only. This method is faster than upsert, but will fail if any inserted record matches an existing one. Does not require a unique key.
update
Update only. An exception will occur if the key that should be updated does not exist. Requires unique key.
Parameters:
  • resource_id (string) – resource id that the data is going to be stored under.
  • records (list of dictionaries) – the data, e.g. [{"dob": "2005", "some_stuff": ["a", "b"]}]
  • method (string) – the method to use to put the data into the datastore. Possible options are: upsert (default), insert, update

Results:

Returns: The modified data object.
Return type: dictionary
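
A small sketch of how a client might assemble the request body for the three methods. The helper build_upsert_payload and the sample values are hypothetical; the client-side check only mirrors the method names listed above.

```python
VALID_METHODS = ("upsert", "insert", "update")


def build_upsert_payload(resource_id, records, method="upsert"):
    """Return the JSON-serializable body for a datastore_upsert call."""
    if method not in VALID_METHODS:
        raise ValueError("method must be one of: " + ", ".join(VALID_METHODS))
    return {"resource_id": resource_id, "records": records, "method": method}


# Update fred's row if the unique key matches an existing record,
# otherwise insert it as a new row.
payload = build_upsert_payload(
    "my-resource-id", [{"name": "fred", "dob": "2005-02-02"}]
)
```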

ckanext.datastore.logic.action.datastore_delete(context, data_dict)

Deletes a table or a set of records from the datastore.

Parameters:
  • resource_id (string) – resource id that the data will be deleted from.
  • filters (dictionary) – filters to apply before deleting, e.g. {"name": "fred"}. If missing, the whole table and all dependent views are deleted.

Results:

Returns: Original filters sent.
Return type: dictionary
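
Because a missing filters parameter deletes the whole table, it can help to make that choice explicit in client code. The following sketch uses a hypothetical helper, build_delete_payload, and a hypothetical resource id.

```python
def build_delete_payload(resource_id, filters=None):
    """Return the JSON-serializable body for a datastore_delete call.

    With filters, only matching records are deleted. Without filters,
    the whole table and all dependent views are deleted.
    """
    payload = {"resource_id": resource_id}
    if filters is not None:
        payload["filters"] = filters
    return payload


delete_some = build_delete_payload("my-resource-id", {"name": "fred"})
delete_all = build_delete_payload("my-resource-id")
```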

ckanext.datastore.logic.action.datastore_search(context, data_dict)

Search a datastore table.

The datastore_search action allows a user to search data in a resource.

Parameters:
  • resource_id (string) – id or alias of the resource to be searched against.
  • filters (dictionary) – matching conditions to select, e.g. {"key1": "a", "key2": "b"}
  • q (string) – full text query
  • plain (bool) – treat as plain text query (default: true)
  • language (string) – language of the full text query (default: english)
  • limit (int) – maximum number of rows to return (default: 100)
  • offset (int) – offset this number of rows
  • fields (list or comma separated string) – fields to return (default: all fields in original order)
  • sort (string) – comma separated field names with ordering, e.g. "fieldname1, fieldname2 desc"

Setting the plain flag to false enables the entire PostgreSQL full text search query language.

A listing of all available resources can be found at the alias _table_metadata.

Results:

The result of this action is a dictionary with the following keys:
  • fields (list of dictionaries) – fields/columns and their extra metadata
  • offset (int) – query offset value
  • limit (int) – query limit value
  • filters (list of dictionaries) – query filters
  • total (int) – number of total matching records
  • records (list of dictionaries) – list of matching results
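
The request parameters and result keys above lend themselves to simple paging. The sketch below shows a possible search body and a hypothetical helper, next_offset, that derives the next page from a result. The sample result is made-up data shaped like the keys listed above, not real output.

```python
# Hypothetical search body: fred's rows, newest date of birth first.
search_payload = {
    "resource_id": "my-resource-id",
    "filters": {"name": "fred"},
    "fields": "name, dob",
    "sort": "dob desc",
    "limit": 2,
    "offset": 0,
}


def next_offset(result):
    """Return the offset of the next page, or None when no rows remain."""
    records = result["records"]
    seen = result.get("offset", 0) + len(records)
    if not records or seen >= result["total"]:
        return None
    return seen


# Made-up result for illustration: 3 matching rows, 2 returned so far.
sample_result = {
    "offset": 0,
    "limit": 2,
    "total": 3,
    "records": [
        {"name": "fred", "dob": "2005-01-01"},
        {"name": "fred", "dob": "2004-06-30"},
    ],
}
print(next_offset(sample_result))
```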

ckanext.datastore.logic.action.datastore_search_sql(context, data_dict)

Execute SQL queries on the datastore.

The datastore_search_sql action allows a user to search data in a resource or connect multiple resources with join expressions. The underlying SQL engine is PostgreSQL. There is an enforced timeout on SQL queries to avoid an unintended denial of service (DoS).

Note

This action is only available when using PostgreSQL 9.X and using a read-only user on the database. It is not available in legacy mode.

Parameters:
  • sql (string) – a single SQL SELECT statement

Results:

The result of this action is a dictionary with the following keys:
  • fields (list of dictionaries) – fields/columns and their extra metadata
  • records (list of dictionaries) – list of matching results
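
Since the table name is the resource id, it usually has to be written as a double-quoted identifier in the SQL statement. The sketch below builds such a statement; the helper quote_identifier, the resource id and the column name are hypothetical.

```python
def quote_identifier(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


resource_id = "my-resource-id"  # hypothetical resource id
sql = "SELECT * FROM {} WHERE name = 'fred' LIMIT 10".format(
    quote_identifier(resource_id)
)

# The body posted to datastore_search_sql holds the single SELECT statement.
sql_payload = {"sql": sql}
print(sql)
```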

Fields

Fields define the column names and the type of the data in a column. A field is defined as follows:

{
    "id":    # a string which defines the column name
    "type":  # the data type for the column
}

Field types are optional and will be guessed by the DataStore from the provided data. However, setting the types ensures that future inserts will not fail because of wrong types. See Field types for details on which types are valid.

Example:

[
    {
        "id": "foo",
        "type": "int4"
    },
    {
        "id": "bar"
        # type is optional
    }
]

Records

A record is the data to be inserted in a table and is defined as follows:

{
    "<id>":  # data to be set
    # .. more data
}

Example:

[
    {
        "foo": 100,
        "bar": "Here's some text"
    },
    {
        "foo": 42
    }
]

Field types

The DataStore supports all types supported by PostgreSQL as well as a few additions. A list of the PostgreSQL types can be found in the type section of the PostgreSQL documentation. The most common data types are listed below. The json type has been added as storage for nested data.

In addition to the types listed below, you can also use array types. They are defined by prepending a _ or appending [] or [n], where n denotes the length of the array. An arbitrarily long array of integers would be defined as int[].

text
Arbitrary text data, e.g. Here's some text.
json
Arbitrary nested json data, e.g. {"foo": 42, "bar": [1, 2, 3]}. Please note that this type is a custom type that is wrapped by the DataStore.
date
Date without time, e.g. 2012-5-25.
time
Time without date, e.g. 12:42.
timestamp
Date and time, e.g. 2012-10-01T02:43Z.
int
Integer numbers, e.g. 42, 7.
float
Floats, e.g. 1.61803.
bool
Boolean values, e.g. true, 0.

You can find more information about the formatting of dates in the date/time types section of the PostgreSQL documentation.

Resource aliases

A resource in the DataStore can have multiple aliases that are easier to remember than the resource id. Aliases can be created and edited with the datastore_create() API endpoint. All aliases can be found in a special view called _table_metadata. See Internal structure of the database for full reference.

HTSQL Support

The ckanext-htsql extension adds an API action that allows a user to search data in a resource using the HTSQL query expression language. Please refer to the extension documentation to learn more.

Comparison of different querying methods

The DataStore supports querying with multiple API endpoints. They are similar but support different features. The following list gives an overview of the different methods.

                  datastore_search()   datastore_search_sql()   HTSQL
Ease of use       Easy                 Complex                  Medium
Flexibility       Low                  High                     Medium
Query language    Custom (JSON)        SQL                      HTSQL
Join resources    No                   Yes                      No

Internal structure of the database

The DataStore is a thin layer on top of a PostgreSQL database. Each DataStore resource belongs to a CKAN resource. The name of a table in the DataStore is always the resource id of the CKAN resource for the data.

As explained in Resource aliases, a resource can have mnemonic aliases which are stored as views in the database.

All aliases (views) and resources (tables, or relations) of the DataStore can be found in a special view called _table_metadata. To access the list, open http://{YOUR-CKAN-INSTALLATION}/api/3/action/datastore_search?resource_id=_table_metadata.
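
The same URL can be assembled in code. This sketch uses a hypothetical helper, table_metadata_url, and a placeholder site address.

```python
from urllib.parse import urlencode


def table_metadata_url(ckan_url):
    """Return the datastore_search URL that lists _table_metadata."""
    query = urlencode({"resource_id": "_table_metadata"})
    return "{}/api/3/action/datastore_search?{}".format(ckan_url.rstrip("/"), query)


# Placeholder address: replace with your own CKAN installation.
print(table_metadata_url("http://demo.ckan.example/"))
```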

_table_metadata has the following fields:

_id
Unique key of the relation in _table_metadata.
alias_of
Name of the relation that this alias points to. This field is null if and only if the name is not an alias.
name
Contains the name of the alias if alias_of is not null. Otherwise, this is the resource id of the CKAN resource for the DataStore resource.
oid
The PostgreSQL object ID of the table that belongs to name.
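
As an illustration of how these fields fit together, the following sketch separates tables from aliases in a list of _table_metadata records. The helper split_tables_and_aliases and the sample records are hypothetical; the rows are made-up data using the field names described above.

```python
def split_tables_and_aliases(records):
    """Split _table_metadata records into table names and an alias mapping."""
    # A null alias_of means the row describes a table, not an alias.
    tables = [r["name"] for r in records if r["alias_of"] is None]
    aliases = {r["name"]: r["alias_of"] for r in records if r["alias_of"] is not None}
    return tables, aliases


# Made-up rows for illustration only.
sample_records = [
    {"_id": "a1", "alias_of": None, "name": "my-resource-id", "oid": 16400},
    {"_id": "b2", "alias_of": "my-resource-id", "name": "my_alias", "oid": 16410},
]
tables, aliases = split_tables_and_aliases(sample_records)
print(tables, aliases)
```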