Customizing dataset and resource metadata fields using IDatasetForm
Storing additional metadata for a dataset beyond the default metadata in CKAN is a common use case. CKAN provides a simple way to do this by allowing you to store arbitrary key/value pairs against a dataset when creating or updating the dataset. These appear under the “Additional Information” section on the web interface and in the ‘extras’ field of the JSON when accessed via the API.
Default extras can only take strings for their keys and values. No validation is applied to the inputs, and you cannot make them mandatory or restrict the possible values to a defined list. By using CKAN’s IDatasetForm plugin interface, a CKAN plugin can add custom, first-class metadata fields to CKAN datasets, and can do custom validation of these fields.
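As a rough illustration of the default behaviour, free-form extras appear in the dataset JSON as a list of key/value objects. The sketch below uses a made-up dataset dict (the name, title and extras values are hypothetical) and collapses the extras into a plain dictionary for lookup.

```python
# A hypothetical dataset dict, shaped like the JSON returned by the API.
dataset = {
    "name": "example-dataset",
    "title": "Example dataset",
    "extras": [
        {"key": "coverage", "value": "2010-2020"},
        {"key": "contact", "value": "data team"},
    ],
}


def extras_as_dict(dataset_dict):
    """Collapse the list of {'key': ..., 'value': ...} extras into one dict."""
    return {
        extra["key"]: extra["value"]
        for extra in dataset_dict.get("extras", [])
    }


extras = extras_as_dict(dataset)
print(extras["coverage"])
```

Every value here is a plain string and nothing checks it, which is the limitation the rest of this tutorial addresses.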
Warning
In most cases users should use ckanext-scheming rather than the low level interfaces described in this tutorial. The ckanext-scheming extension allows:
Metadata schema configuration using a YAML or JSON schema description
Automatic conversion of custom fields to the internal representation used by CKAN
Automatic use of relevant template snippets according to the field type for editing and display
Use of many pre-configured presets for multiple choice fields, dates, repeating subfields, etc.
See also
In this tutorial we are assuming that you have read the Writing extensions tutorial.
CKAN schemas and validation
When a dataset is created, updated or viewed, the parameters passed to CKAN (e.g. via the web form when creating or updating a dataset, or posted to an API end point) are validated against a schema. For each parameter, the schema will contain a corresponding list of functions that will be run against the value of the parameter. Generally these functions are used to validate the value (and raise an error if the value fails validation) or convert the value to a different value.
For example, the schemas can allow optional values by using the ignore_missing() validator, or check that a dataset exists using package_id_exists(). A list of available validators can be found at the Validator functions reference.
You can also define your own Custom validators.
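To make the idea of a schema concrete, here is a toy model in plain Python. It is only a sketch of the concept (a dict mapping field names to lists of functions that are run in order), not CKAN's actual validation code. The Invalid class, the two functions and apply_schema are all stand-ins invented for this illustration.

```python
class Invalid(Exception):
    """Stand-in for ckan.plugins.toolkit.Invalid so the sketch runs without CKAN."""


def require_value(value):
    # Validate: reject missing or empty values.
    if value in (None, ""):
        raise Invalid("Missing value")
    return value


def strip_whitespace(value):
    # Convert: return a cleaned-up version of the value.
    return value.strip() if isinstance(value, str) else value


def apply_schema(schema, data):
    """Run each field's functions in order and collect errors per field."""
    cleaned, errors = {}, {}
    for field, functions in schema.items():
        value = data.get(field)
        try:
            for function in functions:
                value = function(value)
            cleaned[field] = value
        except Invalid as error:
            errors[field] = [str(error)]
    return cleaned, errors


schema = {
    "title": [require_value, strip_whitespace],
    "notes": [strip_whitespace],
}
cleaned, errors = apply_schema(schema, {"title": "  My data  ", "notes": None})
```

Each function either raises an error or returns the (possibly converted) value that is handed to the next function in the list.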
We will be customizing these schemas to add our additional fields. The IDatasetForm interface allows us to override the schemas for creation, updating and displaying of datasets.
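create_package_schema() | Return the schema for validating new dataset dicts.
update_package_schema() | Return the schema for validating updated dataset dicts.
show_package_schema() | Return a schema to validate datasets before they're shown to the user.
is_fallback() | Return True to register this plugin as the default handler for package types not handled by any other IDatasetForm plugin.
package_types() | Return an iterable of dataset (package) types that this plugin handles.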
CKAN allows you to have multiple IDatasetForm plugins, each handling different dataset types, so you could customize the CKAN web front end differently for each type of dataset. In this tutorial we will be defining our plugin as the fallback plugin. The fallback plugin is used if no other IDatasetForm plugin is found that handles a given dataset type.
The IDatasetForm interface also has additional functions that allow you to provide a custom template to be rendered for the CKAN frontend, but we will not be using them in this tutorial.
Adding custom fields to datasets
Create a new plugin named ckanext-extrafields and create a class named ExampleIDatasetFormPlugin inside ckanext-extrafields/ckanext/extrafields/plugin.py that implements the IDatasetForm interface and inherits from SingletonPlugin and DefaultDatasetForm.
# encoding: utf-8
from __future__ import annotations

from ckan.types import Schema
import ckan.plugins as p
import ckan.plugins.toolkit as tk


class ExampleIDatasetFormPlugin(tk.DefaultDatasetForm, p.SingletonPlugin):
    p.implements(p.IDatasetForm)
Updating the CKAN schema
The create_package_schema() function is used whenever a new dataset is created. We’ll want to update the default schema and insert our custom field here. We will fetch the default schema defined in default_create_package_schema() by calling create_package_schema()’s super function, and then update it.
    def create_package_schema(self) -> Schema:
        # let's grab the default schema in our plugin
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).create_package_schema()
        # our custom field
        schema.update({
            'custom_text': [tk.get_validator('ignore_missing'),
                            tk.get_converter('convert_to_extras')]
        })
        return schema
The CKAN schema is a dictionary where the key is the name of the field and the value is a list of validators and converters. Here we have a validator that tells CKAN not to raise a validation error if the value is missing, and a converter that saves the value as an extra. We will want to change the update_package_schema() function with the same update code.
    def update_package_schema(self) -> Schema:
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).update_package_schema()
        # our custom field
        schema.update({
            'custom_text': [tk.get_validator('ignore_missing'),
                            tk.get_converter('convert_to_extras')]
        })
        return schema
The show_package_schema() function is used when the package_show() action is called. We want the default_show_package_schema to be updated to include our custom field. This time, instead of converting to an extras field, we want our field to be converted from an extras field, so we use the convert_from_extras() converter.
    def show_package_schema(self) -> Schema:
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).show_package_schema()
        schema.update({
            'custom_text': [tk.get_converter('convert_from_extras'),
                            tk.get_validator('ignore_missing')]
        })
        return schema
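The round trip between a first-class field and an extra can be pictured with a small conceptual model. This is not CKAN's converter code (the real converters are four-parameter validators working on flattened data). The to_extras and from_extras helpers below are hypothetical and only show where the value lives at each stage.

```python
def to_extras(dataset, field):
    """Conceptual 'to extras' step: move a top-level field into the extras list."""
    dataset = dict(dataset)
    value = dataset.pop(field)
    dataset["extras"] = list(dataset.get("extras", [])) + [
        {"key": field, "value": value}
    ]
    return dataset


def from_extras(dataset, field):
    """Conceptual 'from extras' step: promote a matching extra to a top-level field."""
    dataset = dict(dataset)
    remaining = []
    for extra in dataset.get("extras", []):
        if extra["key"] == field:
            dataset[field] = extra["value"]
        else:
            remaining.append(extra)
    dataset["extras"] = remaining
    return dataset


submitted = {"name": "example", "custom_text": "hello"}
stored = to_extras(submitted, "custom_text")   # what create/update would save
shown = from_extras(stored, "custom_text")     # what show would present
```

In this model the create and update schemas correspond to to_extras, and the show schema corresponds to from_extras, so the field looks first-class to users even though it is stored as an extra.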
Dataset types
The package_types() function defines a list of dataset types that this plugin handles. Each dataset has a field containing its type. Plugins can register to handle specific types of dataset and ignore others. Since our plugin is not for any specific type of dataset and we want our plugin to be the default handler, we update the plugin code to contain the following:
    def is_fallback(self):
        # Return True to register this plugin as the default handler for
        # package types not handled by any other IDatasetForm plugin.
        return True

    def package_types(self) -> list[str]:
        # This plugin doesn't handle any special package types, it just
        # registers itself as the default (above).
        return []
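The interplay between package_types() and is_fallback() can be sketched with a toy dispatcher. This is an illustrative model only, not how CKAN looks plugins up internally. The ReportForm class, the 'report' dataset type and the pick_plugin helper are hypothetical.

```python
class ReportForm:
    """Hypothetical plugin that only handles datasets of type 'report'."""

    def is_fallback(self):
        return False

    def package_types(self):
        return ["report"]


class FallbackForm:
    """Mirrors the tutorial plugin: no specific types, acts as the default."""

    def is_fallback(self):
        return True

    def package_types(self):
        return []


def pick_plugin(plugins, dataset_type):
    """Prefer a plugin registered for the type, otherwise use the fallback."""
    for plugin in plugins:
        if dataset_type in plugin.package_types():
            return plugin
    for plugin in plugins:
        if plugin.is_fallback():
            return plugin
    return None


plugins = [FallbackForm(), ReportForm()]
handler_for_report = pick_plugin(plugins, "report")
handler_for_dataset = pick_plugin(plugins, "dataset")
```

Under this model a 'report' dataset goes to the specialised plugin, while any other type ends up with the fallback plugin, which is the role our tutorial plugin plays.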
Updating templates
In order for our new field to be visible on the CKAN front-end, we need to update the templates. Add an additional line to make the plugin implement the IConfigurer interface:
class ExampleIDatasetFormPlugin(tk.DefaultDatasetForm, p.SingletonPlugin):
    p.implements(p.IDatasetForm)
    p.implements(p.IConfigurer)
This interface lets us implement a function, update_config(), that allows us to update the CKAN config. In our case we want to add an additional location for CKAN to look for templates. Add the following code to your plugin.
    # Requires "from ckan.common import CKANConfig" at the top of plugin.py.
    def update_config(self, config: CKANConfig):
        # Add this plugin's templates dir to CKAN's extra_template_paths, so
        # that CKAN will use this plugin's custom templates.
        tk.add_template_directory(config, 'templates')
You will also need to add a directory under your extension directory to store the templates. Create a directory called ckanext-extrafields/ckanext/extrafields/templates/ and, inside it, the subdirectories ckanext-extrafields/ckanext/extrafields/templates/package/snippets/.
We need to override a few templates in order to get our custom field rendered. A common option when using a custom schema is to remove the default custom field handling that allows arbitrary key/value pairs. Create a template file in our templates directory called package/snippets/package_metadata_fields.html containing:
{% ckan_extends %}
{# You could remove 'free extras' from the package form like this, but we keep them for this example's tests.
{% block custom_fields %}
{% endblock %}
#}
If you uncomment the block, this overrides the custom_fields block with an empty block so the default CKAN custom fields form does not render.
Added in version 2.3: Starting from CKAN 2.3 you can combine free extras with custom fields handled with convert_to_extras and convert_from_extras. On prior versions you’ll always need to remove the free extras handling.
Next add a template in our template directory called package/snippets/package_basic_fields.html containing:
{% ckan_extends %}
{% block package_basic_fields_custom %}
{{ form.input('custom_text', label=_('Custom Text'), id='field-custom_text', placeholder=_('custom text'), value=data.custom_text, error=errors.custom_text, classes=['control-medium']) }}
{% endblock %}
This adds our custom_text field to the editing form. Finally we want to display our custom_text field on the dataset page. Add another file called package/snippets/additional_info.html containing:
{% ckan_extends %}
{% block extras %}
{% if pkg_dict.custom_text %}
<tr>
<th scope="row" class="dataset-label">{{ _("Custom Text") }}</th>
<td class="dataset-details">{{ pkg_dict.custom_text }}</td>
</tr>
{% endif %}
{% endblock %}
This template overrides the default extras rendering on the dataset page and replaces it to just display our custom field.
You’re done! Make sure you have your plugin installed and set up as described in the Writing extensions tutorial. Then run a development server and you should now have an additional field called “Custom Text” when displaying and adding/editing a dataset.
Cleaning up the code
Before we continue further, we can clean up create_package_schema() and update_package_schema(). There is a bit of duplication that we could remove. Replace the two functions with:
    def _modify_package_schema(self, schema: Schema) -> Schema:
        schema.update({
            'custom_text': [tk.get_validator('ignore_missing'),
                            tk.get_converter('convert_to_extras')]
        })
        return schema

    def create_package_schema(self):
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).create_package_schema()
        schema = self._modify_package_schema(schema)
        return schema

    def update_package_schema(self):
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).update_package_schema()
        schema = self._modify_package_schema(schema)
        return schema
Custom validators
You may define custom validators in your extensions and you can share validators between extensions by registering them with the IValidators interface.
Any of the following objects may be used as validators as part of a custom dataset, group or organization schema. CKAN’s validation code will check for and attempt to use them in this order:
a function taking a single parameter: validator(value)
a function taking four parameters: validator(key, flattened_data, errors, context)
a function taking two parameters: validator(value, context)
Note
Object constructors (including str, int, etc.) and some built-in functions cannot be used as validators. In order to use them, create a thin wrapper which passes values into these callables and converts expected exceptions into ckan.plugins.toolkit.Invalid.
Example:
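from ckan.plugins.toolkit import Invalid


def int_validator(value):
    try:
        return int(value)
    except (ValueError, TypeError):
        # int() raises TypeError for values such as None.
        raise Invalid(f"Invalid literal for integer: {value}")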
validator(value)
The simplest form of validator is a callable taking a single parameter. For example:
from ckan.plugins.toolkit import Invalid


def starts_with_b(value):
    if not value.startswith('b'):
        raise Invalid("Doesn't start with b")
    return value
The starts_with_b validator causes a validation error for values not starting with ‘b’. On a web form this validation error would appear next to the field to which the validator was applied.
Validators must use return value when accepting data, or the value will be converted to None. This form is useful for converting data as well, because the return value will replace the field value passed:
def embiggen(value):
    return value.upper()
The embiggen validator will convert values passed to all-uppercase.
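Because each validator receives the value returned by the previous one, the order in which validators are listed matters. The sketch below chains the two validators from this section. To stay runnable without CKAN it uses a local stand-in for Invalid and a hypothetical run_validators helper, neither of which is part of CKAN.

```python
class Invalid(Exception):
    """Stand-in for ckan.plugins.toolkit.Invalid (assumption for this sketch)."""


def starts_with_b(value):
    if not value.startswith('b'):
        raise Invalid("Doesn't start with b")
    return value


def embiggen(value):
    return value.upper()


def run_validators(value, validators):
    """Feed the value through each validator in turn."""
    for validator in validators:
        value = validator(value)
    return value


# Check first, then convert: the lowercase 'b' is still present when checked.
result = run_validators('banana', [starts_with_b, embiggen])

# Convert first, then check: 'BANANA' no longer starts with a lowercase 'b'.
try:
    run_validators('banana', [embiggen, starts_with_b])
    reversed_order_failed = False
except Invalid:
    reversed_order_failed = True
```

Putting converters after the checks that depend on the original form of the value avoids this kind of surprise.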
validator(value, context)
Validators that need access to the database or information about the user may be written as a callable taking two parameters. context['session'] is the SQLAlchemy session object and context['user'] is the username of the logged-in user:
from ckan.plugins.toolkit import Invalid


def fred_only(value, context):
    if value and context['user'] != 'fred':
        raise Invalid('only fred may set this value')
    return value
Otherwise this is the same as the single-parameter form above.
validator(key, flattened_data, errors, context)
Validators that need to access or update multiple fields may be written as a callable taking four parameters.
All fields and errors in a flattened form are passed to the validator. The validator must fetch values from flattened_data and may replace values in flattened_data. The return value from this function is ignored.
key is the flattened key for the field to which this validator was applied. For example ('notes',) for the dataset notes field or ('resources', 0, 'url') for the url of the first resource of the dataset. These flattened keys are the same in both the flattened_data and errors dicts passed.
errors contains lists of validation errors for each field.
context is the same value passed to the two-parameter form above.
Note that this form can be tricky to use because some of the values in flattened_data will have had validators applied but other fields won’t. You may add this type of validator to the special schema fields '__before' or '__after' to have them run before or after all the other validation takes place to avoid the problem of working with partially-validated data.
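As a minimal sketch of the four-parameter form, the validator below compares two fields and records an error against one of them. The field names start_date and end_date are hypothetical, and the function is exercised here with plain dicts rather than through CKAN.

```python
def end_not_before_start(key, flattened_data, errors, context):
    """Cross-field check: record an error if end_date precedes start_date."""
    start = flattened_data.get(('start_date',))
    end = flattened_data.get(('end_date',))
    # ISO 8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
    if start and end and end < start:
        errors.setdefault(('end_date',), []).append(
            'End date must not be before start date')
    # No return value: this form reports problems through the errors dict.


flattened_data = {
    ('start_date',): '2024-05-01',
    ('end_date',): '2024-04-01',
}
errors = {('start_date',): [], ('end_date',): []}
end_not_before_start(('__after',), flattened_data, errors, {})
```

A validator like this is a natural candidate for the '__after' schema field, since it only makes sense once both dates have passed their own validators.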
The validator has to be registered. Example:
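from __future__ import annotations

from ckan import plugins
from ckan.types import Validator


class ExampleIValidatorsPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IValidators)

    def get_validators(self) -> dict[str, Validator]:
        # equals_fortytwo, negate and unicode_please are validator
        # functions defined elsewhere in the same module.
        return {
            u'equals_fortytwo': equals_fortytwo,
            u'negate': negate,
            u'unicode_only': unicode_please,
        }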
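The registration above refers to three functions that are not shown in this section. A minimal sketch of what they might look like follows. The bodies are assumptions inferred from the function names, and Invalid is a local stand-in for ckan.plugins.toolkit.Invalid so that the sketch runs without CKAN.

```python
class Invalid(Exception):
    """Stand-in for ckan.plugins.toolkit.Invalid (assumption for this sketch)."""


def equals_fortytwo(value):
    # Assumed behaviour: accept only the integer 42.
    if value != 42:
        raise Invalid('not 42')
    return value


def negate(value):
    # Assumed behaviour: a pure converter that flips the sign.
    return -value


def unicode_please(value):
    # Assumed behaviour: always hand back text, decoding bytes as UTF-8.
    if isinstance(value, bytes):
        return value.decode('utf8')
    return str(value)
```

Note that the dictionary key ('unicode_only') is the name the validator is registered under, and it does not have to match the Python function name (unicode_please).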
Tag vocabularies
If you need to add a custom field where the input options are restricted to a provided list of options, you can use tag vocabularies (see Tag Vocabularies). We will need to create our vocabulary first, by calling vocabulary_create(). Add a function to your plugin.py above your plugin class.
# Requires "from typing import Any" and "from ckan.types import Context"
# at the top of plugin.py.
def create_country_codes():
    user = tk.get_action('get_site_user')({'ignore_auth': True}, {})
    context: Context = {'user': user['name']}
    try:
        data = {'id': 'country_codes'}
        tk.get_action('vocabulary_show')(context, data)
    except tk.ObjectNotFound:
        data = {'name': 'country_codes'}
        vocab = tk.get_action('vocabulary_create')(context, data)
        for tag in (u'uk', u'ie', u'de', u'fr', u'es'):
            data: dict[str, Any] = {'name': tag, 'vocabulary_id': vocab['id']}
            tk.get_action('tag_create')(context, data)
This code block is taken from the example_idatasetform plugin.
create_country_codes tries to fetch the vocabulary country_codes using vocabulary_show(). If it is not found it will create it and iterate over the list of countries ‘uk’, ‘ie’, ‘de’, ‘fr’, ‘es’. For each of these a vocabulary tag is created using tag_create(), belonging to the vocabulary country_codes.
Although we have only defined five tags here, additional tags can be created at any point by a sysadmin user by calling tag_create() using the API or action functions.
Add a second function below create_country_codes:
def country_codes():
    create_country_codes()
    try:
        tag_list = tk.get_action('tag_list')
        country_codes = tag_list({}, {'vocabulary_id': 'country_codes'})
        return country_codes
    except tk.ObjectNotFound:
        return None
country_codes will call create_country_codes so that the country_codes vocabulary is created if it does not exist. Then it calls tag_list() to return all of our vocabulary tags together. Now we have a way of retrieving our tag vocabularies and creating them if they do not exist. We just need our plugin to call this code.
Adding custom fields to resources
In order to customize the fields in a resource, the schema for resources needs to be modified in a similar way to the datasets. The resource schema is nested in the dataset dict as package['resources']. We modify this dict in a similar way to the dataset schema. Change _modify_package_schema to the following.
    # Requires "from typing import cast" and
    # "from ckan.types import ValidatorFactory" at the top of plugin.py.
    def _modify_package_schema(self, schema: Schema):
        # Add our custom country_code metadata field to the schema.
        schema.update({
            'country_code': [
                tk.get_validator('ignore_missing'),
                cast(
                    ValidatorFactory,
                    tk.get_converter('convert_to_tags'))('country_codes')]
        })
        # Add our custom_text metadata field to the schema, this one will use
        # convert_to_extras instead of convert_to_tags.
        schema.update({
            'custom_text': [tk.get_validator('ignore_missing'),
                            tk.get_converter('convert_to_extras')]
        })
        # Add our custom_resource_text metadata field to the schema
        cast(Schema, schema['resources']).update({
            'custom_resource_text': [tk.get_validator('ignore_missing')]
        })
        return schema
Update show_package_schema() similarly:
    def show_package_schema(self) -> Schema:
        schema: Schema = super(
            ExampleIDatasetFormPlugin, self).show_package_schema()

        # Don't show vocab tags mixed in with normal 'free' tags
        # (e.g. on dataset pages, or on the search page)
        _extras = cast("list[Validator]",
                       cast(Schema, schema['tags'])['__extras'])
        _extras.append(tk.get_converter('free_tags_only'))

        # Add our custom country_code metadata field to the schema.
        schema.update({
            'country_code': [
                cast(
                    ValidatorFactory,
                    tk.get_converter('convert_from_tags'))('country_codes'),
                tk.get_validator('ignore_missing')]
        })

        # Add our custom_text field to the dataset schema.
        schema.update({
            'custom_text': [tk.get_converter('convert_from_extras'),
                            tk.get_validator('ignore_missing')]
        })

        cast(Schema, schema['resources']).update({
            'custom_resource_text': [tk.get_validator('ignore_missing')]
        })
        return schema
Add the code below to package/snippets/resource_form.html:
{% ckan_extends %}
{% block basic_fields_url %}
{{ super() }}
{{ form.input('custom_resource_text', label=_('Custom Text'), id='field-custom_resource_text', placeholder=_('custom resource text'), value=data.custom_resource_text, error=errors.custom_resource_text, classes=['control-medium']) }}
{% endblock %}
This adds our custom_resource_text to the editing form of the resources.
Save and reload your development server. CKAN will take any additional keys from the resource schema and save them to its extras field. The templates will automatically check this field and display them in the resource_read page.
Sorting by custom fields on the dataset search page
Now that we’ve added our custom field, we can customize the CKAN web front end search page to sort datasets by our custom field. Add a new file called ckanext-extrafields/ckanext/extrafields/templates/package/search.html containing:
{% ckan_extends %}
{% block form %}
{% set facets = {
'fields': fields_grouped,
'search': search_facets,
'titles': facet_titles,
'translated_fields': translated_fields,
'remove_field': remove_field }
%}
{% set sorting = [
(_('Relevance'), 'score desc, metadata_modified desc'),
(_('Name Ascending'), 'title_string asc'),
(_('Name Descending'), 'title_string desc'),
(_('Last Modified'), 'metadata_modified desc'),
(_('Custom Field Ascending'), 'custom_text asc'),
(_('Custom Field Descending'), 'custom_text desc')
]
%}
{% snippet 'snippets/search_form.html', type='dataset', query=q, sorting=sorting, sorting_selected=sort_by_selected, count=page.item_count, facets=facets, show_empty=request.args, error=query_error %}
{% endblock %}
This overrides the search ordering drop-down code block. The code is the same as the default dataset search block, but we are adding two additional lines that define the display name of that search ordering (e.g. Custom Field Ascending) and the SOLR sort ordering (e.g. custom_text asc). If you reload your development server you should be able to see these two additional sorting options on the dataset search page.
The SOLR sort ordering can define arbitrary functions for custom sorting, but this is beyond the scope of this tutorial. For further details see http://wiki.apache.org/solr/CommonQueryParameters#sort and http://wiki.apache.org/solr/FunctionQuery
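The same sort strings should also be usable outside the web front end, on the assumption that the sort parameter of the package_search API action accepts the same SOLR sort expression as the template. The sketch below only builds the request URL. The site URL is a hypothetical local development address and search_url is a helper made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical site URL; replace with the address of your own CKAN instance.
SITE_URL = "http://localhost:5000"


def search_url(sort, rows=10):
    """Build a package_search URL that asks for a given sort ordering."""
    params = urlencode({"sort": sort, "rows": rows})
    return f"{SITE_URL}/api/3/action/package_search?{params}"


print(search_url("custom_text asc"))
print(search_url("custom_text desc"))
```

Fetching either URL from a running development server is a quick way to check that the custom field is sortable before wiring it into the template.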
You can find the complete source for this tutorial at https://github.com/ckan/ckan/tree/master/ckanext/example_idatasetform