DCAT ↔ CKAN mapping
RDF DCAT to CKAN dataset mapping¶
The following table provides a generic mapping between the fields of the dcat:Dataset and dcat:Distribution classes and
their equivalents in the CKAN model. In most cases this mapping is deliberately a loose one. For instance, it does not try to link
the DCAT publisher property with a CKAN dataset author, maintainer or organization, as the link between them is not straight-forward
and may depend on a particular instance needs. When mapping from CKAN metadata to DCAT though, there are in some cases fallback fields
that are used if the default field is not present (see RDF Serializer for more details on this).
This mapping is compatible with DCAT-AP v1.1, v2.1 and v3 and DCAT-US v3. It depends on the active profile(s) and the fields present in your custom schema which DCAT properties are mapped.
Sites are encouraged to use ckanext-scheming to manage their metadata schema (see Schemas for all details). This changes in some cases the way metadata is stored internally and presented at the CKAN API level, but should not affect the RDF DCAT output.
Note
Fields prefixed with custom: are custom metadata fields defined via ckanext-scheming. When using euro_dcat_ap
and euro_dcat_ap_2 based profiles, these could also be actual extra fields (e.g. extras=[{"key": "issued", "value": "2024"}]).
It is recommended that site maintainers start to migrate to custom fields by using the euro_dcat_ap_scheming profile as this
fields are properly validated, can use the scheming snippets etc. See Schemas for more details.
| DCAT class | DCAT property | CKAN dataset field | CKAN fallback fields | Stored as | |
|---|---|---|---|---|---|
| dcat:Dataset | - | custom:uri | text | See URIs | |
| dcat:Dataset | dct:title | title | text | ||
| dcat:Dataset | dct:description | notes | text | ||
| dcat:Dataset | dcat:keyword | tags | text | ||
| dcat:Dataset | dcat:theme | custom:theme | list | See Lists | |
| dcat:Dataset | dct:identifier | custom:identifier | custom:guid, id | text | |
| dcat:Dataset | adms:identifier | custom:alternate_identifier | text | ||
| dcat:Dataset | dct:issued | custom:issued | metadata_created | text | |
| dcat:Dataset | dct:modified | custom:modified | metadata_modified | text | |
| dcat:Dataset | owl:versionInfo | version | custom:dcat_version | text | |
| dcat:Dataset | adms:versionNotes | custom:version_notes | text | ||
| dcat:Dataset | dct:language | custom:language | list | See Lists | |
| dcat:Dataset | dcat:landingPage | url | text | ||
| dcat:Dataset | dct:accrualPeriodicity | custom:frequency | text | ||
| dcat:Dataset | dct:conformsTo | custom:conforms_to | list | See Lists | |
| dcat:Dataset | dct:accessRights | custom:access_rights | text | ||
| dcat:Dataset | foaf:page | custom:documentation | list | See Lists | |
| dcat:Dataset | dct:provenance | custom:provenance | text | ||
| dcat:Dataset | dcat-us:liabilityStatement | custom:liability | text | DCAT-US v3 and higher only | |
| dcat:Dataset | dcat-us:purpose | custom:purpose | text | DCAT-US v3 and higher only | |
| dcat:Dataset | skos:scopeNote | custom:usage | text | DCAT-US v3 and higher only | |
| dcat:Dataset | dct:type | custom:dcat_type | text | ||
| dcat:Dataset | prov:wasGeneratedBy | custom:provenance_activity | text | ||
| dcat:Dataset | prov:qualifiedAttribution | custom:qualified_attribution | list | See Lists. Object should contain agent and role | |
| dcat:Dataset | dcat:hasVersion | custom:has_version | list | See Lists. It is assumed that these are one or more URIs referring to another dcat:Dataset | |
| dcat:Dataset | dct:isVersionOf | custom:is_version_of | list | See Lists. It is assumed that these are one or more URIs referring to another dcat:Dataset | |
| dcat:Dataset | dct:source | custom:source | list | See Lists. It is assumed that these are one or more URIs referring to another dcat:Dataset | |
| dcat:Dataset | adms:sample | custom:sample | list | See Lists. It is assumed that these are one or more URIs referring to dcat:Distribution instances | |
| dcat:Dataset | dct:spatial | custom:spatial_uri | text | See Spatial coverage | |
| dcat:Dataset | dct:temporal | custom:temporal_start + custom:temporal_end | text | None, one or both extras can be present | |
| dcat:Dataset | dcat-us:geographicBoundingBox | custom:bbox | list of objects | DCAT-US v3 and higher only | |
| dcat:Dataset | dcat-us:describedBy | custom:data_dictionary | list of objects | DCAT-US v3 and higher only | |
| dcat:Dataset | dcat:temporalResolution | custom:temporal_resolution | list | ||
| dcat:Dataset | dcat:spatialResolutionInMeters | custom:spatial_resolution_in_meters | list | ||
| dcat:Dataset | dct:isReferencedBy | custom:is_referenced_by | list | ||
| dcat:Dataset | dct:publisher | custom:publisher_uri | list of objects | See URIs and Publisher | |
| foaf:Agent | foaf:name | custom:publisher_name | text | ||
| foaf:Agent | foaf:mbox | custom:publisher_email | organization:title | text | |
| foaf:Agent | foaf:homepage | custom:publisher_url | text | ||
| foaf:Agent | dct:type | custom:publisher_type | text | ||
| foaf:Agent | dct:identifier | custom:publisher_id | text | ||
| dcat:Dataset | dct:creator | custom:creator_uri | list of objects | See URIs | |
| foaf:Agent | foaf:name | custom:creator_name | text | ||
| foaf:Agent | foaf:mbox | custom:creator_email | organization:title | text | |
| foaf:Agent | foaf:homepage | custom:creator_url | text | ||
| foaf:Agent | dct:type | custom:creator_type | text | ||
| foaf:Agent | dct:identifier | custom:creator_id | text | ||
| dcat:Dataset | dct:contributor | custom:contributor | list of objects | See URIs. The object properties are the same than publishers and creators. DCAT-US v3 and higher only | |
| dcat:Dataset | dcat:contactPoint | custom:contact_uri | list of objects | See URIs and Contact points | |
| vcard:Kind | vcard:fn | custom:contact_name | maintainer, author | text | |
| vcard:Kind | vcard:hasEmail | custom:contact_email | maintainer_email, author_email | text | |
| vcard:Kind | vcard:hasUID | custom:contact_identifier | text | ||
| dcat:Dataset | dcat:distribution | resources | text | ||
| dcat:Distribution | - | resource:uri | text | See URIs | |
| dcat:Distribution | dct:title | resource:name | text | ||
| dcat:Distribution | dcat:accessURL | resource:access_url | resource:url | text | If downloadURL is not present, accessURL will be used as resource url |
| dcat:Distribution | dcat:downloadURL | resource:download_url | text | If present, downloadURL will be used as resource url | |
| dcat:Distribution | dct:description | resource:description | text | ||
| dcat:Distribution | dcat:mediaType | resource:mimetype | text | ||
| dcat:Distribution | dct:format | resource:format | text | ||
| dcat:Distribution | dct:license | resource:license | text | See Licenses | |
| dcat:Distribution | adms:status | resource:status | text | ||
| dcat:Distribution | dcat:byteSize | resource:size | number | ||
| dcat:Distribution | dct:issued | resource:issued | created | text | |
| dcat:Distribution | dct:modified | resource:modified | metadata_modified | text | |
| dcat:Distribution | dct:rights | resource:rights | text | ||
| dcat:Distribution | foaf:page | resource:documentation | list | See Lists | |
| dcat:Distribution | dct:language | resource:language | list | See Lists | |
| dcat:Distribution | dct:conformsTo | resource:conforms_to | list | See Lists | |
| dcat:Distribution | dcatap:availability | resource:availability | text | ||
| dcat:Distribution | dcat:compressFormat | resource:compress_format | text | ||
| dcat:Distribution | dcat:packageFormat | resource:package_format | text | ||
| dcat:Distribution | cnt:characterEncoding | resource:package_format | text | DCAT-US v3 and higher only | |
| dcat:Distribution | dct:identifier | custom:identifier | custom:guid, id | text | DCAT-US v3 and higher only |
| dcat:Distribution | dcat-us:describedBy | custom:data_dictionary | list of objects | DCAT-US v3 and higher only | |
| dcat:Distribution | dcat:accessService | resource:access_services | list of objects |
| dcat:Catalog | foaf:homepage | custom:catalog_homepage | | text | |
| dcat:DataService | dct:conformsTo | access_service:conforms_to | | list | See Lists | | dcat:DataService | dct:format | access_service:format | | text | | | dcat:DataService | dct:identifier | access_service:identifier | | text | | | dcat:DataService | dct:language | access_service:language | | list | See Lists | | dcat:DataService | dct:rights | access_service:rights | | text | | | dcat:DataService | dcat:landingPage | access_service:landing_page | | text | | | dcat:DataService | dcat:keyword | access_service:keyword | | list | See Lists | | dcat:DataService | dct:title | access_service:title | | text | | | dcat:DataService | dcat:endpointURL | access_service:endpoint_url | | list | | | dcat:DataService | dcat:endpointDescription | access_service:endpoint_description | | text | | | dcat:DataService | dcatap:availability | access_service:availability | | text | | | dcat:DataService | dcat:servesDataset | access_service:serves_dataset | | list | | | dcat:DataService | dct:description | access_service:description | | text | | | dcat:DataService | dct:license | access_service:license | | text | | | dcat:DataService | dct:accessRights | access_service:access_rights | | text | | | spdx:Checksum | spdx:checksumValue | resource:hash | | text | | | spdx:Checksum | spdx:algorithm | resource:hash_algorithm | | text | |
Custom fields¶
Fields marked as custom: are stored as free form extras in the euro_dcat_ap and euro_dcat_ap_2 profiles,
but stored as first level custom fields when using the scheming based profile (euro_dcat_ap_scheming), i.e:
{
"name": "test_dataset_dcat",
"extras": [
{"key": "version_notes", "value": "Some version notes"}
]
}
vs:
URIs¶
Whenever possible, URIs are extracted and stored so there is a clear reference to the original RDF resource. For instance:
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
xmlns:dct="http://purl.org/dc/terms/"
xmlns:dcat="http://www.w3.org/ns/dcat#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<dcat:Dataset rdf:about="http://data.some.org/catalog/datasets/1">
<dct:title>Dataset 1</dct:title>
<dct:publisher>
<foaf:Organization rdf:about="http://orgs.vocab.org/some-org">
<foaf:name>Publishing Organization for dataset 1</foaf:name>
</foaf:Organization>
</dct:publisher>
<!-- ... -->
</dcat:Dataset>
</rdf:RDF>
{
"title": "Dataset 1",
"extras": [
{"key": "uri", "value": "http://data.some.org/catalog/datasets/1"},
{"key": "publisher_uri", "value": "http://orgs.vocab.org/some-org"},
{"key": "publisher_name", "value": "Publishing Organization for dataset 1"}
]
}
Another example:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://data.some.org/catalog/datasets/1>
a dcat:Dataset ;
dct:title "Dataset 1" ;
dcat:distribution
<http://data.some.org/catalog/datasets/1/d/1> .
<http://data.some.org/catalog/datasets/1/d/1>
a dcat:Distribution ;
dct:title "Distribution for dataset 1" ;
dcat:accessURL <http://data.some.org/catalog/datasets/1/downloads/1.csv> .
{
"title": "Dataset 1",
"extras": [
{"key": "uri", "value": "http://data.some.org/catalog/datasets/1"}
],
"resources": [{
"name": "Distribution for dataset 1",
"url": "http://data.some.org/catalog/datasets/1/downloads/1.csv",
"uri": "http://data.some.org/catalog/datasets/1/d/1"
}]
}
Lists¶
On the legacy profiles, lists are stored as a JSON string, eg:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.com/data/test-dataset-1>
a dcat:Dataset ;
dct:title "Dataset 1" ;
dct:language "ca" , "en" , "es" ;
dcat:theme "http://eurovoc.europa.eu/100142" , "http://eurovoc.europa.eu/209065", "Earth Sciences" ;
{
"title": "Dataset 1",
"extras": [
{"key": "uri", "value": "http://data.some.org/catalog/datasets/1"},
{"key": "language", "value": "[\"ca\", \"en\", \"es\"]"},
{"key": "theme", "value": "[\"Earth Sciences\", \"http://eurovoc.europa.eu/209065\", \"http://eurovoc.europa.eu/100142\"]"}
]
}
On the scheming-based ones, these are shown as actual lists:
{
"title": "Dataset 1",
"uri": "http://data.some.org/catalog/datasets/1"},
"language": ["ca", "en", "es"],
"theme": ["Earth Sciences", "http://eurovoc.europa.eu/209065", "http://eurovoc.europa.eu/100142"]
}
Contact points and Publisher¶
Properties for dcat:contactPoint and dct:publisher are stored as namespaced extras in the legacy profiles. When using
a scheming-based profile, these are stored as proper objects (and multiple instances are allowed for contact point):
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
"extras": [
{"key":"contact_name","value":"PointofContact"},
{"key":"contact_email","value":"contact@some.org"}
],
}
vs:
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
"contact": [
{
"name": "Point of Contact 1",
"email": "contact1@some.org"
},
{
"name": "Point of Contact 2",
"email": "contact2@some.org"
},
]
}
If no publisher or publisher_* fields are found, the serializers will fall back to getting the publisher properties from the organization the CKAN dataset belongs to. The organization schema can be customized with the schema located in ckanext/dcat/schemas/publisher_organization.yaml to provide the extra properties supported (this will additionally require loading the scheming_organizations plugin in ckan.plugins).
Spatial coverage¶
The following formats for dct:spatial are supported by the default parser. Note that the default serializer will return the single dct:spatial instance form by default.
-
One
dct:spatialinstance, URI only -
One
dct:spatialinstance with text (this should not be used anyway) -
One
dct:spatialinstance with label and/or geometry<dct:spatial rdf:resource="http://geonames/Newark"> <dct:Location> <locn:geometry rdf:datatype="https://www.iana.org/assignments/media-types/application/vnd.geo+json"> {"type": "Polygon", "coordinates": [[[175.0, 17.5], [-65.5, 17.5], [-65.5, 72.0], [175.0, 72.0], [175.0, 17.5]]]} </locn:geometry> <locn:geometry rdf:datatype="http://www.opengis.net/ont/geosparql#wktLiteral"> POLYGON ((175.0000 17.5000, -65.5000 17.5000, -65.5000 72.0000, 175.0000 72.0000, 175.0000 17.5000)) </locn:geometry> <skos:prefLabel>Newark</skos:prefLabel> </dct:Location> </dct:spatial> -
Multiple
dct:spatialinstances (as in GeoDCAT-AP)<dct:spatial rdf:resource="http://geonames/Newark"/> <dct:spatial> <dct:Location> <locn:geometry rdf:datatype="https://www.iana.org/assignments/media-types/application/vnd.geo+json"> {"type": "Polygon", "coordinates": [[[175.0, 17.5], [-65.5, 17.5], [-65.5, 72.0], [175.0, 72.0], [175.0, 17.5]]]} </locn:geometry> <locn:geometry rdf:datatype="http://www.opengis.net/ont/geosparql#wktLiteral"> POLYGON ((175.0000 17.5000, -65.5000 17.5000, -65.5000 72.0000, 175.0000 72.0000, 175.0000 17.5000)) </locn:geometry> </dct:Location> </dct:spatial> <dct:spatial> <dct:Location rdf:nodeID="N8c2a57d92e2d48fca3883053f992f0cf"> <skos:prefLabel>Newark</skos:prefLabel> </dct:Location> </dct:spatial>
If the RDF provides them, profiles should store the textual and geometric representation of the location in:
- For legacy profiles in
spatial_text,spatial_bbox,spatial_centroidorspatial(for any other geometries) extra fields - For scheming-based profiles in objects in the
spatial_coveragefield, for instance:
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
"spatial_coverage": [
{
"geom": {
"type": "Polygon",
"coordinates": [...]
},
"text": "Tarragona",
"uri": "https://sws.geonames.org/6361390/",
"bbox": {
"type": "Polygon",
"coordinates": [
[
[-2.1604, 42.7611],
[-2.0938, 42.7611],
[-2.0938, 42.7931],
[-2.1604, 42.7931],
[-2.1604, 42.7611],
]
],
},
"centroid": {"type": "Point", "coordinates": [1.26639, 41.12386]},
}
]
}
Licenses¶
In the CKAN model, the license field is stored at the dataset level whereas in the DCAT model it
is stored at Distributions level. By default, the RDF parser will try to find a
distribution with a license that matches one of those registered in CKAN
and attach this license to the dataset. The first matching distribution's
license is used, meaning that any discrepancy accross distributions license
will not be accounted for. This behavior can be customized by overridding the
_license() method on a custom profile.
When serializing, distributions can inherit the license from the dataset
if ckanext.dcat.resource.inherit.license is set to true.