FileStore and file uploads
When enabled, CKAN’s FileStore allows users to upload data files to CKAN resources, and to upload logo images for groups and organizations. Users will see an upload button when creating or updating a resource, group or organization.
Added in version 2.12: Add support for configurable storages. Cloud adapters are available in ckanext-file-keeper-cloud.
See also
Resource files linked-to from CKAN or uploaded to CKAN’s FileStore can also be pushed into CKAN’s DataStore, which then enables data previews and a data API for the resources.
Setup file uploads
Attention
This is a classic way to setup filestore. Eventually it will be replaced with the new approach described in Setup file storages
To setup CKAN’s FileStore with local file storage:
Create the directory where CKAN will store uploaded files:
sudo mkdir -p /var/lib/ckan/default
Set the permissions of your ckan.storage_path directory. For example if you’re running CKAN with Nginx, then the Nginx’s user (
www-dataon Ubuntu) must have read, write and execute permissions for the ckan.storage_path:sudo chown www-data /var/lib/ckan/default sudo chmod u+rwx /var/lib/ckan/default
Add the following lines to your CKAN config file, after the
[app:main]line:ckan.storage_path = /var/lib/ckan/default
Restart your web server, for example to restart uWSGI on a package install:
sudo supervisorctl restart ckan-uwsgi:*
Setup file storages
Starting from CKAN 2.12 there is an alternative way to configure FileStore. Instead of using a single directory in the local filesystem it’s possible to configure a storage object that can keep files either in local filesystem, or on cloud, or in database, or somewhere else, depending on the configuration and installed storage adapters.
CKAN is shipped with local filesystem adapter and it will be used for the following example:
Just as in previous section, create a directory for uploaded files and set up correct permissions for it:
sudo mkdir -p /var/lib/ckan/default sudo chown www-data /var/lib/ckan/default sudo chmod u+rwx /var/lib/ckan/default
Add storage configuration that points to the specified directory. As it will store files in local filesystem, use
ckan:fsadapter:ckan.files.storage.default.type = ckan:fs ckan.files.storage.default.path = /var/lib/ckan/default
Restart your web server, for example to restart uWSGI on a package install:
sudo supervisorctl restart ckan-uwsgi:*
Note
You probably noticed, that this version is almost identical to the configuration from the previous section. It’s absolutely correct: storages were designed as a replacement for the previous implementation of filestore and that’s why migration happens with the minimal friction.
The key part of the storage configuration is
ckan.files.storage.default.type option. It specifies the adapter that
CKAN will use for this storage. By replacing ckan:fs with other
adapters, one can start using different types of persistence engine with no
code changes. It’s even possible to use multiple different storage
simultaneously.
Tip
Even though CKAN does not have built-in cloud adapter, it’s still available inside ckanext-file-keeper-cloud. Check documentation of this extension for detailed explanation of installation and usage.
Here’s an example of setting up the storage using AWS S3 bucket:
Create AWS S3 bucket. This example assumes that the bucket has name
ckan_bucket.Install ckanext-file-keeper-cloud:
pip install 'ckanext-file-keeper-cloud[s3]'
Add
file_keeper_cloudto the list of enabled plugins:ckan.plugins = file_keeper_cloud
Configure the storage using
ckan:s3adapter:ckan.files.storage.default.type = ckan:s3 ckan.files.storage.default.bucket = ckan_bucket # specify region of the bucket or leave empty for default value ckan.files.storage.default.region = us-east-1
Export key and secret as environment variables. These values can be specified in the config file as well, but this is not secure. Fortunately,
ckan:s3adapter will checkAWS_ACCESS_KEY_IDandAWS_SECRET_ACCESS_KEYenvironment variables and, if they are defined, their value will be used for the storage. In addition, when these variables are empty, CKAN will also check~/.aws/credentialsand IAM Role available on the host machine. We will use environment variable for the example, so there is no need to set credentials in the config:export AWS_ACCESS_KEY_ID=my-aws-key export AWS_SECRET_ACCESS_KEY=my-aws-secret
Restart CKAN
Note, this example only shows the way to configure cloud storage, but it’s not suitable for the real world. If you leave things as is, the given storage will be used for resource uploads and for user, group and organization images. The former are private and latter are public. If the bucket is private, organization images and user avatars will not be shown, even though they will be uploaded to the bucket. If the bucket is public, resources are available to everyone without any permission checks.
This problem can be solved by configuring separate storage for resources and public images. Continue reading this documentation to find the details.
FileStore API
Files can be uploaded to the FileStore using the
resource_create() and
resource_update() action API
functions. You can post multipart/form-data to the API and the key, value
pairs will be treated as if they are a JSON object.
The extra key upload is used to actually post the binary data.
For example, to create a new CKAN resource and upload a file to it using curl:
curl -H'Authorization: your-api-key' 'http://yourhost/api/action/resource_create' \
--form upload=@filetoupload --form package_id=my_dataset
(Curl automatically sends a multipart-form-data heading with you use the
--form option.)
To create a new resource and upload a file to it using the Python library requests:
import requests
requests.post('http://0.0.0.0:5000/api/action/resource_create',
data={"package_id":"my_dataset"},
headers={"Authorization": "21a47217-6d7b-49c5-88f9-72ebd5a4d4bb"},
files=[('upload', open('/path/to/file/to/upload.csv', 'rb'))])
(Requests automatically sends a multipart-form-data heading when you use the
files= parameter.)
To overwrite an uploaded file with a new version of the file, post to the
resource_update() action and use the
upload field:
curl -H'Authorization: your-api-key' 'http://yourhost/api/action/resource_update' \
--form upload=@newfiletoupload --form id=resourceid
To replace an uploaded file with a link to a file at a remote URL, use the
clear_upload field:
curl -H'Authorization: your-api-key' 'http://yourhost/api/action/resource_update' \
--form url=http://expample.com --form clear_upload=true --form id=resourceid
Custom Internet media types (MIME types)
To detect the media type of an uploaded file, depending on the value of ckan.mimetype_guess config option, CKAN uses either the default Python library mimetypes or python-magic.
If some particular format is not included in the ones guessed by the
mimetypes library, a default application/octet-stream value will be
returned. Users can still register a more appropriate media type by using the
mimetypes library. A good way to do so is to use the IConfigurer
interface so the custom types get registered on startup:
import mimetypes
import ckan.plugins as p
class MyPlugin(p.SingletonPlugin):
p.implements(p.IConfigurer)
def update_config(self, config):
mimetypes.add_type('application/json', '.geojson')
# ...
Using configured storages
Note
CKAN is shipped only with filesystem adapter. Adapters for cloud providers are available inside ckanext-file-keeper-cloud.
In CKAN, a “storage” represents a logical container for specific set of files. Each storage can be configured separately and serves a distinct purpose:
Resource Storage: Handles data files uploaded to CKAN resources
User Storage: Manages user avatars
Group Storage: Manages logo images for organizations/groups
Admin Storage: Stores site logo
Default Storage: Used for generic uploads. Also, when resource, user, group, or admin storage is not configured, CKAN uses default storage to initialize corresponding derived storage with sensible configuration.
Custom Storages: can be configured for application-specific files
Each storage operates independently with its own configuration, but they all use the same interface. This allows different types of files to be stored in different locations (local filesystem, cloud storage, etc.) while maintaining a consistent API.
For example, you might configure:
Resource files to be stored in /var/lib/ckan/resources
Organization logos in /var/lib/ckan/logos
Plugin assets in an S3 bucket
All these storages will be accessible through
get_storage() function and from user’s perspective
they will behave identically.
CKAN uses file-keeper as an abstraction layer for low-level interaction with the file storages. It exposes classes with a standard storage interface regardless of the underlying system. As a result, saving files into the local files ystem, a cloud provider or a database looks exactly the same from the code perspective.
Storages are initialized during application startup and must be configured in advance. The exact settings depend on the type of the storage, but in general they look like this:
ckan.files.storage.my_storage.type = ckan:fs
ckan.files.storage.my_storage.path = /tmp/my_storage
ckan.files.storage.my_storage.initialize = true
Any option that starts with ckan.files.storage. is a storage
configuration. After the prefix follows the name of the storage,
my_storage, and everything after the name is an option that will be
consumed by the storage.
In the example above, storage my_storage is detected with configuration
{"type": "ckan:fs", "path": "/tmp/my_storage", "initialize":
true}. Configuration for storages is grouped by the name, and that allows
multiple storages to be configured at the same time:
ckan.files.storage.a.type = xxx
ckan.files.storage.b.type = yyy
ckan.files.storage.c.type = zzz
It results in three storages:
awith configuration{"type": "xxx"}bwith configuration{"type": "yyy"}cwith configuration{"type": "zzz"}
To get the instance of the storage, use the
get_storage() function:
storage = get_storage("my_storage")
To create a new file in the storage use its
upload() method and the
make_upload() function, which can transform a variety
of objects into an uploadable structure:
upload = make_upload(b"hello world")
info = storage.upload("file.txt", upload)
Tip
The snippet above creates file in the storage, but this file will not be tracked by CKAN and creating a reference to such file may require certain efforts. Unless the file is created for internal purpose, it’s recommended to use File API:
import ckan.plugins.toolkit as tk
result = tk.get_action("file_create")({"ignore_auth": True}, {
"storage": "my_storage",
"name": "file.txt",
"upload": b"hello world",
})
file_create() creates in DB a record
tracking the file and returns a dictionary with file metadata. ID can be
later used to create permanent links to the file, manage file access or
remove it.
When the storage instance uploads the file, it returns an object with the file details, namely its location, size, content type and content hash. This information is required to read the file back from the storage:
content = storage.content(info)
When the object with the file details is not available, it can usually be
created manually using the location of the file and the
FileData class:
path = "path/to/file/inside/the/storage.txt"
info = FileData.from_string(path)
content = storage.content(info)
Additional information about storage functionality is available in the file-keeper documentation.
Using configured storages for resource, group, admin and user uploads
CKAN config file does not initially include configuration for 4 internal storages. Instead, it contains following config options responsible for file uploads:
ckan.uploads_enabled = true
ckan.storage_path = |storage_path|
ckan.max_resource_size = 10
ckan.max_image_size = 2
# ...
ckan.upload.user.types = image
ckan.upload.user.mimetypes = image/png image/gif image/jpeg
ckan.upload.group.types = image
ckan.upload.group.mimetypes = image/png image/gif image/jpeg
This configuration means that CKAN expects that there will be writable
/var/lib/ckan/default folder available in the system. Inside this folder CKAN will keep:
resource files inside
resources/sub-directorygroup and organization images inside
storage/uploads/group/sub-directoryuser avatars inside
storage/uploads/user/sub-directorysite logo inside
storage/uploads/admin/sub-directory
Also there are 10MiB limits on resource size and 2MiB limit on group/organization/user image size. Finally, this configuration allows only images to be uploaded as user avatars and group/organization images.
Uploads can be globally disabled for the portal by disabling ckan.uploads_enabled.
To start using storages, configure storage with the name default:
ckan.files.storage.default.type = ckan:fs ckan.files.storage.default.path = /var/lib/ckan/default
This configuration does a lot:
defaultstorage initialized and it will be used by default for files created withfile_create()implicit
resourcesstorage initialized automatically. It has itspathoption is pointing atresources/subfolder of the default storage. It also hasmax_sizeoption that limits max allowed size of resource upload. The value of this option is taken from ckan.max_resource_sizeimplicit
groups,usersandadminsstorages initialized automatically. Theirpathoption is pointing atstorage/uploads/{group,user/admin}correspondingly. Theirmax_sizeoption has the same value as ckan.max_image_size to reject big images.usersandgroupsstorages also havesupported_typesthat limits upload types accepted by storages. The option inherits value values fromAnd these storages have
publicflag enabled. It means that any file from these storages can be accessed directly by the name, without any permission checks.
At this stage, there is no difference between CKAN running with the old
ckan.storage_path instead of configured storages. If there are no
plugins that customize upload process via
IUploader switching between
ckan.storage_path and storages will not make any difference for the end
user.
The main reason to disable classic uploader is a possibility to customize these 4 uploaders mentioned above. If CKAN sees explicit configuration of the any of these storages, the implicit version of the corresponding storage path will not be created.
Instead of configuring default storage, or in addition to it, explicit
configuration for every internal storage can be provided.
Resources storage:
ckan.files.storage.resources.type = ckan:fs ckan.files.storage.resources.path = /var/lib/ckan/default/resources ckan.files.storage.resources.initialize = true ckan.files.storage.resources.max_size = 10MiB
Group and organizations storage:
ckan.files.storage.groups.type = ckan:fs ckan.files.storage.groups.path = /var/lib/ckan/default/storage/uploads/group ckan.files.storage.groups.initialize = true ckan.files.storage.groups.max_size = 2MiB ckan.files.storage.groups.supported_types = image/* ckan.files.storage.groups.public = true
User storage:
ckan.files.storage.users.type = ckan:fs ckan.files.storage.users.path = /var/lib/ckan/default/storage/uploads/user ckan.files.storage.users.initialize = true ckan.files.storage.users.max_size = 2MiB ckan.files.storage.users.supported_types = image/* ckan.files.storage.users.public = true
Admin storage(site logo):
ckan.files.storage.users.type = ckan:fs ckan.files.storage.users.path = /var/lib/ckan/default/storage/uploads/admin ckan.files.storage.users.initialize = true ckan.files.storage.users.max_size = 2MiB ckan.files.storage.users.public = true
Note
If there is an existing storage that uses a name different from
resources, it’s possible to use it as resources storage using following
config option:
ckan.files.default_storages.resource = MY_EXISTING_STORAGE
Assuming there is a configuration for MY_EXISTING_STORAGE
elsewhere(i.e. ckan.files.storage.MY_EXISTING_STORAGE.type = ckan:fs,
etc.), this storage will hold all further resource uploads. Similar options
are available for other standard upload types. Check:
File API
Storage object returned from
get_storage() exposes low-level methods for dealing
with files, but generally it’s expected that files are managed through
API. There is a set of API actions aimed at file management and here’s the list
of the most important ones:
The main difference between file created directly using
upload() and file created via
file_create() is that the latter is registered
in the database and has a corresponding record in the files table. This means
that file created via API is tracked by CKAN, works with permissions system,
and can be accessed using generic download URL, while file created directly via
storage is not registered in DB and can be accessed only via code if its
location is known.
API is build around safe assumptions, making it the recommended way to manage files in CKAN. For example, there is no API method to override or modify file’s content. Once file is created via API, its content is immutable and can be deleted, but not changed. To update the file, it must be deleted and created again with new content. This approach allows CKAN to maintain integrity of the files and avoid potential security issues related to file modifications. Because every file has unique ID, if file once referenced from a different entity (for example, a resource), there is guarantee that a content, hash, size, and type of the file will remain the same as long as the file exists in the system. If file gets deleted and new file will be created in the same location, this new file will have different ID and will not be referenced from the entities that pointed to the previous file, so there is no risk of unintentional content change for the users of the system. For example, it means that it’s impossible to upload an image as a user avatar and then replace it with an HTML page(known way of hacking portals without upload restrictions).
File API has built-in permission checks, so only authorized users can create, delete or view files. By default, only sysadmin can upload files unless ckan.files.authenticated_uploads.allow config option is enabled, which grants every authenticated user with permission to upload files.
Note
When ckan.files.authenticated_uploads.allow is enabled, users are allowed to upload files into storages specified by ckan.files.authenticated_uploads.storages. By default this option is empty and must be also updated when authenticated uploads are enabled.
Once file is created, the user who called file_create action is set as
file’s owner. Owner of the file is used by file permissions system, to decide
whether user is allowed to access the file or intract with it in other way. By
default, only user who owns the file and sysadmin have permissions to access
the it. But these permissions can be extended both through configuration and
plugins.
To extend permissions via configuration, use
ckan.files.owner.cascade_access config option. This option expects space
separated list of entities that can be assigned as a file
owner(e.g. resource, package, group, something-else) and it
allows user to perform operation with file as long as user is allowed to
perform corresponding operation with the owner of the file. For example, if
file transferred to resource using
file_ownership_transfer() API action, then any
user who has permission to call resource_show for the given resource is
also allowed to call file_show for the any file owned by this resource. To
be more precise, when file is ownedy by anything that has type XXX, and this
XXX is listed among ckan.files.owner.cascade_access values, then
XXX_show auth function is called whenever user tries to call file_show.
If file owned by package, package_show is called. If file owned by
group, group_show is called. If file is called by anything_else,
anything_else_show is called. As long as corresponding auth function
exists, it will be used to decide whether user is allowed to read file’s
details. If auth function does not exist, user is not allowed to read file’s
details.
There are 3 types of operations that mapped in this way:
show: any action that reads file’s data is mapped toOWNER_TYPE_showdelete: any action that removes the file is mapped toOWNER_TYPE_deleteupdate: any action that modifies file’s data(file_rename,file_transfer_ownership) is mapped toOWNER_TYPE_update
And these operations cover basic usage scenarios, such as uploading file to resource and then allowing users who can read the resource to read the file, or allowing users who can delete the resource to delete the file, etc.
For more complex scenarios, such as preventing user who can read file’s
metadata via file_show from downloading the file, custom permissions can be
implemented in plugins, by overriding auth functions.
When overriding auth functions, consider hierarchy of permissions. For example,
to override permissions of file_show action that returns file’s metadata:
override
file_showauth function that is used by action directly. Oroverride
permission_read_fileauth function that is internally called byfile_showand can be potentially used by other actions related to obtaining file’s details. Oroverride
permission_owns_filethat is internally called by any action that works with existing file(acceptsidof the file). Oroverride
permission_manage_filesthat is internally called by every action, includingfile_create.
Methods mentioned lower in the list have wider scope and they should be
overridden only if global modification of all corresponding permissions is
intended. The ideal solution is to override auth function with the name that
matches the name of the API action that will be affected, but there are
situations, where it’s not possible. For example, file cannot be downloaded via
API, that’s why downloads are controlled by permission_download_file.
Here’s the full hierarchy of auth functions related to files:
permission_manage_files # Root permission: file management
├─ permission_owns_file # Actions available to file owner
│ ├─ permission_edit_file # Editing capabilities
│ │ ├─ file_rename # Rename file
│ │ ├─ file_pin # Pin file
│ │ ├─ file_unpin # Unpin file
│ │ └─ file_ownership_transfer # Transfer ownership
│ ├─ permission_delete_file # Deletion rights
│ │ └─ file_delete # Delete file
│ └─ permission_read_file # Read access
│ ├─ permission_download_file # Download rights
│ └─ file_show # View file
├─ file_create # Create new file
├─ file_register # Register file in system
└─ file_owner_scan # See all files of the given owner
Download files
While Storage has
stream() and
content() methods that return file content,
it’s not the only way to access files. Any file registered in DB(i.e., created
via file_create or similar API action and tracked via DB record in the
files table) can be accessed using generic download URL. To build the URL for
the file, generate a link to file.download endpoint, providing file’s ID as
an id parameter of the URL:
download_url = h.url_for("file.download", id=FILE_ID)
This endpoint performs generic access check before sending file to user. It
calls permission_download_file auth function, which can be overridden to
implement custom access rules, like restricted downloads even if user has
access to file’s metadata.
Note
By default file is accessible only by sysadmin and user who owns the
file. To extend download permissions, consider transferring file ownership
to organization/package/resource via
file_ownership_transfer() and then enable
cascade access to the given owner via
ckan.files.owner.cascade_access.
Permission checks can be bypassed when using another download view
trusted_download. It works with JWT tokens and can be used to create
temporary URL that gives unrestricted access to file.
To use it, create a JWT token that contain file’s ID inside sub claim and
hardcoded value trusted_download inside aud claim. It’s recommended to
set expiration on the token using exp claim, or else it will give permanent
access to the file that will work as long as file exists. Then generate a URL
with it:
from datetime import datetime, timezone, timedelta
from ckan.lib.api_token import encode_token
expires_at = datetime.now(timezone.utc) + timedelta(hours=1)
token = encode_token({
"sub": FILE_ID,
"aud": "trusted_download",
"exp": expires_at,
})
trusted_download_url = h.url_for("file.trusted_download", token=token)
Views above work for files registered in DB, but do not work for files uploaded
directly to storage, bypassing the File API. To download these files, first
make sure that storage has enabled public flag:
ckan.files.storage.my_storage.public = true
Then, check the location of the file inside the storage. Location it’s the
fragment of the absolute path to the file, after stripping the part specified
in the path setting of the storage. For example, if storage has
ckan.files.storage.my_storage.path = /var/data and the file’s absolute path is
/var/data/my/folder/file.txt, the location is my/folder/file.txt.
Use location and storage name to generate URL for the public_download view:
public_download_url = h.url_for(
"file.public_download",
storage_name="my_storage",
location="my/folder/file.txt",
)
This URL can be accessed by anyone as long as storage is marked as
public. Without public flag, the URL will cause 403 HTTP response.
As an alternative, when writing custom view functions,
ckan.lib.files.Storage.as_response() method can be used to create
Flask’s response object with the file content. Depending on the storage
backend, it can be either a response with the actual file content, or a
redirect response to the external public file location. Such response can be
returned from the view function as is:
@my_blueprint.route("/my/custom/download/<id>")
def download(id: str) -> Response:
try:
item: dict[str, Any] = logic.get_action("file_show")({}, {"id": id})
except logic.NotFound:
return base.abort(404)
file_data: FileData = files.FileData.from_dict(item)
storage: Storage = files.get_storage(item["storage"])
return storage.as_response(data)
Storage types
Configuring a storage requires defining its type of the storage. Apart from
the type, there is a number of common options that are supported by all storage
types.
max_size: The maximum size of a single upload. No limits by default.supported_types: Space-separated list of allowed MIME types. No restrictions by default.overwrite_existing: If file already exists, replace it with new content. Enabled by default.location_transformers: List of transformations applied to the file location. Transformations are not applied automatically - callprepare_location()to get the transformed version of the filename.
The rest of options depends on the specific storage type. CKAN provides the following built-in storage types:
ckan:fs
Example:
ckan.files.storage.my_storage.type = ckan:fs
ckan.files.storage.my_storage.initialize = true
ckan.files.storage.my_storage.path = /var/lib/storage/my_storage
Keeps files inside the local filesystem. Files are uploaded into a directory
specified by the required path option. The directory must exist and be
writable by the CKAN process. If directory does not exist, it’s created when
initialize option is enabled. If initialize is not enabled, exception
is raised during initialization of the storage.
ckan:fs:public
Example:
ckan.files.storage.my_public_storage.type = ckan:fs:public
ckan.files.storage.my_public_storage.initialize = true
ckan.files.storage.my_public_storage.path = /var/lib/storage/my_public_storage
# make storage folder available at application root
extra_public_paths = /var/lib/storage/my_public_storage
Extended version of ckan:fs type. It assumes that path is registered as
CKAN public folder and all files from it are accessible directly from the
browser. Can be used for non-private uploads, such as user avatars or group
images. If path points to the subfolder of the public directory, i.e, CKAN
registers /data/storage as public directory, but storage’s path is set
to /data/storage/nested/path/inside, use public_prefix option to
specify static segment that must be added to file’s location in order to build
valid public URL. In the given example, public_prefix must be set to
nested/path/inside.
This storage type is similar to ckan:fs with public = true. The main
difference in the way files are served:
public = trueuses generic logic implemented in the custom view. Basically CKAN proxifies the content, which makes this flag compatible with any storage type. But it’s not efficient, especially for large files.type = ckan:fs:publicseves files via Flask’ssend_filefunction, which is efficient, but works only with local filesystem.
Storage utilities
- ckan.lib.files.get_storage(name: str | None = None) Storage
Return existing storage instance.
If no name specified, default storage is returned.
Storages are initialized when application is loaded. As result, this function always returns the same storage object for the given name.
Any changes configuration changes that happen after the application start will be ignored.
>>> default_storage = get_storage() >>> assert default_storage.settings.name == "default"
>>> try: >>> storage = get_storage("storage name") >>> except files.exc.UnknownStorageError: >>> log.exception("Storage 'storage name' is not configured")
- Parameters:
name – name of the configured storage
- Returns:
storage instance
- Raises:
UnknownStorageError – storage with the given name is not configured
- files.make_upload(value: Any) Upload = <function make_upload>
Convert value into Upload object.
Works with binary objects,
io.BytesIO, file-objects, and file-fields from submitted forms.Use this function for simple and reliable initialization of Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide correct MIMEtype, size and stream.
>>> upload = make_upload(b"hello world") >>> file_data = storage.upload("file.txt", upload)
- Parameters:
value – content of the file
- Returns:
upload object with specified content
- Raises:
TypeError – content has unsupported type
- class ckan.lib.files.Storage(settings: Mapping[str, Any] | Settings, /)
Base class for storage implementation.
Extends
file_keeper.Storage.Implementation of the custom adapter normally includes definition of factory classes and config declaration.
>>> class MyStorage(Storage): >>> # typechecker may need this line to identify types >>> settings: MySettings >>> >>> SettingsFactory = MySettings >>> UploaderFactory = MyUploader >>> ManagerFactory = MyManager >>> ReaderFactory = MyReader >>> >>> @classmethod >>> def declare_config_options(cls, declaration: Declaration, key: Key): >>> ...
- Parameters:
settings – mapping with storage configuration
- capabilities: Capability
Operations supported by storage. Computed from capabilities of services during storage initialization.
- prepare_location(location: str, sample: Upload | None = None) Location
Transform and sanitize location using configured functions.
This method applies all transformations configured in
location_transformerssetting to the provided location. Each transformer is called in the order they are listed in the setting. The output of the previous transformer is passed as an input to the next one.Example:
>>> location = storage.prepare_location(untrusted_location)
- Parameters:
location – initial location provided by user
sample – optional Upload object that can be used by transformers.
- Returns:
transformed location
- classmethod declare_config_options(declaration: Declaration, key: Key)
Declare configuration of the storage.
All attributes of the storage’s SettingsFactory must be defined here. In this way user can discover available options using config CLI, and configuration is validated/converted by CKAN before it passed to the storage.
>>> @classmethod >>> def declare_config_options(cls, decl, key): >>> decl.declare_bool(key.enable_turbo_mode) >>> decl.declare(key.secret).required()
- as_response(data: FileData, filename: str | None = None, /, send_inline: bool = False, **kwargs: Any) Response
Make Flask response with file attachment.
By default, files are served as attachments and are downloaded as result. Use ckan.files.inline_content_types config option to specify content types that must be served inline. For example, following config option will render images, videos and text files in browser instead of forcing download:
ckan.files.inline_content_types = image text/plain video
If rendering is safe and preferable for individual call, enable
send_inlineflag.If either
send_inlineis set toTrue, or file has content type that matches ckan.files.inline_content_types, it will be rendered on the page. Otherwise it will be sent as an attachment and downloaded by the client.- Parameters:
data – file details
filename – expected name of the file used instead of the real name
send_inline – do not force download and try rendering file in browser
- Returns:
Flask response with file’s content
- validate_size(size: int)
Verify that size of upload does not go over the configured limit.
- Parameters:
size – the actual size of uploaded file in bytes
- Raises:
LargeUploadError – upload exceeds allowed size
- validate_content_type(content_type: str)
Verify that type of upload is allowed by configuration.
- Parameters:
content_type – MIME Type of uploaded file
- Raises:
WrongUploadTypeError – type of upload is not supported
- upload(location: Location, upload: Upload, /, **kwargs: Any) FileData
Upload file to the storage.
Before upload starts, file is validated according to storage settings.
- Parameters:
location – sanitized location of the file in the storage
upload – uploaded object
**kwargs – other parameters that may be used by the storage
- Returns:
details of the uploaded file
- permanent_link(data: FileData, /, **extras: dict[str, Any]) str | None
Generate permanent link for the file.
- class ckan.lib.files.Settings(type: str = '', name: str = 'unknown', overwrite_existing: bool = False, path: str = '', location_transformers: list[str] = <factory>, disabled_capabilities: list[str] = <factory>, initialize: bool = False, skip_in_place_move: bool = True, skip_in_place_copy: bool = True, hashing_algorithm: str = 'md5', _extra_settings: dict[str, ~typing.Any] = <factory>, supported_types: list[str] = <factory>, max_size: int = -1, public: bool = False)
Storage settings definition.
Any configurable parameter must be defined here, as this dataclass accepts options collected from CKAN config file and exposes them to storage and its services.
Generally, Settings should not validate configuration, because validation is provided by the config declarations. Settings object just holds static options and initializes additional objects, like connections to external services.
>>> @dataclasses.dataclass() >>> class MySettings(Settings) >>> >>> # normal configurable parameter. Prefer this type of settings >>> verbose: bool = False >>> >>> # this attribute will be initialized inside __post_init__. All >>> # setting's attributes must be supplied with default values, but we >>> # cannot set "default" connection. Instead we are using `None` and >>> # type-ignore annotation to avoid attention from typechecker. If we >>> # can guarantee that settings will not be initialized without a >>> # connection, that remains safe. >>> conn: Engine = None # pyright: ignore[reportAssignmentType] >>> >>> # db_url will be used to initialize connection and >>> # there is no need to keep it after initialization >>> db_url: dataclasses.InitVar[str] = "" >>> >>> def __post_init__(self, db_url: str, **kwargs: Any): >>> # always call original implementation >>> super().__post_init__(**kwargs) >>> >>> if self.conn is None: # pyright: ignore[reportUnnecessaryComparison] >>> if not db_url: >>> msg = "db_url is not valid" >>> raise files.exc.InvalidStorageConfigurationError( >>> self.name, >>> msg, >>> ) >>> self.conn = create_engine(db_url)
- class ckan.lib.files.Uploader(storage: Storage)
Service responsible for writing data into a storage.
Storageinternally calls methods of this service. For example,Storage.upload(location, upload, **kwargs)results inUploader.upload(location, upload, kwargs).>>> class MyUploader(Uploader): >>> def upload( >>> self, location: Location, upload: Upload, extras: dict[str, Any] >>> ) -> FileData: >>> reader = upload.hashing_reader() >>> with open(location, "wb") as dest: >>> for chunk in reader: >>> dest.write(chunk) >>> return FileData( >>> location, size=upload.size, >>> content_type=upload.content_type, >>> hash=reader.get_hash(), >>> algorithm=self.storage.settings.hashing_algorithm, >>> )
- class ckan.lib.files.Reader(storage: Storage)
Service responsible for reading data from the storage.
Storageinternally calls methods of this service. For example,Storage.stream(data, **kwargs)results inReader.stream(data, kwargs).>>> class MyReader(Reader): >>> def stream( >>> self, data: FileData, extras: dict[str, Any] >>> ) -> Iterable[bytes]: >>> return open(data.location, "rb")
- class ckan.lib.files.Manager(storage: Storage)
Service responsible for maintenance file operations.
Storageinternally calls methods of this service. For example,Storage.remove(data, **kwargs)results inManager.remove(data, kwargs).>>> class MyManager(Manager): >>> def remove( >>> self, data: FileData, extras: dict[str, Any] >>> ) -> bool: >>> os.remove(data.location) >>> return True
- files.Upload = <class 'file_keeper.core.upload.Upload'>
Standard upload details produced by
make_upload().- Upload.stream: PStream
Content as iterable of bytes
- Upload.filename: str
Name of the file
- Upload.size: int
Size of the file
- Upload.content_type: str
MIME Type of the file
- files.FileData = <class 'file_keeper.core.data.FileData'>
Information required by storage to operate the file.
>>> info = FileData("local/path.txt", size=123, content_type="text/plain", hash=md5_of_content, algorithm="md5")
Location of the file usually requires sanitization and as a reminder about this step, typechecker produces warning whenever plain string is passed to the
FileData. The proper way of initializing file data is using already sanitized path wrapped intoLocation.>>> safe_path = Location("sanitized/local/path.txt") >>> info = FileData(location)
Logic of the process is not changed when
Locationcomes into a play, because it’s a mere alias forstrclass. This flow exists to help detecting security issues. If any value can be safely used as a location(for example, file is kept in DB and location will be sanitized during execution of SQL statement), typechecker warnings can be ignored.As sanitization rules depend on storage, the recommended way to sanitize the location is to configure
Settings.location_transformersand apply them to path by callingprepare_location().>>> unsafe_path = "local/path.txt" >>> safe_path = storage.prepare_location(unsafe_path)
- Parameters:
location – filepath, filename or any other type of unique identifier
size – size of the file in bytes
content_type – MIMEtype of the file
hash – checksum of the file
storage_data – additional details set by storage adapter
- files.Capability = <enum 'Capability'>
Enumeration of operations supported by the storage.
>>> read_and_write = Capability.STREAM | Capability.CREATE >>> if storage.supports(read_and_write) >>> ...
- files.Location = file_keeper.core.types.Location
Alias of
strthat represents sanitized location of the file