Writing documentation

This section gives some guidelines to help us to write consistent and good quality documentation for CKAN.

Documentation isn’t source code, and documentation standards don’t need to be followed as rigidly as coding standards do. In the end, some documentation is better than no documentation, it can always be improved later. So the guidelines below are soft rules.

Having said that, we suggest just one hard rule: no new feature (or change to an existing feature) should be missing from the docs (but see todo).

See also

Jacob Kaplan-Moss’s Writing Great Documentation

A series of blog posts about writing technical docs, a lot of our guidelines were based on this.

See also

The quickest and easiest way to contribute documentation to CKAN is to sign up for a free GitHub account and simply edit the CKAN Wiki. Docs started on the wiki can make it onto docs.ckan.org later. If you do want to contribute to docs.ckan.org directly, follow the instructions on this page.

Tip: Use the reStructuredText markup format when creating a wiki page, since reStructuredText is the format that docs.ckan.org uses, this will make moving the documentation from the wiki into docs.ckan.org later easier.

Getting started

This section will walk you through downloading the source files for CKAN’s docs, editing them, and submitting your work to the CKAN project.

CKAN’s documentation is created using Sphinx, which in turn uses Docutils (reStructuredText is part of Docutils). Some useful links to bookmark:

The source files for the docs are in the doc directory of the CKAN git repo. The following sections will walk you through the process of making changes to these source files, and submitting your work to the CKAN project.

Install CKAN into a virtualenv

Create a Python virtual environment (virtualenv), activate it, install CKAN into the virtual environment, and install the dependencies necessary for building CKAN. In this example we’ll create a virtualenv in a folder called pyenv. Run these commands in a terminal:

virtualenv --no-site-packages pyenv
. pyenv/bin/activate
pip install -e 'git+https://github.com/ckan/ckan.git#egg=ckan'
cd pyenv/src/ckan/
pip install -r dev-requirements.txt
pip install -r requirements.txt

Build the docs

You should now be able to build the CKAN documentation locally. Make sure your virtual environment is activated, and then run this command:

python setup.py build_sphinx

Now you can open the built HTML files in build/sphinx/html, e.g.:

firefox build/sphinx/html/index.html

Edit the reStructuredText files

To make changes to the documentation, use a text editor to edit the .rst files in doc/. Save your changes and then build the docs again (python setup.py build_sphinx) and open the HTML files in a web browser to preview your changes.

Once your docs are ready to submit to the CKAN project, follow the steps in Making a pull request.

How the docs are organized

It’s important that the docs have a clear, simple and extendable structure (and that we keep it that way as we add to them), so that both readers and writers can easily answer the questions: If you need to find the docs for a particular feature, where do you look? If you need to add a new page to the docs, where should it go?

As Overview explains, the documentation is organized into several guides, each for a different audience: a user guide for web interface users, an extending guide for extension developers, a contributing guide for core contributors, etc. These guides are ordered with the simplest guides first, and the most advanced last.

In the source, each one of these guides is a subdirectory with its own index.rst containing its own .. toctree:: directive that lists all of the other files in that subdirectory. The root toctree just lists each of these */index.rst files.

When adding a new page to the docs, the first question to ask yourself is: who is this page for? That should tell you which subdirectory to put your page in. You then need to add your page to that subdirectory’s index.rst file.

Within each guide, the docs are broken up by topic. For example, the extending guide has a page for the writing extensions tutorial, a page about testing extensions, a page for the plugins toolkit reference, etc. Again, the topics are ordered with the simplest first and the most advanced last, and reference pages generally at the very end.

The changelog is one page that doesn’t fit into any of the guides, because it’s relevant to all of the different audiences and not only to one particular guide. So the changelog is simply a top-level page on its own. Hopefully we won’t need to add many more of these top-level pages. If you’re thinking about adding a page that serves two or more audiences at once, ask yourself whether you can break that up into separate pages and put each into one of the guides, then link them together using seealso boxes.

Within a particular page, for example a new page documenting a new feature, our suggestion for what sections the page might have is:

  1. Overview: a conceptual overview of or introduction to the feature. Explain what the feature provides, why someone might want to use it, and introduce any key concepts users need to understand. This is the why of the feature.

    If it’s developer documentation (extension writing, theming, API, or core developer docs), maybe put an architecture guide here.

  2. Tutorials: tutorials and examples for how to setup the feature, and how to use the feature. This is the how.

  3. Reference: any reference docs such as config options or API functions.

  4. Troubleshooting: common error messages and problems, FAQs, how to diagnose problems.

Subdirectories

Some of the guides have subdirectories within them. For example Maintainer’s guide contains a subdirectory Installing CKAN that collects together the various pages about installing CKAN with its own doc/maintaining/installing/index.rst file.

While subdirectories are useful, we recommend that you don’t put further subdirectories inside the subdirectories, try to keep it to at most two levels of subdirectories inside the doc directory. Keep it simple, otherwise the structure becomes confusing, difficult to get an overview of and difficult to navigate.

Linear ordering

Keep in mind that Sphinx requires the docs to have a simple, linear ordering. With HTML pages it’s possible to design structure where, for example, someone reads half of a page, then clicks on a link in the middle of the page to go and read another page, then goes back to the middle of the first page and continues reading where they left off. While technically you can do this in Sphinx as well, it isn’t a good idea, things like the navigation links, table of contents, and PDF version will break, users will end up going in circles, and the structure becomes confusing.

So the pages of our Sphinx docs need to have a simple linear ordering - one page follows another, like in a book.

Sphinx

This section gives some useful tips about using Sphinx.

Don’t introduce any new Sphinx warnings

When you build the docs, Sphinx prints out warnings about any broken cross-references, syntax errors, etc. We aim not to have any of these warnings, so when adding to or editing the docs make sure your changes don’t introduce any new ones.

It’s best to delete the build directory and completely rebuild the docs, to check for any warnings:

rm -rf build; python setup.py build_sphinx

Maximum line length

As with Python code, try to limit all lines to a maximum of 79 characters.

versionadded and versionchanged

Use Sphinx’s versionadded and versionchanged directives to mark new or changed features. For example:

================
Tag vocabularies
================

.. versionadded:: 1.7

CKAN sites can have *tag vocabularies*, which are a way of grouping related
tags together into custom fields.

...

With versionchanged you usually need to add a sentence explaining what changed (you can also do this with versionadded if you want):

=============
Authorization
=============

.. versionchanged:: 2.0
   Previous versions of CKAN used a different authorization system.

CKAN's authorization system controls which users are allowed to carry out
which...

Substitutions

Substitutions are a useful way to define a value that’s needed in many places (eg. a command, the location of a file, etc.) in one place and then reuse it many times.

You define the value once like this:

.. |ckan.ini| replace:: /etc/ckan/default/ckan.ini

and then reuse it like this:

Now open your |ckan.ini| file.

|ckan.ini| will be replaced with the full value /etc/ckan/default/ckan.ini.

Substitutions can also be useful for achieving consistent spelling and capitalization of names like reStructuredText, PostgreSQL, Nginx, etc.

The rst_epilog setting in doc/conf.py contains a list of global substitutions that can be used from any file.

Substitutions can’t immediately follow certain characters (with no space in-between) or the substitution won’t work. If this is a problem, you can insert an escaped space, the space won’t show up in the generated output and the substitution will work:

pip install -e 'git+\ |git_url|'

Similarly, certain characters are not allowed to immediately follow a substitution (without a space) or the substitution won’t work. In this case you can just escape the following characters, the escaped character will show up in the output and the substitution will work:

pip install -e 'git+\ |git_url|\#egg=ckan'

Also see Parsed literals below for using substitutions in code blocks.

Parsed literals

Normally things like links and substitutions don’t work within a literal code block. You can make them work by using a parsed-literal block, for example:

Copy your development.ini file to create a new production.ini file::

.. parsed-literal::

   cp |development.ini| |production.ini|

autodoc

We try to use autodoc to pull documentation from source code docstrings into our Sphinx docs, wherever appropriate. This helps to avoid duplicating documentation and also to keep the documentation closer to the code and therefore more likely to be kept up to date.

Whenever you’re writing reference documentation for modules, classes, functions or methods, exceptions, attributes, etc. you should probably be using autodoc. For example, we use autodoc for the Action API reference, the Plugin interfaces reference, etc.

For how to write docstrings, see Docstrings.

todo

No new feature (or change to an existing feature) should be missing from the docs. It’s best to document new features or changes as you implement them, but if you really need to merge something without docs then at least add a todo directive to mark where docs need to be added or updated (if it’s a new feature, make a new page or section just to contain the todo):

=====================================
CKAN's builtin social network feature
=====================================

.. todo::

   Add docs for CKAN's builtin social network for data hackers.

deprecated

Use Sphinx’s deprecated directive to mark things as deprecated in the docs:

.. deprecated:: 3.1
   Use :func:`spam` instead.

seealso

Often one page of the docs is related to other pages of the docs or to external pages. A seealso block is a nice way to include a list of related links:

.. seealso::

   :doc:`The DataStore extension <datastore>`
     A CKAN extension for storing data.

   CKAN's `demo site <https://demo.ckan.org/>`_
     A demo site running the latest CKAN beta version.

Seealso boxes are particularly useful when two pages are related, but don’t belong next to each other in the same section of the docs. For example, we have docs about how to upgrade CKAN, these belong in the maintainer’s guide because they’re for maintainers. We also have docs about how to do a new release, these belong in the contributing guide because they’re for developers. But both sections are about CKAN releases, so we link each to the other using seealso boxes.

Code examples

If you’re going to paste example code into the docs, or add a tutorial about how to do something with code, then:

  1. The code should be in standalone Python, HTML, JavaScript etc. files, not pasted directly into the .rst files. You then pull the code into your .rst file using a Sphinx .. literalinclude:: directive (see example below).

  2. The code in the standalone files should be a complete working example, with tests. Note that not all of the code from the example needs to appear in the docs, you can include just parts of it using .. literalinclude::, but the example code needs to be complete so it can be tested.

This is so that we don’t end up with a lot of broken, outdated examples and tutorials in the docs because breaking changes have been made to CKAN since the docs were written. If your example code has tests, then when someone makes a change in CKAN that breaks your example those tests will fail, and they’ll know they have to fix their code or update your example.

The plugins tutorial is an example of this technique. ckanext/example_iauthfunctions is a complete and working example extension. The tests for the extension are in ckanext/example_iauthfunctions/tests. Different parts of the reStructuredText file for the tutorial pull in different parts of the example code as needed, like this:

.. literalinclude:: ../../ckanext/example_iauthfunctions/plugin_v3.py
   :start-after: # We have the logged-in user's user name, get their user id.
   :end-before: # Finally, we can test whether the user is a member of the curators group.

literalinclude has a few useful options for pulling out just the part of the code that you want. See the Sphinx docs for literalinclude for details.

You may notice that ckanext/example_iauthfunctions contains multiple versions of the same example plugin, plugin_v1.py, plugin_v2.py, etc. This is because the tutorial walks the user through first making a trivial plugin, and then adding more and more advanced features one by one. Each step of the tutorial needs to have its own complete, standalone example plugin with its own tests.

For more examples, look into the source files for other tutorials in the docs.

Style

This section covers things like what tone to use, how to capitalize section titles, etc. Having a consistent style will make the docs nice and easy to read and give them a complete, quality feel.

Use American spelling

Use American spellings everywhere: organization, authorization, realize, customize, initialize, color, etc. There’s a list here: https://wiki.ubuntu.com/EnglishTranslation/WordSubstitution

Spellcheck

Please spellcheck documentation before merging it into master!

Commonly used terms

CKAN

Should be written in ALL-CAPS.

email

Use email not e-mail.

PostgreSQL, SQLAlchemy, Nginx, Python, SQLite, JavaScript, etc.

These should always be capitalized as shown above (including capital first letters for Python and Nginx even when they’re not the first word in a sentence). doc/conf.py defines substitutions for each of these so you don’t have to remember them, see Substitutions.

Web site

Two words, with Web always capitalized

frontend

Not front-end

command line

Two words, not commandline or command-line (this is because we want to be like Neal Stephenson)

CKAN config file or configuration file

Not settings file, ini file, etc. Also, the config file contains config options such as ckan.site_id, and each config option is set to a certain setting or value such as ckan.site_id = demo.ckan.org.

Section titles

Capitalization in section titles should follow the same rules as in normal sentences: you capitalize the first word and any proper nouns.

This seems like the easiest way to do consistent capitalization in section titles because it’s a capitalization rule that we all know already (instead of inventing a new one just for section titles).

Right:

  • Installing CKAN from package

  • Getting started

  • Command line interface

  • Writing extensions

  • Making an API request

  • You’re done!

  • Libraries available to extensions

Wrong:

  • Installing CKAN from Package

  • Getting Started

  • Command Line Interface

  • Writing Extensions

  • Making an API Request

  • You’re Done!

  • Libraries Available To Extensions

For lots of examples of this done right, see Django’s table of contents.

In Sphinx, use the following section title styles:

===============
Top-level title
===============

------------------
Second-level title
------------------

Third-level title
=================

Fourth-level title
------------------

If you need more than four levels of headings, you’re probably doing something wrong, but see: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#sections

Be conversational

Write in a friendly, conversational and personal tone:

  • Use contractions like don’t, doesn’t, it’s etc.

  • Use “we”, for example “We’ll publish a call for translations to the ckan-dev and ckan-discuss mailing lists, announcing that the new version is ready to be translated” instead of “A call for translations will be published”.

  • Refer to the reader personally as “you”, as if you’re giving verbal instructions to someone in the room: “First, you’ll need to do X. Then, when you’ve done Y, you can start working on Z” (instead of stuff like “First X must be done, and then Y must be done…”).

Write clearly and concretely, not vaguely and abstractly

Politics and the English Language has some good tips about this, including:

  1. Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.

  2. Never use a long word where a short one will do.

  3. If it’s possible to cut out a word, always cut it out.

  4. Never use the passive when you can be active.

  5. Never use a foreign phrase, scientific word or jargon word if you can think of an everyday English equivalent.

This will make your meaning clearer and easier to understand, especially for people whose first language isn’t English.

Facilitate skimming

Readers skim technical documentation trying to quickly find what’s important or what they need, so break walls of text up into small, visually identifiable pieces:

  • Use lots of inline markup:

    *italics*
    **bold**
    ``code``
    

    For code samples or filenames with variable parts, uses Sphinx’s :samp: and :file: directives.

  • Use lists to break up text.

  • Use .. note:: and .. warning::, see Sphinx’s paragraph-level markup.

    (reStructuredText actually supports lots more of these: attention, error, tip, important, etc. but most Sphinx themes only style note and warning.)

  • Break text into short paragraphs of 5-6 sentences each max.

  • Use section and subsection headers to visualize the structure of a page.