String internationalization

All user-facing strings in CKAN’s Python, JavaScript and Jinja2 code should be internationalized so that our translators can localize them for each of the many languages that CKAN supports. This guide shows CKAN developers how to internationalize strings and what to look for regarding string internationalization when reviewing a pull request.

Note

Internationalization (or i18n) is the process of marking strings for translation, so that the strings can be extracted from the source code and given to translators. Localization (l10n) is the process of translating the marked strings into different languages.

Internationalizing strings in Jinja2 templates

Most user-visible strings should be in the Jinja2 templates, rather than in Python or JavaScript code. This doesn’t really matter to translators, but separating logic from content is good for the code. Of course, this isn’t always possible. For example, when error messages are delivered through the API, there’s no Jinja2 template involved.

The preferred way to internationalize strings in Jinja2 templates is by using the trans tag from Jinja2’s i18n extension, which is available to all CKAN core and extension templates and snippets.

Most of the following examples are taken from the Jinja2 docs.

To internationalize a string put it inside a {% trans %} tag:

<p>{% trans %}This paragraph is translatable.{% endtrans %}</p>

You can also use variables from the template’s namespace inside a {% trans %}:

<p>{% trans %}Hello {{ user }}!{% endtrans %}</p>

(Only variable tags are allowed inside trans tags, not statements.)

You can pass one or more arguments to the {% trans %} tag to bind variable names for use within the tag:

<p>{% trans user=user.username %}Hello {{ user }}!{% endtrans %}</p>

{% trans book_title=book.title, author=author.name %}
This is {{ book_title }} by {{ author }}
{% endtrans %}

To handle different singular and plural forms of a string, use a {% pluralize %} tag:

{% trans count=list|length %}
There is {{ count }} {{ name }} object.
{% pluralize %}
There are {{ count }} {{ name }} objects.
{% endtrans %}

(In English the first string will be rendered if count is 1, the second otherwise. For other languages translators will be able to provide their own strings for different values of count.)

The first variable in the block (count in the example above) is used to determine which of the singular or plural forms to use. Alternatively you can explicitly specify which variable to use:

{% trans ..., user_count=users|length %}
   ...
{% pluralize user_count %}
   ...
{% endtrans %}

The {% trans %} tag is preferable, but if you need to internationalize a string within a Jinja2 expression you can use the _() and ungettext() functions:

{% set hello = _('Hello World!') %}

To use variables in strings, use Python format string syntax and then call the .format() method on the string that _() returns:

{% set hello = _('Hello {name}!').format(name=user.name) %}

Singular and plural forms are handled by ungettext():

{% set text = ungettext(
       '{num} apple', '{num} apples', num_apples).format(num=num_apples) %}

Note

There are also gettext() and ngettext() functions available to templates, but we recommend using _() and ungettext() for consistency with CKAN’s Python code. This deviates from the Jinja2 docs, which do use gettext() and ngettext().

_() is not an alias for gettext() in CKAN’s Jinja2 templates. _() is the function provided by Pylons, whereas gettext() is the version provided by Jinja2, and their behaviors are not exactly the same.

Internationalizing strings in Python code

CKAN uses the _() and ngettext() functions from the Flask-Babel library to internationalize strings in Python code.

Core CKAN modules should import _() and ungettext() from ckan.common, i.e. from ckan.common import _, ungettext (don’t import flask_babel._() directly, for example).

CKAN plugins should import ckan.plugins.toolkit and use ckan.plugins.toolkit._() and ckan.plugins.toolkit.ungettext(), i.e. do import ckan.plugins.toolkit as toolkit and then use toolkit._() and toolkit.ungettext() (see Plugins toolkit reference).

To internationalize a string pass it to the _() function:

my_string = _("This paragraph is translatable.")

To use variables in a string, call the .format() method on the translated string that _() returns:

hello = _("Hello {user}!").format(user=user.name)

book_description = _("This is {book_title} by {author}").format(
    book_title=book.title, author=author.name)

To handle different plural and singular forms of a string, use ungettext():

translated_string = ungettext(
    "There is {count} {name} object.",
    "There are {count} {name} objects.",
    num_objects).format(count=num_objects, name=name)

Internationalizing strings in JavaScript code

Each CKAN JavaScript module offers the methods _ and ngettext. The ngettext function is used to translate a single string which has both a singular and a plural form, whereas _ is used to translate a single string only:

this.ckan.module('i18n-demo', function($) {
    return {
        initialize: function () {
            console.log(this._('Translate me!'));
            console.log(this.ngettext('%(num)d item', '%(num)d items', 3));
        }
    };
});

To translate a fixed singular string, use _. It returns the translation of the string for the currently selected locale. If the current locale doesn’t provide a translation for the string then it is returned unchanged.

this._('Something that should be translated')

Placeholders are supported via sprintf syntax. The corresponding values are passed via another parameter:

this._("My name is %(name)s and I'm from %(hometown)s.",
       {name: 'Sarah', hometown: 'Cape Town'})

ngettext allows you to translate a string that may be either singular or plural, depending on some variable:

this.ngettext('Deleted %(num)d item',
              'Deleted %(num)d items',
              items.length)

If items.length is 1 then the translation for the first argument will be returned, otherwise that of the second argument. num is a magical placeholder that is automatically provided by ngettext and contains the value of the third parameter.

Like _, ngettext can take additional placeholders:

this.ngettext("I'm %(name)s and I'm %(num)d year old",
              "I'm %(name)s and I'm %(num)d years old",
              age,
              {name: 'John'})
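
The placeholder handling described above can be mimicked in Python, whose % operator happens to accept the same %(name)s syntax. This is only an emulation under English plural rules: ngettext_sketch is a hypothetical function, not CKAN’s JavaScript implementation, and it performs no translation lookup.

```python
def ngettext_sketch(singular, plural, count, values=None):
    """Emulate the placeholder handling described above (English rules only)."""
    # Choose the form the way English does: singular only for exactly one.
    template = singular if count == 1 else plural
    # Start from the caller's extra placeholders, then add `num`,
    # which is always taken from the count argument.
    params = dict(values or {})
    params["num"] = count
    return template % params


print(ngettext_sketch("Deleted %(num)d item", "Deleted %(num)d items", 3))
print(ngettext_sketch("I'm %(name)s and I'm %(num)d year old",
                      "I'm %(name)s and I'm %(num)d years old",
                      1, {"name": "John"}))
```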

Note

CKAN’s JavaScript code automatically downloads the appropriate translations at request time from the CKAN server. Since CKAN 2.7 the corresponding translation files are regenerated automatically if necessary when CKAN starts.

You can also regenerate the translation files manually using ckan translation js:

python setup.py extract_messages  # Extract translatable strings
# Update .po files as desired
python setup.py compile_catalog   # Compile .mo files for Python/Jinja
ckan -c /etc/ckan/default/ckan.ini translation js         # Compile JavaScript catalogs

Note

Prior to CKAN 2.7, JavaScript modules received a similar but different _ function for string translation as a parameter. This is still supported but deprecated and will be removed in a future release.

General guidelines for internationalizing strings

Below are some guidelines to follow when marking your strings for translation. These apply to strings in Jinja2 templates or in Python or JavaScript code. These are mostly meant to make life easier for translators, and help to improve the quality of CKAN’s translations:

Leave as much HTML and other code out of the translation string as possible.

For example, don’t include surrounding <p>...</p> tags in the marked string. These aren’t necessary for the translator to do the translation, and if the translator accidentally changes them in the translation string the HTML will be broken.

Good:
```
<p>{% trans %}Don't put HTML tags inside translatable strings{% endtrans %}</p>
```
Bad (<p> tags don’t need to be in the translation string):
```
mystring = _("<p>Don't put HTML tags inside translatable strings</p>")
```
But don’t split a string into separate strings.

Translators need as much context as possible to translate strings well. If you split a string up into separate strings and mark each for translation separately, translators must translate each of these separate strings in isolation. Also, some languages may need to change the order of words in a sentence, or even the order of sentences in a paragraph, and splitting into separate strings makes assumptions about word order.

It’s better to leave HTML tags or other code in strings than to split a string. For example, it’s often best to leave HTML <a> tags in rather than split a string.

Good:
```
_("Don't split a string containing some <b>markup</b> into separate strings.")
```
Bad (text will be difficult to translate or untranslatable):
```
_("Don't split a string containing some ") + "<b>" + _("markup") + "</b>" + _("into separate strings.")
```
To avoid long lines, you can split long strings over multiple lines using parentheses. Python will concatenate the adjacent literals into a single string:

Good:
```
_("This is a really long string that would just make this line far too "
  "long to fit in the window")
```
Leave unnecessary whitespace out of translatable strings, but do put punctuation into translatable strings.

Try not to make translators translate strings that don’t need to be translated.

For example, 'templates' is the name of a directory, so it doesn’t need to be marked for translation.

Mark singular and plural forms of strings correctly.

In Jinja2 templates this means using {% trans %} and {% pluralize %} or ungettext(). In Python it means using ungettext(). See above for examples.

Singular and plural forms work differently in different languages. For example English has singular and plural nouns, but Slovenian has singular, dual and plural.

Good:
```
num_people = 4
translated_string = ungettext(
    'There is one person here',
    'There are {num_people} people here',
    num_people).format(num_people=num_people)
```
Bad (this assumes that all languages have the same plural forms as English):
```
if num_people == 1:
    translated_string = _('There is one person here')
else:
    translated_string = _(
        'There are {num_people} people here').format(num_people=num_people)
```
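
To see why an `if num_people == 1` test is not enough, it can help to compare how two languages pick a plural form. The sketch below is illustrative only: both function names are hypothetical, and the Slovenian rule is the one commonly seen in gettext catalog headers, quoted from memory, so treat it as an assumption rather than a reference.

```python
def english_plural_index(n):
    """Two forms: index 0 for exactly one, index 1 otherwise."""
    return 0 if n == 1 else 1


def slovenian_plural_index(n):
    """Assumed rule: four forms chosen from the last two digits of n."""
    remainder = n % 100
    if remainder == 1:
        return 0
    if remainder == 2:
        return 1
    if remainder in (3, 4):
        return 2
    return 3


# Print the form index each rule selects for a few sample counts.
for n in (1, 2, 3, 5, 101):
    print(n, english_plural_index(n), slovenian_plural_index(n))
```

With ungettext() the choice of form is left to the catalog’s own rule, so the calling code never needs to know how many forms a language has.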
Don’t use old-style %s string formatting in Python; use the new .format() method instead.

Strings formatted with .format() give translators more context. The .format() method is also more expressive, and is the preferred way to format strings in Python 3.

Good:
```
"Welcome to {site_title}".format(site_title=site_title)
```
Bad (not enough context for translators):
```
"Welcome to %s" % site_title
```
Use descriptive names for replacement fields in strings.

This gives translators more context.

Good:
```
"Welcome to {site_title}".format(site_title=site_title)
```
Bad (not enough context for translators):
```
"Welcome to {0}".format(site_title)
```
Worse (doesn’t work in Python 2.6):
```
"Welcome to {}".format(site_title)
```
Use TRANSLATORS: comments to provide extra context for translators for difficult-to-find, very short, or obscure strings.

For example, in Python:
```
# TRANSLATORS: This is a helpful comment.
_("This is an ambiguous string")
```
In Jinja2:
```
{# TRANSLATORS: This heading is displayed on the user's profile page. #}
<h1>{% trans %}Heading{% endtrans %}</h1>
```
In JavaScript:
```
// TRANSLATORS: "Manual" refers to the user manual
_("Manual")
```
These comments end up in the ckan.pot file and translators will see them when they’re translating the strings (Transifex shows them, for example).

Note

The comment must be on the line before the line with the _(), ungettext() or {% trans %}, and must start with the exact string TRANSLATORS: (in upper-case and with the colon). This string is configured in setup.cfg.
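
The placement rule can be illustrated with a toy scanner. This is not the extractor that CKAN or Babel actually uses: collect_translator_comments is a hypothetical function that only handles Python-style # comments and shows which comments would satisfy the rule as stated in the note above.

```python
TAG = "TRANSLATORS:"


def collect_translator_comments(source):
    """Map each line that calls _() to the tagged comment directly above it."""
    found = {}
    previous = ""
    for line in source.splitlines():
        stripped = line.strip()
        if "_(" in stripped and previous.startswith("#"):
            comment = previous.lstrip("#").strip()
            # The tag must match exactly, including case and the colon.
            if comment.startswith(TAG):
                found[stripped] = comment[len(TAG):].strip()
        previous = stripped
    return found


sample = '''
# TRANSLATORS: "Manual" refers to the user manual
_("Manual")

# translators: wrong case, so this comment is ignored
_("Filter")

# TRANSLATORS: separated by a blank line, so not attached

_("Heading")
'''

print(collect_translator_comments(sample))
```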

Todo

Explain how to use message contexts, where the same exact string may appear in two different places in the UI but have different meanings.

For example “filter” can be a noun or a verb in English, and may need two different translations in another language. Currently if the string _("filter") appears in different places in CKAN this will only produce one string to be translated in the ckan.pot file.

I think the right way to handle this with gettext is using msgctxt, but it looks like babel doesn’t support it yet.

Todo

Explain how we internationalize dates, currencies and numbers (e.g. different positioning and separators used for decimal points in different languages).