Internationalization: Making the Web Accessible to Everyone

Ecma International, Technical Committee 39, or simply TC39, is a group of JavaScript developers, technology implementers, academics and other interested parties who, together with the community, support and develop JavaScript as a platform.

TC39 participants usually have something interesting to share, drawing on their deep understanding of JavaScript. But some people feel that the committee has drifted too far from the problems of ordinary developers: the language designers are in one place, and the people who write frontends every day are in another.

Here is a talk that combines depth of understanding with real practical value: Romulo Cintra's story about the internationalization problems that a new API, coming soon to JavaScript, is meant to solve.



Romulo Cintra is a TC39 delegate who has worked in development and architecture for over 10 years, specializing in web, mobile, and cloud development. In this talk, the co-chair of the MessageFormat Working Group explains first-hand which options already exist for solving today's problems, and how those problems are going to be solved by a new API in JavaScript itself.

Below you will find the full text transcript of Romulo's talk and a link to the video. If you prefer reading, this article has everything and you will not miss anything. If you have time to watch the recording, you will get about an hour of good video with interesting slides and clear English.

From here on, the text is in the speaker's own words.


There are three things you need to know about the state of internationalization and localization: everything is very, very, very bad. My name is Romulo Cintra, and I work on the architecture of financial applications. I talk a lot with people from TC39 and see how they try to make the JavaScript world a better place. In addition, I am a strong supporter of open source, and in my free time I teach at the School of Technology in Barcelona.

The issue of internationalization is very important. Our planet is home to many different peoples and languages: today there are about 195 countries and 6 thousand languages in the world. This makes our task extremely difficult. Consider this, too: I write articles and give talks in English, not in Russian, so we already have an internationalization problem. Anyone who does not speak English is excluded from our conversation. Internationalization was invented to prevent this.

In English, internationalization is abbreviated as i18n; 18 is the number of letters between the i and the n in the word. In short, internationalization means designing software so that localization becomes as simple as possible. Thanks to internationalization, software can support local settings, language, currency, and so on. Internationalization makes the web more accessible to everyone. Here you can draw a parallel with test-driven development: there the test is written first and the code second, and internationalization should be approached the same way. People usually think about internationalization after the code is written, but that is wrong.

Similar to i18n, l10n is the abbreviation for localization, and 10 is the number of letters between the first l and the last n. Localization is the adaptation of a product to the language and cultural environment in which it is distributed. That is, you need to not only translate "Hello" into the user's language, but also use the local currency, decimal separator, and so on, making the software feel more familiar to the user. It is more than just translation.

How many languages do your web projects support? Many have more than two. Is there anyone who has more than five? What about 15? We support about 25 languages. Our support is not very good, because internationalization is not organized in the best way. In the course of this talk, I will explain how to improve internationalization and describe the measures we are taking.
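To make the point about currency and decimal separators concrete, here is a minimal sketch using the built-in Intl.NumberFormat. The amount and the two locales are illustrative assumptions, not values from the talk.

```javascript
// The same amount, localized for two audiences.
// The price and the locale/currency pairs are assumptions chosen for illustration.
const price = 1234.56;

const usd = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'USD',
}).format(price);

const eur = new Intl.NumberFormat('de-DE', {
  style: 'currency',
  currency: 'EUR',
}).format(price);

console.log(usd); // → $1,234.56
console.log(eur); // → 1.234,56 € (the space before the sign is a non-breaking space)
```

Note that the grouping separator, the decimal separator, and the position of the currency sign all change with the locale; none of this is covered by translating strings alone.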

So, I repeat once again: internationalization means simplifying localization by supporting it at the architecture level, and localization is the adaptation of software to local realities. Translation very often fails to match the original. Take an example from the film industry, where the title “Pain and Gain” was translated as “Blood and Sweat: Anabolics”.



Or another example: an advertisement for a Russian bath whose English translation reads “Russian crematorium”.



I doubt that this will attract customers, at least living ones. Internationalization and localization are so important precisely because they let us convey exactly what we want to say to the user. In essence, internationalization is a matter of accessibility: if you cannot understand the software you work with, your options are limited. More accessible software benefits everyone, because it reaches a wider range of users and therefore brings in more income; it also makes the web better.

MessageFormat


Consider the code examples:

// Translated strings, keyed by locale and then by message key.
const MESSAGES = {
    'es-ES': {
        HELLO_WORLD: '¡Hola mundo!'
    },
    'en-GB': {
        HELLO_WORLD: 'Hello world!'
    }
};

We need to translate our strings. We have a key, HELLO_WORLD, with a corresponding string in each language. For this kind of translation, many languages (for example, Java) have MessageFormat. Let's try to figure out what it is.

First, a little about some basic technologies, starting with Unicode. This is a standard that creates a single space for characters from different languages. Let's draw an analogy with chess: each type of piece can have a different shape, but we always know exactly where it belongs on the board. Of course, there are different Unicode encodings that use different numbers of bytes: UTF-8, UTF-16, and UTF-32. The encoding most often declared in the meta tag today is UTF-8. With Unicode the browser can display special characters; if you forget the meta tag, nobody will be able to tell what characters are on the page.

In addition to Unicode, two other important technologies are CLDR and ICU. CLDR is a kind of database of alphabets, countries, currencies, time zones, and so on, stored in the different languages of the world. Not all 6 thousand languages are present in it; work on this database is still underway. The last update was last October. Another important project is ICU. It is a huge collection of data about words, numbers, and symbols from different languages, exposed as methods for sorting, normalizing, formatting, and so on. These libraries are used in many programming languages. In JavaScript, ICU is at the core of the Intl API. But the world's languages contain so much diverse material that needs to be displayed in browsers that the work of including it in these standards is far from complete.

MessageFormat is a format that lets you associate a specific key with a specific message in a specific language. In some cases you can pass variables to MessageFormat; it resolves them and inserts them into the final string. Other platforms have solved the same problem in slightly different ways. On Android, MessageFormat is implemented in Java. No special library is needed to work with the format there; Android can handle it itself. iOS has an API that is very similar to the one in JavaScript. It is built into the system, so there is nothing to download there either: you just pass the string you need to a method of this API.

How has this problem been solved in JavaScript? It has not been, yet. But we have many libraries that offer a solution.
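The idea of "a key, a locale, and some variables" can be sketched in a few lines of plain JavaScript. This is only an illustration of the lookup-and-substitute step, not ICU MessageFormat and not the API of any library mentioned in the talk; the names CATALOG and translate are hypothetical.

```javascript
// Hypothetical message catalog: locale → message key → template.
const CATALOG = {
  'en-GB': { GREETING: 'Hello, {name}!' },
  'es-ES': { GREETING: '¡Hola, {name}!' },
};

// Hypothetical helper: finds the template for a locale and key,
// then replaces each {placeholder} with the matching value.
function translate(locale, key, values = {}) {
  const template = CATALOG[locale]?.[key];
  if (template === undefined) {
    return key; // Fall back to the key itself when no translation exists.
  }
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in values ? String(values[name]) : match);
}

console.log(translate('es-ES', 'GREETING', { name: 'Ana' })); // → ¡Hola, Ana!
console.log(translate('en-GB', 'GREETING', { name: 'Ana' })); // → Hello, Ana!
```

A real MessageFormat implementation goes much further (plurals, selects, number and date formatting inside the message), which is exactly why the libraries below exist.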



The chart shows the download counts of the most popular of them (plus two less popular ones, Fluent from Mozilla and fbt from Facebook). Together they are downloaded almost two million times every week, so the need for internationalization libraries is real.

Libraries


We’ll briefly introduce some of these libraries, starting with i18next. Its developers have made many changes to MessageFormat, and it does not fully follow ICU. However, it is a very good library. Its implementation of MessageFormat has many advantages, for example, the ability to use string interpolation (which is not available in the ICU format). There are also disadvantages: for example, the plural forms of a message cannot be placed in the same string as the singular form, as ICU allows.

One of the best-known internationalization libraries is intl-messageformat. It is downloaded more than 700 thousand times every week. It is maintained by my colleague, Long Ho. Its popularity is explained by the fact that react-intl is built on top of it, so if you use React, you most likely already have this library. Its developer also participates in ECMA-402, and therefore tries to comply with the ICU standard.

var MESSAGES = {
    'en-US': {
        NUM_PHOTOS: 'You have {numPhotos, plural, ' +
            '=0 {no photos.}' +
            '=1 {one photo.}' +
            'other {# photos.}}'
    },
    'es-MX': {
        NUM_PHOTOS: 'Usted {numPhotos, plural, ' +
            '=0 {no tiene fotos.}' +
            '=1 {tiene una foto.}' +
            'other {tiene # fotos.}}'
    }
};

Its syntax is very similar to MessageFormat: you can pass variables and specify plural forms.

Before moving on to the code examples, I will mention two more new libraries that are now in fashion; they were created by Facebook and Mozilla.



I cannot show the whole API here, but take my word for it: the developers of these libraries did their best, and they offer exactly what we are missing right now. True, Facebook did it in its own style: its own markup, the ability to run during layout, and the extraction of hash maps of strings that can then be translated automatically. The problem is that all of this targets a scale at which the average programmer rarely works. The project is very young, and its authors want to integrate it with other well-known libraries, such as React. It is likely to gain popularity in the future.

All of the above are libraries that have to be downloaded separately; they are not built into the browser. With the browser alone we cannot get far, which is why the state of localization is so bad. MessageFormat can help us change this. We cannot use it yet, but believe me: the future belongs to it. We are actively working on it now, identifying stakeholders and looking for fresh ideas for the new MessageFormat. In its original version the format is already outdated; the needs of developers have evolved significantly since its inception. The new format should be efficient and easy to use.

Intl.DateTimeFormat


Browsers already have many built-in mechanisms for internationalization and localization; most developers simply do not know about them and do not use them. Have you heard of Intl.DateTimeFormat? We are constantly adding new APIs to this project. You may well no longer need Moment.js, Day.js, or date-fns.

const myDate = new Date();
new Intl.DateTimeFormat('ru', { timeStyle: 'short' }).format(myDate);
// Output for each timeStyle value:
// short → 19:49
// medium → 19:49:17
// long → 19:49:17 GMT+2
// full → 19:49:17 Центральная Европа, летнее время

There is the timeStyle option, which was created a few months ago and lets you format the date and time without resorting to Moment.js. In addition, there is a formatRange method. Any task that involves choosing a date range (such as on sites with a reservation function) is always difficult, but the browser already has a method for it. Most importantly, this method supports internationalization and removes the need to download additional libraries.
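Here is a minimal sketch of formatRange for a booking-style date range. The dates, the locale, and the variable names are assumptions made for illustration; the exact punctuation of the result depends on the locale data shipped with your runtime, so no exact output is shown.

```javascript
// Formatter for a hotel-stay style date range.
// timeZone is fixed to UTC so the result does not depend on the machine's zone.
const stayFormatter = new Intl.DateTimeFormat('en-US', {
  year: 'numeric',
  month: 'long',
  day: 'numeric',
  timeZone: 'UTC',
});

// Illustrative dates: 10 May 2020 to 14 May 2020 (months are zero-based).
const checkIn = new Date(Date.UTC(2020, 4, 10));
const checkOut = new Date(Date.UTC(2020, 4, 14));

// formatRange returns one localized string that covers both dates.
const stay = stayFormatter.formatRange(checkIn, checkOut);
console.log(stay);
```

The formatter decides how to join the two dates for the locale, and typically avoids repeating parts (such as the month or year) that both dates share.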

Intl.RelativeTimeFormat


I worked on the documentation for the second part of this project, and if you want to take part too, we need help with translating it into Russian and checking it against the standards. RelativeTimeFormat is what you need when you have to do a countdown.

const myTime = new Intl.RelativeTimeFormat('ru', { style: 'narrow' });
myTime.format(2, 'quarter');
// Style narrow: +2 кв. → in 2 qtrs. → dentro de 2 trim.
// Style long: через 2 квартала → in 2 quarters → dentro de 2 trimestres

This is now quite simple to do: you can express a time two days, two weeks, or a quarter from now, and so on. Previously, such formatting did not exist on the web.

const myTime = new Intl.RelativeTimeFormat('ru', { style: 'narrow' });
myTime.format(2, 'day');
// Style narrow: +2 дн. → in 2 days → dentro de 2 días
// Style long: через 2 дня → in 2 days → dentro de 2 días
myTime.format(-1, 'day');
// Style narrow: -1 дн. → 1 day ago → hace 1 día
// Style long: 1 день назад → 1 day ago → hace 1 día
// numeric: 'auto': вчера → yesterday → ayer

Here is an example in Russian. You yourself can test the operation of this code, because it is already in your browser.

const myTime = new Intl.RelativeTimeFormat('ru', { style: 'narrow' });
myTime.format(20, 'seconds');
// Style narrow: +20 с → in 20 sec. → dentro de 20 s
// Style long: через 20 секунд → in 20 seconds → dentro de 20 segundos

This method is very useful: it can output time in the short format you see above. I emphasize that none of this requires any third-party libraries.
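For a countdown or a "posted some time ago" label you usually start from a difference in milliseconds rather than a ready-made unit. The sketch below shows one way to bridge that gap. The helper name timeFromNow and the unit table are hypothetical; only Intl.RelativeTimeFormat itself is built in.

```javascript
// Hypothetical unit table, largest unit first (values in milliseconds).
const RELATIVE_UNITS = [
  ['day', 24 * 60 * 60 * 1000],
  ['hour', 60 * 60 * 1000],
  ['minute', 60 * 1000],
  ['second', 1000],
];

// Hypothetical helper: formats a signed millisecond offset
// (negative = past, positive = future) using the largest unit that fits.
function timeFromNow(diffMs, locale = 'en') {
  const formatter = new Intl.RelativeTimeFormat(locale, { numeric: 'auto' });
  for (const [unit, unitMs] of RELATIVE_UNITS) {
    if (Math.abs(diffMs) >= unitMs) {
      return formatter.format(Math.trunc(diffMs / unitMs), unit);
    }
  }
  // Less than one second away: report it in seconds.
  return formatter.format(Math.trunc(diffMs / 1000), 'second');
}

const DAY_MS = 24 * 60 * 60 * 1000;
console.log(timeFromNow(-DAY_MS));           // → yesterday
console.log(timeFromNow(2 * DAY_MS));        // → in 2 days
console.log(timeFromNow(-3 * 60 * 60 * 1000)); // → 3 hours ago
console.log(timeFromNow(-DAY_MS, 'es'));     // → ayer
```

With numeric set to 'auto' the formatter prefers words such as "yesterday" where the locale has them, which matches the last line of the Russian example above.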

Intl.NumberFormat


The next thing I want to share is Intl.NumberFormat. I will talk about the third stage, although the examples show only the second, because some changes are still being discussed. Intl.NumberFormat works with units and notations. It is worth paying attention to what it does with units: it lets you work with different display styles.

new Intl.NumberFormat("ru", {
    style: "unit",
    unit: "liter",
    unitDisplay: "long"
}).format(16);
// → 16 литров → 16 liters → 16 litros

All units are taken from UTS #35, and there are a lot of them: about 140 units are available for formatting. So internationalization is now easier than ever. You just need to translate your strings, and all the necessary dynamic behavior is already in the browser.
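As a small addition to the liter example, the sketch below formats the same kind of value in English and also shows a compound unit. The values and variable names are illustrative assumptions; the unit identifiers themselves ('liter', 'kilometer-per-hour') are ones the API accepts.

```javascript
// A simple unit, written out in full.
const volume = new Intl.NumberFormat('en', {
  style: 'unit',
  unit: 'liter',
  unitDisplay: 'long',
}).format(16);

// A compound unit: two simple units joined with "-per-".
const speed = new Intl.NumberFormat('en', {
  style: 'unit',
  unit: 'kilometer-per-hour',
  unitDisplay: 'short',
}).format(50);

console.log(volume); // → 16 liters
console.log(speed);  // → 50 km/h
```

Switching the locale argument is all it takes to get the same values in another language; the unit names come from the runtime's locale data, not from your own translations.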

const nbr = 987654321;
new Intl.NumberFormat('ru', { notation: 'scientific' }).format(nbr);
// → 9,877E8 → 9.877E8 (en-US)
new Intl.NumberFormat('ru', { notation: 'engineering' }).format(nbr);
// → 987,654E6 → 987.654E6 (en-US)
new Intl.NumberFormat('ru', { notation: 'compact' }).format(nbr);
// → 988 млн → 988M (en-US) → 9.9亿 (zh-CN)
new Intl.NumberFormat('ru', { notation: 'compact', compactDisplay: 'long' }).format(nbr);
// → 988 миллионов → 988 millions (fr)

Now for notations. To be honest, I do not use them very often, because I do not work with exponent (scientific) notation and do not need to display large numbers. But if you need it, there is an API made just for you.

Intl.ListFormat


Another useful API is Intl.ListFormat, which is already at the third stage and lets you format lists in two different ways. Suppose I need to say where I am going for HolyJS. We can make a list that contains the strings “Moscow” and “St. Petersburg”, specify the type “conjunction”, and the strings will be joined with the Russian conjunction “and”. This is a completely new feature, and a very useful one.



If you specify “disjunction”, we get the conjunction “or” instead.



Finally, the formatter follows the conventions of the locale you pass in, such as the joining word and the punctuation between items. Sorting the items themselves is a separate task, handled by Intl.Collator.
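The two list types described in this section can be sketched as follows. The city names come from the talk; the variable names are assumptions.

```javascript
const cities = ['Moscow', 'St. Petersburg'];

// "conjunction" joins the items with the locale's word for "and".
const andList = new Intl.ListFormat('en', {
  style: 'long',
  type: 'conjunction',
}).format(cities);

// "disjunction" joins the items with the locale's word for "or".
const orList = new Intl.ListFormat('en', {
  style: 'long',
  type: 'disjunction',
}).format(cities);

// The same conjunction in Russian: only the locale changes.
const ruList = new Intl.ListFormat('ru', {
  style: 'long',
  type: 'conjunction',
}).format(['Москва', 'Санкт-Петербург']);

console.log(andList); // → Moscow and St. Petersburg
console.log(orList);  // → Moscow or St. Petersburg
console.log(ruList);  // → Москва и Санкт-Петербург
```

The items are joined in the order you pass them, so any ordering has to happen before the call to format.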

Intl.PluralRules


Another important API is Intl.PluralRules. This API is the oldest of all, but for some reason no one uses it.



When I see lists of finalists in races or in football, the places are always shown as bare numbers next to the names: “1”, “2”, “3”, and so on. But that is not how we say it. It would be much closer to natural speech to write “1st”, “2nd”, “3rd”. There are special APIs for this, and they are not so difficult to use.



For example, we can write the phrases “1 cat”, “0 cats”, “0.5 cats”, “1.5 cats”, and the API will tell us which plural form to use for each number.
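Both cases from this section, the cats and the finishing places, can be sketched with Intl.PluralRules. The API only returns a plural category such as "one" or "other"; mapping that category to a word ending is up to you. The lookup tables and helper names below are hypothetical.

```javascript
// Cardinal rules answer "how many?": 1 cat, 2 cats.
const cardinalRules = new Intl.PluralRules('en');
const CAT_FORMS = { one: 'cat', other: 'cats' }; // hypothetical lookup table

function countCats(n) {
  return `${n} ${CAT_FORMS[cardinalRules.select(n)]}`;
}

// Ordinal rules answer "which one?": 1st, 2nd, 3rd, 4th.
const ordinalRules = new Intl.PluralRules('en', { type: 'ordinal' });
const ORDINAL_SUFFIXES = { one: 'st', two: 'nd', few: 'rd', other: 'th' };

function place(n) {
  return `${n}${ORDINAL_SUFFIXES[ordinalRules.select(n)]}`;
}

console.log([1, 0, 0.5, 1.5].map(countCats));
// → [ '1 cat', '0 cats', '0.5 cats', '1.5 cats' ]
console.log([1, 2, 3, 4, 11, 22].map(place));
// → [ '1st', '2nd', '3rd', '4th', '11th', '22nd' ]
```

The English tables are tiny, but other languages have more categories (Russian uses "one", "few", "many", and "other" for cardinals), which is why the selection logic is better left to the browser.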

Intl.DisplayNames


This is one of the most popular APIs, because we very often have to display lists of countries. Suppose we have a list of countries, for example in a database or in JSON. Then every time the language is switched, we need to load a separate JSON file with a new list of countries, currencies, and so on. There are too many of these JSON files, and how does it end? We create a microservice with a built-in database of various languages and pull all the data from it. Of course, with a list of countries we are lucky and rarely need to update the data, but it will not always be like that, right? We cannot solve all problems at once, but DisplayNames solves some of them. You have an API like the one in the example below, and you can request only the names of currencies or only the names of countries:

const currencyNames = new Intl.DisplayNames(['en'], {type: 'currency'});
currencyNames.of('USD'); // "US Dollar"
currencyNames.of('EUR'); // "Euro"
currencyNames.of('TWD'); // "New Taiwan Dollar"
currencyNames.of('CNY'); // "Chinese Yuan"


const languageNames = new Intl.DisplayNames(['en'], {type: 'language'});
languageNames.of('fr'); // "French"
languageNames.of('de'); // "German"
languageNames.of('fr-CA'); // "Canadian French"
languageNames.of('zh-Hant'); // "Traditional Chinese"

This is a very useful thing. It works not only with countries and currencies: in the same way, you can work with months, days of the week and many other things that you as a developer need.
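The country list mentioned at the start of this section can be sketched the same way with the 'region' type. The region codes and the helper name regionNames are assumptions chosen for illustration.

```javascript
// Region codes we want to display; the choice of codes is illustrative.
const REGION_CODES = ['US', 'ES', 'PT'];

// Hypothetical helper: returns the display names of the regions
// above in the requested language, with no JSON files to load.
function regionNames(locale) {
  const names = new Intl.DisplayNames([locale], { type: 'region' });
  return REGION_CODES.map((code) => names.of(code));
}

console.log(regionNames('en')); // → [ 'United States', 'Spain', 'Portugal' ]
console.log(regionNames('es')); // → [ 'Estados Unidos', 'España', 'Portugal' ]
```

Switching the interface language then becomes a matter of passing a different locale, instead of fetching another translated list from a server.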

Results and plans for the future


So far, we have talked about existing APIs. Let's move on to our plans for the future. My mother tongue is Portuguese, so on my sites I need to support at least Portuguese and English. And since we are very close to Spain, Spanish is also useful. Portugal is a very small country, and France is not so far away either, so it would be nice to add French to this list.

MessageFormat is very relevant for us, and it will appear soon. There are libraries, and there are developers who work on them. All of these developers are working on related issues. Most creators of the most popular libraries and most large companies (Netflix, Amazon, Facebook) agree on at least one thing: there is an urgent need for internationalization right now. Two million downloads per week point to the same thing. So now we can afford to write MessageFormat again, and to do it properly.

Who will benefit from proper internationalization? The whole web: all companies, all projects, all libraries. Libraries like intl-messageformat will not disappear, but they will start to work in a new way. You will not need to download data, since all the data will already be in the browser. Most likely, you will not need to switch to a new library. Some of these libraries already act as polyfills for some implementations. Some of the APIs I mentioned are at the third stage and are not yet implemented in all browsers, but libraries like intl-messageformat provide polyfills for this functionality. In general, a new chapter in the history of the web is coming, a real revolution. The web will become accessible and understandable to everyone. This is extremely important.

I believe it is very important for our project to arrive at a single, unified format. If there is one format that can be used in C++, Java, and JavaScript, why not use it everywhere? When we write web pages, we often need to create mobile versions of them, and then we have to do a lot of work twice. If we had one format for everything, we could simply reuse the existing resources and APIs. We need a new level of integration with tools. Internationalization does not depend only on the developers who work on it directly. Modularity is extremely important for it, because it is often convenient to use your own formatting tools and your own code. So the APIs should not be closed: they must be open, so that you can plug in whatever the situation requires.

Another important point: these APIs must be native. CLDR provides the data that the internationalization API needs. If you are running Windows or macOS, you already have data from CLDR on your machine. CLDR is a unique repository; nobody duplicates its function. This means the data can be loaded only once and shared across the entire operating system. If all the data for the Intl API is already loaded in the operating system, why not provide it to all the software on that system?

Our experience has taught us to remember that we are not alone: programmers are not the only people working on internationalization. We are developers, not translators. Suppose we need to translate the strings in our interface, so we send them to a translation company. But translators often have no context for these strings, and MessageFormat does not carry that context either. Sometimes this leads to errors, as we have already seen in the example with the Russian crematorium.

Finally, I believe that internationalization APIs should be easy to use, and everyone should be able to do internationalization; it should not take too much time and effort. Internationalization should guide you from the very beginning of writing code. After all, with TDD you first write a test and then the code; let's start our web projects on the same principle, with proper internationalization and localization. This will let us create sites that are convenient and accessible to everyone.
HolyJS 2020 Piter: «Speak my language %app%».
