Making all domain names work in all applications
Universal Acceptance (UA) is the idea that all domain names and all email addresses should work in all software applications. The early Internet was limited to the ASCII character set, and as first standardised, names in the Domain Name System were restricted to the Latin characters a-z, the digits 0-9, and the hyphen character “-”. The objective of UA is to ensure that non-Latin script domain names, including new Top Level Domains and Internationalised Domain Names, operate as successfully as those in the ASCII character set.
Unfortunately, most IDNs, while registered in the DNS and available for resolution, do not work like older ASCII domain names. When people use domain names in their native languages, many websites do not recognise email addresses built on those names as valid, and those people cannot complete their transactions.
In order for devices and systems to be truly multilingual, they must accept, validate, store, process and display all domain names. Until Universal Acceptance is achieved, it is not possible to provide a consistent and positive experience for all Internet users.
Improving linguistic diversity online
IDNs in application
For IDNs to be universally accepted, they must be able to be used anywhere a traditional domain name is used. However, many web applications and software tools make assumptions about domain names and email addresses.
One of the great challenges facing universal acceptance of IDNs is ensuring that the hardcoded assumptions built into applications don’t create barriers to the use of IDNs and EAI addresses.
One of the few bright spots for IDN Universal Acceptance is in the use of IDNs as content. If an application or service displays a URL, it should recognize that it is a link to an external resource and do the expected action when the text is clicked upon.
When IDNs appear as part of a web page, they should display and function in the same way as any other URL. Also, web pages that have IDNs as “content” should handle the URL appropriately.
As the number of devices – the Internet of Things – connected to the Internet becomes ever larger, the pressure to control, deploy, update and retrieve information from those devices will be intense. Using the DNS to identify some of those devices will be the best choice in some deployments. Without universal acceptance, internationalisation of the infrastructure of the Internet of Things will be far more difficult.
Emails and IDNs
For universal acceptance, two major challenges stand in the way of true internationalisation of the Internet: the use of IDNs as personal identifiers, and Email Address Internationalisation (EAI).
Email addresses consist of two parts separated by an “@” symbol. At the front of the “@” symbol is a string called the user portion (technically known as the “local-part”). Behind the “@” symbol is usually a domain name. To achieve universal acceptance of IDNs we should be able to use any IDN as the domain name and also use the non-ASCII script for the user portion. Not only should we be able to address email with these internationalised addresses, but we should be able to send and receive them as well. A system that allows a user to address email, and send and receive it with IDNs is often referred to as Email Address Internationalisation (EAI).
We should be able to use email addresses that look like:
It is crucial to understand that this does not just apply to the top-level of the domain name. In a fully EAI compliant system, we should be able to use IDNs anywhere in the domain name string where they are legally permitted by IDN and registry standards.
For all its utility, electronic mail is surprisingly complex. Even the human-facing component can be a standalone piece of software (e.g. Outlook), a web page (e.g. the basic interface to Gmail), or a mobile client that simply queries a server and synchronises its view of the available email with that of the server. On the server side, there are two major ways to arrange for the pickup of electronic mail. Finally, in between the sender and receiver are mail transfer agents (MTAs) that arrange for the forwarding of email from one place to another.
Electronic mail works because all of these components are standardised – they are interoperable. Interoperability means that any computer, running any software, can connect any way it likes to the Internet and send and receive email – as long as it abides by the standards for email. Electronic mail is so complex that the Internet Engineering Task Force (the standards development organisation) has published a guide to understanding the entire ecosystem.
The standards that make up the traditional electronic email ecosystem are very old (the basic Internet email format was codified in 1977). The installed base of servers and clients is also extremely large. Those two facts, taken together, make change to the email ecosystem very challenging.
Older, legacy, email messages consist of three parts: the envelope, the headers and the body.
The envelope of the message contains metadata or information about the message (e.g. when it was received, the size of the message, how important it is, etc.).
The header is a set of fields such as the sender address, the subject, the date the message was sent and other information provided by the sender of the message.
The body of the message contains the text of the message and any attachments.
When a message is sent, the sender provides “From:” and “To:” addresses as well as a “Subject:” and the content of the message. As we have seen above, the legacy address is of the form local-part@domain name. The domain name part is an LDH-based string and the local part can be arbitrary ASCII characters. The fundamental problem for EAI is changing email so that it can use internationalised scripts in both the local part and the domain name.
Because the underlying system has so many parts, solving the EAI problem is quite complex.
At a high level, solving the problem seems simple. When a user agent (e.g. Outlook) wants to send an EAI message to another user, it needs to be sure that all the infrastructure between sender and receiver, plus the receiver itself, can handle the email. For one of the two major mail protocols, SMTP, the solution is in an extension to the older email standards. Computer applications can test whether the receiving computer can support the extension that supports EAI, called SMTPUTF8.
That seems easy, but what happens when a sender can’t find a receiver that handles SMTPUTF8? What happens when the message is delivered to the recipient, but they do not have a user agent that can handle EAI?
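The capability test described above can be sketched in a few lines. This is only an illustration, assuming the sender already holds the text of a server's reply to the EHLO command; the function name supports_smtputf8 and the two sample replies are hypothetical. In a live session, Python's smtplib offers a comparable check through the has_extn method of an SMTP connection.

```python
def supports_smtputf8(ehlo_response: str) -> bool:
    """Return True if an EHLO reply advertises the SMTPUTF8 extension."""
    lines = ehlo_response.strip().splitlines()
    # The first line is the server greeting; extension keywords follow,
    # one per line, each after a "250-" or "250 " prefix (4 characters).
    for line in lines[1:]:
        tokens = line[4:].split()
        if tokens and tokens[0].upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical replies from two servers (not captured from real hosts).
modern = "250-mail.example.org\n250-8BITMIME\n250-SMTPUTF8\n250 SIZE 10240000"
legacy = "250-mail.example.net\n250-8BITMIME\n250 SIZE 10240000"

print(supports_smtputf8(modern))  # True
print(supports_smtputf8(legacy))  # False
```

When the check returns False, the sender still has to decide what to do with the message, which is exactly the open question raised above; the sketch does not attempt to answer it.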
In fact, the technical, standardised solution for EAI has been around for more than a decade. However, the related infrastructure and deployment challenges remain. Two crucial challenges need to be overcome before EAI is achievable:
The client software (for instance, Outlook or Thunderbird) needs to be able to display, process and store the internationalised address. For instance, the client software should display EAI addresses in Unicode, while passing the domain name to the mail server in Punycode.
The server software must support EAI and allow for the transfer of the mail in a manner that preserves the EAI address.
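The client-side behaviour described above (Unicode for display, Punycode for the domain name handed to the server) might look like the following sketch. The helper names split_address, wire_form and display_form, and the sample address, are assumptions made for illustration. The conversion relies on Python's built-in "idna" codec, which follows the older IDNA 2003 rules, so a production client may need a library that implements the current standard.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address at the last "@" into (local part, domain)."""
    local_part, _, domain = address.rpartition("@")
    if not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local_part, domain


def wire_form(address: str) -> str:
    """Keep the local part as typed; convert the domain to A-labels."""
    local_part, domain = split_address(address)
    return f"{local_part}@{domain.encode('idna').decode('ascii')}"


def display_form(address: str) -> str:
    """Convert an A-label domain back to Unicode for display."""
    local_part, domain = split_address(address)
    return f"{local_part}@{domain.encode('ascii').decode('idna')}"


sent = wire_form("jörg@bücher.example")
print(sent)                # jörg@xn--bcher-kva.example
print(display_form(sent))  # jörg@bücher.example
```

Note that the local part stays in Unicode in both forms: Punycode applies only to the domain name, which is why the server software must support EAI as well.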
IDNs and Browsers
For many people, the browser is the essential and principal human interface to the Internet. Because of this, IDN support in browsers is essential.
Support for IDNs in major commercial browsers is excellent. In the past, a browser was a separate piece of software (an “application”) that made requests of web servers and rendered the results of those requests on a display. Today’s browser appears in cars, on tablets, on watches and in settings that wouldn’t have been imagined ten years ago. Even so, IDN support remains crucial. In today’s Internet, a human uses a piece of software to interact dynamically with services and content, and that software acts as an intermediary between complex software, content and people. That we call such a tool a ‘browser’ is more of a nod to the history of the Internet than a reflection of what that tool actually does.
If IDNs are to be usable everywhere, then the marketplace’s emphasis on portability and size needs to also reflect acceptance of IDNs. In particular, smartphones, tablets, e-readers and other portable devices should show the same progress in accepting, using and displaying IDNs as desktops and laptops do.
A History of Universal Acceptance Challenges
The early Internet was limited to the ASCII character set. As we have seen elsewhere, this had dramatic implications for the development of a multilingual Internet, in both the operations and the content of the Internet. The technical ability to transmit and display content in other scripts – to support the internationalisation of the Internet – started in the 1990s. However, the Domain Name System (DNS) lagged behind.
As first standardised, names in the DNS were limited to a subset of ASCII characters including the letters a-z, the digits 0-9 and the hyphen character (“-”). All registrations in the DNS were, for a long time, limited by this so-called LDH restriction. Support for a more diverse character set was first provided in a set of standards published by the Internet Engineering Task Force (IETF) in 2003. However, the publication of the standard did not mean that IDNs were widely available or usable.
The World Report on IDNs studies the availability and trends for registrations of IDNs. This section focuses on how usable those IDNs are once they are registered and resolvable.
Universal Acceptance (UA) is a metric. It is a measure of how well IDN domain names are accepted, displayed, stored and processed by the Internet’s applications and infrastructure. In previous research we have said that UA is a measure of how ‘usable’ an IDN is. UA measures the ability of IDNs to be used in the same way as traditional domain names. Another definition might be: UA is the state where an IDN can appear and be used anywhere an ASCII domain name appears, with predictable, reliable and appropriate results.
The DNS is a fundamental yet evolving part of the Internet’s infrastructure. The ability to use a domain name as part of a query for other information is a part of the Internet we often take for granted. The ubiquity of the DNS has been a source of the UA challenges for IDNs. For some time, application developers for the Internet presumed, erroneously, that all domain names ended with two or three characters – and that those characters were always ASCII.
That misunderstanding has led to the principal problem for Universal Acceptance: while it may be possible to register and resolve IDNs, if software and Internet infrastructure do not accept, process, display and use those IDNs correctly, the IDNs remain unable to fulfil the potential of a truly rich, multilingual Internet.
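The hardcoded assumption described above is easy to reproduce. The sketch below contrasts a validator that expects an ASCII name ending in two or three letters with a more tolerant syntax check. Both patterns and both function names are illustrative assumptions, not taken from any particular application, and the tolerant check uses Python's built-in "idna" codec (IDNA 2003 rules). It checks syntax only and says nothing about whether a name is actually registered.

```python
import re

# The outdated assumption: ASCII labels and a TLD of two or three letters.
NAIVE_PATTERN = re.compile(
    r"^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,3}$", re.IGNORECASE
)

# An LDH label: letters, digits and hyphens, 1 to 63 characters long,
# with no hyphen at the start or the end.
LDH_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)


def naive_is_valid(domain: str) -> bool:
    """Validator built on the two-or-three-ASCII-letter assumption."""
    return NAIVE_PATTERN.match(domain) is not None


def tolerant_is_valid(domain: str) -> bool:
    """Syntax check only: convert to A-labels, then test each label."""
    try:
        ascii_form = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    labels = ascii_form.rstrip(".").split(".")
    return len(labels) >= 2 and all(LDH_LABEL.match(label) for label in labels)


for name in ["example.com", "example.photography", "例え.テスト"]:
    print(name, naive_is_valid(name), tolerant_is_valid(name))
```

The naive validator accepts only the first name, while the tolerant one accepts all three. A long ASCII top-level domain fails the naive check just as the IDN does.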
So what is the source of the problem?
To support IDNs, the DNS had to accommodate non-ASCII characters. To do this, it evolved to support the Universal Coded Character Set, known as Unicode (and most commonly encoded as UTF-8). Since the DNS was only built to support ASCII characters, there needed to be a translation from Unicode to ASCII strings – as well as a translation in the other direction. The Unicode-to-ASCII translation is called Punycode, and it turns each internationalised label into an ASCII string called an “A-label.” An A-label is easy to recognise because it always starts with the ASCII characters “xn--”.
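The two-way translation can be tried out with Python's built-in "idna" codec. This is a minimal sketch: the helper names to_a_label and to_u_label are made up for the example, and the codec follows the older IDNA 2003 rules, so results for some characters may differ from those of a library implementing the later standard.

```python
def to_a_label(domain: str) -> str:
    """Return the ASCII (A-label) form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Return the Unicode form of a domain name given in ASCII."""
    return domain.encode("ascii").decode("idna")


ascii_form = to_a_label("bücher.example")
print(ascii_form)              # xn--bcher-kva.example
print(to_u_label(ascii_form))  # bücher.example
```

Only the label that contains non-ASCII characters is converted; the plain ASCII label "example" passes through unchanged.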
Older parts of the Internet have no problem processing, supporting and displaying A-labels because they are simply parts of an ASCII domain name. The problem for IDNs emerges when the Unicode characters are used, stored or displayed by the Internet’s infrastructure. Older software and applications do not have the ability to store, process and display the Unicode characters properly. As we have seen, even newer software fails to take the Unicode labels into account. The result is that the IDNs, while registered in the DNS and available for resolution, do not work like the older ASCII domain names. Simply stated, IDNs face a barrier not faced by other domain names.
Universal Acceptance is a problem that is not limited to IDNs. With the expansion of the DNS root zone starting in 2013, many new top-level domain names appeared in the public Internet. Many have characteristics (especially string-length) that make previous assumptions about top-level domain names fail. As we have seen in previous years, this is a problem that significantly affects user account creation and validation.
In addition, IDNs appear in strings that are not used in the traditional DNS. It is very common to use a domain name as part of a username identifier. Domain names also form a crucial part of public email addresses, and they appear in Internet infrastructure settings such as digital certificates. With the advent of the Internet of Things, domain names have the potential to be an identifier for many billions of devices and sensors. For internationalisation to succeed, IDNs must be accepted in these diverse settings, which go well beyond the confines of the DNS.
The standard for Internationalised Domain Names was largely completed in 2008. However, the Unicode standard, upon which IDNs are built, is an evolving and changing document. When Unicode is changed, new characters appear and properties of existing characters (or “codepoints”) are modified. In 2018, the IETF was at work bringing the IDN standard up to date to reflect changes in Unicode. In addition, the IETF is currently developing a new standard that would help registries validate an IDN prior to registration.