Modality Translation Services on Demand Making the World More Accessible For All

  • Title: Modality Translation Services on Demand Making the World More Accessible For All
  • Publication Type: Conference Proceedings
  • Authors: Zimmermann, G., & Vanderheiden, G.

Full Text


Thomas Jefferson's words, "Information is the currency of democracy," pertain to today's information society more than ever. Exclusion from information can keep a person from fully participating in society. The problem is not a shortage of information. Indeed, we may often experience information overload. The question is, how can we access information in the way we need (in the appropriate "currency") to be able to use it? For example, it would be inappropriate for a person to visually read e-mail while driving a car because the eyes are busy watching the road and traffic. However, the driver could use a "text-to-speech service" to voice the e-mail messages. Another example is a blind person participating in a business meeting where a diagram is being discussed. Here, an "image description service" could provide a verbal translation for the visual diagram.


The "Modality Translation Services" concept is a variety of remote services available anywhere, anytime [1]. These services are becoming possible as a result of recent technological advancements in wide-area, high-bandwidth networks and wireless communication technologies.

Service Spectrum

Modality translation services render information from one specific presentation form (mode) into another. Within the wide spectrum of possible services, each service is tailored to a person's communication needs arising from temporary or permanent functional limitations (see figure 1).

  • Text-to-Speech facilitates eyes-free interaction and requires no reading skills for the user. A human reader, or a speech synthesizer, could deliver this service.
  • Speech-to-Text facilitates real-time ears-free interaction without requiring typing or writing by the user. A human steno-typist, or automatic or human-assisted voice recognition technology, could deliver this service.
  • Speech-to-Sign facilitates communication between a speaking person and a deaf person who signs. A human sign interpreter, or a signing avatar (computer-animated character on a display), could deliver this service.
  • Sign-to-Speech facilitates real-time communication between a deaf person who signs and a non-signing (hearing) person. A human sign interpreter, or an image and sign recognition system, could deliver this service.
  • International Language translates text, or real-time speech, from one language to another. A human language interpreter, or a machine translation system, could deliver this service.


Figure 1: Modality Translation Service Spectrum

  • Language Level simplifies text, or real-time speech, that is presented at a complex language (cognitive) level. A human interpreter, or an automatic information extraction system, could deliver this service.
  • Image/Video Description provides speech or text translation from a visual image or video. A human service provider, or an automated system with computer vision, text generation and speech synthesizing capabilities could deliver this service.
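The paper leaves the software architecture of these services open. One way to picture the spectrum above is as a registry keyed by (input mode, output mode) pairs, where each entry could be backed by an automatic engine or a human assistant. The following sketch is purely illustrative; all names and placeholder handlers are assumptions, not part of the original concept.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a (input mode, output mode) pair to a
# translation handler. In a real deployment each handler would wrap an
# automatic engine (e.g. a speech synthesizer) or a human-assisted service.
ServiceRegistry = Dict[Tuple[str, str], Callable[[str], str]]

registry: ServiceRegistry = {
    ("text", "speech"): lambda msg: f"<spoken>{msg}</spoken>",    # text-to-speech
    ("speech", "text"): lambda msg: f"<caption>{msg}</caption>",  # speech-to-text
    ("image", "text"): lambda msg: f"<description of {msg}>",     # image description
}

def translate(input_mode: str, output_mode: str, content: str) -> str:
    """Look up and apply the service for the requested modality pair."""
    handler = registry.get((input_mode, output_mode))
    if handler is None:
        raise LookupError(f"no service for {input_mode} -> {output_mode}")
    return handler(content)
```

Keeping the modality pair as the lookup key mirrors the figure: the person in the middle requests a translation, and any registered provider, local, networked, or human, can serve it.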

Try Harder Feature

While some of these services can be provided in a fully automated manner today (e.g. text-to-speech synthesizers for the text-to-speech service), others may need human assistance for some time (e.g. the speech-to-text, language level translation, and image/video description services). Although more automated services will be implemented as technologies emerge, early implementations may not be as mature as needed in some cases. In these situations a "Try Harder" feature could be used to harness more powerful applications (network advanced services), and to bring human assistance into the automatic translation process when technology fails to be effective in certain environments and for certain materials [2].
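The "Try Harder" idea amounts to an escalation chain: attempt the local automatic service first, fall back to a network advanced service, and finally call on human assistance. The sketch below is a minimal illustration of that chain; the provider functions, their confidence scores, and the threshold are all hypothetical stand-ins.

```python
# Illustrative "Try Harder" escalation. Each provider returns a
# (result, confidence) pair; the chain stops at the first result that
# meets the confidence threshold and otherwise keeps the best seen.

def local_automatic(text):
    # Pretend the local engine handles only short input well.
    if len(text) > 20:
        return None, 0.3
    return text.upper(), 0.9

def network_advanced(text):
    return text.upper(), 0.7

def human_assisted(text):
    return text.upper(), 1.0  # a human operator always succeeds

def try_harder(text, providers, threshold=0.8):
    """Escalate through the provider chain until a result is
    confident enough; return the best result found otherwise."""
    best = (None, 0.0)
    for provider in providers:
        result, confidence = provider(text)
        if result is not None and confidence >= threshold:
            return result
        if confidence > best[1]:
            best = (result, confidence)
    return best[0]
```

Ordering the providers from cheapest (local automatic) to most capable (human assisted) matches the outward-pointing "try harder" arrows in figure 1.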

Service Access Devices

To use these on-demand translation services, a person needs to have a device that connects remotely to a global, high-bandwidth network and renders information on a display, or through other output units. Although any kind of computer system can be used as an access device, the small, wireless devices bring the real "anywhere at anytime" feature to the user. Examples include handheld computers, cell phones, etc., outfitted with earbuds, buttonhole microphones, or eyeglasses with a built-in monitor.

Who are the users?

We identify four user groups that could benefit from the "Modality Translation Services" concept, differing only in the kind of functional limitation they encounter:

  • People with permanent functional limitations such as hearing, visual, and cognitive impairments;
  • People with temporary functional limitations, like a car driver (who cannot use their eyes for reading on a display), a worker in a factory building (who cannot hear because of the noisy environment), or a manager in a meeting who needs accurate minutes (and cannot type as fast as participants speak);
  • People using small (and wireless) Internet devices with restricted input and output capabilities (e.g. handheld computers, or cell phones);
  • People facing information given in a different language (having insufficient reading or hearing skills in that language).


Many of these services are already implemented in a human-assisted, semi-automatic or fully automatic manner. Examples of the speech-to-text service include Ultratec's Instant Captioning™ technology [3] and the Classroom Captioner from Personal Captioning Systems [4]; for the speech-to-sign service the Signing Avatar™ from VCom3D [5]; and for the international language service the AltaVista Babel Fish translation service powered by SYSTRAN [6].

To make these services available to a broad user base, they should be embedded in a globally available telecommunication network. As part of the Partnership for Advanced Computational Infrastructure (PACI) [7], the Trace Center is currently investigating options and promoting feasible solutions for modality translation services on the Grid and other next-generation networks, services and computational resources [8].


[1] Zimmermann, Gottfried & Vanderheiden, Gregg (2001). Translation on Demand Anytime and Anywhere. CSUN's 16th International Conference, March 19 - 24, 2001, Los Angeles, CA.

[2] Vanderheiden, Gregg (in press). Telecommunications – Accessibility and Future Directions. In: Abascal & Nicolle (eds.), Inclusive Guidelines for HCI. Taylor & Francis Ltd.

[3] Ultratec Inc., Instant Captioning™ Technology.

[4] Personal Captioning Systems

[5] VCom3D, Inc. Signing Avatar™

[6] AltaVista Babel Fish translation service

[7] Partnership for Advanced Computational Infrastructure (PACI), National Science Foundation (NSF)

[8] Universal Design/Disability Access Program (UD/DA) for Advanced Computational Infrastructure, Trace R&D Center


This paper was partly funded by the National Science Foundation (NSF) in the context of the Universal Design/Disability Access Program (UD/DA) [8].

Gottfried Zimmermann, Ph.D.,
Trace R&D Center, 2107 Engineering Centers Bldg., 1550 Engineering Dr. Madison, WI 53706

Modality Translation Services, Figure 1 Alternate Text Description

Figure 1 represents the seven services of the Modality Translation Services concept, how the services could be delivered by a local or remote automatic implementation, or remotely with human assistance, and which electronic devices could deliver these services to a person.

The largest object in the graphic is a half circle arch, or rainbow, with three different colored bands. Each band represents a different service to get information translated. The first band, located on the inside of the rainbow, is titled "local automatic services". The second band, in the middle of the rainbow, is titled "network advanced services". The third band, on the outside of the rainbow, is titled "human assisted services". Below the bottom of each end of the rainbow of services are two arrows pointing outward with the words "try harder" written on them. The arrows start below the innermost band of the rainbow below local automatic services and end below the last band of the rainbow titled human assisted services. The try harder arrows indicate a smooth transition from local automatic services to network advanced services to human assisted services.

Seven text bubbles are distributed evenly across the services rainbow. Each bubble spells out one of the seven Modality Translation Services. From left to right, the services include:

  1. text to speech service
  2. speech to text service
  3. speech to sign service
  4. sign to speech service
  5. international language service
  6. language level translation service
  7. image/video description service

In the middle of the rainbow arch is an icon that represents a person. There are seven sets of continuous lines with arrows that start near the person under the rainbow arch and point toward a service bubble on the rainbow arch. These continuous lines with arrows then make a U-turn back toward the person under the rainbow arch.

Centered below the person icon at the center of the diagram is a title that reads "service delivered via...". Below the title, a series of icons are shown that represent delivery devices. From left to right these eight icons are a desktop computer; a palmtop computer; an earbud; a cell phone; four computers circled around a central database that represents a network; a pair of eyeglasses with a built-in monitor; a small microphone; and a laptop computer.
