Polly
Definition
Amazon Polly is an AWS text-to-speech service that converts written text into lifelike spoken audio. Applications use it to improve accessibility and user engagement.
Use Cases
- Duolingo: Generate spoken examples and pronunciation audio for language learners at scale. Duolingo has publicly discussed using text-to-speech to produce large volumes of audio for exercises. A typical implementation pattern is to generate audio from lesson text via a TTS API, cache the resulting audio files, and serve them through a CDN to mobile and web apps. (Enables rapid content creation and consistent audio quality across many languages without recording every phrase with human voice talent, improving scalability and time-to-publish.)
- The Washington Post: Offer audio versions of written articles for accessibility and on-the-go listening. The Washington Post has publicly described using Amazon Polly to create audio narration for articles. A common approach is to convert article text to speech, store the audio in object storage, and embed an audio player on article pages. (Improves accessibility and increases engagement by letting readers listen to content, expanding how audiences consume news.)
- PBS: Provide audio experiences and improve accessibility for digital content. PBS has been referenced in AWS case studies as an Amazon Polly user. A typical pattern is to integrate Polly into content workflows to generate speech from scripts or article text, then distribute the audio through web and mobile experiences. (Helps broaden accessibility and supports new audio-driven experiences without requiring manual recording for every update.)
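The generate, cache, and serve pattern that recurs in these use cases can be sketched as follows. This is a minimal illustration, not any of these companies' actual implementations. It assumes a boto3 Polly client is passed in and calls its `synthesize_speech` operation; the helper names (`cache_key`, `get_or_create_audio`), the `tts/` key prefix, the default voice `Joanna`, and the use of a plain dict as a stand-in for object storage such as Amazon S3 are all hypothetical choices made for the example.

```python
import hashlib
from typing import Any, MutableMapping


def cache_key(text: str, voice_id: str, engine: str) -> str:
    """Derive a stable storage key so identical requests reuse the same audio."""
    digest = hashlib.sha256(f"{engine}:{voice_id}:{text}".encode("utf-8")).hexdigest()
    return f"tts/{digest}.mp3"  # hypothetical key layout


def synthesize_to_bytes(polly_client: Any, text: str, voice_id: str, engine: str) -> bytes:
    """Call Polly's SynthesizeSpeech operation and return the MP3 bytes."""
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # AudioStream is a file-like streaming body; read it fully into memory.
    return response["AudioStream"].read()


def get_or_create_audio(
    polly_client: Any,
    store: MutableMapping[str, bytes],
    text: str,
    voice_id: str = "Joanna",  # assumed voice; pick one available in your region
    engine: str = "neural",    # assumed engine; "standard" is the other common option
) -> str:
    """Return the storage key for the audio, synthesizing only on a cache miss."""
    key = cache_key(text, voice_id, engine)
    if key not in store:
        store[key] = synthesize_to_bytes(polly_client, text, voice_id, engine)
    return key


# Example wiring (requires AWS credentials, so it is left commented out):
#   import boto3
#   polly = boto3.client("polly")
#   audio_store = {}  # in production this would be object storage behind a CDN
#   key = get_or_create_audio(polly, audio_store, "Hola, ¿cómo estás?", voice_id="Lupe")
```

Keying the cache on a hash of the engine, voice, and text means repeated requests for the same phrase do not trigger another synthesis call, which matters because Polly is billed by characters converted.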
Provider Equivalents
- AWS: Amazon Polly
- Azure: Azure AI Speech (Text to Speech)
- GCP: Google Cloud Text-to-Speech
- OCI: OCI Speech (Text to Speech)
Frequently Asked Questions
- What's the difference between Amazon Polly and Amazon Transcribe?
  - Polly converts text into speech (text-to-speech). Amazon Transcribe does the opposite: it converts speech audio into text (speech-to-text). Use Polly when you need a voice to read text; use Transcribe when you need written text from recordings or live audio.
- When should I use Amazon Polly?
  - Use Polly when you need to generate spoken audio from text, such as reading articles aloud, adding voice prompts to an IVR/contact center, creating accessibility features for users with visual impairments, generating audio for e-learning, or producing voiceovers for apps where recording human narration for every change would be slow or expensive.
- How much does Amazon Polly cost?
  - Polly pricing is typically based on the number of characters you convert to speech, and the rate depends on the voice type (for example, standard vs neural). Costs also include any related services you use to store and deliver audio (like Amazon S3 and CloudFront). For exact rates and free tier details, check the current Amazon Polly pricing page because prices can change by region and over time.
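The per-character pricing model described in the last answer can be turned into a rough estimate like the one below. This is a sketch only: the function name and the rates in `ASSUMED_RATES_USD_PER_MILLION` are placeholder assumptions for illustration, not quoted prices, and it ignores free tier allowances and the storage and delivery costs mentioned above. Substitute the figures from the current pricing page for your region.

```python
# Placeholder rates in USD per one million characters. These are assumptions
# for illustration only; replace them with values from the pricing page.
ASSUMED_RATES_USD_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
}


def estimate_polly_cost(char_count: int, voice_type: str = "standard") -> float:
    """Estimate synthesis cost as characters converted times the per-character rate."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    if voice_type not in ASSUMED_RATES_USD_PER_MILLION:
        raise ValueError(f"unknown voice type: {voice_type!r}")
    rate = ASSUMED_RATES_USD_PER_MILLION[voice_type]
    return char_count / 1_000_000 * rate


# With the placeholder rates above, 2.5 million characters would come to:
print(estimate_polly_cost(2_500_000, "standard"))  # 10.0
print(estimate_polly_cost(2_500_000, "neural"))    # 40.0
```

Because the cost scales linearly with characters, caching audio for text that is requested repeatedly is usually the simplest way to keep the bill down.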
Category: ai-ml
Difficulty: basic
Related Terms
See Also