Harnessing AI to create more accessible audio and video content
As an accessibility specialist, I often find myself frustrated by the lack of accessible audio and video content out there. It's disheartening to see that the majority of video content lacks closed captions. It’s even rare to find a podcast with a transcript for those who are deaf or hard of hearing. However, AI-powered solutions might change that.
My colleague Sabatino Masala, head of technology and AI expert at Craftzing, was telling me how he was experimenting with AI to summarize YouTube videos. I quickly realized that this could improve the accessibility of audio and video content on a large scale.
We converted video content into text, providing the audience with a host of benefits, including:
A full transcript, making it searchable.
Accurate closed captions, enhancing accessibility for viewers who are deaf or hard of hearing.
A list of highlights.
A summary with highlighted keywords, allowing your audience to quickly grasp the core message.
If you find yourself publishing a substantial amount of video content that must adhere to accessibility guidelines, whether it be for conferences, podcasts, e-learning courses, or local council meetings, this article is for you.
The rise of audio and video content
We've all noticed how audio and video content has infiltrated our daily lives, from multimedia content in our social media feeds to voice messages on platforms like WhatsApp. The convenience it offers, allowing users to multitask or consume content on the go, has driven its rapid adoption. Podcasts, audiobooks, and voice assistants like Alexa, Google Home, and Siri underscore the growing dominance of audio in the digital landscape.
As this type of content continues to proliferate, the benefits of transcription have become increasingly evident.
The benefits of transcription
Transcriptions and captions make content accessible to individuals who are deaf, hard of hearing or simply unable to turn up the volume. It ensures that content is inclusive and can be consumed by a wider audience. It’s a legal requirement for public sector content under the Web Accessibilty Directive since 2021 and will be required for specific services under the European Accessibility Act from 2025. Because 1 in 4 Europeans live with a disability. But captions aren't just for those with hearing difficulties; they're also handy in noisy environments or when you need to watch content without disturbing others.
Searchability and improved user experience
Transcriptions allow you to quickly find specific information within lengthy files, making content more user-friendly. Bonus SEO points.
Transcripts can be effortlessly translated into multiple languages, expanding the global reach of your content.
Generate video captions with unprecedented accuracy
Unlike some automated captions or “craptions”, Whisper proves itself to be particularly accurate. It even spelled Sabatino’s name correctly. Video captions are subtitles enriched with additional information for individuals who are deaf or hard of hearing. They include descriptions of sounds that are crucial for understanding the video. Whisper, impressively, recognizes some of these sounds.
While human professionals still provide the highest quality captioning, it's not always financially possible, especially for projects with extensive video content.
In this video, Sabatino shows how Whisper can generate captions for your video content:
Generate transcripts and summarize content with ease
To unlock the full potential, we’re feeding the automatically generated captions from Whisper into ChatGPT to create easily understandable transcripts, summaries, and bullet points. Here’s Sabatino’s demonstration:
Get in touch
If you're eager to enhance the accessibility of your audio and video content with the power of AI, don't hesitate to reach out. We're more than happy to advise you and your team on how to fully automate the process of generating captions, transcripts, and summaries.
Contact Gijs Veyfeyken
Contact Sabatino Masala