You know you’ve been around this game a grey haired time if you remember that podcasting had something to do with this thing called RSS. I found shreds of workshops I did back at Maricopa in 2006, “Podcasting, Schmodcasting…. What’s All the Hype?”, and smiled that I was using this web audio tool called Odeo, whose founder went on to lay a few technical bird droppings.

I digress.

This post is about a radical change in my technical tool kit, relearning what I was pretty damned comfortable doing, and to a medium degree, appreciating for a refreshing change, something that Artificial Intelligence probably has a hand in. Not magically transforming, but helping.

I’ve had this post in my brain draft for a while, but there is a timely nature, since this coming Friday I am hosting for OE Global a new series I have been getting off the ground, OEG Live, which is a live streamed, unstructured, open conversation about open education and some tech stuff… really the format is to gather some interesting people and just let them talk together. Live.

This week’s show came as a spin off from a conversation in our OEG Connect community, starting with a request for ideas about creating Audiobook versions of OER content, but it went down a path that included interesting ideas about how new AI tools might make this easier to produce. Hence our show live streamed to YouTube Friday, June 2 is OEG Live: Audiobook Versions of OER Textbooks (and AI Implications).

I wanted to jot down some things I have been using and experimenting with for audio production, where AI likely has a place, but is by no means the entire enchilada. So this tale is more about changing out some old tech ways for new ones.

Podcasting Then and Now

Early on I remember using apps like WireTap Pro to snag system audio recorded in Skype calls and a funky little portable iRiver audio recorder for in person sessions. My main audio editing tool of choice was Audacity, and it is still something I recommend for its features and open source heritage. I not only created a ton of resources for it in the days of teaching DS106 Audio, I used it for pretty much all the media projects I did over the last maybe 17, 18 years. Heck, Audacity comes up 105 times in my blog (this post will make it hit the magic number, right?).

Audacity is what I used for the first two years of editing the OEG Voices podcast. Working in waveforms was pretty much second nature, and I was pretty good at bringing in audio recorded in Zoom or Zencastr (where you can split speaker audio into separate tracks), then layering in the multivoice intros and Free Music Archive music tracks.

This was the editing space:

The multitrack editing in Audacity, waveforms for music, intros, separate speakers.

After editing, I used various tools like Otter.ai and Rev.ai to generate transcripts, and cleaning them up required another listening pass. This was time consuming, and for a number of episodes we paid for human transcriptions (~$70/episode), which still needed some cleanup.

Might AI Come in?

Via a Tweet? a Mastodon Post from Paul Privateer I found an interesting tool from Modal Labs offering free transcription using OpenAI Whisper tech. Just by entering “OEG Voices” it bounced back with links for all the episodes. With a click for any episode, and some time for processing, it returned a not bad transcript that would take some text editing to use, but it gives a taste that AI has a useful space for transcribing audio.

Gardner Campbell tuned me into MacWhisper for a nifty means to use that same AI ______ (tool? machine? gizmo? magic blackbox) for audio transcription. You can get a good taste with the free version, and the bump for the advanced features might be worth it. There is also Writeout, which does transcription via a web interface and translation (“even Klingon”). And likely a kazillion more services, sprouting every day with a free demo and a link to pay for more. Plus other tools for improving audio; my pal Alex Emkerli has been nudging the new Adobe tools.
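For the tinkerers: the Whisper model behind these tools is also published as an open source Python package (`openai-whisper`). The sketch below is only a rough illustration of what a local transcription pass might look like, not how MacWhisper or the Modal Labs tool actually work inside. The helper names (`format_timestamp`, `segments_to_text`, `transcribe_file`) and the choice of the `"base"` model are assumptions made for the example.

```python
def format_timestamp(seconds):
    """Turn a number of seconds into H:MM:SS for a readable transcript."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments):
    """Render Whisper-style segments (dicts with 'start' and 'text') as lines."""
    lines = []
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        lines.append(f"[{stamp}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path, model_name="base"):
    """Transcribe one audio file locally (hypothetical helper).

    Assumes `pip install openai-whisper` and ffmpeg are available.
    The import is inside the function so the formatting helpers above
    can be used without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return segments_to_text(result["segments"])
```

Calling `transcribe_file("episode.mp3")` (a made-up file name) would return a timestamped draft. Like every transcript mentioned here, it would still need a human cleanup pass.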

There is not enough time in a day to try them all, so I rely on trusted recommendations and lucky hunches.

Descript was a lucky hunch that panned out.

Something Different: Descript

Just by accident, as it seems to go with something I see in passing, in this case boosted by someone in the fediverse, I saw a post that triggered my web spidey sense.

I gave Descript a try starting with the first 2023 OEG Podcast with Robert Schuwer. It’s taken some time to hone, but It. Has. Been. A. Game. Changer.

This is a new approach entirely for my audio editing. I upload my speaker audio tracks (no preprocessing needed to convert say .m4a to .wav nor jumping to the Levelator to even out levels), it chugs a few minutes to transcribe. I can apply a “Studio Sound” effect that cleans sound.

But it’s the editing that is different. Transcribing the audio means most (but not all) editing is done via text: removing words or moving sound around is done by looking at text. The audio is tied to the text.

Editing podcasts in Descript

I can move to any point via text or the waveform. It does something where it manages the separate audio tracks as one, so if I delete a word, or nudge something in the timeline (say to increase or decrease the gap above), it modifies all tracks. But if I have a blip in one track, I can jump into the multitrack editor and replace it with a silence gap.
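I have no idea how Descript does this under the hood, but the general idea of text-based editing can be sketched in a few lines. This is a toy illustration, assuming each transcribed word carries a start and end time and each track is just a list of samples. The function names and the data layout are made up for the example.

```python
def cuts_for_deleted_words(words, deleted):
    """Merge the time spans of deleted words into sorted (start, end) cuts."""
    cuts = []
    for i in sorted(deleted):
        start, end = words[i]["start"], words[i]["end"]
        if cuts and start <= cuts[-1][1]:
            # Adjacent or overlapping deletions become one longer cut.
            cuts[-1] = (cuts[-1][0], max(cuts[-1][1], end))
        else:
            cuts.append((start, end))
    return cuts


def apply_cuts(samples, sample_rate, cuts):
    """Return a copy of one track with the cut time spans removed."""
    kept = []
    position = 0
    for start, end in cuts:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        kept.extend(samples[position:first])
        position = last
    kept.extend(samples[position:])
    return kept


def edit_all_tracks(tracks, sample_rate, words, deleted):
    """Apply the same word deletions to every speaker track so they stay in sync."""
    cuts = cuts_for_deleted_words(words, deleted)
    return {name: apply_cuts(s, sample_rate, cuts) for name, s in tracks.items()}


# Tiny demo: 1 second of "audio" at 10 samples per second, two speakers.
words = [
    {"text": "so", "start": 0.0, "end": 0.3},
    {"text": "um", "start": 0.3, "end": 0.6},
    {"text": "hello", "start": 0.6, "end": 1.0},
]
tracks = {"host": list(range(10)), "guest": list(range(100, 110))}
edited = edit_all_tracks(tracks, 10, words, deleted={1})  # delete "um"
```

Deleting the one word in the transcript removes the same slice of time from both tracks, which is why the speakers stay lined up after an edit.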

And because I am working with both the transcript and the audio, when I am done editing, both are final. I’m not showing everything, like inserting music, doing fades, invoking ducking. And it took maybe 4 or 5 episodes of fumbling to train myself, but Descript has totally changed my podcast ways (Don’t worry Audacity lovers, I still use it for other edits).

You can get a decent sense of Descript with their free plan, but with the volume of episodes, we went with the $30/month Pro plan for up to 30 transcription hours per month (a multitrack episode of say 4 voices for 50 minutes incurs 200 minutes of that). That’s much less than paying for decent human transcription (sorry humans, AI just took your grunt work).
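The back-of-envelope math, as a quick sketch. It assumes every speaker track counts for its full length, as in the example above; how Descript really meters usage is theirs to define, and the function names are made up.

```python
def transcription_minutes(voices, episode_minutes):
    """Minutes counted against the plan when each track is transcribed separately."""
    return voices * episode_minutes


def episodes_per_month(plan_hours, voices, episode_minutes):
    """Whole episodes that fit in a monthly transcription allowance."""
    return (plan_hours * 60) // transcription_minutes(voices, episode_minutes)


per_episode = transcription_minutes(voices=4, episode_minutes=50)
monthly_capacity = episodes_per_month(plan_hours=30, voices=4, episode_minutes=50)
```

With 4 voices at 50 minutes each, one episode uses 200 minutes, so a 30 hour allowance (1,800 minutes) covers 9 episodes of that size in a month.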

And I am maybe at about the 20% level of understanding all Descript does, but that’s enough to keep my pod.

But it’s not just drop something in a magic AI box and out pops a podcast, this is still me, Alan, doing the editing.

Yet, if you like Magic stuff, read on.

Magic Podcast Production

Editing podcasts is work enough, but for all that other work of writing up show notes and summaries and creating social media posts, maybe there is some kind of magic.

Well, a coffee meetup in Saskatoon with JR Dingwall dropped me into Castmagic – “Podcast show notes & content in a click, Upload your MP3, download all your post production content.”

That’s right, just give AI your audio, and let the magic churn.

I gave it a spin for a recent podcast episode of OEG Voices, number 56 with Giovanni Zimotti (a really interesting Open Educator at University of Iowa, you should check it out). It generates potential titles (none I liked), keywords, highlights, key points, even the text for social media posts (see all it regurgitated).

On one hand, what it achieves and produces is impressive. Woah, is AI taking away my podcast production? Like most things AI, if you stand back from the screen and squint, it looks legit. But up close, I find it missing key elements, and wrongly emphasizing what I know are not the major points. I was there in the conversation.

I’d give it a 7 for effort but I am not ready to drop all I do for some magic AI beans.

Ergo AI

I’m not a Debbie Downer on AI, just skeptical. I am more excited here about a tool, Descript, that has really transformed my creation process. It’s not because of AI, and frankly I have no idea what AI is really doing in any of these improbable machines, but it is maybe aided by AI.

This stuff is changing all the time. And likely you out there, random or regular reader, are doing something interesting with AI and audio, so let me know! My human brain seeks more random potential neurons to connect. And please drop in for our OEG Live show Friday to hash more out for OER, audio, and AI swirling together.

Meanwhile, I have some more Descript-ing to do. You?

Updates:

I got Downesed!

Alan: The new OLDaily’s here! The new OLDaily’s here!
Felix: Well I wish I could get so excited about nothing.
Alan: Nothing? Are you kidding?! Post 7275, CogDogBlog! I’m somebody now! Millions of people look at this site every day! This is the kind of spontaneous publicity, your name on the web, that makes people. I’m on the web! Things are going to start happening to me now.

with apologies to a scene from The Jerk

I got Jon Udell interested too…

And from Jon’s post I discovered more exciting features:


Featured Image: Mine! No Silly MidjournalStableConfusingDally stuff.

Improbable Machine flickr photo by cogdogblog shared under a Creative Commons (BY) license

If this kind of stuff has value, please support me by tossing a one time PayPal kibble or monthly on Patreon
An early 90s builder of web stuff and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person. And he is 100% into the Fediverse (or tells himself so) Tooting as @cogdog@cosocial.ca

Comments

  1. Woah. Editing the AUDIO by editing the TEXT transcript? That’s amazing! Thanks for sharing this – I can see this being a HUGE game-changer for any audio production involving interviews!

  2. I have been a fan of Descript for a while now. It’s worth noting that Adobe has introduced similar edit-by-text functionality in Premiere Pro now. (No doubt influenced by Descript)

    I think this does have the potential to reshape video editing for the non-techy
