Countless are the posts of people showing off the “coolest” AI thing: using Google’s NotebookLM to generate a chatty podcast of your uploaded book, blog post, paper, shopping list. Woah. Wow. You click and listen, hearing a male and a female voice banter back and forth in a conversational tone.

Neato, you are so onto this AI train!

Then I spotted the always wisely worded Kate Bowles posting her assertion: “I just need to bookmark this as my full-hearted nope to AI podcasting. For so many reasons. Just nope. In every language, from my heart to yours. Stop.”

Do not get me wrong, the technical accomplishment is impressive. But riddle me this: how many of these have you fully listened to, start to finish? Or for more than 10 seconds?

I felt like I needed to do, as the two voices say so often, a “deep dive”. I made it a goal to download a bunch of these, assemble them into a single track, and listen to them all. I used a few I’d seen my edtech colleagues share, but also just searched the Google for things like “I made a podcast with NotebookLM” – and ended up with 11 audio files (two were YouTube videos I had to extract audio from). Since the pair never introduce themselves, I have been calling them “Biff” and “Buffy”.

I renamed all the files “biff-and-buffy-1.mp3”, “biff-and-buffy-2.mp3” … “biff-and-buffy-11.mp3” and have emerged from being so well informed by this ever chipper and knowledgeable pair of botcasters. Here it is for you, an hour and 49 minutes of audio ________

My soul hurts from this stuff. But a few notes:

  • The banter is remarkable, well, at first. But listen closely and you can hear, as one voice is talking, the other chipping in with “Totally”, “100%”, “that’s amazing”.
  • They have inflection and intonation that is really not what we are used to from synthetic voices.
  • I even heard a few “ums” in there. Weird.
  • The clichés are strong. I heard Buffy say at least 3 times, “Work smarter, not harder”.
  • They always refer to their “show” as a “Deep Dive”.
  • Biff and Buffy carry the same exuberance for every damn topic. I wondered about uploading something really banal and seeing how they hype it up.
  • In one sample listen, you might be wowed. But over a series, Biff and Buffy sound like a bunch of gushing sycophants, those office butt kissers you want to kick in the pants.
  • Judging a bit, but to me the voices sound very middle class white. I know the response will be, “they will add more voices” or “it will improve”.
  • And they come across as hip experts on everything.
  • After about 45 minutes, I am ready to throw them out the window of my truck (I listened while driving)

But beyond the point of showing that this can be done (reference the old saying about why a dog does something), what is the use? Will people really use this as a mode to consume content? I’d reference a telling article I spent my one month’s free read on at the New Yorker, Jill Lepore’s Is a Chat with a Bot a Conversation? (paywalled, but readable in a browser incognito window) (please tell me you know how to do that) (or just try, eh?)

I am sure someone, likely based in eastern Canada, will have a different take. But I think we need to go beyond the “This is neat, look what I did” level. So many of these articles I scanned are full of outlines of all the steps needed to do this NotebookLM parlor trick. A favorite comment on one I saw was:

“How I Turned My Resume Into a Podcast with Google’s NotebookLM AI”
Summary: I uploaded documents and pressed a button…

Is this the grand future of human creativity the prophe… profits are yodelling from the mountain tops?

See if you can listen to one hour and 49 minutes of this stuff, then come back with your comment. I am open to having my mind changed, but to me this is still a trick from the parlor room.

For reference, the sources of my Biff and Buffy show are the following; I leave it for you to figure out which one is which.

  • https://sheknowsseo.co/how-to-use-notebooklm-to-make-a-podcast-from-your-blog-post/
  • https://darcynorman.net/2024/09/23/notebooklm-summarizes-my-dissertation/
  • https://www.buzzsprout.com/1102442/episodes/15812241
  • https://simonwillison.net/2024/Sep/29/notebooklm-audio-overview/
  • https://talkingwithmachines.com/does-anyone-actually-want-ai-generated-podcasts/
  • https://www.linkedin.com/posts/couros_playing-around-with-googles-notebooklm-activity-7241870968258752512-SzFF
  • https://open.spotify.com/episode/16VFabNCxxtlrXUqtuGo7V?si=omji0uBdR8GQUBQ1_QBM5Q
  • https://ideasandthoughts.org/2024/09/18/googles-lm-notebook-made-this-podcast/
  • https://dev.to/jamesbright/how-i-turned-my-resume-into-a-podcast-with-googles-notebooklm-ai-4nji
  • https://medium.com/@artificialintelligencenews/how-we-created-an-ai-podcast-with-google-notebooklm-the-results-are-mindblowing-e24c8e7bfd13
  • https://podcasts.apple.com/ca/podcast/ai-generated-podcasts-the-future-of-audio-storytelling/id1768892778?i=1000670948117

I had some thoughts of mimicking the audio podcast summary of my almost nil LinkedIn profile, but then I am playing into the parlor game. Nah. See below!

Postscript

In assembling 11 audio files into one, I went back to use the ffmpeg command I have used to assemble things for my old voice mixer toy, but ran into some hangups, as it required all kinds of command line updates that then wanted a new version of Xcode… I found this little trick which looked neat, just using the command line cat command, and yes, I quickly got one file.

And on my first listen I was treated to several of the tracks being chipmunked, reflecting the source’s note: “Simple, right? As long as all the MP3 files are recorded at the same bitrate, it should just work.”
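For anyone wanting to repeat the stunt, here is a minimal sketch of both routes, using the file names from my renaming scheme (the commented ffmpeg lines are my assumption of the sturdier fix I should have wrestled into working, not the exact command I ran):

```shell
# The quick trick: cat just glues the raw MP3 bytes together. Most players
# will cope, but any track at a different bitrate comes out chipmunked.
cat biff-and-buffy-*.mp3 > biff-and-buffy-all.mp3

# Sturdier (assuming a working ffmpeg): the concat demuxer reads a list of
# files and re-encodes everything at one uniform rate, so mismatched
# sources play back at normal speed.
# for f in biff-and-buffy-*.mp3; do echo "file '$f'"; done > list.txt
# ffmpeg -f concat -safe 0 -i list.txt -c:a libmp3lame -b:a 128k combined.mp3
```

The cat route works only because MP3 decoders tolerate streams stitched end to end; ffmpeg actually rebuilds the audio, which is why it dodges the chipmunk effect.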

Just for fun, this bit of Biff and Buffy is not much different from their “regular” voices:

Post Post Script

Gulp, I went and did it. Bridging off the dude who did an LLMcast based on his LinkedIn profile, I tried mine, but there was not enough info there for Biff and Buffy (by design), so I added my calling card site. Here is the bit to round the listen up to a full 2 hours, as the butt kissing pair lather it up about me (love how they pronounce SPLOT).


Featured Image: As per my routine, the only time I use generative AI images is to mock generative AI. I used Adobe Firefly to generate the image above, but sadly I forgot to save/publish the link. I asked for an image in a 16:9 aspect ratio with the prompt “A cheerful and excited man and woman sit at desk of recording studio speaking into a microphone”. I regenerated the image to something similar. How does one license such things? Shrug. I just try to be as clear as I can.

Very human like and cheerful man on left and woman on right speaking into microphone in what might be a sound studio
An early 90s builder of web stuff and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person. And he is 100% into the Fediverse (or tells himself so) Tooting as @cogdog@cosocial.ca

Comments

  1. I agree with you Alan about the initial amazement about what is possible, but I am not sure how purposeful it is. I listened to the podcast David Truss posted and was left thinking about my experience with David Truss’ writing. I imagine that such tools may provide a possible entryway into new content, but I am not sure what is really gained by putting this into an audio format. If, as David has suggested (quoting Adam Grant), “the future belongs to those who connect dots,” does an autogenerated podcast help with that? (On a side note, anytime someone talks about connecting dots, I am reminded of the wonderful work of Amy Burvall.) I wonder in this case if the focus on the product overlooks the learning gained through the process of highlighting the patterns and finding a trace through all the dots?

    I personally listen to a lot of text using the phone’s accessibility features. I think that a text summary read in this manner is both sufficient and maintains the divide, whereas I feel that the artificial voices sit somewhere in the uncanny valley. However, the more I think about this, the more I wonder what the uncanny valley even is anymore and whether we are “all already interpolated” within the system, especially after reading Jill Lepore’s dive into the world of the talking chatbot.
