Podcasting With Bots

Steve Stearns
7 min readOct 26, 2023

--

I record a podcast periodically; while I enjoy the process of recording it and people being able to hear it, everything in between those two points is awful. Like it’s hardly back breaking work, but it’s the kind of tedium that I struggle with — the kind that made me the computer nerd I am today..

How I Suffered

The way the podcast was hosted was through Squarespace. It was a perfectly fine platform for getting a site up and running and it handled the podcast very well for a few years. What always irritated me though was the process of uploading the podcast. The steps involved were:

  1. Log into Squarespace and bring up our site editor
  2. Add a new blog post entry, and type in the title and description of the podcast as a blog post.
  3. Add the “audio” element to the blog post, and then start uploading the podcast audio file.
  4. Enter the title and description again.
  5. Wait for the file to finish uploading.
  6. Save post. If you save the post before the file is done uploading, you have to start over.

The two things that bugged me about this were the need to wait for the upload to finish, and the duplicate entries of title and description. This was like two minutes of my existence, but surely, I could make this better?

Behold the Power of Bots!

What I decided to build ended up looking like this:

The core idea I started with was to upload a file to an S3 bucket, and that would be literally the only thing I need to do. No need to log in anyplace. Just copy the file and let the rest happen on its own. I still need a few pieces, but here’s how this works.

Upload to Transcription

To begin, I upload the file to the S3 bucket that holds the audio files for the podcast. I then set up a trigger on the bucket so that whenever a new audio file showed up, it would create a transcription.

Initially I created this transcript using AWS Transcribe but found the quality of the transcripts to be pretty mediocre. I tested out a couple of services but it seemed like Assembly AI was the best. The transcripts were good quality and the cost was reasonable at 76 cents/hr.

There were no transcripts on the original site, so this is the first upgrade that came with the new process. I don’t imagine too many people are going to come read our podcast, but having a transcript is good for SEO at the very least.

Chat GPT Summarization

One of the big motivators for this project was trying to see how practical Chat GPT was. Could it reasonably handle the work of summarizing and creating a title for our podcast. The theory was to transcribe the podcast, upload the text of the ENTIRE EPISODE to Chat GPT, and see what it did.

First of all, I tried using the 3.5 version of the AI model and it worked okay, but the tone of what it output sucked. Like it summarized it just fine, but it definitely had an extra level of botness that I didn’t want. So then I tried to use Chat GPT 4 and hit a couple snags.

Token Limits

Chat GPT’s API has limits on the number of tokens that it can process. In the case of Chat GPT 4, it’s an ~8K token limit which, as it turns out, is not enough to handle an hour long podcast. Now, what’s a token? It’s not quite a word, but there’s a fairly good description of it at Open AI’s blog.

For my purposes what I did was limit the transcript to 3K words. That kept it under the token limit, but gave enough context for a good summary and title. Problem solved!

Rate Limiting

The other issue I ran into was that Chat GPT has rate limiting. This isn’t normally a big deal as I’m uploading one podcast every couple of weeks. As I was getting the site running though I needed to upload more. The limit is 10K tokens/minute, which should have been fine even for this work.

It appears that Chat GPT sometimes throttles traffic, above and beyond the stated rate limit. I waited a minute, tried a new upload, and it still errored out, saying that I was exceeding the limit.

If you look at the diagram, you’ll notice that there’s an SQS queue on the way to Chat GPT. This was an addition I made to fix that problem. So if Chat GPT throttles us, the message will just hang out in the queue until it is ready to play nice. If I was building a proper enterprise system I’d have a dead letter queue, some means of reprocessing, etc. This is not that.

All About The Prompts

The funny thing about AI work like this is that it’s all about probabilities, and you can tune it to get it to be what you want most of the time. I took examples of summaries and titles I had written previously and I use that to prompt the AI. Is the AI as clever as I am in writing titles and summaries? No. Is it good enough considering that this is 98% about listening to the audio? Yup.

I will likely add an editing capability so I don’t have to log into Dynamo DB to fine tune things. For now though, like I said, it’s good enough.

Updating the Site

When I set out to build this, I wanted to make sure the site was very fast and I wanted it to be dirt cheap. One challenge with building any site, from a cost perspective, is that as soon as you have any kind of back end automation, you’re paying for a persistent server of some kind. Sure it’s only like $5–10/month, but it’s a matter of principle. My podcast doesn’t get much traffic and it felt silly to pay for those few requests per month.

This led me to building it out as a static website in an S3 bucket. The framework I chose for this is Gatsby. I tend to build my personal sites in React, and Gatsby has a good framework for generating sites at build time.

When the transcription and summarization are done, the DynamoDB table that tracks everything gets updated to indicate that the podcast is done and ready for publishing. At that moment, I generate a new RSS for the podcast. To keep things simple, the podcast website is generated based on the same RSS feed.

I have considered changing this to drive the site off of Dynamo and just make calls at runtime to do this. The thing is, the site is quite fast as is (.72 seconds until it’s interactive), even though it’s loading 105 episodes of content every time it’s loaded.

Displaying Transcripts

The last piece of the puzzle was making the transcripts useful. When you transcribe the audio files through Assembly or AWS, the format of the file is fairly complex. Like it will identify that Speaker A said the word “pudding” but you need to do the work of turning that into a sentence, and identifying who Speaker A is.

I have a simple lambda function that handles this. It loads the transcript out of the S3 bucket and does the processing in real time. As it’s a Lambda function there’s also warm up time involved so it takes about 5 seconds to load a typical transcript. Not perfect, but good enough.

Okay, but is this better?

After this work, the obvious question is whether this is better? Was it worth the time spent?

For me, it was absolutely worth the time. I had fun building it and seeing the pieces of it come together. So on that point we’ll consider it a win. But beyond that, is it better?

Features

For listeners to the show, this is an upgrade. Now they can view transcripts of the files which didn’t exist before. There was also no practical way to do that in Squarespace. A very minor upgrade, but it’s better.

For SEO purposes, this should be an enormous improvement. Now instead of just having it show descriptions, Google can pick up all of the text of the shows. That should help improve things a bit.

Obviously this simple site doesn’t have all of the features of Squarespace, and there’s no longer an ability to use it as a free form blog. For our purposes, we didn’t need all of that stuff, and that’s part of why it didn’t make a lot of sense to keep running it on their site.

Costs

Let’s just acknowledge that there’s no way that this is justifiable if we’re considering how much I would bill per hour to do work like this. This was for fun, and so let’s toss that out.

Previously, we were paying $18/mo for hosting on Squarespace. The raw cost for the s3 bucket storage and the basic automation is pennies per day. So far it looks like it will costs under $3/month to run.

For the bot army I throw at transcripts, I’ll pay 76 cents/hour for the podcasts. So figure that’s probably under $2/month. Chat GPT costs .03 cents/1K tokens. I send it roughly 3K tokens per podcast, and figure two podcasts/month. That’s … 18 cents/month?

All totaled, it’s ~$5/month to do all of this? I’ll take it.

What’s Next?

I think I’ll spend a little time making this more of a general-purpose tool rather than just using it for my podcast. I built it to be pretty far along that path already but I need to tweak a few naming conventions, etc. I’ll do another blog post shortly that gets more into how I physically built it, tools I used, etc. For now though, the site seems to be working pretty well!

--

--

Steve Stearns
Steve Stearns

Written by Steve Stearns

A fan of architecture from Chicago

No responses yet