I listen to the Legends of the Old West podcast, a western-themed episodic show centered on outlaws. The narrator is great and the characters' actions are described in detail, but I'm left wanting more.
With the surge of Stable Diffusion projects, I was inspired to make something themed around AI-generated art.
What I ended up with transcribes the podcast audio into text and then generates images based on that text. Take a look below for an example.
The bulk of the work is done by Vosk, an offline, open-source speech recognition toolkit. We convert the input MP3 to WAV, send it through Vosk, and receive a generated JSON output file.
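As a rough illustration of that MP3 to WAV to JSON flow, here is a minimal sketch. It is not necessarily how this project does it: shelling out to ffmpeg for the conversion, the 16 kHz mono format, and the helper names (`ffmpeg_command`, `join_segments`, `transcribe`) are all assumptions made for the example. It also assumes a Vosk model has already been downloaded to a local directory.

```python
import json
import subprocess
import wave
from pathlib import Path


def ffmpeg_command(mp3_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM WAV.

    Using ffmpeg and 16 kHz here is an assumption, not the project's setting.
    """
    return [
        "ffmpeg", "-y", "-i", str(mp3_path),
        "-ac", "1", "-ar", str(sample_rate), "-c:a", "pcm_s16le",
        str(wav_path),
    ]


def join_segments(segments):
    """Concatenate the non-empty "text" fields of Vosk result dicts."""
    return " ".join(s["text"] for s in segments if s.get("text"))


def transcribe(mp3_path, model_dir, json_path):
    """Convert an MP3 to WAV, run it through Vosk, and save the results as JSON."""
    # Imported here so the helpers above work without Vosk installed.
    from vosk import KaldiRecognizer, Model

    wav_path = Path(mp3_path).with_suffix(".wav")
    subprocess.run(ffmpeg_command(mp3_path, wav_path), check=True)

    segments = []
    with wave.open(str(wav_path), "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has been finalized.
            if recognizer.AcceptWaveform(data):
                segments.append(json.loads(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        segments.append(json.loads(recognizer.FinalResult()))

    Path(json_path).write_text(json.dumps(segments, indent=2))
    return join_segments(segments)
```

Calling `transcribe("episode.mp3", "model", "episode.json")` with hypothetical paths would write a list of Vosk result objects to `episode.json` and return the joined transcript, which can then be cut into prompts for image generation.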