Making Obsidian Talk with Google's Fancy Wavenet Voices

Posted: 2024-12-15 | Tags: google, note-taking, obsidian, TTS, wavenet

I’m sharing a little hack I put together. As a hardcore Obsidian user, I was getting frustrated with the available text-to-speech options. I really wanted to use Google’s Wavenet voices because they sound amazing, but couldn’t find a plugin that did exactly what I wanted. So I sorted it out myself.

I cobbled together a hack using Google Cloud’s Text-to-Speech API and Obsidian’s Shell Commands plugin. Here’s how you can do it too (this assumes Linux; on other systems you will need to adapt a few steps):

1. Set Up Google Cloud

You need to use Google Cloud:

Create a new project in Google Cloud Console (create an account first if you don’t have one). Use a dedicated project for this so it doesn’t get mixed up with anything else.
In APIs & Services, enable the Text-to-Speech API for your project.
Create an IAM service account. It is important that it can use only this service; we don’t want surprises on the bill.
Download the JSON credentials file for this service account (keep it safe! Don’t push it to GitHub! This is money!)
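
Before wiring the key into anything, it can be worth a quick sanity check that the downloaded file really is a service account key and that other users on the machine can’t read it. This is only a rough sketch: the function name check_credentials_file is made up for this post, and the permission check assumes a POSIX system.

```python
import json
import os
import stat
import sys

# Fields found in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials_file(path):
    """Return a list of problems found; an empty list means the file looks fine."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as error:
        return [f"cannot read {path}: {error}"]

    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append("type is not 'service_account'")

    # Flag keys that group members or other users can access (POSIX only).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"file mode {oct(mode)} is too open; try chmod 600")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_credentials_file(sys.argv[1]):
        print(problem)
```

Run it with the path to your key as the only argument; no output means nothing suspicious was found.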

2. The Python Script Magic

Create a file called read.py and drop this basic code in it:

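import os
import subprocess
import sys
import tempfile

from google.cloud import texttospeech

# Point the client library at the service account key downloaded in step 1.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/google-credentials-iam-user.json"

def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # The language code and voice name must match an entry in Google's voice
    # list ("en-EN" is not a valid code); swap in the voice you prefer.
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Wavenet-B",
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # Write the MP3 to a temporary file so mpv can play it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as temp_audio:
        temp_audio.write(response.audio_content)
        temp_audio_path = temp_audio.name

    try:
        subprocess.run(["mpv", temp_audio_path])
    finally:
        # Remove the file afterwards so /tmp does not fill up with old audio.
        os.remove(temp_audio_path)

if __name__ == "__main__":
    text = ""  # default, so the check below works when nothing was passed in
    if not sys.stdin.isatty():
        # Text piped in, e.g. from xclip
        text = sys.stdin.read()
    elif len(sys.argv) > 1:
        # Text given as command-line arguments
        text = " ".join(sys.argv[1:])

    if text.strip():
        text_to_speech(text)
    else:
        print("No text received.")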

Make it executable with chmod +x /home/abel/bin/read.py (adjust the path to wherever you put it).

You will need to install the Google library for this script. You can install it with pip: pip install google-cloud-texttospeech
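
The language code and voice name in read.py have to match a voice Google actually offers, so once the library is installed it can help to ask the API what exists for your language. Here is a small sketch of that. The helper name wavenet_voice_names is my own invention; list_voices is the client method it relies on, and it expects a TextToSpeechClient created the same way as in read.py.

```python
def wavenet_voice_names(client, language_code):
    """Return the sorted WaveNet voice names the API reports for one language code.

    `client` is expected to be a google.cloud.texttospeech.TextToSpeechClient
    (or anything else with a compatible list_voices method).
    """
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices if "Wavenet" in voice.name)


# Example usage (needs the library installed and credentials configured):
#
#   from google.cloud import texttospeech
#   client = texttospeech.TextToSpeechClient()
#   print("\n".join(wavenet_voice_names(client, "en-US")))
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object, without touching the network.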

3. The Obsidian Part

Install the “Shell commands” plugin in Obsidian
Add a new shell command with this awesome one-liner:

/usr/bin/xclip -o | python3 /home/abel/bin/read.py > /dev/null 2>&1

Use the actual path to your script!

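What’s happening here? xclip -o prints the current X selection (by default the primary selection, meaning whatever text is highlighted), the pipe hands it to our Python script, and we send any output to /dev/null because we don’t want clutter showing up in Obsidian. If you would rather read what you copied with ctrl+c, use xclip -o -selection clipboard instead.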
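
If you’d rather not depend on the pipe, the same xclip call can be made from Python. This is only a sketch under a few assumptions: the function names are hypothetical, xclip must be on the PATH, and it tries the primary selection (highlighted text) first and falls back to the clipboard (ctrl+c) when that is empty.

```python
import subprocess

VALID_SELECTIONS = ("primary", "secondary", "clipboard")


def xclip_command(selection="primary"):
    """Build the xclip argument list that prints one X selection to stdout."""
    if selection not in VALID_SELECTIONS:
        raise ValueError(f"unknown selection: {selection}")
    return ["xclip", "-o", "-selection", selection]


def read_selection(selection="primary", run=subprocess.run):
    """Return the text of an X selection, or "" if xclip is missing or fails."""
    try:
        result = run(xclip_command(selection), capture_output=True, text=True)
    except FileNotFoundError:
        return ""  # xclip is not installed
    return result.stdout if result.returncode == 0 else ""


def selected_or_copied_text(run=subprocess.run):
    """Prefer highlighted text; fall back to whatever was copied with ctrl+c."""
    return read_selection("primary", run) or read_selection("clipboard", run)


if __name__ == "__main__":
    print(selected_or_copied_text(), end="")
```

With something like this inside read.py, the Obsidian shell command could shrink to just the python3 call, at the cost of tying the script to xclip.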

Of course, change the paths, unless your name is the same as mine :)

In the plugin options give a name to the command like “Read with fancy voice”.

For more comfort, go to Hotkeys in Obsidian’s settings and bind this “Read with fancy voice” command to your preferred key combination. This is how it looks in my Obsidian:

[Image: Obsidian hotkey settings]

How to Use It

Select some text in Obsidian and copy it (ctrl+c)
Trigger the shell command (I set up a hotkey)
Listen to your notes in a lovely Wavenet voice!

Some Notes

You’ll need xclip installed, plus mpv, which the script uses for playback (sudo apt install xclip mpv if you don’t have them)
Keep an eye on your Google Cloud usage. Pricing is per character; there is a free tier, but it’s not unlimited, and SSML tags count toward the character total, not only the words.
You can find a full list of available WaveNet voices and their language codes in the Google Cloud documentation: https://cloud.google.com/text-to-speech/docs/voices.
If you need this in two or more languages, you can add an extra argument to the script and create a separate Obsidian command for each language.
This may not work if you run a sandboxed (non-native) Obsidian build on Linux, such as the Flatpak.
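
For the multi-language idea in the notes above, one way to add that extra argument is with argparse. Treat this as a sketch: the --lang flag, the VOICES table and both function names are assumptions of mine, and every voice name should be checked against Google’s voice list before you rely on it.

```python
import argparse

# Hypothetical mapping from a short language key to (language_code, voice name).
# Verify each voice name against Google's voice list before using it.
VOICES = {
    "en": ("en-US", "en-US-Wavenet-B"),
    "fr": ("fr-FR", "fr-FR-Wavenet-B"),
}


def parse_args(argv):
    """Parse an optional --lang flag plus any text given as arguments."""
    parser = argparse.ArgumentParser(description="Read text aloud with a WaveNet voice")
    parser.add_argument("--lang", choices=sorted(VOICES), default="en",
                        help="which entry of VOICES to use")
    parser.add_argument("text", nargs="*", help="text to read (otherwise stdin is used)")
    return parser.parse_args(argv)


def voice_settings(lang):
    """Return keyword arguments suitable for texttospeech.VoiceSelectionParams."""
    language_code, name = VOICES[lang]
    return {"language_code": language_code, "name": name}
```

In read.py you would call parse_args(sys.argv[1:]) and build the voice with texttospeech.VoiceSelectionParams(**voice_settings(args.lang)). Each Obsidian shell command then differs only in its flag, for example ending in read.py --lang fr.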

That’s it! Now you can have Google’s premium voices read your Obsidian notes. Let me know if you make any cool modifications to this setup!

P.S. Remember to keep your Google Cloud credentials safe and never share them!