Edge TTS Web Interface: A Simple, Easy-to-Use Chinese Text-to-Speech Tool

This project is open source on GitHub: Edge TTS Web Interface
Live demo: https://tts.lp0.cc

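Preface

When I first joined the company, one of my assessment tasks was to use open-source software on Linux to convert input text into an audio file. AI was far less mature back then, and neither Microsoft's nor Google's TTS was open source, so very few usable tools supported Chinese.

For that assessment I used eSpeak and Ekho (余音). The generated speech was far from ideal, so strictly speaking the attempt was a failure. Still, the basic functionality worked, and I passed the assessment.

I felt there had to be a better solution, but I never found one, so I shelved the idea.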

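Trying Again

Last year, another company project needed TTS, again on Linux. My earlier implementation was not good enough, so the company ended up using a free text-to-speech website instead. That made me decide to build my own. Writing a TTS engine from scratch would be like trying to reach the sky in a single step, so my only realistic option was to build on an open-source project.

I looked through the open-source TTS projects available. They either had demanding runtime requirements or produced speech of unacceptable quality, so the search took a long time.

Fortunately, I found an open-source project that inspired me: edge-tts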

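This project does text-to-speech (text to MP3) on top of Microsoft's TTS voices.
The code uses flask + edge-tts + python3.9 + gunicorn + cos and can run directly in Docker. Its API generates speech from the given text and voice, then uploads the result to Tencent Cloud COS storage.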

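That project's code was a big inspiration: I realized that Python can call the edge-tts library directly, and that this library gives access to the Microsoft Edge online text-to-speech service.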
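
As a rough illustration of what calling edge-tts from Python can look like, here is a minimal sketch. It assumes the edge-tts package is installed (`pip install edge-tts`) and that the machine has network access. `Communicate` and `save` come from the edge-tts package; the helper name `synthesize` and its empty-text check are illustrative assumptions, not code from either project.

```python
import asyncio


async def synthesize(text: str, voice: str, out_path: str) -> None:
    """Convert text to an MP3 file with edge-tts (illustrative helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported here so this module still loads where edge-tts is not installed.
    import edge_tts

    # Communicate streams audio from the online service; save() writes it to disk.
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
```

With the package installed, `asyncio.run(synthesize("你好，世界", "zh-CN-XiaoxiaoNeural", "hello.mp3"))` should write `hello.mp3` to the current directory.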

So I made some limited changes on top of that project, but I could never get the result I had in mind. In the end I decided to let AI do it for me.

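AI Development

About 90% of the code I have committed to GitHub was written by AI. Without AI, there would be no Edge TTS Web Interface project on GitHub, and no blog post today.

The main contributors to this project were Gemini 1.5 Pro, Claude 3.5 Sonnet, and ChatGPT 4o.

ChatGPT 4o helped me set up and develop the basic project, and solved most of the problems I hit while running it;
Gemini 1.5 Pro helped me add richer features and set up basic Docker support;
Claude 3.5 Sonnet helped me complete the full Docker deployment and write the project documentation.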

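Project Code

I'm being a little selfish here: the latest code and features live on GitHub and the demo site. You can browse the latest code on GitHub, or download it and develop it further yourself. Here I'll just show an unfinished version that I developed with AI.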

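import logging
import os
import re
import shlex
import subprocess

from flask import Flask, jsonify, render_template_string, request

app = Flask(__name__)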

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

voiceMap = {
    "xiaoxiao": "zh-CN-XiaoxiaoNeural",
    "xiaoyi": "zh-CN-XiaoyiNeural",
    "yunjian": "zh-CN-YunjianNeural",
    "yunxi": "zh-CN-YunxiNeural",
    "yunxia": "zh-CN-YunxiaNeural",
    "yunyang": "zh-CN-YunyangNeural",
    "xiaobei": "zh-CN-liaoning-XiaobeiNeural",
    "xiaoni": "zh-CN-shaanxi-XiaoniNeural",
    "hiugaai": "zh-HK-HiuGaaiNeural",
    "hiumaan": "zh-HK-HiuMaanNeural",
    "wanlung": "zh-HK-WanLungNeural",
    "hsiaochen": "zh-TW-HsiaoChenNeural",
    "hsiaoyu": "zh-TW-HsiaoYuNeural",
    "yunjhe": "zh-TW-YunJheNeural",
}

def getVoiceById(voiceId):
    return voiceMap.get(voiceId)

# Strip HTML tags
def remove_html(string):
    regex = re.compile(r'<[^>]+>')
    return regex.sub('', string)

def createAudio(text, file_path, voiceId):
    new_text = remove_html(text)
    logger.debug(f"Text without html tags: {new_text}")
    voice = getVoiceById(voiceId)
    if not voice:
        logger.error("Invalid voice ID")
        return "error params"

    logger.debug(f"File path: {file_path}")

    # Invoke the edge-tts command-line tool
    command = ["edge-tts", "--voice", voice, "--text", new_text, "--write-media", file_path]
    logger.debug(f"Running command: {' '.join(map(shlex.quote, command))}")
    
    try:
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        logger.debug(f"Command output: {result.stdout}")
        logger.debug(f"Command error output: {result.stderr}")
        if os.path.exists(file_path):
            logger.debug(f"File created successfully: {file_path}")
            return "success"
        else:
            logger.error(f"File not created: {file_path}")
            return "file not created"
    except subprocess.CalledProcessError as e:
        logger.error(f"Command failed with exit code {e.returncode}")
        logger.error(f"Command output: {e.stdout}")
        logger.error(f"Command error output: {e.stderr}")
        return "command failed"
    except Exception as e:
        logger.error(f"An unexpected error occurred: {str(e)}")
        return "unexpected error"

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        text = request.form['text']
        file_name = request.form['file_name']
        voice = request.form['voice']
        custom_path = request.form['custom_path']
        
        file_path = os.path.join(custom_path, f"{file_name}.mp3") if custom_path else f"{file_name}.mp3"
        
        result = createAudio(text, file_path, voice)
        return jsonify({"result": result})

    html = '''
<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Text-to-Speech Tool</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            line-height: 1.6;
            color: #333;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
            background-color: #f4f4f4;
        }
        h1 {
            color: #2c3e50;
            text-align: center;
            margin-bottom: 30px;
        }
        form {
            background-color: #ffffff;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
        }
        label {
            display: block;
            margin-bottom: 5px;
            color: #2c3e50;
        }
        textarea, input[type="text"], select {
            width: 100%;
            padding: 10px;
            margin-bottom: 20px;
            border: 1px solid #ddd;
            border-radius: 4px;
            box-sizing: border-box;
        }
        textarea {
            height: 150px;
            resize: vertical;
        }
        input[type="submit"] {
            background-color: #3498db;
            color: #ffffff;
            padding: 12px 20px;
            border: none;
            border-radius: 4px;
            cursor: pointer;
            font-size: 16px;
            transition: background-color 0.3s;
        }
        input[type="submit"]:hover {
            background-color: #2980b9;
        }
        #result {
            margin-top: 20px;
            padding: 10px;
            border-radius: 4px;
            background-color: #ecf0f1;
            color: #2c3e50;
            text-align: center;
            display: none;
        }
    </style>
    <script>
        function submitForm(event) {
            event.preventDefault();
            const formData = new FormData(event.target);
            fetch('/', {
                method: 'POST',
                body: formData
            })
            .then(response => response.json())
            .then(data => {
                const resultDiv = document.getElementById('result');
                resultDiv.innerText = `Audio generation result: ${data.result}.`;
                resultDiv.style.display = 'block';
            })
            .catch(error => console.error('Error:', error));
        }
    </script>
</head>
<body>
    <h1>Text-to-Speech Tool</h1>
    <form method="POST" onsubmit="submitForm(event)">
        <label for="text">Text to convert:</label>
        <textarea name="text" id="text" rows="4" required></textarea>

        <label for="file_name">File name (without extension):</label>
        <input type="text" name="file_name" id="file_name" required>

        <label for="custom_path">Save path (optional):</label>
        <input type="text" name="custom_path" id="custom_path">

        <label for="voice">Voice:</label>
        <select name="voice" id="voice" required>
            {% for voice_id, voice_name in voiceMap.items() %}
            <option value="{{ voice_id }}">{{ voice_name }}</option>
            {% endfor %}
        </select>

        <input type="submit" value="Generate audio">
    </form>
    <div id="result"></div>
</body>
</html>
'''
    return render_template_string(html, voiceMap=voiceMap)

if __name__ == "__main__":
    app.run(port=2020, host="127.0.0.1", debug=True)
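
One thing worth noting about the listing above: `file_name` and `custom_path` come straight from the form and are joined into the output path as-is, so a request could point the output outside the directory you intended. The sketch below shows one possible way to constrain the path. `OUTPUT_ROOT` and `safe_output_path` are hypothetical names introduced for illustration; they are not part of the project.

```python
import re
from pathlib import Path

# Hypothetical base directory; the original listing has no such restriction.
OUTPUT_ROOT = Path("tts_output")


def safe_output_path(file_name: str, custom_path: str = "") -> Path:
    """Build an .mp3 path that stays inside OUTPUT_ROOT (illustrative helper)."""
    # Replace anything that is not a word character or hyphen, then trim underscores.
    stem = re.sub(r"[^\w\-]", "_", file_name).strip("_")
    if not stem:
        raise ValueError("file_name contains no usable characters")

    root = OUTPUT_ROOT.resolve()
    candidate = (root / custom_path / f"{stem}.mp3").resolve()

    # Reject paths that escape the root, e.g. via '..' or an absolute custom_path.
    if root not in candidate.parents:
        raise ValueError("custom_path escapes the output directory")
    return candidate
```

In the `index()` view, the result of `safe_output_path(file_name, custom_path)` could be passed to `createAudio` in place of the raw joined path, with a `ValueError` turned into an error response. The target directory would still need to exist before edge-tts writes to it.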

Closing

I do hope that after reading this article you'll head over to GitHub and give my project a Star. Thanks, hehe.
