Generate an AI singer with the open-source so-vits-svc library

This topic describes how to generate an AI singer end to end in Alibaba Cloud DSW, based on the open-source so-vits-svc library.

Background information

Driven by advances in artificial intelligence, virtual humans are becoming increasingly lifelike, and more and more of them are being created and deployed across the Internet. Cloning a human voice with AI is no longer confined to the screen: as a blend of technical innovation and artistic creation, AI singers have become a gateway between the virtual and real worlds. This topic walks you through generating an AI singer. Demo audio:

  • Target voice:

  • Original song:

  • Converted result:

Prepare the environment and resources

  • Create a workspace. For details, see Create a workspace.

  • Create a DSW instance with the following key parameters. For details, see Create and manage DSW instances.

    • Region and zone: For this tutorial, we recommend China (Beijing), China (Shanghai), China (Hangzhou), or China (Shenzhen). These four regions provide faster downloads of the model data used in later steps.

    • Instance type: ecs.gn6v-c8g1.2xlarge.

    • Image: In the official images, select stable-diffusion-webui-env:pytorch1.13-gpu-py310-cu117-ubuntu22.04.

Step 1: Open the tutorial file in DSW

  1. Go to the PAI-DSW development environment.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace you want to use.

    3. In the upper-left corner of the page, select the region in which to use the service.

    4. In the left-side navigation pane, choose Model Development and Training > Interactive Modeling (DSW).

    5. Optional: In the search box on the Interactive Modeling (DSW) page, enter the instance name or a keyword to search for the instance.

    6. Click Open in the Actions column of the instance you want to open.

  2. On the Launcher page of the Notebook tab, click DSW Gallery under Tool in the Quick Start section to open the DSW Gallery page.

  3. On the DSW Gallery page, search for the 如何生成"AI歌手" (Generate an "AI singer") tutorial and click Open in DSW on the tutorial card.

    This automatically downloads the resources and tutorial file required by this tutorial to your DSW instance, and opens the tutorial file once the download completes.

Step 2: Run the tutorial file

In the opened tutorial file, ai_singer.ipynb, you can read the tutorial text and run the command for each step directly in the notebook. After a step's command finishes successfully, run the next step's command. The steps in this tutorial and the output of each step are described below.

  1. Download the so-vits-svc source code and install its dependencies.

    1. Clone the open-source code.

      Sample output:

      Cloning into 'so-vits-svc'...
      remote: Enumerating objects: 3801, done.
      remote: Total 3801 (delta 0), reused 0 (delta 0), pack-reused 3801
      Receiving objects: 100% (3801/3801), 10.70 MiB | 29.38 MiB/s, done.
      Resolving deltas: 100% (2392/2392), done.
      Note: switching to '8aeeb10'.
      
      You are in 'detached HEAD' state. You can look around, make experimental
      changes and commit them, and you can discard any commits you make in this
      state without impacting any branches by switching back to a branch.
      
      If you want to create a new branch to retain commits you create, you may
      do so (now or later) by using -c with the switch command. Example:
      
        git switch -c <new-branch-name>
      
      Or undo this operation with:
      
        git switch -
      
      Turn off this advice by setting config variable advice.detachedHead to false
      
      HEAD is now at 8aeeb10 feat(preprocess): skip hidden files with prefix `.`
    2. Install the dependencies.

      Note: You can ignore the ERROR and WARNING messages that appear in the output.

      Sample output:

      Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
      Collecting ffmpeg-python
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/d7/0c/56be52741f75bad4dc6555991fabd2e07b432d333da82c11ad701123888a/ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
      Collecting Flask
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/fd/56/26f0be8adc2b4257df20c1c4260ddd0aa396cf8e75d90ab2f7ff99bc34f9/flask-2.3.3-py3-none-any.whl (96 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.1/96.1 kB 15.5 MB/s eta 0:00:00
      Collecting Flask_Cors
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/10/69/1e6cfb87117568a9de088c32d6258219e9d1ff7c131abf74249ef2031279/Flask_Cors-4.0.0-py2.py3-none-any.whl (14 kB)
      Requirement already satisfied: gradio>=3.7.0 in /usr/local/lib/python3.10/dist-packages (from -r ./so-vits-svc/requirements.txt (line 4)) (3.16.2)
      Collecting numpy==1.23.5
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/e4/f3/679b3a042a127de0d7c84874913c3e23bb84646eb3bc6ecab3f8c872edc9/numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 17.0 MB/s eta 0:00:0000:0100:01
      ......
      ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
      xformers 0.0.16rc425 requires torch==1.13.1, but you have torch 2.0.1 which is incompatible.
      torchvision 0.14.1+cu117 requires torch==1.13.1, but you have torch 2.0.1 which is incompatible.
      Successfully installed Flask-2.3.3 Flask_Cors-4.0.0 SoundFile-0.12.1 Werkzeug-2.3.7 antlr4-python3-runtime-4.8 audioread-3.0.0 bitarray-2.8.1 blinker-1.6.2 certifi-2023.7.22 colorama-0.4.6 cython-3.0.2 edge_tts-6.1.8 einops-0.6.1 fairseq-0.12.2 faiss-cpu-1.7.4 ffmpeg-python-0.2.0 hydra-core-1.0.7 itsdangerous-2.1.2 joblib-1.3.2 langdetect-1.0.9 librosa-0.9.1 local_attention-1.8.6 loguru-0.7.0 numpy-1.23.5 omegaconf-2.0.6 onnx-1.14.1 onnxoptimizer-0.3.13 onnxsim-0.4.33 pooch-1.7.0 portalocker-2.7.0 praat-parselmouth-0.4.3 pynvml-11.5.0 pyworld-0.3.4 resampy-0.4.2 sacrebleu-2.3.1 scikit-learn-1.3.0 scikit-maad-1.4.0 scipy-1.10.0 tabulate-0.9.0 tensorboardX-2.6.2.2 threadpoolctl-3.2.0 torch-2.0.1 torchaudio-2.0.2 torchcrepe-0.0.21
      WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
      
      [notice] A new release of pip is available: 23.0.1 -> 23.2.1
      ......
      Setting up liblilv-0-0:amd64 (0.24.12-2) ...
      Setting up libopenmpt0:amd64 (0.6.1-1) ...
      Setting up libpulse0:amd64 (1:15.99.1+dfsg1-1ubuntu2.1) ...
      Setting up libpango-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libopenal1:amd64 (1:1.19.1-2build3) ...
      Setting up libswresample3:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libpangoft2-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libsdl2-2.0-0:amd64 (2.0.20+dfsg-2ubuntu1.22.04.1) ...
      Setting up libpangocairo-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libsphinxbase3:amd64 (0.8+5prealpha+1-13build1) ...
      Setting up librsvg2-2:amd64 (2.52.5+dfsg-3ubuntu0.2) ...
      Setting up libpocketsphinx3:amd64 (0.8.0+real5prealpha+1-14ubuntu1) ...
      Setting up libdecor-0-plugin-1-cairo:amd64 (0.1.0-3build1) ...
      Setting up librsvg2-common:amd64 (2.52.5+dfsg-3ubuntu0.2) ...
      Setting up libavcodec58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libchromaprint1:amd64 (1.5.1-2) ...
      Setting up libavformat58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libavfilter7:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libavdevice58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up ffmpeg (7:4.4.2-0ubuntu0.22.04.1) ...
      Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
      Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.8+dfsg-1ubuntu0.2) ...
  2. Download the pretrained models.

    1. Download the speech encoder model.

      Sample output:

      --2023-08-30 08:40:25--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/hubert_base.pt
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 189507909 (181M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/pretrain/checkpoint_best_legacy_500.pt’
      
      ./so-vits-svc/pretr 100%[===================>] 180.73M  2.78MB/s    in 28s     
      
      2023-08-30 08:40:54 (6.40 MB/s) - ‘./so-vits-svc/pretrain/checkpoint_best_legacy_500.pt’ saved [189507909/189507909]
    2. Download the pretrained base models.

      Sample output:

      --2023-08-30 08:43:45--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/ms903%3Asovits4.0-768vec-layer12/clean_D_320000.pth
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 187027770 (178M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/logs/44k/D_0.pth’
      
      ./so-vits-svc/logs/ 100%[===================>] 178.36M  15.8MB/s    in 12s     
      
      2023-08-30 08:43:57 (15.5 MB/s) - ‘./so-vits-svc/logs/44k/D_0.pth’ saved [187027770/187027770]
      
      --2023-08-30 08:43:57--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/ms903%3Asovits4.0-768vec-layer12/clean_G_320000.pth
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 209268661 (200M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/logs/44k/G_0.pth’
      
      ./so-vits-svc/logs/ 100%[===================>] 199.57M  20.6MB/s    in 10s     
      
      2023-08-30 08:44:07 (20.0 MB/s) - ‘./so-vits-svc/logs/44k/G_0.pth’ saved [209268661/209268661]
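The download cells above fetch the checkpoints with wget. If you prefer to script the downloads in Python, the following is a minimal stdlib equivalent (the `fetch` helper is our illustration, not part of the tutorial):

```python
import urllib.request

def fetch(url: str, dst: str) -> str:
    """Download url to dst, mirroring the notebook's wget cells."""
    path, _headers = urllib.request.urlretrieve(url, dst)
    return path
```

For example, `fetch(model_url, "./so-vits-svc/logs/44k/D_0.pth")` reproduces the effect of the second cell for the discriminator checkpoint.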
  3. Download the training data.

    You can directly download the training data prepared by PAI, or download your own data and clean it by following the appendix in the tutorial text.

    Sample output:

    --2023-08-30 08:44:24--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/data/thchs30-C12.tar.gz
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 58648074 (56M) [application/gzip]
    Saving to: ‘thchs30-C12.tar.gz’
    
    thchs30-C12.tar.gz  100%[===================>]  55.93M  14.2MB/s    in 4.0s    
    
    2023-08-30 08:44:28 (14.0 MB/s) - ‘thchs30-C12.tar.gz’ saved [58648074/58648074]
    
    ./
    ./C12_569.wav
    ./C12_520.wav
    ./C12_724.wav
    ./C12_626.wav
    ./C12_559.wav
    ./C12_583.wav
    ......
    ./C12_687.wav
    ./C12_534.wav
    ./C12_745.wav
    ./C12_684.wav
    ./C12_738.wav
    ./C12_657.wav
    ./C12_523.wav
    ./C12_625.wav

    The downloaded sample data is organized as follows. Training with multiple speakers is supported.

    dataset_raw
    ├───speaker1(C12)
    │   ├───xxx1.wav
    │   ├───...
    │   └───xxxn.wav
    ├───speaker2(optional)
    │   ├───xxx1.wav
    │   ├───...
    │   └───xxxn.wav
    ├───speakerN(optional)
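If you prepare your own recordings, they must end up in the per-speaker layout shown above. A minimal sketch of arranging a folder of WAV clips into that layout (the helper name is ours, not part of so-vits-svc):

```python
import os
import shutil

def arrange_dataset(src_dir: str, speaker: str, dataset_root: str = "dataset_raw") -> str:
    """Copy all .wav files from src_dir into dataset_root/<speaker>/,
    the per-speaker layout so-vits-svc expects for training data."""
    dst = os.path.join(dataset_root, speaker)
    os.makedirs(dst, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".wav"):
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst, name))
    return dst
```

For example, `arrange_dataset("thchs30-C12", "C12")` would reproduce the dataset_raw/C12 layout; call it once per additional speaker.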
  4. Preprocess the training data.

    1. Resample the data.

      Sample output:

      CPU count: 8
      dataset_raw/C12
      resampling: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:27
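The resample step rewrites every training clip at 44.1 kHz. As an illustration of what resampling does (so-vits-svc's resample.py handles this for you with proper DSP; this stdlib sketch uses crude nearest-neighbor interpolation and assumes 16-bit mono WAV input):

```python
import array
import wave

def resample_wav(src: str, dst: str, target_rate: int = 44100) -> None:
    """Rewrite a 16-bit mono WAV at target_rate via nearest-neighbor sample picking."""
    with wave.open(src, "rb") as r:
        assert r.getsampwidth() == 2 and r.getnchannels() == 1
        rate = r.getframerate()
        samples = array.array("h", r.readframes(r.getnframes()))
    n_out = len(samples) * target_rate // rate
    out = array.array(
        "h",
        (samples[min(i * rate // target_rate, len(samples) - 1)] for i in range(n_out)),
    )
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target_rate)
        w.writeframes(out.tobytes())
```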
    2. Split the data into a training set and a validation set, and generate the configuration files.

      Sample output:

      100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 124.91it/s]
      2023-08-30 08:46:12.683 | INFO     | __main__:<module>:74 - Writing ./filelists/train.txt
      100%|████████████████████████████████████| 248/248 [00:00<00:00, 1516308.15it/s]
      2023-08-30 08:46:12.684 | INFO     | __main__:<module>:80 - Writing ./filelists/val.txt
      100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 79137.81it/s]
      2023-08-30 08:46:12.691 | INFO     | __main__:<module>:115 - Writing to configs/config.json
      2023-08-30 08:46:12.691 | INFO     | __main__:<module>:118 - Writing to configs/diffusion.yaml
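The split step shuffles the clip list and holds out a couple of clips for validation before writing filelists/train.txt and filelists/val.txt. A minimal sketch of that split logic (our illustration, not the project's actual preprocess_flist_config.py):

```python
import random

def split_filelist(paths, n_val: int = 2, seed: int = 1234):
    """Shuffle clip paths and hold out n_val of them for validation."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    return paths[n_val:], paths[:n_val]
```

Given the 250 clips in this dataset, such a split yields 248 training entries and 2 validation entries, matching the counts in the log above.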
    3. Generate the audio feature data and save it to the ./so-vits-svc/dataset/44k/C12 directory.

      Sample output:

      vec768l12
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:152 - Using device: 
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:153 - Using SpeechEncoder: vec768l12
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:154 - Using extractor: dio
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:155 - Using diff Mode: False
        0%|                                                     | 0/1 [00:00<?, ?it/s]2023-08-30 08:46:40.577 | INFO     | __mp_main__:process_batch:107 - Loading speech encoder for content...
      2023-08-30 08:46:40.596 | INFO     | __mp_main__:process_batch:113 - Rank 1 uses device cuda:0
      WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
          PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.1+cu117)
          Python  3.10.9 (you have 3.10.6)
        Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
        Memory-efficient attention, SwiGLU, sparse and more won't be available.
        Set XFORMERS_MORE_DETAILS=1 for more details
      load model(s) from pretrain/checkpoint_best_legacy_500.pt
      2023-08-30 08:46:46.644 | INFO     | __mp_main__:process_batch:115 - Loaded speech encoder for rank 1
      100%|█████████████████████████████████████████| 250/250 [02:43<00:00,  1.53it/s]
      100%|████████████████████████████████████████████| 1/1 [02:53<00:00, 173.02s/it]
  5. Train the model (optional).

    Note: Because model training takes a long time, you can skip this step and run inference directly with the model file prepared by PAI.

    For better results, we recommend setting the epochs parameter to 1000. Each epoch takes about 20 to 30 seconds, so training lasts roughly 500 minutes in total.

    Sample output:

    INFO:44k:{'train': {'log_interval': 200, 'eval_interval': 800, 'seed': 1234, 'epochs': 1500, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'half_type': 'fp16', 'lr_decay': 0.999875, 'segment_size': 10240, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 512, 'port': '8001', 'keep_ckpts': 3, 'all_in_mem': False, 'vol_aug': False}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': 22050, 'unit_interpolate_mode': 'nearest'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4, 4], 'n_layers_q': 3, 'n_flow_layer': 4, 'use_spectral_norm': False, 'gin_channels': 768, 'ssl_dim': 768, 'n_speakers': 1, 'vocoder_name': 'nsf-hifigan', 'speech_encoder': 'vec768l12', 'speaker_embedding': False, 'vol_embedding': False, 'use_depthwise_conv': False, 'flow_share_parameter': False, 'use_automatic_f0_prediction': True}, 'spk': {'C12': 0}, 'model_dir': './logs/44k'}
    ./logs/44k/G_0.pth
    emb_g.weight is not in the checkpoint,please check your checkpoint.If you're using pretrain model,just ignore this warning.
    INFO:44k:emb_g.weight is not in the checkpoint
    load 
    INFO:44k:Loaded checkpoint './logs/44k/G_0.pth' (iteration 0)
    ./logs/44k/D_0.pth
    load 
    INFO:44k:Loaded checkpoint './logs/44k/D_0.pth' (iteration 0)
    ./logs/44k/D_0.pth
    ......
    INFO:44k:Train Epoch: 990 [17%]
    INFO:44k:Losses: [2.3169736862182617, 2.2942988872528076, 9.555232048034668, 14.556828498840332, 0.6244402527809143], step: 62600, lr: 8.299526322416852e-05, reference_loss: 29.347774505615234
    INFO:44k:====> Epoch: 990, cost 23.75 s
    INFO:44k:====> Epoch: 991, cost 22.81 s
    INFO:44k:====> Epoch: 992, cost 22.70 s
    INFO:44k:====> Epoch: 993, cost 22.99 s
    INFO:44k:Train Epoch: 994 [93%]
    INFO:44k:Losses: [2.5843334197998047, 2.4109506607055664, 8.15036392211914, 12.917271614074707, 0.6071179509162903], step: 62800, lr: 8.295377337271398e-05, reference_loss: 26.6700382232666
    INFO:44k:====> Epoch: 994, cost 23.86 s
    INFO:44k:====> Epoch: 995, cost 21.87 s
    INFO:44k:====> Epoch: 996, cost 23.03 s
    INFO:44k:====> Epoch: 997, cost 22.81 s
    INFO:44k:====> Epoch: 998, cost 23.05 s
    INFO:44k:Train Epoch: 999 [69%]
    INFO:44k:Losses: [2.552673816680908, 2.0296831130981445, 3.976914405822754, 13.161809921264648, 0.2755252420902252], step: 63000, lr: 8.290194022426301e-05, reference_loss: 21.996606826782227
    INFO:44k:====> Epoch: 999, cost 24.08 s
    INFO:44k:====> Epoch: 1000, cost 22.81 s
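The note above estimates total training time from the per-epoch cost. The arithmetic behind the roughly 500-minute figure, taking the upper end of the 20-30 s per-epoch estimate:

```python
# Rough training-time estimate from the tutorial's numbers.
epochs = 1000
seconds_per_epoch = 30  # upper end of the observed 20-30 s per epoch
total_minutes = epochs * seconds_per_epoch // 60
print(f"~{total_minutes} minutes")  # ~500 minutes
```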

Step 3: Run model inference

After you complete the preceding operations, the AI singer model has been trained. You can run offline inference with the model file you trained in the previous steps, or with the model file prepared by PAI. By default, inference results are saved to the ./results directory. Continue running the steps in the inference section of the tutorial file. The steps and the output of each step are described below.

  1. (Optional) Download the model file prepared by PAI and save it as ./so-vits-svc/logs/G_8800_8gpus.pth.

    Note: If you run offline inference with the model file you trained in the previous steps, skip this step.

    --2023-08-30 08:50:10--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/models/C12/G_8800.pth
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 627897375 (599M) [application/octet-stream]
    Saving to: ‘logs/G_8800_8gpus.pth’
    
    logs/G_8800_8gpus.p 100%[===================>] 598.81M  13.8MB/s    in 45s     
    
    2023-08-30 08:50:55 (13.3 MB/s) - ‘logs/G_8800_8gpus.pth’ saved [627897375/627897375]
  2. Download the test data and save it to the ./raw directory. This tutorial uses data already separated with UVR5 as the test data. Offline inference requires clean vocal data, so if you want to prepare your own test data, clean it by following the appendix in the tutorial text. The inference data must be placed in the ./raw directory.

    --2023-08-30 08:51:48--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/data/one.tar.gz
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 15195943 (14M) [application/gzip]
    Saving to: ‘./raw/one.tar.gz’
    
    one.tar.gz          100%[===================>]  14.49M  12.5MB/s    in 1.2s    
    
    2023-08-30 08:51:50 (12.5 MB/s) - ‘./raw/one.tar.gz’ saved [15195943/15195943]
    
    one/
    one/1_one_(Instrumental).wav
    one/1_one_(Vocals).wav
    one/one.mp3
    one/1_1_one_(Vocals)_(Vocals).wav
    one/1_1_one_(Vocals)_(Instrumental).wav
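The cell above downloads one.tar.gz and unpacks it into ./raw. A stdlib sketch of the unpack step (the helper name is illustrative):

```python
import tarfile

def unpack(archive: str, dest: str = "raw") -> list:
    """Extract a .tar.gz archive into dest and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
        return tar.getnames()
```

For example, `unpack("./raw/one.tar.gz", "./raw")` produces the file list shown above.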
  3. Replace the vocals with the C12 character's voice.

    load 
    WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
        PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.1+cu117)
        Python  3.10.9 (you have 3.10.6)
      Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
      Memory-efficient attention, SwiGLU, sparse and more won't be available.
      Set XFORMERS_MORE_DETAILS=1 for more details
    load model(s) from pretrain/checkpoint_best_legacy_500.pt
    #=====segment start, 7.76s======
    vits use time:0.8072702884674072
    #=====segment start, 6.62s======
    vits use time:0.11305761337280273
    #=====segment start, 6.76s======
    vits use time:0.11228108406066895
    #=====segment start, 6.98s======
    vits use time:0.11324000358581543
    #=====segment start, 0.005s======
    jump empty segment
  4. Load and play back the converted audio.

  5. Merge the vocals with the accompaniment.

    Export successfully!
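The final cell overlays the converted vocals onto the instrumental track. A minimal stdlib sketch of such a mix (the tutorial's notebook performs this step for you; this sketch assumes two 16-bit WAV files with identical sample rate and channel count):

```python
import array
import wave

def mix_tracks(vocals_path: str, inst_path: str, out_path: str) -> None:
    """Sum two 16-bit WAV files sample-by-sample, clipping to the int16 range."""
    with wave.open(vocals_path, "rb") as v, wave.open(inst_path, "rb") as i:
        assert v.getframerate() == i.getframerate()
        assert v.getnchannels() == i.getnchannels()
        assert v.getsampwidth() == 2 and i.getsampwidth() == 2
        a = array.array("h", v.readframes(v.getnframes()))
        b = array.array("h", i.readframes(i.getnframes()))
        channels, rate = v.getnchannels(), v.getframerate()
    # Pad the shorter track with silence so both have the same length.
    if len(a) < len(b):
        a.extend([0] * (len(b) - len(a)))
    else:
        b.extend([0] * (len(a) - len(b)))
    mixed = array.array("h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b)))
    with wave.open(out_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(mixed.tobytes())
```

For example, mixing the converted vocals with 1_one_(Instrumental).wav yields the final song.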