Thursday, January 21, 2021
  • Setup menu at Appearance » Menus and assign menu to Top Bar Navigation
Advertisement
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
No Result
View All Result
Home Neural Networks

Generating neural speech synthesis voice acting using xVASynth | by Dan Ruta | Jan, 2021

January 12, 2021
in Neural Networks
Generating neural speech synthesis voice acting using xVASynth | by Dan Ruta | Jan, 2021
586
SHARES
3.3k
VIEWS
Share on FacebookShare on Twitter

Goals and requirements

The main goal of this app is to provide content creators with a way to generate new voice acted lines for their projects. As such, audio quality is the top priority. A secondary objective is to provide a way to exert artistic control over the generated audio.

Big Data Jobs

The Data

The models are built around datasets like LJSpeech, which is formatted as a folder full of .wav files, and an accompanying metadata.csv file containing lines for each .wav file, as follows:
<filename>|<text transcript>

You might also like

6 Major AI Use Cases In IT Operations | by Gina Shaw | Jan, 2021

Classifying employees as likely-to-quit using Tensorflow, Pandas & IBM attrition dataset | by Timilsinasandesh | Jan, 2021

Does AI Raises Security and Ethics Concerns amid Pandemic | by Divyesh Dharaiya | Jan, 2021

Extracting audio/transcript data from games was quite easy, thanks to the very open and moddable nature of Bethesda games, and great tools like BAE, LazyAudio, and Bethesda’s Creation Kit. To start preparing the data for training, the audio files were first extracted from the game file, then decomposed into .lip and .wav files. The transcript was extracted using the Creation Kit. Following this, it was just a matter of matching up the audio files to the transcript, which was easy to do via a python script. To finalize the transcripts, a number of filtering passes were run, to exclude invalid data such as screams, shouts, music, and lines with extra unspoken text.

1. 130 Machine Learning Projects Solved and Explained

2. The New Intelligent Sales Stack

3. Time Series and How to Detect Anomalies in Them — Part I

4. Beginners Guide -CNN Image Classifier | Part 1

For the actual audio files, a couple additional pre-processing steps were required, starting with down-sampling the audio to 22050Hz mono audio. Using pydub, the audio silence was trimmed from either end, and the middle of the audio where long pauses were present (sox can also do this, but it introduced audio artifacts).

Models

I started early experiments during this project in 2018, using Tacotron. At this point, the project was a proof of concept, and was riddled with issues, as can be seen in this video I made showcasing progress at a v0.1 pre-release version.

Demo video for early experiments in xVASynth v0.1

Though it somewhat worked, the audio quality was terrible (with very high reverb), and the output was quite unstable. The model was also very slow to load. Additionally, the model required very large datasets which limited me to voices from voice actors who also recorded audio books (The data pre-processing for which was a whole other messy can of worms). Finally, any artistic control was limited to clever use of punctuation.

Fast-forward to 2020 and the PyTorch FastPitch model by NVIDIA was released. There are several key features of this model that are useful for this project.

One of the main selling points of this model is its focus on letter-by-letter values for pitch and duration. This meant that by hijacking intermediate model values, a user would be able to have artistic control over how the audio is generated – a great plus for the acting part of voice acting.

Pitch sliders and duration editing tools in xVASynth

Another plus is the support for multiple speakers per model. The best quality was still achieved when training a single speaker at a time. However, this has meant that training (or at least pre-training) voices with only a small amount of data was now possible, by grouping them up.

Image source: FastPitch paper on arxiv (https://arxiv.org/pdf/2006.06873.pdf)

However, the most important point for this project is that audio generation was backed by a pre-trained Tacotron 2 model, instead of learning from scratch. As a pre-processing step, the Tacotron 2 model outputs mel spectrograms, and character durations which is what is used to compute the loss, as shown in the diagram above, from the paper (the pitch information is extracted using a different method).

Just like Tacotron and Tacotron 2, the FastPitch model generates mel spectrograms, so another model is needed to generate actual .wav audio files from these spectrograms. The FastPitch repo uses WaveGlow for this, which works really well.

This dependency on Tacotron 2 has meant the training has been far more quick, simple and successful. However, an issue still persists when the speaker style is very different from the one the pre-trained Tacotron 2 was trained on, LJSpeech. LJSpeech is a female speaker dataset, meaning deep male voices cannot very well be converted to the data required by FastPitch.

xVASynth right now

At the time of writing, v1.0 xVASynth comes with 34 models trained for 53 voice sets across 6 games, with many more planned for future releases. The issue with Tacotron 2 persists for most male voices, as I don’t currently have the hardware requirements to train/fine-tune a Tacotron 2 model well enough, though this is the next step after my next hardware upgrade.

The video at the top of this post details usage examples for the app, which can be downloaded from either GitHub or the Nexus.

I additionally added HiFi-GAN as an alternative vocoder to WaveGlow, which is orders of magnitude faster on the CPU, albeit at lower audio quality.

Credit: BecomingHuman By: Dan Ruta

Previous Post

Attend MarTech for free this March

Next Post

An international team of researchers just introduced a new photonic processor -- ScienceDaily

Related Posts

6 Major AI Use Cases In IT Operations | by Gina Shaw | Jan, 2021
Neural Networks

6 Major AI Use Cases In IT Operations | by Gina Shaw | Jan, 2021

January 21, 2021
Classifying employees as likely-to-quit using Tensorflow, Pandas & IBM attrition dataset | by Timilsinasandesh | Jan, 2021
Neural Networks

Classifying employees as likely-to-quit using Tensorflow, Pandas & IBM attrition dataset | by Timilsinasandesh | Jan, 2021

January 21, 2021
Does AI Raises Security and Ethics Concerns amid Pandemic | by Divyesh Dharaiya | Jan, 2021
Neural Networks

Does AI Raises Security and Ethics Concerns amid Pandemic | by Divyesh Dharaiya | Jan, 2021

January 21, 2021
How to Deploy AI Models? — Part 2 Setting up the Github For Herolu and Streamlit | by RAVI SHEKHAR TIWARI | Jan, 2021
Neural Networks

How to Deploy AI Models? — Part 2 Setting up the Github For Herolu and Streamlit | by RAVI SHEKHAR TIWARI | Jan, 2021

January 20, 2021
Big Data in Energy: Possibilities and Limitations | by Iflexion | Jan, 2021
Neural Networks

Big Data in Energy: Possibilities and Limitations | by Iflexion | Jan, 2021

January 20, 2021
Next Post
Applying artificial intelligence to science education — ScienceDaily

An international team of researchers just introduced a new photonic processor -- ScienceDaily

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

January 6, 2019
Microsoft, Google Use Artificial Intelligence to Fight Hackers

Microsoft, Google Use Artificial Intelligence to Fight Hackers

January 6, 2019

Categories

  • Artificial Intelligence
  • Big Data
  • Blockchain
  • Crypto News
  • Data Science
  • Digital Marketing
  • Internet Privacy
  • Internet Security
  • Learn to Code
  • Machine Learning
  • Marketing Technology
  • Neural Networks
  • Technology Companies

Don't miss it

Better than the best password: How to use 2FA to improve your security
Internet Security

Better than the best password: How to use 2FA to improve your security

January 21, 2021
4Paradigm Defends its Championship in China’s Machine Learning Platform Market in the 1st Half of 2020, According to IDC
Machine Learning

4Paradigm Defends its Championship in China’s Machine Learning Platform Market in the 1st Half of 2020, According to IDC

January 21, 2021
The Content Habits and Preferences of Engineers
Marketing Technology

The Content Habits and Preferences of Engineers

January 21, 2021
Ransomware victims that have backups are paying ransoms to stop hackers leaking their stolen data
Internet Security

Ransomware victims that have backups are paying ransoms to stop hackers leaking their stolen data

January 21, 2021
Skyrim modders have a new machine learning tool that turns text to realistic NPC speech
Machine Learning

Skyrim modders have a new machine learning tool that turns text to realistic NPC speech

January 21, 2021
6 Major AI Use Cases In IT Operations | by Gina Shaw | Jan, 2021
Neural Networks

6 Major AI Use Cases In IT Operations | by Gina Shaw | Jan, 2021

January 21, 2021
NikolaNews

NikolaNews.com is an online News Portal which aims to share news about blockchain, AI, Big Data, and Data Privacy and more!

What’s New Here?

  • Better than the best password: How to use 2FA to improve your security January 21, 2021
  • 4Paradigm Defends its Championship in China’s Machine Learning Platform Market in the 1st Half of 2020, According to IDC January 21, 2021
  • The Content Habits and Preferences of Engineers January 21, 2021
  • Ransomware victims that have backups are paying ransoms to stop hackers leaking their stolen data January 21, 2021

Subscribe to get more!

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates