Upload one or more audio files totaling up to 30 minutes (do not exceed this limit)
Ensure consistent audio quality throughout
Use clear recordings without background noise
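The 30-minute total can be checked locally before uploading. The following is a minimal sketch, assuming the samples are uncompressed WAV files readable by Python's standard `wave` module; the function names and the `MAX_TOTAL_SECONDS` constant are hypothetical and are not part of the product.

```python
import wave

# 30-minute total limit stated in the requirements above.
MAX_TOTAL_SECONDS = 30 * 60


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_duration_seconds(paths):
    """Sum the durations of all given WAV files."""
    return sum(wav_duration_seconds(p) for p in paths)


def within_upload_limit(paths, limit=MAX_TOTAL_SECONDS):
    """True if the combined duration does not exceed the limit."""
    return total_duration_seconds(paths) <= limit
```

For example, `within_upload_limit(["take1.wav", "take2.wav"])` returns `False` when the two files together run longer than 30 minutes. Other formats such as MP3 would need a different reader.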
Best Practices for Audio Samples:
Use a high-quality microphone
Higher audio quality yields better results
Record in a quiet environment
Speak naturally and clearly
Include different speaking styles (questions, statements, emotions)
High Fidelity Fine-tune Parameters
We recommend that you do not change these. Changing them may make the output unstable.
Available parameters for High Fidelity Voice Cloning:
Maximum Iteration to Train: Controls the total duration of model training. Higher values mean
longer training time but potentially better results.
Number of iterations to warm up: Initial training phase where the model gradually adjusts its
parameters before full training begins. Helps establish a stable starting point.
Learning Rate: Determines how much the model’s weights are adjusted in each training iteration.
Higher values mean faster learning but risk instability.
Rank: Controls how much the model can be modified by the input voice. Higher values allow more voice
influence but should be used cautiously to avoid overfitting.
Number of Vocoder Epochs: Specifies how many times the vocoder (audio synthesis component) goes
through the training data. More epochs can improve audio quality but increase training time.
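To illustrate how the warm-up iterations and the learning rate can interact, here is a minimal sketch of a linear warm-up schedule. It is an assumption for illustration only: this document does not state which schedule the service uses, and the function and argument names are hypothetical.

```python
def learning_rate_at(iteration, base_lr, warmup_iterations):
    """Learning rate for a 0-based iteration under linear warm-up.

    During warm-up the rate ramps linearly up to base_lr;
    afterwards it stays constant at base_lr.
    """
    if warmup_iterations <= 0:
        # No warm-up phase: use the full learning rate from the start.
        return base_lr
    if iteration < warmup_iterations:
        return base_lr * (iteration + 1) / warmup_iterations
    return base_lr
```

With `warmup_iterations=10`, the first iteration trains at one tenth of `base_lr` and the tenth iteration reaches the full rate. Small early steps like these are one common way to get the stable starting point described above.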