Quantization Options for Faster Inference and Lower VRAM Usage
Hi there,
Thanks for the fantastic work on the Parler-TTS model! I'm exploring ways to optimize the model for deployment in resource-constrained environments. Specifically, I'd like to know whether the Parler-TTS-Mini-V1 model can be quantized to get faster inference and lower VRAM usage.
A few questions:
1. Are there any recommended quantization techniques for this model, such as INT8 or mixed precision?
2. Would quantization impact the audio quality, and if so, to what extent?
3. Are there any tools or scripts available within the repo or compatible with the model to facilitate this process?
Any guidance or examples would be greatly appreciated!
Thanks in advance for your help.
Hi, I have similar questions. Have you found any tutorials?
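Not a tutorial, but as a rough starting point for the VRAM part of the question, here is a minimal sketch of how weight memory scales with numeric precision. It only counts the weights (no activations, caches, or framework overhead), so real usage will be higher. The parameter count of roughly 880M is an assumption on my part; on a loaded PyTorch model you could check it with `sum(p.numel() for p in model.parameters())`. The names `weight_memory_gib`, `BYTES_PER_PARAM`, and `ASSUMED_NUM_PARAMS` are made up for this illustration and are not part of the Parler-TTS repo.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# Counts weights only; activations, caches, and runtime overhead are ignored.

# Storage cost of one parameter, in bytes, for common formats.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# ASSUMPTION: approximate parameter count, not measured from a checkpoint.
ASSUMED_NUM_PARAMS = 880_000_000


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to store `num_params` weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(ASSUMED_NUM_PARAMS, dtype)
    print(f"{dtype:>5}: {gib:.2f} GiB")
```

Going from FP32 to FP16/BF16 halves the weight footprint and INT8 halves it again, but this says nothing about speed or audio quality, which would have to be measured on the actual model.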