Spaces: Running on Zero
Upload app.py
app.py CHANGED
@@ -460,7 +460,7 @@ def extract_text(file):
 with gr.Blocks() as lf_tts:
     with gr.Row():
         with gr.Column():
-            file_input = gr.File(file_types=['.pdf', '.txt'], label='
+            file_input = gr.File(file_types=['.pdf', '.txt'], label='pdf or txt')
             text = gr.Textbox(label='Input Text', info='Generate speech in batches of 100 text segments and automatically join them together')
             file_input.upload(fn=extract_text, inputs=[file_input], outputs=[text])
     with gr.Row():
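The hunk header references `extract_text`, but its body lies outside this diff. For orientation, here is a minimal sketch of what such a handler could look like, assuming `pypdf` for PDF parsing; the actual implementation in app.py is not shown here and may differ.

```python
# Hypothetical sketch of extract_text; the real app.py body is not in this diff.
# Assumes the Gradio File component hands the handler a temp-file path or wrapper.
from pypdf import PdfReader

def extract_text(file):
    if file is None:
        return ''
    path = getattr(file, 'name', file)  # Gradio may pass a wrapper or a plain path
    if str(path).lower().endswith('.pdf'):
        # Join the extracted text of every page in the uploaded PDF.
        reader = PdfReader(path)
        return '\n'.join(page.extract_text() or '' for page in reader.pages)
    # Otherwise treat the upload as plain UTF-8 text.
    with open(path, encoding='utf-8', errors='ignore') as f:
        return f.read()
```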
@@ -506,7 +506,7 @@ Unstable voices are more likely to stumble or produce unnatural artifacts, espec
 **How can CPU be faster than ZeroGPU?**<br/>
 The CPU is a dedicated resource for this Space, while the ZeroGPU pool is shared and dynamically allocated across all of HF. The ZeroGPU queue/allocator system inevitably adds latency to each request.<br/>
 For Basic TTS under ~100 tokens or characters, only a few seconds of audio need to be generated, so the actual compute is not that heavy. In these short bursts, the dedicated CPU can often compute the result faster than the total time it takes to: enter the ZeroGPU queue, wait to get allocated, and have a GPU compute and deliver the result.<br/>
-ZeroGPU catches up beyond 100 tokens and especially closer to the ~500 token context window. Long
+ZeroGPU catches up beyond 100 tokens and especially closer to the ~500 token context window. Long Form mode processes batches of 100 segments at a time, so the GPU should outspeed the CPU by 1-2 orders of magnitude.
 
 ### Compute
 The model was trained on 1x A100-class 80GB instances rented from [Vast.ai](https://cloud.vast.ai/?ref_id=79907).<br/>
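The crossover claim in this hunk is plain latency arithmetic: a fixed queue-plus-allocation overhead dominates short jobs and amortizes away on long ones. A toy model with invented constants (illustrative only, not measured on this Space):

```python
# Toy latency model for the CPU-vs-ZeroGPU crossover described above.
# All constants are illustrative guesses, not measurements from this Space.
QUEUE_OVERHEAD_S = 5.0      # assumed ZeroGPU queue + allocation cost per request
CPU_S_PER_SEGMENT = 0.8     # assumed CPU synthesis time per text segment
GPU_S_PER_SEGMENT = 0.02    # assumed GPU synthesis time per text segment

def total_latency(segments: int) -> tuple[float, float]:
    cpu = segments * CPU_S_PER_SEGMENT
    gpu = QUEUE_OVERHEAD_S + segments * GPU_S_PER_SEGMENT
    return cpu, gpu

for n in (1, 5, 10, 100):
    cpu, gpu = total_latency(n)
    winner = 'CPU' if cpu < gpu else 'ZeroGPU'
    print(f'{n:>3} segments: CPU {cpu:5.1f}s vs ZeroGPU {gpu:5.1f}s -> {winner}')
```

Under these assumed numbers the dedicated CPU wins below roughly 6 segments and the GPU wins past that, approaching the 1-2 orders of magnitude gap the text describes at 100 segments.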
@@ -545,7 +545,7 @@ with gr.Blocks() as changelog:
     gr.Markdown('''
 **28 Nov 2024**<br/>
 🥈 CPU fallback
-🌊 Long
+🌊 Long Form streaming and stop button
 
 **25 Nov 2024**<br/>
 🎨 Voice Mixer added
@@ -574,7 +574,7 @@ with gr.Blocks() as changelog:
 with gr.Blocks() as app:
     gr.TabbedInterface(
         [basic_tts, lf_tts, about, changelog],
-        ['🔥 Basic TTS', '📖 Long Form', 'ℹ️ About', '📝 Changelog'],
+        ['🔥 Basic TTS', '📖 Long Form', 'ℹ️ About', '📝 Changelog'],
     )
 
 if __name__ == '__main__':
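The hunk ends at the `__main__` guard; whatever follows is outside the diff. The conventional Gradio entry point (an assumption, not shown in this commit) would be:

```python
# Assumed continuation beyond the hunk boundary; not part of the diff.
if __name__ == '__main__':
    app.launch()
```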