hexgrad committed (verified)
Commit 8f2bb3e · Parent(s): 6d0a9b0

Upload app.py

Files changed (1): app.py (+4 -4)
app.py CHANGED

@@ -460,7 +460,7 @@ def extract_text(file):
 with gr.Blocks() as lf_tts:
     with gr.Row():
         with gr.Column():
-            file_input = gr.File(file_types=['.pdf', '.txt'], label='Input File: pdf or txt')
+            file_input = gr.File(file_types=['.pdf', '.txt'], label='pdf or txt')
             text = gr.Textbox(label='Input Text', info='Generate speech in batches of 100 text segments and automatically join them together')
             file_input.upload(fn=extract_text, inputs=[file_input], outputs=[text])
     with gr.Row():
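Note: the hunk header above shows the upload handler's signature, def extract_text(file), but its body lies outside this diff. A minimal sketch of what such a handler could look like, assuming pypdf for PDF text extraction and a `file` argument that exposes a `.name` path (Gradio versions differ on whether upload handlers receive a path string or a tempfile-like wrapper):

```python
# Hypothetical sketch, not the code from app.py: extract plain text from
# an uploaded .pdf or .txt file for the Long Form textbox.
from pypdf import PdfReader  # assumption: any PDF text extractor would do

def extract_text(file):
    if file is None:
        return ''
    path = getattr(file, 'name', file)  # accept a wrapper object or a raw path
    if str(path).lower().endswith('.pdf'):
        reader = PdfReader(path)
        # extract_text() may return None for image-only pages
        return '\n'.join(page.extract_text() or '' for page in reader.pages)
    with open(path, encoding='utf-8', errors='ignore') as f:
        return f.read()
```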
@@ -506,7 +506,7 @@ Unstable voices are more likely to stumble or produce unnatural artifacts, espec
 **How can CPU be faster than ZeroGPU?**<br/>
 The CPU is a dedicated resource for this Space, while the ZeroGPU pool is shared and dynamically allocated across all of HF. The ZeroGPU queue/allocator system inevitably adds latency to each request.<br/>
 For Basic TTS under ~100 tokens or characters, only a few seconds of audio need to be generated, so the actual compute is not that heavy. In these short bursts, the dedicated CPU can often compute the result faster than the total time it takes to: enter the ZeroGPU queue, wait to get allocated, and have a GPU compute and deliver the result.<br/>
-ZeroGPU catches up beyond 100 tokens and especially closer to the ~500 token context window. Long-Form mode processes batches of 100 segments at a time, so the GPU should outspeed the CPU by 1-2 orders of magnitude.
+ZeroGPU catches up beyond 100 tokens and especially closer to the ~500 token context window. Long Form mode processes batches of 100 segments at a time, so the GPU should outspeed the CPU by 1-2 orders of magnitude.

 ### Compute
 The model was trained on 1x A100-class 80GB instances rented from [Vast.ai](https://cloud.vast.ai/?ref_id=79907).<br/>
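The "batches of 100 segments" behavior described in this hunk can be pictured with a short sketch. Everything below is illustrative: the segmentation rule is a naive stand-in, and `tts_batch` is a hypothetical callable, not a function from app.py:

```python
# Illustrative only: split long-form text into segments, synthesize them in
# batches of 100, and join the resulting audio into one waveform.
import re
import numpy as np

def segment_text(text):
    # Naive sentence-boundary segmentation; the app's real splitter is not
    # shown in this diff.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def long_form_tts(text, tts_batch, batch_size=100):
    # tts_batch: hypothetical callable, list[str] -> np.ndarray of audio samples
    segments = segment_text(text)
    chunks = []
    for i in range(0, len(segments), batch_size):
        chunks.append(tts_batch(segments[i:i + batch_size]))
    return np.concatenate(chunks)  # join batch outputs into a single waveform
```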
@@ -545,7 +545,7 @@ with gr.Blocks() as changelog:
     gr.Markdown('''
 **28 Nov 2024**<br/>
 🥈 CPU fallback
-🌊 Long-Form streaming and stop button
+🌊 Long Form streaming and stop button

 **25 Nov 2024**<br/>
 🎨 Voice Mixer added
@@ -574,7 +574,7 @@ with gr.Blocks() as changelog:
 with gr.Blocks() as app:
     gr.TabbedInterface(
         [basic_tts, lf_tts, about, changelog],
-        ['🔥 Basic TTS', '📖 Long-Form', 'ℹ️ About', '📝 Changelog'],
+        ['🔥 Basic TTS', '📖 Long Form', 'ℹ️ About', '📝 Changelog'],
     )

 if __name__ == '__main__':
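The body under if __name__ == '__main__': falls past the hunk boundary. A typical Gradio entry point for a Space like this might look as follows; the exact launch call and its arguments are an assumption, not confirmed by this diff:

```python
# Assumed entry point, not shown in this diff: queue requests and launch
# the tabbed Blocks app defined above.
if __name__ == '__main__':
    app.queue().launch()
```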
 