The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts.

Details

It is fine-tuned from Meta-Llama-3-8B on the VidProM dataset using 8 A100 80GB GPUs.

Usage

Download the model

from transformers import pipeline
import torch
pipe = pipeline("text-generation", model="WenhaoWang/Meta-Llama-3-8B-AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0")

Set the parameters

input = "An underwater world"      # The few words to complete into text-to-video prompts.
max_length = 50                    # Maximum total length in tokens, counting the input as well as the generated text.
temperature = 1.2                  # Controls the randomness of the generation. Higher values lead to more random outputs.
top_k = 8                          # At each step, sample only from the k most likely next tokens.
num_return_sequences = 10          # The number of different text-to-video prompts to generate from the same input.
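To make the temperature and top_k comments concrete, here is a minimal pure-Python sketch of top-k sampling with temperature over a toy list of logits. It illustrates the general technique only and is not the code transformers runs internally; the function names top_k_probs and sample_next_token and the example logits are hypothetical.

```python
import math
import random


def top_k_probs(logits, temperature, top_k):
    """Return {token_index: probability} after top-k filtering and temperature scaling."""
    if temperature <= 0 or top_k < 1:
        raise ValueError("temperature must be > 0 and top_k must be >= 1")
    # Keep only the indices of the top_k highest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature > 1 flattens the distribution; temperature < 1 sharpens it.
    scaled = [logits[i] / temperature for i in keep]
    # Softmax over the kept tokens, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {i: w / total for i, w in zip(keep, weights)}


def sample_next_token(logits, temperature=1.2, top_k=8, rng=random):
    """Draw one token index from the filtered, temperature-scaled distribution."""
    probs = top_k_probs(logits, temperature, top_k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical 5-token vocabulary; with top_k=2 only tokens 0 and 1 can ever be drawn.
toy_logits = [4.0, 3.0, 2.0, 1.0, 0.0]
print(top_k_probs(toy_logits, temperature=1.0, top_k=2))
print(top_k_probs(toy_logits, temperature=2.0, top_k=2))
print(sample_next_token(toy_logits, top_k=2, rng=random.Random(0)))
```

Raising the temperature from 1.0 to 2.0 moves probability from token 0 toward token 1, which is why the higher value of 1.2 used above yields more varied prompts, while top_k = 8 keeps very unlikely tokens out entirely.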

Generation

all_prompts = pipe(input, max_length=max_length, do_sample=True,  # sampling is required for temperature/top_k to apply
                   temperature=temperature, top_k=top_k,
                   num_return_sequences=num_return_sequences)

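def process(text):
    # Treat newlines as sentence boundaries and remove stray spaces before periods.
    text = text.replace('\n', '.')
    text = text.replace('  .', '.')
    # Cut off the trailing, possibly incomplete sentence at the last period.
    last_period = text.rfind('.')
    if last_period != -1:  # without this guard, rfind() returns -1 and the last character is lost
        text = text[:last_period]
    return text + '.'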

for output in all_prompts:
    print(process(output['generated_text']))

You will get 10 text-to-video prompts and can pick the one you like most. Because generation is sampled, your outputs will differ from the examples below.

An underwater world of blue wonders. A vibrant Coral Garden sways with shades of aquamarine. A Clownfish dances, while a Turtle leisurely glides by.
An underwater world full of colorful fish and coral formations.the sun rising over a field of corn near a farm house on a beautiful morning.a woman is looking at vr controllers and trying to choose which one to choose, .
An underwater world teeming with various unique marine creatures. Schools of fish gracefully swim among the colorful coral reefs and seaweed, creating a stunning underwater landscape.
An underwater world with a beautiful mermaid swimming in clear water and sunlight passing through the surface..the most beatuful view on the earth.
An underwater world teeming with a rainbow of coral reefs, swaying gently in the sea currents, surrounded by vibrant schools of tropical fish creating a stunning visual feast.
An underwater world filled with a rainbow fish and a sea turtle swiming.A woman walks in to a room where her child is sleeping. She leans over to check on the child. The child then wakes up..
An underwater world teeming with colorful creatures and vibrant coral reefs..a beautiful lady, big black eyes, with a white man bun hairstyle, wearing a black professional attire, standing front and center, with a black background .
An underwater world with colorful coral reefs and a variety of sea creatures, all living together in harmony..a girl wearing headphones listening to music at a dark coffee cafe at nighttime  -camera zoom out - 10.
An underwater world full of marine life and corals, in the style of 8k 3d, photorealistic scenes, crystal clear water, marine and sea flora motifs, high details, glistening water effects, vibrant marine life, H.
An underwater world of vibrant coral reefs teeming with schools of tropical fish, creating a mesmerizing display of colors and movement beneath the azure waves.
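Several samples above run on into unrelated prompts (for example ".the sun rising over a field of corn"). If you only want the prompt that continues your input, one option is to keep just the first line of each completion. This is a minimal sketch that assumes the model separates consecutive prompts with a newline, which the ".." and ".the" joins left by process() suggest but the model card does not state; the helper name first_prompt is hypothetical.

```python
def first_prompt(generated_text: str) -> str:
    """Keep only the first line of a completion and close it at its last full stop."""
    # Assumption: consecutive prompts are separated by newlines in the raw output.
    first_line = generated_text.split('\n', 1)[0].strip()
    last_period = first_line.rfind('.')
    if last_period != -1:
        # Drop any trailing, unfinished sentence after the last period.
        first_line = first_line[:last_period + 1]
    elif first_line:
        # No period at all: end the line with one so every prompt is a full sentence.
        first_line += '.'
    return first_line


raw = "An underwater world full of colorful fish and coral formations\nthe sun rising over a field"
print(first_prompt(raw))
```

In the generation loop, first_prompt(output['generated_text']) can be used in place of process(...). Note that a completion cut off by max_length before any period or newline is still returned whole, so a larger max_length reduces truncated prompts.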

License

The model is licensed under CC BY-NC 4.0. You should also follow Meta AI's license and agreement for Meta-Llama-3-8B.

Citation

@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}

Acknowledgment

Yaowei Zheng helped with the fine-tuning process.

Contact

If you have any questions, feel free to contact Wenhao Wang ([email protected]).

Model size: 8.03B params (Safetensors, tensor type BF16)