Face Parsing

example image and output

Semantic segmentation model fine-tuned from nvidia/mit-b5 with CelebAMask-HQ for face parsing. For additional options, see the Transformers Segformer docs.

ONNX model for web inference contributed by Xenova.

Usage in Python

Exhaustive list of labels can be extracted from config.json.

id label note
0 background
1 skin
2 nose
3 eye_g eyeglasses
4 l_eye left eye
5 r_eye right eye
6 l_brow left eyebrow
7 r_brow right eyebrow
8 l_ear left ear
9 r_ear right ear
10 mouth area between lips
11 u_lip upper lip
12 l_lip lower lip
13 hair
14 hat
15 ear_r earring
16 neck_l necklace
17 neck
18 cloth clothing
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining device
device = (
    "cuda"
    # Device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # Device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(logits,
                size=image.size[::-1], # H x W
                mode='bilinear',
                align_corners=False)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()

Usage in the browser (Transformers.js)

import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/[email protected]";

// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;

// instantiate image segmentation pipeline with pretrained face parsing model
model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// async inference since it could take a few seconds
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... }}
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
//    ...
// ]
for (const m of output) {
  print(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}

p5.js

Since p5.js uses an animation loop abstraction, we need to take care loading the model and making predictions.

// ...

// asynchronously load transformers.js and instantiate model
async function preload() {
  // load transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/[email protected]"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...

full p5.js example

Model Description

  • Developed by: Jonathan Dinu
  • Model type: Transformer-based semantic segmentation image model
  • License: non-commercial research and educational purposes
  • Resources for more information: Transformers docs on Segformer and/or the original research paper.

Limitations and Bias

Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The CelebAMask-HQ dataset used for fine-tuning is large but not necessarily perfectly diverse or representative. Also, they are images of.... just celebrities.

Downloads last month
273,367
Safetensors
Model size
84.6M params
Tensor type
F32
Β·
Inference API

Model tree for jonathandinu/face-parsing

Quantizations
1 model

Spaces using jonathandinu/face-parsing 8