
Making Body Doubles with Local AI



Written by Megacity · Published 3/11/2025


Several AI models and tools allow you to upload a photo of a person, detect or encode their face, and generate new images of that same individual in different backgrounds or styles. Below are some prominent solutions (many available on Hugging Face) that focus on maintaining facial consistency while varying the scene or style:

InstantID (Zero-Shot Identity Preservation on Hugging Face Space)

Link: InstantX/InstantID (Hugging Face Space)
Description: InstantID is a Gradio demo and research project for “Zero-shot Identity-Preserving Generation in Seconds”. It lets you upload a clear photo of a person’s face (or a cropped portrait) and, optionally, a second image to guide the pose. With no model fine-tuning needed, it will generate new images of that person according to your text prompt. The system extracts the person’s identity and facial features, then uses a Stable Diffusion pipeline (with ControlNets for pose or depth guidance) to produce a new image where the face looks like the input, but the background, pose, or style matches the prompt. This means you can place the person in different scenes or artistic styles while keeping a high likeness to the original face.

Usage: On the Hugging Face Space, you simply upload the person’s photo, upload an optional reference image to mimic the pose (if desired), and enter a text prompt describing the new background or scenario. For example, you might upload a headshot and then prompt “standing in a garden, sunlight, professional photography.” Clicking Submit runs the generation and returns one or several new images of that person in the specified context. If the face similarity isn’t high enough, the interface provides advanced sliders (e.g. “IdentityNet Strength”) to increase identity preservation. InstantID’s approach is very fast (seconds per image) and doesn’t require uploading multiple images or training a model, making it a convenient zero-shot solution. (Citation: official InstantID demo instructions)
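If you would rather run InstantID locally instead of through the Space, the project’s GitHub repository also ships a custom SDXL pipeline. Below is a condensed sketch of that workflow, not the official snippet verbatim: it assumes you have cloned the InstantID repo (which provides pipeline_stable_diffusion_xl_instantid.py and its draw_kps helper), downloaded the antelopev2 InsightFace model, and placed the released ControlNet and ip-adapter.bin checkpoints under ./checkpoints; the SDXL base model and file paths are placeholders.

```python
import cv2
import numpy as np
import torch
from diffusers.models import ControlNetModel
from diffusers.utils import load_image
from insightface.app import FaceAnalysis

# Custom pipeline and keypoint helper shipped with the InstantID GitHub repo
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

# Face detector/encoder (expects the antelopev2 model downloaded locally)
app = FaceAnalysis(name="antelopev2", root="./", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

# IdentityNet (a ControlNet) plus the face adapter weights from the InstantID release
controlnet = ControlNetModel.from_pretrained("./checkpoints/ControlNetModel", torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any SDXL base checkpoint should work
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")

# Extract the identity embedding and facial keypoints from the input photo
face_image = load_image("./person.jpg")
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda f: (f["bbox"][2] - f["bbox"][0]) * (f["bbox"][3] - f["bbox"][1]))[-1]
face_emb = face_info["embedding"]
face_kps = draw_kps(face_image, face_info["kps"])

# Generate: the embedding preserves the identity, the prompt sets the scene
image = pipe(
    "standing in a garden, sunlight, professional photography",
    image_embeds=face_emb,
    image=face_kps,
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]
image.save("instantid_garden.png")
```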

IP-Adapter-FaceID (Stable Diffusion with Face Embedding)

Link: h94/IP-Adapter-FaceID (Hugging Face Model)
Description: IP-Adapter-FaceID is a specialized model that integrates a face-recognition embedding into Stable Diffusion to preserve a person’s identity in generated images. Instead of relying only on text, this method takes an embedding of the person’s face (extracted via a face recognition network such as InsightFace) and feeds it into the diffusion model’s cross-attention layers. It also uses a lightweight LoRA (Low-Rank Adaptation) module to further lock in the facial features. The result is a pipeline that can generate variously styled images conditioned on a given face, using text prompts for the background or scene. For example, with a face embedding from your input photo, you can prompt “a portrait of [person] on a beach at sunset” and IP-Adapter-FaceID will produce that scene with the input person’s face accurately rendered.

Usage: Using IP-Adapter-FaceID typically involves coding with the Hugging Face Diffusers library. First, you extract the face embedding from the uploaded image. For instance, using InsightFace: load the face analysis model and get a normalized embedding vector for the face. Then, you load a Stable Diffusion pipeline (e.g. with a realistic pretrained model) and attach the IP-Adapter-FaceID module and its weights. Finally, call the generation function with your text prompt, any negative prompt, and the face embedding to produce images. The model card provides example code showing how to do this setup in Python. Some community-created UIs (like certain Stable Diffusion web UIs or ComfyUI workflows) also integrate IP-Adapter, allowing you to upload an image and prompt without manual coding. In summary, IP-Adapter-FaceID is a powerful way to generate new images of a specific person by using their face embedding as a condition, ensuring the generated face stays true to the input identity.
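A condensed sketch of that flow, loosely following the example on the model card: it assumes insightface and the ip_adapter package from the IP-Adapter GitHub repo are installed and that the ip-adapter-faceid_sd15.bin weights have been downloaded; the base checkpoint and prompts are placeholders you can swap.

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter.ip_adapter_faceid import IPAdapterFaceID  # from the IP-Adapter GitHub repo

# 1) Extract a normalized face embedding from the uploaded photo with InsightFace
app = FaceAnalysis(name="buffalo_l", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))
faces = app.get(cv2.imread("person.jpg"))
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)

# 2) Load a realistic SD 1.5-class pipeline and attach the FaceID adapter weights
noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012,
    beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False, steps_offset=1,
)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V4.0_noVAE",  # any realistic SD 1.5 checkpoint
    torch_dtype=torch.float16, scheduler=noise_scheduler, vae=vae,
    feature_extractor=None, safety_checker=None,
)
ip_model = IPAdapterFaceID(pipe, "ip-adapter-faceid_sd15.bin", "cuda")

# 3) Generate: the prompt controls the scene, the embedding controls the identity
images = ip_model.generate(
    prompt="a portrait photo on a beach at sunset, golden hour, high detail",
    negative_prompt="blurry, low quality, deformed",
    faceid_embeds=faceid_embeds,
    num_samples=2, width=512, height=768, num_inference_steps=30, seed=2023,
)
images[0].save("beach_sunset.png")
```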

PuLID (Pure and Lightning ID Customization)

Link: yanze/PuLID-FLUX (Hugging Face Space)
Description: PuLID is a recent (NeurIPS 2024) tuning-free identity preservation method for text-to-image generation. It’s designed to quickly learn the defining features of a face from one or a few photos and apply them to new AI-generated images without needing to fine-tune the entire model. In practice, PuLID works similarly to IP-Adapter: it uses a face encoder (like InsightFace) and a specialized alignment technique to inject the identity into the generation process. The key advantage is speed and fidelity – PuLID can create high-quality, consistent faces in just a few diffusion steps, staying faithful to the input face. The original photos’ characteristics are maintained, so the person is instantly recognizable in the outputs. This works for realistic photos and can be combined with various styles or base models (the authors provide a version for Stable Diffusion XL as well as a “FLUX” variant built on the FLUX.1 model).

Usage: The Hugging Face Space for PuLID (or PuLID-FLUX) provides an easy interface: upload your face photo(s) and supply a text prompt for the scene or background. The tool will then generate the image, using PuLID under the hood to preserve your face. Because PuLID is tuning-free, the generation happens in a matter of seconds. As an example, you could upload a selfie and prompt “wearing hiking gear on a mountain trail” – PuLID will output an image of you in that scenario, with your facial features intact. Internally, it aligns an identity embedding of your face with the diffusion model’s latent space via contrastive learning, achieving a high-fidelity likeness. PuLID is notable for requiring no model training or prior knowledge of the face (hence “zero-shot”), making it very user-friendly for on-the-fly avatar or photo generation tasks.
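You can also drive the hosted Space from a script using the gradio_client library. The sketch below is only a starting point: the Space name comes from the link above, but the endpoint name and argument list differ between versions of the demo, so inspect client.view_api() and fill in the predict call accordingly (the commented-out arguments are assumptions, not the Space’s documented API).

```python
from gradio_client import Client, handle_file

# Connect to the public PuLID-FLUX Space (name taken from the link above)
client = Client("yanze/PuLID-FLUX")

# Print the Space's callable endpoints and their exact parameters first;
# the predict() call must match whatever this reports.
client.view_api()

# Example call -- the endpoint name and argument order here are assumptions:
# result = client.predict(
#     handle_file("selfie.jpg"),                    # assumed: the ID photo
#     "wearing hiking gear on a mountain trail",    # assumed: text prompt
#     api_name="/generate",                         # assumed endpoint name
# )
# print(result)
```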

PhotoMaker (Stacked-ID Embedding for Face Generation)

Link: TencentARC/PhotoMaker (Hugging Face Model & Demo)
Description: PhotoMaker is another personalized photo generation approach from Tencent ARC Lab. It encodes one or multiple images of a person into a “stacked ID embedding” that conditions Stable Diffusion, enabling the model to produce realistic images of that person in new scenes. Importantly, PhotoMaker does not require any fine-tuning on the user’s images – it’s a pretrained module that can plug into SD (including SDXL) and immediately personalize the output. The method combines a finetuned CLIP vision encoder (to capture identity features) with learned LoRA weights in the diffusion model to inject the person’s features. This allows for high-quality photorealistic results as well as stylized outputs (the model card shows examples of both). In essence, you can get a custom portrait or even a painting of the person, with consistent facial identity.

Usage: PhotoMaker is available as a Hugging Face Space and as a downloadable model. Via the Space’s Gradio app, you can upload one or a few photos of the person and input a text prompt. The system will output a new image according to the prompt, keeping the subject’s face recognizable. For example, upload 2–3 pictures of someone and prompt “a realistic photo of [person] at the Eiffel Tower” or “oil painting of [person] as a medieval knight”. In a few seconds, PhotoMaker returns the generated images. Developers can also use the model in code: load the photomaker-v1 checkpoint and provide images to get the ID embedding, then generate images with a Stable Diffusion pipeline as described in the project’s GitHub. PhotoMaker’s strength is in producing high-fidelity personal photos with either real-world backgrounds or artistic styles, all while preserving the individual’s identity in the face.
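Below is a condensed sketch of that code path, loosely based on the example in the PhotoMaker GitHub repo: it assumes the photomaker package from that repo is installed, uses a placeholder SDXL base checkpoint and image paths, and follows the repo’s convention of placing the trigger word “img” right after the class noun in the prompt.

```python
import os
import torch
from diffusers import EulerDiscreteScheduler
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
from photomaker import PhotoMakerStableDiffusionXLPipeline  # from the PhotoMaker GitHub repo

# Download the pretrained PhotoMaker ID-embedding checkpoint
photomaker_ckpt = hf_hub_download(repo_id="TencentARC/PhotoMaker", filename="photomaker-v1.bin", repo_type="model")

# Load an SDXL base model into PhotoMaker's custom pipeline and attach the adapter
pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any SDXL checkpoint should work
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_photomaker_adapter(
    os.path.dirname(photomaker_ckpt),
    subfolder="",
    weight_name=os.path.basename(photomaker_ckpt),
    trigger_word="img",
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()

# One or more photos of the same person; more photos usually improve identity fidelity
input_id_images = [load_image("person_1.jpg"), load_image("person_2.jpg")]

# The trigger word "img" must follow the class noun ("man", "woman", "person", ...)
images = pipe(
    prompt="a realistic photo of a man img at the Eiffel Tower, detailed, sharp focus",
    input_id_images=input_id_images,
    negative_prompt="blurry, deformed, low quality",
    num_images_per_prompt=1,
    num_inference_steps=50,
    start_merge_step=10,  # early steps follow the text prompt, later steps merge in the ID embedding
).images
images[0].save("eiffel.png")
```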

DreamBooth Fine-Tuning (Customized Stable Diffusion Models)

Link: Hugging Face Diffusers – DreamBooth Guide
Description: DreamBooth is a well-known technique to personalize generative models (originally developed for Stable Diffusion). Unlike the above “zero-shot” methods, DreamBooth requires training on a few images of the person, but it achieves very high fidelity. The idea is to fine-tune a Stable Diffusion model on 3–5 photos of a subject, learning a unique token (like <person_name>) that represents that individual. After training, you can generate new images with that person by including the token in your text prompt (e.g. “<person_name> in a lush garden, photograph”). DreamBooth effectively teaches the model the appearance of the specific person, so it can render them in different poses, outfits, or environments described by the prompt. Many community models of celebrities or characters have been created this way, and it’s a popular method for creating personal avatars.

Usage: Using DreamBooth typically involves more setup: you need access to a GPU and the training code. Hugging Face’s Diffusers library provides example scripts and notebooks to perform DreamBooth fine-tuning. There are also Hugging Face Spaces (and Colab notebooks) that let you upload your images and train a model through a web UI. Once the model is fine-tuned (which can take 15–30 minutes or more, depending on hardware), you can use it to generate unlimited new images of the person. The advantage of DreamBooth is strong identity preservation and creative flexibility – since the model “knows” the person, it can put them in highly varied situations (different lighting, extreme poses, imaginative scenes, etc.) with consistent results. However, it does modify the model weights, so it’s less instantaneous compared to methods like InstantID or PuLID. For those willing to invest a bit of time, DreamBooth remains a go-to solution for photorealistic identity-specific generation.
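Once training has produced a personalized checkpoint (for example with the train_dreambooth.py script from the Diffusers examples), generating new images is an ordinary Diffusers call. A minimal sketch, assuming the fine-tuned model was saved to ./dreambooth-person and trained with the instance prompt “a photo of sks person” (both the path and the token are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the DreamBooth fine-tuned checkpoint (path is a placeholder for wherever
# the training script wrote its output)
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-person", torch_dtype=torch.float16
).to("cuda")

# The rare identifier learned during training ("sks person" here) stands in for
# the subject; the rest of the prompt sets the scene
image = pipe(
    "a photo of sks person in a lush garden, sunlight, professional photograph",
    negative_prompt="blurry, deformed, low quality",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("garden.png")
```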

StyleGAN-Based Face Generation (GAN Inversion & Re-Editing)

Link: IDInvert Project Page (for GAN inversion)
Description: Before diffusion models became popular, GANs (Generative Adversarial Networks) like StyleGAN were used for realistic face generation and editing. One way to generate new shots of a specific person is to perform GAN inversion: essentially, project the person’s photo into the latent space of a pretrained face generator, then manipulate that latent code to produce variations. For example, In-Domain GAN Inversion (IDInvert) allowed real face images to be embedded in StyleGAN’s latent space, enabling new images of that person with edited attributes or backgrounds. Once inverted, you could change the latent variables corresponding to hairstyle, background, expression, etc., and get a new image that still looks like the person. Some tools and research projects (like IDInvert and others cited on its page) demonstrated high-quality face editing using this method.

Usage: StyleGAN-based approaches typically require specialized pipelines or notebooks. A user provides a face image, an encoder (like e4e or IDInvert) finds the latent vector for that face, and then the generator creates new images from that latent. By modifying the latent vector (or mixing it with random noise in certain layers), one can change the background or style while keeping the identity and facial structure. For instance, after inversion you might add a direction vector that represents “smiling” or “wearing glasses”, or mix new styles into the layers that mainly affect background and lighting, to get new variations of the person. While these GAN techniques can produce very realistic faces, they are somewhat limited to the generator’s domain (usually faces only, often portrait-style) and require that an appropriate StyleGAN model exists for the type of image you want. In practice, modern diffusion-based methods (like those above) have largely supplanted GAN inversion for casual use, as they can handle more complex prompts and scenarios. However, StyleGAN tools are still useful for tasks like face swapping and controlled attribute editing. If you prefer a GAN route, you might explore open-source projects like IDInvert or StyleGAN-based editors; just note that they may need coding and aren’t as plug-and-play as the diffusion-based solutions.
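The snippet below is a purely schematic illustration of that invert-edit-regenerate loop, not working inversion code: the tiny encoder and generator are stand-in torch modules and the “smile” direction is a random placeholder, so every name here is hypothetical. In a real pipeline you would swap in a pretrained e4e (or IDInvert) encoder, a pretrained StyleGAN generator, and an attribute direction learned from data.

```python
import torch
import torch.nn as nn

# Stand-ins for a real pretrained encoder (e.g. e4e) and StyleGAN generator.
# They only mimic the shape of the workflow: image -> latent w -> edited w -> image.
class ToyEncoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))

    def forward(self, image):
        return self.net(image)  # "inversion": image -> latent code w


class ToyGenerator(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = nn.Linear(latent_dim, 3 * 64 * 64)

    def forward(self, w):
        return self.net(w).view(-1, 3, 64, 64)  # "synthesis": latent code w -> image


encoder, generator = ToyEncoder(), ToyGenerator()

face = torch.rand(1, 3, 64, 64)   # placeholder for an aligned face crop
w = encoder(face)                 # 1) invert the photo into latent space

# 2) edit the latent: in practice this is a learned attribute direction (e.g. "smile")
#    or a style mix; here it is just a random placeholder vector
smile_direction = torch.randn_like(w)
w_edited = w + 1.5 * smile_direction

new_image = generator(w_edited)   # 3) regenerate: same "identity", new attributes
print(new_image.shape)            # torch.Size([1, 3, 64, 64])
```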

References and Sources:

• InstantID Demo (Hugging Face) – Official Gradio demo for “InstantID: Zero-shot Identity-Preserving Generation” with instructions.

• IP-Adapter-FaceID Model Card – Hugging Face model using face recognition embeddings + Stable Diffusion for identity-conditioned generation.

• IP-Adapter-FaceID Usage – Example from the model card showing how to extract face embeddings and generate images with a prompt.

• PuLID (NeurIPS 2024) – Hugging Face model card for PuLID (Pure and Lightning ID Customization), a tuning-free identity preservation method.

• PuLID Overview – Ikomia blog explaining how PuLID quickly learns facial features and preserves them in new images.

• PhotoMaker Model Card – Introduction from Tencent ARC’s PhotoMaker, describing the user workflow (no training, just input photos + prompt).

• PhotoMaker Details – Model information (ID encoder and LoRA weights) and example results for realistic and stylized outputs.

• DreamBooth Diffusers Documentation – Definition of DreamBooth fine-tuning on a few images of a subject to personalize Stable Diffusion.

• Face-Landmark ControlNet – A ControlNet model that uses facial landmarks to guide Stable Diffusion, generating a new face with the same pose and features as the input. (This can be used to preserve identity by “tracing” the face structure, then changing the background via prompt.)

• Justin Pinkney’s Face Mixer Blog – Discussion of generating faces from one image without fine-tuning, and reference to IDInvert for GAN-based face identity editing.
