Why Your SD Prompts Make Bad Videos
Copy-pasting Stable Diffusion prompts into Wan 2.2 gives you stiff, jittery, or partially-clothed output. The models process text completely differently.
beautiful woman, nude, bedroom, lingerie removal, slow, sensual, long hair, perfect body, masterpiece, 8k, best quality
CLIP tokenises this as a bag of words. No syntax, no trajectory, so the output barely moves.
A woman in black lingerie slowly reaches for her shoulder strap, letting it fall as she turns slightly toward the camera, soft candlelight from the right, intimate handheld framing
T5 reads this as a sentence. Grammar creates motion direction and temporal flow.
Rule: write a sentence that describes what happens over time, not a list of what you want to see.
T5 vs CLIP: Why Sentence Structure Matters
CLIP processes tokens much like an unordered bag. Word position and relationships are largely ignored, which is why comma-separated tags work.
T5 reads the full sentence. It understands subject, verb, and object. Grammar activates semantic relationships the image model never sees, including temporal ones.
Practical rule: write "A woman slowly runs her hands down her body" not "woman, hands, body, slow, sensual".
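The tag-list versus sentence distinction can be checked mechanically before a prompt is submitted. The following is a minimal sketch, not part of Wan 2.2 or any library: the function name `looks_like_tag_list` and the 2.5-word threshold are assumptions chosen for illustration. It flags prompts whose comma-separated chunks are very short on average, which is typical of SD-style tag lists.

```python
def looks_like_tag_list(prompt: str, max_words_per_chunk: float = 2.5) -> bool:
    """Heuristic (assumed threshold): True when a prompt reads like comma-separated tags."""
    # Split on commas and drop empty chunks.
    chunks = [c.strip() for c in prompt.split(",") if c.strip()]
    # Fewer than three chunks is too little evidence to call it a tag list.
    if len(chunks) < 3:
        return False
    # Tag lists have very short chunks; sentences have long clauses.
    avg_words = sum(len(c.split()) for c in chunks) / len(chunks)
    return avg_words <= max_words_per_chunk


tags = "dog, park, sunny, grass, best quality"
sentence = ("A dog slowly trots across the park, ears flapping in the wind, "
            "warm evening light from the left")
print(looks_like_tag_list(tags))      # True
print(looks_like_tag_list(sentence))  # False
```

A check like this only catches the surface form. It cannot tell whether a sentence actually describes change over time, so treat a `False` result as "not obviously a tag list" rather than "good video prompt".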
Your Prompt Is a Path, Not a Picture
Video diffusion generates a trajectory through latent space, not individual frames. A static description gives a near-flat trajectory, with barely any movement. A motion-implying description defines a start and end state, so the model has somewhere to go.
Static description → flat trajectory
Motion description → directed trajectory
woman lying on bed, nude, beautiful, soft light, perfect body
A woman lying on white sheets slowly arches her back, fingers trailing down her stomach, warm morning light from a window casting long shadows across the bed
Tip: motion verbs and adverbs are your real levers. "Slowly", "gradually", "arching", "teasingly" do more than "masterpiece" or "8k" ever will.
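One rough way to apply this tip is to count motion cues against quality tags in a draft prompt. The sketch below is illustrative only: `cue_counts` is a hypothetical helper, and the two word lists contain just the examples named in the tip above, so extend them with your own vocabulary.

```python
import re

# Example words taken from the tip above; both lists are deliberately tiny.
MOTION_CUES = {"slowly", "gradually", "arching", "teasingly"}
QUALITY_TAGS = ("masterpiece", "8k", "best quality")


def cue_counts(prompt: str) -> tuple[int, int]:
    """Return (motion_cue_count, quality_tag_count) for a prompt."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    motion = sum(1 for w in words if w in MOTION_CUES)
    # Match quality tags on word boundaries so "8k" does not match inside "48k".
    quality = sum(
        len(re.findall(r"\b" + re.escape(tag) + r"\b", text))
        for tag in QUALITY_TAGS
    )
    return motion, quality


print(cue_counts("A dancer slowly raises her arms, gradually turning toward the window"))
# (2, 0)
print(cue_counts("dancer, studio, masterpiece, 8k, best quality"))
# (0, 3)
```

A prompt that scores zero on motion and high on quality tags is a likely candidate for a rewrite as a sentence describing what happens over time.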
The CFG Sweet Spot for NSFW Activation
The NSFW fine-tune activates within a specific CFG range. Outside it, no prompt saves the output.
Too low: the base model dominates. NSFW activations are weak. Output looks generic or clothed.
In range: the NSFW fine-tune and base model balance correctly. Start at 6.5, the recommended default.
Too high: the fine-tune overcorrects. Anatomy distorts, artifacts appear, faces break.
I2V Anchor Frame: What Not to Prompt
In I2V mode, your starting image is encoded as an anchor into latent space. The model finds a motion trajectory that departs from the anchor without destroying it. This changes everything about how you write the prompt.
beautiful red-haired woman lying in bed, nude, soft lighting, sensual expression, perfect body, long hair spread across pillow
The model already sees the image. Repeating its contents creates competing signals, and the output stutters or stays frozen.
she slowly leans forward, lips parting slightly, one hand reaching toward the camera, hair falling across her face
The anchor handles appearance. Your prompt handles the trajectory. Describe only what changes.
Motion Vocabulary
Words and phrases that produce real movement in Wan 2.2.
Body Motion
Camera Motion
Speed & Intensity
Scene Atmosphere
Scene Templates by Category
Copy-paste starting points for four common scene types. Prompt text is always English: Wan 2.2 is an English-prompt model regardless of interface language.
A woman in sheer white lingerie sits on the edge of a white-sheeted bed, slowly reaching back to unhook her bra, soft warm lamplight from the right, shallow depth of field, intimate close-up framing
she slowly slides the fabric off her shoulder, body turning slightly toward the light, hair falling forward
stiff, static, no movement, clothed, extra limbs, distorted anatomy, blurry face, low quality, watermark
