How to Create AI Lip Sync Videos with Spectoria

Spectoria’s Lipsync workflow is simple: upload an audio file and a reference video, then generate a lip-synced result. The app shows an estimated credit cost based on the audio duration and displays a preview when generation completes.

What you need

An audio file (MP3, WAV, or any audio format supported by your browser).
A video file (MP4, MOV, or another video format you can upload).
A Spectoria account (sign in to generate).

Step-by-step: generate lip sync

1. Upload the audio file.
2. Upload the reference video.
3. Click “Generate Lipsync.”
4. Wait for generation to finish, preview the result, then download your lip-synced video.

Credits & cost

Lip sync cost scales with audio length. Spectoria estimates about 3 credits per second of audio, so a 10-second clip comes to roughly 30 credits.

If your credits are insufficient, the UI will prompt you to purchase more.
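
The estimate described above can be sketched in a few lines of Python. This is only an illustration, not Spectoria's actual code: the function names are hypothetical, the rate is the approximate 3 credits per second quoted above, and rounding up to a whole credit is an assumption.

```python
import math

# Approximate rate quoted in this article; the real estimate may differ.
LIPSYNC_CREDITS_PER_SECOND = 3


def estimate_lipsync_credits(audio_seconds: float) -> int:
    """Estimate lip sync credits from audio duration in seconds.

    Rounding up to a whole credit is an assumption made for this sketch.
    """
    if audio_seconds < 0:
        raise ValueError("audio duration cannot be negative")
    return math.ceil(audio_seconds * LIPSYNC_CREDITS_PER_SECOND)


def has_enough_credits(balance: int, audio_seconds: float) -> bool:
    """Return True if the balance covers the estimated cost."""
    return balance >= estimate_lipsync_credits(audio_seconds)


if __name__ == "__main__":
    seconds = 12.5
    print(f"{seconds}s of audio: about {estimate_lipsync_credits(seconds)} credits")
```

Running the same check against your own balance before uploading gives a rough idea of whether the purchase prompt is likely to appear.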

Model Credit Cost Table

Model	Credits / second	Credits / second (with audio)
Runway Gen3a Turbo	1	—
Runway Gen4 Turbo	1	—
Veo 3.1 Fast	2	3
Veo 3.1	4	7
Seedance (480p/720p)	2	—
Seedance (1080p)	3	—
Kling v2.1 Master	6	—
Kling v2.1 Pro	3	—
Kling v2.1 Standard	1	—
Kling v2.5 Turbo	2	—
Kling v2.6 Pro	2	3
Kling 3.0 Pro	3	4
Kling 3.0 Standard	2	3
Sora 2	2	—
Sora 2 Pro	6	—
WAN 2.5 Pro	2	—

Image model	Credits / image
Nano Banana	1
Nano Banana Pro (1K)	2
Nano Banana Pro (2K)	4
Nano Banana Pro (4K)	6
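
The per-second rates in the video table above can be turned into a quick cost lookup. The sketch below copies a handful of rows from that table. The function name is hypothetical, and it assumes that a dash in the "with audio" column means no with-audio rate is listed for that model.

```python
from typing import Dict, Optional, Tuple

# (credits per second, credits per second with audio) copied from the table.
# None stands for a dash, read here as "no with-audio rate listed" (assumption).
VIDEO_RATES: Dict[str, Tuple[int, Optional[int]]] = {
    "Veo 3.1 Fast": (2, 3),
    "Veo 3.1": (4, 7),
    "Kling v2.1 Master": (6, None),
    "Kling 3.0 Pro": (3, 4),
    "Sora 2": (2, None),
}


def video_credits(model: str, seconds: int, with_audio: bool = False) -> int:
    """Return the credit cost for a clip of the given length in whole seconds."""
    base_rate, audio_rate = VIDEO_RATES[model]
    if with_audio:
        if audio_rate is None:
            raise ValueError(f"no with-audio rate listed for {model}")
        return audio_rate * seconds
    return base_rate * seconds


if __name__ == "__main__":
    print(video_credits("Veo 3.1", 8, with_audio=True))
    print(video_credits("Sora 2", 10))
```

Keeping the rates in one dictionary means a price change only needs to be made in one place.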

Upload → generate → preview → download. Spectoria helps you produce lip sync without guesswork.

FAQ

What audio formats are supported?

The UI accepts common audio files such as MP3 and WAV. Other audio formats generally work as long as your browser supports them.

How does Spectoria estimate lip sync credits?

Spectoria estimates the cost from your audio duration, at about 3 credits per second of audio.

Do I need a reference video?

Yes. Lipsync needs a reference video (or portrait video) to match the face and timing for the synced mouth movement.