Spectoria’s Lipsync workflow is simple: upload an audio file and a reference video, then generate a lip-synced result. You’ll see an estimated credit cost based on the audio duration, and the app shows a preview when generation completes.
What you need
- An audio file (MP3, WAV, or any audio format supported by your browser).
- A video file (MP4, MOV, or another video format you can upload).
- A Spectoria account (sign in to generate).
Step-by-step: generate lip sync
- Upload the audio file.
- Upload the reference video.
- Click “Generate Lipsync.”
- Wait for generation, then download your lip-synced video.
Credits & cost
Lip sync cost scales with audio length. Spectoria estimates about 3 credits per second of audio, so a 10-second clip costs roughly 30 credits.
If your credits are insufficient, the UI will prompt you to purchase more.
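
The estimate above can be sketched as a small calculation. This is an illustration only, not Spectoria's actual code: the function names are hypothetical, the rate is the approximate figure quoted above, and rounding up to whole credits is an assumption.

```python
import math

# Approximate rate quoted in this guide; the real rate may differ.
CREDITS_PER_SECOND = 3


def estimate_lipsync_credits(audio_seconds: float) -> int:
    """Estimate the credit cost of a lip sync job (hypothetical helper)."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Rounding up to whole credits is an assumption, not documented behavior.
    return math.ceil(audio_seconds * CREDITS_PER_SECOND)


def has_enough_credits(balance: int, audio_seconds: float) -> bool:
    """Return True if the balance covers the estimated cost."""
    return balance >= estimate_lipsync_credits(audio_seconds)
```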
Model Credit Cost Table
| Model | Credits / second | Credits / second (with audio) |
|---|---|---|
| Runway Gen3a Turbo | 1 | — |
| Runway Gen4 Turbo | 1 | — |
| Veo 3.1 Fast | 2 | 3 |
| Veo 3.1 | 4 | 7 |
| Seedance (480p/720p) | 2 | — |
| Seedance (1080p) | 3 | — |
| Kling v2.1 Master | 6 | — |
| Kling v2.1 Pro | 3 | — |
| Kling v2.1 Standard | 1 | — |
| Kling v2.5 Turbo | 2 | — |
| Kling v2.6 Pro | 2 | 3 |
| Kling 3.0 Pro | 3 | 4 |
| Kling 3.0 Standard | 2 | 3 |
| Sora 2 | 2 | — |
| Sora 2 Pro | 6 | — |
| WAN 2.5 Pro | 2 | — |

| Image model | Credits / image |
|---|---|
| Nano Banana | 1 |
| Nano Banana Pro (1K) | 2 |
| Nano Banana Pro (2K) | 4 |
| Nano Banana Pro (4K) | 6 |
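
The tables above can be read as a simple lookup. The sketch below copies the listed rates into dictionaries and multiplies by duration or image count. The names `VIDEO_RATES`, `IMAGE_RATES`, `estimate_video_credits`, and `estimate_image_credits` are hypothetical, and rounding up to whole credits is an assumption rather than documented behavior.

```python
import math

# model -> (credits/second, credits/second with audio or None if not listed)
VIDEO_RATES = {
    "Runway Gen3a Turbo": (1, None),
    "Runway Gen4 Turbo": (1, None),
    "Veo 3.1 Fast": (2, 3),
    "Veo 3.1": (4, 7),
    "Seedance (480p/720p)": (2, None),
    "Seedance (1080p)": (3, None),
    "Kling v2.1 Master": (6, None),
    "Kling v2.1 Pro": (3, None),
    "Kling v2.1 Standard": (1, None),
    "Kling v2.5 Turbo": (2, None),
    "Kling v2.6 Pro": (2, 3),
    "Kling 3.0 Pro": (3, 4),
    "Kling 3.0 Standard": (2, 3),
    "Sora 2": (2, None),
    "Sora 2 Pro": (6, None),
    "WAN 2.5 Pro": (2, None),
}

# image model -> credits per image
IMAGE_RATES = {
    "Nano Banana": 1,
    "Nano Banana Pro (1K)": 2,
    "Nano Banana Pro (2K)": 4,
    "Nano Banana Pro (4K)": 6,
}


def estimate_video_credits(model: str, seconds: float, with_audio: bool = False) -> int:
    """Estimate credits for a video of the given length (hypothetical helper)."""
    base_rate, audio_rate = VIDEO_RATES[model]
    if with_audio:
        if audio_rate is None:
            raise ValueError(f"No with-audio rate is listed for {model}")
        rate = audio_rate
    else:
        rate = base_rate
    # Rounding up to whole credits is an assumption.
    return math.ceil(seconds * rate)


def estimate_image_credits(model: str, count: int) -> int:
    """Estimate credits for a number of generated images (hypothetical helper)."""
    return IMAGE_RATES[model] * count
```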
Upload → generate → preview → download. Spectoria helps you produce lip sync without guesswork.
FAQ
What audio formats are supported?
The UI accepts common audio files such as MP3 and WAV. Other formats generally work as long as your browser supports them.
How does Spectoria estimate lip sync credits?
Spectoria estimates the cost from your audio duration, at about 3 credits per second of audio.
Do I need a reference video?
Yes. Lipsync needs a reference video (a portrait video also works) so it can match the synced mouth movement to the face and timing.
