Audio & Images

djangosdk provides APIs for audio transcription, text-to-speech synthesis, and image generation.

Audio

Transcription (Speech-to-Text)

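from djangosdk.audio.transcribe import transcribe, atranscribe

# Synchronous: blocks until the transcript is ready
transcript = transcribe("/path/to/audio.mp3", model="whisper-1")
print(transcript.text)

# Async: await must run inside an async function (for example, an async view)
async def transcribe_async():
    transcript = await atranscribe("/path/to/audio.mp3")
    return transcript.text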

Supported models: OpenAI Whisper (whisper-1), Groq Whisper (whisper-large-v3), and any other transcription model supported by LiteLLM.
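The async variant is most useful when several files can be transcribed at once. The following is a minimal sketch, not part of djangosdk: transcribe_many is a hypothetical helper, and it assumes only what the example above shows, namely that atranscribe takes a file path and returns an object with a .text attribute. The transcription function is passed in as an argument so the helper can be exercised with a stand-in.

```python
import asyncio


async def transcribe_many(paths, atranscribe_fn):
    """Transcribe several audio files concurrently.

    atranscribe_fn is assumed to behave like djangosdk's atranscribe:
    an async callable that takes a path and returns an object with .text.
    """
    # Start every transcription without waiting for the previous one.
    tasks = [atranscribe_fn(path) for path in paths]
    # gather() returns results in the same order as the input tasks.
    transcripts = await asyncio.gather(*tasks)
    return {path: t.text for path, t in zip(paths, transcripts)}


# Possible usage (requires djangosdk and provider credentials):
# from djangosdk.audio.transcribe import atranscribe
# texts = asyncio.run(transcribe_many(["a.mp3", "b.mp3"], atranscribe))
```

Because the requests run concurrently, total latency should be closer to that of the slowest file than to the sum of all files, subject to provider rate limits.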

Synthesis (Text-to-Speech)

from djangosdk.audio.synthesize import synthesize, asynthesize

# synthesize returns the generated speech as bytes
audio_bytes = synthesize(
    "Hello, how can I help you today?",
    model="tts-1",
    voice="alloy",
)

# Save the audio to a file
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)

Supported voices (OpenAI): alloy, echo, fable, onyx, nova, shimmer

In a Django View
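One way to expose text-to-speech over HTTP is sketched below. It is an illustration under stated assumptions rather than the library's documented view: speech_payload and speak are hypothetical names, the query parameters text and voice are chosen for the example, and the audio/mpeg content type assumes synthesize returns MP3 data, as the output.mp3 example above suggests. The validation logic is kept separate from the view so it can be checked without Django.

```python
# Voices listed above for OpenAI TTS.
SUPPORTED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


def speech_payload(params, synthesize_fn):
    """Validate query parameters and return (status, content_type, body).

    params is any mapping with .get(), such as request.GET.
    synthesize_fn is assumed to behave like djangosdk's synthesize.
    """
    text = (params.get("text") or "").strip()
    voice = params.get("voice") or "alloy"
    if not text:
        return 400, "text/plain", b"Missing 'text' parameter."
    if voice not in SUPPORTED_VOICES:
        return 400, "text/plain", b"Unsupported voice."
    audio = synthesize_fn(text, model="tts-1", voice=voice)
    # Assumption: the returned bytes are MP3-encoded.
    return 200, "audio/mpeg", audio


def speak(request):
    """Django view: GET /speak/?text=...&voice=... responds with audio."""
    # Imported locally so speech_payload stays usable without Django.
    from django.http import HttpResponse
    from djangosdk.audio.synthesize import synthesize

    status, content_type, body = speech_payload(request.GET, synthesize)
    return HttpResponse(body, content_type=content_type, status=status)
```

The view would still need a route in urls.py, and a production endpoint would likely also limit text length and require authentication, since each request triggers a paid API call.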

Images

Image Generation

Supported models:

  • OpenAI: dall-e-3, dall-e-2

  • Google: imagen-3 (via Vertex AI)

  • xAI: grok-2-image (Aurora)
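Since each model belongs to a different provider, it can help to reject unknown model names before making a request. The sketch below is a hypothetical helper, not a djangosdk API: IMAGE_MODEL_PROVIDERS and provider_for are names invented for illustration, and the mapping assumes the list above is exhaustive, which this page does not state.

```python
# Model names and providers copied from the list above.
IMAGE_MODEL_PROVIDERS = {
    "dall-e-3": "OpenAI",
    "dall-e-2": "OpenAI",
    "imagen-3": "Google",
    "grok-2-image": "xAI",
}


def provider_for(model):
    """Return the provider for an image model, or raise ValueError."""
    try:
        return IMAGE_MODEL_PROVIDERS[model]
    except KeyError:
        known = ", ".join(sorted(IMAGE_MODEL_PROVIDERS))
        raise ValueError(
            f"Unsupported image model {model!r}; expected one of: {known}"
        ) from None
```

Failing early with the list of accepted names gives a clearer error than a provider-side rejection after the request has been sent.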

In a Django View

Configuration
