- GenAI Studio doesn’t include model files or container images.
- You must generate the models and build container images.
- Ensure your host computer meets the requirements.
- Use the host computer to generate models.
- Use the target device to build the docker containers.
| Use case | Description | Supported platforms | Model(s) |
|---|---|---|---|
| Text-to-text | Generates human-like responses with LLM for input prompts. Useful for creating articles, summaries, reports, or creative content automatically. | IQ9 Qualcomm Linux | LLaMA 3.2-3B Qwen 3-4B |
| Text-to-speech | Transforms text into clear, natural-sounding audio. Ideal for voice assistants, audiobooks, and accessibility solutions. | IQ9 Qualcomm Linux | Melo-TTS |
| Text-to-image | Generates images from text descriptions. Ideal for creating graphics, illustrations, or visual content without manual design. | Not Supported | Stable Diffusion 2.1 |
| Speech-to-text | Turns spoken words into written text. Helpful for transcription, voice commands, and hands-free applications. | Not Supported | Whisper Tiny |
| Image-to-text | Describes images with natural language for visual understanding and moderation. | IQ9 Qualcomm Linux | Qwen 2.5-VL-7B |
| Orchestrator | Provides an easy-to-use web interface to access all these features in one place, making it simple for developers and users. | IQ9 Qualcomm Linux | All models listed above |
High-level architecture
The following diagram shows the high-level architecture of GenAI Studio. Each functional block runs as an independent container, providing isolation, scalability, and extensibility. This allows new modalities to be added as additional containers without disrupting existing services.Each endpoint has a specific port. Note the port number in endpoint port numbers.

| Service | Port | OpenAI-compatible endpoint |
|---|---|---|
| Text-to-speech | 8083 | POST /v1/audio/speech |
| Text-to-image | 8084 | POST /v1/images/generations |
| Image-to-text | 8080 | POST /v1/responses |
| Text-to-text | 8088 | POST /v1/chat/completions |
| Speech-to-text | 8081 | POST /v1/audio/transcriptions |
| Orchestrator | 8090 | Unified gateway |
User interaction
The user accesses GenAI Studio through a web page hosted on the host computer or through API calls from third-party applications with OpenAI-compatible endpoints.Request routing to target
The webpage or API call communicates with the orchestrator service running on the target device (for example Qualcomm Dragonwing™ IQ9) and converts each user action on the UI into a corresponding REST API call for the backend services.Orchestration layer
The orchestrator container acts as the central hub. It receives requests, manages session history, handles multi-turn chat continuity (KV-cache), and aggregates responses from individual modality containers to present a unified experience to the user. Containerized architecture and scalability: Each functional block (orchestrator, text-to-text, text-to-speech, text-to-image, image-to-text, speech-to-text) runs as an independent container. This provides the following:- Isolation between services.
- Scalability, since each service scales independently based on use case.
- Extensibility, since you can add new modalities or models as additional containers.
Text-to-text
This application runs a text-to-text large language model (LLM) using the Genie API. It provides a persistent (always-on) LLM server that supports the following:- LLM response generation from user prompts (text-to-text).
- Preloaded model reuse to avoid reloading for each request.
- Control endpoints for model/session reset and model reload.
- System prompt updates directly from the UI.
- Conversation history that allows the UI to fetch previous messages.

Text-to-image
Known Issue: Text-to-image is not supported in the GA release.

Image-to-text
The image-to-text container uses a vision-language model (VLM) to describe images with natural language, enabling visual understanding and moderation use cases. For more information, see the image-to-text README and the code flow files.Text-to-speech
The following image shows an example call flow sequence in a text-to-speech container.
Speech-to-text
Known Issue: Speech-to-text is not supported in the GA release.

Setup GenAI Studio
GenAI Studio is supported on IQ9 with both Qualcomm Linux distributions and Ubuntu on Qualcomm IoT platforms.Language models aren’t shipped with GenAI Studio. You must generate a model using AI Hub.
- On the host computer, download the precompiled Qualcomm Linux build image for your EVK from CodeLinaro.
-
From the host computer, flash the image to the target device.
On QLI, Qualcomm-dependent DSP libraries and Docker are pre-installed.
-
Clone the repository to the target device and go to the local directory:
-
On the host computer, prepare the SDK (optional).
This is only required if you are bringing up the full stack or a private STT or TTS service.
-
From the sample app repository root directory on the target device, complete the preflight checks:
Ensure that the device is properly provisioned before running the preflight checks. For initial device provisioning, see device setup.
-
Prepare models for target.
See each service’s model generation and setup documentation for more information.
Service Target path Documentation Text-to-text /opt/genai-studio-models/text-to-text/...MODEL_SETUP.md Image-to-text /opt/genai-studio-models/image-to-text/Lemans_LE_Gen2_QNN2_41_qwen25_vl_7B/filesMODEL_SETUP.md Text-to-image /opt/genai-studio-models/text-to-image/stable_diffusion_v2_1-qnn_context_binary-w8a16-qualcomm_qcs9075MODEL_SETUP.md Speech-to-text /opt/genai-studio-models/speech-to-text/whisper_tiny-qnn_context_binary-float-qualcomm_qcs9075MODEL_SETUP.md Text-to-speech /opt/genai-studio-models/text-to-speech/melo-tts-v73/filesModel-Generation.md -
Build base images (one-time setup).
-
Build service images.
See the service README files (for example, core-services/README.md)
for guidance to ensure all required files are in place.
-
Start services with
docker-compose. See the README for the recommended environment variables to set before runningdocker-compose. -
Run service health checks and functional checks to verify all services are running:
-
Run the unified test suite:
For functional endpoint checks and testing, see functional endpoint test and the individual service README files.
docs folder.
To check service logs, run the following commands:

