Develop a generative AI (GenAI) application: GenAI Studio

GenAI Studio is a containerized solution to enable rapid prototyping and customization by simplifying the creation and deployment of Generative AI applications on Qualcomm Linux systems. GenAI Studio provides OpenAI-compatible APIs for the following core services: text generation, text-to-speech, image generation, image-to-text (VLM), and speech-to-text. All services are accessible through a web interface and OpenAI-compatible API endpoints.

GenAI Studio doesn’t include model files or container images.
You must generate the models and build container images.
Ensure your host computer meets the requirements.
Use the host computer to generate models.
Use the target device to build the docker containers.

Use case	Description	Supported platforms	Model(s)
Text-to-text	Generates human-like responses with LLM for input prompts. Useful for creating articles, summaries, reports, or creative content automatically.	IQ9 Qualcomm Linux	LLaMA 3.2-3B Qwen 3-4B
Text-to-speech	Transforms text into clear, natural-sounding audio. Ideal for voice assistants, audiobooks, and accessibility solutions.	IQ9 Qualcomm Linux	Melo-TTS
Text-to-image	Generates images from text descriptions. Ideal for creating graphics, illustrations, or visual content without manual design.	Not Supported	Stable Diffusion 2.1
Speech-to-text	Turns spoken words into written text. Helpful for transcription, voice commands, and hands-free applications.	Not Supported	Whisper Tiny
Image-to-text	Describes images with natural language for visual understanding and moderation.	IQ9 Qualcomm Linux	Qwen 2.5-VL-7B
Orchestrator	Provides an easy-to-use web interface to access all these features in one place, making it simple for developers and users.	IQ9 Qualcomm Linux	All models listed above

High-level architecture

The following diagram shows the high-level architecture of GenAI Studio. Each functional block runs as an independent container, providing isolation, scalability, and extensibility. This allows new modalities to be added as additional containers without disrupting existing services.

Each endpoint has a specific port. Note the port number in endpoint port numbers.

High-level architecture of GenAI Studio showing containerized use cases and backend interactions

The following table lists each endpoint and its associated port number. Endpoint port numbers

Service	Port	OpenAI-compatible endpoint
Text-to-speech	8083	`POST /v1/audio/speech`
Text-to-image	8084	`POST /v1/images/generations`
Image-to-text	8080	`POST /v1/responses`
Text-to-text	8088	`POST /v1/chat/completions`
Speech-to-text	8081	`POST /v1/audio/transcriptions`
Orchestrator	8090	Unified gateway

User interaction

The user accesses GenAI Studio through a web page hosted on the host computer or through API calls from third-party applications with OpenAI-compatible endpoints.

Request routing to target

The webpage or API call communicates with the orchestrator service running on the target device (for example Qualcomm Dragonwing™ IQ9) and converts each user action on the UI into a corresponding REST API call for the backend services.

Orchestration layer

The orchestrator container acts as the central hub. It receives requests, manages session history, handles multi-turn chat continuity (KV-cache), and aggregates responses from individual modality containers to present a unified experience to the user. Containerized architecture and scalability: Each functional block (orchestrator, text-to-text, text-to-speech, text-to-image, image-to-text, speech-to-text) runs as an independent container. This provides the following:

Isolation between services.
Scalability, since each service scales independently based on use case.
Extensibility, since you can add new modalities or models as additional containers.

See the documentation of each respective use case for more detailed information.

Text-to-text

This application runs a text-to-text large language model (LLM) using the Genie API. It provides a persistent (always-on) LLM server that supports the following:

LLM response generation from user prompts (text-to-text).
Preloaded model reuse to avoid reloading for each request.
Control endpoints for model/session reset and model reload.
System prompt updates directly from the UI.
Conversation history that allows the UI to fetch previous messages.

Call flow of the GenAI studio components and how they interact with each other in a text generation use case.

For more information, see the text-to-text README and the code flow files.

Text-to-image

Known Issue: Text-to-image is not supported in the GA release.

The following image shows an example call flow sequence in a text-to-image container.

Call flow of the GenAI studio components and how they interact with each other in an application.

For more information, see the text-to-image README and the code flow files.

Image-to-text

The image-to-text container uses a vision-language model (VLM) to describe images with natural language, enabling visual understanding and moderation use cases. For more information, see the image-to-text README and the code flow files.

Text-to-speech

The following image shows an example call flow sequence in a text-to-speech container.

For more information, see the text-to-speech README.

Speech-to-text

Known Issue: Speech-to-text is not supported in the GA release.

The following image shows an example call flow sequence in a speech-to-text container.

For more information, see the speech-to-text README and the code flow files.

Setup GenAI Studio

GenAI Studio is supported on IQ9 with both Qualcomm Linux distributions and Ubuntu on Qualcomm IoT platforms.

Language models aren’t shipped with GenAI Studio. You must generate a model using AI Hub.

On the host computer, download the precompiled Qualcomm Linux build image for your EVK from CodeLinaro.
From the host computer, flash the image to the target device.
On QLI, Qualcomm-dependent DSP libraries and Docker are pre-installed.

Clone the repository to the target device and go to the local directory:

git clone https://github.com/qualcomm/sample-apps-for-qualcomm-linux.git

cd sample-apps-for-qualcomm-linux/GenAI-Solutions/GenAI-Studio

On the host computer, prepare the SDK (optional).

This is only required if you are bringing up the full stack or a private STT or TTS service.

qpm-cli --login

qpm-cli --install VoiceAI_ASR -v 2.5.0.0 --path /opt/qcom/qpm/VoiceAI_ASR/2.5.0.0 --silent

qpm-cli --install VoiceAI_TTS -v 1.1.1.0 --path /opt/qcom/qpm/VoiceAI_TTS/1.1.1.0 --silent

TARGET_REPO=/path/to/genai-studio-on-target

rsync -av /opt/qcom/qpm/VoiceAI_ASR/2.5.0.0/whisper_sdk/ \
  ubuntu@<target-host>:${TARGET_REPO}/core-services/speech-to-text/whisper_sdk/

rsync -av /opt/qcom/qpm/VoiceAI_TTS/1.1.1.0/melo_sdk/ \
  ubuntu@<target-host>:${TARGET_REPO}/core-services/text-to-speech/meloTTS/melo_sdk/

From the sample app repository root directory on the target device, complete the preflight checks:
Ensure that the device is properly provisioned before running the preflight checks. For initial device provisioning, see device setup.
```
docker --version
```
```
docker compose version
```
```
python3 --version
```
```
ls -l /dev/fastrpc-cdsp
```
```
ls /etc/cdi/
```

Prepare models for target. See each service’s model generation and setup documentation for more information.

Prepare the model folders under /opt/genai-studio-models.

Service	Target path	Documentation
Text-to-text	`/opt/genai-studio-models/text-to-text/...`	MODEL_SETUP.md
Image-to-text	`/opt/genai-studio-models/image-to-text/Lemans_LE_Gen2_QNN2_41_qwen25_vl_7B/files`	MODEL_SETUP.md
Text-to-image	`/opt/genai-studio-models/text-to-image/stable_diffusion_v2_1-qnn_context_binary-w8a16-qualcomm_qcs9075`	MODEL_SETUP.md
Speech-to-text	`/opt/genai-studio-models/speech-to-text/whisper_tiny-qnn_context_binary-float-qualcomm_qcs9075`	MODEL_SETUP.md
Text-to-speech	`/opt/genai-studio-models/text-to-speech/melo-tts-v73/files`	Model-Generation.md

Build base images (one-time setup).

bash scripts/download-qairt-sdk.sh --service base

bash scripts/pull-ubuntu-arm64.sh

DOCKER_BUILDKIT=1 docker build --progress=plain -f Dockerfile.runtime -t ubuntu-runtime:24.04 .

DOCKER_BUILDKIT=1 docker build --progress=plain -f Dockerfile.build-base -t genai-build-base:latest .

Build service images. See the service README files (for example, core-services/README.md) for guidance to ensure all required files are in place.

DOCKER_BUILDKIT=1 docker build --progress=plain -t text-to-text:latest core-services/text-to-text/

DOCKER_BUILDKIT=1 docker build --progress=plain -t image-to-text:responses-v1 core-services/image-to-text/

DOCKER_BUILDKIT=1 docker build --progress=plain -t text-to-image:latest core-services/text-to-image/

DOCKER_BUILDKIT=1 docker build --progress=plain -t speech-to-text:latest core-services/speech-to-text/

DOCKER_BUILDKIT=1 docker build --progress=plain -t text-to-speech:latest core-services/text-to-speech/meloTTS/

DOCKER_BUILDKIT=1 docker build --progress=plain -t orchestrator:latest core-services/orchestrator/

Start services with docker-compose. See the README for the recommended environment variables to set before running docker-compose.
```
docker-compose up -d
```
```
docker ps
```

Run service health checks and functional checks to verify all services are running:

curl -sf http://127.0.0.1:8080/health >/dev/null && echo "Image-to-Text (8080) OK"

curl -sf http://127.0.0.1:8081/health >/dev/null && echo "Speech-to-Text (8081) OK"

curl -sf http://127.0.0.1:8083/health >/dev/null && echo "Text-to-Speech (8083) OK"

curl -sf http://127.0.0.1:8084/health >/dev/null && echo "Text-to-Image (8084) OK"

curl -sf http://127.0.0.1:8088/health >/dev/null && echo "Text-to-Text (8088) OK"

curl -sf http://127.0.0.1:8090/api/status >/dev/null && echo "Orchestrator (8090) OK"

Run the unified test suite:

python3 -m pip install --user -r tests/unified/requirements.txt

python3 tests/unified/run_manifest.py --target-host <TARGET_DEVICE_IP>

For functional endpoint checks and testing, see functional endpoint test and the individual service README files.

If any service fails to collect logs, see the service’s troubleshooting and pain point documents from the service’s docs folder. To check service logs, run the following commands:

docker compose ps

docker logs --tail 200 text-to-text

docker logs --tail 200 image-to-text

docker logs --tail 200 text-to-image

docker logs --tail 200 speech-to-text

docker logs --tail 200 text-to-speech

docker logs --tail 200 orchestrator

​High-level architecture

​User interaction

​Request routing to target

​Orchestration layer

​Text-to-text

​Text-to-image

​Image-to-text

​Text-to-speech

​Speech-to-text

​Setup GenAI Studio

High-level architecture

User interaction

Request routing to target

Orchestration layer

Text-to-text

Text-to-image

Image-to-text

Text-to-speech

Speech-to-text

Setup GenAI Studio