Introduction to OpenVINO GenAI
What is OpenVINO GenAI?
OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant OpenVINO Runtime. It provides simplified APIs for running generative models, hiding the complexity of the generation process and enabling developers to easily integrate state-of-the-art generative models into their applications with minimal code.
As a lightweight solution designed for efficient inference, OpenVINO GenAI includes all the core functionality needed for generative model execution (e.g. tokenization via openvino-tokenizers) with no external dependencies required.
The library runs well on PCs and laptops and is optimized for low resource consumption.
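For example, here is a minimal sketch of text generation with the Python API. The model directory name is a placeholder for any LLM already converted to OpenVINO IR format (conversion is covered in the Workflow Overview below):

```python
import openvino_genai as ov_genai

# Placeholder path to a model already converted to OpenVINO IR format.
pipe = ov_genai.LLMPipeline("llama-3.2-1b-instruct-ov", "CPU")

# The pipeline handles tokenization, the generation loop, and detokenization.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```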
Key Features and Benefits
- 📦 Pre-built Generative AI Pipelines: Ready-to-use pipelines for text generation (LLMs), image generation (Diffuser-based), speech processing (Whisper), and visual language models (VLMs). See all supported use cases.
- 👣 Minimal Footprint: Smaller binary size and reduced memory footprint compared to other frameworks.
- 🚀 Performance Optimization: Hardware-specific optimizations for CPU, GPU, and NPU devices.
- 👨‍💻 Programming Language Support: Comprehensive APIs in both Python and C++.
- 🗜️ Model Compression: Support for 8-bit and 4-bit weight compression, including embedding layers.
- 🚄 Advanced Inference Capabilities: In-place KV-cache, dynamic quantization, speculative sampling, and more; a token-streaming sketch of the pipeline API follows this list.
- 🎨 Wide Model Compatibility: Support for popular models including Llama, Mistral, Phi, Qwen, Stable Diffusion, Flux, Whisper, and others. Refer to the Supported Models for more details.
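As a small illustration of the pipeline API in practice, the sketch below streams tokens to the console as they are generated. The model path is a placeholder; returning False (or None) from the callback tells the pipeline to keep generating:

```python
import openvino_genai as ov_genai

# Placeholder path to an LLM already converted to OpenVINO IR format.
pipe = ov_genai.LLMPipeline("llama-3.2-1b-instruct-ov", "CPU")

def streamer(subword: str):
    # Called for each decoded subword as soon as it is available.
    print(subword, end="", flush=True)
    return False  # False (or None) means "continue generation"

pipe.generate("Explain KV-cache in one sentence.", streamer=streamer, max_new_tokens=100)
```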
Workflow Overview
Using OpenVINO GenAI typically involves three main steps:
1. Model Preparation (an end-to-end sketch follows this list). Either:
   - Convert a model from another framework to the OpenVINO IR format (e.g. using optimum-intel), optionally applying weight compression.
   - Download a pre-converted model in OpenVINO IR format (e.g. from the OpenVINO Toolkit organization on Hugging Face).

   Info: You can use models from both Hugging Face and ModelScope.
2. Pipeline Setup: Initialize the appropriate pipeline for your task (LLM, Text to Image, Whisper, VLM, etc.) with the converted model.
3. Inference: Run the model with your inputs using the pipeline's simple API.
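A minimal sketch of all three steps, assuming the meta-llama/Llama-3.2-1B-Instruct model as an example (any supported model ID works) and that the export command can download it:

```python
# Step 1 (run once in a shell): convert the model to OpenVINO IR
# with optimum-intel, applying 4-bit weight compression:
#   optimum-cli export openvino --model meta-llama/Llama-3.2-1B-Instruct \
#       --weight-format int4 llama-3.2-1b-instruct-ov

import openvino_genai as ov_genai

# Step 2: initialize the pipeline for the task (here: text generation)
# on the target device ("CPU", "GPU", or "NPU").
pipe = ov_genai.LLMPipeline("llama-3.2-1b-instruct-ov", "CPU")

# Step 3: run inference through the pipeline's simple API.
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```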
Comparison with Alternatives
Unlike base OpenVINO, which requires manual implementation of generation loops, tokenization, scheduling, and other components, OpenVINO GenAI provides these in a ready-to-use package.
Compared to Hugging Face Optimum Intel, OpenVINO GenAI offers a smaller footprint, fewer dependencies, and better performance optimization options, particularly for C++ applications.
| Feature | OpenVINO GenAI | Base OpenVINO | Hugging Face Optimum Intel |
|---|---|---|---|
| Easy-to-use APIs | ✅ | ❌ | ✅ |
| Low footprint | ✅ | ✅ | ❌ |
| C++ support | ✅ | ✅ | ❌ |
| Pre-built pipelines | ✅ | ❌ | ✅ |
| Model variety | Medium | High | High |
System Requirements
Refer to the OpenVINO System Requirements for more details.