Introduction to OpenVINO GenAI

What is OpenVINO GenAI?

OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant OpenVINO Runtime. It provides simplified APIs for running generative models, hiding the complexity of the generation process and enabling developers to easily integrate state-of-the-art generative models into their applications with minimal code.
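
For example, generating text with an LLM takes only a few lines of Python. The sketch below assumes a model has already been converted to OpenVINO IR in a local directory (the directory name is illustrative; see the workflow below):

```python
import openvino_genai

# Load a model converted to OpenVINO IR onto a target device
# ("CPU", "GPU", or "NPU"). The directory name is an example.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

# The pipeline handles tokenization, the generation loop, and detokenization.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```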

As a lightweight solution designed for efficient inference, OpenVINO GenAI includes all the core functionality needed for generative model execution (e.g. tokenization via openvino-tokenizers) with no external dependencies required. The library is well suited to running on PCs and laptops and is optimized for low resource consumption.

Key Features and Benefits

  • 📦 Pre-built Generative AI Pipelines: Ready-to-use pipelines for text generation (LLMs), image generation (diffusion-based), speech processing (Whisper), and visual language models (VLMs). See all supported use cases.
  • 👣 Minimal Footprint: Smaller binary size and reduced memory footprint compared to other frameworks.
  • 🚀 Performance Optimization: Hardware-specific optimizations for CPU, GPU, and NPU devices.
  • 👨‍💻 Programming Language Support: Comprehensive APIs in both Python and C++.
  • 🗜️ Model Compression: Support for 8-bit and 4-bit weight compression, including embedding layers.
  • 🎓 Advanced Inference Capabilities: In-place KV-cache, dynamic quantization, speculative sampling, and more (see the sketch after this list).
  • 🎨 Wide Model Compatibility: Support for popular models including Llama, Mistral, Phi, Qwen, Stable Diffusion, Flux, Whisper, and others. Refer to the Supported Models for more details.
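
As a taste of the advanced capabilities above, the sketch below pairs a main model with a smaller draft model to enable speculative sampling, following the pattern used in the OpenVINO GenAI samples (the model directory names are illustrative assumptions):

```python
import openvino_genai

# The draft model proposes candidate tokens that the main model verifies,
# which can reduce end-to-end generation latency.
draft = openvino_genai.draft_model("TinyLlama-1.1B-Chat-v1.0", "CPU")
pipe = openvino_genai.LLMPipeline("Llama-3.1-8B-Instruct", "CPU", draft_model=draft)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5  # candidate tokens per speculative step

print(pipe.generate("What is OpenVINO?", config))
```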

Workflow Overview

Using OpenVINO GenAI typically involves three main steps:

  1. Model Preparation (see the sketch after this list):

    • Convert a model from another framework to the OpenVINO IR format (e.g. using optimum-intel), optionally applying weight compression.
    • Download a pre-converted model in OpenVINO IR format (e.g. from the OpenVINO Toolkit organization on Hugging Face).

    Info: You can use models from both Hugging Face and ModelScope.

  2. Pipeline Setup: Initialize the appropriate pipeline for your task (LLM, Text to Image, Whisper, VLM, etc.) with the converted model.

  3. Inference: Run the model with your inputs using the pipeline's simple API.
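
A minimal sketch of the two model preparation options from step 1 (the model IDs and output directories are illustrative assumptions):

```python
# Option A: Convert a model with optimum-intel, optionally applying
# 4-bit weight compression. This is typically a one-time command-line step:
#
#   optimum-cli export openvino \
#       --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
#       --weight-format int4 TinyLlama-1.1B-Chat-v1.0

# Option B: Download a pre-converted model from the OpenVINO
# organization on Hugging Face.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov",  # illustrative repo id
    local_dir="TinyLlama-1.1B-Chat-v1.0-int4-ov",
)
```

Either way, the resulting directory can be passed directly to a pipeline constructor such as LLMPipeline, as shown in the introduction above.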

(Figure: OpenVINO GenAI workflow)

Comparison with Alternatives

Unlike base OpenVINO, which requires manual implementation of generation loops, tokenization, scheduling, etc., OpenVINO GenAI provides these components in a ready-to-use package.

Compared to Hugging Face Optimum Intel, OpenVINO GenAI offers a smaller footprint, fewer dependencies, and better performance optimization options, particularly for C++ applications.

| Feature | OpenVINO GenAI | Base OpenVINO | Hugging Face Optimum Intel |
|---|---|---|---|
| Easy-to-use APIs | ✅ | ❌ | ✅ |
| Low footprint | ✅ | ✅ | ❌ |
| C++ support | ✅ | ✅ | ❌ |
| Pre-built pipelines | ✅ | ❌ | ✅ |
| Model variety | Medium | High | High |

System Requirements

Refer to the OpenVINO System Requirements for more details.