Overview

TaskingAI-Inference is a self-contained backend service that exposes a unified API for integrating Large Language Models (LLMs) and text embedding models from a wide range of providers. It acts as a bridge, giving users effortless access to an extensive selection of LLMs, and it handles requests asynchronously to deliver efficient, high-throughput inference.

The project is open-source on GitHub.
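To give a sense of how the unified API is consumed, here is a minimal sketch of a chat-completion request. The host, endpoint path, and payload field names (`model_schema_id`, `credentials`, `messages`) are illustrative assumptions, not the authoritative schema; consult the API reference for the exact request format.

```python
# A minimal sketch of calling the unified chat-completion endpoint with httpx.
# The host, endpoint path, and payload field names below are assumptions for
# illustration -- check the TaskingAI-Inference API reference for the schema.
import httpx

response = httpx.post(
    "http://localhost:8000/v1/chat_completion",  # assumed local deployment URL
    json={
        "model_schema_id": "openai/gpt-4",       # assumed provider/model identifier
        "credentials": {"OPENAI_API_KEY": "sk-..."},
        "messages": [
            {"role": "user", "content": "Summarize the benefits of a unified LLM API."}
        ],
    },
)
response.raise_for_status()
print(response.json())
```

Because the API is unified, switching providers should only require changing the model identifier and credentials, not the shape of the request.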

Architecture

Key Features

  1. Unified API Integration: Integrates a variety of mainstream Large Language Models (LLMs) into the TaskingAI platform through a single unified API, enabling easy access and deployment.

  2. Function Call Support: Enables LLMs to interact with external resources and tools through function calls, expanding their usability and integration capabilities; a request sketch follows this list.

  3. Asynchronous Requests for High Concurrency: Handles high-concurrency workloads efficiently by processing requests asynchronously, keeping performance stable under load; see the second sketch after this list.

  4. End-to-End Encryption: Protects data and API keys with end-to-end AES encryption, safeguarding credential information.

  5. Custom Hosted Model Support: Allows users to deploy locally hosted custom models and access them through the same unified API as provider-hosted models.
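The first sketch below illustrates function call support. The `functions` field and its JSON-schema parameter format are assumptions modeled on common LLM function-calling APIs, and `get_weather` is a hypothetical function; check the API reference for the service's actual function-call schema.

```python
# A hedged sketch of a function-call request. The `functions` field and its
# JSON-schema parameter format are assumptions modeled on common LLM APIs.
import httpx

payload = {
    "model_schema_id": "openai/gpt-4",           # assumed identifier, as above
    "credentials": {"OPENAI_API_KEY": "sk-..."},
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "functions": [
        {
            "name": "get_weather",               # hypothetical function for illustration
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

response = httpx.post("http://localhost:8000/v1/chat_completion", json=payload)
response.raise_for_status()
# If the model decides to call the function, the response should carry the
# chosen function name and arguments rather than plain text content.
print(response.json())
```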
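The second sketch shows the client-side pattern the service's asynchronous design is built to serve: issuing many inference requests concurrently. The endpoint and payload fields are the same illustrative assumptions as above.

```python
# A minimal sketch of issuing many inference requests concurrently with
# asyncio and httpx. Endpoint and payload fields are illustrative assumptions.
import asyncio
import httpx

async def complete(client: httpx.AsyncClient, prompt: str) -> dict:
    response = await client.post(
        "http://localhost:8000/v1/chat_completion",
        json={
            "model_schema_id": "openai/gpt-4",
            "credentials": {"OPENAI_API_KEY": "sk-..."},
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    response.raise_for_status()
    return response.json()

async def main() -> None:
    prompts = [f"Write a one-line summary of topic {i}." for i in range(20)]
    async with httpx.AsyncClient(timeout=60) as client:
        # Fire all requests at once; the service handles them asynchronously
        # instead of serializing each call.
        results = await asyncio.gather(*(complete(client, p) for p in prompts))
    print(len(results), "completions received")

asyncio.run(main())
```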

Supported Models

For a full list of supported models and their required credentials, please refer to Providers and Models.