Overview

Many LLM applications require access to user-specific data that is not part of the model's training set. A key method for achieving this is Retrieval-Augmented Generation (RAG): external data is retrieved and supplied to the LLM as context during the generation phase.
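The RAG flow above can be sketched in a few lines. The retriever here is a toy word-overlap ranker and `build_prompt` is a hypothetical helper; neither is part of the TaskingAI API, they only illustrate "retrieve, then supply as context."

```python
# Toy RAG sketch: rank stored texts against the query, then inject the
# best matches into the prompt. Illustrative only -- not TaskingAI code.

def _words(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Rank stored texts by word overlap with the query; keep the top_k."""
    ranked = sorted(
        knowledge_base,
        key=lambda text: len(_words(query) & _words(text)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Inject retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

kb = [
    "TaskingAI collections act as independent indexes.",
    "Records hold source data such as PDFs or plain text.",
    "Chunks are text segments stored within collections.",
]
prompt = build_prompt("What are collections?", kb)
```

A production system would replace the word-overlap ranker with embedding-based similarity search, which the sections below cover.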

TaskingAI offers comprehensive tools for building a retrieval system, from straightforward to intricate setups. This section of the documentation covers all aspects of the retrieval phase, including record management and text chunk querying.


Key Components

TaskingAI's retrieval system is designed to manage and retrieve data efficiently. At its core, the system revolves around collections, records, and chunks, each serving a specific purpose in the data retrieval process. The system is engineered to handle various data sources and formats, ensuring that users can quickly and accurately access the information they need.

Here are the roles of each component:

  • Collections: These are the fundamental units of the retrieval system, each acting as an independent index.
  • Records: Representing the data sources, records can be of various formats, including pure text or text extracted from files like PDF, TXT, and Word documents.
  • Chunks: These are segments of text extracted and split from records, stored within collections.
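The hierarchy among the three components can be modeled with a few dataclasses. This is an illustrative sketch only; the real TaskingAI objects carry additional fields such as IDs, metadata, and embeddings.

```python
from dataclasses import dataclass, field

# Illustrative data model mirroring the hierarchy described above:
# a collection contains records, and the chunks split from those records.
# Not the actual TaskingAI object model.

@dataclass
class Chunk:
    text: str    # one text segment split from a record

@dataclass
class Record:
    content: str    # source data: pure text, or text extracted from a file

@dataclass
class Collection:
    name: str
    records: list[Record] = field(default_factory=list)
    chunks: list[Chunk] = field(default_factory=list)    # the searchable index units

col = Collection(name="product-docs")
col.records.append(Record(content="TaskingAI supports PDF, TXT, and Word files."))
col.chunks.append(Chunk(text="TaskingAI supports PDF, TXT, and Word files."))
```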

Key Features

Document Splitter

TaskingAI provides sophisticated functionality for breaking down records (which can be text or files) into more manageable text chunks. This process is crucial for efficient information retrieval, as it isolates the most relevant sections of the records for easier access and analysis.
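The core idea of a splitter can be shown with a minimal character-window version. TaskingAI's own splitter is configurable and more sophisticated, so treat this as a simplified stand-in, not its actual implementation.

```python
# Minimal character-window splitter: fixed-size windows that overlap, so a
# sentence straddling a boundary still appears whole in at least one chunk.
# Illustrative stand-in, not TaskingAI's splitter.

def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap is the key design choice: without it, a relevant sentence cut in half at a chunk boundary might never be retrieved intact.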

Text Embedding Models

Creating embeddings for documents is another crucial aspect of retrieval. Embeddings capture the semantic essence of text, enabling rapid and efficient identification of text segments with similar content. TaskingAI integrates with multiple embedding providers and models, from open-source options to proprietary APIs, giving you the flexibility to select the option that best fits your needs, and provides a unified interface for easy model switching.
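"Similar content" is typically measured as cosine similarity between embedding vectors. The tiny hand-made vectors below are stand-ins; a real embedding model returns vectors with hundreds or thousands of dimensions.

```python
import math

# Cosine similarity: 1.0 means the vectors point the same way, 0.0 means
# they are orthogonal. The three "embeddings" here are hand-made examples,
# not output from any real embedding model.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

emb_cat = [0.9, 0.1, 0.0]       # pretend embedding of "cat"
emb_kitten = [0.85, 0.2, 0.05]  # pretend embedding of "kitten" -- near "cat"
emb_engine = [0.0, 0.1, 0.95]   # pretend embedding of "engine" -- far from both
```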

Vector Stores: TaskingVec

TaskingAI enhances the retrieval experience by utilizing its proprietary vector storage system, TaskingVec. TaskingVec is engineered to deliver a reliable and swift retrieval experience, seamlessly integrated with the user's preferred embedding model. This combination ensures that users can enjoy a rapid querying experience without needing to delve into the complexities of vector operations.
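Conceptually, a vector store exposes two operations: insert an embedding with its source text, and query for the nearest stored vectors. TaskingVec's internals are proprietary; the brute-force toy below is not its implementation, only an illustration of that interface.

```python
import math

# Toy brute-force vector store: stores (vector, text) pairs and answers
# queries by ranking every stored vector by cosine similarity.
# Conceptual illustration only -- NOT TaskingVec.

class ToyVectorStore:
    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def insert(self, vector: list[float], text: str) -> None:
        """Store an embedding vector alongside its source text."""
        self._items.append((vector, text))

    def query(self, vector: list[float], top_k: int = 1) -> list[str]:
        """Return the texts whose vectors are most similar to the query."""
        def similarity(v: list[float]) -> float:
            dot = sum(x * y for x, y in zip(vector, v))
            norms = (math.sqrt(sum(x * x for x in vector))
                     * math.sqrt(sum(x * x for x in v)))
            return dot / norms
        ranked = sorted(self._items, key=lambda item: similarity(item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

store = ToyVectorStore()
store.insert([1.0, 0.0], "billing FAQ")
store.insert([0.0, 1.0], "shipping policy")
```

A real vector store replaces the linear scan with an approximate nearest-neighbor index so queries stay fast at scale, which is the complexity TaskingVec hides from the user.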