Overview

Many LLM applications require access to user-specific data that is not part of the model's training data. A key technique for this is Retrieval Augmented Generation (RAG): external data is retrieved and supplied to the LLM as additional context during the generation phase.
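The retrieve-then-generate flow can be sketched with a toy keyword retriever. Everything here (the corpus, the overlap scoring, and the prompt template) is illustrative only, not part of TaskingAI:

```python
import re

def tokenize(text: str) -> set:
    """Lowercase and split text into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list, top_k: int = 2) -> list:
    """Rank documents by naive keyword overlap with the query."""
    q_tokens = tokenize(query)
    scored = sorted(corpus,
                    key=lambda doc: len(q_tokens & tokenize(doc)),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query: str, context: list) -> str:
    """Inject the retrieved passages into the prompt sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using the context below.\nContext:\n{joined}\nQuestion: {query}"

corpus = [
    "TaskingAI collections act as independent indexes.",
    "Records are data sources such as PDFs or web pages.",
    "Chunks are text segments split from records.",
]
prompt = build_prompt("What are chunks?", retrieve("What are chunks?", corpus))
```

In a real deployment, the keyword retriever would be replaced by a vector search over embedded chunks; the augmentation step, however, looks the same: retrieved text is prepended to the user's question.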

TaskingAI offers comprehensive tools for building retrieval systems, from straightforward to intricate setups. This section of the documentation covers all aspects of the retrieval phase, including record management and text chunk querying.

Key Components

TaskingAI's knowledge system is designed to manage and retrieve data efficiently. At its core, the system revolves around collections, records, and chunks, each serving a specific purpose in the data retrieval process. The system is engineered to handle various data sources and formats, ensuring that users can quickly and accurately access the information they need.

Here are the roles of each component:

  • Collections: The fundamental units of the retrieval system, each acting as an independent index. A text collection is the basic type for holding textual information, while a QA collection is a specialized type optimized for question-answer pairs.
  • Records: Representing the data sources, records can be of various formats, including pure text, text extracted from files like PDF, TXT, and Word documents, and text extracted from web pages.
  • Chunks: These are segments of text extracted and split from records, stored within collections.
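The relationship between the three components can be sketched with plain data structures. This is a simplified local model, not the TaskingAI SDK; the class names and the fixed-size splitter are assumptions made for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A data source: pure text, or text extracted from a file or web page."""
    record_id: str
    content: str

@dataclass
class Chunk:
    """A segment of text split from a record, stored in a collection."""
    record_id: str
    text: str

@dataclass
class Collection:
    """An independent index holding chunks derived from its records."""
    name: str
    chunks: list = field(default_factory=list)

    def add_record(self, record: Record, chunk_size: int = 40) -> None:
        # Split the record's text into fixed-size chunks and index them.
        for i in range(0, len(record.content), chunk_size):
            self.chunks.append(Chunk(record.record_id,
                                     record.content[i:i + chunk_size]))

    def query(self, keyword: str) -> list:
        # Toy lookup: return chunks whose text contains the keyword.
        # (A real system would use embedding similarity instead.)
        return [c for c in self.chunks if keyword.lower() in c.text.lower()]

col = Collection("docs")
col.add_record(Record("r1", "Collections index chunks that are split from records."))
hits = col.query("split")
```

Each chunk keeps a reference back to its source record, which mirrors how records remain the unit of management while chunks are the unit of retrieval.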