
Create Record

Create Record in Text Collection

You can create a record in your collection using any of the following methods:

  1. Provide the text content directly.
  2. Provide the URL of a webpage, and TaskingAI will scrape the contents.
  3. Upload a file, and TaskingAI will extract the text content. Supported file formats include: .txt, .pdf, .docx, .md, .html.
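Because only these formats are accepted, it can help to check a file's extension before uploading it. The helper below is a hypothetical client-side sketch and is not part of the TaskingAI SDK. Treating extensions case-insensitively is an assumption, and the server remains the authority on what it accepts.

```python
from pathlib import Path

# Formats listed above, maintained by hand (hypothetical helper, not an SDK constant).
SUPPORTED_RECORD_EXTENSIONS = {".txt", ".pdf", ".docx", ".md", ".html"}


def is_supported_record_file(path: str) -> bool:
    """Return True if the file's extension is one of the listed formats.

    The comparison is case-insensitive, which is an assumption about the server.
    """
    return Path(path).suffix.lower() in SUPPORTED_RECORD_EXTENSIONS


print(is_supported_record_file("report.PDF"))  # True
print(is_supported_record_file("data.csv"))    # False
```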

Text Splitters

When creating records that involve text processing, you can use a text splitter to divide the content into more manageable chunks. TaskingAI currently supports two primary types of text splitters:

TokenTextSplitter

This splitter divides the text based on a specified number of tokens. It is configured with the following parameters:

  • chunk_size: The maximum number of tokens per chunk. This determines how large each text chunk can be.
  • chunk_overlap: The number of tokens that consecutive chunks will overlap. A value of 0 indicates that there is no overlap between chunks.

Example Usage:

```python
token_text_splitter = {
    "type": "token",
    "chunk_size": 200,
    "chunk_overlap": 20,
}
```
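To make chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch. It is not TaskingAI's implementation: it assumes whitespace-separated words stand in for tokens, whereas a real tokenizer counts differently. Each new chunk starts chunk_size - chunk_overlap tokens after the previous one, so consecutive chunks share chunk_overlap tokens.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size "tokens" (words here),
    where consecutive chunks share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    tokens = text.split()  # assumption: words approximate model tokens
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


print(split_by_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

With chunk_overlap set to 0 the windows simply tile the text; a larger overlap preserves more context across chunk boundaries at the cost of more chunks.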

SeparatorTextSplitter

This splitter uses specified separators to divide the text into chunks. If a separated chunk exceeds the chunk_size, it will be further divided into smaller chunks.

The parameters for this splitter are:

  • separators: A list of delimiter strings used to split the text.
  • chunk_size: The maximum number of tokens per chunk. (This is a mandatory parameter.)
  • chunk_overlap: The number of tokens that consecutive chunks will overlap. A value of 0 indicates no overlap.

Example Usage:

```python
separator_text_splitter = {
    "type": "separator",
    "separators": ["\n\n"],
    "chunk_size": 200,
    "chunk_overlap": 20,
}
```
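The two-stage behavior described above (split on separators first, then further divide any piece larger than chunk_size) can be sketched as follows. This is an illustration under assumptions, not TaskingAI's code: words stand in for tokens, and the sketch does not merge small neighboring pieces, which the real splitter may or may not do. It includes a compact word-based window function so the block runs on its own.

```python
from typing import List, Sequence


def _token_windows(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[str]:
    """Sliding windows of chunk_size words, sharing chunk_overlap words."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks


def split_by_separators(text: str, separators: Sequence[str],
                        chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    # Stage 1: split on every separator in turn.
    pieces = [text]
    for sep in separators:
        pieces = [part for piece in pieces for part in piece.split(sep)]
    # Stage 2: keep small pieces, re-split oversized ones by token count.
    chunks = []
    for piece in pieces:
        tokens = piece.split()
        if not tokens:
            continue  # skip empty pieces between consecutive separators
        if len(tokens) <= chunk_size:
            chunks.append(piece.strip())
        else:
            chunks.extend(_token_windows(tokens, chunk_size, chunk_overlap))
    return chunks


text = "one two three\n\nfour five six seven eight nine\n\nten"
print(split_by_separators(text, ["\n\n"], chunk_size=4))
# ['one two three', 'four five six seven', 'eight nine', 'ten']
```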

Both splitters are designed to optimize the processing of large text datasets by breaking them down into manageable segments.

Create a Record with Text Content

To create a new record from text, call the create_record method with the following parameters:

  • collection_id: The identifier of the collection where the record will be stored.
  • type: The record type; set it to "text" for text content.
  • content: The textual content of the record.
  • text_splitter: The text splitter used to divide the text into smaller chunks.
  • metadata: A dictionary of metadata to attach to the record.

Example:

```python
import taskingai

record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="text",
    content="Machine learning is a subfield of artificial intelligence...",
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
    metadata={"file_name": "machine_learning.pdf"},
)
```

After executing this function, a new record will be created within the specified collection.

Note: The text_splitter parameter is used only during the creation process to split the text into smaller chunks and is not stored as a property of the record.

Create a Record with a Web URL

To create a new record using a web URL, use the create_record method and set the type to web, providing the URL of the webpage. The textual content of the webpage will be scraped and stored in the record.

Example:

```python
import taskingai

record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="web",
    url="https://www.tasking.ai",
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
)
```

Create a Record with an Uploaded File

Creating a record by uploading a file involves two steps: first, upload the file, and then create the record based on the uploaded file.

Upload a File

To upload a file, use the upload_file method:

```python
import taskingai
with open("PATH_TO_FILE", "rb") as f:  # closes the file after the upload
    file = taskingai.file.upload_file(file=f, purpose="record_file")
print(f"Uploaded file ID: {file.file_id}")
```

Create a Record Based on the Uploaded File

```python
record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="file",
    title="Machine Learning",
    file_id=file.file_id,
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
)
print(f"Created record: {record.record_id}\n")
```
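The two steps can be wrapped in a single helper. This is a hypothetical sketch: create_file_record is not an SDK function, and it takes the SDK module as a parameter (pass taskingai itself) so that it can be exercised with a stub. It uses only the upload_file and create_record calls shown above, and it opens the file in a with block so the handle is closed once the upload returns.

```python
from typing import Any, Dict


def create_file_record(sdk: Any, collection_id: str, path: str,
                       title: str, text_splitter: Dict[str, Any]) -> Any:
    """Upload a local file, then create a file record that references it.

    `sdk` is expected to be the taskingai module (or a stand-in with the
    same file.upload_file and retrieval.create_record attributes).
    """
    # Step 1: upload the file; the with block closes the handle afterwards.
    with open(path, "rb") as fp:
        uploaded = sdk.file.upload_file(file=fp, purpose="record_file")
    # Step 2: create the record from the uploaded file's ID.
    return sdk.retrieval.create_record(
        collection_id=collection_id,
        type="file",
        title=title,
        file_id=uploaded.file_id,
        text_splitter=text_splitter,
    )
```

With the real SDK, the call would look like create_file_record(taskingai, "YOUR_COLLECTION_ID", "PATH_TO_FILE", "Machine Learning", {"type": "token", "chunk_size": 200, "chunk_overlap": 20}).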
Note: In some cases, the record status remains "creating" for a short period after the creation call. It generally changes to "ready" within a few minutes.

When the record status changes to "ready", the text has been split into smaller chunks and the embeddings for those chunks have been built. Record chunks can be retrieved in response to user queries only while the record is in the "ready" status.
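Because chunks are retrievable only once a record is ready, a script may want to wait for that state before querying. Below is a minimal polling sketch under stated assumptions: wait_until_ready is a hypothetical helper, the record object is assumed to expose a status attribute with values such as "creating" and "ready", and any other status is treated as an error. The fetcher is passed in as a callable; with the SDK it might be a lambda wrapping taskingai.retrieval.get_record(collection_id=..., record_id=record.record_id), assuming that function is available in your SDK version.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_until_ready(fetch_record: Callable[[], Any],
                     timeout: float = 300.0, interval: float = 5.0) -> Any:
    """Call fetch_record() until the record's status is "ready".

    Raises RuntimeError on any status other than "creating"/"ready",
    and TimeoutError if the record is still "creating" after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        record = fetch_record()
        if record.status == "ready":
            return record
        if record.status != "creating":
            raise RuntimeError(f"unexpected record status: {record.status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("record did not become ready in time")
        time.sleep(interval)


# Demo with a stand-in fetcher that reports "creating" twice, then "ready".
_states = iter(["creating", "creating", "ready"])
ready_record = wait_until_ready(lambda: SimpleNamespace(status=next(_states)), interval=0)
print(ready_record.status)  # ready
```

Polling with a fixed interval keeps the sketch simple; a longer interval or exponential backoff reduces API calls when creation takes minutes.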

Create Record in QA Collection

You can create a record in a QA collection as follows:

  • First, download either the CSV template or the Excel template.
  • Then, fill in the sheet with your Q&A rows.
  • Finally, upload the sheet and create a QA record using the TaskingAI SDK.
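If you prefer to generate the sheet programmatically, Python's csv module can write it. This is a sketch under assumptions: write_qa_sheet is a hypothetical helper, and the header names "question" and "answer" are placeholders. Copy the exact column headers from the downloaded template, since the importer may expect them.

```python
import csv
from typing import Iterable, Sequence, Tuple


def write_qa_sheet(path: str, rows: Iterable[Tuple[str, str]],
                   header: Sequence[str] = ("question", "answer")) -> int:
    """Write Q&A pairs to a CSV file and return the number of data rows.

    The default header is a placeholder; replace it with the template's headers.
    """
    count = 0
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for question, answer in rows:
            writer.writerow([question, answer])  # fields with commas get quoted
            count += 1
    return count


write_qa_sheet("qa.csv", [
    ("Which file formats can text records use?", ".txt, .pdf, .docx, .md, .html"),
])
```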

Here is an example of the upload and record-creation steps:

```python
import taskingai

with open("PATH_TO_FILE", "rb") as f:  # closes the file after the upload
    file = taskingai.file.upload_file(file=f, purpose="qa_record_file")
print(f"Uploaded file ID: {file.file_id}")

record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="qa_sheet",
    file_id=file.file_id,
)
print(f"Created record: {record.record_id}\n")
```
Note: As with text records, the record status remains "creating" for a short period after the creation call. Q&A chunks can be retrieved in response to user queries only once the record is in the "ready" status.