Create Record
Create Record in Text Collection
You can create a record in your collection using any of the following methods:
- Provide the text content directly.
- Provide the URL of a webpage, and TaskingAI will scrape the contents.
- Upload a file, and TaskingAI will extract the text content. Supported file formats include .txt, .pdf, .docx, .md, and .html.
Text Splitters
When creating records that involve text processing, you can use a text splitter to divide the content into more manageable chunks. TaskingAI currently supports two primary types of text splitters:
TokenTextSplitter
This splitter divides the text based on a specified number of tokens. It is configured with the following parameters:
- chunk_size: The maximum number of tokens per chunk. This determines how large each text chunk can be.
- chunk_overlap: The number of tokens that consecutive chunks will overlap. A value of 0 indicates that there is no overlap between chunks.
Example Usage:
token_text_splitter = {
"type": "token",
"chunk_size": 200,
"chunk_overlap": 20
}
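To make chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch. It is not TaskingAI's implementation: as a simplifying assumption it treats whitespace-separated words as tokens, whereas the service counts tokens with its own tokenizer. The function name split_by_tokens is hypothetical.

```python
def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Sliding-window chunking; words stand in for tokens (simplifying assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Ten "tokens", chunks of at most 4 tokens, 1 token shared between neighbors
print(split_by_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

Each chunk starts chunk_size - chunk_overlap tokens after the previous one, so with an overlap of 20 and a chunk size of 200 (as in the example above) consecutive chunks would share their boundary 20 tokens.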
SeparatorTextSplitter
This splitter uses specified separators to divide the text into chunks. If a separated chunk exceeds the chunk_size, it will be further divided into smaller chunks.
The parameters for this splitter are:
- separators: A list of delimiter strings used to split the text.
- chunk_size: The maximum number of tokens per chunk. (This is a mandatory parameter.)
- chunk_overlap: The number of tokens that consecutive chunks will overlap. A value of 0 indicates no overlap.
Example Usage:
separator_text_splitter = {
"type": "separator",
"separators": ["\n\n"],
"chunk_size": 200,
"chunk_overlap": 20
}
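The two-stage behavior described above (split on separators, then re-split any oversized piece) can be sketched as follows. This is an illustrative approximation, not TaskingAI's code: words stand in for tokens, empty pieces are dropped, and chunk_overlap is applied only when an oversized piece is re-split, which is an assumption about how the service uses the overlap. The function names are hypothetical.

```python
import re


def _window(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, advancing by chunk_size - chunk_overlap."""
    step = chunk_size - chunk_overlap
    out = []
    for start in range(0, len(tokens), step):
        out.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return out


def split_by_separators(text, separators, chunk_size, chunk_overlap):
    """Split on any separator, then re-split pieces longer than chunk_size.

    Words stand in for tokens (simplifying assumption).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    # Try longer separators first so "\n\n" wins over "\n" in the regex alternation
    ordered = sorted(separators, key=len, reverse=True)
    pieces = re.split("|".join(map(re.escape, ordered)), text) if ordered else [text]
    chunks = []
    for piece in pieces:
        tokens = piece.split()
        if not tokens:
            continue  # skip empty pieces, e.g. from consecutive separators
        if len(tokens) <= chunk_size:
            chunks.append(" ".join(tokens))
        else:
            chunks.extend(_window(tokens, chunk_size, chunk_overlap))
    return chunks


text = "Intro paragraph.\n\nA much longer second paragraph with many words in it."
print(split_by_separators(text, ["\n\n"], chunk_size=5, chunk_overlap=1))
```

The short first paragraph survives as a single chunk, while the ten-word second paragraph exceeds chunk_size and falls back to overlapping windows.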
Both splitters are designed to optimize the processing of large text datasets by breaking them down into manageable segments.
Create a Record with Text Content
To create a new record from text, use the create_record method with the following parameters:
- collection_id: The identifier of the collection where the record will be stored.
- type: The record type; set it to "text" for text content.
- content: The textual content of the record.
- text_splitter: The text splitter to use for dividing the text into smaller chunks.
- metadata: A dictionary containing metadata for the record.
Example:
import taskingai
record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="text",
    content="Machine learning is a subfield of artificial intelligence...",
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
    metadata={"file_name": "machine_learning.pdf"},
)
After executing this function, a new record will be created within the specified collection.
The text_splitter parameter is used only during creation to split the text into smaller chunks; it is not stored as a property of the record.
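Because text_splitter only matters at creation time, a malformed dictionary would surface only as an error from the create_record call. A local pre-check can catch this earlier. The helper below, check_text_splitter, is hypothetical and not part of the TaskingAI SDK. It requires chunk_size for both types (the page marks it mandatory for the separator type; requiring it for the token type is an assumption) and separators for the separator type, checks chunk_overlap only when present, and adds its own sanity rule that the overlap must be smaller than chunk_size.

```python
def check_text_splitter(splitter: dict) -> None:
    """Hypothetical local check of a text_splitter dict; raises ValueError if invalid."""
    # Required fields per type (chunk_size for "token" is an assumption)
    required = {"token": {"chunk_size"}, "separator": {"separators", "chunk_size"}}
    kind = splitter.get("type")
    if kind not in required:
        raise ValueError(f"unknown splitter type: {kind!r}")
    missing = required[kind] - set(splitter)
    if missing:
        raise ValueError(f"{kind} splitter is missing: {sorted(missing)}")
    size = splitter["chunk_size"]
    if not isinstance(size, int) or size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # chunk_overlap is treated as 0 when omitted, for this local check only;
    # requiring overlap < chunk_size is an added sanity assumption
    overlap = splitter.get("chunk_overlap", 0)
    if not isinstance(overlap, int) or not 0 <= overlap < size:
        raise ValueError("chunk_overlap must be an integer in [0, chunk_size)")
    if kind == "separator":
        seps = splitter["separators"]
        if not isinstance(seps, list) or not seps or not all(isinstance(s, str) and s for s in seps):
            raise ValueError("separators must be a non-empty list of non-empty strings")


# Both example configurations from this page pass without raising
check_text_splitter({"type": "token", "chunk_size": 200, "chunk_overlap": 20})
check_text_splitter({"type": "separator", "separators": ["\n\n"], "chunk_size": 200, "chunk_overlap": 20})
```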
Create a Record with a Web URL
To create a new record from a web URL, use the create_record method with type set to "web" and provide the URL of the webpage. The textual content of the webpage will be scraped and stored in the record.
Example:
import taskingai
record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="web",
    url="https://www.tasking.ai",
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
)
Create a Record with an Uploaded File
Creating a record by uploading a file involves two steps: first, upload the file, and then create the record based on the uploaded file.
Upload a File
To upload a file, use the upload_file method:
import taskingai
with open("PATH_TO_FILE", "rb") as f:  # closes the file handle after the upload
    file = taskingai.file.upload_file(file=f, purpose="record_file")
print(f"Uploaded file ID: {file.file_id}")
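Since only the formats listed at the top of this page (.txt, .pdf, .docx, .md, .html) are supported for text records, a client-side check before uploading can avoid a wasted request. The helper is_supported_record_file is hypothetical and compares the file extension only; it does not inspect the file contents, and it is not necessarily how the service itself validates files.

```python
from pathlib import Path

# File formats this page lists as supported for text records
SUPPORTED_RECORD_SUFFIXES = {".txt", ".pdf", ".docx", ".md", ".html"}


def is_supported_record_file(path: str) -> bool:
    """Hypothetical pre-upload check based on the (case-insensitive) extension."""
    return Path(path).suffix.lower() in SUPPORTED_RECORD_SUFFIXES


for name in ["notes.md", "Report.PDF", "slides.pptx", "README"]:
    print(name, is_supported_record_file(name))
```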
Create a Record Based on the Uploaded File
record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="file",
    title="Machine Learning",
    file_id=file.file_id,
    text_splitter={"type": "token", "chunk_size": 200, "chunk_overlap": 20},
)
print(f"Created record: {record.record_id}\n")
After the creation call, the record status may remain "creating" for a short period; it generally changes to "ready" within a few minutes.
When the record status changes to "ready", the text has been split into smaller chunks and embeddings have been built for those chunks. Only records in the "ready" status can have their chunks retrieved in response to user queries.
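Because chunks become retrievable only in the "ready" status, a script that queries the collection right after creating a record may want to wait first. The sketch below polls a status source until it reports "ready". fetch_status is a hypothetical callable: this page does not show how to read a record's status back, so wire it to whatever retrieval call your SDK version provides. Treating any status other than "creating" or "ready" as a failure is also an assumption.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Poll fetch_status() until it returns "ready" or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if status != "creating":
            # Any other status is treated as a failure (assumption)
            raise RuntimeError(f"record entered unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"record still 'creating' after {timeout} seconds")
        time.sleep(interval)


# Demo with a stand-in status source instead of a real API call
statuses = iter(["creating", "creating", "ready"])
print(wait_until_ready(lambda: next(statuses), interval=0.01))  # ready
```

A fixed interval keeps the sketch simple; with records that usually become ready within minutes, a few seconds between polls is a reasonable starting point.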
Create Record in QA Collection
You can create a record in your QA collection as follows:
- First, download either the CSV template file or the Excel template file.
- Then, fill in the sheet with your Q&A rows.
- Finally, upload the sheet file and create a QA record using the TaskingAI SDK.
Here is an example:
import taskingai
with open("PATH_TO_FILE", "rb") as f:  # closes the file handle after the upload
    file = taskingai.file.upload_file(file=f, purpose="qa_record_file")
print(f"Uploaded file ID: {file.file_id}")
record = taskingai.retrieval.create_record(
    collection_id="YOUR_COLLECTION_ID",
    type="qa_sheet",
    file_id=file.file_id,
)
print(f"Created record: {record.record_id}\n")
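If the Q&A pairs already live in code, the CSV sheet passed to upload_file can be generated instead of filled in by hand. This minimal sketch uses Python's standard csv module. The header row ("question", "answer") is hypothetical, so replace it with the header row of the CSV template you downloaded; the resulting file path would then take the place of PATH_TO_FILE in the example above.

```python
import csv

# Hypothetical header row; copy the real one from the downloaded CSV template
HEADER = ["question", "answer"]


def write_qa_sheet(path: str, rows: list[tuple[str, str]]) -> None:
    """Write one Q&A pair per row, quoting fields that contain commas."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


write_qa_sheet("qa_sheet.csv", [
    ("What can a text record be created from?", "Text content, a web URL, or an uploaded file."),
    ("When can record chunks be retrieved?", "Once the record status is ready."),
])
```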
As with text records, the record status remains "creating" for a short period after the creation call. Q&A chunks can be retrieved in response to user queries only once the record status is "ready".