Hugging Face Inference Endpoint (Dedicated)

This document describes how to integrate Hugging Face models through Hugging Face's Inference Endpoint (Dedicated). For information about integrating with the Inference API (Serverless), please refer to the corresponding document.

Prerequisites

To use models provided by Hugging Face, you need a Hugging Face API key. You can get one by signing up at Hugging Face.

Required credentials:

  • HUGGING_FACE_API_KEY: Your Hugging Face API key.
  • HUGGING_INFERENCE_ENDPOINT_URL: The URL of your dedicated Hugging Face Inference Endpoint.
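
As a quick sanity check that both credentials are set correctly, the sketch below reads them from environment variables (named after the credentials above, as an assumed convention) and sends a minimal request directly to the dedicated endpoint. The payload shape follows Hugging Face's standard Inference Endpoint request format for Text Generation models.

```python
import os

import requests

# Credential names follow this document; reading them from the
# environment is an assumed convention, not a TaskingAI requirement.
api_key = os.environ["HUGGING_FACE_API_KEY"]
endpoint_url = os.environ["HUGGING_INFERENCE_ENDPOINT_URL"]

# Minimal text-generation request against the dedicated endpoint.
response = requests.post(
    endpoint_url,
    headers={"Authorization": f"Bearer {api_key}"},
    json={"inputs": "Hello, my name is"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

A 200 response with generated text confirms the endpoint is running and the API key has access to it.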

Supported Models:

NOTE: Only Text Generation models that are available for Hugging Face Inference Endpoint (Dedicated) are supported by this integration.

All models provided by Hugging Face have the following properties:

  • Function call: Not supported
  • Streaming: Not supported

Different models may accept different generation parameters; check your target model's documentation for details. TaskingAI accepts all of the following parameters, but they may have no effect if the target model rejects them (see the sketch after this list):

  • temperature
  • top_p
  • max_tokens
  • stop
  • top_k
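
As a rough illustration of how these parameters could reach the model, here is a hedged sketch of a raw call to a dedicated Text Generation endpoint. Hugging Face's text-generation interface names the token limit max_new_tokens, so the mapping from TaskingAI's max_tokens (and from stop to the endpoint's stop sequences) is an assumption about how values are forwarded, not TaskingAI's confirmed internals.

```python
import os

import requests


def generate(prompt: str, **params) -> str:
    """Send a text-generation request with optional sampling parameters."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "temperature": params.get("temperature", 0.7),
            "top_p": params.get("top_p", 0.95),
            "top_k": params.get("top_k", 50),
            # Assumed mapping: TaskingAI's max_tokens -> HF max_new_tokens.
            "max_new_tokens": params.get("max_tokens", 256),
            # Assumed mapping: TaskingAI's stop -> the endpoint's stop sequences.
            "stop": params.get("stop", []),
        },
    }
    response = requests.post(
        os.environ["HUGGING_INFERENCE_ENDPOINT_URL"],
        headers={"Authorization": f"Bearer {os.environ['HUGGING_FACE_API_KEY']}"},
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    # Text Generation endpoints typically return [{"generated_text": ...}].
    return response.json()[0]["generated_text"]


print(generate("Write a haiku about autumn.", temperature=0.9, max_tokens=64))
```

A parameter outside the ranges a model supports is simply rejected or ignored by the endpoint, which is why TaskingAI cannot guarantee every parameter takes effect.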

Wildcard

  • Model schema id: huggingface/wildcard

Since Hugging Face is a platform that hosts thousands of models, TaskingAI provides a wildcard model that can integrate any eligible model on Hugging Face. Currently, eligible models are Text Generation models available for the Hugging Face Inference Endpoint (Dedicated) service.

[Image: Hugging Face Deploy]

To integrate a specific model, pass the model id to the Provider Model Id parameter, for example google/gemma-7b or meta-llama/Llama-2-7b-chat-hf.

[Image: Hugging Face Inference Endpoint integration]
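
Once the wildcard model is registered in TaskingAI with your endpoint URL and Provider Model Id, it can be invoked like any other chat completion model. Below is a minimal sketch using the TaskingAI Python SDK; the API key placeholder and "YOUR_MODEL_ID" stand in for the values issued when the integration is created, and the exact SDK call shape is assumed from TaskingAI's published client usage.

```python
import taskingai
from taskingai.inference import SystemMessage, UserMessage

# Authenticate against your TaskingAI project (placeholder key).
taskingai.init(api_key="YOUR_TASKINGAI_API_KEY")

# "YOUR_MODEL_ID" is the id TaskingAI assigns after you create the
# huggingface/wildcard model with your Provider Model Id (e.g. google/gemma-7b).
chat_completion = taskingai.inference.chat_completion(
    model_id="YOUR_MODEL_ID",
    messages=[
        SystemMessage("You are a helpful assistant."),
        UserMessage("Introduce yourself in one sentence."),
    ],
)
print(chat_completion.message.content)
```

Note that the call is non-streaming, consistent with the model properties listed above: this integration supports neither streaming nor function calls.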