openvino-langchain
v0.0.2
OpenVINO Generative AI integration for LangChain.js
OpenVINO™ LangChain.js adapter
This package contains the LangChain.js integrations for OpenVINO™
Disclaimer: This is a preview version. Do not use it in production!
Introduction
OpenVINO is an open-source toolkit for deploying performant AI solutions. Convert, optimize, and run inference on local hardware utilizing the full potential of Intel® hardware.
Installation and Setup
See this section for general instructions on installing integration packages.
npm install openvino-langchain

Export your model to the OpenVINO™ IR
In order to use OpenVINO, you need to convert and compress the text generation model into the OpenVINO IR format.
The following models are tested:
- Embeddings models:
  - BAAI/bge-small-en-v1.5
  - intfloat/multilingual-e5-large
  - sentence-transformers/all-MiniLM-L12-v2
  - sentence-transformers/all-mpnet-base-v2
- Large language models:
  - openlm-research/open_llama_7b_v2
  - meta-llama/Llama-2-13b-chat-hf
  - microsoft/Phi-3.5-mini-instruct
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Tool calling models:
  - Qwen/Qwen2.5-7B-Instruct
Use HuggingFace Hub
Pre-converted and pre-optimized models are available under the LLM collections in the OpenVINO Toolkit organization.
To export another model hosted on the HuggingFace Hub, you can use the OpenVINO space. After conversion, a repository is pushed under your namespace; this repository can be either public or private.
Use Optimum Intel
Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models and convert them to the OpenVINO Intermediate Representation (IR) format.
First, install Optimum Intel for OpenVINO:
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"

Then download and convert a model to OpenVINO:
optimum-cli export openvino --model <model_id> --trust-remote-code <exported_model_name>

Note: Any model_id, for example "TinyLlama/TinyLlama-1.1B-Chat-v1.0", or the path to a local model file can be used.
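If the export is driven from a Node.js script, the command can be assembled as an argument list. The following is a minimal sketch, not part of this package: `buildExportArgs` and `ExportOptions` are hypothetical names, and the only flags used are ones that appear in this README (`--model`, `--weight-format`, `--trust-remote-code`).

```typescript
// Hypothetical helper (not part of openvino-langchain): builds the argument
// list for `optimum-cli export openvino` from a small options object.
interface ExportOptions {
  modelId: string; // Hub model id or local path, passed to --model
  outputDir: string; // directory that receives the exported model
  weightFormat?: string; // e.g. "int4"; the flag is omitted when undefined
  trustRemoteCode?: boolean; // adds --trust-remote-code when true
}

function buildExportArgs(opts: ExportOptions): string[] {
  const result = ["export", "openvino", "--model", opts.modelId];
  if (opts.weightFormat !== undefined) {
    result.push("--weight-format", opts.weightFormat);
  }
  if (opts.trustRemoteCode) {
    result.push("--trust-remote-code");
  }
  // The output directory is the trailing positional argument.
  result.push(opts.outputDir);
  return result;
}

const exportArgs = buildExportArgs({
  modelId: "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  weightFormat: "int4",
  trustRemoteCode: true,
  outputDir: "TinyLlama-1.1B-Chat-v1.0",
});
console.log(["optimum-cli", ...exportArgs].join(" "));
```

The array can be handed to Node's `child_process.spawn("optimum-cli", exportArgs)`. Keeping the arguments as an array rather than one string avoids shell quoting problems with model ids and paths.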
The Optimum Intel API also provides out-of-the-box model optimization through weight compression with NNCF, which substantially reduces the model footprint and inference latency:
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

LLM
This package contains the OpenVINO class, which is the recommended way to interact with models optimized for the OpenVINO toolkit.
OpenVINO Parameters
| Name | Type | Required | Description |
| ---- | ---- | -------- | ----------- |
| modelPath | string | ✅ | Path to the directory containing model xml/bin files and tokenizer |
| device | string | ❌ | Device to run the model on (e.g., CPU, GPU). |
| generationConfig | GenerationConfig | ❌ | Structure to keep generation config parameters. |
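Because modelPath must point to a directory with the model's xml/bin files, a quick check before constructing the model can give a clearer error than a failed load. The sketch below is an illustration only: `missingModelExtensions` is a hypothetical helper, the file names are made up, and it checks only for the `.xml` and `.bin` extensions, since the exact file names and the tokenizer layout are not specified here.

```typescript
// Hypothetical helper (not part of openvino-langchain): given the file names
// found in a model directory, returns the required extensions that are missing.
function missingModelExtensions(fileNames: string[]): string[] {
  const required = [".xml", ".bin"];
  return required.filter(
    (ext) => !fileNames.some((name) => name.toLowerCase().endsWith(ext)),
  );
}

// Illustrative listings; real exports may use different file names.
const completeListing = ["model.xml", "model.bin"];
const incompleteListing = ["model.xml"];

console.log(missingModelExtensions(completeListing)); // []
console.log(missingModelExtensions(incompleteListing)); // [ '.bin' ]
```

In practice the listing could come from `fs.readdirSync(modelPath)`. The helper is kept free of file-system calls so it is easy to test.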
import { OpenVINO } from "openvino-langchain";

const model = new OpenVINO({
  modelPath: "path-to-model",
  device: "CPU",
  generationConfig: {
    "max_new_tokens": 100,
  },
});
const response = await model.invoke("Hello, world!");

ChatModel
This package contains the ChatOpenVINO class, which allows you to use OpenVINO for chat pipelines.
ChatOpenVINO Parameters
| Name | Type | Required | Description |
| ---- | ---- | -------- | ----------- |
| modelPath | string | ✅ | Path to the directory containing model xml/bin files and tokenizer |
| device | string | ❌ | Device to run the model on (e.g., CPU, GPU). |
| generationConfig | GenerationConfig | ❌ | Structure to keep generation config parameters. |
import { ChatOpenVINO } from "openvino-langchain";
import { HumanMessage, SystemMessage } from '@langchain/core/messages';

const model = new ChatOpenVINO({
  modelPath: "path-to-model",
  device: "CPU",
  generationConfig: {
    "max_new_tokens": 100,
  },
});
const messages = [
  new SystemMessage('Translate the following from English into German'),
  new HumanMessage('Thank you!'),
];
const response = await model.invoke(messages);
console.log(response.content);

Tool calling (beta)
ChatOpenVINO has limited support for tool calling. The simplest way to set up a tool for use by an LLM is the bindTools method. You can also create an agent and set up the necessary tools to use them with ChatOpenVINO.
Different models have different tool calling requirements, so each model requires specific tool processing. Currently, however, tool calling is only supported for the Qwen model in ChatOpenVINO.
The list of supported models will expand in the future. Please follow the new releases.
Text Embedding Model
This package also adds support for OpenVINO's embeddings model.
| Name | Type | Required | Description |
| ---- | ---- | -------- | ----------- |
| modelPath | string | ✅ | Path to the directory containing embeddings model |
| device | string | ❌ | Device to run the embeddings model on (e.g., CPU, GPU). |
import { OvEmbeddings } from "openvino-langchain";

const embeddings = new OvEmbeddings({
  modelPath: "path-to-model",
  device: "CPU",
});
const res = await embeddings.embedQuery("Hello world");
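A common next step is to compare two embeddings. The sketch below assumes that embedQuery resolves to a plain array of numbers, as in the standard LangChain.js embeddings interface. `cosineSimilarity` is a hypothetical helper rather than something this package exports, and the vectors are made-up stand-ins for real model output.

```typescript
// Hypothetical helper (not part of openvino-langchain): cosine similarity
// between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stand-in vectors; with the real model these would be the results of
// two embedQuery calls.
const queryVector = [1, 0, 0];
const documentVector = [0, 1, 0];
console.log(cosineSimilarity(queryVector, queryVector)); // 1
console.log(cosineSimilarity(queryVector, documentVector)); // 0
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.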