Carbon
Connect external data to LLMs, no matter the source.
Table of Contents
- Installation
- Getting Started
- Reference
carbon.auth.getAccessToken
carbon.auth.getWhiteLabeling
carbon.dataSources.queryUserDataSources
carbon.dataSources.revokeAccessToken
carbon.embeddings.getDocuments
carbon.embeddings.getEmbeddingsAndChunks
carbon.embeddings.uploadChunksAndEmbeddings
carbon.files.createUserFileTags
carbon.files.delete
carbon.files.deleteFileTags
carbon.files.deleteMany
carbon.files.deleteV2
carbon.files.getParsedFile
carbon.files.getRawFile
carbon.files.queryUserFiles
carbon.files.queryUserFilesDeprecated
carbon.files.resync
carbon.files.upload
carbon.files.uploadFromUrl
carbon.files.uploadText
carbon.health.check
carbon.integrations.connectDataSource
carbon.integrations.connectFreshdesk
carbon.integrations.connectGitbook
carbon.integrations.createAwsIamUser
carbon.integrations.getOauthUrl
carbon.integrations.listConfluencePages
carbon.integrations.listDataSourceItems
carbon.integrations.listFolders
carbon.integrations.listGitbookSpaces
carbon.integrations.listLabels
carbon.integrations.listOutlookCategories
carbon.integrations.listRepos
carbon.integrations.syncConfluence
carbon.integrations.syncDataSourceItems
carbon.integrations.syncFiles
carbon.integrations.syncGitHub
carbon.integrations.syncGitbook
carbon.integrations.syncGmail
carbon.integrations.syncOutlook
carbon.integrations.syncRepos
carbon.integrations.syncRssFeed
carbon.integrations.syncS3Files
carbon.organizations.get
carbon.organizations.update
carbon.users.delete
carbon.users.get
carbon.users.toggleUserFeatures
carbon.users.updateUsers
carbon.utilities.fetchUrls
carbon.utilities.fetchYoutubeTranscripts
carbon.utilities.processSitemap
carbon.utilities.scrapeSitemap
carbon.utilities.scrapeWeb
carbon.utilities.searchUrls
carbon.webhooks.addUrl
carbon.webhooks.deleteUrl
carbon.webhooks.urls
Installation
npm i carbon-typescript-sdk
pnpm i carbon-typescript-sdk
yarn add carbon-typescript-sdk
Getting Started
import { Carbon } from "carbon-typescript-sdk";
// Generally this is done in the backend to avoid exposing API key to the client
const carbonWithApiKey = new Carbon({
apiKey: "API_KEY",
customerId: "CUSTOMER_ID",
});
const accessToken = await carbonWithApiKey.auth.getAccessToken();
// Once an access token is obtained, it can be passed to the frontend
// and used to instantiate the SDK client without an API key
const carbon = new Carbon({
accessToken: accessToken.data.access_token,
});
// use SDK as usual
const whiteLabeling = await carbon.auth.getWhiteLabeling();
// etc.
Reference
carbon.auth.getAccessToken
Get Access Token
🛠️ Usage
const getAccessTokenResponse = await carbon.auth.getAccessToken();
🔄 Return
🌐 Endpoint
/auth/v1/access_token
GET
carbon.auth.getWhiteLabeling
Returns whether or not the organization is white labeled and which integrations are white labeled
🛠️ Usage
const getWhiteLabelingResponse = await carbon.auth.getWhiteLabeling();
🔄 Return
🌐 Endpoint
/auth/v1/white_labeling
GET
carbon.dataSources.queryUserDataSources
User Data Sources
🛠️ Usage
const queryUserDataSourcesResponse =
await carbon.dataSources.queryUserDataSources({
order_by: "created_at",
order_dir: "desc",
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserDataSourceOrderByColumns
order_dir: OrderDir
filters: OrganizationUserDataSourceFilters
🔄 Return
OrganizationUserDataSourceResponse
🌐 Endpoint
/user_data_sources
POST
carbon.dataSources.revokeAccessToken
Revoke Access Token
🛠️ Usage
const revokeAccessTokenResponse = await carbon.dataSources.revokeAccessToken({
data_source_id: 1,
});
⚙️ Parameters
data_source_id: number
🔄 Return
🌐 Endpoint
/revoke_access_token
POST
carbon.embeddings.getDocuments
For pre-filtering documents, using tags_v2
is preferred to using tags
(which is now deprecated). If both tags_v2
and tags
are specified, tags
is ignored. tags_v2
enables
building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:
{
"OR": [
{
"key": "subject",
"value": "holy-bible",
"negate": false
},
{
"key": "person-of-interest",
"value": "jesus christ",
"negate": false
},
{
"key": "genre",
"value": "religion",
"negate": true
},
{
"AND": [
{
"key": "subject",
"value": "tao-te-ching",
"negate": false
},
{
"key": "author",
"value": "lao-tzu",
"negate": false
}
]
}
]
}
In this case, files will be filtered such that:
- "subject" = "holy-bible" OR
- "person-of-interest" = "jesus christ" OR
- "genre" != "religion" OR
- "subject" = "tao-te-ching" AND "author" = "lao-tzu"
Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:
- "key" isn't optional and must be a `string`
- "value" isn't optional and can be `any` or `list[any]`
- "negate" is optional and must be `true` or `false`. If present and `true`, then the filter block is negated in the resulting query. It is `false` by default.
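The example filter above can be expressed directly in TypeScript before being passed as the `tags_v2` field of a request. The `TagBlock` and `TagFilter` type names below are illustrative sketches, not the SDK's exported types:

```typescript
// Minimal shapes for tag blocks and the nested AND/OR filter described above.
// These type names are illustrative, not SDK exports.
type TagBlock = { key: string; value: string | string[]; negate?: boolean };
type TagFilter = { OR?: (TagBlock | TagFilter)[]; AND?: (TagBlock | TagFilter)[] };

const tagsV2: TagFilter = {
  OR: [
    { key: "subject", value: "holy-bible", negate: false },
    { key: "person-of-interest", value: "jesus christ", negate: false },
    { key: "genre", value: "religion", negate: true },
    {
      AND: [
        { key: "subject", value: "tao-te-ching", negate: false },
        { key: "author", value: "lao-tzu", negate: false },
      ],
    },
  ],
};

// The filter would then be sent in the request body, e.g.:
// await carbon.embeddings.getDocuments({ query: "...", k: 5, tags_v2: tagsV2 });
```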
When querying embeddings, you can optionally specify the `media_type` parameter in your request. By default (if not set), it is equal to "TEXT". This means that the query will be performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query will be performed over image files (for now, `.jpg` and `.png` files). You can think of this field as an additional filter on top of any filters set in `file_ids` and `parent_file_ids`.
When `hybrid_search` is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval. By default, these search methods are weighted equally during the ranking process. To adjust the weight (or "importance") of each search method, you can use the `hybrid_search_tuning_parameters` property. The descriptions for the different tuning parameters are:
- `weight_a`: weight to assign to semantic search
- `weight_b`: weight to assign to keyword search
You must ensure that `sum(weight_a, weight_b, ..., weight_n)` for all n weights is equal to 1. The equality has an error tolerance of 0.001 to account for possible floating point issues.
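The weight constraint above can be checked client-side before sending a request. A minimal sketch (the helper name is ours, not part of the SDK):

```typescript
// Validate that hybrid search weights sum to 1 within the 0.001 tolerance
// described above. Illustrative helper, not part of the SDK.
function validateHybridWeights(weights: number[], tolerance = 0.001): boolean {
  const sum = weights.reduce((acc, w) => acc + w, 0);
  return Math.abs(sum - 1) <= tolerance;
}

const tuning = { weight_a: 0.7, weight_b: 0.3 };
if (!validateHybridWeights([tuning.weight_a, tuning.weight_b])) {
  throw new Error("hybrid search weights must sum to 1");
}
// `tuning` would then be sent as `hybrid_search_tuning_parameters`
// alongside `hybrid_search: true` in a getDocuments request.
```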
In order to use hybrid search for a customer across a set of documents, two flags need to be enabled:
- Use the `/modify_user_configuration` endpoint to enable `sparse_vectors` for the customer. The payload body for this request is below:
{
"configuration_key_name": "sparse_vectors",
"value": {
"enabled": true
}
}
- Make sure hybrid search is enabled for the documents across which you want to perform the search. For the `/uploadfile` endpoint, this can be done by setting the following query parameter: `generate_sparse_vectors=true`
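Together, the two steps above might look like this in TypeScript. This is a sketch; the payload mirrors the JSON shape shown above, and the upload flag is the camelCase SDK equivalent of the query parameter:

```typescript
// Step 1: payload for /modify_user_configuration, enabling sparse vectors
// for the customer (shape mirrors the JSON above).
const sparseVectorsConfig = {
  configuration_key_name: "sparse_vectors",
  value: { enabled: true },
};

// Step 2: when uploading, request sparse vector generation so the file
// is a candidate for hybrid search.
const uploadOptions = { generateSparseVectors: true };

// e.g. await carbon.files.upload({ ...uploadOptions, file });
```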
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's `embed-multilingual-v3.0`. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, `text-embedding-ada-002` is used by default. When performing embedding queries, only embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, do not set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.
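The same-model rule above can be illustrated with a small sketch. The `FileRecord` shape and helper below are hypothetical, purely to show which files a query would consider:

```typescript
// Illustrative sketch: only files whose embeddings were generated with the
// query's embedding_model are considered. Hypothetical shapes, not SDK types.
type FileRecord = { id: number; embeddingModel: "OPENAI" | "COHERE_MULTILINGUAL_V3" };

function filesConsideredFor(
  model: FileRecord["embeddingModel"],
  files: FileRecord[]
): number[] {
  return files.filter((f) => f.embeddingModel === model).map((f) => f.id);
}

const files: FileRecord[] = [
  { id: 1, embeddingModel: "OPENAI" },                 // file A
  { id: 2, embeddingModel: "OPENAI" },                 // file B
  { id: 3, embeddingModel: "COHERE_MULTILINGUAL_V3" }, // file C
  { id: 4, embeddingModel: "COHERE_MULTILINGUAL_V3" }, // file D
];

// Default (OPENAI) queries consider A and B; COHERE_MULTILINGUAL_V3 considers C and D.
```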
🛠️ Usage
const getDocumentsResponse = await carbon.embeddings.getDocuments({
query: "query_example",
k: 1,
include_all_children: false,
media_type: "TEXT",
embedding_model: "OPENAI",
});
⚙️ Parameters
query: string
Query for which to get related chunks and embeddings.
k: number
Number of related chunks to return.
tags: Record<string, Tags1>
A set of tags to limit the search to. Deprecated and may be removed in the future.
query_vector: number[]
Optional query vector for which to get related chunks and embeddings. It must have been generated by the same model used to generate the embeddings across which the search is being conducted. Cannot provide both `query` and `query_vector`.
file_ids: number[]
Optional list of file IDs to limit the search to
parent_file_ids: number[]
Optional list of parent file IDs to limit the search to. A parent file describes a file to which another file belongs (e.g. a folder)
include_all_children: boolean
Flag to control whether or not to include all children of filtered files in the embedding search.
tags_v2: object
A set of tags to limit the search to. Use this instead of `tags`, which is deprecated.
include_tags: boolean
Flag to control whether or not to include tags for each chunk in the response.
include_vectors: boolean
Flag to control whether or not to include embedding vectors in the response.
include_raw_file: boolean
Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response.
hybrid_search: boolean
Flag to control whether or not to perform hybrid search.
hybrid_search_tuning_parameters: HybridSearchTuningParamsNullable
media_type: FileContentTypesNullable
Used to filter the kind of files (e.g. `TEXT` or `IMAGE`) over which to perform the search. Also plays a role in determining what embedding model is used to embed the query. If `IMAGE` is chosen as the media type, then the embedding model used will be an embedding model that is not text-only, regardless of what value is passed for `embedding_model`.
embedding_model: EmbeddingGeneratorsNullable
🔄 Return
🌐 Endpoint
/embeddings
POST
carbon.embeddings.getEmbeddingsAndChunks
Retrieve Embeddings And Content
🛠️ Usage
const getEmbeddingsAndChunksResponse =
await carbon.embeddings.getEmbeddingsAndChunks({
order_by: "created_at",
order_dir: "desc",
filters: {
user_file_id: 1,
embedding_model: "OPENAI",
},
include_vectors: false,
});
⚙️ Parameters
filters: EmbeddingsAndChunksFilters
pagination: Pagination
order_by: EmbeddingsAndChunksOrderByColumns
order_dir: OrderDir
include_vectors: boolean
🔄 Return
🌐 Endpoint
/text_chunks
POST
carbon.embeddings.uploadChunksAndEmbeddings
Upload Chunks And Embeddings
🛠️ Usage
const uploadChunksAndEmbeddingsResponse =
await carbon.embeddings.uploadChunksAndEmbeddings({
embedding_model: "OPENAI",
chunks_and_embeddings: [
{
file_id: 1,
chunks_and_embeddings: [
{
chunk_number: 1,
chunk: "chunk_example",
},
],
},
],
overwrite_existing: false,
chunks_only: false,
});
⚙️ Parameters
embedding_model: EmbeddingGenerators
chunks_and_embeddings: SingleChunksAndEmbeddingsUploadInput[]
overwrite_existing: boolean
chunks_only: boolean
custom_credentials: { [key: string]: object; }
🔄 Return
🌐 Endpoint
/upload_chunks_and_embeddings
POST
carbon.files.createUserFileTags
A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used:
- db_embedding_id
- organization_id
- user_id
- organization_user_file_id
Carbon currently supports two data types for tag values - `string` and `list<string>`. Keys can only be `string`. If values other than `string` and `list<string>` are used, they're automatically converted to strings (e.g. 4 will become "4").
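The reserved-key and type-coercion rules above can be sketched client-side. The helpers below are ours, not part of the SDK; they simply mirror what the rules describe:

```typescript
// Illustrative helpers mirroring the tag rules above: reject reserved keys
// and coerce non-string values to strings. Not part of the SDK.
const RESERVED_TAG_KEYS = [
  "db_embedding_id",
  "organization_id",
  "user_id",
  "organization_user_file_id",
];

function assertTagKeyAllowed(key: string): void {
  if (RESERVED_TAG_KEYS.includes(key)) {
    throw new Error(`"${key}" is a reserved tag key`);
  }
}

function normalizeTagValue(value: unknown): string | string[] {
  if (typeof value === "string") return value;
  if (Array.isArray(value)) return value.map((v) => String(v));
  return String(value); // e.g. 4 becomes "4"
}
```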
🛠️ Usage
const createUserFileTagsResponse = await carbon.files.createUserFileTags({
tags: {
key: "string_example",
},
organization_user_file_id: 1,
});
⚙️ Parameters
tags: Record<string, Tags1
>
organization_user_file_id: number
🔄 Return
🌐 Endpoint
/create_user_file_tags
POST
carbon.files.delete
Delete File Endpoint
🛠️ Usage
const deleteResponse = await carbon.files.delete({
fileId: 1,
});
⚙️ Parameters
fileId: number
🔄 Return
🌐 Endpoint
/deletefile/{file_id}
DELETE
carbon.files.deleteFileTags
Delete File Tags
🛠️ Usage
const deleteFileTagsResponse = await carbon.files.deleteFileTags({
tags: ["tags_example"],
organization_user_file_id: 1,
});
⚙️ Parameters
tags: string[]
organization_user_file_id: number
🔄 Return
🌐 Endpoint
/delete_user_file_tags
POST
carbon.files.deleteMany
Delete Files Endpoint
🛠️ Usage
const deleteManyResponse = await carbon.files.deleteMany({
delete_non_synced_only: false,
send_webhook: false,
delete_child_files: false,
});
⚙️ Parameters
file_ids: number[]
sync_statuses: ExternalFileSyncStatuses[]
delete_non_synced_only: boolean
send_webhook: boolean
delete_child_files: boolean
🔄 Return
🌐 Endpoint
/delete_files
POST
carbon.files.deleteV2
Delete Files V2 Endpoint
🛠️ Usage
const deleteV2Response = await carbon.files.deleteV2({
send_webhook: false,
});
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
send_webhook: boolean
🔄 Return
🌐 Endpoint
/delete_files_v2
POST
carbon.files.getParsedFile
This route is deprecated. Use `/user_files_v2` instead.
🛠️ Usage
const getParsedFileResponse = await carbon.files.getParsedFile({
fileId: 1,
});
⚙️ Parameters
fileId: number
🔄 Return
🌐 Endpoint
/parsed_file/{file_id}
GET
carbon.files.getRawFile
This route is deprecated. Use `/user_files_v2` instead.
🛠️ Usage
const getRawFileResponse = await carbon.files.getRawFile({
fileId: 1,
});
⚙️ Parameters
fileId: number
🔄 Return
🌐 Endpoint
/raw_file/{file_id}
GET
carbon.files.queryUserFiles
For pre-filtering documents, using `tags_v2` is preferred to using `tags` (which is now deprecated). If both `tags_v2` and `tags` are specified, `tags` is ignored. `tags_v2` enables building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:
{
"OR": [
{
"key": "subject",
"value": "holy-bible",
"negate": false
},
{
"key": "person-of-interest",
"value": "jesus christ",
"negate": false
},
{
"key": "genre",
"value": "religion",
"negate": true
},
{
"AND": [
{
"key": "subject",
"value": "tao-te-ching",
"negate": false
},
{
"key": "author",
"value": "lao-tzu",
"negate": false
}
]
}
]
}
In this case, files will be filtered such that:
- "subject" = "holy-bible" OR
- "person-of-interest" = "jesus christ" OR
- "genre" != "religion" OR
- "subject" = "tao-te-ching" AND "author" = "lao-tzu"
Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:
- "key" isn't optional and must be a `string`
- "value" isn't optional and can be `any` or `list[any]`
- "negate" is optional and must be `true` or `false`. If present and `true`, then the filter block is negated in the resulting query. It is `false` by default.
🛠️ Usage
const queryUserFilesResponse = await carbon.files.queryUserFiles({
order_by: "created_at",
order_dir: "desc",
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: boolean
include_parsed_text_file: boolean
include_additional_files: boolean
🔄 Return
🌐 Endpoint
/user_files_v2
POST
carbon.files.queryUserFilesDeprecated
This route is deprecated. Use `/user_files_v2` instead.
🛠️ Usage
const queryUserFilesDeprecatedResponse =
await carbon.files.queryUserFilesDeprecated({
order_by: "created_at",
order_dir: "desc",
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: boolean
include_parsed_text_file: boolean
include_additional_files: boolean
🔄 Return
🌐 Endpoint
/user_files
POST
carbon.files.resync
Resync File
🛠️ Usage
const resyncResponse = await carbon.files.resync({
file_id: 1,
force_embedding_generation: false,
});
⚙️ Parameters
file_id: number
chunk_size: number
chunk_overlap: number
force_embedding_generation: boolean
🔄 Return
🌐 Endpoint
/resync_file
POST
carbon.files.upload
This endpoint is used to directly upload local files to Carbon. The `POST` request should be a multipart form request. Note that the `set_page_as_boundary` query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters:
- `chunk_size`: the chunk size (in tokens) applied when splitting the document
- `chunk_overlap`: the chunk overlap (in tokens) applied when splitting the document
- `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings
- `set_page_as_boundary`: described above
- `embedding_model`: the model used to generate embeddings for the document chunks
- `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently)
- `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search.
- `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's `embed-multilingual-v3.0`. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, `text-embedding-ada-002` is used by default. When performing embedding queries, only embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, do not set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.
🛠️ Usage
const uploadResponse = await carbon.files.upload({
skipEmbeddingGeneration: false,
setPageAsBoundary: false,
embeddingModel: "OPENAI",
useOcr: false,
generateSparseVectors: false,
prependFilenameToChunks: false,
parsePdfTablesWithOcr: false,
detectAudioLanguage: false,
file: fs.readFileSync("/path/to/file"),
});
⚙️ Parameters
file: Uint8Array | File | buffer.File
chunkSize: number
Chunk size in tiktoken tokens to be used when processing file.
chunkOverlap: number
Chunk overlap in tiktoken tokens to be used when processing file.
skipEmbeddingGeneration: boolean
Flag to control whether or not embeddings should be generated and stored when processing file.
setPageAsBoundary: boolean
Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.
embeddingModel: TextEmbeddingGenerators
Embedding model that will be used to embed file chunks.
useOcr: boolean
Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with tables, images, and/or scanned text.
generateSparseVectors: boolean
Whether or not to generate sparse vectors for the file. This is required for the file to be a candidate for hybrid search.
prependFilenameToChunks: boolean
Whether or not to prepend the file's name to chunks.
maxItemsPerChunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parsePdfTablesWithOcr: boolean
Whether to use rich table parsing when `use_ocr` is enabled.
detectAudioLanguage: boolean
Whether to automatically detect the language of the uploaded audio file.
🔄 Return
🌐 Endpoint
/uploadfile
POST
carbon.files.uploadFromUrl
Create Upload File From Url
🛠️ Usage
const uploadFromUrlResponse = await carbon.files.uploadFromUrl({
url: "url_example",
skip_embedding_generation: false,
set_page_as_boundary: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
use_textract: false,
prepend_filename_to_chunks: false,
parse_pdf_tables_with_ocr: false,
detect_audio_language: false,
});
⚙️ Parameters
url: string
file_name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
set_page_as_boundary: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
use_textract: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parse_pdf_tables_with_ocr: boolean
detect_audio_language: boolean
🔄 Return
🌐 Endpoint
/upload_file_from_url
POST
carbon.files.uploadText
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's `embed-multilingual-v3.0`. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, `text-embedding-ada-002` is used by default. When performing embedding queries, only embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, do not set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.
🛠️ Usage
const uploadTextResponse = await carbon.files.uploadText({
contents: "contents_example",
skip_embedding_generation: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
});
⚙️ Parameters
contents: string
name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
overwrite_file_id: number
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
🔄 Return
🌐 Endpoint
/upload_text
POST
carbon.health.check
Health
🛠️ Usage
const checkResponse = await carbon.health.check();
🌐 Endpoint
/health
GET
carbon.integrations.connectDataSource
Connect Data Source
🛠️ Usage
const connectDataSourceResponse = await carbon.integrations.connectDataSource({
authentication: {
source: "GOOGLE_DRIVE",
access_token: "access_token_example",
},
});
⚙️ Parameters
authentication: AuthenticationProperty
sync_options: SyncOptions
🔄 Return
🌐 Endpoint
/integrations/connect
POST
carbon.integrations.connectFreshdesk
Refer to this article to obtain an API key: https://support.freshdesk.com/en/support/solutions/articles/215517. Make sure that your API key has permission to read solutions from your account and that you are on a paid plan. Once you have an API key, you can make a request to this endpoint along with your Freshdesk domain. This will trigger an automatic sync of the articles in your "solutions" tab. The additional parameters below can be used to associate data with the synced articles or modify the sync behavior.
🛠️ Usage
const connectFreshdeskResponse = await carbon.integrations.connectFreshdesk({
domain: "domain_example",
api_key: "api_key_example",
chunk_size: 1500,
chunk_overlap: 20,
skip_embedding_generation: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
prepend_filename_to_chunks: false,
sync_files_on_connection: true,
sync_source_items: true,
});
⚙️ Parameters
domain: string
api_key: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint
file_sync_config: HelpdeskFileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/freshdesk
POST
carbon.integrations.connectGitbook
You will need an access token to connect your Gitbook account. Note that the permissions will be defined by the user generating the access token, so make sure you have permission to access the spaces you will be syncing. Refer to this article for more details: https://developer.gitbook.com/gitbook-api/authentication. Additionally, you need to specify the name of the organization you will be syncing data from.
🛠️ Usage
const connectGitbookResponse = await carbon.integrations.connectGitbook({
organization: "organization_example",
access_token: "access_token_example",
chunk_size: 1500,
chunk_overlap: 20,
skip_embedding_generation: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
prepend_filename_to_chunks: false,
sync_files_on_connection: true,
sync_source_items: true,
});
⚙️ Parameters
organization: string
access_token: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint
🔄 Return
🌐 Endpoint
/integrations/gitbook
POST
carbon.integrations.createAwsIamUser
Create a new IAM user with permissions to:
🛠️ Usage
const createAwsIamUserResponse = await carbon.integrations.createAwsIamUser({
access_key: "access_key_example",
access_key_secret: "access_key_secret_example",
sync_source_items: true,
});
⚙️ Parameters
access_key: string
access_key_secret: string
sync_source_items: boolean
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint
🔄 Return
🌐 Endpoint
/integrations/s3
POST
carbon.integrations.getOauthUrl
This endpoint can be used to generate the following URLs:
- An OAuth URL for OAuth based connectors
- A file syncing URL which skips the OAuth flow if the user already has a valid access token and takes them to the success state.
🛠️ Usage
const getOauthUrlResponse = await carbon.integrations.getOauthUrl({
service: "GOOGLE_DRIVE",
chunk_size: 1500,
chunk_overlap: 20,
skip_embedding_generation: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
prepend_filename_to_chunks: false,
sync_files_on_connection: true,
set_page_as_boundary: false,
connecting_new_account: false,
request_id: "26453c8f-69ab-4eb3-bc25-0ca995b118a0",
use_ocr: false,
parse_pdf_tables_with_ocr: false,
enable_file_picker: true,
sync_source_items: true,
incremental_sync: false,
});
⚙️ Parameters
service: DataSourceType
tags: any
scope: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
zendesk_subdomain: string
microsoft_tenant: string
sharepoint_site_name: string
confluence_subdomain: string
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
salesforce_domain: string
sync_files_on_connection: boolean
Used to specify whether Carbon should attempt to sync all your files automatically when authorization is complete. This is only supported for a subset of connectors and will be ignored for the rest. Supported connectors: Intercom, Zendesk, Gitbook, Confluence, Salesforce, Freshdesk
set_page_as_boundary: boolean
data_source_id: number
Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.
connecting_new_account: boolean
Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.
request_id: string
This request id will be added to all files that get synced using the generated OAuth URL
use_ocr: boolean
Enable OCR for files that support it. Supported formats: pdf
parse_pdf_tables_with_ocr: boolean
enable_file_picker: boolean
Enable the integration's file picker for sources that support it. Supported sources: SHAREPOINT, DROPBOX, BOX, ONEDRIVE, GOOGLE_DRIVE
sync_source_items: boolean
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint
incremental_sync: boolean
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX. It will be ignored for other data sources.
file_sync_config: HelpdeskFileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/oauth_url
POST
carbon.integrations.listConfluencePages
To begin listing a user's Confluence pages, at least a `data_source_id` of a connected Confluence account must be specified. This base request returns a list of root pages for every space the user has access to in a Confluence instance. To traverse further down the user's page directory, additional requests to this endpoint can be made with the same `data_source_id` and with `parent_id` set to the ID of a page from a previous request. For convenience, the `has_children` property in each directory item in the response list will flag which pages will return non-empty lists of pages when set as the `parent_id`.
🛠️ Usage
const listConfluencePagesResponse =
await carbon.integrations.listConfluencePages({
data_source_id: 1,
});
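Traversing beyond the root pages, as described above, can be sketched as a walk that follows `has_children`. The `listPages` callback and the `id`/`has_children` item shape below are assumptions for illustration; in practice the callback would wrap `carbon.integrations.listConfluencePages` and extract the directory items from its response:

```typescript
// Hypothetical item shape for a directory entry in the listing response.
type ConfluencePage = { id: string; has_children: boolean };

// Collect every page ID reachable from the root listing by repeatedly
// re-querying with parent_id, as the route description explains.
async function collectPageIds(
  listPages: (parentId?: string) => Promise<ConfluencePage[]>
): Promise<string[]> {
  const ids: string[] = [];
  const queue: (string | undefined)[] = [undefined]; // undefined = root listing
  while (queue.length > 0) {
    const parentId = queue.shift();
    for (const page of await listPages(parentId)) {
      ids.push(page.id);
      if (page.has_children) queue.push(page.id); // only pages flagged as parents
    }
  }
  return ids;
}
```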
⚙️ Parameters
data_source_id: number
parent_id: string
🔄 Return
🌐 Endpoint
/integrations/confluence/list
POST
carbon.integrations.listDataSourceItems
List Data Source Items
🛠️ Usage
const listDataSourceItemsResponse =
await carbon.integrations.listDataSourceItems({
data_source_id: 1,
order_by: "name",
order_dir: "asc",
});
⚙️ Parameters
data_source_id: number
parent_id: string
filters: ListItemsFiltersNullable
pagination: Pagination
order_by: ExternalSourceItemsOrderBy
order_dir: OrderDirV2
🔄 Return
🌐 Endpoint
/integrations/items/list
POST
carbon.integrations.listFolders
After connecting your Outlook account, you can use this endpoint to list all of your folders on Outlook. This includes both system folders like "inbox" and user-created folders.
🛠️ Usage
const listFoldersResponse = await carbon.integrations.listFolders({});
⚙️ Parameters
dataSourceId: number
🌐 Endpoint
/integrations/outlook/user_folders
GET
carbon.integrations.listGitbookSpaces
After connecting your Gitbook account, you can use this endpoint to list all of your spaces under the current organization.
🛠️ Usage
const listGitbookSpacesResponse = await carbon.integrations.listGitbookSpaces({
dataSourceId: 1,
});
⚙️ Parameters
dataSourceId: number
🌐 Endpoint
/integrations/gitbook/spaces
GET
carbon.integrations.listLabels
After connecting your Gmail account, you can use this endpoint to list all of your labels. User-created labels will have the type "user" and Gmail's default labels will have the type "system".
🛠️ Usage
const listLabelsResponse = await carbon.integrations.listLabels({});
⚙️ Parameters
dataSourceId: number
🌐 Endpoint
/integrations/gmail/user_labels
GET
carbon.integrations.listOutlookCategories
After connecting your Outlook account, you can use this endpoint to list all of your categories on Outlook. We currently support listing up to 250 categories.
🛠️ Usage
const listOutlookCategoriesResponse =
  await carbon.integrations.listOutlookCategories({});
⚙️ Parameters
dataSourceId: number
🌐 Endpoint
/integrations/outlook/user_categories
GET
carbon.integrations.listRepos
Once you have connected your GitHub account, you can use this endpoint to list the repositories your account has access to. You can use a data source ID or username to fetch from a specific account.
🛠️ Usage
const listReposResponse = await carbon.integrations.listRepos({
  perPage: 30,
  page: 1,
});
⚙️ Parameters
perPage: number
page: number
dataSourceId: number
🌐 Endpoint
/integrations/github/repos
GET
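Because this endpoint is paginated via page and perPage, fetching every repository means looping until a short page comes back. A sketch of that loop, with `fetchPage` standing in for a call to carbon.integrations.listRepos (the item type is left generic since the exact response shape isn't shown here):

```typescript
// Page through a list endpoint until a page shorter than perPage
// signals that there is nothing left to fetch.
async function collectAllRepos<T>(
  fetchPage: (page: number, perPage: number) => Promise<T[]>,
  perPage = 30,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const batch = await fetchPage(page, perPage);
    all.push(...batch);
    if (batch.length < perPage) break; // short (or empty) page: done
  }
  return all;
}
```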
carbon.integrations.syncConfluence
After listing pages in a user's Confluence account, the set of selected page ids and the connected account's data_source_id can be passed into this endpoint to sync them into Carbon. The additional parameters listed below can be used to associate data with the selected pages or alter the behavior of the sync.
🛠️ Usage
const syncConfluenceResponse = await carbon.integrations.syncConfluence({
  data_source_id: 1,
  ids: ["string_example"],
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  set_page_as_boundary: false,
  request_id: "3d0330f2-f2e4-482b-9ca7-91d3a1bbbd18",
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  incremental_sync: false,
});
⚙️ Parameters
data_source_id: number
ids: IdsProperty
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: boolean
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
incremental_sync: boolean
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX. It will be ignored for other data sources.
file_sync_config: HelpdeskGlobalFileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/confluence/sync
POST
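The chunk_size and chunk_overlap parameters control how synced content is split before embedding: consecutive chunks share chunk_overlap units of content so context isn't lost at chunk boundaries. A character-level sketch of the idea (Carbon's actual chunking operates on tokens, so this is only an illustration of the sliding-window mechanics):

```typescript
// Split text into windows of chunkSize characters, advancing by
// (chunkSize - chunkOverlap) each step so neighbors share an overlap.
function chunkText(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```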
carbon.integrations.syncDataSourceItems
Sync Data Source Items
🛠️ Usage
const syncDataSourceItemsResponse =
  await carbon.integrations.syncDataSourceItems({
    data_source_id: 1,
  });
⚙️ Parameters
data_source_id: number
🔄 Return
🌐 Endpoint
/integrations/items/sync
POST
carbon.integrations.syncFiles
After listing files and folders via /integrations/items/sync and /integrations/items/list, use the selected items' external ids as the ids in this endpoint to sync them into Carbon. SharePoint items take an additional parameter, root_id, which identifies the drive the file or folder is in and is stored in root_external_id. That additional parameter is optional; omitting it tells the sync to assume the item is stored in the default Documents drive.
🛠️ Usage
const syncFilesResponse = await carbon.integrations.syncFiles({
  data_source_id: 1,
  ids: ["string_example"],
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  set_page_as_boundary: false,
  request_id: "3d0330f2-f2e4-482b-9ca7-91d3a1bbbd18",
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  incremental_sync: false,
});
⚙️ Parameters
data_source_id: number
ids: IdsProperty
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: boolean
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
incremental_sync: boolean
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX. It will be ignored for other data sources.
file_sync_config: HelpdeskGlobalFileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/files/sync
POST
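A common pattern is to chain the items-list call with this endpoint: list the connected source's items, pick the ones you want by external id, and pass those ids to the sync. The sketch below injects a simplified client interface so the flow is clear; the real SDK's responses are wrapped objects, so the flat `SourceItem[]` shape here is an assumption for illustration:

```typescript
interface SourceItem {
  external_id: string;
  name: string;
}

// Simplified stand-in for the relevant slice of the Carbon client.
interface ItemsClient {
  listDataSourceItems(req: { data_source_id: number }): Promise<SourceItem[]>;
  syncFiles(req: { data_source_id: number; ids: string[] }): Promise<string[]>;
}

// List items, select by predicate, then sync the chosen external ids.
async function syncSelectedItems(
  client: ItemsClient,
  dataSourceId: number,
  shouldSync: (item: SourceItem) => boolean,
): Promise<string[]> {
  const items = await client.listDataSourceItems({
    data_source_id: dataSourceId,
  });
  const ids = items.filter(shouldSync).map((item) => item.external_id);
  if (ids.length === 0) return [];
  return client.syncFiles({ data_source_id: dataSourceId, ids });
}
```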
carbon.integrations.syncGitHub
Refer to this article to obtain an access token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens. Make sure that your access token has permission to read content from your desired repos. Note that if your access token expires, you will need to manually update it through this endpoint.
🛠️ Usage
const syncGitHubResponse = await carbon.integrations.syncGitHub({
  username: "username_example",
  access_token: "access_token_example",
  sync_source_items: false,
});