@7-docs/cli

v0.5.1

Published

2 years ago

Command-line tool for 7-docs to ingest content

webpro

cli content docs documentation ingest kb markdown vector database embedding chatgpt openai pinecone supabase algolia algoliasearch

@7-docs/cli

7d is a powerful CLI tool that ingests content and stores it in a vector database, ready to be queried the way you would query ChatGPT.

Uses OpenAI APIs, part of 7-docs.

Impression

CLI

Demo of ingest and query

Status

7d is still in its early days, but already offers a variety of features:

Plain text, Markdown and PDF files are supported as input.
Ingest from local files, from HTML pages over HTTP, and from GitHub repositories.
The OpenAI text-embedding-ada-002 model is used to create embeddings.
Pinecone and Supabase are supported for vector storage.
The OpenAI gpt-3.5-turbo model is used for chat completions from the CLI.

See the 7-docs overview for more packages and starter kits.

Prerequisites

Node.js v16+
OpenAI API key
Pinecone or Supabase account, plus API keys
When ingesting lots of files from GitHub, a GitHub token

Installation

You can install 7-docs in two ways:

Global: to manage knowledge base(s) from the command line.
Local: to manage the knowledge base(s) of a repository.

Global

Use 7d from anywhere to manage your personal knowledge bases:

npm install --global 7-docs

Get an OpenAI API key and make it available as an environment variable:

export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Alternatively, store it in ~/.7d.json so it's available in your next session too:

7d set OPENAI_API_KEY sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This works for the other export values shown later as well.
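Since several variables are introduced over the next sections, a small pre-flight check can help catch a missing one before running a command. The following is a minimal sketch, not part of 7d: `missing_env` is a hypothetical helper name, and it only inspects exported environment variables, so it will not see values stored with `7d set` in ~/.7d.json or in a local .env file.

```shell
# Hypothetical helper (not part of 7d): print the names of the given
# variables that are unset or empty in the exported environment.
missing_env() {
  missing=""
  for name in "$@"; do
    # printenv prints nothing when the variable is not exported.
    if [ -z "$(printenv "$name")" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space; empty output means everything is set.
  echo "${missing# }"
}

# Example: check the key used for embeddings and chat completions.
missing="$(missing_env OPENAI_API_KEY)"
if [ -n "$missing" ]; then
  echo "Missing environment variables: $missing" >&2
fi
```

The same function accepts several names at once, for example `missing_env OPENAI_API_KEY PINECONE_API_KEY PINECONE_URL`.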

Local

Add 7d to the devDependencies of a repository to manage its knowledge base(s):

npm install --save-dev 7-docs

Store the variables you need in a local .env file in the root of your project:

OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
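Because the .env file holds secret keys, it is usually worth keeping it out of version control. The sketch below assumes the project is a Git repository; `ensure_ignored` is a hypothetical helper, not a 7d command, and it only appends a line to an ignore file when that exact line is not there yet.

```shell
# Hypothetical helper (not part of 7d): add an entry to an ignore file
# unless that exact line is already present.
# Usage: ensure_ignored <entry> [ignore-file]   (default: .gitignore)
ensure_ignored() {
  entry="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # Nothing to do if the exact line already exists.
  if grep -qxF -- "$entry" "$file"; then
    return 0
  fi
  # Add a newline first if the file does not end with one.
  if [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  echo "$entry" >> "$file"
}
```

Running `ensure_ignored .env` from the project root adds `.env` to .gitignore once; repeated calls leave the file unchanged.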

For local installations, use npx 7d instead of just 7d.

Now let's choose either Pinecone or Supabase!

Pinecone

Make sure to have a Pinecone account and set PINECONE_API_KEY:

export PINECONE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Create or select an index:

7d pinecone-create-index --index [name] --environment [env]

Find the environment in your Pinecone Console (e.g. us-east4-gcp).

Keep working with this index by setting the PINECONE_URL from the Pinecone Console like so:

export PINECONE_URL=xxxxx-xxxxxxx.svc.us-xxxxx-gcp.pinecone.io
export PINECONE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Supabase

Make sure to have a Supabase account and set SUPABASE_URL and SUPABASE_API_KEY:

export SUPABASE_URL="https://xxxxxxxxxxxxxxxxxxxx.supabase.co"
export SUPABASE_API_KEY="ey..."

Print the SQL query to enable pgvector and create a table (paste the output in the Supabase web admin):

7d supabase-create-table --namespace my-collection

Ingestion

Let's ingest some text or Markdown files. Make sure to adjust the --files pattern to match yours:

7d ingest --files README.md --files 'docs/**/*.md' --namespace my-collection
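Before spending tokens on ingestion, it can be useful to preview which files a pattern is likely to cover. The sketch below uses `find` to approximate a pattern such as 'docs/**/*.md'. It is an approximation under the assumption that `**` matches any directory depth; `preview_markdown` is a hypothetical helper, and 7d's own pattern matching may differ in details such as hidden files.

```shell
# Hypothetical helper (not part of 7d): list Markdown files at any depth
# below a directory, roughly what '<dir>/**/*.md' is expected to match.
preview_markdown() {
  find "$1" -type f -name '*.md' | sort
}

# Example: preview the docs folder if it exists.
if [ -d docs ]; then
  preview_markdown docs
fi
```

Piping the result through `wc -l` gives a quick count of how many files would be involved.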

Note that ingestion from remote resources (GitHub and/or HTTP) has the benefit that answers can link back to the original source. This is not possible when using local files.

GitHub

Use --source github and file patterns to ingest from a GitHub repo:

7d ingest --source github --repo reactjs/react.dev --files 'src/content/reference/react/*.md' --namespace react

You can start without a GitHub token, but once you start fetching lots of files you'll need to set GITHUB_TOKEN:

export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

HTTP

Crawl content from web pages:

7d ingest --source http --url https://en.wikipedia.org/wiki/Butterfly

PDF

7d supports PDF files as well:

7d ingest --files ./my-article.pdf
7d ingest --source github --repo webpro/webpro.nl --files 'content/*.pdf'

If you see the cannot find module "canvas" error, please see node-canvas#compiling.

Ignore files

To exclude files from ingestion, use the --ignore argument:

7d ingest --files 'docs/**/*.md' --ignore 'folder/*' --ignore 'dir/file.md' --ignore '**/ignore.md'

Query

Now you can start asking questions about the ingested content:

7d Can you please give me a summary?

Other commands

Other convenience flags and commands that have not been mentioned yet:

`--help`

Shows available commands and how they can be used:

7d --help

`openai-list-models`

List available OpenAI models:

7d openai-list-models

`pinecone-clear-namespace`

Clear a single namespace from the current Pinecone index:

7d pinecone-clear-namespace --namespace my-collection

Token Usage

The text-embedding-ada-002 model, recommended by OpenAI, is used to create embeddings. Ingestion consumes tokens in proportion to the amount of content, so ingesting lots of files uses correspondingly more. Queries use only a few tokens (using the gpt-3.5-turbo model by default). See the console for details.
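To get a rough idea of embedding token usage before ingesting, the size of the input files can serve as a proxy. The sketch below assumes the common rule of thumb of about 4 characters per token for English text and counts bytes rather than characters, so it is only a ballpark figure for plain text and Markdown (not PDF). `estimate_tokens` is a hypothetical helper, and the numbers 7d reports in the console remain the authoritative ones.

```shell
# Hypothetical helper (not part of 7d): ballpark token estimate for
# text files, assuming roughly 4 bytes per token.
estimate_tokens() {
  # Total size in bytes of all given files; tr strips padding that
  # some wc implementations add.
  total="$(cat "$@" | wc -c | tr -d ' ')"
  echo $(( total / 4 ))
}

# Example: estimate tokens for the README if it exists.
if [ -f README.md ]; then
  estimate_tokens README.md
fi
```

Several files can be passed at once, for example `estimate_tokens README.md docs/*.md`.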
