mongodb-rag-ingest v0.2.0

MongoDB Ingest CLI for the MongoDB Chatbot Framework.

Downloads: 541

MongoDB Ingest CLI

The MongoDB Ingest CLI ingests content into a MongoDB collection that you can use for retrieval-augmented generation (RAG) applications.

You can use the Ingest CLI to ingest data into RAG applications built with the MongoDB Chatbot Framework.

Documentation

To learn more about the MongoDB Ingest CLI, refer to the documentation.

System Overview

flowchart
    B[Pages command]
    C[Embed command]
    B --> D(fetch pages from source)
    D --> E(store pages in Atlas)

    C --> F(fetch pages from Atlas)
    F -- for pages marked\n 'created' or 'updated' --> G(make embeddings)
    G --> H(store embeddings in Atlas)
    F -- for pages marked 'deleted' --> I(delete embeddings\nfor page)

The ingest tool has two major commands: pages and embed. These commands represent the two stages of ingesting content.

Stage 1: Pages

The pages command fetches pages from data sources and stores them in Atlas with a last updated timestamp. A "page" is some text with a URL. A data source is an arbitrary collection of pages. You can create a new data source by implementing DataSource.
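As a rough illustration of what implementing a data source might involve, here is a minimal sketch. The `Page` and `DataSource` shapes below are simplified local stand-ins written for this example, and `makeStaticDataSource` is a hypothetical helper; none of them are the framework's actual definitions, so check the real `DataSource` type before writing your own.

```typescript
// Simplified stand-in types for illustration only (assumptions, not the
// framework's real definitions).
interface Page {
  url: string;
  body: string; // the page text
  sourceName: string;
}

interface DataSource {
  name: string;
  fetchPages(): Promise<Page[]>;
}

// Hypothetical data source that serves a fixed, in-memory set of pages.
function makeStaticDataSource(
  name: string,
  pages: Record<string, string> // maps URL -> page text
): DataSource {
  return {
    name,
    async fetchPages(): Promise<Page[]> {
      return Object.entries(pages).map(([url, body]) => ({
        url,
        body,
        sourceName: name,
      }));
    },
  };
}
```

A real data source would typically fetch from a website, a Git repository, or an API instead of an in-memory object, but the idea of returning "some text with a URL" per page stays the same.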

For each given data source, the pages command compares the pages with those already stored in the database and only updates those that are new, have changed, or have been deleted. The command does not actually delete documents from the database, but instead marks a page as "deleted", so that the next stage knows to delete the corresponding embeddings.
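The comparison described above can be sketched roughly as follows. This is not the tool's actual implementation: the `FetchedPage` and `StoredPage` types and the `diffPages` function are hypothetical, and treating a previously deleted page that reappears as "created" is an assumption made for this example.

```typescript
// Hypothetical types for this sketch only.
type PageAction = "created" | "updated" | "deleted";

interface FetchedPage {
  url: string;
  body: string;
}

interface StoredPage extends FetchedPage {
  action: PageAction;
  updated: Date; // last updated timestamp
}

// Compare freshly fetched pages against stored ones and return only the
// records that need to be written back to the database.
function diffPages(
  stored: StoredPage[],
  fetched: FetchedPage[],
  now: Date = new Date()
): StoredPage[] {
  const storedByUrl = new Map<string, StoredPage>(
    stored.map((p): [string, StoredPage] => [p.url, p])
  );
  const fetchedUrls = new Set(fetched.map((p) => p.url));
  const changes: StoredPage[] = [];

  for (const page of fetched) {
    const existing = storedByUrl.get(page.url);
    if (!existing || existing.action === "deleted") {
      // Assumption: a page that reappears after deletion counts as new.
      changes.push({ ...page, action: "created", updated: now });
    } else if (existing.body !== page.body) {
      changes.push({ ...page, action: "updated", updated: now });
    }
    // Unchanged pages produce no write.
  }

  for (const page of stored) {
    // Mark missing pages as deleted instead of removing the document.
    if (!fetchedUrls.has(page.url) && page.action !== "deleted") {
      changes.push({ ...page, action: "deleted", updated: now });
    }
  }
  return changes;
}
```

Keeping the "deleted" record around, rather than removing the document, is what lets a later stage find out which embeddings are stale.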

Stage 2: Embed

The embed command creates embeddings for pages that have been updated since a given date. For pages that have been deleted, the command deletes any corresponding embeddings in the database. If a page is new or has been updated, the command regenerates the corresponding embeddings for that page.
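To make the embed stage more concrete, here is a small sketch of the logic under simplifying assumptions. The `ChangedPage` and `EmbeddingRecord` types, the in-memory `Map` standing in for the database, the synchronous `embed` function, and the blank-line chunking are all hypothetical choices for this example, not the tool's real behavior.

```typescript
// Hypothetical shapes for this sketch only.
interface ChangedPage {
  url: string;
  body: string;
  action: "created" | "updated" | "deleted";
  updated: Date;
}

interface EmbeddingRecord {
  url: string;
  text: string;
  embedding: number[];
}

// Stand-in for a real (usually asynchronous) embedding model call.
type EmbedFn = (text: string) => number[];

function updateEmbeddings(
  pages: ChangedPage[],
  store: Map<string, EmbeddingRecord[]>, // in-memory stand-in, keyed by page URL
  since: Date,
  embed: EmbedFn
): void {
  for (const page of pages) {
    // Skip pages that have not changed since the given date.
    if (page.updated.getTime() < since.getTime()) continue;

    // Drop any existing embeddings for this page first.
    store.delete(page.url);
    if (page.action === "deleted") continue;

    // Naive chunking on blank lines; real chunking is more involved.
    const chunks = page.body
      .split(/\n\s*\n/)
      .filter((chunk) => chunk.trim() !== "");
    store.set(
      page.url,
      chunks.map((text) => ({ url: page.url, text, embedding: embed(text) }))
    );
  }
}
```

Deleting a page's old embeddings before writing new ones keeps the store from accumulating stale chunks when a page's content shrinks or is reorganized.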

Configuration

To configure the ingest tool, provide an ingest.config.js file. The default export of this file must be a Config object. See Config.ts for details.

Development

Build & Run

Set up the project monorepo. Refer to the Contributor Guide for more info on monorepo setup.

Make sure you set up the .env files in both the mongodb-rag-ingest and mongodb-rag-core projects.

To use the ingest CLI locally, run:

# See all available commands
node .

# Run specific command
node . <command> <options>

A few things to keep in mind when developing in the mongodb-rag-ingest project:

  1. You must recompile the mongodb-rag-ingest project with npm run build before running it from the CLI for changes to take effect. Therefore, when testing CLI commands locally, it can be convenient to run compilation and the command as a one-liner:

     npm run build && node . <command> <options>
  2. You must also recompile mongodb-rag-core with npm run build every time you make changes to it for the changes to be accessible to mongodb-rag-ingest or any other projects that depend on it.

    cd ../mongodb-rag-core
    npm run build
    cd ../mongodb-rag-ingest
    # do stuff

Add Commands

Add commands to src/commands/. The CLI automatically picks up any non-test .ts file that default-exports a yargs.CommandModule. See the existing commands for examples.
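For orientation, a command file might look roughly like the sketch below. To keep the snippet dependency-free, it uses a local `CommandModuleLike` type that mirrors only the `command`, `describe`, and `handler` fields of `yargs.CommandModule`; in a real command file you would import the type from yargs instead. The `hello` command itself is hypothetical and is not part of the ingest CLI.

```typescript
// Local stand-in mirroring a subset of yargs.CommandModule, so this sketch
// has no dependencies. In a real command file, import the type from "yargs".
interface CommandModuleLike<Args> {
  command: string;
  describe: string;
  handler: (args: Args) => void | Promise<void>;
}

interface HelloArgs {
  name: string;
}

// Collect output in an array so the handler is easy to exercise in a test.
const output: string[] = [];

// Hypothetical example command; "hello" is not part of the ingest CLI.
const helloCommand: CommandModuleLike<HelloArgs> = {
  command: "hello <name>",
  describe: "Print a greeting (example only)",
  handler: ({ name }) => {
    output.push(`Hello, ${name}!`);
  },
};

// In src/commands/hello.ts this object would be the file's default export:
// export default helloCommand;
```

Because changes only take effect after recompiling, a new command like this would show up in the CLI after the next npm run build.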