@arcblock/crawler-middleware

v1.4.6

This Express middleware serves pre-rendered HTML generated by SnapKit for Blocklets, so that they can return complete HTML content to web spiders. This is essential for SEO and for ensuring that search engines can properly index dynamically generated content.

How it Works

  1. The middleware intercepts incoming requests.
  2. It checks whether the request comes from a web spider.
  3. It tries to read and return HTML from the local cache (in-memory LRU cache + SQLite).
  4. If no cache entry is found, it sends an asynchronous request to SnapKit and updates the local cache with the result.
  5. The current request does not return the cached content; the next spider visit hits step 3 and gets the cached HTML directly.
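
The steps above can be sketched roughly as follows. This is an illustration of the flow, not the package's implementation: `createSketchHandler`, `isSpider`, `SketchRequest` and `FetchSnapshot` are hypothetical names, a plain `Map` stands in for the LRU cache + SQLite layer, and spider detection is assumed to be a simple User-Agent substring test.

```typescript
// Hypothetical sketch of the request flow; none of these names come from the package.
type FetchSnapshot = (url: string) => Promise<string>;

interface SketchRequest {
  url: string;
  userAgent: string;
}

// Assumption: a User-Agent substring test stands in for real spider detection.
function isSpider(userAgent: string): boolean {
  return /spider/i.test(userAgent);
}

function createSketchHandler(fetchSnapshot: FetchSnapshot) {
  // A plain Map stands in for the memory LRU cache + SQLite layer.
  const cache = new Map<string, string>();
  // URLs with a snapshot request already in flight.
  const pending = new Set<string>();

  return function handle(req: SketchRequest): string | undefined {
    // Step 2: only spiders are served pre-rendered HTML.
    if (!isSpider(req.userAgent)) return undefined;

    // Step 3: return cached HTML when it is available.
    const cached = cache.get(req.url);
    if (cached !== undefined) return cached;

    // Step 4: on a miss, fetch in the background without blocking this request.
    if (!pending.has(req.url)) {
      pending.add(req.url);
      fetchSnapshot(req.url).then(
        (html) => {
          cache.set(req.url, html);
          pending.delete(req.url);
        },
        () => {
          // A failed fetch leaves the cache empty; a later visit retries.
          pending.delete(req.url);
        },
      );
    }

    // Step 5: the current request falls through to normal handling.
    return undefined;
  };
}
```

Returning `undefined` here means "let the regular route handler respond", which is why the first spider visit never receives the snapshot.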

How to Verify

  1. Update your browser's User-Agent string to include "spider".
  2. Visit a page that has already been crawled by SnapKit.
  3. First Visit (Cache Miss): On your first visit, the cache should miss. The server logs should show a "Cache miss" message, and a request should be sent to SnapKit to cache the page.
  4. Second Visit (Cache Hit): Wait a moment and then revisit the same page. The cache should hit. The server logs should show a "Cache hit" message, and the returned HTML should include the meta tag: <meta name="arcblock-crawler" content="true">.
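
The check in step 4 can also be scripted. The helper below is a minimal sketch, assuming the marker tag appears with the attribute order and quoting shown above; `isSnapshotResponse` is a hypothetical name, not part of the package.

```typescript
// Matches the marker tag from step 4, tolerating extra whitespace and a
// self-closing slash. Assumption: attribute order and quoting are as documented.
const CRAWLER_META = /<meta\s+name="arcblock-crawler"\s+content="true"\s*\/?>/i;

// Hypothetical helper: true when the HTML looks like a cached snapshot.
function isSnapshotResponse(html: string): boolean {
  return CRAWLER_META.test(html);
}
```

To use it, fetch the page with a User-Agent that contains "spider" and pass the response body to `isSnapshotResponse`; it should return `false` on the first visit and `true` once the cache has been filled.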

Usage

import express from 'express';
import { createSnapshotMiddleware } from '@arcblock/crawler-middleware';

const app = express();
const snapshotMiddleware = createSnapshotMiddleware({
  endpoint: process.env.SNAP_KIT_ENDPOINT,
  accessKey: process.env.SNAP_KIT_ACCESS_KEY,
  // Only return cached content for requests to the root path
  allowCrawler: (req) => {
    return req.path === '/';
  },
});

// Option 1: apply to all routes
app.use(snapshotMiddleware);

// Option 2: apply to a single route
app.use('/doc', snapshotMiddleware, (req, res) => {
  /* ... */
});

Options

The options for createSnapshotMiddleware:

{
  /** SnapKit endpoint */
  endpoint: string;
  /** SnapKit access key */
  accessKey: string;
  /** Max cache size for LRU cache */
  cacheMax?: number;
  /** When a cached entry is older than this interval, the middleware tries to fetch a fresh copy from SnapKit and update the cache */
  updateInterval?: number;
  /** When a failed cache entry is older than this interval, the middleware tries again to fetch from SnapKit and update the cache */
  failedUpdateInterval?: number;
  /** Update queue concurrency */
  updatedConcurrency?: number;
  /** Call res.send(html) when cache hit */
  autoReturnHtml?: boolean;
  /** Custom function to determine whether to return cached content */
  allowCrawler?: (req: Request) => boolean;
};
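
As an illustration, the options could be assembled and validated before they are passed to createSnapshotMiddleware. This is only a sketch: `buildOptions`, `SketchOptions` and `SketchRequest` are hypothetical local stand-ins (in a real app the types would come from Express and the package), and the `cacheMax` value is arbitrary, not a documented default.

```typescript
// Local stand-ins so the sketch compiles on its own.
interface SketchRequest {
  path: string;
}

interface SketchOptions {
  endpoint: string;
  accessKey: string;
  cacheMax?: number;
  autoReturnHtml?: boolean;
  allowCrawler?: (req: SketchRequest) => boolean;
}

// Hypothetical helper: builds options from environment-style input and
// fails early when a required value is missing.
function buildOptions(env: Record<string, string | undefined>): SketchOptions {
  const endpoint = env['SNAP_KIT_ENDPOINT'];
  const accessKey = env['SNAP_KIT_ACCESS_KEY'];
  if (!endpoint || !accessKey) {
    throw new Error('SNAP_KIT_ENDPOINT and SNAP_KIT_ACCESS_KEY must be set');
  }
  return {
    endpoint,
    accessKey,
    cacheMax: 500, // arbitrary illustrative value, not a documented default
    autoReturnHtml: true,
    // Only allow cached content for the home page and paths starting with /doc.
    allowCrawler: (req) => req.path === '/' || req.path.indexOf('/doc') === 0,
  };
}
```

Failing early on a missing endpoint or access key avoids passing `undefined` into fields that the options type declares as required strings.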

Environment Variables

When using this middleware outside of a Blocklet environment, you need to configure the following environment variables:

  • BLOCKLET_DATA_DIR: (Required) Directory path for storing the SQLite file
  • BLOCKLET_LOG_DIR: (Required) Directory path for storing @blocklet/logger logs
  • BLOCKLET_APP_URL: (Optional) Deployed domain
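
A startup check for the required variables might look like the sketch below. `missingBlockletEnv` is a hypothetical helper, not something the package exports; it only reports which of the two required names are unset or empty.

```typescript
// The two variables listed above as required.
const REQUIRED_VARS: string[] = ['BLOCKLET_DATA_DIR', 'BLOCKLET_LOG_DIR'];

// Hypothetical helper: returns the names of required variables that are
// unset or empty in the given environment object.
function missingBlockletEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}
```

Calling `missingBlockletEnv(process.env)` before creating the middleware and aborting when the result is non-empty gives a clearer error than a failure later on, when the database or log files are created.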

SQLite

When createSnapshotMiddleware is called, it attempts to create an SQLite database in the directory given by BLOCKLET_DATA_DIR. This database caches the HTML content retrieved from SnapKit. Make sure the deployment environment supports SQLite and that this directory is writable.