npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

docker-storage-gc

v4.1.0

Published

Automatically cleanup unused images based on various cache replacement algorithms

Downloads

264

Readme

Docker storage Garbage Collection

What it does

This node module connects to a Docker socket and deletes the least used images until a threshold of disk usage is hit.

How it works

Layers

The goal of the module is to clean up layers from the disk. However, because layers keep a reference to their parent layer, removal is not always possible. The data structure that emerges from the layers is a tree of layers where each layer points to their parent.

In order to have a single root for all the layer we assume a virtual empty layer that all layers have as a parent and that is never deleted.

Images

When interacting with a Docker daemon we don’t refer directly to layers. Our unit of manipulation is an image, which can be thought of as a named pointer to the combination of a layer and some configuration. For the purposes of the garbage collector we only need to know which layers of our tree are entrypoints for the images. Annotating the above tree with this information we end up with something like this:

Whenever docker is asked to delete an image it will delete the layer the image points to and all its ancestors until it hits a layer that is being used by some other image. From this follows that whenever we have a tree of layers all the leaves are necessarily images, otherwise they would have been deleted.

Leaves are not the only images in the system though, an image can point to a layer that is internal to the tree. In the figure above image B could be the ubuntu base image and H and L two images that were based on the ubuntu image.

Eviction policy

When given a layer tree we can attempt to prune it in order to reclaim some space. In order to decide what to prune we first need to come up with our options and then rank them. At every step, the list of images that can reclaim space are only the images at the leaves. In other words, removing the ubuntu image while keeping another image that is based on ubuntu reclaims zero space.

In short, the algorithm this module is parameterized by the desired space to reclaim and follows this procedure:

reclaimed_space = 0
while reclaimed_space < reclaim_target:
    candidates = layer_tree.get_leaves()

    oldest = candidates.sort_by(‘last_used_time’)[0]
   
    reclaimed_space += calculate_saving(layer_tree, oldest)
    oldest.remove()

In order to calculate the expected saving of removing an image all we need to do is walk up the tree of layers until we either find one that has more than one child (i.e it’s being used by some other image) or find one that is an image itself.

So from the figure above removing L will reclaim space equal to L_space + I_space because F has more than one child. Proceeding to also remove H will reclaim space equal to H_space + F_space because B is itself an image and cannot be removed.

Maintaining last used time

Docker does not track, and therefore does not offer over its API, the last time an image was used. It only knows when an image was created. Using this date for the eviction policy is undesirable because an image created long ago that is being used all the time is very valuable. To remedy this this module subscribes to the docker event stream and listens for containers starting/stopping and other events that reference images. Every time an image is referenced the module updates the metadata in its tree data structure that is then used for the eviction policy.