scalablevq
v1.0.6
Published
TypeScript implementation of ScalableVQ decoding
Readme
ScalableVQ: Scalable Vector Quantization
Project Overview
This project implements scalable vector quantization. It construct hierarchical structures through merging clusters and assign bits to these cluster centers, providing scalability in data representation.
"Scalability" refers to the encoded data being partitioned into a base layer and multiple enhancement layers. Decoding begins from the base layer, with each additional enhancement layer improving the fidelity of the reconstructed data.
💡 This project itself does not perform vector quantization. The accuracy achieved depends on the vector quantization method you choose.
Structure
scalablevq: Core package containing modules for hierarchical clustering, bit assignment, splitting and merging codes, and data encoding/decoding.example.py: Illustrates clustering, encoding, and decoding processes using the library's main functionalities.
Key Features
Encoding
- Hierarchical Clustering: Iteratively merges clusters to form a hierarchical cluster tree.
- Bit assign: Allocates bits to clusters within the hierarchy.
- Splitting/Encoding: Partitions quantized data and codebook into several layers based on the cluster hierarchy and assigned bits.
Decoding
- Merging: Combines the encoded layers to reconstruct quantized data and the associated codebook.
- Decoding: Converts quantized data back to approximate the original data using the reconstructed codebook.
Install
Install from pypi:
pip install scalablevqor from source:
pip install git+https://github.com/yindaheng98/ScalableVQExample Usage
💡 Refer to
example.pyfor a complete example.
Before using this library, perform vector quantization using your preferred method (e.g. K-Means from sklearn):
data = ... # your data, shape [N, C]
from sklearn.cluster import MiniBatchKMeans as KMeans
kmeans = KMeans(n_clusters=1024, init='random', random_state=0, n_init="auto")
quantized_data = torch.from_numpy(kmeans.fit_predict(data.cpu()))
cluster_centers = torch.from_numpy(kmeans.cluster_centers_)Encoding:
from scalablevq import encode_layers
n_bits_proposal = [4, 2, 2, 2, 2]
layers = encode_layers(data, quantized_data, cluster_centers, n_bits_proposal)Decoding:
from scalablevq import decode_layer
context = None
for layer in layers:
decoded, context = decode_layer(layer, context)Visualizztion:
layerized_cluster_centers, cluster_tree = build_layers(cluster_centers, data, quantized_data, final_clusters=2**n_bits_proposal[0])
visualize_layers(cluster_centers.cpu(), layerized_cluster_centers.cpu(), cluster_tree, data=None)💡 Use GPU acceleration if possible, as large-scale clustering can be compute-intensive.
Dependencies
- PyTorch
- NumPy
- tqdm
- scikit-learn (optional, for
example.py) - open3d (optional, for visualization)
