tgobi
v0.1.3
Interactive high-dimensional data visualization in the browser.
tgobi
Interactive high-dimensional data visualization in the browser, inspired by GGobi. Explore data through linked plots, animated tours, clustering, classification, and dimensionality reduction --- all without leaving your browser.

Install
    npm install -g tgobi

Use a local install when tgobi is part of a project:

    npm install tgobi
    npm exec tgobi

`npm install tgobi` puts the executable at `node_modules/.bin/tgobi` for that
project. It does not make tgobi available as a bare shell command unless that
directory is on your PATH. These are equivalent ways to run a local install:

    npm exec tgobi
    npx tgobi
    ./node_modules/.bin/tgobi

Use a global install when you want tgobi available directly from your shell:

    npm install -g tgobi
    tgobi

Command Line
The CLI serves the built standalone app and opens it in your browser:

    tgobi
    tgobi --port 8787
    tgobi --host 0.0.0.0 --no-open

From this repository, build first and run the source checkout directly:

    npm run build
    node bin/tgobi.js --no-open

Screenshots
Load data from disk or start with a bundled sample:

Open the variables panel and add linked plots:

Brush in one plot to highlight the same rows in every linked view, color by a categorical variable, and run a grand tour from the tour panel:

Combine plot types such as parallel coordinates and barcharts to explore relationships across dimensions. Selections are linked across all views: highlight a subset in a categorical barchart and see where those rows fall in a 2D grand tour or a parallel coordinates plot:

Working With Data
tgobi accepts CSV, TSV, JSON, and GGobi-style XML files from the start screen. After loading a file, the schema preview lets you confirm inferred column types before committing the dataset.
The bundled samples are useful for quick checks:
- flea: small categorical dataset for brushing, color, and tour examples.
- olive: regional olive oil measurements.
- places: mixed geographic and numeric data.
- cycle: XML sample for GGobi import coverage.
- large: synthetic large dataset for performance testing.
Using The App
Plots
Add plots with + Plot. Supported plot types:
| Type | Description |
|------|-------------|
| Scatterplot | Two numeric variables, x-y |
| Scatterplot matrix | 2-8 numeric variables, all pairwise scatterplots |
| Parallel coordinates | 2+ numeric variables, linked axes |
| Dotplot | Single numeric variable, 1D strip |
| Barchart | Single variable (categorical or numeric), frequency counts |
| Boxplot | Single numeric variable with optional grouping; shows median, quartiles, whiskers, outliers |
| Time series | Numeric x-axis, one or more y variables with optional grouping |
| Missing pattern | Overview of missingness across all variables |
Multiple plots are linked: selecting, painting, or hovering in one plot highlights the same rows in every other plot.
Brushing and Painting
Use the brush toolbar to select rows:
- Transient: selection disappears when you release the mouse.
- Persistent: each brush stroke paints a group with a distinct color (paint groups 1-8).
The selection toolbar offers:
| Button | Action |
|--------|--------|
| Exclude | Hide selected rows (shadow/ghost them) |
| Include | Restore selected shadowed rows |
| Invert | Flip which rows are shadowed |
| Isolate | Hide everything except the selected rows |
| Restore | Bring all rows back (clear shadow mask) |
Filtering and Excluding Rows
Shadowed (excluded) rows are dimmed in every plot and excluded from all computations --- tour, clustering, classification, and projection skip them. The status bar shows "N of M visible" where N counts only non-shadowed rows.
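One way to picture the shadow mask behind the selection toolbar is a boolean per row. The sketch below is a hypothetical model, not tgobi's internals: every function name is made up for illustration, and the choice to keep already-shadowed rows shadowed on Isolate is an assumption.

```typescript
// Hypothetical model of the shadow mask: true = row is shadowed (excluded).
type ShadowMask = boolean[];

function excludeRows(mask: ShadowMask, selected: number[]): ShadowMask {
  const next = mask.slice();
  for (const i of selected) next[i] = true; // hide the selected rows
  return next;
}

function includeRows(mask: ShadowMask, selected: number[]): ShadowMask {
  const next = mask.slice();
  for (const i of selected) next[i] = false; // restore the selected rows
  return next;
}

function invertMask(mask: ShadowMask): ShadowMask {
  return mask.map((s) => !s);
}

function isolateRows(mask: ShadowMask, selected: number[]): ShadowMask {
  const keep: boolean[] = mask.map(() => false);
  for (const i of selected) keep[i] = true;
  // Assumption: rows that were already shadowed stay shadowed.
  return mask.map((s, i) => s || !keep[i]);
}

function restoreAll(mask: ShadowMask): ShadowMask {
  return mask.map(() => false);
}

// Status text in the "N of M visible" form, counting only non-shadowed rows.
function visibleStatus(mask: ShadowMask): string {
  const visible = mask.filter((s) => !s).length;
  return `${visible} of ${mask.length} visible`;
}
```

Under this model, any computation that should skip shadowed rows only needs to filter on the mask before it runs.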
Common workflow --- exclude a category (e.g. remove one region from the olive oil dataset):
- Add a Barchart of the categorical variable (e.g. `region`).
- Click the bar for the category you want to remove. This selects all rows in that category (linked across every plot).
- Click Exclude in the selection toolbar. The rows are now shadowed.
- Repeat for any other categories you want to exclude.
- To bring rows back: select them and click Include, or click Restore to un-shadow everything.
Alternative --- keep only a subset:
- Select the rows you want to keep (brush in a scatterplot, click bars in a barchart, or drag a range in a boxplot).
- Click Isolate ("Exclude all but selected"). Everything else is shadowed.
Exclude by boxplot range:
- Add a Boxplot of the numeric variable.
- Click the box body to select all rows in that group, or drag vertically on the boxplot to select a value range.
- Click Exclude to shadow the selected rows.
Coloring
The color toolbar controls how points are colored:
- Fixed: all points in one color.
- Paint: color by painted group (persistent brushing).
- By variable: color by a data column. Categorical variables pair well with
tableau10; numeric variables support sequential or diverging scales.
Identify Tool
Switch to the Identify tool to hover over points and see their row label. Click to pin a label; click again to unpin. Set the label variable in the identify toolbar.
Keyboard Shortcuts
Press ? in the app to see the shortcut reference.
| Key | Action |
|-----|--------|
| B | Switch to brush tool |
| I | Switch to identify tool |
| T | Toggle transient / persistent brush mode |
| E | Exclude selected rows |
| R | Restore all excluded rows |
| Space | Play / pause tour |
| Esc | Clear selection or stop tour |
Shortcuts are ignored when focus is in an input, select, or textarea, or when meta/ctrl/alt is held.
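The guard described above can be sketched as a pure function. This is an illustration only: `KeyLike`, `resolveShortcut`, the action strings, and the case-insensitive key matching are assumptions, not tgobi's handler.

```typescript
// Minimal stand-in for the parts of a keyboard event the rule needs.
interface KeyLike {
  key: string;        // e.g. "b", " ", "Escape"
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
  targetTag: string;  // tag name of the focused element, e.g. "INPUT"
}

// Key-to-action map taken from the shortcut table; action names are illustrative.
const SHORTCUT_ACTIONS: { [key: string]: string } = {
  "b": "switch to brush tool",
  "i": "switch to identify tool",
  "t": "toggle brush mode",
  "e": "exclude selected rows",
  "r": "restore all excluded rows",
  " ": "play / pause tour",
  "escape": "clear selection or stop tour",
};

function resolveShortcut(ev: KeyLike): string | null {
  const tag = ev.targetTag.toUpperCase();
  // Ignore shortcuts while the user is typing in a form control.
  if (tag === "INPUT" || tag === "SELECT" || tag === "TEXTAREA") return null;
  // Ignore shortcuts combined with a modifier key.
  if (ev.metaKey || ev.ctrlKey || ev.altKey) return null;
  const action = SHORTCUT_ACTIONS[ev.key.toLowerCase()];
  return action === undefined ? null : action;
}
```

Keeping the guard separate from the DOM makes it easy to check without a browser.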
Data Export
Click Export CSV in the toolbar to download the current dataset as a CSV file. The export respects the current view:
- Visible only: by default, excluded (shadowed) rows are omitted.
- Paint groups: appends a `_paint_group` column when rows have been painted.
- Cluster labels: appends a `_cluster` column when clustering has been applied.
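The export rules above can be sketched as follows. The `ExportRow` shape, the function names, the empty cell for unpainted or unclustered rows, and the RFC 4180-style quoting are assumptions for illustration; the actual file tgobi writes may differ in those details.

```typescript
// Hypothetical row shape; tgobi's internal representation is not documented here.
interface ExportRow {
  values: (string | number)[];
  shadowed: boolean;
  paintGroup: number | null;
  cluster: number | null;
}

// Quote a field when it contains a comma, quote, or line break.
function csvField(v: string | number | null): string {
  if (v === null) return "";
  const s = String(v);
  return /[",\n\r]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function exportCsv(header: string[], rows: ExportRow[]): string {
  const hasPaint = rows.some((r) => r.paintGroup !== null);
  const hasCluster = rows.some((r) => r.cluster !== null);
  const head = header.map((h) => csvField(h));
  if (hasPaint) head.push("_paint_group");
  if (hasCluster) head.push("_cluster");
  const lines: string[] = [head.join(",")];
  for (const r of rows) {
    if (r.shadowed) continue; // visible only: skip excluded rows
    const fields = r.values.map((v) => csvField(v));
    if (hasPaint) fields.push(csvField(r.paintGroup));
    if (hasCluster) fields.push(csvField(r.cluster));
    lines.push(fields.join(","));
  }
  return lines.join("\n");
}
```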
Edges
Load an edges layer (e.g. a graph or path) alongside your data. Edge visibility, alpha, and color mode are configurable. You can also draw sequential edges that connect rows in dataset order.
Hulls
Toggle convex hulls per paint group or color group to visually enclose clusters in scatterplots.
Right Sidebar Tabs
The right sidebar has four tabs: Tour, Project, Cluster, and Classify.
Tour Tab
Animate projections through high-dimensional space. Requires a scatterplot (2D tour) or dotplot (1D tour) to be open.
Shape:
- 2D (scatter): rotates a 2D projection plane through p-dimensional space.
- 1D (dotplot): rotates a 1D projection direction.
Modes:
| Mode | Description |
|------|-------------|
| Grand | Randomly walks through all projection planes. Good for overview. |
| Projection pursuit | Steers the tour toward projections that optimize an index. |
| Manual | Fixes all variables except one, letting you scrub that variable's contribution with a slider. |
Projection pursuit goals:
| Goal | Optimizes | When to use |
|------|-----------|-------------|
| Holes | 1 - central density | Finding projections with hollow structure (clusters on the rim) |
| Central mass | Central density | Finding projections with dense centers |
| LDA | Between-class / within-class variance | Requires 2+ painted groups; finds projections that separate groups |
| PCA variance | Total variance in projection | Finds projections that spread data out most |
| Kurtosis | Absolute excess kurtosis | Finding heavy-tailed or multi-modal structure |
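To make one of these indices concrete, here is a minimal sketch of the Kurtosis goal for a 1D projection. It assumes population moments (dividing by n) and already-standardized input rows; the function names are hypothetical, and how tgobi computes the index for 2D projections is not covered.

```typescript
// Project each row onto a 1D direction (dot product).
function projectOnto(rows: number[][], direction: number[]): number[] {
  return rows.map((row) => {
    let s = 0;
    for (let j = 0; j < direction.length; j++) s += row[j] * direction[j];
    return s;
  });
}

// Absolute excess kurtosis: |m4 / m2^2 - 3|, using population moments.
function absExcessKurtosis(proj: number[]): number {
  const n = proj.length;
  let mean = 0;
  for (const v of proj) mean += v / n;
  let m2 = 0;
  let m4 = 0;
  for (const v of proj) {
    const d = v - mean;
    m2 += (d * d) / n;
    m4 += (d * d * d * d) / n;
  }
  return Math.abs(m4 / (m2 * m2) - 3);
}
```

A projection-pursuit tour would evaluate an index like this on candidate directions and steer toward higher values. A two-point distribution such as -1, 1, -1, 1 has kurtosis 1, so its absolute excess kurtosis is 2, which is why bimodal projections score well.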
The variable circle shows each variable's current contribution as a point on a unit circle. Frozen variables hold their direction while others rotate.
Saved views: click Save to bookmark the current projection. Click a saved view to restore it.
Project Tab
Compute a static low-dimensional embedding and add it to the dataset as new columns.
Methods:
| Method | Type | Output | Loadings |
|--------|------|--------|----------|
| PCA | Linear | Orthogonal components maximizing variance | Yes (eigenvectors) |
| MDS | Distance-based | Preserves pairwise distances | Permutation importance |
| ICA | Linear | Statistically independent components | Yes (mixing matrix) |
| t-SNE | Nonlinear | Preserves local neighborhoods | Permutation importance |
| UMAP | Nonlinear | Preserves local+global structure | Permutation importance |
Controls:
- Method: choose the algorithm.
- Dims: number of output dimensions (2+).
- Variables: check which numeric columns to include.
- Method-specific parameters (perplexity/iterations for t-SNE, neighbors/min dist for UMAP).
After computing:
- X / Y: pick which dimensions to plot.
- Add to data: materializes the embedding as new columns (e.g. `PCA.1`, `PCA.2`) and opens a scatterplot.
- Clear: resets the projection.
Component information:
For PCA and ICA, the panel displays a loadings table showing how much each
original variable contributes to each component. Headers are labeled PC1,
PC2, ... (PCA) or IC1, IC2, ... (ICA). Values with |loading| > 0.5 are
highlighted. A cumulative variance row (Cum %) shows running explained
variance for PCA.
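The Cum % row and the highlight rule are simple to reproduce. The sketch below assumes the PCA eigenvalues are available in component order; the function names are illustrative, not part of tgobi's API.

```typescript
// Running explained variance in percent, one entry per component.
function cumulativeVariancePercent(eigenvalues: number[]): number[] {
  let total = 0;
  for (const ev of eigenvalues) total += ev;
  let running = 0;
  return eigenvalues.map((ev) => {
    running += ev;
    return (100 * running) / total;
  });
}

// Highlight rule from the text: |loading| > 0.5.
function isHighlighted(loading: number): boolean {
  return Math.abs(loading) > 0.5;
}
```

For eigenvalues 4, 3, 2, 1 the cumulative row reads 40, 70, 90, 100, so the first two components already account for 70% of the variance.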
For MDS, t-SNE, and UMAP, a variable importance table ranks variables by how much the embedding changes when that variable is permuted (permutation importance, 3 repetitions). This identifies which variables most influence the nonlinear structure.
See Methods Guide for the mathematical details.
Cluster Tab
Assign cluster labels to rows and paint them with distinct colors.
Methods:
| Method | Type | Key parameter | When to use |
|--------|------|---------------|-------------|
| K-Means | Fixed k | k | Known number of clusters, spherical clusters |
| Hierarchical | Fixed k | k, linkage | Small datasets, dendrogram-style |
| DBSCAN | Density-based | eps, minPts | Arbitrary shapes, noise detection |
| OPTICS | Density-based | eps, minPts, xi | Variable-density clusters |
| X-Means | Auto k | kMax | Unknown number of clusters (uses BIC) |
Workflow:
- Check the numeric variables to cluster on.
- Set method and parameters.
- Click Compute.
- Click Paint to color rows by cluster assignment.
Linkage options (hierarchical): complete, single, average.
X-Means iterates k = 1..kMax and picks the best k by Bayesian Information Criterion. OPTICS extracts clusters using the xi steepness parameter.
Classify Tab
Build a classifier from painted groups and visualize the decision boundary in any plot --- including animated tours. Inspired by R's classifly package: instead of shading regions, tgobi samples the predictor space on a grid, asks the trained model what it would predict at each grid point, and keeps only those grid points where the prediction changes between neighbors (the neighbor-disagreement rule). Those boundary points are rendered as outline rings, colored by their predicted class.
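The neighbor-disagreement rule can be sketched on a 2D grid as follows. This is an illustration under stated assumptions, not tgobi's code: the classifier is any function returning a class index, neighbors are the four axis-aligned grid neighbors, and `boundaryRings` and its types are hypothetical names. It expects a resolution of at least 2.

```typescript
type GridClassifier = (x: number, y: number) => number;

interface RingPoint {
  x: number;
  y: number;
  label: number; // predicted class at this grid point
}

function boundaryRings(
  predict: GridClassifier,
  lo: [number, number],
  hi: [number, number],
  res: number
): RingPoint[] {
  // Evenly spaced grid coordinates along each axis.
  const xs: number[] = [];
  const ys: number[] = [];
  for (let i = 0; i < res; i++) {
    xs.push(lo[0] + ((hi[0] - lo[0]) * i) / (res - 1));
    ys.push(lo[1] + ((hi[1] - lo[1]) * i) / (res - 1));
  }
  // Ask the model for a prediction at every grid point.
  const labels: number[][] = xs.map((x) => ys.map((y) => predict(x, y)));
  const steps = [[1, 0], [-1, 0], [0, 1], [0, -1]];
  const rings: RingPoint[] = [];
  for (let i = 0; i < res; i++) {
    for (let j = 0; j < res; j++) {
      let differs = false;
      for (const s of steps) {
        const ni = i + s[0];
        const nj = j + s[1];
        if (ni < 0 || nj < 0 || ni >= res || nj >= res) continue;
        if (labels[ni][nj] !== labels[i][j]) differs = true;
      }
      // Keep only points whose prediction differs from some neighbor.
      if (differs) rings.push({ x: xs[i], y: ys[j], label: labels[i][j] });
    }
  }
  return rings;
}
```

Because only disagreeing points survive, the output traces the boundary from both sides, and each ring keeps the class predicted at its own location.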
Methods:
| Method | Key parameter | Description |
|--------|---------------|-------------|
| KNN | k | k-nearest neighbors with calibrated neighbor-fraction probabilities |
| Naive Bayes | - | Gaussian naive Bayes with softmax posterior |
| Logistic | lambda, iter | Multinomial logistic regression, L2 regularized |
| Random Forest | trees, max depth | Bagged decision trees with per-class vote ratio |
All four methods return calibrated per-class probabilities that drive the Uncertainty filter.
Workflow:
- Brush 2+ groups of points (persistent mode) --- these become the training labels. Alternatively, set Class to a categorical variable in the data.
- Check the numeric variables you want the model to use.
- Boundary mode: choose either 2D slice (grid varies only along the first 2 selected variables, others held at their training-set medians) or Full space (grid varies along every predictor). Full-space grids stay tour-meaningful in any projection but the point count grows as resolution^p; tgobi caps the total at 200 000 and shows the effective resolution next to the input.
- Pick the Grid resolution. The label next to it shows the projected point count, e.g. `5×5 = 25 pts` or `7⁶ = 117 649 pts (capped from 15)`.
- Click Train, then Show to draw the boundary rings.
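The capped resolution in the example label follows from the resolution^p growth. A minimal sketch, assuming the cap is applied by lowering the resolution until the total fits (the function names are hypothetical):

```typescript
const MAX_GRID_POINTS = 200000; // cap stated in the text

// Total grid points for a given resolution and number of varying predictors.
function gridPointCount(resolution: number, dims: number): number {
  let count = 1;
  for (let d = 0; d < dims; d++) count *= resolution;
  return count;
}

// Largest resolution <= requested whose grid fits under the cap.
function effectiveResolution(requested: number, dims: number): number {
  let r = requested;
  while (r > 1 && gridPointCount(r, dims) > MAX_GRID_POINTS) r--;
  return r;
}
```

With 6 predictors and a requested resolution of 15, the full grid would have 15^6 = 11 390 625 points. Resolution 8 still gives 262 144, so the first value under the cap is 7, with 7^6 = 117 649 points, matching the label above.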
Uncertainty filter (slider): each boundary point also carries the
classifier's 1 - max(class probability) at that location. Drag the slider
to hide confident points and keep only the uncertain ones. A live
N of M shown counter ticks down as you raise the threshold.
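The filter reduces to one number per boundary point. In the sketch below, the `ProbPoint` shape, the function names, and the use of `>=` at the threshold are assumptions for illustration.

```typescript
interface ProbPoint {
  probs: number[]; // per-class probabilities at this boundary point
}

// Uncertainty as defined in the text: 1 - max(class probability).
function uncertaintyOf(probs: number[]): number {
  let best = probs[0];
  for (const p of probs) if (p > best) best = p;
  return 1 - best;
}

// Keep only the points at least as uncertain as the slider threshold.
function filterUncertain(points: ProbPoint[], threshold: number): ProbPoint[] {
  return points.filter((pt) => uncertaintyOf(pt.probs) >= threshold);
}

function shownCounter(points: ProbPoint[], threshold: number): string {
  const shown = filterUncertain(points, threshold).length;
  return `${shown} of ${points.length} shown`;
}
```

For a two-class model the uncertainty tops out at 0.5, where both classes are equally likely; with k classes the maximum is 1 - 1/k.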
Misclassified training points render as an X-cross over their painted glyph, so you can see at a glance which examples the model disagrees with.
Where boundaries appear:
The boundary is an overlay layer, not synthetic data rows. It draws in scatter and scatterplot-matrix plots whose axes are predictors, and in a running 2D tour over the same (or a superset of the) predictor variables. It does not appear in the missing-pattern view, parallel coordinates, boxplots, or CSV export --- those views see the original data unchanged.
In a tour, the boundary grid is standardized the same way the tour worker standardizes the data and then multiplied by the active basis, so the rings stay aligned with the rotating clusters. Tour-active variables that aren't predictors contribute nothing to the projection (their standardized value is 0).
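A sketch of that projection step, assuming z-score standardization and a basis stored as one `[b1, b2]` row per tour variable. Both are assumptions, as are the type and function names; the tour worker's actual standardization is not specified here.

```typescript
interface TourVar {
  varName: string;
  mean: number; // statistics of the data column, as used by the tour
  sd: number;
}

// point holds values for predictor variables only; tour variables that are
// not predictors are absent and get a standardized value of 0.
function projectBoundaryPoint(
  point: { [varName: string]: number | undefined },
  tourVars: TourVar[],
  basis: number[][]
): [number, number] {
  let px = 0;
  let py = 0;
  for (let j = 0; j < tourVars.length; j++) {
    const tv = tourVars[j];
    const raw = point[tv.varName];
    const z = raw === undefined ? 0 : (raw - tv.mean) / tv.sd;
    px += z * basis[j][0];
    py += z * basis[j][1];
  }
  return [px, py];
}
```

Since the data rows go through the same standardization and basis, a ring and a data point with the same predictor values differ in the projection only by the non-predictor terms.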
Train/test split (optional): when enabled, the labeled data is split stratified by class. The diagnostics panel reports test-set accuracy and a 5-fold cross-validation estimate alongside the training-set confusion matrix.
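A stratified split keeps each class's share roughly the same in both parts. The sketch below is one common way to do it, with a small seeded generator for reproducibility; the function names, the rounding rule, and the generator are assumptions, not tgobi's implementation.

```typescript
// Seeded linear congruential generator returning values in [0, 1).
function makeRng(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function stratifiedSplit(
  labels: string[],
  testFraction: number,
  seed: number
): { train: number[]; test: number[] } {
  const rng = makeRng(seed);
  // Group row indices by class label.
  const byClass: { [label: string]: number[] } = Object.create(null);
  labels.forEach((label, i) => {
    if (!byClass[label]) byClass[label] = [];
    byClass[label].push(i);
  });
  const train: number[] = [];
  const test: number[] = [];
  for (const label of Object.keys(byClass)) {
    const idx = byClass[label];
    // Fisher-Yates shuffle within the class.
    for (let i = idx.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1));
      const tmp = idx[i];
      idx[i] = idx[j];
      idx[j] = tmp;
    }
    // The same fraction of every class goes to the test set.
    const nTest = Math.round(idx.length * testFraction);
    for (let k = 0; k < idx.length; k++) {
      if (k < nTest) test.push(idx[k]);
      else train.push(idx[k]);
    }
  }
  return { train, test };
}
```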
Methods Guide
See docs/methods.md for the mathematical foundations of each algorithm, key equations, and implementation notes.
Embed In React
    import { Tgobi } from "tgobi";
    import "tgobi/styles.css";

    export function MyPage() {
      return (
        <div style={{ height: "100vh" }}>
          <Tgobi />
        </div>
      );
    }

You can pass a DataFrame-compatible object as data:

    <Tgobi data={myDataFrame} />