# stepincto-smams

v0.1.1
TypeScript types for the SMAMS (Sheetmuse Music Annotation Metadata Standard) - a standardized schema for audio annotation data supporting hierarchical analysis of music, speech, and general audio content.
## Overview

SMAMS provides structured containers for temporal annotations with nested namespaces, enabling complex multi-level analysis such as:

- Speech: Utterance → Word → Phoneme
- Music: Song → Section → Bar → Beat → Note
- Audio Events: Scene → Event → Sub-event

The schema supports rich metadata tracking, pipeline provenance, and temporal validation, making it well suited to research, production audio analysis, and multi-modal AI systems.
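As a sketch of how these levels nest, a word-level observation can carry a phoneme-level namespace in its `annotations` field. The interfaces below are trimmed local copies of this package's types (normally you would import them from `stepincto-smams`), and the phoneme values are illustrative:

```typescript
// Trimmed local copies of the package's types, so this sketch runs standalone
interface TimeInterval { time: number; duration: number; }
interface Namespace { namespace: string; data: Observation[]; }
interface Observation {
  interval: TimeInterval;
  value: string | number | boolean;
  confidence?: number | null;
  annotations?: Namespace[] | null; // nesting happens here
}

// A word observation carrying nested phoneme-level observations
const word: Observation = {
  interval: { time: 10.0, duration: 0.8 },
  value: "Hello",
  annotations: [
    {
      namespace: "phoneme",
      data: [
        { interval: { time: 10.0, duration: 0.3 }, value: "HH" },
        { interval: { time: 10.3, duration: 0.5 }, value: "EH" },
      ],
    },
  ],
};

// Walk one level down: word -> phoneme values
const phonemes = word.annotations?.[0]?.data.map(o => o.value) ?? [];
console.log(phonemes); // [ 'HH', 'EH' ]
```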
## Installation

```shell
npm install stepincto-smams
```

## Quick Start
```typescript
import type { SMAMS, Observation, Namespace } from "stepincto-smams";
import fs from "fs";

// Load SMAMS data
const smamsData = JSON.parse(fs.readFileSync("audio-annotations.smams", "utf8")) as SMAMS;

// Access metadata (audio may be null, so use optional chaining)
console.log(smamsData.metadata.audio?.title);
console.log(smamsData.metadata.file.duration);

// Access annotations
const wordNamespace = smamsData.annotations.find(ns => ns.namespace === "word");
if (wordNamespace?.data && Array.isArray(wordNamespace.data)) {
  // data may also be Record<string, unknown>[]; assert the expected shape here
  (wordNamespace.data as Observation[]).forEach(obs => {
    console.log(`${obs.value} at ${obs.interval.time}s`);
  });
}
```

## Core Type System
### Root Container

#### SMAMS

The main container for all audio annotation data.

```typescript
interface SMAMS {
  schema_version?: string;  // Version of the SMAMS schema
  metadata: SMAMSMetadata;  // Comprehensive metadata
  annotations: Namespace[]; // Top-level namespaces
}
```

### Metadata Types
#### SMAMSMetadata

Comprehensive metadata container aggregating all information about the file and the annotation process.

```typescript
interface SMAMSMetadata {
  file: FileMetadata;              // Technical file information
  source: SourceMetadata | null;   // Where the audio was obtained
  audio: AudioMetadata | null;     // Content metadata
  annotation?: AnnotationMetadata; // Pipeline information
  sandbox?: SandboxMetadata;       // Custom extensions
}
```

#### FileMetadata
Technical metadata about the audio file.

```typescript
interface FileMetadata {
  id: string;                  // Unique file identifier
  path?: string | null;        // File system path
  duration?: number | null;    // Length in seconds
  sample_rate?: number | null; // Sampling rate in Hz (e.g., 44100, 48000)
  extension?: string | null;   // File format (mp3, wav, etc.)
  channels?: number | null;    // Number of audio channels
}
```

#### AudioMetadata
Descriptive information about the audio content.

```typescript
interface AudioMetadata {
  title: string;                // Track title
  artist?: string | null;       // Artist/performer name
  album?: string | null;        // Album/collection name
  genres?: string[] | null;     // Musical/content genres
  release_date?: string | null; // ISO format (YYYY-MM-DD)
}
```

#### SourceMetadata
Information about where and when the audio was obtained.

```typescript
interface SourceMetadata {
  id?: string | null;            // External platform ID
  downloaded_at?: string | null; // Download timestamp
  source_id?: string | null;     // Platform identifier (youtube, spotify, etc.)
}
```

#### AnnotationMetadata
Details about the annotation pipeline and processing environment.

```typescript
interface AnnotationMetadata {
  pipeline_name: string;     // Name of the annotation pipeline
  hostname: string;          // Processing machine name
  created_at: string;        // Creation timestamp
  steps: AnnotationSource[]; // Pipeline steps (min. 1)
}
```

#### AnnotationSource
Information about a model or tool used in the pipeline.

```typescript
interface AnnotationSource {
  model_id: string;             // Model/tool identifier
  git_revision?: string | null; // Git commit for reproducibility
  task?: string | null;         // Analysis task type
}
```

#### SandboxMetadata
Arbitrary fields for future extensions and experimentation.

```typescript
interface SandboxMetadata {
  fields: { [k: string]: unknown }; // Custom extension fields
}
```

### Core Annotation Types
#### Namespace

Container organizing observations of the same type, with optional metadata.

```typescript
interface Namespace {
  namespace: string;    // Type identifier (word, note, etc.)
  data: ObservationList | Observation[] | Record<string, unknown>[];
  metadata?: Record<string, unknown> | ModelMetadata | null;
  type?: string | null; // Optional type hint
}
```

#### Observation
Basic annotation unit containing a temporal location, a value, and an optional confidence.

```typescript
interface Observation {
  interval: TimeInterval;           // Temporal boundaries
  value: string | number | boolean; // Observed content
  confidence?: number | null;       // Model confidence (0.0-1.0)
  annotations?: Namespace[] | null; // Nested annotations
}
```

#### TimeInterval
Represents temporal boundaries as a start time and a duration.

```typescript
interface TimeInterval {
  time: number;     // Start time in seconds
  duration: number; // Duration in seconds
}
```

#### ObservationList
Validated container for time-ordered observations.

```typescript
interface ObservationList {
  observations: Observation[]; // Time-ordered observation list
}
```

## Hierarchical Structure Examples
### Speech Transcription

```typescript
// Phrase level
const phraseObs: Observation = {
  interval: { time: 10.0, duration: 3.5 },
  value: "Hello world how are you",
  annotations: [wordNamespace] // Contains word-level observations
};

// Word level (nested under the phrase)
const wordObs: Observation = {
  interval: { time: 10.0, duration: 0.8 },
  value: "Hello",
  annotations: [phonemeNamespace] // Contains phoneme-level observations
};
```

### Music Analysis
```typescript
// Song section
const sectionObs: Observation = {
  interval: { time: 0.0, duration: 32.0 },
  value: "verse_1",
  annotations: [barNamespace] // Contains bar-level observations
};

// Musical note
const noteObs: Observation = {
  interval: { time: 1.25, duration: 0.5 },
  value: "C4",
  confidence: 0.92
};
```

## Usage Patterns
### Type-Safe Data Access

```typescript
import type { SMAMS, Namespace, Observation } from "stepincto-smams";

function extractWords(smams: SMAMS): string[] {
  const wordNamespace = smams.annotations.find(ns => ns.namespace === "word");
  if (!wordNamespace?.data || !Array.isArray(wordNamespace.data)) {
    return [];
  }
  return wordNamespace.data
    .map(obs => (typeof obs.value === "string" ? obs.value : String(obs.value)))
    .filter(Boolean);
}
```

### Working with Nested Annotations
```typescript
function getPhraseWords(phraseObs: Observation): Observation[] {
  const wordNamespace = phraseObs.annotations?.find(ns => ns.namespace === "word");
  if (!wordNamespace?.data || !Array.isArray(wordNamespace.data)) {
    return [];
  }
  // data may also be Record<string, unknown>[]; assert the expected shape here
  return wordNamespace.data as Observation[];
}
```

### Metadata Validation
```typescript
function validateAudioFile(metadata: SMAMSMetadata): boolean {
  return !!(
    metadata.file.id &&
    metadata.file.duration &&
    metadata.file.duration > 0 &&
    metadata.audio?.title
  );
}
```

## Demo Data
The package includes a comprehensive demo SMAMS file for testing and development:

```typescript
import type { SMAMS } from "stepincto-smams";
import fs from "fs";
import { fileURLToPath } from "url";
import path from "path";

// Load demo data
const demoPath = path.resolve(
  path.dirname(fileURLToPath(import.meta.url)),
  "node_modules/stepincto-smams/demo.smams"
);
const demoData = JSON.parse(fs.readFileSync(demoPath, "utf8")) as SMAMS;

// Explore demo content
console.log("Title:", demoData.metadata.audio?.title);     // "Uptown Girl"
console.log("Artist:", demoData.metadata.audio?.artist);   // "Westlife"
console.log("Duration:", demoData.metadata.file.duration); // 187.21 seconds
console.log("Namespaces:", demoData.annotations.length);   // 4 namespaces
```

### Demo Data Contents
The demo includes:

- Audio metadata: "Uptown Girl" by Westlife from Greatest Hits (2000)
- File metadata: 187 seconds, 48 kHz, stereo MP3
- Multiple namespaces:
  - `phrase_word_namespace`: Phrase-level transcription
  - `phrase_aligned_lyrics`: Phrase-aligned lyrics with timing
  - `word_aligned_lyrics`: Word-level lyrics with precise timing
  - `notes`: Musical note detection data
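For exploring a file like the demo, a small helper (hypothetical, not part of the package) can summarize how many observations each namespace holds. `NamespaceLike` below mirrors the package's `Namespace`, with `data` as either a plain array or an `ObservationList`-style wrapper:

```typescript
// Hypothetical helper; NamespaceLike mirrors the package's Namespace shape
interface NamespaceLike {
  namespace: string;
  data: unknown[] | { observations: unknown[] };
}

function summarizeNamespaces(annotations: NamespaceLike[]): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const ns of annotations) {
    // data may be a bare array or an ObservationList-style wrapper
    const items = Array.isArray(ns.data) ? ns.data : ns.data.observations;
    summary[ns.namespace] = items.length;
  }
  return summary;
}

console.log(summarizeNamespaces([
  { namespace: "word_aligned_lyrics", data: [{}, {}, {}] },
  { namespace: "notes", data: { observations: [{}] } },
]));
// { word_aligned_lyrics: 3, notes: 1 }
```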
## Integration with Build Tools

### TypeScript Configuration

Ensure your `tsconfig.json` includes:

```json
{
  "compilerOptions": {
    "strict": true,
    "moduleResolution": "node",
    "esModuleInterop": true
  }
}
```

### Runtime Validation
For runtime type checking, consider using libraries like zod or io-ts:

```typescript
import { z } from "zod";

const TimeIntervalSchema = z.object({
  time: z.number().min(0),
  duration: z.number().positive()
});

const ObservationSchema = z.object({
  interval: TimeIntervalSchema,
  value: z.union([z.string(), z.number(), z.boolean()]),
  confidence: z.number().min(0).max(1).optional()
});
```

## Common Use Cases
### 1. Speech Analysis

- Transcription with word-level timing
- Speaker diarization with confidence scores
- Phoneme-level analysis for pronunciation

### 2. Music Information Retrieval

- Note transcription with pitch and timing
- Chord progression analysis
- Beat and tempo tracking

### 3. Audio Event Detection

- Environmental sound classification
- Audio scene analysis
- Multi-label audio tagging

### 4. Multi-Modal Analysis

- Synchronized audio-visual analysis
- Cross-modal alignment
- Temporal correspondence mapping
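As one illustration of the speech-analysis case, diarization segments fit naturally as observations whose values are speaker labels. The shape below mirrors the package's `Observation`; all names and numbers are made up:

```typescript
// Diarization-style segments; shape mirrors Observation (illustrative data)
interface SpeakerSegment {
  interval: { time: number; duration: number };
  value: string;              // speaker label
  confidence?: number | null; // model confidence
}

const segments: SpeakerSegment[] = [
  { interval: { time: 0.0, duration: 4.2 }, value: "speaker_A", confidence: 0.97 },
  { interval: { time: 4.2, duration: 2.1 }, value: "speaker_B", confidence: 0.61 },
  { interval: { time: 6.3, duration: 3.0 }, value: "speaker_A", confidence: 0.88 },
];

// Keep only segments the model is confident about
const confident = segments.filter(s => (s.confidence ?? 0) >= 0.8);
console.log(confident.map(s => s.value)); // [ 'speaker_A', 'speaker_A' ]
```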
## Error Handling

```typescript
import fs from "fs";
import type { SMAMS } from "stepincto-smams";

function safeLoadSMAMS(filePath: string): SMAMS | null {
  try {
    const data = JSON.parse(fs.readFileSync(filePath, "utf8"));
    // Basic structure validation
    if (!data.metadata || !data.annotations || !Array.isArray(data.annotations)) {
      console.error("Invalid SMAMS structure");
      return null;
    }
    return data as SMAMS;
  } catch (error) {
    console.error("Failed to load SMAMS file:", error);
    return null;
  }
}
```

## Related Projects
- SMAMS Python Library: the main SMAMS implementation
- JAMS: JSON Annotated Music Specification (an inspiration for SMAMS)

## License

ISC

## Contributing

For issues, feature requests, or contributions to the SMAMS standard itself, please visit the main repository.
