as-str
v0.3.0
Virtual, zero-copy strings for AssemblyScript - slice, trim and search without allocating
- Installation
- Global Mode (optional)
- Docs
- Usage
- Examples
- Performance
- Architecture
- Contributing
- License
- Contact
Installation
npm install as-str
Optionally, for additional performance, also add:
--enable simd
Global Mode (optional)
By default you import { str } from "as-str" where you use it. If you'd
rather use str without an import in every file, opt into the transform
- it injects the import for you at compile time.
Add the transform to your asc command:
--transform as-str
or in asconfig.json:
{ "options": { "transform": ["as-str"] } }
Add the ambient typings so your editor resolves the globals - extend str's preset in assembly/tsconfig.json:
{ "extends": ["assemblyscript/std/assembly.json", "as-str/globals.json"], "include": ["./**/*.ts"] }
(For pnpm or other non-hoisted node_modules layouts, drop a copy of
node_modules/as-str/globals/index.d.ts into your assembly directory instead - any
.d.ts in the project is picked up automatically.)
Now this compiles with no import:
export function method(line: string): string {
  return str.slice(line, 0, line.indexOf(" ")).toString();
}
The transform only injects names a file actually uses and doesn't already
import, and never touches the library's own sources - so explicit
import { str } from "as-str" keeps working, and you can mix the two
freely.
Docs
Full documentation lives at:
https://docs.jairus.dev/as-str
Usage
A str is a view into an existing string: a reference to the backing
string (so the GC keeps it alive) plus a [start, end) pair of raw byte
pointers. Slicing, trimming, and searching just move the two pointers - no
characters are copied until you materialize a real string with
.toString().
import { str } from "as-str";
const real: string = "GET /index.html 200 1043";
// Wrap once (zero-copy); every op below is a pointer move, not an allocation.
const req: str = str.from(real);
const method = req.slice(0, req.indexOf(" ")); // "GET" - a view
const path = str.slice(real, 4, 15); // "/index.html" - a view
method.toString(); // "GET" - materialized on demand
path.length; // 11
req.includes("200"); // true
str is a class, so it is also the type - annotate with str. It is the
whole API: the instance methods and the static free functions
(str.slice(s, …)) live on it. It carries the full native String
surface (slice, indexOf, trim, split, replace, toUpperCase, …) plus
operators, so it reads like string but allocates only at the boundary where you
ask for an owned string back.
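To make the "pointer move" idea concrete, here is a minimal sketch of a view type in plain TypeScript. It is a model under stated assumptions, not as-str's source: `StrView` and its members are hypothetical names, integer offsets stand in for the raw byte pointers, and only `slice`, `trim`, and `toString` are modeled.

```typescript
// Hypothetical model of a zero-copy view; not as-str's actual implementation.
class StrView {
  data: string;  // backing string, shared by every derived view
  start: number; // inclusive offset into data
  end: number;   // exclusive offset into data

  constructor(data: string, start: number, end: number) {
    this.data = data;
    this.start = start;
    this.end = end;
  }

  static from(s: string): StrView {
    return new StrView(s, 0, s.length);
  }

  get length(): number {
    return this.end - this.start;
  }

  // Narrow the window; negative indices count from the end, as in String.slice.
  slice(from: number, to?: number): StrView {
    const len = this.length;
    const t = to === undefined ? len : to;
    const lo = from < 0 ? Math.max(len + from, 0) : Math.min(from, len);
    const hi = t < 0 ? Math.max(len + t, 0) : Math.min(t, len);
    return new StrView(this.data, this.start + lo, this.start + Math.max(lo, hi));
  }

  // Skip leading/trailing spaces (U+0020 only in this sketch) by moving the bounds.
  trim(): StrView {
    let s = this.start;
    let e = this.end;
    while (s < e && this.data.charCodeAt(s) === 32) s++;
    while (e > s && this.data.charCodeAt(e - 1) === 32) e--;
    return new StrView(this.data, s, e);
  }

  // The only place characters are copied out of the backing string.
  toString(): string {
    return this.data.substring(this.start, this.end);
  }
}

const trimmed = StrView.from("  the quick brown fox  ").trim();
const word = trimmed.slice(4, 9);
console.log(word.toString()); // "quick"
```

Every derived view holds the same `data` reference as the original, which is the property that lets a chain of slices stay anchored to one backing string.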
Examples
Slicing and Trimming Without Copying
View-producing methods return another str - no allocation, no copy. The
backing string is shared, and chains of views always anchor to the original.
const v = str.from(" the quick brown fox ").trim(); // view, no copy
v.slice(4, 9).toString(); // "quick"
v.slice(-3).toString(); // "fox"
v.substring(9, 4).toString(); // "quick" (substring swaps args, like String)
v.charAt(0).toString(); // "t"
v.at(-1).toString(); // "x"
Tokenizing and Splitting
split yields zero-copy pieces - you only pay for a copy on the pieces you
actually materialize.
const log = "GET /index.html 200 1043";
const f = str.split(log, " "); // str[] - each piece is a view
f[0].toString(); // "GET"
<i32>parseInt(f[2].toString()); // 200
f.length; // 4
// Walk fields without allocating until needed:
const csv = "id,name,email,role";
for (let i = 0, parts = str.split(csv, ","); i < parts.length; i++) {
  if (parts[i].equalsString("email")) {
    /* found it - still zero-copy */
  }
}
Searching (String or View Needles)
indexOf, lastIndexOf, includes, startsWith, and endsWith accept a
string or a str as the needle, so you can search a view inside a
view. The scan is SWAR/SIMD accelerated.
const hay = str.from("the quick brown fox");
hay.indexOf("brown"); // 10
hay.includes(str.slice("xxbrownyy", 2, 7)); // true - view needle
hay.startsWith("the"); // true
hay.lastIndexOf("o"); // 17
Comparisons and Operators
Operators compare and index content (not identity), across different backing strings.
const a = str.slice("__world", 2); // "world"
const b = str.slice("hello world", 6); // "world", different backing string
a == b; // true (content equality)
a <= b; // true (lexicographic)
str.from("apple") < str.from("banana"); // true
a[0]; // 119 - UTF-16 code unit at 0, no allocation (-1 if out of range)
(a + b).toString(); // "worldworld" - `+` concatenates into a fresh view
Encoding (UTF-8 / UTF-16)
str.UTF8 and str.UTF16 mirror String.UTF8 / String.UTF16,
powered by utf-as and running straight
off the view's pointer range - no intermediate copy. decode returns a
str.
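As a rough picture of what "counted in place" can mean, the sketch below computes the UTF-8 length of a `[start, end)` range of UTF-16 code units without building an intermediate string. `utf8ByteLength` is a hypothetical helper written for illustration, not utf-as or as-str code, and it assumes a lone surrogate is counted as 3 bytes.

```typescript
// Hypothetical helper: UTF-8 byte count of the UTF-16 range [start, end) of s.
function utf8ByteLength(s: string, start: number = 0, end: number = s.length): number {
  let bytes = 0;
  for (let i = start; i < end; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // ASCII
    } else if (c < 0x800) {
      bytes += 2; // e.g. 'é'
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < end) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair -> one 4-byte sequence
        i++;        // consume the low surrogate too
      } else {
        bytes += 3; // assumption: lone surrogate counted as 3 bytes
      }
    } else {
      bytes += 3; // rest of the BMP, e.g. CJK
    }
  }
  return bytes;
}

// The range 3..11 of this string is "héllo 世界".
console.log(utf8ByteLength("xx héllo 世界 xx", 3, 11)); // 13
```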
const v = str.slice("xx héllo 世界 xx", 3, 11); // "héllo 世界"
const u8 = str.UTF8.encode(v); // ArrayBuffer of UTF-8 bytes
str.UTF8.byteLength(v); // UTF-8 length, counted in place
str.UTF8.decode(u8); // str round-trip
const u16 = str.UTF16.encode(v); // the view's bytes, copied out
str.UTF16.validate(v); // well-formed UTF-16?
The Two Layers
The same operations are reachable two ways:
// 1. Instance methods on a view - the native String method surface.
const v = str.from("hello, world");
v.slice(7).toUpperCase(); // "WORLD"
// 2. Free functions - take a `string` OR a `str` as the first argument.
str.slice("hello, world", 7); // str
str.indexOf("hello, world", "world"); // 7
str.toUpperCase("hello"); // "HELLO" (allocates)
Convert a string to a view with str.from(s) (or new str(data, start,
end) from explicit bounds).
str8 - UTF-8 Views (byte-indexed)
str8 is the UTF-8 sibling of str, for text that already lives as UTF-8 bytes
(files, network, WASI, JSON) so you can slice/search/trim it without first
transcoding to UTF-16. It is stored as an ArrayBuffer plus [start, end) byte
pointers and is byte-indexed, following Rust &str / Go string.
import { str8 } from "as-str";
const s = str8.from("héllo, 世界"); // string -> UTF-8 buffer (allocates)
s.length; // 14 (BYTES, like Rust .len() / Go len())
s.codePointCount(); // 9 (Unicode scalars)
s.slice(0, 5).toString(); // "héll" - O(1) zero-copy byte slice ('é' occupies bytes 1-2)
s.indexOf("llo"); // byte offset (Go strings.Index / Rust .find)
s[0]; // 104 - the raw byte (Go s[i])
s.codePointAt(1); // 0xE9 ('é'), decoded from the 2-byte sequence
s.isCharBoundary(2); // false - byte 2 is mid-codepoint (second byte of 'é')
// Wrap existing UTF-8 bytes with no copy:
const view = str8.fromBuffer(someArrayBuffer); // trusts the bytes
str8.fromBufferChecked(buf); // validates UTF-8 first
equals/compareTo/<…>= use byte order, which for UTF-8 is exactly Unicode
codepoint order (matching Rust/Go). Allocating ops (concat, repeat, pad*,
replace, toUpperCase, …) stay in UTF-8 and return a str8; toString()
decodes to a native string. Note slicing cuts raw bytes Go-style and can split
a codepoint - guard with isCharBoundary if you need a valid boundary.
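The boundary guard can be pictured with a short sketch over raw UTF-8 bytes. This is an illustration of the general UTF-8 rule (continuation bytes have the bit pattern 10xxxxxx), not str8's implementation; `floorBoundary` is a hypothetical helper, and the standalone `isCharBoundary` here is a stand-in for the method of the same name.

```typescript
// "héllo" as UTF-8: 'é' is the two-byte sequence 0xC3 0xA9 at offsets 1 and 2.
const bytes = new Uint8Array([0x68, 0xc3, 0xa9, 0x6c, 0x6c, 0x6f]);

// An offset is a boundary if it is either end of the buffer, or if the byte
// there is not a continuation byte (10xxxxxx).
function isCharBoundary(buf: Uint8Array, i: number): boolean {
  if (i === 0 || i === buf.length) return true;
  if (i < 0 || i > buf.length) return false;
  return (buf[i] & 0xc0) !== 0x80;
}

// Hypothetical helper: snap an offset back to the nearest boundary at or before it.
function floorBoundary(buf: Uint8Array, i: number): number {
  let j = Math.min(Math.max(i, 0), buf.length);
  while (!isCharBoundary(buf, j)) j--;
  return j;
}

console.log(isCharBoundary(bytes, 1)); // true  - first byte of 'é'
console.log(isCharBoundary(bytes, 2)); // false - second byte of 'é'
console.log(floorBoundary(bytes, 2));  // 1
```

Snapping a cut point this way before slicing keeps the resulting byte range decodable as text.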
Converting Anything: str(x) / str8(x)
str and str8 are also callable as converters. A view of the same type passes
through, a native string is wrapped/transcoded, and anything else with a
toString() (numbers, the other view, your own classes) is stringified:
str(42).toString(); // "42"
str8("héllo").byteLength; // 6
str(someStr8); // str8 -> str (UTF-16)
v.toStr8(); // str -> str8 (UTF-8)
u.toStr(); // str8 -> str (UTF-16)
Performance
📊 Browse the full chart set for this release →
Per-Operation Speedup
Every native String operation vs its str counterpart - native (red) is
the 1× baseline, str (blue) is its speedup:
Throughput
Native vs str SWAR vs str SIMD, in millions of ops/sec:
SWAR and SIMD
The scanning hot paths are accelerated in three tiers, chosen at compile time:
- SIMD - 8 code units per step via v128, used when --enable simd is set (ASC_FEATURE_SIMD).
- SWAR - SIMD-Within-A-Register: 4 code units per step with ordinary u64 math. The default when SIMD is off.
- scalar - handles the short sub-block tail.
When SIMD is off the entire v128 branch is dead-code-eliminated, and vice
versa, so you only pay for the tier you build. Wide loads are always bounded by
the remaining length, so they never read past the backing string - no scratch
padding. Both builds are covered by the test suite (run under two modes) and by
differential fuzzing against the native String methods.
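For readers new to SWAR, the sketch below shows the general technique on a code-unit search. It is deliberately scaled down and is not as-str's `findUnit`: plain TypeScript bitwise operators work on 32-bit integers, so it packs 2 code units per step instead of the 4 that fit in a u64, and `findUnitSwar` is a hypothetical name. A scalar step handles the odd tail.

```typescript
const ONES = 0x00010001;  // the value 1 in each 16-bit lane
const HIGHS = 0x80008000; // the top bit of each 16-bit lane

// Index of the first code unit equal to `unit`, or -1. Two lanes per step.
function findUnitSwar(s: string, unit: number): number {
  const u = unit & 0xffff;
  const pattern = Math.imul(u, ONES); // broadcast the target into both lanes
  let i = 0;
  for (; i + 2 <= s.length; i += 2) {
    // Pack two code units into one word, lane 0 in the low half.
    const word = s.charCodeAt(i) | (s.charCodeAt(i + 1) << 16);
    const x = word ^ pattern; // a matching lane becomes 0
    // Classic zero-lane test: the lowest flagged lane is always a true match;
    // only lanes above a true match can be flagged falsely (borrow carry-over).
    const hit = (x - ONES) & ~x & HIGHS;
    if (hit !== 0) return (hit & 0x8000) !== 0 ? i : i + 1;
  }
  // Scalar tail: at most one leftover code unit.
  if (i < s.length && s.charCodeAt(i) === u) return i;
  return -1;
}

console.log(findUnitSwar("the quick brown fox", 32)); // 3
```

Because the low lane is checked first, a false flag in the high lane never changes the answer: it can only appear when the low lane already matched.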
Running Benchmarks Locally
npm run bench # microbenchmarks (as-bench)
npm run charts:build # benchmark both builds and render charts to build/charts/
npm run charts # build the charts and serve them locally
Architecture
A str is a 3-field view - data: string (the GC owner), and start /
end raw byte pointers into that string's UTF-16 data. Every op moves the
pointers; bytes are copied only by toString() (and the allocating ops, which
build their result in one pass).
- Single source of truth. Instance methods and the str.* free functions both funnel through *Range static helpers that operate on raw (data, start, end) bounds - so a view-producing op is exactly one allocation (the result) and a query is zero.
- Accelerated primitives. findUnit (powers indexOf / includes / lastIndexOf) and compare carry SIMD / SWAR / scalar tiers; copyBytes and equalsBytes use a size-tiered manual loop that beats the bulk-memory intrinsics on small/medium ranges.
- Native parity. Semantics mirror AssemblyScript's String (not JS) and are verified bit-for-bit by differential fuzzing across both SIMD and SWAR builds.
- GC-safe. A view keeps its backing string reachable through data, and views of views anchor to the original - so chains of slices never pin intermediate allocations and the underlying bytes are never collected while a view is alive.
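The "single source of truth" shape from the list above can be sketched as follows. All names here (`View`, `indexOfRange`, `sliceRange`) are hypothetical and the bodies are simplified (non-negative indices only, string needles only); the point is only the pattern: instance methods and free functions both delegate to one helper that works on `(data, start, end)` bounds.

```typescript
// Hypothetical model of the funnel pattern; not as-str's actual code.
class View {
  data: string;
  start: number;
  end: number;

  constructor(data: string, start: number, end: number) {
    this.data = data;
    this.start = start;
    this.end = end;
  }

  // Range helpers: the one place each operation's logic lives.
  static indexOfRange(data: string, start: number, end: number, needle: string): number {
    if (needle.length > end - start) return -1;
    const at = data.indexOf(needle, start);
    // The first hit must fit entirely inside [start, end).
    return at !== -1 && at + needle.length <= end ? at - start : -1;
  }

  static sliceRange(data: string, start: number, end: number, from: number, to: number): View {
    const len = end - start;
    const lo = Math.min(Math.max(from, 0), len);
    const hi = Math.min(Math.max(to, lo), len);
    // The result points at the same backing string, never at another view.
    return new View(data, start + lo, start + hi);
  }

  // Instance methods delegate with the view's own bounds.
  indexOf(needle: string): number {
    return View.indexOfRange(this.data, this.start, this.end, needle);
  }

  slice(from: number, to: number): View {
    return View.sliceRange(this.data, this.start, this.end, from, to);
  }

  // Free functions accept a string or a view and delegate to the same helpers.
  static indexOf(s: string | View, needle: string): number {
    return typeof s === "string"
      ? View.indexOfRange(s, 0, s.length, needle)
      : View.indexOfRange(s.data, s.start, s.end, needle);
  }

  static slice(s: string | View, from: number, to: number): View {
    return typeof s === "string"
      ? View.sliceRange(s, 0, s.length, from, to)
      : View.sliceRange(s.data, s.start, s.end, from, to);
  }
}

const backing = "hello, world";
const world = View.slice(backing, 7, 12); // view over "world"
const inner = world.slice(1, 3);          // view over "or", anchored to backing
console.log(View.indexOf(backing, "world"), world.indexOf("or"), inner.start); // 7 1 8
```

Since a view of a view stores the original `data` reference plus adjusted bounds, no intermediate view needs to stay alive for the inner one to remain valid.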
Contributing
Contributions are welcome. To work on str:
npm install
npm test # spec suite (as-test), under simd + nosimd modes
npm run test:fuzz # differential fuzzing vs native String
npm run check # lint + typecheck
License
This project is distributed under an open source license. Work on this project is done out of passion, but if you want to support it financially, you can do so by donating through the project's GitHub Sponsors page.
You can view the full license here: License
Contact
Please send all issues to GitHub Issues, and to converse, send me an email at [email protected].
- Email: Send me inquiries, questions, or requests at [email protected]
- GitHub: Visit the official GitHub repository Here
- Website: Visit my official website at jairus.dev
- Discord: Contact me at My Discord or on the AssemblyScript Discord Server
