@gitsense/gsc-utils
v0.2.25
Utilities for GitSense Chat (GSC)
GitSense Chat Utils (gsc-utils)
A comprehensive JavaScript library providing utilities for processing, manipulating, and managing various elements within GitSense Chat messages, including code blocks, patches, analyzers, and context data.
Code Structure
.
├── LICENSE
├── README.md
├── package-lock.json
├── package.json
├── rollup.config.js
├── src
│ ├── AnalysisBlockUtils.js
│ ├── AnalyzerUtils
│ │ ├── constants.js
│ │ ├── contextMapper.js
│ │ ├── dataValidator.js
│ │ ├── defaultPromptLoader.js
│ │ ├── discovery.js
│ │ ├── index.js
│ │ ├── instructionLoader.js
│ │ ├── management.js
│ │ ├── responseProcessor.js
│ │ ├── saver.js
│ │ └── schemaLoader.js
│ ├── ChatUtils.js
│ ├── CodeBlockUtils
│ │ ├── blockExtractor.js
│ │ ├── blockProcessor.js
│ │ ├── constants.js
│ │ ├── continuationUtils.js
│ │ ├── headerParser.js
│ │ ├── headerUtils.js
│ │ ├── index.js
│ │ ├── lineNumberFormatter.js
│ │ ├── markerRemover.js
│ │ ├── patchIntegration.js
│ │ ├── relationshipUtils.js
│ │ ├── updateCodeBlock.js
│ │ └── uuidUtils.js
│ ├── ConfigUtils.js
│ ├── ContextUtils.js
│ ├── EnvUtils.js
│ ├── GSToolBlockUtils.js
│ ├── GitSenseChatUtils.js
│ ├── JsonUtils.js
│ ├── LLMUtils.js
│ ├── MessageUtils.js
│ ├── PatchUtils
│ │ ├── constants.js
│ │ ├── diagnosticReporter.js
│ │ ├── enhancedPatchProcessor.js
│ │ ├── fuzzyMatcher.js
│ │ ├── hunkCorrector.js
│ │ ├── hunkValidator.js
│ │ ├── index.js
│ │ ├── patchExtractor.js
│ │ ├── patchHeaderFormatter.js
│ │ ├── patchParser.js
│ │ ├── patchProcessor.js
│ │ └── patchVerifier
│ │   ├── constants.js
│ │   ├── detectAndFixOverlappingHunks.js
│ │   ├── detectAndFixRedundantChanges.js
│ │   ├── formatAndAddLineNumbers.js
│ │   ├── index.js
│ │   ├── verifyAndCorrectHunkHeaders.js
│ │   └── verifyAndCorrectLineNumbers.js
│ └── SharedUtils
│   ├── timestampUtils.js
│   └── versionUtils.js
Core Modules
GitSenseChatUtils (Main Interface)
This is the primary class exported by the library, offering a unified API for interacting with the various utility modules. It aggregates functionalities from CodeBlockUtils, PatchUtils, AnalyzerUtils, MessageUtils, LLMUtils, ConfigUtils, and EnvUtils.
CodeBlockUtils
Provides a comprehensive suite of utilities for processing, manipulating, and managing code blocks within GitSense Chat messages. These utilities handle various aspects of the code block lifecycle, including extraction, header parsing, UUID management, patch detection, and continuation handling.
Core Functions
processCodeBlocks(text, options): The primary function for processing code blocks. It identifies code blocks within text using markdown fences, parses their headers or content, detects block types (code, patch, gs-tool, gitsense-search-flow, analysis), and extracts relevant metadata. It returns detailed information about each block, including any warnings encountered during processing.
extractCodeBlocks(text, options): A simplified API wrapper around processCodeBlocks that returns only the extracted blocks and warnings, suitable for basic code block identification needs.
fixTextCodeBlocks(text): Scans text for code blocks with invalid UUIDs and automatically corrects them by generating new valid UUID v4 strings.
Block Extraction
findAllCodeFences(text): Identifies the positions of all opening and closing markdown code fences (```) within a given text.
matchFencesAndExtractBlocks(text, openingPositions, closingPositions): Matches identified opening and closing fences to determine complete and incomplete code blocks, providing warnings for potentially malformed structures.
extractCodeBlocksWithUUIDs(messageText): Finds all code blocks in a message and attempts to extract their Block-UUID from the content.
findCodeBlockByUUID(messageText, blockUUID): Locates a specific code block within a message by its Block-UUID.
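The fence-matching idea described above can be sketched in a few lines. This is an illustrative, standalone approximation and not the library's implementation: the function name findFencedBlocks, the FENCE constant, and the shape of the returned objects are all assumptions made for this example.

```javascript
// Illustrative sketch only; names and return shape are hypothetical.
const FENCE = '`'.repeat(3); // three backticks, built this way to keep the example fence-safe

function findFencedBlocks(text) {
  const lines = text.split('\n');
  const blocks = [];
  const warnings = [];
  let open = null; // { line, language } of the currently open fence, if any

  lines.forEach((line, index) => {
    if (!line.trimStart().startsWith(FENCE)) return;
    if (open === null) {
      // Opening fence: whatever follows the backticks is treated as the language tag.
      open = { line: index, language: line.trim().slice(3).trim() };
    } else {
      // Closing fence: collect the lines between the two fences.
      blocks.push({
        language: open.language,
        content: lines.slice(open.line + 1, index).join('\n'),
        complete: true,
      });
      open = null;
    }
  });

  if (open !== null) {
    // An opening fence without a partner is reported as an incomplete block.
    blocks.push({
      language: open.language,
      content: lines.slice(open.line + 1).join('\n'),
      complete: false,
    });
    warnings.push(`Unclosed code fence at line ${open.line + 1}`);
  }
  return { blocks, warnings };
}
```

Pairing fences strictly in order keeps the sketch short; it does not handle nested or longer fences, which a production parser would need to consider.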
Header Utilities
parseHeader(header, language): Parses the metadata header from a code block's content based on its programming language, extracting fields like Component, Block-UUID, Version, Description, Language, Created-at, and Authors.
isValidISOTimestamp(timestamp): Validates if a string conforms to the ISO 8601 timestamp format.
getHeaderLineCount(headerText, language): Calculates the total number of lines a code block's header occupies, including comment delimiters and the two mandatory blank lines that follow it.
UUID Utilities
generateUUID(): Generates a new valid RFC 4122 UUID v4 string.
validateUUID(uuid): Validates a UUID string and returns an object indicating its validity and a corrected UUID if the original was invalid.
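As a rough illustration of validate-and-correct behaviour, the sketch below checks a string against the UUID v4 layout and supplies a fresh UUID when the check fails. It relies only on Node's built-in crypto.randomUUID(); the function name checkUUID and the isValid / correctedUUID field names are assumptions for this example, not the library's documented return shape.

```javascript
// Illustrative sketch only; checkUUID and its return fields are hypothetical.
const { randomUUID } = require('crypto');

// UUID v4 layout: the third group starts with 4, the fourth with 8, 9, a, or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function checkUUID(uuid) {
  if (typeof uuid === 'string' && UUID_V4.test(uuid)) {
    return { isValid: true, uuid };
  }
  // Invalid input: keep the original for reference and offer a replacement.
  return { isValid: false, uuid, correctedUUID: randomUUID() };
}
```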
Patch Integration
containsPatch(content): Checks if the provided text content contains at least one patch block.
Relationship & Context Utilities
detectCodeBlockRelationships(content, codeBlockService, options): Detects special relationships between code blocks, such as patches or parent-child links, using an optional codeBlockService to verify parent UUID existence.
detectIncompleteCodeBlock(content, options): Identifies if a message contains an incomplete code block and returns information about the last one found.
extractFilePaths(messageText): Extracts file paths from specially formatted lines in a message (e.g., #### File: `path/to/file.ext`).
Continuation Utilities
extractContinuationInfo(content, partNumber, language, header): Extracts context information from an incomplete block's content to assist in generating prompts for continuation.
generateContinuationPrompt(incompleteBlock, isLastPart): Generates a structured prompt for continuing an incomplete code block, including instructions for metadata preservation and format.
Comment-Delimited Header Parsing
parseCommentDelimitedBlocks(input): Parses code blocks where metadata headers are enclosed in language-specific comment delimiters (e.g., /** ... */, """ ... """) rather than markdown fences.
Marker Removal
removeCodeBlockMarkers(markdownText): Removes custom --- CODE BLOCK START --- and --- CODE BLOCK COMPLETE --- markers surrounding markdown code blocks, but only when both markers are present for a given block.
Code Block Updating
updateCodeBlock(messageContent, identifier, newCodeContent, language): A generic router function to update a code block's content by either its index or Block-UUID.
updateCodeBlockByIndex(messageContent, blockIndex, newCodeContent, language): Replaces the content of a code block specified by its index within a message.
updateCodeBlockByUUID(messageContent, blockUUID, newCodeContent, language): Replaces the content of a code block specified by its Block-UUID.
updateCodeBlockInMessage(messageText, blockUUID, newCode, language): Updates a code block in message text identified by UUID (moved from original PatchUtils).
deleteCodeBlock(messageContent, identifier): A generic router function to delete a code block by either its index or Block-UUID.
deleteCodeBlockByIndex(messageContent, blockIndex): Deletes a code block specified by its index within a message.
deleteCodeBlockByUUID(messageContent, blockUUID): Deletes a code block specified by its Block-UUID.
Line Number Formatting
formatWithLineNumbers(codeContent, startLine, paddingWidth): Formats raw code content by adding padded line numbers to each line.
formatBlockWithLineNumbers(block, startLine, paddingWidth): Formats a processed code block object by adding line numbers to its content.
formatBlocksWithLineNumbers(blocks, startLine, paddingWidth): Formats an array of processed code block objects with line numbers.
removeLineNumbers(formattedContent): Removes line number prefixes from formatted code content.
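A minimal sketch of padded line numbering and its inverse follows. It assumes a space-padded "NNN: " prefix; the library's exact padding character and separator are not stated here, so treat the format and the names addLineNumbers and stripLineNumbers as assumptions.

```javascript
// Illustrative sketch only; the "NNN: " prefix format and both names are assumptions.
function addLineNumbers(code, startLine = 1, paddingWidth = 3) {
  return code
    .split('\n')
    .map((line, i) => `${String(startLine + i).padStart(paddingWidth, ' ')}: ${line}`)
    .join('\n');
}

function stripLineNumbers(formatted) {
  return formatted
    .split('\n')
    // Drop optional padding, the digits, the colon, and at most one separating space.
    .map((line) => line.replace(/^\s*\d+: ?/, ''))
    .join('\n');
}
```

Removing at most one space after the colon keeps the code's own indentation intact when the numbers are stripped again.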
Constants
COMMENT_STYLES: Defines comment styles for various programming languages, used by parseHeader to correctly interpret header delimiters.
PatchUtils
Provides a comprehensive suite of utilities for handling, validating, applying, and correcting patches within GitSense Chat. These utilities are designed to work with the traditional unified diff format and offer enhanced diagnostics and fuzzy matching capabilities to improve patch reliability.
Core Patch Processing
applyPatch(sourceText, patchText): Applies a traditional unified diff patch to source code. It cleans line numbers from the patch content before using the jsdiff library for application.
createPatch(sourceText, targetCode, metadata, filename): Generates a patch in traditional unified diff format between two versions of code, incorporating the specified metadata and adding line number prefixes to the diff content lines.
createPatchFromCodeBlocks(sourceCodeBlockText, targetCodeBlockText, patchMetadata): Creates a patch between two full code block strings (including their metadata headers), adjusting hunk line numbers to account for the original code block's header.
Enhanced Patch Processing & Validation
applyPatchWithDiagnostics(sourceText, patchText): Applies a patch with detailed per-hunk diagnostics, including validation, fuzzy matching for misplaced hunks, and correction suggestions. Returns a comprehensive result object with success status, patched text, and diagnostic reports.
validatePatch(sourceText, patchText): Validates a patch without applying it, providing detailed diagnostics and identifying if invalid hunks can be corrected using fuzzy matching.
applyHunk(sourceText, hunkText): Applies a single hunk to source code, including validation and optional fuzzy matching for correction.
Patch Parsing & Metadata Handling
determinePatchFormat(patchText): Determines the patch format (e.g., 'traditional') based on the presence of specific markers.
extractPatchMetadata(patchText): Extracts metadata fields from a patch block's header.
validatePatchMetadata(metadata): Validates patch metadata for required fields (Source-Block-UUID, Target-Block-UUID, Source-Version, Target-Version, Description, Authors) and correct version/UUID formats.
extractPatchContent(patchText, format): Extracts the raw unified diff content from between the patch start and end markers.
isPatchBlock(codeBlockContent): Determines if a code block's content represents a patch by checking for the mandatory # Patch Metadata header.
detectPatch(messageText): Finds the first valid patch block within a larger message text.
findAllPatches(messageText): Finds all valid patch blocks within a message text.
Header Formatting
formatCodeBlockHeader(sourceBlockInfo, patchBlock): Formats the metadata header for a new code block based on source information and patch metadata, following GitSense Chat's inheritance rules.
Hunk Validation
parseHunk(hunkText): Parses a hunk into its components (header, context lines, added lines, removed lines).
parseHunkHeader(header): Parses a hunk header (@@ ... @@) into its constituent parts (old start/count, new start/count).
validateHunk(sourceCode, hunkText): Validates a single hunk against source code, attempting direct application and fuzzy matching for corrections.
validateHunks(sourceCode, hunks): Validates all hunks in a patch against the source code.
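To make the hunk header fields concrete, here is a small standalone parser for the standard unified diff form @@ -old,count +new,count @@. It is a sketch under stated assumptions: the name parseHunkHeaderSketch and the returned field names are invented for this example and may differ from what the library returns.

```javascript
// Illustrative sketch only; function and field names are hypothetical.
const HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;

function parseHunkHeaderSketch(header) {
  const m = HUNK_HEADER.exec(header);
  if (!m) return null;
  return {
    oldStart: Number(m[1]),
    // In unified diff format an omitted count means 1.
    oldCount: m[2] === undefined ? 1 : Number(m[2]),
    newStart: Number(m[3]),
    newCount: m[4] === undefined ? 1 : Number(m[4]),
  };
}
```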
Fuzzy Matching
findBestContextMatch(contextLines, sourceCode, options): Finds the best match for context lines within source code using similarity scoring.
findExactLineMatches(contextLine, sourceCode): Finds exact matches for a single context line within the source code.
findBestMatchWithSlidingWindow(contextLines, sourceCode, windowSize): Finds the best match using a sliding window approach for longer context blocks.
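The sliding-window idea can be illustrated with a deliberately simple scorer: slide the context lines over the source and count how many lines agree at each position. The similarity measure (exact match after trimming), the 0.7 default threshold, and the names windowScore and bestContextMatch are assumptions for this sketch; the library's scoring may be more sophisticated.

```javascript
// Illustrative sketch only; the scoring rule, threshold, and names are assumptions.

// Fraction of positions where the trimmed source and context lines are identical.
function windowScore(contextLines, sourceLines, start) {
  let hits = 0;
  for (let i = 0; i < contextLines.length; i++) {
    if (sourceLines[start + i].trim() === contextLines[i].trim()) hits++;
  }
  return hits / contextLines.length;
}

function bestContextMatch(contextLines, sourceCode, threshold = 0.7) {
  const sourceLines = sourceCode.split('\n');
  let best = { line: -1, score: 0 };
  // Try every window of contextLines.length consecutive source lines.
  for (let start = 0; start + contextLines.length <= sourceLines.length; start++) {
    const score = windowScore(contextLines, sourceLines, start);
    if (score > best.score) best = { line: start + 1, score }; // 1-based line number
  }
  return best.score >= threshold ? best : null;
}
```

The returned 1-based line number is the kind of value a hunk header could be recalculated from once a misplaced hunk has been located.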
Hunk Correction
recalculateHunkHeader(matchPosition, contextLines, addedLines, removedLines): Recalculates a hunk header based on the matched position and content.
reconstructHunk(header, contextLines, addedLines, removedLines): Reconstructs a hunk with the corrected header.
generateCorrectedHunk(originalHunk, matchResult): Generates a corrected hunk based on fuzzy matching results.
preserveHunkStructure(originalHunkText, correctedHeader): Preserves the original structure of a hunk while updating its header.
generateHunkCorrection(hunkValidation, matchResult): Generates a corrected version of a problematic hunk with an explanation.
Diagnostic Reporting
generateHumanReadableDiagnostics(hunkResults): Generates a human-readable diagnostic report summarizing the validation status of all hunks in a patch.
generateLLMFeedback(hunkResults): Generates structured feedback about patch validation for consumption by LLMs.
describeHunkResult(hunkResult): Generates a human-readable description of a single hunk's validation result.
formatHunkResultForLLM(hunkResult): Formats a single hunk's validation result for LLM consumption.
formatDiagnosticSummary(patchResult): Formats a diagnostic summary for the entire patch application or validation result.
generateErrorMessage(patchResult): Generates a concise error message for a failed patch application or validation.
Patch Verification
verifyAndCorrectLineNumbers(patchText, sourceText, windowSize): Verifies and corrects the NNN: line number prefixes on context and deletion lines within a patch by comparing their content against the original source code within a sliding window.
verifyAndCorrectHunkHeaders(patchText): Verifies and corrects the hunk headers (@@ -old,count +new,count @@) within a patch based on the line number prefixes found in the hunk's content lines.
formatAndAddLineNumbers(patchText, sourceText): Ensures patch content lines have NNN: line number prefixes with consistent padding, adding them if missing.
detectAndFixRedundantChanges(patchText, autoFix): Detects and optionally fixes redundant changes in a patch where content is deleted and re-added identically.
detectAndFixOverlappingHunks(patchText, autoFix): Detects and optionally fixes overlapping hunks in a patch file by merging them.
Constants
CONTENT_LINE_REGEX: Regex to parse context/deletion/addition lines with line numbers.
HUNK_HEADER_REGEX: Regex to parse hunk headers.
LINE_NUMBER_PREFIX_REGEX: Regex to detect line number prefixes.
FUZZY_MATCH_THRESHOLD: Minimum confidence threshold for fuzzy matching.
MAX_ALTERNATIVE_MATCHES: Maximum number of alternative matches to consider.
DEFAULT_SLIDING_WINDOW_SIZE: Default sliding window size for fuzzy matching.
MAX_CONTEXT_LINES_FOR_DIRECT_MATCH: Maximum context lines to use for direct matching before switching to sliding window.
PATCH_START_MARKER: The start marker for patch content (# --- PATCH START MARKER ---).
PATCH_END_MARKER: The end marker for patch content (# --- PATCH END MARKER ---).
PATCH_METADATA_HEADER: The patch metadata header (# Patch Metadata).
REQUIRED_METADATA_FIELDS: List of required metadata fields for a patch.
ORIGINAL_FILE_HEADER: The header for the original file in the diff (--- Original).
MODIFIED_FILE_HEADER: The header for the modified file in the diff (+++ Modified).
AnalyzerUtils
Provides a suite of utilities for managing and interacting with GitSense Chat analyzers. Analyzers are specialized message templates used by the GitSense Chat system to process and analyze code context.
Core Functions
getAnalyzers(basePath, options): Discovers and lists all available analyzers by traversing the directory structure under basePath. An analyzer is considered valid if its corresponding 1.md instruction file exists. The options parameter can include includeDescription to optionally load the analyzer's description from its JSON metadata.
getAnalyzerSchema(basePath, analyzerId): Retrieves the JSON schema for a specific analyzer identified by analyzerId. It reads the corresponding 1.md file, extracts the JSON block, and deduces schema types from the metadata fields.
getAnalyzerInstructionsContent(basePath, analyzerId): Loads the raw Markdown content of a specific analyzer's 1.md instruction file based on its analyzerId.
saveConfiguration(basePath, analyzerId, instructionsContent, options): Saves or updates an analyzer configuration. It parses the analyzerId to determine the directory structure, creates directories if necessary, and saves the instructionsContent to 1.md. It can optionally ensure config.json files exist in the analyzer, content, and instructions directories.
deleteAnalyzer(basePath, analyzerId): Deletes a specific analyzer configuration and intelligently cleans up empty directories. It checks for protection at all levels (analyzer, content type, instructions type) before deletion.
buildChatIdToPathMap(allMessages): Builds a map of chat IDs to file paths from the context messages within a chat, which is crucial for validating LLM-generated analysis metadata.
processLLMAnalysisResponse(messageContent, stoppedStreaming): Extracts code blocks from LLM message content, identifies analysis and metadata blocks, and performs initial validation.
Default Prompt Loading
getSystemMessageContent(basePath): Retrieves the raw Markdown content of the shared system message (_shared/system/1.md).
getStartMessageContent(basePath): Retrieves the raw Markdown content of the shared start message (_shared/start/1.md).
Helper Functions
readConfig(dirPath): Reads and parses the config.json file in a directory.
isValidDirName(name): Checks if a directory name is valid according to GitSense Chat rules.
ensureConfigJson(dirPath, label): Ensures a config.json file exists in the given directory with a specified label.
Constants
ANALYZE_HEADER_PREFIX: Defines the standard header prefix used in analysis blocks (# GitSense Chat Analysis).
MessageUtils
Provides utility functions for processing message files, including context message detection and chat tree navigation.
Core Functions
getChatTemplateMessages(dirname, messageType): Gets template messages from a specific message type directory (e.g., 'notes', 'draft'), parsing metadata to determine message role and content.
getMessagesBeforeId(model, message, stopId, messages): Gets a list of messages in a chat tree up to a specific message ID.
getMessageById(chatOrMessage, id): Gets a specific message by its ID from a chat tree using an iterative approach to avoid deep recursion.
getLastMessage(message): Gets the last message in a specific conversation thread (follows the first child path).
findMessages(rootNode, filterFn): Finds all messages at any depth in a chat tree that match a given filter function, using an iterative, stack-based traversal instead of recursion.
deleteMessagesByIds(rootNode, idsToDeleteArray): Deletes messages by their IDs from a nested chat structure, re-parenting their children.
getMessageContentType(messageContent): Determines the type of message content based on specific prefixes (overview, file content, analyze, regular).
isContextMessage(messageContent): Checks if a message is a context message (overview or file content).
isContextItemsOverviewMessage(messageContent): Checks if a message is a context items overview message.
isAnalyzeMessage(messageContent): Checks if a message is an analyze message.
isNewAnalyzerInstructionsMessage(messageContent, strict): Checks if a message is the final "New Analyzer Instructions" message.
parseAnalyzeMessage(messageContent): Parses the unique analyzer ID from an analyze message.
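The stack-based traversal mentioned for findMessages and getMessageById can be sketched as follows. The node shape is an assumption: this example stores children in a property named kids, which is a placeholder and may not match the property the real chat tree uses; findInTree is likewise a hypothetical name.

```javascript
// Illustrative sketch only; the "kids" property and the function name are hypothetical.
function findInTree(rootNode, filterFn) {
  const matches = [];
  const stack = [rootNode]; // explicit stack replaces the call stack of a recursive walk

  while (stack.length > 0) {
    const node = stack.pop();
    if (!node) continue;
    if (filterFn(node)) matches.push(node);
    const kids = Array.isArray(node.kids) ? node.kids : [];
    // Push in reverse so the first child is visited first (pre-order).
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return matches;
}
```

Because the pending nodes live in an array rather than on the call stack, very deep conversation threads cannot trigger a stack overflow.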
ConfigUtils
Provides utility functions for loading and accessing application configuration from data/chats.json.
Core Functions
loadConfig(filePath): Asynchronously loads and parses the application configuration from the specified JSON file path, handling errors appropriately.
getProviderConfig(config, providerName): Retrieves the configuration object for a specific LLM provider (e.g., "Google", "Anthropic") from the loaded configuration data.
getModelProviderDetails(config, modelName, providerName): Finds the specific details (like modelId and maxOutputTokens) for a given user-friendly modelName associated with a specific providerName within the configuration.
getApiKeyName(config, providerName): Retrieves the name of the environment variable (e.g., "GEMINI_API_KEY") designated to hold the API key for a specified provider from the configuration.
getProviderForModel(config, modelName): Retrieves the first provider configuration listed for a given modelName in the configuration data.
EnvUtils
Provides utility functions for loading and accessing environment variables from a .env file.
Core Functions
loadEnv(filePath): Asynchronously loads environment variables from the specified .env file path into process.env. It ensures the file is only loaded once per process and handles cases where the file might not exist.
getApiKey(keyName): Retrieves the value of a specific API key environment variable by its name (e.g., "GEMINI_API_KEY"). It assumes loadEnv has been called previously and returns null if the variable is not set.
LLMUtils
Provides utility functions related to interactions with Large Language Models (LLMs).
Core Functions
estimateTokens(text): Estimates the number of tokens in a given text string using a basic heuristic (approximately 4 characters per token). Note that this is a rough estimate.
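The heuristic amounts to dividing the character count by four. A minimal sketch follows; the name roughTokenCount and the choice to round up are assumptions, since the description above does not say how the library rounds.

```javascript
// Illustrative sketch only; rounding up is an assumption, not documented behaviour.
function roughTokenCount(text) {
  if (!text) return 0;
  return Math.ceil(text.length / 4); // roughly 4 characters per token
}
```

Real tokenizers vary by model and language, so a figure like this is best used for budgeting rather than for enforcing hard limits.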
JsonUtils
Provides utility functions for working with JSON data.
Core Functions
detectJsonComments(jsonString): Scans a string for C-style block (/* ... */) and single-line (// ...) comments, correctly ignoring comments inside JSON string literals.
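Ignoring comment-like text inside string literals requires tracking whether the scanner is currently inside a string. The following is a minimal sketch of that technique; the name findJsonComments and the shape of the returned entries are assumptions and may not match what detectJsonComments returns.

```javascript
// Illustrative sketch only; the function name and result shape are hypothetical.
function findJsonComments(jsonString) {
  const comments = [];
  let inString = false;
  let i = 0;

  while (i < jsonString.length) {
    const ch = jsonString[i];
    const next = jsonString[i + 1];

    if (inString) {
      if (ch === '\\') { i += 2; continue; } // skip the escaped character
      if (ch === '"') inString = false;
      i++;
    } else if (ch === '"') {
      inString = true;
      i++;
    } else if (ch === '/' && next === '/') {
      // Single-line comment runs to the end of the line.
      let end = jsonString.indexOf('\n', i);
      if (end === -1) end = jsonString.length;
      comments.push({ type: 'line', start: i, text: jsonString.slice(i, end) });
      i = end;
    } else if (ch === '/' && next === '*') {
      // Block comment runs to the closing delimiter, or to the end if unterminated.
      const close = jsonString.indexOf('*/', i + 2);
      const end = close === -1 ? jsonString.length : close + 2;
      comments.push({ type: 'block', start: i, text: jsonString.slice(i, end) });
      i = end;
    } else {
      i++;
    }
  }
  return comments;
}
```

A URL such as "http://x" inside a string value is skipped because the scanner is in string mode when it reaches the two slashes.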
GSToolBlockUtils
Provides utility functions for identifying, parsing, formatting, and manipulating GitSense Chat Tool Blocks within markdown content.
Core Functions
isToolBlock(content): Checks if the provided code block content represents a GitSense Chat Tool Block by verifying the presence of the marker line # GitSense Chat Tool.
parseToolBlock(content): Parses the content of a verified GitSense Chat Tool Block to extract the JSON payload, stripping initial marker and comment lines.
formatToolBlock(toolData): Formats a tool data object into the standard string representation used inside a GitSense Chat Tool Block.
replaceToolBlock(markdownContent, toolName, newToolData, CodeBlockUtils): Replaces the content of the first GitSense Chat Tool Block matching a specific toolName within a larger markdown string.
detectAndFormatUnfencedToolBlock(messageContent): Detects an unfenced GitSense Chat Tool block within a message, validates its JSON content, and returns a properly formatted fenced block.
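The marker check and payload extraction described above might look roughly like this. Only the marker text # GitSense Chat Tool comes from the description; the assumption that comment lines start with #, the example payload fields, and the name parseToolPayload are all hypothetical.

```javascript
// Illustrative sketch only; comment syntax and the function name are assumptions.
const TOOL_MARKER = '# GitSense Chat Tool';

function parseToolPayload(content) {
  const lines = content.split('\n');
  if (!lines.some((line) => line.trim() === TOOL_MARKER)) {
    throw new Error('Missing tool block marker');
  }
  // Skip the leading marker, comment, and blank lines.
  let start = 0;
  while (start < lines.length) {
    const trimmed = lines[start].trim();
    if (trimmed !== '' && !trimmed.startsWith('#')) break;
    start++;
  }
  // Whatever remains is expected to be the JSON payload.
  return JSON.parse(lines.slice(start).join('\n'));
}

// Usage with a made-up payload:
const payload = parseToolPayload('# GitSense Chat Tool\n# note\n\n{"tool": "search", "n": 2}');
console.log(payload.tool);
```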
ContextUtils
Provides utility functions for parsing and formatting context message sections.
Core Functions
parseContextSection(sectionText): Parses the details (name, path, chat ID, etc.) and code content from a single context message section.
extractContextSections(messageContent): Extracts and parses all context sections from a full context message string.
extractContextItemsOverviewTableRows(messageContent): Parses the table rows from a Context Items Overview message.
formatContextContent(items, contentType, contentOption): Formats an array of loaded items into a structured context message string suitable for LLM consumption.
AnalysisBlockUtils
Provides utilities for identifying, parsing, and validating structured analysis blocks within GitSense Chat messages.
Core Functions
isAnalysisBlock(content): Checks if the provided content string starts with the standard analysis block header (# GitSense Chat Analysis).
getAnalysisBlockType(content): Determines the specific type of analysis block based on its initial header lines.
parseOverviewMetadata(content): Parses the metadata fields from an analysis block's content, assuming a Markdown list format.
validateAnalysisMetadata(metadata): Validates the parsed analysis metadata object, checking for required fields like 'Chat ID'.
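As a rough picture of header detection plus Markdown-list metadata parsing, the sketch below accepts content that starts with # GitSense Chat Analysis and collects "- Key: value" items into an object. The exact list syntax the library accepts is not specified here, so the pattern, the example field values, and the name parseListMetadata are assumptions.

```javascript
// Illustrative sketch only; the list pattern and function name are assumptions.
const ANALYSIS_HEADER = '# GitSense Chat Analysis';

function parseListMetadata(content) {
  if (!content.trimStart().startsWith(ANALYSIS_HEADER)) return null;
  const metadata = {};
  for (const line of content.split('\n')) {
    // Matches "- Key: value" or "* Key: value"; the key ends at the first colon.
    const m = /^\s*[-*]\s+([^:]+):\s*(.*)$/.exec(line);
    if (m) metadata[m[1].trim()] = m[2].trim();
  }
  return metadata;
}

// A check in the spirit of validateAnalysisMetadata: is 'Chat ID' present?
const meta = parseListMetadata('# GitSense Chat Analysis\n\n- Chat ID: 42\n- Path: src/a.js');
console.log('Chat ID' in meta);
```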
Usage
To use the GitSense Chat Utils library in your project, you can import the main interface class or individual modules/functions as needed.
Main Interface Class
The primary way to interact with the library is through the GitSenseChatUtils class:
const { GitSenseChatUtils } = require('@gitsense/gsc-utils');
const utils = new GitSenseChatUtils();
// Example: Extract code blocks from markdown text
const markdownText = '...'; // Your markdown content
const { blocks, warnings } = utils.extractCodeBlocks(markdownText);
// Example: Fix invalid UUIDs in text
const { text: fixedText, modified } = utils.fixTextCodeBlocks(markdownText);
// Example: Check if a code block is a patch
const isPatch = utils.isPatchBlock(codeBlockContentString);
Individual Modules
You can also import and use individual utility modules directly:
const { CodeBlockUtils } = require('@gitsense/gsc-utils');
const { PatchUtils } = require('@gitsense/gsc-utils');
const { AnalyzerUtils } = require('@gitsense/gsc-utils');
const { MessageUtils } = require('@gitsense/gsc-utils');
const { ConfigUtils } = require('@gitsense/gsc-utils');
const { EnvUtils } = require('@gitsense/gsc-utils');
const { LLMUtils } = require('@gitsense/gsc-utils');
const { JsonUtils } = require('@gitsense/gsc-utils');
const { GSToolBlockUtils } = require('@gitsense/gsc-utils');
const { ContextUtils } = require('@gitsense/gsc-utils');
const { AnalysisBlockUtils } = require('@gitsense/gsc-utils');
const { ChatUtils } = require('@gitsense/gsc-utils');
// Example: Using CodeBlockUtils directly
const { blocks, warnings } = CodeBlockUtils.extractCodeBlocks(markdownText);
// Example: Using PatchUtils directly
const patchResult = PatchUtils.applyPatchWithDiagnostics(sourceCodeString, patchString);
// Example: Using AnalyzerUtils directly
const analyzers = AnalyzerUtils.getAnalyzers('/path/to/analyzers');
// Example: Using MessageUtils directly
const messages = MessageUtils.getChatTemplateMessages('/path/to/messages', 'draft');
Individual Functions
Many core functions are also exported individually for convenience:
const {
  extractCodeBlocks,
  fixTextCodeBlocks,
  isPatchBlock,
  getAnalyzers,
  getChatTemplateMessages,
  applyPatchWithDiagnostics,
  estimateTokens
} = require('@gitsense/gsc-utils');
// Example: Extract code blocks
const { blocks, warnings } = extractCodeBlocks(markdownText);
// Example: Estimate tokens
const tokenCount = estimateTokens("Sample text for token estimation.");