@gitsense/gsc-utils

v0.2.25

Utilities for GitSense Chat (GSC)

gitsense

Keywords: git, chat, code-blocks, patches, markdown, parsing

GitSense Chat Utils (gsc-utils)

A comprehensive JavaScript library providing utilities for processing, manipulating, and managing various elements within GitSense Chat messages, including code blocks, patches, analyzers, and context data.

Code Structure

.
├── LICENSE
├── README.md
├── package-lock.json
├── package.json
├── rollup.config.js
├── src
│   ├── AnalysisBlockUtils.js
│   ├── AnalyzerUtils
│   │   ├── constants.js
│   │   ├── contextMapper.js
│   │   ├── dataValidator.js
│   │   ├── defaultPromptLoader.js
│   │   ├── discovery.js
│   │   ├── index.js
│   │   ├── instructionLoader.js
│   │   ├── management.js
│   │   ├── responseProcessor.js
│   │   ├── saver.js
│   │   └── schemaLoader.js
│   ├── ChatUtils.js
│   ├── CodeBlockUtils
│   │   ├── blockExtractor.js
│   │   ├── blockProcessor.js
│   │   ├── constants.js
│   │   ├── continuationUtils.js
│   │   ├── headerParser.js
│   │   ├── headerUtils.js
│   │   ├── index.js
│   │   ├── lineNumberFormatter.js
│   │   ├── markerRemover.js
│   │   ├── patchIntegration.js
│   │   ├── relationshipUtils.js
│   │   ├── updateCodeBlock.js
│   │   └── uuidUtils.js
│   ├── ConfigUtils.js
│   ├── ContextUtils.js
│   ├── EnvUtils.js
│   ├── GSToolBlockUtils.js
│   ├── GitSenseChatUtils.js
│   ├── JsonUtils.js
│   ├── LLMUtils.js
│   ├── MessageUtils.js
│   ├── PatchUtils
│   │   ├── constants.js
│   │   ├── diagnosticReporter.js
│   │   ├── enhancedPatchProcessor.js
│   │   ├── fuzzyMatcher.js
│   │   ├── hunkCorrector.js
│   │   ├── hunkValidator.js
│   │   ├── index.js
│   │   ├── patchExtractor.js
│   │   ├── patchHeaderFormatter.js
│   │   ├── patchParser.js
│   │   ├── patchProcessor.js
│   │   └── patchVerifier
│   │       ├── constants.js
│   │       ├── detectAndFixOverlappingHunks.js
│   │       ├── detectAndFixRedundantChanges.js
│   │       ├── formatAndAddLineNumbers.js
│   │       ├── index.js
│   │       ├── verifyAndCorrectHunkHeaders.js
│   │       └── verifyAndCorrectLineNumbers.js
│   └── SharedUtils
│       ├── timestampUtils.js
│       └── versionUtils.js

Core Modules

GitSenseChatUtils (Main Interface)

This is the primary class exported by the library, offering a unified API for interacting with the various utility modules. It aggregates functionalities from CodeBlockUtils, PatchUtils, AnalyzerUtils, MessageUtils, LLMUtils, ConfigUtils, and EnvUtils.

CodeBlockUtils

Provides a comprehensive suite of utilities for processing, manipulating, and managing code blocks within GitSense Chat messages. These utilities handle various aspects of the code block lifecycle, including extraction, header parsing, UUID management, patch detection, and continuation handling.

Core Functions

processCodeBlocks(text, options): The primary function for processing code blocks. It identifies code blocks within text using markdown fences, parses their headers or content, detects block types (code, patch, gs-tool, gitsense-search-flow, analysis), and extracts relevant metadata. It returns detailed information about each block, including any warnings encountered during processing.
extractCodeBlocks(text, options): A simplified API wrapper around processCodeBlocks that returns only the extracted blocks and warnings, suitable for basic code block identification needs.
fixTextCodeBlocks(text): Scans text for code blocks with invalid UUIDs and automatically corrects them by generating new valid UUID v4 strings.

Block Extraction

findAllCodeFences(text): Identifies the positions of all opening and closing markdown code fences (```) within a given text.
matchFencesAndExtractBlocks(text, openingPositions, closingPositions): Matches identified opening and closing fences to determine complete and incomplete code blocks, providing warnings for potentially malformed structures.
extractCodeBlocksWithUUIDs(messageText): Finds all code blocks in a message and attempts to extract their Block-UUID from the content.
findCodeBlockByUUID(messageText, blockUUID): Locates a specific code block within a message by its Block-UUID.
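Here is a simplified sketch of what findAllCodeFences and matchFencesAndExtractBlocks are described as doing during fence pairing. It is not the library's code. The function names findFenceLines and pairFences are hypothetical. The sketch treats any line starting with three backticks as a fence and pairs fences strictly in order. An unmatched final fence becomes an incomplete block with a warning.

```javascript
// Hypothetical sketch: locate fence lines and pair them into blocks.
const FENCE = '`'.repeat(3); // three backticks

// Return the text split into lines plus the indexes of lines that start a fence.
function findFenceLines(text) {
  const lines = text.split('\n');
  const fences = [];
  lines.forEach((line, index) => {
    if (line.trimStart().startsWith(FENCE)) fences.push(index);
  });
  return { lines, fences };
}

// Pair fences in order: 1st with 2nd, 3rd with 4th, and so on.
function pairFences(text) {
  const { lines, fences } = findFenceLines(text);
  const blocks = [];
  const warnings = [];
  for (let i = 0; i < fences.length; i += 2) {
    const open = fences[i];
    const close = fences[i + 1];
    // Anything after the opening backticks is treated as the language tag.
    const language = lines[open].trim().slice(FENCE.length).trim();
    if (close === undefined) {
      warnings.push(`Unclosed fence at line ${open + 1}`);
      blocks.push({ language, content: lines.slice(open + 1).join('\n'), complete: false });
    } else {
      blocks.push({ language, content: lines.slice(open + 1, close).join('\n'), complete: true });
    }
  }
  return { blocks, warnings };
}
```

Nested or indented fences, which the library's warnings are meant to flag, would be mis-paired by this naive ordering.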

Header Utilities

parseHeader(header, language): Parses the metadata header from a code block's content based on its programming language, extracting fields like Component, Block-UUID, Version, Description, Language, Created-at, and Authors.
isValidISOTimestamp(timestamp): Validates if a string conforms to the ISO 8601 timestamp format.
getHeaderLineCount(headerText, language): Calculates the total number of lines a code block's header occupies, including comment delimiters and the two mandatory blank lines that follow it.

UUID Utilities

generateUUID(): Generates a new valid RFC 4122 UUID v4 string.
validateUUID(uuid): Validates a UUID string and returns an object indicating its validity and a corrected UUID if the original was invalid.
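The sketch below shows the validate-and-correct pattern described for validateUUID. It uses Node's built-in crypto.randomUUID() for generation. The helper name checkUUID and the result fields isValid and correctedUUID are assumptions for illustration, not the library's return shape.

```javascript
const { randomUUID } = require('node:crypto');

// RFC 4122 version-4 layout: version nibble 4, variant nibble 8, 9, a or b.
const UUID_V4_REGEX =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical helper: field names isValid / correctedUUID are assumptions.
function checkUUID(uuid) {
  if (typeof uuid === 'string' && UUID_V4_REGEX.test(uuid)) {
    return { isValid: true, correctedUUID: null };
  }
  // Invalid input: suggest a freshly generated v4 UUID as the replacement.
  return { isValid: false, correctedUUID: randomUUID() };
}
```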

Patch Integration

containsPatch(content): Checks if the provided text content contains at least one patch block.

Relationship & Context Utilities

detectCodeBlockRelationships(content, codeBlockService, options): Detects special relationships between code blocks, such as patches or parent-child links, using an optional codeBlockService to verify parent UUID existence.
detectIncompleteCodeBlock(content, options): Identifies if a message contains an incomplete code block and returns information about the last one found.
extractFilePaths(messageText): Extracts file paths from specially formatted lines in a message (e.g., #### File: `path/to/file.ext`).
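A single multiline regular expression can match the file-path lines that extractFilePaths looks for. This sketch assumes lines of exactly the form #### File: `path`. The name extractFilePathsSketch is hypothetical, and the library may accept variations that this pattern rejects.

```javascript
// Hypothetical sketch: collect paths from lines shaped like
//   #### File: `path/to/file.ext`
// The exact line format the library accepts may be more permissive.
function extractFilePathsSketch(messageText) {
  const pattern = /^#### File: `([^`]+)`\s*$/gm;
  const paths = [];
  for (const match of messageText.matchAll(pattern)) {
    paths.push(match[1]);
  }
  return paths;
}
```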

Continuation Utilities

extractContinuationInfo(content, partNumber, language, header): Extracts context information from an incomplete block's content to assist in generating prompts for continuation.
generateContinuationPrompt(incompleteBlock, isLastPart): Generates a structured prompt for continuing an incomplete code block, including instructions for metadata preservation and format.

Comment-Delimited Header Parsing

parseCommentDelimitedBlocks(input): Parses code blocks where metadata headers are enclosed in language-specific comment delimiters (e.g., /** ... */, """ ... """) rather than markdown fences.

Marker Removal

removeCodeBlockMarkers(markdownText): Removes custom --- CODE BLOCK START --- and --- CODE BLOCK COMPLETE --- markers surrounding markdown code blocks, but only when both markers are present for a given block.
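A line scan can express the pairing rule, which removes markers only when both are present. This is an illustrative reimplementation, not the library's code. The name removeMarkersSketch is hypothetical, and the function does not check that the text between the markers is actually a fenced block.

```javascript
// Hypothetical sketch: strip START/COMPLETE marker lines, but only as a pair.
// A START marker with no later COMPLETE marker is left untouched.
const START = '--- CODE BLOCK START ---';
const COMPLETE = '--- CODE BLOCK COMPLETE ---';

function removeMarkersSketch(markdownText) {
  const lines = markdownText.split('\n');
  const out = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === START) {
      const end = lines.findIndex((line, j) => j > i && line.trim() === COMPLETE);
      if (end !== -1) {
        // Keep everything between the markers, drop the marker lines themselves.
        out.push(...lines.slice(i + 1, end));
        i = end;
        continue;
      }
    }
    out.push(lines[i]);
  }
  return out.join('\n');
}
```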

Code Block Updating

updateCodeBlock(messageContent, identifier, newCodeContent, language): A generic router function to update a code block's content by either its index or Block-UUID.
updateCodeBlockByIndex(messageContent, blockIndex, newCodeContent, language): Replaces the content of a code block specified by its index within a message.
updateCodeBlockByUUID(messageContent, blockUUID, newCodeContent, language): Replaces the content of a code block specified by its Block-UUID.
updateCodeBlockInMessage(messageText, blockUUID, newCode, language): Updates a code block, identified by UUID, in message text (previously part of PatchUtils).
deleteCodeBlock(messageContent, identifier): A generic router function to delete a code block by either its index or Block-UUID.
deleteCodeBlockByIndex(messageContent, blockIndex): Deletes a code block specified by its index within a message.
deleteCodeBlockByUUID(messageContent, blockUUID): Deletes a code block specified by its Block-UUID.

Line Number Formatting

formatWithLineNumbers(codeContent, startLine, paddingWidth): Formats raw code content by adding padded line numbers to each line.
formatBlockWithLineNumbers(block, startLine, paddingWidth): Formats a processed code block object by adding line numbers to its content.
formatBlocksWithLineNumbers(blocks, startLine, paddingWidth): Formats an array of processed code block objects with line numbers.
removeLineNumbers(formattedContent): Removes line number prefixes from formatted code content.
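The sketch below shows the formatting step and its inverse. The prefix layout (space-padded number, colon, one space) is an assumption loosely based on the NNN: prefixes mentioned under Patch Verification. The library's exact padding character and separator may differ, and both function names are hypothetical.

```javascript
// Hypothetical sketch of line-number formatting. The prefix shape
// ("  7: code", space-padded, colon, one space) is an assumption.
function formatWithLineNumbersSketch(codeContent, startLine = 1, paddingWidth = 3) {
  return codeContent
    .split('\n')
    .map((line, i) => `${String(startLine + i).padStart(paddingWidth)}: ${line}`)
    .join('\n');
}

// Inverse: drop one leading "<spaces><digits>: " prefix from each line,
// preserving any indentation that follows it.
function removeLineNumbersSketch(formattedContent) {
  return formattedContent
    .split('\n')
    .map((line) => line.replace(/^\s*\d+: ?/, ''))
    .join('\n');
}
```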

Constants

COMMENT_STYLES: Defines comment styles for various programming languages, used by parseHeader to correctly interpret header delimiters.

PatchUtils

Provides a comprehensive suite of utilities for handling, validating, applying, and correcting patches within GitSense Chat. These utilities are designed to work with the traditional unified diff format and offer enhanced diagnostics and fuzzy matching capabilities to improve patch reliability.

Core Patch Processing

applyPatch(sourceText, patchText): Applies a traditional unified diff patch to source code. It strips line-number prefixes from the patch content and then applies the patch with the jsdiff library.
createPatch(sourceText, targetCode, metadata, filename): Generates a patch in traditional unified diff format between two versions of code, incorporating the specified metadata and adding line number prefixes to the diff content lines.
createPatchFromCodeBlocks(sourceCodeBlockText, targetCodeBlockText, patchMetadata): Creates a patch between two full code block strings (including their metadata headers), adjusting hunk line numbers to account for the original code block's header.

Enhanced Patch Processing & Validation

applyPatchWithDiagnostics(sourceText, patchText): Applies a patch with detailed per-hunk diagnostics, including validation, fuzzy matching for misplaced hunks, and correction suggestions. Returns a comprehensive result object with success status, patched text, and diagnostic reports.
validatePatch(sourceText, patchText): Validates a patch without applying it, providing detailed diagnostics and identifying if invalid hunks can be corrected using fuzzy matching.
applyHunk(sourceText, hunkText): Applies a single hunk to source code, including validation and optional fuzzy matching for correction.

Patch Parsing & Metadata Handling

determinePatchFormat(patchText): Determines the patch format (e.g., 'traditional') based on the presence of specific markers.
extractPatchMetadata(patchText): Extracts metadata fields from a patch block's header.
validatePatchMetadata(metadata): Validates patch metadata for required fields (Source-Block-UUID, Target-Block-UUID, Source-Version, Target-Version, Description, Authors) and correct version/UUID formats.
extractPatchContent(patchText, format): Extracts the raw unified diff content from between the patch start and end markers.
isPatchBlock(codeBlockContent): Determines if a code block's content represents a patch by checking for the mandatory # Patch Metadata header.
detectPatch(messageText): Finds the first valid patch block within a larger message text.
findAllPatches(messageText): Finds all valid patch blocks within a message text.
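The marker strings appear under PatchUtils Constants, so the detection and extraction steps can be sketched directly. The helpers below are hypothetical stand-ins for isPatchBlock and extractPatchContent. They only check for the header and slice between the markers, and they do no metadata validation.

```javascript
// Marker strings as listed under PatchUtils Constants.
const PATCH_METADATA_HEADER = '# Patch Metadata';
const PATCH_START_MARKER = '# --- PATCH START MARKER ---';
const PATCH_END_MARKER = '# --- PATCH END MARKER ---';

// Hypothetical sketch: a block counts as a patch if it contains the metadata header line.
function looksLikePatch(codeBlockContent) {
  return codeBlockContent.split('\n').some((line) => line.trim() === PATCH_METADATA_HEADER);
}

// Hypothetical sketch: return the diff text between the start and end markers,
// or null if either marker is missing or they are out of order.
function extractDiffBetweenMarkers(patchText) {
  const start = patchText.indexOf(PATCH_START_MARKER);
  const end = patchText.indexOf(PATCH_END_MARKER);
  if (start === -1 || end === -1 || end < start) return null;
  // Trim only surrounding newlines: diff context lines begin with a space.
  return patchText.slice(start + PATCH_START_MARKER.length, end).replace(/^\n+|\n+$/g, '');
}
```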

Header Formatting

formatCodeBlockHeader(sourceBlockInfo, patchBlock): Formats the metadata header for a new code block based on source information and patch metadata, following GitSense Chat's inheritance rules.

Hunk Validation

parseHunk(hunkText): Parses a hunk into its components (header, context lines, added lines, removed lines).
parseHunkHeader(header): Parses a hunk header (@@ ... @@) into its constituent parts (old start/count, new start/count).
validateHunk(sourceCode, hunkText): Validates a single hunk against source code, attempting direct application and fuzzy matching for corrections.
validateHunks(sourceCode, hunks): Validates all hunks in a patch against the source code.
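Hunk headers follow the standard unified diff form @@ -oldStart,oldCount +newStart,newCount @@, where an omitted count means 1. The sketch below shows that parsing step. The function name and result field names are hypothetical.

```javascript
// Hypothetical sketch of hunk-header parsing for "@@ -a,b +c,d @@".
// In unified diff format an omitted count means 1.
const HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;

function parseHunkHeaderSketch(header) {
  const m = HUNK_HEADER.exec(header);
  if (!m) return null;
  return {
    oldStart: Number(m[1]),
    oldCount: m[2] === undefined ? 1 : Number(m[2]),
    newStart: Number(m[3]),
    newCount: m[4] === undefined ? 1 : Number(m[4]),
  };
}
```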

Fuzzy Matching

findBestContextMatch(contextLines, sourceCode, options): Finds the best match for context lines within source code using similarity scoring.
findExactLineMatches(contextLine, sourceCode): Finds exact matches for a single context line within the source code.
findBestMatchWithSlidingWindow(contextLines, sourceCode, windowSize): Finds the best match using a sliding window approach for longer context blocks.
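A deliberately simple scorer can illustrate the sliding-window idea. It compares the hunk's context lines against every same-sized window of the source and keeps the window with the highest fraction of exact, whitespace-trimmed matches. This scoring rule is an assumption for illustration only, because the library's similarity measure is not documented here. In practice a match would be accepted only if its confidence clears a threshold such as FUZZY_MATCH_THRESHOLD.

```javascript
// Hypothetical sketch: score each window by the fraction of context lines
// that equal the corresponding source line after trimming whitespace.
function findBestWindow(contextLines, sourceCode) {
  const source = sourceCode.split('\n');
  const size = contextLines.length;
  let best = { position: -1, confidence: 0 };
  for (let start = 0; start + size <= source.length; start++) {
    let matches = 0;
    for (let k = 0; k < size; k++) {
      if (source[start + k].trim() === contextLines[k].trim()) matches++;
    }
    const confidence = size === 0 ? 0 : matches / size;
    if (confidence > best.confidence) {
      // Report a 1-based line number, as hunk headers do.
      best = { position: start + 1, confidence };
    }
  }
  return best;
}
```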

Hunk Correction

recalculateHunkHeader(matchPosition, contextLines, addedLines, removedLines): Recalculates a hunk header based on the matched position and content.
reconstructHunk(header, contextLines, addedLines, removedLines): Reconstructs a hunk with the corrected header.
generateCorrectedHunk(originalHunk, matchResult): Generates a corrected hunk based on fuzzy matching results.
preserveHunkStructure(originalHunkText, correctedHeader): Preserves the original structure of a hunk while updating its header.
generateHunkCorrection(hunkValidation, matchResult): Generates a corrected version of a problematic hunk with an explanation.
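Recalculating a header after a fuzzy match is mostly arithmetic. The old range covers context plus removed lines, and the new range covers context plus added lines. The sketch below takes counts rather than line arrays. It also adds a hypothetical lineOffset parameter for lines shifted by earlier hunks, so its signature intentionally differs from recalculateHunkHeader.

```javascript
// Hypothetical sketch: rebuild a hunk header once fuzzy matching has found
// where the hunk really starts (1-based line in the original file).
// lineOffset = net lines added by earlier hunks (an assumed extra parameter).
// Zero-length ranges, which diff tools write with start - 1, are not handled.
function rebuildHunkHeader(matchPosition, contextCount, addedCount, removedCount, lineOffset = 0) {
  const oldCount = contextCount + removedCount; // lines spanned before the change
  const newCount = contextCount + addedCount;   // lines spanned after the change
  const newStart = matchPosition + lineOffset;
  return `@@ -${matchPosition},${oldCount} +${newStart},${newCount} @@`;
}
```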

Diagnostic Reporting

generateHumanReadableDiagnostics(hunkResults): Generates a human-readable diagnostic report summarizing the validation status of all hunks in a patch.
generateLLMFeedback(hunkResults): Generates structured feedback about patch validation for consumption by LLMs.
describeHunkResult(hunkResult): Generates a human-readable description of a single hunk's validation result.
formatHunkResultForLLM(hunkResult): Formats a single hunk's validation result for LLM consumption.
formatDiagnosticSummary(patchResult): Formats a diagnostic summary for the entire patch application or validation result.
generateErrorMessage(patchResult): Generates a concise error message for a failed patch application or validation.

Patch Verification

verifyAndCorrectLineNumbers(patchText, sourceText, windowSize): Verifies and corrects the NNN: line number prefixes on context and deletion lines within a patch by comparing their content against the original source code within a sliding window.
verifyAndCorrectHunkHeaders(patchText): Verifies and corrects the hunk headers (@@ -old,count +new,count @@) within a patch based on the line number prefixes found in the hunk's content lines.
formatAndAddLineNumbers(patchText, sourceText): Ensures patch content lines have NNN: line number prefixes with consistent padding, adding them if missing.
detectAndFixRedundantChanges(patchText, autoFix): Detects and optionally fixes redundant changes in a patch where content is deleted and re-added identically.
detectAndFixOverlappingHunks(patchText, autoFix): Detects and optionally fixes overlapping hunks in a patch file by merging them.

Constants

CONTENT_LINE_REGEX: Regex to parse context/deletion/addition lines with line numbers.
HUNK_HEADER_REGEX: Regex to parse hunk headers.
LINE_NUMBER_PREFIX_REGEX: Regex to detect line number prefixes.
FUZZY_MATCH_THRESHOLD: Minimum confidence threshold for fuzzy matching.
MAX_ALTERNATIVE_MATCHES: Maximum number of alternative matches to consider.
DEFAULT_SLIDING_WINDOW_SIZE: Default sliding window size for fuzzy matching.
MAX_CONTEXT_LINES_FOR_DIRECT_MATCH: Maximum context lines to use for direct matching before switching to sliding window.
PATCH_START_MARKER: The start marker for patch content (# --- PATCH START MARKER ---).
PATCH_END_MARKER: The end marker for patch content (# --- PATCH END MARKER ---).
PATCH_METADATA_HEADER: The patch metadata header (# Patch Metadata).
REQUIRED_METADATA_FIELDS: List of required metadata fields for a patch.
ORIGINAL_FILE_HEADER: The header for the original file in the diff (--- Original).
MODIFIED_FILE_HEADER: The header for the modified file in the diff (+++ Modified).

AnalyzerUtils

Provides a suite of utilities for managing and interacting with GitSense Chat analyzers. Analyzers are specialized message templates used by the GitSense Chat system to process and analyze code context.

Core Functions

getAnalyzers(basePath, options): Discovers and lists all available analyzers by traversing the directory structure under basePath. An analyzer is considered valid if its corresponding 1.md instruction file exists. The options parameter can include includeDescription to optionally load the analyzer's description from its JSON metadata.
getAnalyzerSchema(basePath, analyzerId): Retrieves the JSON schema for a specific analyzer identified by analyzerId. It reads the corresponding 1.md file, extracts the JSON block, and deduces schema types from the metadata fields.
getAnalyzerInstructionsContent(basePath, analyzerId): Loads the raw Markdown content of a specific analyzer's 1.md instruction file based on its analyzerId.
saveConfiguration(basePath, analyzerId, instructionsContent, options): Saves or updates an analyzer configuration. It parses the analyzerId to determine the directory structure, creates directories if necessary, and saves the instructionsContent to 1.md. It can optionally ensure config.json files exist in the analyzer, content, and instructions directories.
deleteAnalyzer(basePath, analyzerId): Deletes a specific analyzer configuration and cleans up any directories left empty by the deletion. Before deleting, it checks for protection at every level (analyzer, content type, instructions type).
buildChatIdToPathMap(allMessages): Builds a map of chat IDs to file paths from the context messages within a chat, which is crucial for validating LLM-generated analysis metadata.
processLLMAnalysisResponse(messageContent, stoppedStreaming): Extracts code blocks from LLM message content, identifies analysis and metadata blocks, and performs initial validation.

Default Prompt Loading

getSystemMessageContent(basePath): Retrieves the raw Markdown content of the shared system message (_shared/system/1.md).
getStartMessageContent(basePath): Retrieves the raw Markdown content of the shared start message (_shared/start/1.md).

Helper Functions

readConfig(dirPath): Reads and parses the config.json file in a directory.
isValidDirName(name): Checks if a directory name is valid according to GitSense Chat rules.
ensureConfigJson(dirPath, label): Ensures a config.json file exists in the given directory with a specified label.

Constants

ANALYZE_HEADER_PREFIX: Defines the standard header prefix used in analysis blocks (# GitSense Chat Analysis).

MessageUtils

Provides utility functions for processing message files, including context message detection and chat tree navigation.

Core Functions

getChatTemplateMessages(dirname, messageType): Gets template messages from a specific message type directory (e.g., 'notes', 'draft'), parsing metadata to determine message role and content.
getMessagesBeforeId(model, message, stopId, messages): Gets a list of messages in a chat tree up to a specific message ID.
getMessageById(chatOrMessage, id): Gets a specific message by its ID from a chat tree using an iterative approach to avoid deep recursion.
getLastMessage(message): Gets the last message in a specific conversation thread (follows the first child path).
findMessages(rootNode, filterFn): Finds all messages in a chat tree that match a given filter function, traversing the tree iteratively with an explicit stack rather than recursion.
deleteMessagesByIds(rootNode, idsToDeleteArray): Deletes messages by their IDs from a nested chat structure, re-parenting their children.
getMessageContentType(messageContent): Determines the type of message content based on specific prefixes (overview, file content, analyze, regular).
isContextMessage(messageContent): Checks if a message is a context message (overview or file content).
isContextItemsOverviewMessage(messageContent): Checks if a message is a context items overview message.
isAnalyzeMessage(messageContent): Checks if a message is an analyze message.
isNewAnalyzerInstructionsMessage(messageContent, strict): Checks if a message is the final "New Analyzer Instructions" message.
parseAnalyzeMessage(messageContent): Parses the unique analyzer ID from an analyze message.
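Several of these helpers walk the chat tree without recursion. The sketch below shows that traversal. It assumes a hypothetical node shape of { id, content, children }, because the library's actual field names are not documented here.

```javascript
// Hypothetical sketch of iterative (stack-based) tree search.
// The node shape { id, content, children: [...] } is an assumption.
function findMessagesSketch(rootNode, filterFn) {
  const found = [];
  const stack = [rootNode];
  while (stack.length > 0) {
    const node = stack.pop();
    if (filterFn(node)) found.push(node);
    // Push children in reverse so they are visited in their original order.
    const children = node.children || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return found;
}

// Lookup by ID built on the same traversal; returns null when absent.
function getMessageByIdSketch(rootNode, id) {
  return findMessagesSketch(rootNode, (node) => node.id === id)[0] ?? null;
}
```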

ConfigUtils

Provides utility functions for loading and accessing application configuration from data/chats.json.

Core Functions

loadConfig(filePath): Asynchronously loads and parses the application configuration from the specified JSON file path, handling errors appropriately.
getProviderConfig(config, providerName): Retrieves the configuration object for a specific LLM provider (e.g., "Google", "Anthropic") from the loaded configuration data.
getModelProviderDetails(config, modelName, providerName): Finds the specific details (like modelId and maxOutputTokens) for a given user-friendly modelName associated with a specific providerName within the configuration.
getApiKeyName(config, providerName): Retrieves the name of the environment variable (e.g., "GEMINI_API_KEY") designated to hold the API key for a specified provider from the configuration.
getProviderForModel(config, modelName): Retrieves the first provider configuration listed for a given modelName in the configuration data.

EnvUtils

Provides utility functions for loading and accessing environment variables from a .env file.

Core Functions

loadEnv(filePath): Asynchronously loads environment variables from the specified .env file path into process.env. It ensures the file is only loaded once per process and handles cases where the file might not exist.
getApiKey(keyName): Retrieves the value of a specific API key environment variable by its name (e.g., "GEMINI_API_KEY"). It assumes loadEnv has been called previously and returns null if the variable is not set.
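Node's fs/promises API is enough to sketch the load-once behavior. This is an assumed reimplementation, not the library's code. It parses simple KEY=VALUE lines and treats a missing file as a no-op. As a design choice borrowed from common .env loaders, it does not overwrite variables that are already set.

```javascript
const fs = require('node:fs/promises');

// Hypothetical sketch of a load-once .env reader; not the library's code.
let envLoaded = false;

// Parse KEY=VALUE lines, skipping blank lines and # comments,
// and strip one pair of matching surrounding quotes from values.
function parseEnvText(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value[0];
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Load the file at most once per process; a missing file is not an error.
async function loadEnvOnce(filePath) {
  if (envLoaded) return;
  try {
    const parsed = parseEnvText(await fs.readFile(filePath, 'utf8'));
    for (const [key, value] of Object.entries(parsed)) {
      // Do not overwrite variables that are already set.
      if (process.env[key] === undefined) process.env[key] = value;
    }
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  envLoaded = true;
}

// Return the variable's value, or null if it is not set.
function getApiKeySketch(keyName) {
  return process.env[keyName] ?? null;
}
```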

LLMUtils

Provides utility functions related to interactions with Large Language Models (LLMs).

Core Functions

estimateTokens(text): Estimates the number of tokens in a text string with a simple heuristic of roughly 4 characters per token, so treat the result as an approximation rather than an exact count.
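The heuristic amounts to a single division. This sketch rounds up with Math.ceil, which is an assumption, so the library may return a slightly different number for the same input. Under this rule, "Sample text for token estimation." (33 characters) estimates to 9 tokens.

```javascript
// Sketch of the ~4-characters-per-token heuristic.
// Rounding up with Math.ceil is an assumption, not confirmed library behavior.
function estimateTokensSketch(text) {
  return Math.ceil((text || '').length / 4);
}
```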

JsonUtils

Provides utility functions for working with JSON data.

Core Functions

detectJsonComments(jsonString): Scans a string for C-style block (/* ... */) and single-line (// ...) comments, correctly ignoring comments inside JSON string literals.
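Skipping string literals is what keeps a URL such as "http://example.com" from being reported as a // comment. The scanner sketch below tracks whether it is inside a double-quoted string and honors backslash escapes. The function name and the { type, start, text } result shape are hypothetical.

```javascript
// Hypothetical sketch: report // and /* */ comments in JSON-like text while
// skipping anything inside double-quoted string literals (with \" escapes).
function findJsonComments(jsonString) {
  const comments = [];
  let i = 0;
  while (i < jsonString.length) {
    const ch = jsonString[i];
    if (ch === '"') {
      // Skip the string literal; a backslash also skips the next character.
      i++;
      while (i < jsonString.length && jsonString[i] !== '"') {
        i += jsonString[i] === '\\' ? 2 : 1;
      }
      i++; // step past the closing quote
    } else if (ch === '/' && jsonString[i + 1] === '/') {
      const end = jsonString.indexOf('\n', i);
      const stop = end === -1 ? jsonString.length : end;
      comments.push({ type: 'line', start: i, text: jsonString.slice(i, stop) });
      i = stop;
    } else if (ch === '/' && jsonString[i + 1] === '*') {
      const end = jsonString.indexOf('*/', i + 2);
      const stop = end === -1 ? jsonString.length : end + 2;
      comments.push({ type: 'block', start: i, text: jsonString.slice(i, stop) });
      i = stop;
    } else {
      i++;
    }
  }
  return comments;
}
```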

GSToolBlockUtils

Provides utility functions for identifying, parsing, formatting, and manipulating GitSense Chat Tool Blocks within markdown content.

Core Functions

isToolBlock(content): Checks if the provided code block content represents a GitSense Chat Tool Block by verifying the presence of the marker line # GitSense Chat Tool.
parseToolBlock(content): Parses the content of a verified GitSense Chat Tool Block to extract the JSON payload, stripping initial marker and comment lines.
formatToolBlock(toolData): Formats a tool data object into the standard string representation used inside a GitSense Chat Tool Block.
replaceToolBlock(markdownContent, toolName, newToolData, CodeBlockUtils): Replaces the content of the first GitSense Chat Tool Block matching a specific toolName within a larger markdown string.
detectAndFormatUnfencedToolBlock(messageContent): Detects an unfenced GitSense Chat Tool block within a message, validates its JSON content, and returns a properly formatted fenced block.
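The sketch below round-trips the marker check, parse, and format steps. It assumes the marker line is exactly # GitSense Chat Tool, that every other line beginning with # is a comment, and that the remaining text is JSON. The function names, the tool/args payload fields in the usage, and the blank line in the formatted output are illustrative assumptions, not the library's documented format.

```javascript
// Hypothetical sketch of tool-block handling; the layout is an assumption.
const TOOL_MARKER = '# GitSense Chat Tool';

// A block qualifies if it contains the marker line.
function isToolBlockSketch(content) {
  return content.split('\n').some((line) => line.trim() === TOOL_MARKER);
}

// Drop the marker and "#" comment lines, then parse the rest as JSON.
function parseToolBlockSketch(content) {
  if (!isToolBlockSketch(content)) return null;
  const jsonText = content
    .split('\n')
    .filter((line) => !line.trim().startsWith('#'))
    .join('\n');
  return JSON.parse(jsonText);
}

// Marker line, blank line, pretty-printed JSON payload.
function formatToolBlockSketch(toolData) {
  return `${TOOL_MARKER}\n\n${JSON.stringify(toolData, null, 2)}`;
}
```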

ContextUtils

Provides utility functions for parsing and formatting context message sections.

Core Functions

parseContextSection(sectionText): Parses the details (name, path, chat ID, etc.) and code content from a single context message section.
extractContextSections(messageContent): Extracts and parses all context sections from a full context message string.
extractContextItemsOverviewTableRows(messageContent): Parses the table rows from a Context Items Overview message.
formatContextContent(items, contentType, contentOption): Formats an array of loaded items into a structured context message string suitable for LLM consumption.

AnalysisBlockUtils

Provides utilities for identifying, parsing, and validating structured analysis blocks within GitSense Chat messages.

Core Functions

isAnalysisBlock(content): Checks if the provided content string starts with the standard analysis block header (# GitSense Chat Analysis).
getAnalysisBlockType(content): Determines the specific type of analysis block based on its initial header lines.
parseOverviewMetadata(content): Parses the metadata fields from an analysis block's content, assuming a Markdown list format.
validateAnalysisMetadata(metadata): Validates the parsed analysis metadata object, checking for required fields like 'Chat ID'.
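The sketch below covers the header check, list parsing, and required-field validation. The header string and the required 'Chat ID' field come from the descriptions above. The "- Key: value" list syntax, the function names, and the { isValid, missing } result are assumptions.

```javascript
// Header check taken from the isAnalysisBlock description above.
function isAnalysisBlockSketch(content) {
  return content.startsWith('# GitSense Chat Analysis');
}

// Hypothetical sketch: read "- Key: value" (or "* Key: value") list items.
// The key ends at the first colon, so values may contain colons.
function parseListMetadata(content) {
  const metadata = {};
  for (const line of content.split('\n')) {
    const m = /^\s*[-*]\s+([^:]+):\s*(.*)$/.exec(line);
    if (m) metadata[m[1].trim()] = m[2].trim();
  }
  return metadata;
}

// Report which required fields are missing or empty.
function validateMetadataSketch(metadata, required = ['Chat ID']) {
  const missing = required.filter((field) => !metadata[field]);
  return { isValid: missing.length === 0, missing };
}
```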

Usage

To use the GitSense Chat Utils library in your project, you can import the main interface class or individual modules/functions as needed.

Main Interface Class

The primary way to interact with the library is through the GitSenseChatUtils class:

const { GitSenseChatUtils } = require('@gitsense/gsc-utils');

const utils = new GitSenseChatUtils();

// Example: Extract code blocks from markdown text
const markdownText = '...'; // Your markdown content
const { blocks, warnings } = utils.extractCodeBlocks(markdownText);

// Example: Fix invalid UUIDs in text
const { text: fixedText, modified } = utils.fixTextCodeBlocks(markdownText);

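// Example: Check whether a code block's content is a patch
// (codeBlockContentString stands in for the raw content of a single code block)
const codeBlockContentString = '...';
const isPatch = utils.isPatchBlock(codeBlockContentString);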

Individual Modules

You can also import and use individual utility modules directly:

const {
  CodeBlockUtils,
  PatchUtils,
  AnalyzerUtils,
  MessageUtils,
  ConfigUtils,
  EnvUtils,
  LLMUtils,
  JsonUtils,
  GSToolBlockUtils,
  ContextUtils,
  AnalysisBlockUtils,
  ChatUtils
} = require('@gitsense/gsc-utils');

// Placeholder inputs for the examples below
const markdownText = '...';     // Your markdown content
const sourceCodeString = '...'; // Original source code
const patchString = '...';      // Patch block text

// Example: Using CodeBlockUtils directly
const { blocks, warnings } = CodeBlockUtils.extractCodeBlocks(markdownText);

// Example: Using PatchUtils directly
const patchResult = PatchUtils.applyPatchWithDiagnostics(sourceCodeString, patchString);

// Example: Using AnalyzerUtils directly
const analyzers = AnalyzerUtils.getAnalyzers('/path/to/analyzers');

// Example: Using MessageUtils directly
const messages = MessageUtils.getChatTemplateMessages('/path/to/messages', 'draft');

Individual Functions

Many core functions are also exported individually for convenience:

const { 
  extractCodeBlocks, 
  fixTextCodeBlocks, 
  isPatchBlock,
  getAnalyzers,
  getChatTemplateMessages,
  applyPatchWithDiagnostics,
  estimateTokens
} = require('@gitsense/gsc-utils');

// Example: Extract code blocks
const markdownText = '...'; // Your markdown content
const { blocks, warnings } = extractCodeBlocks(markdownText);

// Example: Estimate tokens
const tokenCount = estimateTokens("Sample text for token estimation.");
