ml-note-taker
v0.7.0
A desktop note-taking app with AI-powered transcription.
Getting Started with Create React App
This project was bootstrapped with Create React App.
Available Scripts
In the project directory, you can run:
npm start
Runs the app in the development mode.
Open http://localhost:3000 to view it in the browser.
The page will reload if you make edits.
You will also see any lint errors in the console.
npm test
Launches the test runner in the interactive watch mode.
See the section about running tests for more information.
npm run build
Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.
The build is minified and the filenames include the hashes.
Your app is ready to be deployed!
See the section about deployment for more information.
npm run eject
Note: this is a one-way operation. Once you eject, you can't go back!
If you aren't satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.
Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you're on your own.
You don't have to ever use eject. The curated feature set is suitable for small and middle deployments, and you shouldn't feel obligated to use this feature. However we understand that this tool wouldn't be useful if you couldn't customize it when you are ready for it.
Recording Lifecycle
This application provides a comprehensive meeting recording and transcription system. Here's the complete lifecycle of a recording:
Recording Flow Diagram
```mermaid
graph TD
A["User Starts Recording"] --> B["Enter Meeting Title"]
B --> C["Select Audio Source"]
C --> D["Start Recording"]
D --> E["Electron Main Process"]
E --> F["Helper Audio Bridge"]
F --> G["Record System Audio + Microphone"]
G --> H["Stop Recording"]
H --> I["Mix Audio Files"]
I --> J["Generate Recording ID"]
J --> K["Upload to S3"]
K --> L{"Upload Success?"}
L -->|Yes| M["Status: transcribing"]
L -->|No| N["Status: upload_failed"]
M --> O["Background Processing"]
O --> P["API Service: processAudio()"]
P --> Q["Speech-to-Text Transcription"]
Q --> R["Status: transcribed"]
R --> S{"Has Speaker Mapping?"}
S -->|No| T["Show Speaker Identification UI"]
S -->|Yes| U["Generate Summary"]
T --> V["User Assigns Speaker Names"]
V --> W["Submit Speaker Mapping"]
W --> U
U --> X["API: generateMeetingSummary()"]
X --> Y["AI Generates Summary & Action Items"]
Y --> Z["Status: done"]
Z --> AA["Complete Recording Available"]
%% Error Handling
K -->|Network Error| BB["Retry Upload"]
Q -->|API Error| CC["Status: error"]
X -->|API Error| DD["Summary Generation Failed"]
%% Status Updates
EE["30-second Auto Refresh"] --> FF["Check Recording Status"]
FF --> GG["Update UI with Latest Status"]
%% Pending Recordings Management
HH["Pending Recordings Store"] --> II["Track In-Progress Recordings"]
II --> JJ["Remove When Complete"]
%% Multiple Workflows
KK["Direct Processing<br/>(Fallback)"] --> LL["Real-time Progress Updates"]
MM["Background Processing<br/>(Recommended)"] --> NN["Async Processing with Status Polling"]Recording Process Steps
The recording process follows a 7-step workflow with visual progress tracking (a sketch of how the steps could be modeled follows the list):
- Prepare - User enters meeting title and selects audio source
- Record - Audio recording is in progress
- Process Audio - Audio data is being processed and uploaded
- Transcribe - Speech-to-text transcription is being performed
- Speaker Identification - User identifies speakers (when applicable)
- Generating Summary - AI generates meeting summary and action items
- Complete - Recording is fully processed and ready for review
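As a rough illustration, the seven steps can be modeled as an ordered list that drives the progress bar. The `RecordingStep` type and `stepProgress` helper below are sketches, not the app's actual code:

```ts
// Illustrative model of the seven-step workflow; names are assumptions.
type RecordingStep =
  | 'prepare'
  | 'record'
  | 'process_audio'
  | 'transcribe'
  | 'speaker_identification'
  | 'generating_summary'
  | 'complete';

const STEP_ORDER: RecordingStep[] = [
  'prepare',
  'record',
  'process_audio',
  'transcribe',
  'speaker_identification',
  'generating_summary',
  'complete',
];

// Progress as a fraction of completed steps, suitable for a progress bar.
function stepProgress(current: RecordingStep): number {
  const index = STEP_ORDER.indexOf(current);
  return index < 0 ? 0 : index / (STEP_ORDER.length - 1);
}
```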
Recording Status Values
The system uses several status values to track recording progress:
- initialized - Upload URL generated, ready for S3 upload
- transcribing - Audio uploaded to S3, transcription in progress
- transcribed - Transcription complete, waiting for speaker mapping
- summarizing - Speaker mapping uploaded, generating summary and action items
- done - All processing complete, including summary generation
- done_deleted - Processing complete and all data securely deleted from servers
- error - Processing failed (with error details and recommendations)
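For reference, these statuses map naturally onto a TypeScript union type. The `RecordingStatus` name below is illustrative; only the string values come from the list above:

```ts
// Status values from the list above expressed as a union type (type name is illustrative).
type RecordingStatus =
  | 'initialized'
  | 'transcribing'
  | 'transcribed'
  | 'summarizing'
  | 'done'
  | 'done_deleted'
  | 'error';

// Statuses that still require polling for updates.
const IN_PROGRESS: RecordingStatus[] = ['initialized', 'transcribing', 'summarizing'];
```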
Processing Workflows
The application supports two processing workflows:
Background Processing (Recommended)
- Recording is uploaded to cloud storage immediately
- Returns a recordingId for tracking
- Processing happens asynchronously in the background
- Status updates appear in the sidebar automatically via a 30-second refresh loop (see the polling sketch after this list)
- User can continue with other recordings while processing occurs
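A minimal sketch of such a polling loop is shown below; `fetchRecordingStatus` and `onStatusChange` are hypothetical helpers standing in for the app's API service and UI update logic:

```ts
// Sketch of the 30-second status refresh loop; not the app's actual implementation.
async function pollRecordingStatus(
  recordingId: string,
  fetchRecordingStatus: (id: string) => Promise<string>,
  onStatusChange: (status: string) => void,
  intervalMs = 30_000,
): Promise<void> {
  let status = await fetchRecordingStatus(recordingId);
  onStatusChange(status);
  // Keep polling until the recording reaches a terminal state.
  while (status !== 'done' && status !== 'done_deleted' && status !== 'error') {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    status = await fetchRecordingStatus(recordingId);
    onStatusChange(status);
  }
}
```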
Direct Processing (Fallback)
- Audio is processed immediately after recording
- Real-time progress updates during transcription
- Immediate speaker identification step
- Summary generation happens before completion
Audio Sources
The application supports multiple audio input sources:
Electron App (Desktop)
- System Audio - Captures all system audio (meetings, calls, etc.)
- Microphone - Records from selected microphone device
- Hybrid Mode - Can combine system audio with microphone input
Browser Mode
- Microphone Only - Records from selected microphone device
- Requires microphone permissions from the browser
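For browser mode, a capture flow along these lines could use the standard MediaDevices and MediaRecorder APIs. This sketch only illustrates the permission prompt and chunked capture; it is not the app's actual recording code:

```ts
// Browser-mode microphone capture sketch using the standard MediaRecorder API.
async function recordMicrophone(deviceId?: string): Promise<Blob> {
  // Triggers the browser's microphone permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: deviceId ? { deviceId: { exact: deviceId } } : true,
  });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  return new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((track) => track.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    // Stop after 5 seconds here purely for demonstration.
    setTimeout(() => recorder.stop(), 5_000);
  });
}
```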
Speaker Identification
When a recording has multiple speakers, the system provides:
- Automatic Speaker Detection - AI identifies different speakers as SPEAKER_01, SPEAKER_02, etc.
- Manual Speaker Mapping - User can assign real names to each detected speaker
- Audio Playback - 5-second audio samples help users identify speakers by voice
- Conditional Display - Speaker mapping only appears when status is 'transcribed' and mapping is needed
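A speaker mapping can be thought of as a simple dictionary from detected labels to user-assigned names. The `SpeakerMapping` shape and `submitSpeakerMapping` function below are illustrative assumptions, not the app's real API:

```ts
// Detected speaker labels mapped to user-assigned names (shape is an assumption).
interface SpeakerMapping {
  [detectedLabel: string]: string; // e.g. "SPEAKER_01" -> "Alice"
}

async function submitSpeakerMapping(
  recordingId: string,
  mapping: SpeakerMapping,
): Promise<void> {
  // In the real app this would call the API service; here we just log the payload.
  console.log('Submitting mapping for', recordingId, mapping);
}

// Usage example:
// await submitSpeakerMapping('rec_123', { SPEAKER_01: 'Alice', SPEAKER_02: 'Bob' });
```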
Data Management
Recording Storage
- Audio files are stored locally (Electron) or temporarily (browser)
- Cloud backup available for persistent storage
- Recording metadata includes title, date, duration, and processing status
Transcript Data
- Segments include speaker labels, timestamps, and text content
- Full transcript text for search and review
- Structured message format for conversation view
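An illustrative shape for a transcript segment, with assumed field names, might look like this:

```ts
// Transcript segment with speaker label, timestamps, and text (field names are assumptions).
interface TranscriptSegment {
  speaker: string;   // e.g. "SPEAKER_01" or a mapped name
  startTime: number; // seconds from the start of the recording
  endTime: number;   // seconds
  text: string;
}

// The full transcript text can be derived from the segments for search and review.
function fullTranscript(segments: TranscriptSegment[]): string {
  return segments.map((s) => `${s.speaker}: ${s.text}`).join('\n');
}
```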
Generated Content
- AI-generated meeting summaries
- Extracted action items with assignees
- Searchable conversation history
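Similarly, the generated content could be represented along these lines; the field names are assumptions chosen for clarity, not the actual API response format:

```ts
// Illustrative shapes for AI-generated meeting content.
interface ActionItem {
  description: string;
  assignee?: string; // present when an assignee was identified
}

interface MeetingSummary {
  summary: string;
  actionItems: ActionItem[];
}
```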
Error Handling
The system provides comprehensive error handling:
- Audio Capture Errors - Device permission issues, hardware problems
- Upload Failures - Network connectivity, storage limits
- Transcription Errors - Audio quality issues, unsupported formats
- Processing Timeouts - Long recordings, server overload
Each error includes:
- Clear error messages
- Specific recommendations for resolution
- Fallback options when available
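One way to carry that information is an error object bundling the message, a recommendation, and a retry flag; the `ProcessingError` type below is a sketch, not the app's real error type:

```ts
// Sketch of an error object matching the error-handling behavior described above.
interface ProcessingError {
  kind: 'audio_capture' | 'upload_failed' | 'transcription' | 'timeout';
  message: string;        // clear, user-facing error message
  recommendation: string; // specific suggestion for resolving the problem
  canRetry: boolean;      // whether a fallback/retry option is available
}

const example: ProcessingError = {
  kind: 'upload_failed',
  message: 'The recording could not be uploaded.',
  recommendation: 'Check your network connection and try again.',
  canRetry: true,
};
```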
User Experience Features
Visual Feedback
- Real-time audio visualizer during recording
- Progress bar showing completion percentage
- Status indicators for each processing step
- Loading states and error messages
Accessibility
- Keyboard navigation support
- Screen reader compatible
- High contrast mode support
- Responsive design for mobile devices
Performance Optimization
- Background processing to avoid blocking UI
- Automatic retry for failed operations (see the backoff sketch after this list)
- Efficient audio compression for uploads
- Minimal memory usage during long recordings
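The automatic retry behavior could be implemented with a small exponential-backoff wrapper like the sketch below; this is an illustration, not the app's actual implementation:

```ts
// Retry a failing async operation (e.g. an upload) with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait longer after each failure: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```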
Learn More
You can learn more in the Create React App documentation.
To learn React, check out the React documentation.
