ml-note-taker
v0.7.0
A desktop note-taking app with AI-powered transcription.
Getting Started with Create React App
This project was bootstrapped with Create React App.
Available Scripts
In the project directory, you can run:
npm start
Runs the app in the development mode.
Open http://localhost:3000 to view it in the browser.
The page will reload if you make edits.
You will also see any lint errors in the console.
npm test
Launches the test runner in the interactive watch mode.
See the section about running tests for more information.
npm run build
Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.
The build is minified and the filenames include the hashes.
Your app is ready to be deployed!
See the section about deployment for more information.
npm run eject
Note: this is a one-way operation. Once you eject, you can't go back!
If you aren't satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.
Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you're on your own.
You don't have to ever use eject. The curated feature set is suitable for small and middle deployments, and you shouldn't feel obligated to use this feature. However we understand that this tool wouldn't be useful if you couldn't customize it when you are ready for it.
Recording Lifecycle
This application provides a comprehensive meeting recording and transcription system. Here's the complete lifecycle of a recording:
Recording Flow Diagram
```mermaid
graph TD
A["User Starts Recording"] --> B["Enter Meeting Title"]
B --> C["Select Audio Source"]
C --> D["Start Recording"]
D --> E["Electron Main Process"]
E --> F["Helper Audio Bridge"]
F --> G["Record System Audio + Microphone"]
G --> H["Stop Recording"]
H --> I["Mix Audio Files"]
I --> J["Generate Recording ID"]
J --> K["Upload to S3"]
K --> L{"Upload Success?"}
L -->|Yes| M["Status: transcribing"]
L -->|No| N["Status: upload_failed"]
M --> O["Background Processing"]
O --> P["API Service: processAudio()"]
P --> Q["Speech-to-Text Transcription"]
Q --> R["Status: transcribed"]
R --> S{"Has Speaker Mapping?"}
S -->|No| T["Show Speaker Identification UI"]
S -->|Yes| U["Generate Summary"]
T --> V["User Assigns Speaker Names"]
V --> W["Submit Speaker Mapping"]
W --> U
U --> X["API: generateMeetingSummary()"]
X --> Y["AI Generates Summary & Action Items"]
Y --> Z["Status: done"]
Z --> AA["Complete Recording Available"]
%% Error Handling
K -->|Network Error| BB["Retry Upload"]
Q -->|API Error| CC["Status: error"]
X -->|API Error| DD["Summary Generation Failed"]
%% Status Updates
EE["30-second Auto Refresh"] --> FF["Check Recording Status"]
FF --> GG["Update UI with Latest Status"]
%% Pending Recordings Management
HH["Pending Recordings Store"] --> II["Track In-Progress Recordings"]
II --> JJ["Remove When Complete"]
%% Multiple Workflows
KK["Direct Processing<br/>(Fallback)"] --> LL["Real-time Progress Updates"]
MM["Background Processing<br/>(Recommended)"] --> NN["Async Processing with Status Polling"]Recording Process Steps
The recording process follows a 7-step workflow with visual progress tracking (a sketch of how the steps could be modeled follows the list):
- Prepare - User enters meeting title and selects audio source
- Record - Audio recording is in progress
- Process Audio - Audio data is being processed and uploaded
- Transcribe - Speech-to-text transcription is being performed
- Speaker Identification - User identifies speakers (when applicable)
- Generating Summary - AI generates meeting summary and action items
- Complete - Recording is fully processed and ready for review
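As a rough illustration, the seven steps can be modeled as an ordered list that drives the progress bar. The `RecordingStep` type and `stepProgress` helper below are sketches, not the app's actual code:

```ts
// Illustrative model of the seven-step workflow; names are assumptions.
type RecordingStep =
  | 'prepare'
  | 'record'
  | 'process_audio'
  | 'transcribe'
  | 'speaker_identification'
  | 'generating_summary'
  | 'complete';

const STEP_ORDER: RecordingStep[] = [
  'prepare',
  'record',
  'process_audio',
  'transcribe',
  'speaker_identification',
  'generating_summary',
  'complete',
];

// Progress as a fraction of completed steps, suitable for a progress bar.
function stepProgress(current: RecordingStep): number {
  const index = STEP_ORDER.indexOf(current);
  return index < 0 ? 0 : index / (STEP_ORDER.length - 1);
}
```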
Recording Status Values
The system uses several status values to track recording progress:
- initialized - Upload URL generated, ready for S3 upload
- transcribing - Audio uploaded to S3, transcription in progress
- transcribed - Transcription complete, waiting for speaker mapping
- summarizing - Speaker mapping uploaded, generating summary and action items
- done - All processing complete, including summary generation
- done_deleted - Processing complete and all data securely deleted from servers
- error - Processing failed (with error details and recommendations)
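For reference, these statuses map naturally onto a TypeScript union type. The `RecordingStatus` name below is illustrative; only the string values come from the list above:

```ts
// Status values from the list above expressed as a union type (type name is illustrative).
type RecordingStatus =
  | 'initialized'
  | 'transcribing'
  | 'transcribed'
  | 'summarizing'
  | 'done'
  | 'done_deleted'
  | 'error';

// Statuses that still require polling for updates.
const IN_PROGRESS: RecordingStatus[] = ['initialized', 'transcribing', 'summarizing'];
```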
Processing Workflows
The application supports two processing workflows:
Background Processing (Recommended)
- Recording is uploaded to cloud storage immediately
- Returns a recordingId for tracking
- Processing happens asynchronously in the background
- Status updates appear in the sidebar automatically via a 30-second refresh loop (see the polling sketch after this list)
- User can continue with other recordings while processing occurs
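A minimal sketch of such a polling loop is shown below; `fetchRecordingStatus` and `onStatusChange` are hypothetical helpers standing in for the app's API service and UI update logic:

```ts
// Sketch of the 30-second status refresh loop; not the app's actual implementation.
async function pollRecordingStatus(
  recordingId: string,
  fetchRecordingStatus: (id: string) => Promise<string>,
  onStatusChange: (status: string) => void,
  intervalMs = 30_000,
): Promise<void> {
  let status = await fetchRecordingStatus(recordingId);
  onStatusChange(status);
  // Keep polling until the recording reaches a terminal state.
  while (status !== 'done' && status !== 'done_deleted' && status !== 'error') {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    status = await fetchRecordingStatus(recordingId);
    onStatusChange(status);
  }
}
```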
Direct Processing (Fallback)
- Audio is processed immediately after recording
- Real-time progress updates during transcription
- Immediate speaker identification step
- Summary generation happens before completion
Audio Sources
The application supports multiple audio input sources:
Electron App (Desktop)
- System Audio - Captures all system audio (meetings, calls, etc.)
- Microphone - Records from selected microphone device
- Hybrid Mode - Can combine system audio with microphone input
Browser Mode
- Microphone Only - Records from selected microphone device
- Requires microphone permissions from the browser
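For browser mode, a capture flow along these lines could use the standard MediaDevices and MediaRecorder APIs. This sketch only illustrates the permission prompt and chunked capture; it is not the app's actual recording code:

```ts
// Browser-mode microphone capture sketch using the standard MediaRecorder API.
async function recordMicrophone(deviceId?: string): Promise<Blob> {
  // Triggers the browser's microphone permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: deviceId ? { deviceId: { exact: deviceId } } : true,
  });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  return new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((track) => track.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    // Stop after 5 seconds here purely for demonstration.
    setTimeout(() => recorder.stop(), 5_000);
  });
}
```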
Speaker Identification
When a recording has multiple speakers, the system provides:
- Automatic Speaker Detection - AI identifies different speakers as SPEAKER_01, SPEAKER_02, etc.
- Manual Speaker Mapping - User can assign real names to each detected speaker
- Audio Playback - 5-second audio samples help users identify speakers by voice
- Conditional Display - Speaker mapping only appears when status is 'transcribed' and mapping is needed
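A speaker mapping can be thought of as a simple dictionary from detected labels to user-assigned names. The `SpeakerMapping` shape and `submitSpeakerMapping` function below are illustrative assumptions, not the app's real API:

```ts
// Detected speaker labels mapped to user-assigned names (shape is an assumption).
interface SpeakerMapping {
  [detectedLabel: string]: string; // e.g. "SPEAKER_01" -> "Alice"
}

async function submitSpeakerMapping(
  recordingId: string,
  mapping: SpeakerMapping,
): Promise<void> {
  // In the real app this would call the API service; here we just log the payload.
  console.log('Submitting mapping for', recordingId, mapping);
}

// Usage example:
// await submitSpeakerMapping('rec_123', { SPEAKER_01: 'Alice', SPEAKER_02: 'Bob' });
```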
Data Management
Recording Storage
- Audio files are stored locally (Electron) or temporarily (browser)
- Cloud backup available for persistent storage
- Recording metadata includes title, date, duration, and processing status
Transcript Data
- Segments include speaker labels, timestamps, and text content
- Full transcript text for search and review
- Structured message format for conversation view
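An illustrative shape for a transcript segment, with assumed field names, might look like this:

```ts
// Transcript segment with speaker label, timestamps, and text (field names are assumptions).
interface TranscriptSegment {
  speaker: string;   // e.g. "SPEAKER_01" or a mapped name
  startTime: number; // seconds from the start of the recording
  endTime: number;   // seconds
  text: string;
}

// The full transcript text can be derived from the segments for search and review.
function fullTranscript(segments: TranscriptSegment[]): string {
  return segments.map((s) => `${s.speaker}: ${s.text}`).join('\n');
}
```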
Generated Content
- AI-generated meeting summaries
- Extracted action items with assignees
- Searchable conversation history
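Similarly, the generated content could be represented along these lines; the field names are assumptions chosen for clarity, not the actual API response format:

```ts
// Illustrative shapes for AI-generated meeting content.
interface ActionItem {
  description: string;
  assignee?: string; // present when an assignee was identified
}

interface MeetingSummary {
  summary: string;
  actionItems: ActionItem[];
}
```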
Error Handling
The system provides comprehensive error handling:
- Audio Capture Errors - Device permission issues, hardware problems
- Upload Failures - Network connectivity, storage limits
- Transcription Errors - Audio quality issues, unsupported formats
- Processing Timeouts - Long recordings, server overload
Each error includes:
- Clear error messages
- Specific recommendations for resolution
- Fallback options when available
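One way to carry that information is an error object bundling the message, a recommendation, and a retry flag; the `ProcessingError` type below is a sketch, not the app's real error type:

```ts
// Sketch of an error object matching the error-handling behavior described above.
interface ProcessingError {
  kind: 'audio_capture' | 'upload_failed' | 'transcription' | 'timeout';
  message: string;        // clear, user-facing error message
  recommendation: string; // specific suggestion for resolving the problem
  canRetry: boolean;      // whether a fallback/retry option is available
}

const example: ProcessingError = {
  kind: 'upload_failed',
  message: 'The recording could not be uploaded.',
  recommendation: 'Check your network connection and try again.',
  canRetry: true,
};
```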
User Experience Features
Visual Feedback
- Real-time audio visualizer during recording
- Progress bar showing completion percentage
- Status indicators for each processing step
- Loading states and error messages
Accessibility
- Keyboard navigation support
- Screen reader compatible
- High contrast mode support
- Responsive design for mobile devices
Performance Optimization
- Background processing to avoid blocking UI
- Automatic retry for failed operations (see the backoff sketch after this list)
- Efficient audio compression for uploads
- Minimal memory usage during long recordings
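The automatic retry behavior could be implemented with a small exponential-backoff wrapper like the sketch below; this is an illustration, not the app's actual implementation:

```ts
// Retry a failing async operation (e.g. an upload) with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait longer after each failure: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```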
Learn More
You can learn more in the Create React App documentation.
To learn React, check out the React documentation.
