@pietrop/serialize-stt-words
v1.0.0
serialize-stt-words
A module to serialize words and paragraphs from an STT (speech-to-text) transcript in dpe format into one array per attribute, and to deserialize those arrays back into a transcript.
For example, as a rough heuristic: if mock8hours.json holds an 8 hour transcript and is 9.6MB, this is the breakdown of file size for each attribute saved separately.
58K paragraphEndTimes.json
59K paragraphStartTimes.json
93K speakersLit.json
637K textList.json
637K wordEndTimes.json
653K wordStartTimes.json
Each file is well within the 1MB Firebase document limit.
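To run the same check on your own data, a minimal sketch like the one below reports the JSON-encoded size of each attribute array against a 1MB budget. The helper name sizeReport, the LIMIT_BYTES constant, and the sample data are illustrative assumptions and not part of this module. The JSON size is only an approximation of how a database counts document size.

```javascript
// Hypothetical helper, not part of @pietrop/serialize-stt-words.
// Reports the JSON-encoded size in bytes of each attribute array.
const LIMIT_BYTES = 1024 * 1024; // assumed budget: 1MB per document

function sizeReport(serialized, limit = LIMIT_BYTES) {
  return Object.entries(serialized).map(([name, list]) => {
    const bytes = Buffer.byteLength(JSON.stringify(list), 'utf8');
    return { name, bytes, withinLimit: bytes <= limit };
  });
}

// Made-up sample data in the shape returned by serializeTranscript.
const sample = {
  wordStartTimes: [0, 0.9],
  wordEndTimes: [0.88, 1.12],
  textList: ['Media', 'will'],
};

for (const { name, bytes, withinLimit } of sizeReport(sample)) {
  console.log(`${name}: ${bytes} bytes, within limit: ${withinLimit}`);
}
```

Any array that comes out over the limit would need to be split across several documents before saving.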
Setup
git clone git@github.com:pietrop/serialize-stt-words.git
cd serialize-stt-words
npm install
Usage
The module takes a transcript in dpe format:
{
"words": [
{
"text": "Hello",
"start": 0,
"end": 0.88
},
....
],
"paragraphs": [
{
"speaker": "SPEAKER_B",
"start": 0,
"end": 1.24
},
...
]
}
Install the package, then call serializeTranscript, which returns one array for each attribute:
npm install @pietrop/serialize-stt-words
const { serializeTranscript } = require('@pietrop/serialize-stt-words');
const { wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit } = serializeTranscript(transcript);
{
"wordStartTimes": [
0,
0.9,
1.13,
...
],
"wordEndTimes": [
0.88,
1.12,
...
],
"textList": [
"Media",
"will",
...
],
"paragraphStartTimes": [
0,
1.25,
...
],
"paragraphEndTimes": [
1.24,
4,
...
],
"speakersLit": [
"SPEAKER_B",
"SPEAKER_A",
...
]
}
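Conceptually, the serialization step turns a list of objects into one list per field. The sketch below shows that idea for the input and output shapes documented above. The function name sketchSerialize is hypothetical, and this is an assumption about the approach, not the module's actual implementation.

```javascript
// Illustrative sketch only; the module's real implementation may differ.
// Splits dpe-style words and paragraphs into one array per attribute.
function sketchSerialize(transcript) {
  const { words, paragraphs } = transcript;
  return {
    wordStartTimes: words.map((w) => w.start),
    wordEndTimes: words.map((w) => w.end),
    textList: words.map((w) => w.text),
    paragraphStartTimes: paragraphs.map((p) => p.start),
    paragraphEndTimes: paragraphs.map((p) => p.end),
    speakersLit: paragraphs.map((p) => p.speaker),
  };
}

// Made-up input in the documented dpe shape.
const transcript = {
  words: [{ text: 'Hello', start: 0, end: 0.88 }],
  paragraphs: [{ speaker: 'SPEAKER_B', start: 0, end: 1.24 }],
};

console.log(sketchSerialize(transcript));
```

Because every array keeps the original order, the index of an entry is enough to match a word's text with its start and end times.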
The idea is that you can save each array separately in a database and recombine them later with deserializeTranscript:
const { deserializeTranscript } = require('@pietrop/serialize-stt-words');
const desRes = deserializeTranscript({ wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit });
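Recombining is the inverse operation: entries at the same index in each array are zipped back into one object. The sketch below assumes the result has the same words and paragraphs shape as the original input. The name sketchDeserialize and the length checks are illustrative assumptions, not the module's actual implementation.

```javascript
// Illustrative sketch only; the module's real implementation may differ.
// Rebuilds dpe-style words and paragraphs from the per-attribute arrays.
function sketchDeserialize(lists) {
  const {
    wordStartTimes, wordEndTimes, textList,
    paragraphStartTimes, paragraphEndTimes, speakersLit,
  } = lists;

  // Arrays saved separately can drift out of sync, so check lengths first.
  if (textList.length !== wordStartTimes.length ||
      textList.length !== wordEndTimes.length) {
    throw new Error('word arrays must have equal length');
  }
  if (speakersLit.length !== paragraphStartTimes.length ||
      speakersLit.length !== paragraphEndTimes.length) {
    throw new Error('paragraph arrays must have equal length');
  }

  const words = textList.map((text, i) => ({
    text,
    start: wordStartTimes[i],
    end: wordEndTimes[i],
  }));
  const paragraphs = speakersLit.map((speaker, i) => ({
    speaker,
    start: paragraphStartTimes[i],
    end: paragraphEndTimes[i],
  }));
  return { words, paragraphs };
}
```

The length checks are a design choice for this sketch: failing early is easier to debug than a transcript with undefined times.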
Documentation
There's a docs folder in this repository.
docs/notes contains draft development notes on various aspects of the project. These would generally be converted into either ADRs or guides when ready.
Development env
- npm > 6.1.0
- Node 12
The Node version is set in the .nvmrc file used by Node Version Manager (nvm).
nvm use
Tests
npm test
Deployment
npm run publish:public