bhashini-asr
v0.1.0
Indian-language Speech-to-Text widget + backend proxy for the Bhashini ULCA ASR service. React hooks/components for the browser, Express route + adapter for Node — one package, two subpath exports.
bhashini-asr
Drop-in Speech-to-Text for Indian-language web apps. Browser-native for English and Hindi, Bhashini ULCA for the other major Indian languages — one npm package, two subpath exports.
What you get
- A React widget you drop next to any text field — citizen taps a mic, speaks in their language, recognised text lands in the field.
- An Express route factory that proxies the Bhashini call, so the API secret never reaches the browser.
- 11 Indian languages out of the box: English, Hindi, Marathi, Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, Punjabi, Odia.
- Mobile Chrome quirks already handled: per-session result-index dedup, auto-restart on silent onend, 3-strike back-off on wedged engines.
- Dev-stub mode: build the wiring with no Bhashini key, validate against real ASR later.
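The list above names a 3-strike back-off but does not describe how it works. As a rough illustration only, one way such a policy could be structured is sketched below. The class name `RestartPolicy`, its method names, and the exact counting rule are assumptions made for this example, not the package's internals.

```typescript
// Hypothetical sketch of a "3-strike" restart policy for a speech engine
// that keeps ending without producing a result. Not the package's code.
class RestartPolicy {
  private strikes = 0;
  private readonly maxStrikes: number;

  constructor(maxStrikes = 3) {
    this.maxStrikes = maxStrikes;
  }

  // Call when the engine delivered a result: it is healthy, so reset.
  recordResult(): void {
    this.strikes = 0;
  }

  // Call when the engine ended silently.
  // Returns true while an automatic restart is still allowed.
  recordSilentEnd(): boolean {
    this.strikes += 1;
    return this.strikes < this.maxStrikes;
  }

  // True once the strike limit has been reached.
  get wedged(): boolean {
    return this.strikes >= this.maxStrikes;
  }
}
```

With the default limit, the first two silent ends allow a restart and the third marks the engine as wedged until a result arrives again.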
┌─────────────────┐               ┌──────────────────┐            ┌──────────────────┐
│ React app       │ /transcribe   │ Your Express BE  │ compute    │ Bhashini ULCA    │
│ @../react       │ ─────────────►│ @../server       │ ─────────► │ ASR pipeline     │
│ MediaRecorder   │ ◄──────────── │ axios + zod      │ ◄────────  │                  │
└─────────────────┘ transcript    └──────────────────┘            └──────────────────┘

Install
npm install bhashini-asr

Peer deps (install whichever your app uses):
| If you use | Install |
|---|---|
| React widget (any form library or none) | react@>=18 react-dom@>=18 |
| Formik shortcut (/react/formik subpath) | formik@>=2.4 |
| Express route | express@>=4.18 express-rate-limit@>=7 |
You are not locked into Formik. The Formik integration lives on a dedicated subpath (bhashini-asr/react/formik). If you don't import from that subpath, Formik is never pulled into your bundle, and the widget works the same whether you use Material UI, Bootstrap, react-hook-form, Mantine, plain controlled state, or nothing at all. See the Framework integrations section below for examples.
Quick start — React (frontend)
The widget needs to know where to POST recordings for non-English/Hindi
languages. Pass either a URL (uses fetch internally) or your own
transcribe function (custom HTTP client with auth headers, RTK Query, etc.).
import { useState } from "react";
import { SpeechMicButton } from "bhashini-asr/react";

function GrievanceForm() {
  const [description, setDescription] = useState("");
  return (
    <div>
      <label>Describe the problem</label>
      <textarea
        value={description}
        onChange={(e) => setDescription(e.target.value)}
      />
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) =>
          setDescription((prev) => (prev ? `${prev} ${chunk}` : chunk))
        }
      />
    </div>
  );
}

Textarea-with-mic shortcut
The simplest drop-in — works with any form library or plain controlled state:
import { MicTextarea } from "bhashini-asr/react";

<MicTextarea
  value={remarks}
  onChange={setRemarks}
  placeholder="Officer remarks…"
  micProps={{ transcribeUrl: "/api/v1/speech/transcribe" }}
/>

Custom transcribe function (e.g. authenticated officer flow)
import { SpeechMicButton } from "bhashini-asr/react";
import { useTranscribeMutation } from "@/api/speechApi"; // your RTK Query slice

function OfficerNote() {
  const [transcribeMutation] = useTranscribeMutation();
  return (
    <SpeechMicButton
      onTranscript={(text) => {
        /* append to your state */
      }}
      transcribe={async (audioBase64, language, contentType, samplingRate) => {
        const res = await transcribeMutation({
          audio_base64: audioBase64,
          language,
          content_type: contentType,
          sampling_rate: samplingRate,
        }).unwrap();
        return res; // { transcript, language, fallback }
      }}
    />
  );
}

Restrict the language picker
<SpeechMicButton
  transcribeUrl="/api/v1/speech/transcribe"
  onTranscript={...}
  languages={[
    { code: "en-IN", label: "English" },
    { code: "hi-IN", label: "हिंदी" },
    { code: "mr-IN", label: "मराठी" },
  ]}
/>

Framework integrations
The widget never assumes a form library. Pick the snippet that matches
your stack — anywhere you have a callback that can append text to a
field, SpeechMicButton plugs in.
Plain React (useState)
import { useState } from "react";
import { SpeechMicButton } from "bhashini-asr/react";

const [text, setText] = useState("");

<SpeechMicButton
  transcribeUrl="/api/v1/speech/transcribe"
  onTranscript={(chunk) =>
    setText((prev) => (prev ? `${prev} ${chunk}` : chunk))
  }
/>

Material UI
import { TextField, InputAdornment } from "@mui/material";
import { SpeechMicButton } from "bhashini-asr/react";

<TextField
  label="Describe the problem"
  multiline
  rows={6}
  fullWidth
  value={text}
  onChange={(e) => setText(e.target.value)}
  InputProps={{
    endAdornment: (
      <InputAdornment position="end">
        <SpeechMicButton
          transcribeUrl="/api/v1/speech/transcribe"
          onTranscript={(chunk) =>
            setText((p) => (p ? `${p} ${chunk}` : chunk))
          }
        />
      </InputAdornment>
    ),
  }}
/>

React Bootstrap
import { Form, InputGroup } from "react-bootstrap";
import { SpeechMicButton } from "bhashini-asr/react";

<Form.Group>
  <Form.Label>Describe the problem</Form.Label>
  <InputGroup>
    <Form.Control
      as="textarea"
      rows={6}
      value={text}
      onChange={(e) => setText(e.target.value)}
    />
    <InputGroup.Text>
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) =>
          setText((p) => (p ? `${p} ${chunk}` : chunk))
        }
      />
    </InputGroup.Text>
  </InputGroup>
</Form.Group>

React Hook Form
import { useForm, Controller } from "react-hook-form";
import { SpeechMicButton } from "bhashini-asr/react";

const { control, setValue, getValues } = useForm({ defaultValues: { description: "" } });

<Controller
  name="description"
  control={control}
  render={({ field }) => (
    <div>
      <textarea {...field} rows={6} />
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) => {
          // Read the current value at call time so back-to-back chunks are not lost.
          const current = getValues("description");
          setValue(
            "description",
            current ? `${current} ${chunk}` : chunk,
            { shouldDirty: true },
          );
        }}
      />
    </div>
  )}
/>

Mantine
import { Textarea } from "@mantine/core";
import { SpeechMicButton } from "bhashini-asr/react";

<Textarea
  label="Describe the problem"
  minRows={6}
  value={text}
  onChange={(e) => setText(e.currentTarget.value)}
  rightSection={
    <SpeechMicButton
      transcribeUrl="/api/v1/speech/transcribe"
      onTranscript={(chunk) =>
        setText((p) => (p ? `${p} ${chunk}` : chunk))
      }
    />
  }
/>

Formik (dedicated subpath)
Only this import path pulls formik into your bundle — every other
example above uses zero form-library code from the package.
import { Formik, Form, Field } from "formik";
import { FieldMic } from "bhashini-asr/react/formik";

<Formik initialValues={{ description: "" }} onSubmit={...}>
  <Form>
    <div style={{ display: "flex", justifyContent: "space-between" }}>
      <label>Describe the problem</label>
      <FieldMic
        name="description"
        transcribeUrl="/api/v1/speech/transcribe"
      />
    </div>
    <Field as="textarea" name="description" rows={6} />
  </Form>
</Formik>

Any other library
SpeechMicButton only needs an onTranscript(text: string) callback —
whatever form library you use, give it a function that appends the text
to your field and you're done. The same pattern works for Tanstack Form,
Final Form, Redux Form, Ant Design Form, useReducer, Zustand, MobX, etc.
If you need lower-level control (e.g. you're building your own widget
with a different shell), the raw hooks useSpeechRecognition and
useBhashiniAsr are exported too.
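Every snippet above repeats the same "append with a space" expression inside onTranscript. A small helper can centralise it. The sketch below is illustrative: `appendTranscript` is a hypothetical name, not an export of this package, and the whitespace trimming goes slightly beyond what the inline snippets do.

```typescript
// Hypothetical helper: joins an existing field value and a new chunk with a
// single space, ignoring empty chunks and stray whitespace.
function appendTranscript(prev: string, chunk: string): string {
  const next = chunk.trim();
  if (next === "") return prev; // nothing recognised, keep the field as-is
  const base = prev.trimEnd();
  return base === "" ? next : `${base} ${next}`;
}
```

With a plain state setter this would be used as `onTranscript={(chunk) => setText((prev) => appendTranscript(prev, chunk))}`.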
Quick start — Express (backend)
import express from "express";
import { rateLimit } from "express-rate-limit";
import { bhashiniSpeechRoute } from "bhashini-asr/server";

const app = express();
app.use(express.json({ limit: "18mb" })); // base64 14 MB + envelope
app.use(
  "/api/v1/speech",
  bhashiniSpeechRoute({
    inferenceUrl: process.env.BHASHINI_INFERENCE_URL!,
    authName: process.env.BHASHINI_INFERENCE_AUTH_NAME!,
    authValue: process.env.BHASHINI_INFERENCE_AUTH_VALUE!,
    serviceId: process.env.BHASHINI_SERVICE_ID, // optional
    rateLimit: rateLimit({ windowMs: 60 * 60 * 1000, limit: 60 }),
  }),
);
app.listen(3000);

This mounts a single POST /api/v1/speech/transcribe route with body validation, rate limiting, audio-size caps, a dev-stub fallback when inferenceUrl is empty, and structured { data: { transcript, language, fallback }, message } responses matching the NeGD API envelope.
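A client that calls this route directly has to unwrap the envelope before using the transcript. The sketch below shows one defensive way to do that. The function name `unwrapTranscribeEnvelope` is hypothetical, and the field types (string transcript and language, boolean fallback) are assumptions, since only the field names are stated here.

```typescript
// Assumed shape of the `data` member of the response envelope.
interface TranscribeResult {
  transcript: string;
  language: string;
  fallback: boolean;
}

// Validates an already-parsed JSON body and returns its `data` member.
// Throws if the body does not look like the documented envelope.
function unwrapTranscribeEnvelope(json: unknown): TranscribeResult {
  if (typeof json !== "object" || json === null) {
    throw new Error("Response is not a JSON object");
  }
  const data = (json as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) {
    throw new Error("Response has no data object");
  }
  const { transcript, language, fallback } = data as Record<string, unknown>;
  if (typeof transcript !== "string" || typeof language !== "string") {
    throw new Error("Response data is missing transcript or language");
  }
  return { transcript, language, fallback: fallback === true };
}
```

A caller would typically pass it the result of `await response.json()` after checking the HTTP status.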
Plug your own logger
import { logger } from "ts-commons"; // or pino, winston, bunyan…

bhashiniSpeechRoute({
  inferenceUrl, authName, authValue,
  logger, // anything with .info / .warn / .error
});

Custom error envelope
bhashiniSpeechRoute({
  inferenceUrl, authName, authValue,
  formatError: (err) => ({
    status: 400,
    body: { success: false, error: String(err) }, // your shape
  }),
});

Use the adapter directly (no Express)
import { createBhashiniTranscriber } from "bhashini-asr/server";

const transcribe = createBhashiniTranscriber({
  inferenceUrl: process.env.BHASHINI_INFERENCE_URL!,
  authName: process.env.BHASHINI_INFERENCE_AUTH_NAME!,
  authValue: process.env.BHASHINI_INFERENCE_AUTH_VALUE!,
});

// In a Fastify / NestJS / Lambda handler:
const result = await transcribe({
  audioBase64,
  language: "mr",
  contentType: "audio/webm",
});

Bhashini account setup
You need four environment values from the Bhashini ULCA Console: three required and one optional. Most NeGD ministries are already empanelled, so no new approval is needed; just create an ASR pipeline.
| Env var | What it is |
|---|---|
| BHASHINI_INFERENCE_URL | The pipeline's "compute" endpoint URL. |
| BHASHINI_INFERENCE_AUTH_NAME | HTTP auth header name (usually Authorization). |
| BHASHINI_INFERENCE_AUTH_VALUE | The API key / bearer token. Secret — store in Vault. |
| BHASHINI_SERVICE_ID (optional) | Pin a specific model variant. |
See the BA brief in the NHAPOA repo for the ministry-handoff workflow.
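The four variables map directly onto the config fields used in the Express example. A minimal sketch of a strict loader follows, assuming you want startup to fail when a required value is missing. `loadBhashiniConfig` and `BhashiniEnvConfig` are hypothetical names for this example. If you rely on the dev-stub mode, skip this check locally, because it rejects an empty inference URL.

```typescript
// Hypothetical config shape mirroring the fields passed to bhashiniSpeechRoute.
interface BhashiniEnvConfig {
  inferenceUrl: string;
  authName: string;
  authValue: string;
  serviceId?: string;
}

// Reads the Bhashini variables from an env-like object and fails loudly,
// listing every required variable that is missing or empty.
function loadBhashiniConfig(
  env: Record<string, string | undefined>,
): BhashiniEnvConfig {
  const required = [
    "BHASHINI_INFERENCE_URL",
    "BHASHINI_INFERENCE_AUTH_NAME",
    "BHASHINI_INFERENCE_AUTH_VALUE",
  ] as const;
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing Bhashini env vars: ${missing.join(", ")}`);
  }
  return {
    inferenceUrl: env["BHASHINI_INFERENCE_URL"] as string,
    authName: env["BHASHINI_INFERENCE_AUTH_NAME"] as string,
    authValue: env["BHASHINI_INFERENCE_AUTH_VALUE"] as string,
    serviceId: env["BHASHINI_SERVICE_ID"] || undefined, // optional
  };
}
```

In a Node process this would be called once at startup as `loadBhashiniConfig(process.env)`.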
Browser and engine matrix
| Browser | en, hi (Web Speech) | mr, ta, te, … (Bhashini) |
|---|---|---|
| Chrome desktop ≥ 90 | ✅ | ✅ |
| Chrome Android ≥ 100 | ✅ | ✅ |
| Edge desktop | ✅ | ✅ |
| Safari macOS ≥ 14.1 | ✅ | ✅ |
| Safari iOS ≥ 14.5 | ✅ | ✅ |
| Firefox | ❌ (mic hidden) | ✅ |
The widget hides itself when neither engine is supported.
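The matrix can be read as a small decision rule: English and Hindi use the browser engine, everything else goes through the backend proxy, and the mic is hidden when the relevant engine is missing. The sketch below encodes that reading. `pickEngine`, `BrowserSupport`, and the engine labels are hypothetical names for this example; the package's own defaultEngineFor helper may behave differently.

```typescript
type Engine = "web-speech" | "bhashini";

// Feature-detection results, gathered by the caller.
interface BrowserSupport {
  webSpeech: boolean; // a SpeechRecognition implementation is present
  mediaRecorder: boolean; // MediaRecorder is present
}

// ULCA codes served by the browser engine, per the matrix above.
const WEB_SPEECH_LANGUAGES = new Set(["en", "hi"]);

// Returns the engine to use, or null when the mic should be hidden.
function pickEngine(ulcaCode: string, support: BrowserSupport): Engine | null {
  if (WEB_SPEECH_LANGUAGES.has(ulcaCode)) {
    return support.webSpeech ? "web-speech" : null;
  }
  return support.mediaRecorder ? "bhashini" : null;
}
```

Under this reading, Firefox (no Web Speech, MediaRecorder available) gets no mic for Hindi but does get one for Marathi.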
Deployment
- HTTPS required. Browsers block getUserMedia on plain HTTP except on localhost.
- Permissions-Policy: microphone=(self) on every page that mounts a mic. Many default nginx configs ship with microphone=() (denied); flip it.
- CSP connect-src 'self' is enough; the browser never connects to Bhashini directly.
- Boot-time check: refuse to start in production with no inference URL:

if (process.env.NODE_ENV === "production" && !process.env.BHASHINI_INFERENCE_URL) {
  throw new Error("BHASHINI_INFERENCE_URL is required in production");
}
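If your Node server sends the pages itself instead of nginx, the two headers from the list above can be set in application code. The following is a minimal sketch: `applyMicHeaders` and `HeaderSink` are hypothetical names, and the interface only assumes a setHeader method like the one on Node's HTTP response object.

```typescript
// Minimal response shape so the sketch does not depend on Express types.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// Sets the two headers discussed in the deployment notes.
function applyMicHeaders(res: HeaderSink): void {
  // Allow microphone access for same-origin documents only.
  res.setHeader("Permissions-Policy", "microphone=(self)");
  // The browser only talks to your own backend, never to Bhashini.
  res.setHeader("Content-Security-Policy", "connect-src 'self'");
}
```

In an Express app this could be called from a small middleware registered before the page routes. If the app already sends a Content-Security-Policy header, merge the connect-src directive into that policy rather than overwriting it.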
API reference (summary)
/react
| Export | Purpose | Form-library coupling |
|---|---|---|
| SpeechMicButton | Engine-agnostic mic + language picker widget. | None |
| MicTextarea | Controlled textarea with built-in mic. | None |
| useSpeechRecognition | Raw Web Speech API hook. | None |
| useBhashiniAsr | Raw MediaRecorder + backend-proxy hook. | None |
| bcp47ToUlca, defaultEngineFor | Helpers. | None |
| SUPPORTED_LANGUAGES | Tuple of 13 ULCA codes. | None |
/react/formik (separate subpath, optional)
| Export | Purpose |
|---|---|
| FieldMic | Formik shortcut. Requires formik peer dep. |
/server
| Export | Purpose |
|---|---|
| bhashiniSpeechRoute(config) | Express route factory. |
| createBhashiniTranscriber(config) | Raw adapter for non-Express runtimes. |
| transcribeBodySchema | zod schema for the request body. |
| BhashiniValidationError, BhashiniIntegrationError | Typed errors with stable codes. |
| SPEECH_ERROR_CODES | Stable error-code constants. |
| consoleLogger, type Logger | Default + interface. |
/core
Runtime-agnostic types and constants safe to import from either side.
Testing
npm test           # vitest, server-side tests
npm run typecheck  # tsc --noEmit
npm run build      # tsup → dist/{core,react,server}

HTTP is mocked via axios-mock-adapter. The coverage threshold is 80%+ on server/core. React hooks aren't covered yet: they exercise browser-only APIs (SpeechRecognition, MediaRecorder), which need jsdom plus heavy mocking. Contributions welcome.
License
MIT — see LICENSE.
Built for the National e-Governance Division (NeGD), Ministry of Electronics and Information Technology, Government of India.
