bhashini-asr

v0.1.0

Published

13 days ago

Indian-language Speech-to-Text widget + backend proxy for the Bhashini ULCA ASR service. React hooks/components for the browser, Express route + adapter for Node — one package, two subpath exports.

Downloads

138

Vulnerabilities: 0 High, 0 Medium, 0 Low

vijay-javascript

speech-to-text stt asr bhashini ulca indian-languages hindi marathi tamil telugu kannada voice-input react express negd meity

bhashini-asr

Drop-in Speech-to-Text for Indian-language web apps. Browser-native for English and Hindi, Bhashini ULCA for the other major Indian languages — one npm package, two subpath exports.

What you get

A React widget you drop next to any text field: a citizen taps the mic, speaks in their language, and the recognised text lands in the field.
An Express route factory that proxies the Bhashini call, so the API secret never reaches the browser.
11 Indian languages out of the box: English, Hindi, Marathi, Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, Punjabi, Odia.
Mobile Chrome quirks already handled: per-session result-index dedup, auto-restart on silent onend, 3-strike back-off on wedged engines.
Dev-stub mode — build the wiring with no Bhashini key, validate against real ASR later.
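
The auto-restart and 3-strike back-off behaviour listed above can be pictured as a small counter. The sketch below is not the package's internal code: `createRestartPolicy` and its method names are hypothetical, and it assumes that "three strikes" means three consecutive recognition sessions that end without producing any result.

```typescript
// Hypothetical sketch of a "3-strike" restart policy; not the package's internals.
type EndDecision = "restart" | "give-up";

function createRestartPolicy(maxStrikes: number = 3) {
  let emptySessions = 0; // consecutive sessions that ended with no transcript

  return {
    // Call from the recogniser's onend handler.
    // hadResult: whether the session that just ended produced any transcript.
    onSessionEnd(hadResult: boolean): EndDecision {
      if (hadResult) {
        emptySessions = 0; // a healthy session clears the strikes
        return "restart";
      }
      emptySessions += 1;
      return emptySessions >= maxStrikes ? "give-up" : "restart";
    },

    // Call when the user taps the mic again after a give-up.
    reset(): void {
      emptySessions = 0;
    },
  };
}
```

Keeping the counter separate from the recogniser makes the policy easy to unit-test without a browser.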

┌─────────────────┐               ┌──────────────────┐            ┌──────────────────┐
│ React app       │  /transcribe  │ Your Express BE  │  compute   │ Bhashini ULCA    │
│ @../react       │ ─────────────►│ @../server       │ ─────────► │ ASR pipeline     │
│ MediaRecorder   │ ◄──────────── │ axios + zod      │ ◄────────  │                  │
└─────────────────┘  transcript   └──────────────────┘            └──────────────────┘

Install

npm install bhashini-asr

Peer deps (install whichever your app uses):

| If you use | Install |
|---|---|
| React widget (any form library or none) | react@>=18 react-dom@>=18 |
| Formik shortcut (/react/formik subpath) | formik@>=2.4 |
| Express route | express@>=4.18 express-rate-limit@>=7 |

You are not locked into Formik. The Formik integration lives on a dedicated subpath (bhashini-asr/react/formik). If you don't import from that subpath, Formik is never pulled into your bundle — works the same whether you use Material UI, Bootstrap, react-hook-form, Mantine, plain controlled state, or nothing at all. See the Framework integrations section below for examples.

Quick start — React (frontend)

The widget needs to know where to POST recordings for languages other than English and Hindi (those two use the browser's own speech engine). Pass either a URL (the widget then uses fetch internally) or your own transcribe function (for a custom HTTP client with auth headers, RTK Query, etc.).

import { useState } from "react";
import { SpeechMicButton } from "bhashini-asr/react";

function GrievanceForm() {
  const [description, setDescription] = useState("");

  return (
    <div>
      <label>Describe the problem</label>
      <textarea
        value={description}
        onChange={(e) => setDescription(e.target.value)}
      />
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) => setDescription((prev) =>
          prev ? `${prev} ${chunk}` : chunk
        )}
      />
    </div>
  );
}

Textarea-with-mic shortcut

The simplest drop-in — works with any form library or plain controlled state:

import { MicTextarea } from "bhashini-asr/react";

<MicTextarea
  value={remarks}
  onChange={setRemarks}
  placeholder="Officer remarks…"
  micProps={{ transcribeUrl: "/api/v1/speech/transcribe" }}
/>

Custom transcribe function (e.g. authenticated officer flow)

import { SpeechMicButton } from "bhashini-asr/react";
import { useTranscribeMutation } from "@/api/speechApi"; // your RTK Query slice

function OfficerNote() {
  const [transcribeMutation] = useTranscribeMutation();

  return (
    <SpeechMicButton
      onTranscript={(text) => { /* append text to your state */ }}
      transcribe={async (audioBase64, language, contentType, samplingRate) => {
        const res = await transcribeMutation({
          audio_base64: audioBase64, language,
          content_type: contentType, sampling_rate: samplingRate,
        }).unwrap();
        return res; // { transcript, language, fallback }
      }}
    />
  );
}
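
For clients that do not use RTK Query, the same `transcribe` signature can be satisfied with a small fetch wrapper. This is a sketch rather than the widget's built-in fetch path: `makeFetchTranscriber`, `getToken` and `FetchLike` are hypothetical names, the snake_case body fields are copied from the example above, `fallback` is assumed to be a boolean, and the `{ data: { transcript, language, fallback }, message }` envelope is assumed to be what your backend returns.

```typescript
// Shape returned to the widget, per the example above (fallback type is an assumption).
type TranscribeResult = { transcript: string; language: string; fallback: boolean };

// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function makeFetchTranscriber(
  url: string,
  getToken: () => string, // hypothetical: however your app reads the bearer token
  fetchImpl: FetchLike,   // pass the global fetch in the browser
) {
  return async (
    audioBase64: string,
    language: string,
    contentType: string,
    samplingRate: number,
  ): Promise<TranscribeResult> => {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${getToken()}`,
      },
      body: JSON.stringify({
        audio_base64: audioBase64,
        language,
        content_type: contentType,
        sampling_rate: samplingRate,
      }),
    });
    if (!res.ok) {
      throw new Error(`transcribe failed with HTTP ${res.status}`);
    }
    // Assumed envelope: { data: { transcript, language, fallback }, message }
    const envelope = (await res.json()) as { data: TranscribeResult };
    return envelope.data;
  };
}
```

Injecting `fetchImpl` keeps the wrapper testable with a fake and lets you swap in an instrumented client later.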

Restrict the language picker

<SpeechMicButton
  transcribeUrl="/api/v1/speech/transcribe"
  onTranscript={...}
  languages={[
    { code: "en-IN", label: "English" },
    { code: "hi-IN", label: "हिंदी" },
    { code: "mr-IN", label: "मराठी" },
  ]}
/>
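
The picker entries use BCP-47 tags such as `mr-IN`, while the server adapter example later in this README passes a bare code (`language: "mr"`). The package exports a `bcp47ToUlca` helper for this conversion; the function below is only a hypothetical stand-in showing the simplest plausible rule (keep the lowercased primary subtag), and the real helper may handle more cases.

```typescript
// Hypothetical stand-in; the exported bcp47ToUlca may behave differently.
function primarySubtag(bcp47: string): string {
  // BCP-47 subtags are separated by "-": "mr-IN" -> ["mr", "IN"].
  const first = bcp47.trim().split("-")[0] ?? "";
  return first.toLowerCase();
}
```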

Framework integrations

The widget never assumes a form library. Pick the snippet that matches your stack — anywhere you have a callback that can append text to a field, SpeechMicButton plugs in.

Plain React (`useState`)

import { useState } from "react";
import { SpeechMicButton } from "bhashini-asr/react";

const [text, setText] = useState("");
<SpeechMicButton
  transcribeUrl="/api/v1/speech/transcribe"
  onTranscript={(chunk) =>
    setText((prev) => (prev ? `${prev} ${chunk}` : chunk))
  }
/>

Material UI

import { TextField, InputAdornment } from "@mui/material";
import { SpeechMicButton } from "bhashini-asr/react";

<TextField
  label="Describe the problem"
  multiline
  rows={6}
  fullWidth
  value={text}
  onChange={(e) => setText(e.target.value)}
  InputProps={{
    endAdornment: (
      <InputAdornment position="end">
        <SpeechMicButton
          transcribeUrl="/api/v1/speech/transcribe"
          onTranscript={(chunk) =>
            setText((p) => (p ? `${p} ${chunk}` : chunk))
          }
        />
      </InputAdornment>
    ),
  }}
/>

React Bootstrap

import { Form, InputGroup } from "react-bootstrap";
import { SpeechMicButton } from "bhashini-asr/react";

<Form.Group>
  <Form.Label>Describe the problem</Form.Label>
  <InputGroup>
    <Form.Control
      as="textarea"
      rows={6}
      value={text}
      onChange={(e) => setText(e.target.value)}
    />
    <InputGroup.Text>
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) =>
          setText((p) => (p ? `${p} ${chunk}` : chunk))
        }
      />
    </InputGroup.Text>
  </InputGroup>
</Form.Group>

React Hook Form

import { useForm, Controller } from "react-hook-form";
import { SpeechMicButton } from "bhashini-asr/react";

const { control, setValue, getValues } = useForm({ defaultValues: { description: "" } });

<Controller
  name="description"
  control={control}
  render={({ field }) => (
    <div>
      <textarea {...field} rows={6} />
      <SpeechMicButton
        transcribeUrl="/api/v1/speech/transcribe"
        onTranscript={(chunk) => {
          // Read the value at call time so back-to-back chunks never append to a stale copy.
          const current = getValues("description");
          setValue(
            "description",
            current ? `${current} ${chunk}` : chunk,
            { shouldDirty: true },
          );
        }}
      />
    </div>
  )}
/>

Mantine

import { Textarea } from "@mantine/core";
import { SpeechMicButton } from "bhashini-asr/react";

<Textarea
  label="Describe the problem"
  minRows={6}
  value={text}
  onChange={(e) => setText(e.currentTarget.value)}
  rightSection={
    <SpeechMicButton
      transcribeUrl="/api/v1/speech/transcribe"
      onTranscript={(chunk) =>
        setText((p) => (p ? `${p} ${chunk}` : chunk))
      }
    />
  }
/>

Formik (dedicated subpath)

Only this import path pulls formik into your bundle — every other example above uses zero form-library code from the package.

import { Formik, Form, Field } from "formik";
import { FieldMic } from "bhashini-asr/react/formik";

<Formik initialValues={{ description: "" }} onSubmit={...}>
  <Form>
    <div style={{ display: "flex", justifyContent: "space-between" }}>
      <label>Describe the problem</label>
      <FieldMic
        name="description"
        transcribeUrl="/api/v1/speech/transcribe"
      />
    </div>
    <Field as="textarea" name="description" rows={6} />
  </Form>
</Formik>

Any other library

SpeechMicButton only needs an onTranscript(text: string) callback. Whatever form library you use, give it a function that appends the text to your field and you're done. The same pattern works for TanStack Form, Final Form, Redux Form, Ant Design Form, useReducer, Zustand, MobX, etc.
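
Every integration above repeats the same append expression, so it can be factored into one helper and reused from any state container. The sketch below shows that helper plus a `useReducer`-style reducer; `appendChunk`, `formReducer` and the action names are hypothetical, and skipping blank chunks is an addition of this sketch, not documented package behaviour.

```typescript
// Joins a new speech chunk onto existing field text with a single space.
function appendChunk(prev: string, chunk: string): string {
  const next = chunk.trim();
  if (!next) return prev; // ignore empty recognitions (assumption of this sketch)
  return prev ? `${prev} ${next}` : next;
}

type FormState = { description: string };
type FormAction =
  | { type: "typed"; value: string }   // user edited the textarea
  | { type: "speech"; chunk: string }; // the mic produced text

function formReducer(state: FormState, action: FormAction): FormState {
  switch (action.type) {
    case "typed":
      return { ...state, description: action.value };
    case "speech":
      return { ...state, description: appendChunk(state.description, action.chunk) };
  }
}
```

With `useReducer(formReducer, { description: "" })`, the widget callback becomes `onTranscript={(chunk) => dispatch({ type: "speech", chunk })}`.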

If you need lower-level control (e.g. you're building your own widget with a different shell), the raw hooks useSpeechRecognition and useBhashiniAsr are exported too.

Quick start — Express (backend)

import express from "express";
import { rateLimit } from "express-rate-limit";
import { bhashiniSpeechRoute } from "bhashini-asr/server";

const app = express();
app.use(express.json({ limit: "18mb" })); // base64 14 MB + envelope

app.use(
  "/api/v1/speech",
  bhashiniSpeechRoute({
    inferenceUrl: process.env.BHASHINI_INFERENCE_URL!,
    authName:     process.env.BHASHINI_INFERENCE_AUTH_NAME!,
    authValue:    process.env.BHASHINI_INFERENCE_AUTH_VALUE!,
    serviceId:    process.env.BHASHINI_SERVICE_ID,          // optional
    rateLimit:    rateLimit({ windowMs: 60 * 60 * 1000, limit: 60 }),
  }),
);

app.listen(3000);

Mounts a single POST /api/v1/speech/transcribe with body validation, rate limiting, audio-size caps, dev-stub fallback when inferenceUrl is empty, and structured { data: { transcript, language, fallback }, message } responses matching the NeGD API envelope.
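
The `18mb` JSON limit in the example follows from base64 arithmetic: every 3 bytes of audio become 4 base64 characters, so a base64 payload of about 14 MB carries roughly 10.5 MB of audio. The helper below is a sketch for checking that relationship in your own code; `decodedByteLength` and `fitsAudioCap` are hypothetical names, and the cap is whatever your deployment chooses, not a value taken from the package.

```typescript
// Raw byte count encoded by a base64 string (no whitespace, no "data:" prefix).
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  // 4 characters carry 3 bytes; padding characters carry none.
  return Math.floor((base64.length * 3) / 4) - padding;
}

// True when the decoded audio fits under the caller-chosen byte cap.
function fitsAudioCap(base64: string, maxAudioBytes: number): boolean {
  return decodedByteLength(base64) <= maxAudioBytes;
}
```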

Plug your own logger

import { logger } from "ts-commons"; // or pino, winston, bunyan…

bhashiniSpeechRoute({
  inferenceUrl, authName, authValue,
  logger, // anything with .info / .warn / .error
});

Custom error envelope

bhashiniSpeechRoute({
  inferenceUrl, authName, authValue,
  formatError: (err) => ({
    status: 400,
    body: { success: false, error: String(err) }, // your shape
  }),
});

Use the adapter directly (no Express)

import { createBhashiniTranscriber } from "bhashini-asr/server";

const transcribe = createBhashiniTranscriber({
  inferenceUrl: process.env.BHASHINI_INFERENCE_URL!,
  authName: process.env.BHASHINI_INFERENCE_AUTH_NAME!,
  authValue: process.env.BHASHINI_INFERENCE_AUTH_VALUE!,
});

// In a Fastify / NestJS / Lambda handler:
const result = await transcribe({
  audioBase64,
  language: "mr",
  contentType: "audio/webm",
});
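
To show how the adapter call might sit inside a non-Express handler, here is a framework-neutral sketch that takes the transcribe function as a parameter. `makeSpeechHandler`, `HttpReply` and the status codes are hypothetical choices, the camelCase body fields simply mirror the call above, and the `{ transcript, language, fallback }` result shape is an assumption about what the adapter resolves to.

```typescript
type AdapterInput = { audioBase64: string; language: string; contentType: string };
type AdapterOutput = { transcript: string; language: string; fallback: boolean }; // assumed shape
type TranscribeFn = (input: AdapterInput) => Promise<AdapterOutput>;
type HttpReply = { statusCode: number; body: string };

function jsonReply(statusCode: number, payload: unknown): HttpReply {
  return { statusCode, body: JSON.stringify(payload) };
}

// Wraps any transcribe function in a raw-body-in, reply-out handler.
function makeSpeechHandler(transcribe: TranscribeFn) {
  return async (rawBody: string): Promise<HttpReply> => {
    let input: Partial<AdapterInput> | null = null;
    try {
      input = JSON.parse(rawBody) as Partial<AdapterInput> | null;
    } catch {
      return jsonReply(400, { message: "Body must be valid JSON" });
    }

    const audioBase64 = input?.audioBase64;
    const language = input?.language;
    const contentType = input?.contentType;
    if (
      typeof audioBase64 !== "string" ||
      typeof language !== "string" ||
      typeof contentType !== "string"
    ) {
      return jsonReply(400, { message: "audioBase64, language and contentType are required" });
    }

    try {
      const data = await transcribe({ audioBase64, language, contentType });
      return jsonReply(200, { data, message: "ok" });
    } catch {
      // Upstream failure: do not leak provider details to the client.
      return jsonReply(502, { message: "Speech service unavailable" });
    }
  };
}
```

In a Lambda this would be called with `event.body`; in Fastify or NestJS you would pass the serialised request body and map `statusCode` and `body` onto the framework's reply object.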

Bhashini account setup

You need up to four environment values from the Bhashini ULCA Console: three required and one optional. Most NeGD ministries are already empanelled, so no new approval is needed; just create an ASR pipeline.

| Env var | What it is |
|---|---|
| BHASHINI_INFERENCE_URL | The pipeline's "compute" endpoint URL. |
| BHASHINI_INFERENCE_AUTH_NAME | HTTP auth header name (usually Authorization). |
| BHASHINI_INFERENCE_AUTH_VALUE | The API key / bearer token. Secret; store in Vault. |
| BHASHINI_SERVICE_ID (optional) | Pin a specific model variant. |

See the BA brief in the NHAPOA repo for the ministry-handoff workflow.
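
One way to turn those variables into the route options is a small loader that fails fast with a readable message. This is a sketch for production-like environments only (in dev-stub mode the inference URL is intentionally empty, so you would skip it); `loadBhashiniConfig` and `BhashiniEnvConfig` are hypothetical names, while the option names match the Quick start example.

```typescript
type BhashiniEnvConfig = {
  inferenceUrl: string;
  authName: string;
  authValue: string;
  serviceId: string | undefined; // optional model pin
};

const REQUIRED_KEYS = [
  "BHASHINI_INFERENCE_URL",
  "BHASHINI_INFERENCE_AUTH_NAME",
  "BHASHINI_INFERENCE_AUTH_VALUE",
] as const;

function loadBhashiniConfig(env: Record<string, string | undefined>): BhashiniEnvConfig {
  // Report every missing variable at once rather than one per restart.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Bhashini env vars: ${missing.join(", ")}`);
  }
  return {
    inferenceUrl: env["BHASHINI_INFERENCE_URL"] as string,
    authName: env["BHASHINI_INFERENCE_AUTH_NAME"] as string,
    authValue: env["BHASHINI_INFERENCE_AUTH_VALUE"] as string,
    serviceId: env["BHASHINI_SERVICE_ID"] || undefined, // empty string counts as unset
  };
}
```

The result can be spread into the route factory, for example `bhashiniSpeechRoute({ ...loadBhashiniConfig(process.env) })`.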

Browser and engine matrix

| Browser | en, hi (Web Speech) | mr, ta, te, … (Bhashini) |
|---|---|---|
| Chrome desktop ≥ 90 | ✅ | ✅ |
| Chrome Android ≥ 100 | ✅ | ✅ |
| Edge desktop | ✅ | ✅ |
| Safari macOS ≥ 14.1 | ✅ | ✅ |
| Safari iOS ≥ 14.5 | ✅ | ✅ |
| Firefox | ❌ (mic hidden) | ✅ |

The widget hides itself when neither engine is supported.
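
The matrix can be read as a two-step decision: the language picks the engine, and the browser's capabilities decide whether the mic is shown. The sketch below encodes that reading; `detectCaps` and `pickEngineSketch` are hypothetical names (the package's own `defaultEngineFor` may differ), and the rule that English and Hindi never fall back to Bhashini is inferred from the Firefox row.

```typescript
type Engine = "web-speech" | "bhashini";
type BrowserCaps = { webSpeech: boolean; mediaRecorder: boolean };

// Languages the matrix routes to the browser's Web Speech engine.
const WEB_SPEECH_LANGS = new Set(["en", "hi"]);

// g is the global object; in a browser pass `window`.
function detectCaps(g: Record<string, unknown>): BrowserCaps {
  return {
    // Chrome and Safari expose the prefixed constructor.
    webSpeech: "SpeechRecognition" in g || "webkitSpeechRecognition" in g,
    mediaRecorder: "MediaRecorder" in g,
  };
}

// Returns the engine to use, or null when the mic should be hidden.
function pickEngineSketch(ulcaCode: string, caps: BrowserCaps): Engine | null {
  if (WEB_SPEECH_LANGS.has(ulcaCode)) {
    return caps.webSpeech ? "web-speech" : null;
  }
  return caps.mediaRecorder ? "bhashini" : null;
}
```

In a component this might be called as `pickEngineSketch("mr", detectCaps(window as unknown as Record<string, unknown>))`.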

Deployment

HTTPS required. Browsers block getUserMedia on plain HTTP except on localhost.
Send Permissions-Policy: microphone=(self) on every page that mounts a mic. Many default nginx configs ship with microphone=() (denied), so change it to (self).
CSP connect-src 'self' is enough; the browser never connects to Bhashini directly.
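
If the same Node process also serves your HTML pages, the Permissions-Policy header from the list above can be set with a tiny middleware. This is a sketch with a hypothetical name, `micPermissionsPolicy`, typed structurally so it does not depend on Express typings; it is not part of the package.

```typescript
// Anything with a setHeader method, such as an Express or Node http response.
type HeaderWritable = { setHeader(name: string, value: string): unknown };

// Allows microphone access for same-origin pages only.
function micPermissionsPolicy(_req: unknown, res: HeaderWritable, next: () => void): void {
  res.setHeader("Permissions-Policy", "microphone=(self)");
  next();
}
```

Register it before your page routes with `app.use(micPermissionsPolicy)`. When nginx serves the pages instead, the equivalent is an `add_header Permissions-Policy "microphone=(self)";` directive in the relevant server or location block.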

Boot-time check — refuse to start in production with no inference URL:

if (process.env.NODE_ENV === "production" && !process.env.BHASHINI_INFERENCE_URL) {
  throw new Error("BHASHINI_INFERENCE_URL is required in production");
}

API reference (summary)

`/react`

| Export | Purpose | Form-library coupling |
|---|---|---|
| SpeechMicButton | Engine-agnostic mic + language picker widget. | None |
| MicTextarea | Controlled textarea with built-in mic. | None |
| useSpeechRecognition | Raw Web Speech API hook. | None |
| useBhashiniAsr | Raw MediaRecorder + backend-proxy hook. | None |
| bcp47ToUlca, defaultEngineFor | Helpers. | None |
| SUPPORTED_LANGUAGES | Tuple of 13 ULCA codes. | None |

`/react/formik` (separate subpath, optional)

| Export | Purpose |
|---|---|
| FieldMic | Formik shortcut. Requires formik peer dep. |

`/server`

| Export | Purpose |
|---|---|
| bhashiniSpeechRoute(config) | Express route factory. |
| createBhashiniTranscriber(config) | Raw adapter for non-Express runtimes. |
| transcribeBodySchema | zod schema for the request body. |
| BhashiniValidationError, BhashiniIntegrationError | Typed errors with stable codes. |
| SPEECH_ERROR_CODES | Stable error-code constants. |
| consoleLogger, type Logger | Default + interface. |

`/core`

Runtime-agnostic types and constants safe to import from either side.

Testing

npm test           # vitest, server-side tests
npm run typecheck  # tsc --noEmit
npm run build      # tsup → dist/{core,react,server}

HTTP is mocked via axios-mock-adapter. The coverage threshold is 80%+ on server/core. React hooks aren't covered yet: they exercise browser-only APIs (SpeechRecognition, MediaRecorder), which need jsdom plus heavy mocking. Contributions welcome.

License

MIT — see LICENSE.

Built for the National e-Governance Division (NeGD), Ministry of Electronics and Information Technology, Government of India.
