KolWrite API Documentation

Authentication

Every API request must be authenticated with your API key. Send it in the `x-api-key` request header:

x-api-key: YOUR_API_KEY

You can generate and manage your API keys in the KolWrite Console after creating an account.

Quickstart

The two steps below submit an audio file for transcription and then retrieve the result.

1. Start Transcription

Send a POST request to the `/transcribe` endpoint with the URL of the audio file you want to transcribe.

Request:

POST https://app.kolwrite.com/transcribe
Content-Type: application/json
x-api-key: YOUR_API_KEY

{
  "audio_url": "https://salford.figshare.com/ndownloader/files/14630270"
}

Response:

The API returns a `jobId` which you'll use to retrieve the transcription results.

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57"
}
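
The request above can be sketched in Python using only the standard library. The host, path, header names, and the `audio_url` and `jobId` fields come from the example; the function names and the 30-second timeout are illustrative choices, not part of the KolWrite API.

```python
import json
import urllib.request

BASE_URL = "https://app.kolwrite.com"  # host taken from the example request above


def build_transcribe_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST /transcribe request."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def start_transcription(api_key: str, audio_url: str) -> str:
    """Send the request and return the jobId from the JSON response."""
    request = build_transcribe_request(api_key, audio_url)
    # The timeout value is an assumption; adjust it to your environment.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["jobId"]
```

Splitting request construction from sending keeps the first function free of network access, so it can be inspected or unit-tested without a real API key.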

2. Get Transcription Result

Send a GET request to the `/transcribe/{jobId}` endpoint using the `jobId` received from the previous step.

Request:

GET https://app.kolwrite.com/transcribe/34c63d45-4cb5-473b-bc60-25c175e69c57
x-api-key: YOUR_API_KEY

Response (Example):

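Once the job `status` is `COMPLETED`, the response includes the transcription under `result`. Until then, `status` is `PENDING` or `IN_PROGRESS` and `result` is absent, so repeat the request until the job completes or fails.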

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57",
  "status": "COMPLETED",
  "result": {
    "language": "en",
    "segments": [
      {
        "start": 0.5,
        "end": 3.2,
        "text": "Hello everyone, welcome to today's meeting.",
        "speaker": "Speaker 1",
        "words": [
          {
            "start": 0.5,
            "end": 0.8,
            "word": "Hello"
          },
          {
            "start": 0.9,
            "end": 1.4,
            "word": "everyone"
          },
          {
            "start": 1.5,
            "end": 1.9,
            "word": "welcome"
          }
        ]
      }
    ]
  }
}
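
Because transcription is asynchronous, a client typically repeats the GET request until the job reaches a final status. The following is a minimal polling sketch, not an official client: the 5-second interval, the attempt limit, and the function names are assumptions, and the documentation does not state a recommended polling rate.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://app.kolwrite.com"  # host taken from the example request above
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def fetch_job(api_key: str, job_id: str) -> dict:
    """GET /transcribe/{jobId} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/transcribe/{job_id}", headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[], dict],
    interval_seconds: float = 5.0,  # assumed polling interval
    max_attempts: int = 120,        # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job status is COMPLETED or FAILED."""
    for attempt in range(max_attempts):
        job = fetch()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

A call might look like `job = wait_for_job(lambda: fetch_job(API_KEY, job_id))`, where `API_KEY` and `job_id` are your own values. Passing the fetcher in as a callable lets the loop be exercised without network access.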

API Endpoints

POST /transcribe

Submits an audio file for asynchronous transcription with optional speaker diarization.

Request Body Parameters

Parameter	Type	Required	Description
audio_url	string	Required	The publicly accessible URL of the audio or video file to transcribe.
language	string	Optional	Language code hint (e.g., "he", "en", "es"). If not provided, the language will be automatically detected. Default: "auto"
word_timestamps	boolean	Optional	Whether to include word-level timestamps in the output JSON or SRT. Has no effect on TXT output. Default: true
model	string	Optional	Transcription model to use: • `core`: faster, suitable for real-time use cases; • `edge`: more accurate, slightly slower. Default: "core"
output_format	string	Optional	Format of the transcription output: • `json`: detailed output with segments and timestamps; • `txt`: plain text transcription; • `srt`: SubRip subtitle format. Default: "json"
include_diarization	boolean	Optional	Whether to enable speaker diarization (identifying different speakers in the audio). When enabled, segments will include speaker labels. Default: false
num_speakers	integer	Optional	The expected number of speakers in the audio. Only used when `include_diarization` is true. If not specified, the system will automatically detect the number of speakers. Default: auto

Example Request (with diarization):

{
  "audio_url": "https://salford.figshare.com/ndownloader/files/14630270",
  "word_timestamps": true,
  "model": "edge",
  "output_format": "json",
  "include_diarization": true,
  "num_speakers": 2
}

Response:

Returns an object containing the `jobId` for the submitted transcription task.

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57"
}
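
To show how the optional parameters combine, here is a small sketch that assembles the request body and leaves unset options out so the server defaults apply. The parameter names and allowed values come from the table above. The helper itself is hypothetical, and the client-side checks are a local choice: the documentation does not say whether the API rejects these inputs.

```python
from typing import Optional

MODELS = {"core", "edge"}
OUTPUT_FORMATS = {"json", "txt", "srt"}


def build_transcribe_payload(
    audio_url: str,
    language: Optional[str] = None,
    word_timestamps: Optional[bool] = None,
    model: Optional[str] = None,
    output_format: Optional[str] = None,
    include_diarization: Optional[bool] = None,
    num_speakers: Optional[int] = None,
) -> dict:
    """Return a POST /transcribe body containing only the options that were set."""
    if model is not None and model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if num_speakers is not None:
        # Local check (assumption): num_speakers is only used with diarization.
        if not include_diarization:
            raise ValueError("num_speakers requires include_diarization=True")
        if num_speakers < 1:
            raise ValueError("num_speakers must be a positive integer")

    optional = {
        "language": language,
        "word_timestamps": word_timestamps,
        "model": model,
        "output_format": output_format,
        "include_diarization": include_diarization,
        "num_speakers": num_speakers,
    }
    payload = {"audio_url": audio_url}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```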

GET /transcribe/{jobId}

Retrieves the status and result of a specific transcription job.

Path Parameters

Parameter	Type	Required	Description
jobId	string	Required	The ID of the transcription job, obtained from the POST /transcribe response.

Response Body

Field	Type	Description
jobId	string	The requested transcription job ID.
status	string	Current job status: PENDING, IN_PROGRESS, COMPLETED, or FAILED
result	object | string	The transcription output (present only when status is COMPLETED). Structure depends on the requested `output_format`.
error	string	Error message (present only when status is FAILED).

Example Response (JSON Output with Speaker Diarization):

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57",
  "status": "COMPLETED",
  "result": {
    "language": "en",
    "segments": [
      {
        "start": 0.5,
        "end": 3.2,
        "text": "Hello everyone, welcome to today's meeting.",
        "speaker": "Speaker 1",
        "words": [
          {
            "start": 0.5,
            "end": 0.8,
            "word": "Hello"
          },
          {
            "start": 0.9,
            "end": 1.4,
            "word": "everyone"
          }
        ]
      },
      {
        "start": 4.1,
        "end": 6.8,
        "text": "Thank you. Let's begin with the quarterly review.",
        "speaker": "Speaker 2",
        "words": [
          {
            "start": 4.1,
            "end": 4.5,
            "word": "Thank"
          },
          {
            "start": 4.6,
            "end": 4.9,
            "word": "you"
          }
        ]
      }
    ]
  }
}
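
A response like the one above can be loaded into typed objects for easier handling. This sketch assumes the JSON `result` has the shape shown in the example (`segments`, each with `start`, `end`, `text`, and optional `speaker` and `words`). The class and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    start: float
    end: float
    word: str


@dataclass
class Segment:
    start: float
    end: float
    text: str
    speaker: Optional[str] = None  # treated as absent when no label is returned
    words: List[Word] = field(default_factory=list)


def parse_segments(result: dict) -> List[Segment]:
    """Convert the `result` object of a JSON-format response into Segment objects."""
    segments = []
    for raw in result.get("segments", []):
        words = [Word(w["start"], w["end"], w["word"]) for w in raw.get("words", [])]
        segments.append(
            Segment(raw["start"], raw["end"], raw["text"], raw.get("speaker"), words)
        )
    return segments


def format_transcript(segments: List[Segment]) -> str:
    """Render one line per segment, prefixed with its start time and speaker."""
    lines = []
    for segment in segments:
        prefix = f"{segment.speaker}: " if segment.speaker else ""
        lines.append(f"[{segment.start:.1f}s] {prefix}{segment.text}")
    return "\n".join(lines)
```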

Speaker Diarization

Speaker diarization is the process of identifying and separating different speakers in an audio recording. When enabled, KolWrite will analyze the audio to detect speaker changes and label each segment accordingly.

How to Enable Diarization

To enable speaker diarization, set the `include_diarization` parameter to `true` in your transcription request:

{
  "audio_url": "https://example.com/meeting-audio.mp3",
  "include_diarization": true,
  "num_speakers": 3
}

Speaker Count Optimization

For best results, specify the expected number of speakers using the `num_speakers` parameter. If you don't know the exact number, you can:

Omit the parameter (the system will auto-detect the number of speakers)
Set it to your best estimate (the parameter takes a single integer, not a range)

Output Format with Diarization

When diarization is enabled, each segment in the JSON output includes a `speaker` field identifying the speaker (e.g., "Speaker 1", "Speaker 2").
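
One use of the per-segment speaker label is to total the speaking time of each participant. The sketch below assumes segments shaped like those in the example response. The function name and the "Unknown" fallback label are illustrative and not defined by the API.

```python
from collections import defaultdict
from typing import Dict, List


def speaking_time_by_speaker(segments: List[dict]) -> Dict[str, float]:
    """Sum segment durations (end - start, in seconds) for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        # "Unknown" is a local fallback for segments without a speaker field.
        speaker = segment.get("speaker", "Unknown")
        totals[speaker] += segment["end"] - segment["start"]
    return dict(totals)
```

For the two-segment example response, this gives roughly 2.7 seconds for each of "Speaker 1" and "Speaker 2", subject to floating-point rounding.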

Output Formats

JSON Format

Detailed output with segments, timestamps, and optional word-level data.

TXT Format

Plain text transcription with speaker labels (when diarization is enabled).

SRT Format

SubRip subtitle format with timestamps, compatible with video players and subtitle editors.
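
To illustrate how the JSON and SRT formats relate, the sketch below converts JSON-style segments into standard SubRip text on the client side. It follows the general SubRip layout (a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text). The exact SRT that KolWrite returns is not shown in this documentation, so the output here may differ from the API's, for example in line breaking or speaker labels.

```python
from typing import List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[dict]) -> str:
    """Build SRT text with one numbered subtitle block per segment."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text']}\n")
    # Blocks are separated by a blank line, as SubRip requires.
    return "\n".join(blocks)
```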

Status Codes

Status	Description
PENDING	The job is waiting to be processed in the queue.
IN_PROGRESS	The job is actively being processed by our transcription service.
COMPLETED	The job finished successfully. Results are available in the response.
FAILED	The job failed to process. Check the `error` field in the response for details.
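
The four statuses can be handled explicitly in client code. This sketch uses the `status`, `result`, and `error` fields described under GET /transcribe/{jobId}. The enum, the exception class, and the choice to return `None` for unfinished jobs are illustrative decisions, not part of the API.

```python
from enum import Enum
from typing import Any, Optional


class JobStatus(str, Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class TranscriptionFailed(Exception):
    """Raised when a job reports the FAILED status (hypothetical helper class)."""


def extract_result(job: dict) -> Optional[Any]:
    """Return the result of a completed job, None if still running, or raise on failure."""
    status = JobStatus(job["status"])  # raises ValueError for an unrecognized status
    if status is JobStatus.COMPLETED:
        return job["result"]
    if status is JobStatus.FAILED:
        raise TranscriptionFailed(job.get("error", "no error message provided"))
    return None  # PENDING or IN_PROGRESS: check again later
```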
