KolWrite API Documentation

Welcome to the KolWrite API documentation. This guide covers what you need to integrate KolWrite's transcription and speaker diarization services into your applications.

Authentication

All API requests must be authenticated using your unique API key. Include it in the request headers:

x-api-key: YOUR_API_KEY

You can generate and manage your API keys in the KolWrite Console after creating an account.

Quickstart

Get started quickly with these basic examples.

1. Start Transcription

Send a POST request to the `/transcribe` endpoint with the URL of the audio file you want to transcribe.

Request:

POST https://app.kolwrite.com/transcribe
Content-Type: application/json
x-api-key: YOUR_API_KEY

{
  "audio_url": "https://salford.figshare.com/ndownloader/files/14630270"
}

Response:

The API returns a `jobId` which you'll use to retrieve the transcription results.

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57"
}
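As a rough illustration of step 1, the request above could be built in Python with only the standard library. This is a minimal sketch, not an official client: the function names `build_transcribe_request` and `submit` are hypothetical, and only the URL, headers, and `audio_url` field come from the example above.

```python
import json
import urllib.request

# Base URL taken from the request example above.
API_BASE = "https://app.kolwrite.com"


def build_transcribe_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST /transcribe request."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the jobId from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["jobId"]
```

Keeping request construction separate from sending makes it possible to inspect or log the request before any network call is made.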

2. Get Transcription Result

Send a GET request to the `/transcribe/{jobId}` endpoint using the `jobId` received from the previous step.

Request:

GET https://app.kolwrite.com/transcribe/34c63d45-4cb5-473b-bc60-25c175e69c57
x-api-key: YOUR_API_KEY

Response (Example):

Once the job status is COMPLETED, the response will include the transcription details.

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57",
  "status": "COMPLETED",
  "result": {
    "language": "en",
    "segments": [
      {
        "start": 0.5,
        "end": 3.2,
        "text": "Hello everyone, welcome to today's meeting.",
        "speaker": "Speaker 1",
        "words": [
          {
            "start": 0.5,
            "end": 0.8,
            "word": "Hello"
          },
          {
            "start": 0.9,
            "end": 1.4,
            "word": "everyone"
          },
          {
            "start": 1.5,
            "end": 1.9,
            "word": "welcome"
          }
        ]
      }
    ]
  }
}
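Because transcription is asynchronous, a client typically repeats step 2 until the job reaches a final status. The sketch below shows one way to do that, assuming a caller-supplied `fetch_job` function (hypothetical) that performs the GET request and returns the parsed JSON. The poll interval and timeout values are illustrative assumptions, not documented limits.

```python
import time
from typing import Callable

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    poll_interval: float = 5.0,   # assumed value, tune for your workload
    timeout: float = 600.0,       # assumed value, tune for your workload
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until the job is COMPLETED or FAILED, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.get('jobId')} is still {job['status']}")
        sleep(poll_interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.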

API Endpoints

POST /transcribe

Submits an audio file for asynchronous transcription with optional speaker diarization.

Request Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `audio_url` | string | Required | The publicly accessible URL of the audio or video file to transcribe. |
| `language` | string | Optional | Language code hint (e.g., "he", "en", "es"). If not provided, the language is detected automatically. Default: "auto" |
| `word_timestamps` | boolean | Optional | Whether to include word-level timestamps in the JSON or SRT output. Has no effect on TXT output. Default: true |
| `model` | string | Optional | Specifies which transcription model to use. `core`: faster, suitable for real-time use cases. `edge`: more accurate, slightly slower. Default: "core" |
| `output_format` | string | Optional | The desired format for the transcription output. `json`: detailed output with segments and timestamps. `txt`: plain text transcription. `srt`: SubRip subtitle format. Default: "json" |
| `include_diarization` | boolean | Optional | Whether to enable speaker diarization (identifying different speakers in the audio). When enabled, segments include speaker labels. Default: false |
| `num_speakers` | integer | Optional | The expected number of speakers in the audio. Only used when `include_diarization` is true. If not specified, the system detects the number of speakers automatically. Default: auto |

Example Request (with diarization):

{
  "audio_url": "https://salford.figshare.com/ndownloader/files/14630270",
  "word_timestamps": true,
  "model": "edge",
  "output_format": "json",
  "include_diarization": true,
  "num_speakers": 2
}

Response:

Returns an object containing the `jobId` for the submitted transcription task.

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57"
}
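The request body parameters for POST /transcribe can be assembled and checked on the client before sending. The following is a sketch under assumptions: the helper name `build_payload` is hypothetical, and rejecting `num_speakers` without diarization is a client-side design choice (the documentation only says the parameter is unused in that case). Parameters left unset are omitted so that the documented defaults apply.

```python
from typing import Optional

# Allowed values as listed in the parameter descriptions.
MODELS = {"core", "edge"}
OUTPUT_FORMATS = {"json", "txt", "srt"}


def build_payload(
    audio_url: str,
    language: Optional[str] = None,
    word_timestamps: Optional[bool] = None,
    model: Optional[str] = None,
    output_format: Optional[str] = None,
    include_diarization: bool = False,
    num_speakers: Optional[int] = None,
) -> dict:
    """Return a request body containing only the explicitly set parameters."""
    if model is not None and model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if num_speakers is not None:
        if not include_diarization:
            # Client-side choice: flag a parameter that would have no effect.
            raise ValueError("num_speakers requires include_diarization=True")
        if num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")

    payload = {"audio_url": audio_url}
    optional = {
        "language": language,
        "word_timestamps": word_timestamps,
        "model": model,
        "output_format": output_format,
        "num_speakers": num_speakers,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    if include_diarization:
        payload["include_diarization"] = True
    return payload
```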

GET /transcribe/{jobId}

Retrieves the status and result of a specific transcription job.

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `jobId` | string | Required | The ID of the transcription job, obtained from the POST /transcribe response. |

Response Body

| Field | Type | Description |
| --- | --- | --- |
| `jobId` | string | The requested transcription job ID. |
| `status` | string | Current job status: PENDING, IN_PROGRESS, COMPLETED, or FAILED. |
| `result` | object or string | The transcription output (present only when status is COMPLETED). Structure depends on the requested `output_format`. |
| `error` | string | Error message (present only when status is FAILED). |
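Since `result` and `error` are each present only for one status, client code usually branches on `status` before reading either field. A minimal sketch of that branching follows; the exception class `TranscriptionFailed` and the function `extract_result` are hypothetical names introduced for illustration.

```python
class TranscriptionFailed(Exception):
    """Raised when a job reports the FAILED status."""


def extract_result(job: dict):
    """Return the result for a COMPLETED job, None while it is still running.

    Raises TranscriptionFailed for a FAILED job and ValueError for any
    status not listed in the response body description.
    """
    status = job["status"]
    if status == "COMPLETED":
        # An object for JSON output; assumed to be a string for TXT and SRT.
        return job["result"]
    if status == "FAILED":
        raise TranscriptionFailed(job.get("error", "no error message provided"))
    if status in ("PENDING", "IN_PROGRESS"):
        return None
    raise ValueError(f"unexpected status: {status!r}")
```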

Example Response (JSON Output with Speaker Diarization):

{
  "jobId": "34c63d45-4cb5-473b-bc60-25c175e69c57",
  "status": "COMPLETED",
  "result": {
    "language": "en",
    "segments": [
      {
        "start": 0.5,
        "end": 3.2,
        "text": "Hello everyone, welcome to today's meeting.",
        "speaker": "Speaker 1",
        "words": [
          {
            "start": 0.5,
            "end": 0.8,
            "word": "Hello"
          },
          {
            "start": 0.9,
            "end": 1.4,
            "word": "everyone"
          }
        ]
      },
      {
        "start": 4.1,
        "end": 6.8,
        "text": "Thank you. Let's begin with the quarterly review.",
        "speaker": "Speaker 2",
        "words": [
          {
            "start": 4.1,
            "end": 4.5,
            "word": "Thank"
          },
          {
            "start": 4.6,
            "end": 4.9,
            "word": "you"
          }
        ]
      }
    ]
  }
}
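To show how the diarized JSON output might be consumed, the sketch below merges consecutive segments from the same speaker into turns. It assumes the `result` structure shown in the example response; the function name `speaker_turns` and the "Unknown" fallback label are hypothetical.

```python
from typing import List, Tuple


def speaker_turns(result: dict) -> List[Tuple[str, str]]:
    """Merge consecutive segments by the same speaker into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for segment in result["segments"]:
        # The speaker field is only present when diarization was enabled.
        speaker = segment.get("speaker", "Unknown")
        text = segment["text"].strip()
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns
```

Each turn can then be printed as `Speaker 1: ...` to produce a readable transcript.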

Speaker Diarization

Speaker diarization is the process of identifying and separating different speakers in an audio recording. When enabled, KolWrite will analyze the audio to detect speaker changes and label each segment accordingly.

How to Enable Diarization

To enable speaker diarization, set the `include_diarization` parameter to true in your transcription request:

{
  "audio_url": "https://example.com/meeting-audio.mp3",
  "include_diarization": true,
  "num_speakers": 3
}

Speaker Count Optimization

For best results, specify the expected number of speakers using the `num_speakers` parameter. If you don't know the exact number, omit `num_speakers` and the system will detect the number of speakers automatically.

Output Format with Diarization

When diarization is enabled, each segment in the JSON output will include a `speaker` field identifying the speaker (e.g., "Speaker 1", "Speaker 2", etc.).
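One use of the per-segment speaker label is to total the speaking time for each participant. The sketch below assumes the JSON segment structure from the example responses (`start`, `end`, and `speaker` in seconds and labels as shown); `speaking_time` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict


def speaking_time(result: dict) -> Dict[str, float]:
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in result["segments"]:
        speaker = segment.get("speaker", "Unknown")
        totals[speaker] += segment["end"] - segment["start"]
    return dict(totals)
```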

Output Formats

JSON Format

Detailed output with segments, timestamps, and optional word-level data.

TXT Format

Plain text transcription with speaker labels (when diarization is enabled).

SRT Format

SubRip subtitle format with timestamps, compatible with video players and subtitle editors.
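For readers unfamiliar with SubRip, the sketch below converts JSON segments into SRT cues locally, which illustrates how the two formats relate. It follows the general SubRip layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line); the exact cue splitting and speaker labeling in KolWrite's own SRT output are not documented here and may differ. The function names are hypothetical.

```python
from typing import List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[dict]) -> str:
    """Render one SRT cue per segment, numbered from 1."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```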

Status Codes

| Status | Description |
| --- | --- |
| PENDING | The job is waiting to be processed in the queue. |
| IN_PROGRESS | The job is actively being processed by our transcription service. |
| COMPLETED | The job finished successfully. Results are available in the response. |
| FAILED | The job failed to process. Check the error message for details. |