Datasources

datasources

Purpose

The Datasources page provides centralized monitoring and control over all data ingestion pipelines feeding into the VesselAI platform.

It allows users to view configured data sources, check their execution status, monitor platform connectivity, and control ingestion jobs — all from a single interface.

User Interface Overview

The Datasources page is organized into three sections:

Header Bar — Page title with search filter and Refresh button
Connectivity Panel — Live health status of platform services
Datasource Jobs Table — Full list of configured sources with status and action controls

Header Bar

The top bar provides:

Search — Filter sources by ID, type, file, or status
Refresh — Reload all datasource states and connectivity checks

Connectivity Panel

A grid of service health indicators showing live status for:

Service	Description
Spark	Distributed compute engine for ETL jobs
Spark History	Historical job execution records
MinIO	Object Storage backend
Trino	SQL query engine (Lakehouse)
Trino Auth	Trino authentication layer
Keycloak	Identity and access management
Redpanda	Data streaming / event broker

Each service shows a badge: up (green), down (red), or n/a (gray).

Datasource Jobs Table

The main table displays all configured data sources with the following columns:

Column	Description
ID	Unique identifier for the data source (clickable to open details)
Type	Source type (e.g., `ais_csv_to_delta`, `open_meteo_poll`)
Enabled	Whether the source is active in the YAML configuration
Mode	Execution mode: Spark (submitted as Spark jobs) or Worker (managed by ingestion worker)
File	YAML configuration file defining the source
Status	Current job status: Running, Finished, Submitted, Stopped, Failed, or Idle
Updated	Timestamp of the last status change
Actions	Control buttons for the source

Actions

Each datasource row provides up to four action buttons:

Start

Submits the ingestion job for execution. Available for Spark-mode sources when the source is enabled.

Stop

Stops a running Spark job using its submission ID. Only available after a job has been started.

Status

Refreshes the execution status for Spark-mode sources. Requires an active submission ID.

Run Once

Triggers a single execution cycle. Currently supported for specific source types (e.g., open_meteo_poll).

Actions that are not available for a source type will show a tooltip explaining why they are disabled.

Source Details Drawer

Click on any source ID to open the details drawer, which displays:

ID — Source identifier
Type — Ingestion source type
YAML File — Configuration file location
Enabled — Whether the source is active
Job State — Current status, Spark submission ID, last update time, and any errors
Spec — Full YAML specification in read-only JSON format

This provides full visibility into the source configuration and execution state.

Source Types and Modes

Datasources operate in two execution modes:

Spark Mode

Sources that run as Spark jobs submitted through the platform's distributed compute infrastructure. These support Start, Stop, and Status actions.

Worker Mode

Sources managed by the ingestion worker process. These run automatically and do not support manual Start/Stop through the Spark submission API.

What Users Can Do

Monitor the health of all platform services via the connectivity panel
View all configured data sources and their current execution status
Start, stop, and check the status of Spark-based ingestion jobs
Trigger one-time data pulls for supported source types
Search and filter sources by ID, type, file, or status
Inspect full source configuration and job state in the details drawer
Identify and troubleshoot YAML parsing errors

Purpose​

User Interface Overview​

Header Bar​

Connectivity Panel​

Datasource Jobs Table​

Actions​

Start​

Stop​

Status​

Run Once​

Source Details Drawer​

Source Types and Modes​

Spark Mode​

Worker Mode​

What Users Can Do​