
Datasources


Purpose

The Datasources page provides centralized monitoring and control over all data ingestion pipelines feeding into the VesselAI platform.

It allows users to view configured data sources, check their execution status, monitor platform connectivity, and control ingestion jobs — all from a single interface.


User Interface Overview

The Datasources page is organized into three sections:

  • Header Bar — Page title with search filter and Refresh button
  • Connectivity Panel — Live health status of platform services
  • Datasource Jobs Table — Full list of configured sources with status and action controls

Header Bar

The top bar provides:

  • Search — Filter sources by ID, type, file, or status
  • Refresh — Reload all datasource states and connectivity checks
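The guide does not say exactly how the Search box matches text. The following Python sketch assumes a case-insensitive substring match against the four searchable fields. The row keys (`id`, `type`, `file`, `status`) and the function name `filter_sources` are hypothetical and are not a documented VesselAI API.

```python
# Hypothetical: the fields the Search box is described as filtering on.
SEARCH_FIELDS = ("id", "type", "file", "status")


def filter_sources(rows: list[dict], query: str) -> list[dict]:
    """Keep rows whose ID, type, file, or status contains the query.

    Assumption: matching is a case-insensitive substring test.
    """
    q = query.strip().lower()
    if not q:
        # An empty search box shows every configured source.
        return list(rows)
    return [
        row for row in rows
        if any(q in str(row.get(field, "")).lower() for field in SEARCH_FIELDS)
    ]
```

For example, with one `ais_csv_to_delta` source and one `open_meteo_poll` source, typing "meteo" would keep only the second row, and typing "running" would keep only the sources whose status is Running.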

Connectivity Panel

A grid of service health indicators showing live status for:

Service         Description
Spark           Distributed compute engine for ETL jobs
Spark History   Historical job execution records
MinIO           Object storage backend
Trino           SQL query engine (Lakehouse)
Trino Auth      Trino authentication layer
Keycloak        Identity and access management
Redpanda        Data streaming / event broker

Each service shows a badge: up (green), down (red), or n/a (gray).
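One way to picture how health-check results become badges is the sketch below. It assumes each check yields `True` (reachable), `False` (check failed), or `None` (no result). Treating `None` as n/a is an assumption, because the guide does not define when n/a is shown. The names `badge_for` and `connectivity_panel` are illustrative only.

```python
from typing import Optional

# Services listed in the Connectivity Panel.
SERVICES = [
    "Spark", "Spark History", "MinIO", "Trino",
    "Trino Auth", "Keycloak", "Redpanda",
]


def badge_for(result: Optional[bool]) -> tuple[str, str]:
    """Map one health-check outcome to a (label, colour) badge.

    Assumption: None means the check produced no result, shown as n/a.
    """
    if result is None:
        return ("n/a", "gray")
    return ("up", "green") if result else ("down", "red")


def connectivity_panel(results: dict[str, Optional[bool]]) -> dict[str, tuple[str, str]]:
    """Build the badge grid; services missing from `results` are shown as n/a."""
    return {name: badge_for(results.get(name)) for name in SERVICES}
```

Under these assumptions, clicking Refresh amounts to re-running every check and rebuilding the grid from the new results.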


Datasource Jobs Table

The main table displays all configured data sources with the following columns:

Column    Description
ID        Unique identifier for the data source (clickable to open details)
Type      Source type (e.g., ais_csv_to_delta, open_meteo_poll)
Enabled   Whether the source is active in the YAML configuration
Mode      Execution mode: Spark (submitted as Spark jobs) or Worker (managed by the ingestion worker)
File      YAML configuration file defining the source
Status    Current job status: Running, Finished, Submitted, Stopped, Failed, or Idle
Updated   Timestamp of the last status change
Actions   Control buttons for the source
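The columns above map naturally onto a small record type. The following sketch models one table row in Python. The class names, the dictionary keys in `row_from_dict`, and the ISO 8601 format assumed for Updated are illustrative assumptions, not the platform's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Mode(Enum):
    SPARK = "Spark"    # submitted as Spark jobs
    WORKER = "Worker"  # managed by the ingestion worker


class Status(Enum):
    RUNNING = "Running"
    FINISHED = "Finished"
    SUBMITTED = "Submitted"
    STOPPED = "Stopped"
    FAILED = "Failed"
    IDLE = "Idle"


@dataclass
class DatasourceRow:
    id: str
    type: str                   # e.g. "ais_csv_to_delta", "open_meteo_poll"
    enabled: bool               # taken from the YAML configuration
    mode: Mode
    file: str                   # YAML file defining the source
    status: Status
    updated: Optional[datetime]  # None if no status change is recorded


def row_from_dict(d: dict) -> DatasourceRow:
    """Build a row from a hypothetical dict; keys and formats are assumptions."""
    return DatasourceRow(
        id=d["id"],
        type=d["type"],
        enabled=bool(d["enabled"]),
        mode=Mode(d["mode"]),
        file=d["file"],
        status=Status(d["status"]),
        updated=datetime.fromisoformat(d["updated"]) if d.get("updated") else None,
    )
```

Using enums for Mode and Status means an unexpected value (for example, a misspelled status) raises a `ValueError` when the row is built, rather than silently showing up in the table.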

Actions

Each datasource row provides up to four action buttons:

Start

Submits the ingestion job for execution. Available for Spark-mode sources when the source is enabled.

Stop

Stops a running Spark job using its submission ID. Only available after a job has been started.

Status

Refreshes the execution status for Spark-mode sources. Requires an active submission ID.

Run Once

Triggers a single execution cycle. Currently supported for specific source types (e.g., open_meteo_poll).

Actions that are not available for a source are disabled, and hovering over a disabled button shows a tooltip explaining why.
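The availability rules for the four buttons can be summarized in a single function. This is a hedged sketch based on the rules described above. It assumes that Start requires Spark mode and an enabled source, that Stop and Status require Spark mode and a submission ID, and that Run Once depends only on the source type. Whether Run Once also requires the source to be enabled is not stated, so the sketch does not check it. `RUN_ONCE_TYPES` lists only the one type named in this guide, and the tooltip strings are invented placeholders.

```python
from typing import Optional

# Assumption: only open_meteo_poll is named as supporting Run Once;
# the real list of supported types may be longer.
RUN_ONCE_TYPES = {"open_meteo_poll"}


def available_actions(mode: str, enabled: bool, source_type: str,
                      submission_id: Optional[str]) -> dict[str, Optional[str]]:
    """Return {action: None if the button is enabled, else a tooltip reason}."""
    is_spark = mode == "Spark"
    actions: dict[str, Optional[str]] = {}

    # Start: Spark-mode sources that are enabled in their YAML configuration.
    if not is_spark:
        actions["Start"] = "Start is only available for Spark-mode sources"
    elif not enabled:
        actions["Start"] = "Source is disabled in its YAML configuration"
    else:
        actions["Start"] = None

    # Stop and Status: Spark mode plus a submission ID from a previous Start.
    for name in ("Stop", "Status"):
        if not is_spark:
            actions[name] = f"{name} is only available for Spark-mode sources"
        elif submission_id is None:
            actions[name] = "No submission ID yet; start the job first"
        else:
            actions[name] = None

    # Run Once: decided by source type alone in this sketch.
    if source_type in RUN_ONCE_TYPES:
        actions["Run Once"] = None
    else:
        actions["Run Once"] = f"Run Once is not supported for {source_type}"
    return actions
```

Returning a reason string instead of a bare boolean lets the same result drive both the disabled state of each button and the text of its tooltip.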


Source Details Drawer

Click on any source ID to open the details drawer, which displays:

  • ID — Source identifier
  • Type — Ingestion source type
  • YAML File — Configuration file location
  • Enabled — Whether the source is active
  • Job State — Current status, Spark submission ID, last update time, and any errors
  • Spec — The source's full specification from its YAML file, displayed read-only as JSON

This provides full visibility into the source configuration and execution state.


Source Types and Modes

Datasources operate in two execution modes:

Spark Mode

Sources that run as Spark jobs submitted through the platform's distributed compute infrastructure. These support Start, Stop, and Status actions.

Worker Mode

Sources managed by the ingestion worker process. These run automatically and do not support manual Start/Stop through the Spark submission API.


What Users Can Do

  • Monitor the health of all platform services via the connectivity panel
  • View all configured data sources and their current execution status
  • Start, stop, and check the status of Spark-based ingestion jobs
  • Trigger one-time data pulls for supported source types
  • Search and filter sources by ID, type, file, or status
  • Inspect full source configuration and job state in the details drawer
  • Identify and troubleshoot YAML parsing errors