Self-hosted data platform

uproj

Turn raw data into decisions on your own infrastructure, in days, not quarters.

6Grain · Rwanda

The problem

You have the data — using it is the hard part

🛰️

Scattered

Every provider speaks a different API, format and login.

🐘

Heavy

Terabytes per area. Downloading to a laptop is a non-starter.

🧪

One-off scripts

Analysts' notebooks never become a repeatable, scheduled pipeline.

🔒

Sovereignty

Cloud SaaS means your data and results leave the country.

🧰

Glue work

80% of every project is infrastructure — catalog, tiling, storage, access.

Slow

Months from 'we have an idea' to 'it runs every night'.

The idea

One platform owns the boring 80%, so your analysts ship the valuable 20%.

What uproj is

A catalog, a runtime, and an API, all self-hosted

🗂️

Catalog

Every input & result as a catalog item — one searchable index.

⚙️

Runtime

Upload a Python script → it runs sandboxed, on a schedule, at scale.

🔌

API + UI

Open standards (STAC, OGC), map tiles, downloads, a web UI.

Runs on your own servers. Your data never leaves.

What you get
Days
from a new idea to a pipeline that runs every night — not months
Any
add any source or analytic in Python — it all lands in one catalog
On-prem
your data and results never leave your servers

The whole platform, one picture

🌐
Browser
the web UI
⌨️
CLI / SDK
operator scripts
Entry
🚪
Edge · single entry
one port, routes every path
Platform services
🖥️
Web UI
browse, run, watch
🔌
REST + OGC API
processes · jobs · runs
🗂️
Catalog (STAC)
inputs + results
🗺️
Map tiles
preview on a map
🔐
Auth
identity & access
Data & compute
📦
Object storage
scenes & results
⚙️
Sandboxed runtime
per-run containers
🐘
PostgreSQL
catalog index + state
For your analysts

An analyst writes a Python script. The platform makes it a scheduled, catalogued service — no DevOps.

01

What it does

One pipeline, end to end

Ingest
Pull data in
Mirror scenes from any source into your catalog.
Prepare
Compute
Derive indices, composites, masks — your analytics.
Publish
Back to the catalog
Results land as catalog items, ready to use.
Serve
Map · API · download
Open in a map, query the API, pull a file.
Schedule
Every night
The whole chain re-runs on cron, hands-off.
Reuse
One source of truth
Inputs and results, searchable, in one place.

Connect any data source

📡

Public archives

Open satellite & raster archives — pulled in on a schedule.

🏢

Commercial providers

Vendor APIs and feeds, behind your own credentials.

🗺️

Your own data

Drone, aerial, vector boundaries, anything with a connector.

A connector is just Python — new sources are a script, not a project.

The catalog — everything in one place

Inputs and results, searchable and paginated, with preview and download.

Three concepts, that's the whole model

📦

Process

An uploaded, versioned Python project. Immutable — addressed by version.

🗓️

Job

A saved config: which process, which params, an optional schedule.

▶️

Run

One execution — state, logs, and the result it produced.
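
The three concepts can be pictured as three small records. This is a minimal sketch, not the platform's actual schema: every field name below, and the cron string, is an assumption made for illustration. Only the ideas (an immutable versioned process, a saved job config with an optional schedule, a run holding state, logs and a result) come from the slide above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Process:
    """An uploaded, versioned Python project; frozen so it cannot be edited."""
    process_id: str
    version: int


@dataclass
class Job:
    """A saved config: which process, which params, an optional schedule."""
    name: str
    process: Process
    params: dict = field(default_factory=dict)
    cron: Optional[str] = None  # hypothetical: None means "run manually only"


@dataclass
class Run:
    """One execution of a job: state, logs, and the result it produced."""
    job: Job
    state: str = "pending"
    logs: list = field(default_factory=list)
    result_item_id: Optional[str] = None  # set once a result is published


# Illustrative values only
ndvi_v1 = Process("s2-ndvi", version=1)
nightly = Job("ndvi-nightly", ndvi_v1, {"max_cloud_cover": 60}, cron="0 2 * * *")
run = Run(job=nightly)
```

Freezing the process record mirrors the "immutable, addressed by version" rule: a job keeps pointing at the exact version it was wired to.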

Opens straight in QGIS

Results stream into QGIS over open standards — no export, no copy.
02

Architecture

Services behind one entry point

Each piece does one job well

🚪

Edge

One port; routes /, /api, catalog, tiles, auth and storage.

🔌

API

REST + OGC API Processes — submit work, read results.

🗂️

Catalog

STAC over PostgreSQL — inputs and results, one index.

📦

Storage

S3-compatible object store for every scene and result.

🗺️

Tiles

On-the-fly map tiles straight from the stored rasters.

🔐

Identity

OIDC sign-in, API tokens, role-based access.
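
To make the Tiles piece concrete: a map client asks for tiles by zoom, column and row. The sketch below shows the standard Web Mercator (XYZ) tile arithmetic a client would use to decide which tile covers a point. Whether uproj's tile service uses exactly this scheme is an assumption, and the function name is hypothetical.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) XYZ tile index that contains a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# A point inside the default bbox used later in this deck
tile = lonlat_to_tile(30.06, -1.95, zoom=8)
```

The tile server's job is the reverse: given such an index, cut the matching window out of the stored raster on request.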

Open standards, not lock-in

🌐

STAC

The catalog speaks STAC — works with QGIS and the whole ecosystem.

⚙️

OGC API

Processes & tiles follow OGC — any compliant client connects.

🗃️

Plain formats

Cloud-optimized GeoTIFF, JSON, S3 — nothing proprietary.

Your data stays portable — you are never trapped in our format.
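
Because the interfaces are open standards, a client only needs plain JSON to talk to them. The sketch below builds two request bodies: a STAC API item search and an OGC API Processes execute request. The helper names are hypothetical, and treating `s2-l2a` as a collection id is an assumption drawn from the storage paths shown later; nothing is sent over the network here.

```python
import json


def stac_search_body(bbox, start, end, collections, limit=100):
    """Body for a STAC API item search (sent as POST to the catalog's /search)."""
    return {
        "bbox": list(bbox),             # west, south, east, north
        "datetime": f"{start}/{end}",   # closed RFC 3339 interval
        "collections": list(collections),
        "limit": limit,
    }


def ogc_execute_body(inputs):
    """Body for OGC API Processes execute (POST /processes/{id}/execution)."""
    return {"inputs": dict(inputs)}


search = stac_search_body(
    (28.8, -2.8, 30.9, -1.0),
    "2024-01-01T00:00:00Z",
    "2024-01-31T23:59:59Z",
    ["s2-l2a"],  # assumed collection id
    limit=10,
)
execute = ogc_execute_body({"max_cloud_cover": 20})
payload = json.dumps(search)  # ready to send with any HTTP client
```

Any STAC-aware or OGC-aware tool produces the same shapes, which is what keeps clients interchangeable.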

Runs where you need it

🐳

Containerized

The whole stack is containers — one command to bring it up.

🔌

Air-gappable

No required calls home; runs fully on a private network.

🖥️

Your hardware

On the servers you already own, in your own data centre.

On-prem by design — sovereignty isn't an add-on, it's the default.

03

How a run works

From a click to a catalogued result

1 · Trigger
Button or cron
A run is created — by a person or on schedule.
2 · Admit
Scheduler
Quota & single-active checks; the run goes pending.
3 · Spawn
Sandbox
An isolated container with locked-down, scoped access.
4 · Execute
Your Python
Reads inputs, computes the result, writes output.
5 · Publish
Register
Output stored and registered as a catalog item.
6 · Done
In the catalog
Result, logs and stats — visible and reusable.
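
The six steps read naturally as a small state machine. In this sketch only `pending` is a state name taken from the slide; the other state names and the transition table are assumptions chosen to illustrate the flow.

```python
from enum import Enum


class RunState(Enum):
    PENDING = "pending"        # admitted by the scheduler (step 2)
    RUNNING = "running"        # sandbox spawned, script executing (steps 3-4)
    PUBLISHING = "publishing"  # output being stored and registered (step 5)
    SUCCEEDED = "succeeded"    # result visible in the catalog (step 6)
    FAILED = "failed"          # any step may end here


# Hypothetical transition table: each state lists where it may go next
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.PUBLISHING, RunState.FAILED},
    RunState.PUBLISHING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: set(),
}


def advance(current: RunState, target: RunState) -> RunState:
    """Move a run to `target`, rejecting transitions the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Making terminal states explicit means a finished run can never be silently restarted; a retry is a new run.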

Every run is a sandbox

🔒

Locked-down

No privileges, non-root, capped CPU/RAM/processes, hard timeout.

🔑

Scoped access

Per-run credentials: read its inputs, write only its own output.

🌐

Network-isolated

Can't reach the database or auth; optional egress allow-list.

Hostile or buggy code can only affect its own run — nothing else.
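
One way to picture these restrictions is as container launch flags. The sketch below assumes Docker as the container runtime, which the deck does not state, and the limits, user id and image name are illustrative values. It only builds the argument list; it does not start anything.

```python
import shlex


def sandbox_command(image, run_id, cpus=2.0, memory="4g", pids=256, network="none"):
    """Build a `docker run` argument list for one locked-down run (assumed runtime)."""
    return [
        "docker", "run", "--rm",
        "--name", f"run-{run_id}",
        "--user", "65534:65534",                   # non-root (the "nobody" uid/gid)
        "--cap-drop", "ALL",                       # no Linux capabilities
        "--security-opt", "no-new-privileges",     # block privilege escalation
        "--cpus", str(cpus),                       # CPU cap
        "--memory", memory,                        # RAM cap
        "--pids-limit", str(pids),                 # process-count cap
        "--network", network,                      # "none" = no network at all
        image,
    ]


# Hypothetical image name and run id
cmd = sandbox_command("uproj/s2-ndvi:1", "r42")
printable = shlex.join(cmd)  # safely quoted, for logging
```

A hard timeout is not a flag here: whatever launches the command would have to stop the container when the deadline passes. An egress allow-list would likewise need a dedicated network rather than `none`.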

Reliable by default

🗓️

Scheduling

Cron per job; the scheduler admits runs and respects quotas.

🚦

No pile-ups

Single-active guard — a slow run never stacks on itself.

📜

Full audit

Structured logs, per-run stats, state for every execution.
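
The admission step (quota plus single-active guard) can be sketched as one pure function. The function name, the per-owner quota and its default of 2 are assumptions; the slide only says that quotas exist and that a job never stacks on itself.

```python
def admit(job_name, owner, active_runs, max_active_per_owner=2):
    """Decide whether a new run may start.

    `active_runs` is a list of (job_name, owner) pairs for runs that are
    currently pending or running. Returns (admitted, reason).
    """
    # Single-active guard: a job with a run in flight does not get a second one
    if any(name == job_name for name, _ in active_runs):
        return False, "single-active: job already has a run in flight"
    # Quota: cap concurrent runs per owner (hypothetical quota dimension)
    if sum(1 for _, who in active_runs if who == owner) >= max_active_per_owner:
        return False, "quota: owner is at the concurrent-run limit"
    return True, "admitted"
```

With this guard, a cron tick that fires while last night's run is still going is simply skipped instead of queued behind it.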

04

How analysts build scripts

The contract is tiny

🐍

Your Python

src/ with your code — any libraries you declare.

📄

pyproject.toml

Standard dependencies. The platform builds the environment.

📝

process.yaml

Declares parameters, types and defaults — the UI form is built from it.

▶️

entrypoint

One file to run. It reads params from the environment.

Zip it, upload it — that's a process. No Dockerfile, no infra.

process.yaml — declare your parameters

id: s2-ndvi
title: Sentinel-2 NDVI
category: preparation
entrypoint: src/main.py
inputs:
  - name: bbox
    type: bbox
    default: "28.8,-2.8,30.9,-1.0"
  - name: start_date
    type: string
    default: ""
  - name: max_cloud_cover
    type: integer
    default: 60
  • Types drive the UI form — bbox picker, date, number.
  • Defaults mean a job works with zero config.
  • category sets the default schedule & behaviour.
  • No params plumbing — the platform passes them in.
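
How declared types and defaults might turn into concrete parameters can be sketched as follows. The `INPUTS` list mirrors the YAML above as a Python literal (so no YAML parser is needed); the coercion rules, the bbox order (west, south, east, north) and the function names are assumptions, not the platform's documented behaviour.

```python
# Same declarations as the process.yaml above, written as Python data
INPUTS = [
    {"name": "bbox", "type": "bbox", "default": "28.8,-2.8,30.9,-1.0"},
    {"name": "start_date", "type": "string", "default": ""},
    {"name": "max_cloud_cover", "type": "integer", "default": 60},
]


def parse_bbox(value):
    """Parse 'west,south,east,north' into four floats and sanity-check it."""
    parts = [float(p) for p in str(value).split(",")]
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four numbers")
    west, south, east, north = parts
    if west >= east or south >= north:
        raise ValueError("bbox must be west < east and south < north")
    return parts


COERCE = {"bbox": parse_bbox, "string": str, "integer": int}


def resolve_params(inputs, overrides=None):
    """Merge job overrides over declared defaults and coerce each value by type."""
    overrides = overrides or {}
    unknown = set(overrides) - {spec["name"] for spec in inputs}
    if unknown:
        raise KeyError(f"undeclared parameters: {sorted(unknown)}")
    return {
        spec["name"]: COERCE[spec["type"]](overrides.get(spec["name"], spec["default"]))
        for spec in inputs
    }
```

Calling `resolve_params(INPUTS)` with no overrides yields a complete parameter set, which is the "zero config" property the defaults are there to provide.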

Categories drive behaviour

📥

ingest

Pulls external data into the catalog. Nightly by default.

🧮

preparation

Derives products from catalog data. Runs after ingest.

📊

compute

Analytics that emit results — on demand or scheduled.

Pick a category; sensible scheduling and publishing come for free.

The SDK is just environment variables

import json
import os

params = json.loads(os.environ["UPROJ_PARAMS"])
bbox = params["bbox"]

# read inputs straight from object storage via GDAL
src = "/vsis3/inputs/s2-l2a/.../red.tif"

# where to write + how to call back
out_dir = os.environ["UPROJ_OUTPUT_DIR"]
api     = os.environ["UPROJ_API_URL"]
token   = os.environ["UPROJ_SERVICE_TOKEN"]
  • Params arrive as JSON — no parsing framework.
  • Read rasters in place from storage — no download.
  • A short-lived token authorizes catalog calls.
  • Plain Python — use whatever libraries you like.
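
A script may prefer to read all four variables once, up front, and fail early if one is missing. The sketch below wraps the variable names shown above in a small helper; `RunContext` and `load_context` are hypothetical names, not part of any uproj SDK.

```python
import json
import os
from dataclasses import dataclass

REQUIRED = ("UPROJ_PARAMS", "UPROJ_OUTPUT_DIR", "UPROJ_API_URL", "UPROJ_SERVICE_TOKEN")


@dataclass(frozen=True)
class RunContext:
    params: dict
    output_dir: str
    api_url: str
    token: str


def load_context(env=None):
    """Read the run's environment into one object; `env` can be faked in tests."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    return RunContext(
        params=json.loads(env["UPROJ_PARAMS"]),
        output_dir=env["UPROJ_OUTPUT_DIR"],
        api_url=env["UPROJ_API_URL"],
        token=env["UPROJ_SERVICE_TOKEN"],
    )
```

Passing a plain dict as `env` lets an analyst run the same script on a laptop with made-up values before uploading it.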

Walkthrough — an ingest script

# 1. search the source for new scenes in the window
scenes = search(bbox, since=window_start)

# 2. mirror each asset into the catalog's storage
for s in scenes:
    for asset in s.assets:
        copy_to_storage(asset.url, key=object_key(s, asset))

    # 3. register the scene as a catalog item
    register_item(stac_item(s))
  • Search → mirror → register. That's ingest.
  • Already-mirrored scenes are skipped — re-runs are cheap.
  • Set a start date once to backfill; then it rolls forward nightly.
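
The "already-mirrored scenes are skipped" property is what makes re-runs cheap. The sketch below shows that idempotent shape with in-memory dicts standing in for object storage and the catalog; the scene structure, key layout and function name are all assumptions for illustration.

```python
def ingest(scenes, storage, catalog):
    """Mirror new scenes and register them; return how many were added.

    scenes  : iterable of {"id": str, "assets": {asset_name: bytes}}
    storage : dict acting as object storage (key -> bytes)
    catalog : dict acting as the catalog (scene id -> item)
    """
    added = 0
    for scene in scenes:
        if scene["id"] in catalog:
            continue  # already mirrored and registered: nothing to do
        for name, payload in scene["assets"].items():
            storage[f"inputs/{scene['id']}/{name}"] = payload
        # Register only after every asset is stored, so the catalog never
        # points at a half-copied scene
        catalog[scene["id"]] = {"id": scene["id"], "assets": sorted(scene["assets"])}
        added += 1
    return added


storage, catalog = {}, {}
batch = [{"id": "scene-001", "assets": {"red.tif": b"r", "nir.tif": b"n"}}]
first = ingest(batch, storage, catalog)   # mirrors the scene
second = ingest(batch, storage, catalog)  # same batch again: skipped
```

Registering last also means an interrupted run leaves no catalog entry behind, so the next nightly run retries that scene.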

Walkthrough — a preprocessing script

# read bands in place, compute the index
red = read("/vsis3/inputs/s2-l2a/.../red.tif")
nir = read("/vsis3/inputs/s2-l2a/.../nir.tif")
ndvi = (nir - red) / (nir + red)

# write a cloud-optimized GeoTIFF + register it
write_cog(ndvi, f"{out_dir}/ndvi.tif")
register_item(stac_item("s2-ndvi", scene_id))
  • Reads parents from storage, writes a COG.
  • Publishes back into the catalog as a new product.
  • Same shape for any index, mask or composite you dream up.
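
One detail the slide formula skips: where nir + red is zero (nodata or deep shadow), the division fails. The sketch below guards that case. It is written on plain nested lists so it runs without dependencies; a real script would more likely apply the same rule to array data from a raster library, and returning NaN for undefined pixels is an assumption.

```python
import math


def ndvi_pixel(nir, red):
    """NDVI for one pixel; NaN where the index is undefined (nir + red == 0)."""
    denom = nir + red
    if denom == 0:
        return math.nan
    return (nir - red) / denom


def ndvi(nir_band, red_band):
    """Apply ndvi_pixel across two equally sized 2-D bands (lists of rows)."""
    return [
        [ndvi_pixel(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Tiny 1x2 example: one vegetated pixel, one empty pixel
result = ndvi([[0.5, 0.0]], [[0.1, 0.0]])
```

Marking undefined pixels explicitly keeps them out of later statistics instead of letting a crash or a silent zero distort the product.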

Upload — it auto-versions

Each upload is a new immutable version; nothing in production breaks.
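
The upload rule can be sketched as an append-only registry: each upload gets the next version number, and old versions are never overwritten. The class and method names are hypothetical, and starting at version 1 is an assumption.

```python
class ProcessRegistry:
    """Append-only store: uploading never changes an existing version."""

    def __init__(self):
        self._versions = {}  # process_id -> list of archives, index 0 = version 1

    def upload(self, process_id: str, archive: bytes) -> int:
        """Store a new archive and return its version number."""
        versions = self._versions.setdefault(process_id, [])
        versions.append(bytes(archive))  # copy, so callers cannot mutate it later
        return len(versions)

    def get(self, process_id: str, version: int) -> bytes:
        """Fetch the exact archive that a job was wired to."""
        if version < 1:
            raise KeyError(f"{process_id} has no version {version}")
        try:
            return self._versions[process_id][version - 1]
        except IndexError:
            raise KeyError(f"{process_id} has no version {version}") from None


registry = ProcessRegistry()
v1 = registry.upload("s2-ndvi", b"first zip")
v2 = registry.upload("s2-ndvi", b"second zip")
```

A job pinned to version 1 keeps running the first archive after the second upload, which is why a new upload cannot break production.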

Wire a job, set a schedule

Pick the process, fill the form (defaults prefilled), add a cron — done.
05

The platform, hands on

Catalog — browse every input

Inputs and results in one searchable, paginated browser.

Map & analysis

Visualize any scene or result on an interactive map.

Jobs — saved configs + schedule

Each job: a process, its params, an optional cron, a run button.

Runs — every execution tracked

State, logs and the catalogued result for every run.

Secrets — encrypted, per-owner

Credentials stored encrypted, bound to a process input, never echoed back.

Operations dashboard

Containers, disk and the last runs — at a glance for the operator.

Settings — live, no restart

Tune limits, access and categories from the UI; changes apply immediately.

Users & access

Identity backed by OIDC — admins and users, real role-based access.
06

Operations & trust

Security in layers

🧱

Sandboxed runs

Locked-down containers — no privileges, capped, time-limited.

🔑

Scoped storage

Each run reaches only its own data, nothing else.

🌐

Network isolation

Runs can't touch the database or auth; egress is optional & allow-listed.

👤

Identity & RBAC

OIDC sign-in, API tokens, admin vs user — real access control.
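
Scoped storage boils down to a prefix rule: a run may read its declared inputs and write only under its own output location. The sketch below shows that check in isolation; the key layout, class name and the idea that scoping is prefix-based are assumptions, since the deck does not describe the mechanism.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunScope:
    """What one run's credentials are allowed to touch (hypothetical model)."""
    read_prefixes: Tuple[str, ...]
    write_prefix: str

    def can_read(self, key: str) -> bool:
        # A run may read its inputs and whatever it has written itself
        return key.startswith(self.write_prefix) or any(
            key.startswith(prefix) for prefix in self.read_prefixes
        )

    def can_write(self, key: str) -> bool:
        return key.startswith(self.write_prefix)


# Trailing slashes matter: "run-1/" must not also match "run-10/"
scope = RunScope(read_prefixes=("inputs/s2-l2a/",), write_prefix="outputs/run-1/")
```

In a real deployment the same rule would be enforced by the object store through per-run credentials, not by the script checking itself.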

Operate with confidence

📊

Observability

Structured logs, per-run stats, an operator dashboard.

💾

Backups

One-command database dump and a storage inventory.

⬆️

Install & update

Installing and updating are each a single command, with no manual steps.

07

Why uproj

Not a notebook. Not a cloud bill.

📓

vs. notebooks

Repeatable, scheduled, catalogued, multi-user — not a one-off script.

☁️

vs. cloud SaaS

On-prem and sovereign — no per-scene egress fees, your hardware.

🧩

Extensible

Any new source or analytic is just Python — same simple contract.

Where it goes next

🛡️

Tighter egress

Per-process declared egress allow-lists.

📈

Observability stack

Optional metrics + dashboards for larger deployments.

GPU compute

Heavier analytics on accelerated runtimes.

Let's build it

Your data, your servers, your analysts — in production next week.

6Grain · uproj.6grain.com