OpenMetadata/docker/docker-compose-quickstart/docker-compose-fuseki-standalone.yml
Sriharsha Chintalapani 0ab31cb647
fix(rdf): reclaim Fuseki disk via compaction + upgrade Jena 4.10 → 5.6.0 (#28242)

PR #28117's SPARQL cleanup converged the logical RDF state but never freed
disk: TDB2 deletes only mark blocks free and the journal grows monotonically
until /$/compact runs. RdfIndexApp.clearRdfData() now calls a new
RdfStorageInterface.compactStorage() between clearAll() and reloadOntologies()
so each recreate run reclaims to a fresh dataset directory. JenaFusekiStorage
posts to /$/compact/{dataset}?deleteOld=true and polls /$/tasks/{id} until
finished, with failures logged and swallowed (disk hygiene, not correctness).
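
The compact-then-poll flow described above can be sketched in a few lines. This is a minimal Python illustration, not the project's Java JenaFusekiStorage code. The function names, timeouts, and the injectable `fetch` helper are hypothetical. It assumes the compaction POST returns JSON with a `taskId` field and that a task's JSON gains a `finished` field once done; because the exact response shape (object or one-element array) may differ between Fuseki versions, both are accepted.

```python
import json
import time
import urllib.request


def compact_url(base_url, dataset, delete_old=True):
    """Build the Fuseki admin URL that compacts a TDB2 dataset."""
    url = f"{base_url.rstrip('/')}/$/compact/{dataset.strip('/')}"
    return url + "?deleteOld=true" if delete_old else url


def http_json(url, method="GET"):
    """Small JSON-over-HTTP helper (no auth; a secured server needs credentials)."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compact_and_wait(base_url, dataset, fetch=http_json,
                     poll_seconds=2.0, max_polls=150, sleep=time.sleep):
    """POST the compaction request, then poll the task until it is finished.

    Returns True once the task reports 'finished', False on timeout or error.
    Errors are logged and swallowed: compaction is disk hygiene, not correctness.
    """
    try:
        task_id = fetch(compact_url(base_url, dataset), "POST")["taskId"]
        task_url = f"{base_url.rstrip('/')}/$/tasks/{task_id}"
        for _ in range(max_polls):
            status = fetch(task_url, "GET")
            # Assumption: tolerate either a JSON object or a one-element array.
            if isinstance(status, list):
                status = status[0] if status else {}
            if "finished" in status:
                return True
            sleep(poll_seconds)
    except Exception as exc:
        print(f"compaction skipped: {exc}")
    return False
```

Against the compose file in this directory, a call would look like `compact_and_wait("http://localhost:3030", "openmetadata")`, with admin credentials added to `http_json` since the `/$/` endpoints are protected.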

Also unifies the Jena classpath: openmetadata-service was on 4.10.0 and
openmetadata-integration-tests on 5.0.0. Both now pin to 5.6.0 via a single
root pom property, dropping the apache-jena-libs BOM in favour of explicit
jena-core/arq/rdfconnection deps (we're a remote-Fuseki client and never
embed TDB; pulling jena-tdb1/2 triggers a Jena 5/6 static-init regression).
Picks up CVE-2025-49656 and CVE-2025-50151 (admin-side fixes shipped in
Jena 5.5.0). Jena 6.x parked: both 6.0.0 and 6.1.0 hit a recursive clinit
bug where TypeMapper.reset reads RDF.dtLangString before RDF.<clinit>
completes.

Fuseki server bumped 4.10/5.0 → 5.6.0 across all in-repo Dockerfiles; the
unmaintained stain/jena-fuseki:* image references in dev compose files
switched to building from docker/rdf-store/Dockerfile, and Testcontainers
moved to secoresearch/fuseki:5.5.0 (maintained, CVE-fixed; the dataset is
created by JenaFusekiStorage.ensureDatasetExists() so the stain-only
FUSEKI_DATASET_1 env var is no longer needed).
2026-05-18 23:08:46 -07:00


# Standalone Apache Jena Fuseki for RDF/Knowledge Graph storage
services:
  fuseki:
    # Build from the in-repo Dockerfile (Fuseki 5.6.0). See
    # ../development/docker-compose-fuseki.yml for the full rationale.
    build:
      context: ../rdf-store
      dockerfile: Dockerfile
    image: openmetadata-fuseki:5.6.0
    container_name: fuseki-standalone
    hostname: fuseki
    ports:
      - "3030:3030"
    environment:
      # Local-dev default; production deployments MUST override via
      # FUSEKI_ADMIN_PASSWORD / FUSEKI_OPENMETADATA_PASSWORD env vars.
      - FUSEKI_ADMIN_PASSWORD=${FUSEKI_ADMIN_PASSWORD:-admin}
      - FUSEKI_OPENMETADATA_PASSWORD=${FUSEKI_OPENMETADATA_PASSWORD:-openmetadata-secret}
      - JVM_ARGS=-Xmx4g -Xms2g
    volumes:
      # See docker-compose-fuseki-rosetta.yml — host bind path renamed so
      # existing directories with the old stain layout aren't silently
      # mounted at the new /fuseki-data path.
      - ${DOCKER_VOLUMES_PATH:-./docker-volumes}/fuseki-tdb2-data:/fuseki-data
    networks:
      - fuseki-net
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3030/$/ping"]
      interval: 15s
      timeout: 10s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G
networks:
  fuseki-net:
    driver: bridge
# Usage:
# 1. Create the volume directory: mkdir -p docker-volumes/fuseki-tdb2-data
#    (or "$DOCKER_VOLUMES_PATH/fuseki-tdb2-data" if DOCKER_VOLUMES_PATH is set)
# 2. Start: docker-compose -f docker-compose-fuseki-standalone.yml up -d
# 3. Access Fuseki UI: http://localhost:3030
# 4. Login: admin/admin
# 5. SPARQL endpoint: http://localhost:3030/openmetadata/sparql