Real-time city intelligence

AI for Kuala Lumpur

πŸ™GitHub

Data governance & data quality

AI for Kuala Lumpur — Governance

This page explains how governance is applied to the platform: who is accountable, how data quality is controlled, how live and warehouse layers are separated, how regulatory constraints are addressed, and how the AI copilot stays grounded on governed data.

Maturity dimensions

Governance pillars

1. Strategy & governance: policy, committee, roles, roadmap, KPIs.
2. Data quality: rules, metrics, profiling, anomaly remediation.
3. Architecture & security: catalog, classification, RBAC, encryption, MDM logic.
4. GDPR & regulatory compliance: register, legal basis, rights, DPA, DPIA, breach response.
5. Metadata & catalog: glossary, technical metadata, lineage.
6. Culture & organization: training, incident review, regulatory watch.

Roles, responsibilities, accountability

Operating model

The governance model of this platform is built around clear accountability: Product defines operational objectives, engineering owns pipelines and serving, governance defines policy and controls, and data roles own critical domains.
Core target roles are aligned with mature governance practice: Data Owner for business accountability, Data Steward for quality rules and metadata, Data Custodian for technical custody, DPO for privacy compliance, and the CISO / security function for protection controls.
The target decision body is a Data Governance Council (Comité Data) with business and IT representation, exactly the kind of structure expected in a DAMA-aligned operating model.

What an auditor should find

Audit readiness

An auditor should find a formal policy, named roles, a roadmap, traceable controls, evidence of risk prioritization, and a measurable maturity path.
For this project, the most visible audit evidence should be: warehouse refresh status, documented lineage, explicit quality rules, governance knowledge base, and explainable AI outputs tied to governed context.

Rules, monitoring, correction

Data quality framework

The quality model should be based on the classic dimensions explicitly highlighted by the audit checklist: completeness, accuracy, consistency, and uniqueness. In this project, freshness must be added as a critical live-data dimension.
Concrete examples for this platform: district must belong to an allowed list, transit delay cannot be negative, AQI must stay within realistic operational bounds, timestamps must be valid, and duplicate live snapshots should be controlled.
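The concrete rules above can be sketched as executable checks. This is a minimal illustration: the allowed district list, AQI bounds, and field names are assumptions for the example, not the platform's actual schema.

```python
from datetime import datetime

# Illustrative allowed list and bounds; real values would live in the governed catalog.
ALLOWED_DISTRICTS = {"Bukit Bintang", "Chow Kit", "KLCC", "Brickfields"}
AQI_BOUNDS = (0, 500)

def check_snapshot(snapshot: dict) -> list[str]:
    """Return the list of quality-rule violations for one live snapshot."""
    violations = []
    if snapshot.get("district") not in ALLOWED_DISTRICTS:
        violations.append("district not in allowed list")
    if snapshot.get("transit_delay_min", 0) < 0:
        violations.append("transit delay is negative")
    aqi = snapshot.get("aqi")
    if aqi is None or not (AQI_BOUNDS[0] <= aqi <= AQI_BOUNDS[1]):
        violations.append("AQI outside operational bounds")
    try:
        datetime.fromisoformat(snapshot["timestamp"])
    except (KeyError, ValueError):
        violations.append("invalid timestamp")
    return violations

check_snapshot({"district": "KLCC", "transit_delay_min": 3,
                "aqi": 72, "timestamp": "2024-05-01T08:30:00+08:00"})  # → []
```

A duplicate-control rule would sit alongside these, for example by tracking already-seen (district, timestamp) pairs before accepting a snapshot.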
The target process is: define quality rules, profile critical datasets, automate metrics, detect anomalies, assign ownership, correct within SLA, and report status to decision makers.
This is fully aligned with the quality stream described in the roadmap: identify critical data, define quality rules, build a quality dashboard, and formalize an anomaly correction process.

Protection, classification, access

Architecture & security controls

The platform should distinguish data by sensitivity and apply appropriate controls. Even in a portfolio project, the governance page should make that model explicit: public demo data, internal technical metadata, confidential operational logic, and regulated personal data if real integrations are introduced later.
Target controls include role-based access control, encryption at rest and in transit, review of critical access rights, documented source inventory, and lineage of critical flows.
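One way to make the sensitivity model and the RBAC control concrete is an ordered classification with per-role clearances. The levels mirror the four tiers named above; the roles and clearances are hypothetical, and a real deployment would source them from an IAM system.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0        # public demo data
    INTERNAL = 1      # internal technical metadata
    CONFIDENTIAL = 2  # confidential operational logic
    REGULATED = 3     # regulated personal data, if ever introduced

# Hypothetical role clearances for illustration only.
ROLE_CLEARANCE = {
    "viewer": Sensitivity.PUBLIC,
    "engineer": Sensitivity.CONFIDENTIAL,
    "dpo": Sensitivity.REGULATED,
}

def can_read(role: str, level: Sensitivity) -> bool:
    """RBAC check: a role may read data at or below its clearance level."""
    return ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC) >= level
```

Reviewing critical access rights then reduces to auditing the `ROLE_CLEARANCE` mapping rather than ad hoc per-dataset grants.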
For AI for Kuala Lumpur, the architectural split between Redis live serving and DuckDB/dbt analytical transformation is itself a governance choice: the live layer optimizes freshness, the warehouse layer optimizes reliability and structure.
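The two-layer split can be sketched as follows. To keep the example self-contained, a plain dict (with a TTL field) stands in for Redis and `sqlite3` stands in for the DuckDB warehouse; key names and the table schema are illustrative assumptions.

```python
import json
import sqlite3
import time

# Live layer stand-in: latest snapshot per district, expiring for freshness.
live: dict[str, tuple[str, float]] = {}

def serve_live(district: str, payload: dict, ttl_s: int = 60) -> None:
    # Live serving keeps only the newest value, optimized for freshness.
    live[f"live:{district}"] = (json.dumps(payload), time.time() + ttl_s)

# Warehouse stand-in: append-only history, optimized for reliable transforms.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_signals (district TEXT, aqi INT, ts TEXT)")

def load_warehouse(row: tuple) -> None:
    warehouse.execute("INSERT INTO raw_signals VALUES (?, ?, ?)", row)

serve_live("KLCC", {"aqi": 72})
load_warehouse(("KLCC", 72, "2024-05-01T08:30:00"))
```

The governance point is visible in the shapes: the live layer overwrites and expires, while the warehouse only appends, so the two layers cannot silently contaminate each other.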

Compliance and accountability

GDPR & AI governance

The governance target is to be compatible with a real privacy-by-design posture: maintain an Article 30 processing register, document legal basis, enable data subject rights, manage DPA obligations, and formalize a breach response process.
The DPIA logic is also relevant to this project because the supplied model clearly shows when an impact assessment becomes mandatory: profiling, large-scale processing, sensitive data, innovative technology, surveillance, or automated decisioning.
For future real-city integrations, any personal data use, geolocation enrichment, or citizen monitoring feature should be screened through this governance lens before rollout.
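The screening step described above can be sketched as a trigger check. The trigger names come from the DPIA criteria listed in this section; the helper itself, and the rule that any single trigger mandates an assessment, are a simplifying assumption for illustration.

```python
# DPIA triggers from the governance model above.
DPIA_TRIGGERS = {
    "profiling", "large_scale", "sensitive_data",
    "innovative_technology", "surveillance", "automated_decisioning",
}

def needs_dpia(feature_flags: set[str]) -> bool:
    """Screen a planned feature: any DPIA trigger mandates an impact assessment."""
    return bool(feature_flags & DPIA_TRIGGERS)
```

A geolocation-enrichment feature tagged with `{"profiling"}` would be flagged before rollout, while a purely aggregate dashboard would pass through.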
On the AI side, the roadmap explicitly connects governance to AI Act classification, inventory of AI systems, and supervision of higher-risk use cases. This is especially relevant for a copilot that influences operational decisions.

Traceability and catalog logic

Metadata, glossary & lineage

Metadata management is a key maturity signal. The target state includes a business glossary, technical metadata for critical datasets, and documented lineage for critical flows.
In this project, the most important lineage to document is: producer β†’ consumer β†’ Redis β†’ FastAPI β†’ dashboard for live serving, and raw signals β†’ DuckDB β†’ dbt β†’ marts β†’ analytics pages / copilot for analytical reasoning.
A future data catalog would document district definitions, metric semantics, thresholds, refresh frequency, ownership, acceptable quality bounds, and downstream consumers.
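A catalog entry covering both the lineage paths and the attributes listed above might be shaped like this. The field names and the sample entry are illustrative assumptions, not the project's actual catalog schema.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One governed dataset; fields mirror the catalog attributes described above."""
    name: str
    owner: str
    refresh: str
    lineage: list[str]                  # ordered upstream-to-downstream path
    quality_bounds: dict = field(default_factory=dict)
    consumers: list[str] = field(default_factory=list)

aqi_live = CatalogEntry(
    name="aqi_live_snapshot",
    owner="data-steward-environment",   # hypothetical steward
    refresh="every 60s",
    lineage=["producer", "consumer", "redis", "fastapi", "dashboard"],
    quality_bounds={"aqi": (0, 500)},
    consumers=["live dashboard", "copilot context"],
)
```

Keeping lineage as an ordered list makes the live-serving path directly auditable: the entry itself documents which layers a value passes through before a user sees it.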

Long-term maturity

Culture, training & regulatory watch

Mature governance is not just controls and documentation. It also requires regular training, recurring awareness campaigns, incident analysis, and a structured regulatory watch.
The supplied guide makes this explicit: CNIL resources, EDPB guidelines, DAMA-DMBOK references, AI Act monitoring, and professional communities all support long-term maturity.
For this project, the governance page should therefore signal a culture objective: not only build a platform, but also build the habits required to maintain trusted data and trusted AI over time.

How this project grows toward maturity

Governance roadmap

Phase 1 (governance foundation): define policy intent, document roles, make controls visible, and anchor AI responses in governed project context.
Phase 2: formalize committee logic, bilingual documentation, clearer ownership, and governance-ready product pages.
Phase 3: add stronger quality metrics, automated checks, lineage visibility, warehouse control evidence, and more explicit audit traceability.
Phase 4: evolve toward a more complete governance operating model with data catalog, richer RACI, regulatory watch workflow, and stronger AI governance documentation.