Use these prebuilt wheel files to install our Python packages as of a specific commit.
Built at 2025-10-30T04:29:38.403818+00:00.
```json
{
"timestamp": "2025-10-30T04:29:38.403818+00:00",
"branch": "feat-make-pyspark-optional",
"commit": {
"hash": "80f9806b0d7bb288885082bc6226c46c15c66935",
"message": "feat(docker): Add slim and locked variants for PySpark-optional deployment\n\nIntroduces slim and locked Docker image variants for both\ndatahub-ingestion and datahub-actions, for environments with different PySpark requirements\nand security constraints.\n\n**Image Variants**:\n\n1. **Full (default)**: With PySpark, network enabled\n - Includes PySpark for data profiling\n - Can install packages from PyPI at runtime\n - Backward compatible with existing deployments\n\n2. **Slim**: Without PySpark, network enabled\n - Excludes PySpark (~500MB smaller)\n - Uses s3-slim, gcs-slim, abs-slim for data lake sources\n - Can still install packages from PyPI if needed\n\n3. **Locked** (NEW): Without PySpark, network BLOCKED\n - Excludes PySpark\n - Blocks ALL network access to PyPI/UV indexes\n - datahub-actions: ONLY bundled venvs, no main ingestion install\n - Most secure/restrictive variant for production\n\n**Additional Changes**:\n\n**1. pyspark_utils.py**: Fixed module-level exports\n - Added SparkSession, DataFrame, AnalysisRunBuilder, PandasDataFrame as None\n - These can now be imported even when PySpark unavailable\n - Prevents ImportError in s3-slim installations\n\n**2. setup.py**: Moved cachetools to s3_base\n - operation_config.py uses cachetools unconditionally\n - Now available in s3-slim without requiring data_lake_profiling\n\n**3. build_bundled_venvs_unified.py**: Added slim_mode support\n - BUNDLED_VENV_SLIM_MODE flag controls package extras\n - When true: installs s3-slim, gcs-slim, abs-slim (no PySpark)\n - When false: installs s3, gcs, abs (with PySpark)\n - Venv named {plugin}-bundled (e.g., s3-bundled) for executor compatibility\n\n**4. datahub-actions/Dockerfile**: Three variant structure\n - bundled-venvs-full: s3 with PySpark\n - bundled-venvs-slim: s3-slim without PySpark\n - bundled-venvs-locked: s3-slim without PySpark\n - final-full: Has PySpark, network enabled, full install\n - final-slim: No PySpark, network enabled, slim install\n - final-locked: No PySpark, network BLOCKED, NO main install (bundled venvs only)\n\n**5. datahub-ingestion/Dockerfile**: Added locked stage\n - install-full: All sources with PySpark\n - install-slim: Selected sources with s3-slim (no PySpark)\n - install-locked: Minimal sources with s3-slim, network BLOCKED\n\n**6. 
build.gradle**: Updated variants and defaults\n - defaultVariant: \"full\" (restored to original)\n - Variants: full (no suffix), slim (-slim), locked (-locked)\n - Build args properly set for all variants\n\n**Network Blocking in Locked Variant**:\n```dockerfile\nENV UV_INDEX_URL=http://127.0.0.1:1/simple\nENV PIP_INDEX_URL=http://127.0.0.1:1/simple\n```\nThis prevents all PyPI downloads while allowing cached packages from build.\n\n**Bundled Venv Naming**:\n- Venv named `s3-bundled` (not `s3-slim-bundled`)\n- Recipe uses `type: s3` (standard plugin name)\n- Executor finds `s3-bundled` venv automatically\n- Slim/locked: venv uses s3-slim package internally (no PySpark)\n- Full: venv uses s3 package (with PySpark)\n\n**Testing**:\n\u2705 Full variant: PySpark installed, network enabled\n\u2705 Slim variant: PySpark NOT installed, network enabled, s3-bundled venv works\n\u2705 Integration tests: 12 tests validate s3-slim functionality\n\n**Build Commands**:\n```bash\n./gradlew :datahub-actions:docker\n./gradlew :docker:datahub-ingestion:docker\n\n./gradlew :datahub-actions:docker -PdockerTarget=slim\n./gradlew :docker:datahub-ingestion:docker -PdockerTarget=slim\n\n./gradlew :datahub-actions:docker -PdockerTarget=locked\n./gradlew :docker:datahub-ingestion:docker -PdockerTarget=locked\n\n./gradlew :datahub-actions:docker -PmatrixBuild=true\n./gradlew :docker:datahub-ingestion:docker -PmatrixBuild=true\n```\n\n**Recipe Format** (works with all variants):\n```yaml\nsource:\n type: s3 # Use of existing \"s3\" source type\n config:\n path_specs:\n - include: \"s3://bucket/*.csv\"\n profiling:\n enabled: false # Required for slim/locked\n```"
},
"pr": {
"number": 15123,
"title": "feat(ingestion): Make PySpark optional for S3, ABS, and Unity Catalog sources",
"url": "https://github.com/datahub-project/datahub/pull/15123"
}
}
```
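
The slim and locked variants described in the commit above ship without PySpark. A quick way to confirm which flavor a given image or environment has is a plain import check — a minimal sketch using generic Python, not a DataHub-specific API:

```bash
# Succeeds on the full variant; fails on slim/locked, which exclude PySpark.
python -c "import pyspark; print(pyspark.__version__)" \
  || echo "PySpark not installed: slim or locked variant"
```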
Current base URL: unknown
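The install commands in the table below use a `<base-url>` placeholder. Since the base URL is unknown here, substitute the host actually serving these artifacts before running anything — a minimal sketch, with a hypothetical host:

```bash
# BASE_URL is a hypothetical placeholder; point it at the host serving these wheels.
BASE_URL="https://artifacts.example.com"
uv pip install "acryl-datahub @ ${BASE_URL}/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl"
```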
| Package | Size | Install command |
|---|---|---|
| acryl-datahub | 2.414 MB | `uv pip install 'acryl-datahub @ <base-url>/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-actions | 0.101 MB | `uv pip install 'acryl-datahub-actions @ <base-url>/artifacts/wheels/acryl_datahub_actions-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-airflow-plugin | 0.039 MB | `uv pip install 'acryl-datahub-airflow-plugin @ <base-url>/artifacts/wheels/acryl_datahub_airflow_plugin-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-dagster-plugin | 0.019 MB | `uv pip install 'acryl-datahub-dagster-plugin @ <base-url>/artifacts/wheels/acryl_datahub_dagster_plugin-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-gx-plugin | 0.010 MB | `uv pip install 'acryl-datahub-gx-plugin @ <base-url>/artifacts/wheels/acryl_datahub_gx_plugin-0.0.0.dev1-py3-none-any.whl'` |
| prefect-datahub | 0.011 MB | `uv pip install 'prefect-datahub @ <base-url>/artifacts/wheels/prefect_datahub-0.0.0.dev1-py3-none-any.whl'` |
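
Direct-URL installs like those above also accept extras using PEP 508 syntax, which is relevant here because the commit message above moves PySpark behind slim extras such as `s3-slim`. A sketch, assuming that extra is defined on this wheel:

```bash
# 's3-slim' is the extra named in the commit message above; adjust if it differs,
# and replace <base-url> as described earlier.
uv pip install 'acryl-datahub[s3-slim] @ <base-url>/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl'
```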