// the find

dlt-hub/dlt

★ 5,449 · Python · Apache-2.0 · updated Jun 2026

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

dlt is a Python library for building ELT pipelines that handles schema inference, type normalization, incremental loading, and destination management. It targets data engineers who want Fivetran/Airbyte-style functionality without the infrastructure overhead, or who need pipelines that live in their own codebase. It runs anywhere Python does: notebooks, Lambda, Airflow, local scripts.
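To make "schema inference" and "type normalization" concrete, here is a minimal stdlib-only sketch of the general idea: nested dicts become prefixed columns, lists become child tables linked to their parent, and column types are read off the data. This is not dlt's implementation or API; the function names (`normalize`, `infer_schema`), the `_row_id` / `_parent_id` columns, and the double-underscore naming are all assumptions made for illustration.

```python
from typing import Any


def normalize(records: list[dict[str, Any]], table: str) -> dict[str, list[dict[str, Any]]]:
    """Split nested records into flat rows, grouped by (hypothetical) table name.

    Nested dicts become prefixed columns (address__city); lists become
    child tables (users__orders) linked back through a _parent_id column.
    """
    tables: dict[str, list[dict[str, Any]]] = {}

    def walk(record: dict[str, Any], table_name: str, parent_id: Any) -> None:
        # Row id is the table name plus the row's position in that table.
        row_id = f"{table_name}:{len(tables.get(table_name, []))}"
        row: dict[str, Any] = {"_row_id": row_id}
        if parent_id is not None:
            row["_parent_id"] = parent_id
        tables.setdefault(table_name, []).append(row)

        def add_columns(obj: dict[str, Any], prefix: str) -> None:
            for key, value in obj.items():
                name = f"{prefix}{key}"
                if isinstance(value, dict):
                    # Nested object: flatten into prefixed columns on the same row.
                    add_columns(value, f"{name}__")
                elif isinstance(value, list):
                    # List: each item becomes a row in a child table.
                    for item in value:
                        child = item if isinstance(item, dict) else {"value": item}
                        walk(child, f"{table_name}__{name}", row_id)
                else:
                    row[name] = value

        add_columns(record, "")

    for record in records:
        walk(record, table, None)
    return tables


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Map each column to the Python type name of its first non-null value."""
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema
```

Feeding it a user record with a nested `address` dict and an `orders` list would yield a `users` table with an `address__city` column and a separate `users__orders` table whose rows point back at their parent. A real loader also has to handle type conflicts and columns that appear later, which this sketch ignores.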

- Schema inference and evolution are genuinely useful: dlt automatically detects types, flattens nested JSON into relational tables, and handles schema changes without manual intervention or pipeline failures.

- Incremental loading is a first-class concept with built-in cursor tracking and merge strategies, not an afterthought bolted on per-source.

- Destination coverage is solid—BigQuery, Snowflake, Redshift, DuckDB, Delta Lake, Iceberg, filesystem—and the custom destination interface is clean enough to build reverse-ETL without fighting the framework.

- The codebase is well-structured with clear separation between common, sources, and destinations layers, and the CI matrix is thorough with separate workflows per destination category.

- The configuration/injection system is clever but opaque—resolving credentials across env vars, TOML files, and spec classes in a specific precedence order is confusing to debug when something doesn't resolve as expected.

- New destinations are explicitly not accepted as contributions, which means you're dependent on the maintainers for anything not already supported, and the SQLAlchemy destination has known dialect gaps that won't be fixed by the community.

- The `_workspace` module adds a whole deployment/scheduling/MCP/dashboard layer that blurs what the library actually is—it's growing toward an opinionated platform, which adds dependency weight and conceptual surface area for people who just want the ETL primitives.

- Telemetry is on by default and requires explicit opt-out, which will surprise teams with strict data governance policies who don't read the docs carefully.
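The incremental loading point above is easier to weigh with the mechanics in view. The sketch below shows the general pattern of cursor tracking plus a merge (upsert) by primary key, using only the standard library. It is a conceptual illustration under stated assumptions, not dlt's code: `incremental_merge`, the `state` dict, and the `updated_at` / `id` field names are hypothetical.

```python
from typing import Any


def incremental_merge(
    destination: dict[Any, dict[str, Any]],
    state: dict[str, Any],
    source_rows: list[dict[str, Any]],
    cursor_field: str = "updated_at",
    primary_key: str = "id",
) -> int:
    """Load only rows newer than the stored cursor and upsert them by key.

    Returns the number of rows loaded in this run.
    """
    last_seen = state.get("cursor")
    # Keep only rows whose cursor value is past the last one we recorded.
    new_rows = [
        row for row in source_rows
        if last_seen is None or row[cursor_field] > last_seen
    ]
    for row in new_rows:
        # Merge strategy: insert new keys, overwrite existing ones.
        destination[row[primary_key]] = row
    if new_rows:
        # Advance the cursor so the next run skips everything seen so far.
        state["cursor"] = max(row[cursor_field] for row in new_rows)
    return len(new_rows)
```

Running it twice over the same source loads nothing the second time, and a row with a known key but a newer cursor value replaces the old version instead of duplicating it. The value of having this built into a library is that the cursor state is persisted and restored for you rather than hand-rolled per source.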
