
// the find

WeiYe-Jing/datax-web

★ 6,009 · Java · MIT · updated Jun 2024

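DataX with an integrated visual UI: select a datasource and generate a data sync task in one click. Supports RDBMS, Hive, HBase, ClickHouse, MongoDB, and other datasources, plus batch creation of RDBMS sync tasks. Integrates an open-source scheduling system and supports distributed execution, incremental data sync, real-time viewing of run logs, monitoring of executor resources, killing running processes, and encryption of datasource information.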

DataX-Web puts a scheduling and UI layer on top of Alibaba's DataX batch ETL engine, letting you configure data sync jobs between RDBMS, Hive, HBase, ClickHouse, and MongoDB through a web form instead of hand-writing JSON config files. It's aimed at data engineering teams running on-prem Hadoop stacks who want something between 'edit JSON by hand' and 'buy an enterprise ETL tool'.
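For a sense of what the web form replaces, here is a minimal sketch of the kind of job file DataX consumes, built from a datasource selection. It assumes a MySQL-to-MySQL copy using the `mysqlreader` and `mysqlwriter` plugins; the `build_job` helper and the `src`/`dst` dictionary keys are hypothetical names for illustration, not DataX-Web's actual code.

```python
import json


def build_job(src, dst, columns, channels=3):
    """Return a DataX-style job dict for a MySQL-to-MySQL copy.

    `src` and `dst` are hypothetical dicts with the keys
    user, password, jdbc_url, and table.
    """
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": src["user"],
            "password": src["password"],
            "column": columns,
            # The reader takes a list of JDBC URLs per connection.
            "connection": [
                {"table": [src["table"]], "jdbcUrl": [src["jdbc_url"]]}
            ],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": dst["user"],
            "password": dst["password"],
            "column": columns,
            "writeMode": "insert",
            # The writer takes a single JDBC URL string.
            "connection": [
                {"table": [dst["table"]], "jdbcUrl": dst["jdbc_url"]}
            ],
        },
    }
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},
            "content": [{"reader": reader, "writer": writer}],
        }
    }


if __name__ == "__main__":
    source = {"user": "etl", "password": "secret",
              "jdbc_url": "jdbc:mysql://src-host:3306/shop", "table": "orders"}
    target = {"user": "etl", "password": "secret",
              "jdbc_url": "jdbc:mysql://dst-host:3306/dw", "table": "orders"}
    print(json.dumps(build_job(source, target, ["id", "total"]), indent=2))
```

Multiply that by dozens of near-identical tables and the appeal of generating it from a form becomes clear.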

The JSON builder, which generates DataX config from a datasource selection, is the core value, and it works for the common cases such as MySQL to Hive. Batch creation of RDBMS jobs in particular saves real time when you have dozens of similar sync jobs. The scheduler (built on xxl-job) gives you distributed executor nodes with sensible routing strategies, including consistent hash and failover, which is more than most homegrown schedulers bother with. Incremental sync via timestamp or auto-increment PK is built in and parameterized, not bolted on. Real-time log streaming to the browser is a genuine quality-of-life improvement over SSHing into nodes to tail files.
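The timestamp flavor of incremental sync comes down to a moving window: each run reads rows changed since the last successful run and advances the watermark only on success. A rough sketch of that idea follows; the function names and the `updated_at` column are hypothetical, and this is not a copy of how DataX-Web implements it.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def incremental_where(column, last_run, now):
    """Build a half-open [last_run, now) filter for a reader's WHERE clause."""
    start = last_run.strftime(TIME_FORMAT)
    end = now.strftime(TIME_FORMAT)
    return f"{column} >= '{start}' AND {column} < '{end}'"


def next_watermark(last_run, now, succeeded):
    """Advance the watermark only when the run succeeded, so a failed
    run is retried over the same window instead of skipping rows."""
    return now if succeeded else last_run


if __name__ == "__main__":
    last = datetime(2024, 1, 1)
    current = datetime(2024, 1, 2)
    print(incremental_where("updated_at", last, current))
    print(next_watermark(last, current, succeeded=True))
```

The half-open interval keeps a row stamped exactly at the boundary from being picked up by two consecutive runs.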

The last commit was in June 2024 and the project has been quiet for over a year. DataX itself still requires Python 2.7 unless you manually swap three files, and this project does nothing to fix that. The encryption migration note in v2.1.2 is alarming: upgrading from 2.1.1 silently breaks all existing datasource credentials with no migration path, just 'recreate everything'. MySQL 5.7 as the only supported metadata database in 2024 is a real constraint. The UI ships as a compiled Vue bundle checked into this repo, while its source lives in a separate frontend repo, so changing anything in the interface means dealing with an abandoned build pipeline spread across two repos.

