// the find
juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
JuiceFS is a POSIX-compatible distributed filesystem that separates metadata (Redis, MySQL, TiKV, etc.) from data storage (S3 and compatible object stores). It lets you mount object storage like a local drive and share it across thousands of clients simultaneously. It is aimed at ML/data pipelines and Hadoop/Spark workloads that need shared storage without a dedicated NAS.
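To make the metadata/data split concrete, here is a toy sketch in Python. It is not JuiceFS's implementation: the class name `ToyFS`, the 4-byte block size, and the `blocks/<hash>` key scheme are all invented for illustration. It only shows the general idea that paths and sizes live in one store while the bytes live in another, as anonymous blocks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example splits a short string


class ToyFS:
    """Toy model: names live in a metadata store, bytes live in an object store."""

    def __init__(self):
        self.meta = {}     # path -> (size, [object keys]); stands in for Redis/MySQL/TiKV
        self.objects = {}  # object key -> bytes; stands in for an S3 bucket

    def write(self, path, data):
        keys = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # Hypothetical key scheme: the bucket sees opaque block names, not paths.
            digest = hashlib.sha256(f"{path}@{offset}".encode()).hexdigest()[:12]
            key = "blocks/" + digest
            self.objects[key] = block
            keys.append(key)
        self.meta[path] = (len(data), keys)

    def read(self, path):
        # Reading needs the metadata store to know which blocks to fetch, in order.
        _size, keys = self.meta[path]
        return b"".join(self.objects[k] for k in keys)


fs = ToyFS()
fs.write("/data/a.txt", b"hello world")
restored = fs.read("/data/a.txt")
```

Listing `fs.objects` shows only block keys; without `fs.meta` there is no way to tell which blocks belong to which file or in what order.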
1. The metadata/data separation is architecturally clean: swapping Redis for TiKV or MySQL is a config change, not a rewrite, so you pick your durability/latency tradeoff explicitly.
2. It passes all 8813 pjdfstest cases, which most distributed filesystems cannot claim; flock, fcntl locks, and xattrs actually work.
3. The local disk cache layer is well thought out: blocks are cached on the client, so repeated reads of ML training data don't hammer S3.
4. CI is serious: chaos engineering tests, mutation testing, pjdfstest, fio benchmarks, and Windows tests all run in CI, which suggests the test suite actually tracks production failures.
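The caching point above can be sketched with a small client-side block cache. This is a minimal illustration, not JuiceFS's cache: the `BlockCache` class, the in-memory `bucket` dict, and the LRU eviction policy are all assumptions made for the example. It shows why a second pass over the same training data need not touch the object store.

```python
from collections import OrderedDict


class BlockCache:
    """Keep recently read blocks on the client; fall back to the backend on a miss."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch            # callable: block key -> bytes (e.g. an S3 GET)
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # insertion order doubles as recency order
        self.backend_reads = 0

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)     # mark as most recently used
            return self.blocks[key]
        data = self.fetch(key)               # cache miss: go to the backend
        self.backend_reads += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


bucket = {"b0": b"x" * 8, "b1": b"y" * 8}    # stand-in for an object store
cache = BlockCache(bucket.__getitem__, capacity=2)

for _epoch in range(3):                      # three passes over the same blocks
    for key in ("b0", "b1"):
        cache.get(key)

reads_after_three_epochs = cache.backend_reads
```

Only the first pass reaches the backend; the benefit disappears once the working set is larger than the cache, which is why cache sizing matters for training workloads.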
1. Redis as the metadata engine is a footgun at scale: Redis Cluster support requires all of a filesystem's keys to live in one hash slot, which caps practical namespace size and creates a single-shard bottleneck. The docs acknowledge this, but it still catches people.
2. Consistency is close-to-open only: concurrent writers on different clients will race. This is POSIX-ish, not POSIX, and anyone who expects POSIX write ordering without a close will have a bad time.
3. The storage format inside the S3 bucket is opaque (chunks and blocks, not files), so you can't read your data without JuiceFS running. That is vendor lock-in at the data-layout level.
4. The commercial version (JuiceFS Enterprise) has features the community edition lacks, and the line between them isn't always obvious from the community docs; some operational capabilities, like directory quotas and warm-up controls, are quietly limited.
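The single-hash-slot concern follows from how Redis Cluster places keys: the slot is CRC16 of the key modulo 16384, and when a key contains a non-empty `{tag}`, only the tag is hashed. The sketch below reimplements that rule in Python using the standard library's `binascii.crc_hqx`. The key names are hypothetical and are not JuiceFS's actual key schema; they only illustrate that keys sharing a tag always share a slot, and therefore a shard.

```python
import binascii


def key_slot(key: bytes) -> int:
    """Redis Cluster slot for a key: CRC16 (XMODEM variant) mod 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Hash only the tag if the braces enclose at least one character.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % 16384


# Hypothetical metadata keys that all carry the same hash tag "{1}".
tagged = [b"{1}inode:1", b"{1}inode:999999", b"{1}dir:/models"]
tagged_slots = {key_slot(k) for k in tagged}

# The same keys without a tag are hashed in full, so they are free to spread out.
untagged_slots = {key_slot(k.replace(b"{1}", b"")) for k in tagged}
```

`tagged_slots` contains exactly one slot no matter how many keys are added, so adding shards to the cluster does not spread the load for that filesystem.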