// the find
OpenGVLab/InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
InternGPT is a Gradio-based demo platform that chains vision-language models, including SAM, DragGAN, ImageBind, and HuskyVQA, into a single interactive UI where pointing gestures (clicks, drags) drive the conversation rather than pure text. It's a research demo from OpenGVLab, not a library or framework you'd build on. The intended audience is researchers who want to show off multimodal pipelines without writing a custom UI each time.
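To make the "pointing gestures drive the conversation" idea concrete, here is a minimal sketch of how a click might be resolved to a region and attached to a text question. Everything in it is an assumption for illustration: the `Region` type, the box-based hit test, and the prompt format are hypothetical and are not taken from InternGPT's code, which works with segmentation masks from SAM rather than hand-written boxes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical segmented region: a label plus an axis-aligned box."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, x: int, y: int) -> bool:
        # Half-open box: left/top edges inclusive, right/bottom exclusive.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    @property
    def area(self) -> int:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def region_at_click(regions: Sequence[Region], x: int, y: int) -> Optional[Region]:
    """Return the smallest region containing the click, or None if nothing is hit."""
    hits = [r for r in regions if r.contains(x, y)]
    return min(hits, key=lambda r: r.area) if hits else None


def build_prompt(question: str, region: Optional[Region]) -> str:
    """Attach the pointed-at box to the question (illustrative format only)."""
    if region is None:
        return question
    box = (region.x0, region.y0, region.x1, region.y1)
    return f"{question} [region: {box}]"


# A mug sitting on a table: the click lands inside both boxes.
regions = [
    Region("table", 0, 0, 400, 300),
    Region("mug", 120, 80, 180, 150),
]
target = region_at_click(regions, 150, 100)
prompt = build_prompt("What is this?", target)
print(prompt)
```

Picking the smallest enclosing region is one simple way to disambiguate nested objects; a real system would more likely pass the click to the segmentation model as a point prompt.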
The pointing-as-instruction idea is genuinely novel: clicking a region before asking 'what is this?' is more precise than describing it in words, and the paper backs this up with benchmarks. Selectively loading only the models you need (`--load`) is smart given how much GPU memory this stack consumes. HuskyVQA, their fine-tuned VQA model, reportedly reaches 93.89% of GPT-4 quality on their eval set, which is a concrete claim with numbers behind it rather than vague capability marketing. Docker support with separate DragGAN and InternGPT-CN variants makes self-hosting somewhat tractable.
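The selective-loading idea behind `--load` can be sketched as follows. This is a hedged illustration, not the project's implementation: the `Name_device` comma-separated spec format, the `parse_load_spec` and `load_selected` helpers, and the stub loaders are all assumptions, so check the repository's README for the real syntax.

```python
from typing import Callable, Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Parse 'ModelA_cuda:0,ModelB_cpu' into {'ModelA': 'cuda:0', 'ModelB': 'cpu'}.

    The 'Name_device' format is an assumption made for this sketch.
    """
    models: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the LAST underscore so model names may contain underscores.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected Name_device, got {entry!r}")
        models[name] = device
    return models


def load_selected(spec: str, registry: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Instantiate only the models named in the spec; the rest never touch the GPU."""
    loaded = {}
    for name, device in parse_load_spec(spec).items():
        if name not in registry:
            raise KeyError(f"unknown model {name!r}")
        loaded[name] = registry[name](device)
    return loaded


# Stub loaders standing in for real model constructors (hypothetical).
registry = {
    "HuskyVQA": lambda device: f"HuskyVQA on {device}",
    "SAM": lambda device: f"SAM on {device}",
    "DragGAN": lambda device: f"DragGAN on {device}",
}

loaded = load_selected("HuskyVQA_cuda:0, SAM_cpu", registry)
print(loaded)
```

Because constructors run only for names present in the spec, a model left out of the string costs no memory at all, which is the point of the flag on a single-GPU machine.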
The online demo has been suspended since May 2023 and the last commit was in August 2024; the project is functionally abandoned, with most roadmap items still unchecked. The dependency list is a nightmare: StyleGAN2 requires custom CUDA ops, DragGAN pulls in its own model zoo, ImageBind and SAM each have separate weight downloads, and the full `--load` command strings together 16 model classes on a single GPU. There's no requirements-pinning strategy that actually works across this many conflicting projects. Because the repo is a demo platform rather than a composable SDK, adding your own model means subclassing an undocumented base class and hoping the Gradio wiring holds.