// the find
pgalko/BambooAI
A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.
BambooAI is a multi-agent Python library for natural language data analysis against pandas DataFrames. It orchestrates a pipeline of specialized LLM agents (planner, code generator, error corrector, reviewer) to turn plain-English questions into executed Python code and visualizations. It is aimed at data analysts who want LLM-assisted exploration without writing pandas from scratch.
The multi-agent decomposition is well thought out: planning, code generation, and error correction are separate agents with their own model assignments, so you can put a cheap flash model on routing tasks and reserve a strong reasoning model for the steps that need it. Self-healing code execution (a retry loop with LLM-based error correction) is the right call for this use case; it improves reliability on messy real-world data. Provider flexibility is real: the JSON config lets you swap any agent to any provider (OpenAI, Anthropic, Gemini, Ollama, vLLM) without touching code, which matters if you want to run fully local. The OWL ontology integration for domain grounding is an unusual and practical addition: it gives the agents actual schema knowledge rather than relying on column-name inference.
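To make the self-healing idea concrete, here is a minimal sketch of a retry loop with a pluggable error corrector. It is not BambooAI's implementation: `run_with_repair`, `stub_repair`, and the toy snippet are hypothetical names invented for illustration, and the `repair` callback stands in for whatever LLM call the error-corrector agent would make.

```python
import traceback
from typing import Callable


def run_with_repair(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> dict:
    """Execute `code`; on failure, ask `repair` for a fixed version and retry.

    `repair(code, error_text)` is a stand-in for an LLM error-corrector call.
    Returns the namespace of the first successful run, or raises RuntimeError
    once `max_attempts` executions have failed.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        namespace: dict = {}
        try:
            # In-process execution: convenient, but offers no isolation.
            exec(code, namespace)
            return namespace
        except Exception:
            last_error = traceback.format_exc()
            if attempt < max_attempts:
                # Feed the failing code and its traceback back to the fixer.
                code = repair(code, last_error)
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{last_error}")


def stub_repair(code: str, error_text: str) -> str:
    # Toy fixer (assumption): corrects one known typo instead of calling an LLM.
    return code.replace("revnue", "revenue")


broken = "row = {'revenue': 120}\ntotal = row['revnue'] * 2"
result = run_with_repair(broken, stub_repair)
print(result["total"])  # prints 240
```

Passing the full traceback to the fixer, rather than only the exception message, gives it the failing line as well as the error type. The bare `exec` call also shows why isolation matters for this pattern: each retry runs freshly generated code in the host process.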
Code execution is a serious security surface. By default, generated Python runs in-process and Docker sandboxing is opt-in, so anyone who just pip-installs and runs locally is executing arbitrary LLM-generated code in their own environment without isolation. The vector DB episodic memory supports only Pinecone or Qdrant, which adds an external dependency and account requirement where a local SQLite fallback would arguably suffice. Test coverage is almost nonexistent: two unit test files and one integration test for a single provider, with nothing covering the core agent pipeline, prompt construction, or the error correction loop, which makes confident upgrades hard. The LLM_CONFIG.json approach puts hardcoded model IDs (including preview/dated slugs like `gemini-2.5-pro-preview-03-25`) directly in config, so these will break without warning when Google or Anthropic deprecate those endpoints.
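One cheap guard against the pinned-model problem is to lint the config for IDs that look volatile before they fail in production. The sketch below assumes a hypothetical config layout (an `agents` list with `agent` and `model` keys); that layout is invented for illustration and is not BambooAI's actual LLM_CONFIG.json schema. The regex is a heuristic and the function name `volatile_models` is likewise made up. Apart from the slug quoted above, the model IDs are placeholders.

```python
import json
import re

# Heuristic: a "preview"/"exp" segment, or a trailing date stamp such as
# -03-25 or -20250101, suggests a model ID that may be retired.
VOLATILE = re.compile(r"(?:^|-)(?:preview|exp)(?:-|$)|-\d{2}-\d{2}$|-\d{8}$")


def volatile_models(config: dict) -> list[tuple[str, str]]:
    """Return (agent, model) pairs whose model ID matches the heuristic."""
    flagged = []
    for entry in config.get("agents", []):
        model = entry.get("model", "")
        if VOLATILE.search(model):
            flagged.append((entry.get("agent", "?"), model))
    return flagged


# Hypothetical config; only the first model ID comes from the review text.
sample = json.loads("""
{"agents": [
  {"agent": "planner", "model": "gemini-2.5-pro-preview-03-25"},
  {"agent": "code_generator", "model": "example-stable-model"},
  {"agent": "router", "model": "example-model-20250101"}
]}
""")

for agent, model in volatile_models(sample):
    print(f"{agent}: {model}")
```

Run as a CI step, a check like this turns a future runtime failure into a visible warning at review time, although it cannot tell you when a provider will actually retire a model.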