LLM Assisted Visualization Analysis Pipeline
Zhiyang Wang
Advisor: Simone Salvo
A scalable multimodal LLM‑powered pipeline that automates the extraction, classification, and interactive 2D/3D exploration of large scholarly figure collections, enabling visualization service providers to accelerate trend analysis, technique discovery, and consulting workflows.

Project Description
This project presents a scalable, end-to-end system for extracting, classifying, and interactively exploring large collections of scholarly figures using a multimodal large language model (LLM). Using OpenAlex metadata harvesting and Playwright browser automation, we collected over 11,000 publication PDFs. Figures are isolated using pdffigures2 and a Faster R-CNN-based detector, followed by zero-shot chart-type classification via GPT-4o prompting. The resulting metadata populates a dual-mode exploration interface—a traditional 2D dashboard and a 3D free-exploration environment built with Observable Framework, Three.js, and D3.js.
Designed to support visualization service providers, the system enables rapid trend discovery, technique identification, and cross-domain consulting. Initial evaluations show a manual-verification accuracy of 91.2%, with the potential to reduce manual annotation efforts by over 98%. Future work will integrate Vision Transformer embeddings with t-SNE/UMAP dimensionality reduction for taxonomy-free exploration.
This work would not have been possible without the support of the ITP community, as well as Carolina Roe-Raymond (Princeton University) and Devin Richard Bayly (University of Arizona).
Technical Details
Data Acquisition & Preprocessing:
1. OpenAlex API harvesting
2. Playwright browser automation
3. pdffigures2 for figure–caption extraction
4. VisImages-Detection (Faster R‑CNN) for subfigure isolation
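The harvesting step above can be sketched as follows. The endpoint, cursor-pagination parameters, and `best_oa_location` field follow the public OpenAlex REST API; the helper names and the idea of filtering on `pdf_url` are illustrative assumptions, not the project's exact query.

```python
"""Sketch of the OpenAlex harvesting step (assumed helpers, real API shape)."""
import urllib.parse

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(filter_expr, cursor="*", per_page=200):
    """Build one cursor-paginated OpenAlex /works query URL.
    filter_expr is caller-supplied, e.g. a date or field filter."""
    params = {"filter": filter_expr, "per-page": per_page, "cursor": cursor}
    return OPENALEX_WORKS + "?" + urllib.parse.urlencode(params)


def extract_pdf_url(work):
    """Pull the open-access PDF URL (if any) out of one work record."""
    oa = work.get("best_oa_location") or {}
    return oa.get("pdf_url")
```

Records with no resolvable `pdf_url` are where the Playwright automation in step 2 would take over, driving the publisher landing page in a real browser.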
Classification:
1. Zero-shot chart-type inference via OpenAI GPT‑4o (version: 2024‑08‑06)
2. Caption-based prompt engineering
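A minimal sketch of the caption-conditioned zero-shot prompt: the label set, prompt wording, and helper names are illustrative assumptions (the project's taxonomy is user-defined); only the general chat-completions call shape with an inline base64 image reflects the OpenAI API.

```python
"""Sketch of zero-shot chart-type classification via GPT-4o."""
import base64

# Hypothetical label set for illustration only.
CHART_TYPES = ["bar chart", "line chart", "scatter plot", "heatmap",
               "node-link diagram", "treemap", "other"]


def build_messages(caption, image_b64):
    """Pair the figure image with its pdffigures2-extracted caption."""
    instruction = (
        "Classify the chart type of this figure. "
        "Answer with exactly one label from: " + ", ".join(CHART_TYPES) + ". "
        "Figure caption: " + caption
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]


def classify_figure(client, caption, image_bytes):
    """Send one figure to the pinned GPT-4o snapshot, return the label."""
    messages = build_messages(caption, base64.b64encode(image_bytes).decode())
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06", messages=messages)
    return resp.choices[0].message.content.strip()
```

Constraining the answer to a fixed label list keeps the zero-shot output machine-parseable without fine-tuning.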
Interface:
1. 2D dashboards: Observable Framework
2. 3D exploration: Three.js and D3.js
3. Future work: Taxonomy-free embedding using t‑SNE/UMAP on Vision Transformer features
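The planned taxonomy-free layout reduces an (n_figures, dim) matrix of Vision Transformer features to 2D coordinates. Since ViT extraction and t-SNE/UMAP require heavyweight libraries, this sketch substitutes a plain-NumPy PCA projection as a stand-in; in the actual pipeline the same feature matrix would instead be fed to t-SNE or UMAP.

```python
"""Stand-in for the planned ViT + t-SNE/UMAP projection step."""
import numpy as np


def project_2d(features):
    """Project an (n, d) feature matrix onto its top two principal
    components, yielding (n, 2) coordinates for the scatter views.
    PCA here is an assumed placeholder for t-SNE/UMAP."""
    X = features - features.mean(axis=0)           # center each dimension
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                            # coords along top-2 PCs
```

The resulting coordinates would drive the same Three.js/D3.js scatter layout as the current taxonomy-based views, with visually similar figures landing near each other.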
Research/Context
This work proposes a quantitative analysis pipeline for visualization datasets, combining bibliometric and document analysis foundations. Using this pipeline, we built one of the largest institutional collections of visualization figures (>30k images), supporting flexible, user-defined taxonomies. Unlike prior projects such as Beagle’s web extraction [Battle et al. 2018] and Vis30k’s curated datasets [Chen et al. 2021], our approach eliminates manual annotation overhead by incorporating LLM-based zero-shot labeling. Drawing interface inspiration from Google’s t-SNE Map [Diagne et al. 2018] and Duhaime’s Three.js guide [Duhaime 2017], we designed a 3D interactive exploration environment. Comparisons to VisImages [Deng et al. 2022] highlight the challenges of multi-label figure handling and motivate our subfigure isolation strategy. This work demonstrates how AI4Vis techniques can streamline visualization consulting, trend analysis, and technique discovery at scale across institutions.