About Viziometrics

Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this project, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across field and topic. We find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda – viziometrics – to study the organization and presentation of visual information in the scientific literature.



We originally used patch-based machine vision techniques to classify figures by visualization type, achieving 91% accuracy on a test set with 5 categories – equations (394), photos (782), tables (436), visualizations (890), and diagrams (769). More recently, we have begun using deep learning to achieve higher-quality results at the expense of training time. Among the millions of images that we extracted from source papers, we found that approximately 35% contain multiple sub-figures. A dismantling algorithm that we proposed at ICPRAM 2015 resolves this issue by parsing each composite figure into multiple sub-figures. The algorithm recursively splits each composite figure into visual “tokens”, classifies each token as either auxiliary (e.g., a text fragment) or a standalone figure, then recursively merges the tokens to reconstruct whole figures. The algorithm terminates when the reconstructed figures achieve a certain “completeness” score based on their types and positions. Using the results of the dismantler, we can classify the sub-figures more precisely.
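The split, classify, and merge stages described above can be illustrated with a small sketch. This is not the published dismantler. It assumes a binary image given as a list of rows (1 = ink, 0 = background), splits recursively at the widest whitespace gap (an XY-cut stand-in for the tokenizer), treats any token smaller than a hypothetical `min_area` as auxiliary, and attaches each auxiliary token to the nearest standalone token in a single pass instead of using a completeness score. All function names and parameters here are illustrative.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def trim(image: List[List[int]], box: Box) -> Optional[Box]:
    """Shrink a box to the ink it contains; return None if it is empty."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(image[r][left:right])]
    cols = [c for c in range(left, right)
            if any(image[r][c] for r in range(top, bottom))]
    if not rows or not cols:
        return None
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def gaps(profile: List[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Find interior runs of empty rows/columns at least min_gap long."""
    runs, start = [], None
    for i, filled in enumerate(profile):
        if not filled and start is None:
            start = i
        elif filled and start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    return runs


def split(image: List[List[int]], box: Box, min_gap: int = 2) -> List[Box]:
    """Recursively cut a box at its widest whitespace gap (rows first)."""
    trimmed = trim(image, box)
    if trimmed is None:
        return []
    top, left, bottom, right = trimmed
    row_profile = [any(image[r][left:right]) for r in range(top, bottom)]
    col_profile = [any(image[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    for profile, horizontal in ((row_profile, True), (col_profile, False)):
        found = gaps(profile, min_gap)
        if found:
            start, end = max(found, key=lambda g: g[1] - g[0])
            if horizontal:
                a = (top, left, top + start, right)
                b = (top + end, left, bottom, right)
            else:
                a = (top, left, bottom, left + start)
                b = (top, left + end, bottom, right)
            return split(image, a, min_gap) + split(image, b, min_gap)
    return [trimmed]  # no gap left: this box is one token


def area(b: Box) -> int:
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def center_distance(a: Box, b: Box) -> float:
    """Manhattan distance between box centers."""
    return (abs((a[0] + a[2]) - (b[0] + b[2])) +
            abs((a[1] + a[3]) - (b[1] + b[3]))) / 2


def dismantle(image: List[List[int]], min_gap: int = 2,
              min_area: int = 6) -> List[Box]:
    """Split into tokens, label small ones auxiliary, merge them back."""
    tokens = split(image, (0, 0, len(image), len(image[0])), min_gap)
    figures = [t for t in tokens if area(t) >= min_area]   # standalone
    auxiliary = [t for t in tokens if area(t) < min_area]  # e.g. labels
    for aux in auxiliary:
        if not figures:
            break
        nearest = min(range(len(figures)),
                      key=lambda k: center_distance(aux, figures[k]))
        figures[nearest] = union(figures[nearest], aux)
    return sorted(figures)


# Toy composite: two 5x8 panels side by side, plus a small label under the left one.
image = [[0] * 20 for _ in range(10)]
for r in range(5):
    for c in list(range(0, 8)) + list(range(12, 20)):
        image[r][c] = 1
image[8][1] = image[8][2] = 1
print(dismantle(image))
```

In this toy input the label is absorbed into the left panel's bounding box, so two sub-figures are returned. A real dismantler would classify tokens by their visual content and would stop merging based on the completeness score described above; the size threshold and nearest-neighbor merge here are only placeholders for those steps.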


Our data for this research project comes from several sources. Currently, the prototype includes more than 8 million images from PubMed Central. We plan to add other data sets as they become available.







If you have questions, please email Po-Shen Lee at sephon@uw.edu.


This work is supported in part by the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, the UW eScience Institute, and the University of Washington iSchool.