Active · Collaboration with Mikkel Willum Johansen
How does peer review shape the use of diagrams in mathematics? This project investigates whether and how diagram use changes between preprint and published article, using arXiv as a large-scale natural laboratory where the "before" version of thousands of papers is publicly available. Even a null result would be telling: if peer reviewers systematically ignore diagrams, that in itself says something significant about mathematical communication.
The project combines a custom-built diagram detector with quantitative corpus analysis in pandas and matplotlib, moving from large-scale detection of transitions toward close reading of exemplary cases. A longer-term ambition is to develop a fuller typology of diagrams, and in particular to identify and separate out a large class of routine or formulaic diagrams.
Tools: `diagram-detector` · `YOLO` · `pandas` · `matplotlib` · `SQLite`
Data: arXiv preprints · published articles
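As a rough illustration of the corpus-analysis step, the sketch below compares per-paper diagram counts between the two versions and labels each paper's transition. The table layout, the column names (`paper_id`, `version`, `n_diagrams`), and the counts are invented for the example and are not the project's actual schema or data.

```python
import pandas as pd

# Hypothetical detector output: one row per (paper, version) with a diagram count.
# All values here are made up for illustration.
detections = pd.DataFrame(
    {
        "paper_id": ["a", "a", "b", "b", "c", "c"],
        "version": ["preprint", "published"] * 3,
        "n_diagrams": [2, 3, 0, 0, 4, 1],
    }
)

# Reshape to one row per paper with a column for each version.
wide = detections.pivot(index="paper_id", columns="version", values="n_diagrams")

# Positive delta: diagrams were added on the way to publication.
wide["delta"] = wide["published"] - wide["preprint"]


def label(delta: int) -> str:
    """Classify the change in diagram count for one paper."""
    if delta > 0:
        return "gained"
    if delta < 0:
        return "lost"
    return "unchanged"


wide["transition"] = wide["delta"].apply(label)

# Count how many papers fall into each transition class.
summary = wide["transition"].value_counts().to_dict()
print(summary)
```

A count-based comparison like this only flags candidates; papers where the number of diagrams stays the same but the diagrams themselves change would still need the close reading described above.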
---
Dating by Dressing
Active · Independent
Can machine learning help date historical photographs by analysing clothing? This project trains object detection and image classification models on photographs from the Royal Library's special collections, using images with known dates as training data to estimate dates for the many photographs that lack them. The focus is on women's clothing, which changes more systematically over time than most other visual features in studio photography.
The pipeline combines YOLO-based dress detection with classification trained on dated images from the Elfelt and Damgaard collections and the Royal Library's carte-de-visite holdings. A Flask-based interface will allow users to submit photographs and receive date estimates with visualised uncertainty — making the tool broadly applicable beyond the current corpus.
Tools: `YOLO` · `ImageNet` · `Flask` · `pandas` · `matplotlib`
Data: Elfelt Collection · Damgaard Collection · Visitkortsamlingen (Det Kgl. Bibliotek)
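One way a date estimate with uncertainty could be derived from a classifier's output is sketched below. It assumes the classifier returns one probability per date bin (decades here); the function name `summarise_date`, the bin layout, and the example probabilities are hypothetical and not taken from the project.

```python
def summarise_date(probs: dict, mass: float = 0.8):
    """Summarise per-bin probabilities as (expected_year, (low, high)).

    `probs` maps the start year of each date bin to its predicted
    probability. The interval spans the most probable bins needed to
    cover at least `mass` of the total probability.
    """
    total = sum(probs.values())
    norm = {year: p / total for year, p in probs.items()}

    # Probability-weighted mean of the bin start years.
    expected = sum(year * p for year, p in norm.items())

    # Add bins from most to least probable until enough mass is covered.
    chosen = []
    covered = 0.0
    for year, p in sorted(norm.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(year)
        covered += p
        if covered >= mass:
            break

    # Report the span of the chosen bins (may include a less likely bin
    # in between if the chosen bins are not adjacent).
    return expected, (min(chosen), max(chosen))


# Invented classifier output for a single photograph.
example = {1880: 0.10, 1890: 0.60, 1900: 0.25, 1910: 0.05}
expected, interval = summarise_date(example)
print(round(expected, 1), interval)
```

Reporting an interval alongside the point estimate is what lets an interface visualise uncertainty: a flat probability distribution yields a wide interval, a peaked one a narrow interval.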
---
Double Photographs
Early stage · Independent
Stereographic and double-exposed photographs surfaced as outliers in an earlier computer vision project — and turned out to be worth studying in their own right. This project uses template matching and the Royal Library's image API to systematically identify and classify these photographs in the Elfelt collection, distinguishing stereographic pairs from double exposures and mapping their distribution across the collection. A longer-term ambition is to explore which visual features drive the classification — making the model's reasoning interpretable for art and photography historians.
Tools: `template matching` · `OpenCV` · `API client`
Data: Elfelt Collection (Det Kgl. Bibliotek)
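A stereographic card shows two nearly identical images side by side, so one simple screening heuristic is to compare the left half of a scan with the right half. The sketch below does this with zero-mean normalised cross-correlation in pure Python on tiny invented pixel grids; it is an assumption about how such a check might work, not the project's method. In practice OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same kind of score on real images.

```python
from math import sqrt


def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-sized grids."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A flat (constant) half has no variance; treat it as uncorrelated.
    return num / den if den else 0.0


def half_similarity(image):
    """Correlate the left half of a greyscale grid with the right half."""
    width = len(image[0])
    half = width // 2
    left = [row[:half] for row in image]
    right = [row[width - half:] for row in image]
    return ncc(left, right)


# Invented greyscale grids: the first repeats its left half on the right,
# the second has unrelated halves.
stereo_like = [
    [10, 200, 30, 10, 200, 30],
    [50, 20, 180, 50, 20, 180],
]
single_image = [
    [10, 200, 30, 90, 15, 60],
    [50, 20, 180, 200, 40, 10],
]

stereo_score = half_similarity(stereo_like)
single_score = half_similarity(single_image)
print(stereo_score, single_score)
```

A high score suggests a stereographic pair; real stereo halves differ by a small horizontal shift, so a working version would likely search over offsets and would need a separate test to recognise double exposures.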