ALTIFY: Inferring high-quality icon alt-text from partial app screens
Abstract
Alt-text is essential for mobile app accessibility, yet UI icons often lack meaningful descriptions, limiting access for screen reader users. Existing alt-text inference approaches require extensive labeled datasets, struggle with partial app screens, or operate only post-development. We first conduct a formative study of when and how developers prefer to generate icon alt-text, which shows strong developer interest in tools that support alt-text generation at the point of UI creation. We thus introduce ALTIFY, which automates alt-text generation for UI icons during app development in two variants: a text-only large language model that processes extracted UI metadata, and a multimodal model that additionally analyzes the icon image. To improve accuracy, the method extracts relevant UI information from the DOM tree, retrieves in-icon text via OCR, and applies structured prompts for alt-text generation. Our empirical evaluation against the most closely related deep-learning and vision-language models shows that ALTIFY generates higher-quality alt-text without requiring full-screen input, making it better suited for integration into developer workflows.
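The abstract summarizes a three-step pipeline: extract UI context from the DOM tree, recover in-icon text via OCR, and assemble a structured prompt for the language model. The following is a minimal, hypothetical Python sketch of that pipeline; the function names, the simplified non-namespaced layout XML, the prompt wording, the choice of pytesseract for OCR, and the `call_llm` placeholder are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an ALTIFY-style pipeline: DOM extraction -> OCR ->
# structured prompt. All names and the XML schema are assumptions.
import xml.etree.ElementTree as ET
from PIL import Image
import pytesseract  # OCR step; assumes a local Tesseract installation


def extract_ui_context(layout_xml: str, icon_id: str) -> dict:
    """Collect the icon's attributes and nearby on-screen text from the DOM tree.

    Assumes a simplified layout XML with plain (non-namespaced) attributes.
    """
    root = ET.fromstring(layout_xml)
    context = {"icon_id": icon_id, "icon_attrs": {}, "siblings": []}
    for node in root.iter():
        if node.attrib.get("id") == icon_id:
            context["icon_attrs"] = dict(node.attrib)
        elif node.attrib.get("text"):
            # Text of surrounding elements gives the model screen context
            # even when only a partial screen is available.
            context["siblings"].append(node.attrib["text"])
    return context


def icon_ocr_text(icon_path: str) -> str:
    """Recover any text rendered inside the icon image itself."""
    return pytesseract.image_to_string(Image.open(icon_path)).strip()


def build_prompt(context: dict, ocr_text: str) -> str:
    """Assemble a structured prompt from the extracted metadata."""
    return (
        "Generate concise, informative alt-text for a mobile UI icon.\n"
        f"Icon attributes: {context['icon_attrs']}\n"
        f"Nearby on-screen text: {context['siblings']}\n"
        f"Text inside the icon (OCR): {ocr_text or 'none'}\n"
        "Alt-text:"
    )

# alt_text = call_llm(build_prompt(ctx, ocr))  # call_llm is a placeholder
```

A multimodal variant, as the abstract describes, would additionally pass the icon image alongside this prompt rather than relying on metadata and OCR text alone.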