When building AI interfaces, I want visuals to feel as fluid and expressive as text. But handing an LLM a massive predefined library of icons and illustrations is impractical: it bloats the prompt with unnecessary tokens and still might miss the exact visual it needs.
What if we could let the LLM decide what visuals it wants to use itself? It could simply describe the icon (e.g. “storm cloud”, “coffee shop”, “cyberpunk skyline”), and we would resolve the best-fit match on demand. For this exploration, I didn’t focus on matching a specific product style, but rather on the mechanics of resolving and generating icons lazily.
The Idea
The goal is to generate icons on-demand, but make it feel instant. We achieve this by falling back to semantically similar icons while generating new ones in the background.
If an LLM asks for a “rainy day” icon, and we don’t have it, we can immediately serve a “cloud” icon we already have, while we ask an image generation model to create a “rainy day” icon for future requests.
The Pipeline
To make this work efficiently, we use a multi-step lookup pipeline. It starts with fast, exact matches and falls back to more expensive similarity searches and generation only when necessary.
Normalization
When an LLM requests an icon, it usually provides a raw string like storm_cloud. We first normalize this by replacing underscores with spaces and lowercasing it, giving us a clean, consistent starting point for every later step.
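As a minimal sketch, the normalization step might look like this (the function name `normalize` is my own, not from a specific library):

```python
def normalize(raw: str) -> str:
    """Turn a raw LLM-provided name like 'Storm_Cloud' into 'storm cloud'."""
    return raw.replace("_", " ").strip().lower()
```

Every later stage (exact match, cache, embedding) operates on this normalized form, so two requests that differ only in casing or separators resolve identically.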
Exact Match Check
Before doing any expensive work, we check our S2 storage bucket to see if an icon with this exact name already exists. If it does, we return it immediately with a long-lived cache header.
Query Mapping Cache
If there's no exact match, we check our cache for previous similarity lookups. If someone previously asked for "storm cloud" and we resolved it to "cloud", we can skip the embedding generation and return "cloud" right away.
Embedding & Vector Search
If we've never seen this request before, we generate a text embedding and perform a cosine similarity search against all our existing icons to find the nearest neighbor.
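The nearest-neighbor step reduces to computing cosine similarity between the query embedding and each stored icon embedding, then taking the maximum. A brute-force sketch (in practice a vector index would replace the linear scan, and the embeddings would come from an embedding model rather than be hand-written):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_icon(
    query_vec: list[float], icon_vecs: dict[str, list[float]]
) -> tuple[str, float]:
    """Return (icon_name, similarity) of the closest existing icon."""
    return max(
        ((name, cosine_similarity(query_vec, vec)) for name, vec in icon_vecs.items()),
        key=lambda pair: pair[1],
    )
```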
Resolution
Finally, we evaluate the best match from our similarity search. If the similarity is above a certain threshold, we consider it a good match and return it. If it's below the threshold, we queue a generation job for the exact requested icon and serve the nearest match as a fallback. If the similarity is especially low, we don't return any icon at all, allowing the application to handle it gracefully (e.g. by displaying a placeholder icon).
The Result
Over time, as the system handles more requests, it builds up a rich library of icons. The more it’s used, the faster it gets, as more requests hit the exact match or query mapping cache.
Want to see it in action in an AI generated chat interface? Try it out here.
What’s Next
There’s still lots of room for interesting improvements. I’d love to add another step to the pipeline that maps icons to different art and visual styles, further tailoring them to specific design languages, branding, or application contexts. There are also interesting explorations around on-demand assets that don’t rely on image generation, but instead might pull official company logos, profile pictures, posters, etc.
By providing LLMs with a simple, consistent API for using relevant visuals in their responses, I can build much more enticing and engaging interfaces. If you’re interested in richer interfaces, check out this post.