MCP Tool: Product Logos in AI Images (gpt-image-1)

Reference image in, logo cleanly in the scene: that's what image-input mode does.

Part of the MCP series

This article builds on "What Is an MCP Server?", where I explain the basics.

AI image generators are great for abstract heroes – but the moment a concrete logo has to appear, it gets messy: the model invents a logo that only resembles the real one. For a product hero meant to show the actual Home Assistant logo, that's useless. The fix: send the logo along as a reference image instead of merely describing it.

That's exactly what I retrofitted into my existing generate_ai_image MCP tool. Here I show how image input works with both providers, how the tool automatically switches to the right model – and which pitfall cost me a few wasted generations.

Made with this very tool: the real Home Assistant logo (passed as a reference image) cleanly on the display, not invented.

Why text-to-image isn't enough

The classic path is pure text-to-image: prompt in, image out (DALL·E 3, Imagen). But the model has no template – it can only approximate a brand logo from memory. The result: warped letters, wrong proportions, a “logo” that isn't one. For brand assets that's a no-go.

What we need is an image input: we hand the model the real logo and say “build this cleanly into the scene.” Both major providers can do it – just via different endpoints.

Two paths: images.edit vs. generate_content

With OpenAI, image input doesn't go through images.generate but through images.edit with the gpt-image-1 model. You pass one or more input images plus a prompt. Important: the bytes must arrive as file-like objects with a .name set. Without a file name the upload carries no recognizable file type, and the request typically fails.

python

# Reference-image mode: gpt-image-1 image edit with one or more input images
if reference_images:
    ref_model = model if str(model).startswith("gpt-image") else GPT_IMAGE_REF_MODEL
    size = GPT_IMAGE_SIZE.get(aspect_ratio, "1024x1024")
    inputs = []
    for i, data in enumerate(reference_images):
        bio = BytesIO(data)
        bio.name = f"reference_{i}.png"   # the SDK needs a file name
        inputs.append(bio)
    response = client.images.edit(
        model=ref_model,
        image=inputs if len(inputs) > 1 else inputs[0],
        prompt=prompt,
        size=size,
    )
    return _openai_response_to_contentfile(response)
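The snippet above always names the upload `.png`, which is only accurate when the reference really is a PNG. A minimal sketch of one way to pick the extension from the data itself, assuming the references are PNG, JPEG, or WebP. The helper names `sniff_extension` and `as_named_file` are hypothetical and not part of the tool, and the fallback to `.png` simply mirrors the behaviour above.

```python
from io import BytesIO

# Magic-byte prefixes for the formats assumed here.
_PNG = b"\x89PNG\r\n\x1a\n"
_JPEG = b"\xff\xd8\xff"


def sniff_extension(data: bytes) -> str:
    """Return a file extension based on the leading bytes; default to 'png'."""
    if data.startswith(_PNG):
        return "png"
    if data.startswith(_JPEG):
        return "jpg"
    # WebP is a RIFF container: 'RIFF' + 4 size bytes + 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "png"  # assumption: keep the original naming for unknown data


def as_named_file(data: bytes, index: int) -> BytesIO:
    """Wrap raw bytes in a BytesIO that carries a .name for the upload."""
    bio = BytesIO(data)
    bio.name = f"reference_{index}.{sniff_extension(data)}"
    return bio
```

In the loop above, `inputs.append(as_named_file(data, i))` would replace the three lines that build `bio` by hand.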

With Google, Imagen cannot take image inputs. For that there's gemini-2.5-flash-image, which runs via generate_content – prompt and image come in as a Part list, and the finished image sits in the response's inline_data.

python

# Imagen cannot take image inputs -> Flash-Image (generate_content)
if reference_images:
    ref_model = model if "flash-image" in str(model) else GEMINI_REF_MODEL
    client = genai.Client(api_key=GOOGLE_API_KEY)
    contents = [prompt]
    for data in reference_images:
        contents.append(types.Part.from_bytes(data=data, mime_type="image/png"))
    response = client.models.generate_content(model=ref_model, contents=contents)
    for cand in response.candidates or []:
        for part in getattr(cand.content, "parts", None) or []:  # content may be None
            inline = getattr(part, "inline_data", None)
            if inline and inline.data:
                return ContentFile(inline.data)
    raise ValueError("Gemini (Flash-Image) returned no image.")
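When no image comes back, the response often still contains a text part that hints at why. A minimal sketch of a more talkative extraction step, assuming the response shape used above (candidates, then content.parts, each part with `inline_data` or `text`). The helper name `extract_image_bytes` is hypothetical, and the stand-in objects only mimic the SDK response for an offline check.

```python
from types import SimpleNamespace


def extract_image_bytes(response) -> bytes:
    """Return the first inline image; raise with the model's text if none."""
    notes = []
    for cand in getattr(response, "candidates", None) or []:
        content = getattr(cand, "content", None)  # may be None
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            if inline is not None and inline.data:
                return inline.data
            text = getattr(part, "text", None)
            if text:
                notes.append(text)  # keep text parts for the error message
    detail = " ".join(notes) or "no text parts either"
    raise ValueError(f"Gemini (Flash-Image) returned no image: {detail}")


# Stand-in objects that mimic the response shape (not real SDK types).
fake = SimpleNamespace(candidates=[
    SimpleNamespace(content=None),
    SimpleNamespace(content=SimpleNamespace(parts=[
        SimpleNamespace(text="Here you go", inline_data=None),
        SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG")),
    ])),
])
image_bytes = extract_image_bytes(fake)
```

The design choice here is to collect text parts instead of discarding them, so a failed generation leaves something to read in the logs.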

From the MCP tool to the bytes

The MCP tool itself should stay convenient: it takes either Wagtail image IDs (reference_image_ids) or public URLs (reference_image_urls) and resolves both to raw bytes before passing them to the service. If references are present, the pipeline automatically switches to the image-capable model – the pure text-to-image path stays untouched.

python

reference_images: list[bytes] = []

# 1) Wagtail images by ID
for iid in reference_image_ids or []:
    img = Image.objects.get(id=iid)
    with img.file.open("rb") as f:
        reference_images.append(f.read())

# 2) Arbitrary public URLs
for url in reference_image_urls or []:
    reference_images.append(requests.get(url, timeout=30).content)

image = generate_and_save_image(
    prompt=prompt,
    title=title,
    use_case=use_case,
    reference_images=reference_images or None,  # None -> plain text-to-image path
)
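The URL loop above trusts whatever the server returns, so a 404 page would be passed on as a "logo". A minimal sketch of a check that could sit between download and append. The function name `validate_download`, the 10 MB cap, and the allow-list of content types are my assumptions, not part of the tool.

```python
MAX_REFERENCE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB cap per reference
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/webp"}  # assumed allow-list


def validate_download(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return body if it looks like a usable reference image, else raise."""
    if status_code != 200:
        raise ValueError(f"download failed with HTTP {status_code}")
    # Strip parameters such as '; charset=binary' before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in ALLOWED_TYPES:
        raise ValueError(f"unexpected content type: {mime or 'missing'}")
    if not body:
        raise ValueError("empty response body")
    if len(body) > MAX_REFERENCE_BYTES:
        raise ValueError(f"reference image too large: {len(body)} bytes")
    return body
```

With requests, the call would look like `validate_download(resp.status_code, resp.headers.get("Content-Type", ""), resp.content)` for a `resp = requests.get(url, timeout=30)`.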

By the way: this MCP server, image generation included, doesn't run in the cloud for me but on a frugal mini-PC in the homelab that also carries Home Assistant and various Docker stacks.

Pitfall: provider ≠ model

One thing cost me real generations: I wanted to force the provider to openai for a logo image but didn't pass the model. My config resolver then pulls the model from the settings – and there sat the Imagen model. The result: an Imagen model name ends up in the OpenAI call, which acknowledges it like this:

text

Error code: 400 - The model 'imagen-4.0-generate-001' does not exist.

Lesson: provider and model belong together. Either set both explicitly (provider="openai", model="gpt-image-1") or use no override at all and rely on the configured default. The reference-image branch does catch a mismatched model and picks the respective image model – but only if the provider matches the model in the first place.
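The lesson can be sketched as a small resolver that never lets a configured model cross provider boundaries. Everything here is an assumption about how such a resolver could look: the function name `resolve_model`, the provider key "google", and the prefix tables are hypothetical. Only the model names are taken from this article.

```python
from typing import Optional

# Default image model per provider (table layout is an assumption).
PROVIDER_DEFAULTS = {
    "openai": "gpt-image-1",
    "google": "imagen-4.0-generate-001",
}

# Model-name prefixes each provider accepts (assumed, not exhaustive).
PROVIDER_PREFIXES = {
    "openai": ("gpt-image", "dall-e"),
    "google": ("imagen", "gemini"),
}


def resolve_model(provider: str, model: Optional[str] = None,
                  configured: Optional[str] = None) -> str:
    """Pick a model that is guaranteed to belong to the given provider."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"unknown provider: {provider}")
    prefixes = PROVIDER_PREFIXES[provider]
    if model:
        # An explicit override must match the provider; fail early and loudly.
        if not model.startswith(prefixes):
            raise ValueError(f"model '{model}' does not belong to '{provider}'")
        return model
    # Use the configured model only if it fits the forced provider.
    if configured and configured.startswith(prefixes):
        return configured
    return PROVIDER_DEFAULTS[provider]
```

With this, forcing the provider to openai while the settings still hold the Imagen model yields gpt-image-1 instead of a 400 from the API.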

What I left out

A real live test in CI: the image APIs cost tokens and need keys, so the logic is built to the SDK docs and statically checked; the first real test ran against an actual logo.
Auth & error handling: retries, rate limits, invalid URLs – needed for production, deliberately kept brief here.
Text in the image: both models love to drop unwanted letters into the image; a clear “no text” in the prompt helps but is no guarantee.
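For the retry part of that list, a minimal sketch of exponential backoff around an arbitrary call. The helper `with_retries` and its parameters are hypothetical, and the sleep function is injectable so the logic can be checked without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a listed exception wait 1x, 2x, 4x ... base_delay and retry."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In the tool this would wrap the provider call, for example `with_retries(lambda: client.images.edit(...), retry_on=(openai.RateLimitError,))`, so that only rate-limit errors are retried and real bugs still fail immediately.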

Conclusion & outlook

With a few lines, a text-to-image tool becomes one that cleanly adopts real logos into the scene – the key is the right endpoint per provider (images.edit or generate_content) and a tool that resolves IDs/URLs to bytes. The hero of this article, by the way, was made exactly this way. The next logical step: the same reference mechanism for consistent characters across multiple heroes.
