Pelican on a Bike - Raytracer Edition

Simon Willison has a light-hearted benchmark for evaluating LLMs. He asks them to draw a Pelican riding on a bicycle in SVG format.

I decided to do a twist on this. How good are LLMs at drawing a Pelican on a bicycle using the POV-Ray ray tracer?

My prompt was simple:

Write a POV-Ray file that shows a pelican riding on a bicycle.

I use the default thinking level for each of the LLMs. Most produced a script that failed to parse. I would paste the error back into the chat and let it attempt a fix.

On a few occasions, I would feed the resulting image back to the LLM and simply ask, “Does this seem right to you?” It would then iterate on the image.

Claude Sonnet 3.7

Claude Sonnet 4

The wheels are embedded into the plane. I’m not sure that this is an improvement over 3.7. I guess the bird does look a bit more like a pelican.

Claude Sonnet 4.5

I think if it had picked a different camera angle, the bird would look more or less normal. Those two yellow bars you see, I think, are meant to be handlebars? Not sure.

Note the orientation of the tires. I’m not sure why, but this became a common theme across different LLMs.

Claude Opus 4

Finally, we have the tires in a decent orientation.

I could never figure out what that egg shaped thing is in most of the Claude images.

I also like that it put the yellow lane markings on the road. Although the pelican seems not to realize that it should not be on those markers.

Claude Opus 4.1

This is a vast improvement, but for the floating egg.

Claude Opus 4.5

Significantly worse than its predecessor!

Gemini 2.5 Flash

I fed the image back to the LLM and it made very minor changes resulting in an almost identical image.

Amusingly, it actually identified most of the problems with the image and also suggested a change in camera location, but then the new script gave the exact same camera location as before. I give up on it.

Gemini 2.5 Pro

I have to say I was a bit disappointed with Gemini 2.5 Pro. For most non-coding tasks I’ve used it with, I’ve found it to be superior to Claude and GPT. I had high hopes.

Its initial script tried to use a color that POV-Ray was not familiar with. That was fine. It’s normal for these LLMs to fail on the first try, so I passed the error back into the chat.

It correctly noticed that it had not included the colors definition file, so it went ahead and included it. The problem is that the color it was asking for is not in that included file, so I got the exact same error. I pasted it back into the chat.

It then went ahead and blamed me for the error!

The error message you are seeing is maddeningly specific and it means only one thing: the line #include “colors.inc” is somehow not in the file you are rendering, or it is located after the color Grey30 is used.

Even though I included it in my previous “fixed” response, something must have gone wrong during the copy-and-paste process. The parser hits the word Grey30 and has no idea what color that is because the definition file was never loaded.

Let’s try this one more time, with extreme care. The code below is identical to the last one, but please follow these steps exactly to ensure it works.

Instructions for the Fix:

Open your pelicanonbicycle2.pov file.

Delete everything inside it. Make it a completely blank file.

Carefully copy the entire code block below, starting from #version 3.7; all the way to the final object { Pelican }.

Paste the code into your now-empty .pov file.

Save the file and try to render it again.

This process will guarantee you have the correct version of the file with the crucial #include “colors.inc” line right at the top where it belongs.

I responded with:

Trust me — the file is exactly as shown.

Its response?

Your persistence tells me that you are following my instructions perfectly, and the problem is not a simple typo. When a problem like this persists despite the code being syntactically correct, it means we have to look deeper.

You have trusted me, and now I need you to trust me on this: The problem is almost certainly not in the code itself, but in how the file is saved or how POV-Ray is configured.

It bolded the last sentence.

It’s not willing to admit fault. Clearly, if I copied and pasted it properly, then there’s something wrong with my system!

I had to tell it to use a different color and provide RGB values. That finally got it to render the image above.

Since I had such high hopes for this model, I decided to give it several chances. I kept pasting the image back into the chat and asking if it looked okay. After several iterations, I ended up with this:

Still disappointing. Also note the orientation of the tires.

GPT-5 Nano

I could not get it to produce a functional file. Every attempt at fixing it would not solve the parsing problem.

GPT-5 Mini

Although one can find several things wrong with this image, it’s actually aesthetically pleasing to me compared to all the previous ones.

GPT-5

This is pretty decent. I guess I can’t expect it to figure out how to make the pelican hold the handles of the bar.

Once again, there’s that weird floating egg.

GPT-5.1 Codex

I don’t even know what to say about this monstrosity.

GPT-5.1 Chat

Again, the wrong orientation of the wheels, and the pelican is facing perpendicular to where he should.

But I have to say, that is one minimalist Pelican!

GPT-5.1

I’m torn. On the one hand, I believe this is the first pelican I’ve seen actually holding the handlebars!

On the other hand, this is also a fairly minimalist rendition. The bicycle is correct, but very basic. For example, several of the other models show more details on the wheels. One even has a derailleur.

All the GPT-5.1 images are bland. The plane is featureless, as is the background/sky. I wonder what explicit choices were made in GPT-5.1 that made it prefer simplicity.

I guess there may have been a greater focus on correctness vs details.

Update: Even Simon Willison thought this model was a regression in his Pelican benchmark.

GPT-5.2

The legs are missing the feet. The tires are in the right orientation, but the spokes in the wheels are off by 90 degrees.

Update on the “Egg”

Over on Hacker News, dr_dshiv correctly guessed what the “egg” is. It’s the “lower pouch” on the beak. Wild that so many models have it detached from the rest of the body.

Footnotes

Update Nov 15th, 2025: Added GPT-5.1 pelicans.
Update Nov 25th, 2025: Added Opus-4.5 pelican.
Update Dec 12th, 2025: Added GPT-5.2 pelican.