Beware of impending clichés of deceased horse attacks (sidebar: why does anyone worry about how many stones it takes and never ask why the fascination with killing birds?)
My call-out of the broken state of Google’s touted ability to locate openly licensed images got some notice (my peak Hacker News hit). I followed up with a look at how what seems like improvement is not even that, and it is sad how wrong and terrible Google’s delivered results turn out to be.
I could not resist going back in to look more closely.
That sensation when you open the refrigerator door and smell something rotten?
Do you rummage to find and clean the source, or close the door, tip toe away, and leave it for someone else to deal with?
I looked again yesterday, and the number of Creative Commons licensed images Google Search could find for “dog” had skyrocketed from 3 on September 27 to a whopping 24 (note it’s even more now, 40, but I do not see the ones I saw yesterday).
There were many I could investigate, but one caught my eye: the search result listed it as coming from the World Animal Foundation, yet over in the result details the image credit and links were to Rawpixel. That is weird. Also, for future reference, Google indicated via the License Details link that this cute pooch is licensed CC0.
And sure enough, this dog is the lead image in a long article from “World Animal Foundation” (likely to be confused with another organization?) with lots of dog pics, 8 Best Invisible Dog Fences Reviewed 2022. You will find nary an attribution or link to a source (zero for TASL), because of course, public domain means you “do not have to attribute the source.”
But Google seems to know the source, because it attributes the photo as Creator and Credit to Rawpixel.com and links a source at https://www.rawpixel.com/image/6064555/free-public-domain-cc0-photo. And it looks pretty legit, showing there a CC0 license. No attribution to the photographer, well because CC0.
Are they really the creator of this image? Let’s put Google to work searching for an image source, using “Lens” (reverse image search), where I click Find Image Source. There are many many pages of links to blogs, web sites, ads, all kinds of dog articles, and I guarantee (I only looked at a few) none of them credited or linked to the place where the image came from (ahem, because CC0 says you can just grab an image, use it, and not credit the source).
I do know though if you page through, you will locate it, often on one of the Big Three, Pixabay, Pexels or Unsplash. Thar she goes!
How do I know for sure this is the source? Well, it is listed under the account of a person. And Pixabay is pretty thorough on their review (I have seen it myself when uploading images, they don’t just take everything).
Via View Source on the Pixabay photo page, I can see from the URL structure of the Open Graph image tag that this photo was very likely uploaded September 24, 2016.
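For anyone who wants to repeat that check without squinting at View Source, here is a minimal sketch. It assumes the page exposes an `og:image` meta tag and that the image URL carries the upload date as `/photo/YYYY/MM/DD/` in its path; that layout is my reading of the URL, not something Pixabay documents. The sample HTML and its file name are made up for illustration.

```python
import re
from datetime import date
from html.parser import HTMLParser


class OGImageParser(HTMLParser):
    """Collects the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_image is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.og_image = attributes.get("content")


def og_image_url(html):
    """Return the og:image URL found in the page source, or None."""
    parser = OGImageParser()
    parser.feed(html)
    return parser.og_image


# Assumption: the date sits in the path as /photo/YYYY/MM/DD/
DATE_IN_PATH = re.compile(r"/photo/(\d{4})/(\d{2})/(\d{2})/")


def upload_date_from_url(url):
    """Guess the upload date from the image URL path, or return None."""
    if not url:
        return None
    match = DATE_IN_PATH.search(url)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # digits that do not form a real calendar date
        return None


# Hypothetical page source; the file name is a placeholder, not the real one.
SAMPLE_HTML = (
    '<html><head><meta property="og:image" '
    'content="https://cdn.pixabay.com/photo/2016/09/24/00/00/'
    'example-dog-0000000_1280.jpg"></head></html>'
)

print(upload_date_from_url(og_image_url(SAMPLE_HTML)))
```

A date in a URL path is only a hint. It tells you when the file landed on the server, assuming the site names its folders that way, and nothing about who took the photo.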
How can we determine when a web page was published if it does not reveal it? There’s always an answer (found with Google), this one from Labnol at Find the Date When a Web Page was First Published on the Internet, and it uses Google itself to do the job. Look, Google says this Pixabay page appeared September 27, 2016.
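As I understand the Labnol trick, you search Google for the exact page with an `inurl:` query, then add a date range parameter to the results URL so Google shows the date it first saw the page. Here is a rough sketch that only builds that search URL. The `as_qdr=y25` parameter (for "the last 25 years") is not officially documented as far as I know, so treat it as an assumption that may stop working; the function name is mine.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def first_seen_search_url(page_url, years=25):
    """Build a Google search URL restricted to one page and a date range.

    Assumption: Google still honors the as_qdr parameter and prints a
    date next to the result when a range is applied.
    """
    query = {
        "q": f"inurl:{page_url}",  # match the page by its URL
        "as_qdr": f"y{years}",     # "anything in the past N years"
    }
    return "https://www.google.com/search?" + urlencode(query)


# The Rawpixel page Google links as the image source
rawpixel_page = (
    "https://www.rawpixel.com/image/6064555/free-public-domain-cc0-photo"
)
search_url = first_seen_search_url(rawpixel_page)
print(search_url)

# Round trip to confirm the parameters survive URL encoding
params = parse_qs(urlparse(search_url).query)
print(params["q"][0], params["as_qdr"][0])
```

Open the printed URL in a browser and look for a date in front of the result snippet. No date means Google is not telling, not that the page is new.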
On that date, according to the Internet Archive, Rawpixel.com was still setting up shop as a stock photography site.
Can we find the date when the Rawpixel page for this same dog was published? The view source is a jumble of code to parse, but I found nothing. And the Labnol trick revealed no date for the last 25 years even with some efforts to search before the date Kathleen posted her image to Pixabay.
I am very convinced the Pixabay image is the original, and that it has been harvested for reuse at Rawpixel. That would be a violation of the current Pixabay license, but that license does not apply here, because Kathleen uploaded her dog photo back when Pixabay was licensing images CC0.
One more thing of interest… photo metadata aka EXIF.
Compare that to the metadata of the copy downloaded from the World Animal Foundation post, where Google found it.
Rawpixel is scraping images from Pixabay, then modifying the EXIF data to take credit and assert a license. Why? Because photo metadata is how Google locates images licensed under Creative Commons.
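If you want to run the same metadata comparison yourself, here is a small sketch using only the Python standard library. I have been calling this EXIF, but the credit and license fields Google reads are usually stored as IPTC fields inside an embedded XMP packet (XML text inside the image file), so that is what this looks for. It is a crude reader, not a full metadata parser, and the sample bytes below are made-up stand-ins for the two downloaded files, with field values I chose to mirror what Google displayed.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_WRAPPER_NS = "adobe:ns:meta/"
XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)


def _split(tag):
    """Split an ElementTree name '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def _walk(element, owner, out):
    """Collect values, filing rdf:li text under the enclosing property name."""
    namespace, name = _split(element.tag)
    if namespace not in (RDF_NS, XMP_WRAPPER_NS):
        owner = name  # e.g. dc:creator owns the rdf:li items nested inside it
    if namespace == RDF_NS and name == "Description":
        # Simple properties may be written as attributes on rdf:Description
        for attr, value in element.attrib.items():
            attr_ns, attr_name = _split(attr)
            if attr_ns != RDF_NS:
                out.setdefault(attr_name, []).append(value)
    text = (element.text or "").strip()
    if text and owner:
        out.setdefault(owner, []).append(text)
    for child in element:
        _walk(child, owner, out)


def xmp_fields(image_bytes):
    """Return {property: [values]} from the first XMP packet, or {} if none."""
    match = XMP_PACKET.search(image_bytes)
    if not match:
        return {}
    fields = {}
    _walk(ET.fromstring(match.group(0)), None, fields)
    return fields


def diff_fields(a, b):
    """Return {property: (value_in_a, value_in_b)} for properties that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up stand-ins: an "original" with no XMP, a "copy" with credit added.
original = b"\xff\xd8 image data with no XMP packet \xff\xd9"
copy = (
    b"\xff\xd8 image data "
    + '''<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
  xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
  photoshop:Credit="rawpixel.com"
  xmpRights:WebStatement="https://www.rawpixel.com/image/6064555/free-public-domain-cc0-photo">
<dc:creator><rdf:Seq><rdf:li>rawpixel.com</rdf:li></rdf:Seq></dc:creator>
</rdf:Description></rdf:RDF></x:xmpmeta>'''.encode()
    + b" more image data \xff\xd9"
)

for name, (before, after) in diff_fields(xmp_fields(original), xmp_fields(copy)).items():
    print(f"{name}: {before} -> {after}")
```

To try it on real files, read each one with `open(path, "rb").read()` and pass the bytes to `xmp_fields`. A proper tool such as ExifTool will show far more than this does, including true EXIF and legacy IPTC blocks that this sketch ignores.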
Is this legal under the original CC0 license? Likely. Does it smell rotten? You tell me.
Yeah, riddle me this. Google Image search returns a search result from a post at the World Animal Foundation. The information it pulls from the image metadata is a credit to rawpixel.com… So why, in the right side preview of the results, does it provide a direct link to the image source not where it was found, at the World Animal Foundation, but on Rawpixel?
And Google? Only they know the answer. They can just shrug. They just index the web. And they give precedence in search results to sources that are not really the sources.
My work is not done. I am still digging. It’s smelly out there.