We use it so frequently that perhaps we no longer marvel that web search works, especially the one whose brand name has become a verb. Googling something means typing a few words into a box, pressing a button, and getting a long stream of positive results.
And of course “works” is the laden term here. It means you get things back, but why? How? What’s left out? What is forced in? Can anyone actually explain how all the gears and sprockets do their thing?
I cannot explain it, but I try as much as possible to look for clues. This comes up often as I look for images to use in projects and blog posts, using Google Image Search with the settings to return results licensed for reuse (I will send a gold coin to anyone with a means to make this my default setting on results).
As it has before, a recent foray leads me to wonder why Google ranks results higher when they come from sites that scavenge images from the original places they were posted, in the weird space of public domain images.
Last week I was prepping a post for an upcoming event over at the H5P/PB Kitchen project, where we are having Arley Cruthers join us to share about branching scenarios. Using “branching scenario” as a search term does not provide anything helpful, mostly flow chart diagrams. Literal keywords often fail, so I begin pushing the words around to describe some kind of action that produces a more suggestive image.
The image search terms “path decision” get me some results that feel right. The first ones are all from pixabay, and might work, but it’s the image of the boy facing a decision that speaks most to me. It’s 14th in my list of results.
The result links to the image on pxhere.com, and as it is listed as CC0, I can use this image and be done. That’s public domain as most people know it: grab a copy, use it, and you do not have to attribute.
Except I know from my previous looks that pxhere.com just scarfs up public domain photos from elsewhere to publish on its site (I find many of my Flickr photos there, most often not credited to me). This is all fueled by the incentive of ad economics and search result puffery.
Just for fun, as I know how it plays out, I run a reverse image search on that photo of the boy and the path. The matches show it used on many other web sites. I do not check, but past experience shows that all, or nearly all, fail to cite the source. Because you do not have to.
Paging into maybe the 7th, 8th, or 9th page of results, I land on the source: an image on Pixabay by user qimono. I bother with this because I always want, where possible, to credit the person who shared the photo.
If I enter in Pixabay the same search terms I used in Google Images, it is 2nd in the results.
My question, which no one may ever be able to answer, is why Google ranks the pxhere.com result so much higher than the source. Pxhere.com is predominantly photos scavenged from other public domain sources, and it almost never attributes the creator of the photo.
If I had a hand in the algorithm, I would not give the scavengers that much weight in results. But I don’t, and I am not sure if anyone can truly explain how this works.
I remain curious. But I am dedicated to going as far upstream as I can to cite the sources of images that are directly from the person who shared them or from a site that credits them.
This even came into play finding an image for this post. I thought “black box” would produce interesting imagery, and plunked it into Google Image Search with options set for Creative Commons licensed results. The image of the tech-like drawings over the eyeball jumped out as a possibility. But on following the link to the Undark column Unpacking the Black Box in Artificial Intelligence for Medicine (and only partly avoiding the rabbit hole of reading the article), I found the image is credited to Getty Images, and nothing on the article carries a hint of an open license.
So I passed.
This fussing over images, results, and finding the sources is just Alan’s obsession. With each day of searching and seeing how imagery is used, the places lacking attribution seem to grow exponentially faster than the places where I see images attributed. I don’t think it will ever change, but at least it gives me something to blog about.
How about you? Is it enough to get search results, grab and go, or do you glance into that search result with a bit of wariness?