I have to admit I have done my own bit of tossing phrases into one of the DALL-E Mini generators to gawk/shrug/tweet about what emerged. There is a natural curiosity in hoping for a result that is a stunning graphic, but what seems more common in my attempts… crap.
So I like seeing my colleague Bryan Alexander testing the waters.
and seeing limitations as pointed out by Mark Sample
I have a lot of questions about whether what is going on here deserves to be called intelligence, and I squint a bit in pain when some article calls this “jaw-dropping” (admission: I have seen about 0.000000000000000001% of all AI-generated images, yet this stops me in no way from drawing conclusions).
This sideline, quasi-curiosity has been part of my own experiment with a sort of blog-ish effort in the OEG Connect community space (though
hardly anyone has taken up the offer to make their own thread).
There’s a bit of a fad rush to toss prompts into the machine. What makes for a good AI image prompt? There seems to be a bible of sorts, and it might be worth testing out.
Yet I feel like this is all a distraction, like someone is harvesting the data of all the things tossed into these machines. I read inferences that the AI is learning as it goes, but who is teaching it? Why can’t we tell craiyon et al when something it churns out is good or crap?
But going above and beyond this, a quote from Seb Chan’s Fresh and New email issue on Generative things and choosing the right words keeps circling back:
My social media feeds overflowed with DALL-E and Midjourney ‘promptism’ visuals. The coining of ‘promptism’ by others nicely deflects attention from the underlying technologies to the craft of finding the right language (the prompt) to ‘talk to the machine’. Its a bit like those people who seem to have a magical ability to craft ‘the perfect Google search query’ but aren’t trained librarians and have really just have done a bit of SEO work in their past.
https://buttondown.email/sebchan/archive/79-generative-things-and-choosing-the-right-words/
Apparently, as Seb mentioned, promptism has been manifesto-ized as a “movement”
Again I feel like the public attention on spawning images of orangutans riding avocados done in Impressionist style is diverting focus from asking: can we do more than create 9-box cartoons? The use of AI to identify Holocaust victims at least seemed something a tad more useful, even if, as usual, when it comes down to explaining it, things get fuzzy and vague fast.
Can AI Improve Old Photos?
So when I saw a Research Buzz tweet with this article, GFP-GAN is a New Free AI Tool That Can Fix Most Old Photos Instantly (GFP = “Generative Facial Prior”), I was intrigued. Sure, I could have retweeted and moved on to creating things like “A bottle of ranch testifying in court” (look, there is a twitter account to follow, @weirddalle), but I feel I understand and blog better when I can do things hands-on.
The methodology is explained as:
The improved 1.3 version of the GFP-GAN model tries to analyze what is contained in the image to understand the content, and then fill in the gaps and add pixels to the missing sections. It uses a pre-trained StyleGAN-2 model to orient their own generative model at multiple scales during the encoding of the image down to the latent code and up to reconstruction.
Using additional metrics helps the AI enhance facial details, focusing on important local features like a person’s eyes, mouth, and nose. The system then compares the real image to the newly restored image to see if they still have the same person in the generated photo.
https://petapixel.com/2022/07/28/gfpgan-is-a-new-free-ai-tool-that-can-fix-most-old-photos-instantly/
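That last step, comparing the original face to the restored one to check it is still “the same person,” can be sketched as toy code. Real systems run both images through a pretrained face-recognition network to get embedding vectors; the vectors and the threshold below are made up for illustration, so only the comparison logic is real.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(emb_original, emb_restored, threshold=0.5):
    """Hypothetical check: treat the faces as the same person if their
    embeddings are similar enough. The 0.5 threshold is an assumption."""
    return cosine_similarity(emb_original, emb_restored) >= threshold

# Made-up 4-d "embeddings" standing in for real high-dimensional face features.
original = np.array([0.9, 0.1, 0.3, 0.2])
restored_good = np.array([0.85, 0.15, 0.25, 0.2])   # close to the original
restored_drift = np.array([-0.2, 0.9, -0.4, 0.1])   # "a complete stranger"

print(same_identity(original, restored_good))   # True: similar embeddings
print(same_identity(original, restored_drift))  # False: identity drifted
```

The “slight change of identity” the developers warn about is exactly a restored face whose embedding drifts away from the original’s.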
The Petapixel article used as examples the pristine ones the software provides in a gallery, but I want to push the buttons myself. So I headed over to the GFP-GAN demo site…
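(For anyone who would rather run it locally than use the demo site, the GFPGAN repository documents a command-line inference script; the commands below follow its README, assuming you have Python and the model weights the script expects.)

```shell
# Clone the GFPGAN repo and install it (per the project README).
git clone https://github.com/TencentARC/GFPGAN.git
cd GFPGAN
pip install -r requirements.txt && python setup.py develop

# Restore every photo in inputs/whole_imgs with model version 1.3,
# upscaling the output 2x; results land in the results/ folder.
python inference_gfpgan.py -i inputs/whole_imgs -o results -v 1.3 -s 2
```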
I started with a 1970s era photo of my Dad and his best friend, whom we called “Uncle George” though he was not a relative, and you can see it did well to sharpen the faces.
The creators do note the limits:
GFP-GAN is not without its flaws though. According to the developers, while the restored images are much more detailed than previous and other versions, the current restored images are not very sharp and can have a “slight change of identity.” This means that in some cases, the restored images can sometimes look like a different person. …
Bouchard says even though the results are mostly fantastic and remarkably close to reality, “The resulting image will look just like our grandfather if we are lucky enough. But it may as well look like a complete stranger, and you need to keep that in consideration when you use these kinds of models.”
https://petapixel.com/2022/07/28/gfpgan-is-a-new-free-ai-tool-that-can-fix-most-old-photos-instantly/
And I see this a bit in my Dad’s face. To the average person, the improvement is clear, but to me, his son, I can say it looks like him, but there is something intangibly wrong, in the smile, the jaw, that leaves me feeling like it’s more “Dad-ish” than “Dad”.
I tried another old photo of my Dad as a teen and his Dad, my grandfather. I enjoy this photo because I never knew my Dad as a kid and my grandfather passed away before I was born. The results are again impressive; note especially the addition of some reflections in my grandfather’s eyeglasses.
Okay, the image is sharper; can we do more? Quite some time ago I played with some early AI web tools that attempted to colorize old photos. There’s a slew of them now; I just tried the one from Hotpot.
It’s again “interesting” but problematic in my grandfather’s coat colors, and there are also some weird changes in the background.
Back I went to GFP-GAN to see what it could do with a photo of my mother’s brother, Harvey, when he served in the Navy during World War II.
I have to be impressed here, not only with the clarity of his face, blurred in the original, but also how much more realistic his jacket and slacks look. Let’s try some Hotpot AI color:
This is pretty compelling, and I find myself wondering where this might have been taken, given the palm trees in the background. I had found his draft card indicating he served on the USS LST-892, which likely would have operated out of, maybe, San Diego.
One more: I tried a collage of scanned photos of my grandparents in the house my parents moved into when my Dad’s father passed away. We moved from there when I was 2, so I don’t remember it at all.
Again, the sharpness in the faces counts as a definite improvement and almost makes me realize how much my memory/mind can fill the gaps of the slightly blurry original. They really pop to life in the colorized version, though I am fairly sure grandma did not have blue legs.
I do find it useful to see what this image-improving algorithm can do. I am less sure where the AI is coming into play here; isn’t this just algorithms at work? I do not yet have an understanding of exactly what it is doing. Note: maybe I need to read the paper, or cheat and watch the video (I just noticed I skipped over it).
And probably this still is promptism in the machine: I toss an image into an AI site and it returns something. If it does not produce desirable results, I try to adjust by using a different image. I still have no insight at all into what steps the machine went through.
But a difference here, over just tossing in some box a phrase like “Santa taking a picture in a tornado” is my own stake in the photos being improved. There are things about the images no algorithm will ever know.
Promptism from 1964
As serendipity goes, today I experienced something that connected to this post in an unanticipated way. This being the first day of a vacation, during the travel portion I partook in something I’ve not done in a long while: reading for pleasure. I grabbed a paperback off the shelf, a novel I had purchased a while ago but never read, Philip K. Dick’s The Penultimate Truth (light post-apocalyptic reading!).
In the first pages, the speech writer Joseph Adams is putting to use a device:
At the keyboard of the rhetorizor he typed, carefully, the substantive he wanted. Squirrel. Then, after a good two minutes of sluggish, deep thought, the limiting adjective smart.
“Okay,” he said, aloud, and sat back, touched the rerun tab.
The rhetorizor, as Colleen reentered the library with her tall gin drink, began to construct for him in the auddimension. “It is a wise old squirrel,” it said tinnily (it possessed only a two-inch speaker), “and yet this little fellow’s wisdom is not its own; nature has endowed it–”
The Penultimate Truth, page 2
Adams gets angry with the generated response and slaps the machine to turn it off. His companion advises him that his prompt needs improvement.
“Dear,” Colleen said, and sighed. “I heard you type out only two semantic units. Give it more to go on.”
“Listen,” he said grimly. “I want to see what this stupid assist machine that cost me fifteen thousand Wes-Dem dollars is going to do with that. I’m serious, I’m waiting.” He jabbed the rerun tab of the machine.
The Penultimate Truth, page 2
So my casual reading of a sci-fi novel links me right back to this idea of us entering phrases into a machine with expectations of it generating content for us. Reminder: PKD envisioned this in 1964.
Here we are in 2022 and promptism is peaking.
Featured Image: My own prompt-generated creation, by me, a human. I took the results from craiyon for the prompt “A techno utopian stares in boredom at his mobile phone” – I was going to comment on the gender, but I did enter “his” rather than “a”. Mea culpa. This is superimposed on a Google Images search on “dall-e examples” – call it shared under CC BY, as if anyone really could use this