Apparently reCAPTCHA has Digitized All the Books

When I first came across the reCAPTCHA project, then at Carnegie Mellon, now a Google-owned product, I thought the concept was one of the most clever things ever. Those two scrambled word things you have to type into a box on a web form were there to make it harder for bots to pretend to be people and flood web sites or spam web forms.

Yes, it's annoying, but the repeated effort by millions of people was actually helping to improve the digitization of books. One of the two words was a control word, one whose digitization was already known, and if enough people agreed on the other, its digitization could be considered fairly reliable (see the examples of its accuracy and a worthy TED Talk by Luis von Ahn about reCAPTCHA and some fun misplaced captchas).
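For the curious, here is a rough sketch of how that two-word scheme might work. This is my own toy illustration, not reCAPTCHA's actual code: the names `UnknownWord` and `submit`, and the thresholds `MIN_VOTES` and `MIN_AGREEMENT`, are all made up for the example.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_VOTES = 3          # how many human guesses we want before trusting a word
MIN_AGREEMENT = 0.75   # fraction of guesses that must agree


class UnknownWord:
    """Collects human guesses for a word the OCR could not read."""

    def __init__(self):
        self.votes = Counter()

    def record(self, guess):
        # Normalize so "Morning " and "morning" count as the same vote.
        self.votes[guess.strip().lower()] += 1

    def accepted_reading(self):
        """Return the agreed reading, or None if there is no consensus yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        reading, count = self.votes.most_common(1)[0]
        return reading if count / total >= MIN_AGREEMENT else None


def submit(control_answer, control_guess, unknown_guess, unknown):
    """Pass the user only if the control word matches.

    The guess for the unknown word is counted as a vote only when the
    control word was typed correctly.
    """
    passed = control_guess.strip().lower() == control_answer.strip().lower()
    if passed:
        unknown.record(unknown_guess)
    return passed
```

Note that in this sketch a wrong guess for the unknown word still gets you through, as long as the control word is right; it just becomes one dissenting vote that the other typists outvote.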

I even played around in 2008 with taking these pairs of words and making a story out of them and later made a ds106 assignment along these lines.

But those were the good ole captcha days, when you actually got words to untangle. In the last two years or so, I have noticed that what we were presented with was not even full words, but word fragments; I guess to make it harder for those pesky bots.

And then the images of numbers started appearing. And now, I see that all we get are numbers. What the captcha is going on?

Apparently all the books are done, and the numbers are actually snippets from Google Street View, so what we are doing in proving our humanness to captchas is helping Google better identify street addresses.

Maybe your address.

But it’s even more sophisticated than that. If I read it correctly, Google is tracking activity before and after the captcha to better “know” you (if you are a person or a bot), and sending you easier captchas if it thinks you are human:

The reCAPTCHA team has been performing extensive research and making steady improvements to learn how to better protect users from attackers. As a result, reCAPTCHA is now more adaptive and better-equipped to distinguish legitimate users from automated software.

The updated system uses advanced risk analysis techniques, actively considering the user’s entire engagement with the CAPTCHA, before, during and after they interact with it. That means that today the distorted letters serve less as a test of humanity and more as a medium of engagement to elicit a broad range of cues that characterize humans and bots.

As part of this, we’ve recently released an update that creates different classes of CAPTCHAs for different kinds of users. This multi-faceted approach allows us to determine whether a potential user is actually a human or not, and serve our legitimate users CAPTCHAs that most of them will find easy to solve. Bots, on the other hand, will see CAPTCHAs that are considerably more difficult and designed to stop them from getting through.

So if you see a captcha that looks like:

Congrats! Google thinks you are a plain human, and you get the easy captcha (and again, you are helping them confirm the machine reading of street addresses).

I do wonder what you have to do for Google to toss you a bot challenge captcha.

But you can rest assured knowing that all the books have been digitized. They do need to change the web site, though; the book digitizing era is gone.

26645249! 4563285! 4822259!

Comments

I’m just thankful because the numbers are so much easier than some of the crazy words they would display. I loved the idea of the project but some of those captchas attempting to tell if I was human or not had me questioning myself.

Alan Levine aka CogDog says:

December 19, 2013 at 9:18 am

I agree with that, the primary consideration should be how little it interferes with the user experience.

It is just less interesting than real words, as it was designed when it first came out. I think I have seen one “106” in the street numbers!

Pingback: LOC, Civil War, South Georgia, More: Short Thursday Morning Buzz, December 19, 2013 | ResearchBuzz

Didn’t know that bit about helping to identify Street View addresses. Kinda bugs me, actually. I always liked the little altruistic boost I got when I filled in a reCAPTCHA thinking I was doing a tiny bit to help preserve a little slice of culture and knowledge. I don’t have that same feeling thinking I am helping to identify an address in Street View. Just not the same.

Maybe with the recent legal win for Google Books they’ll resume or enhance the book work. They’ve only digitized the books they’ve scanned…which is a long way from all of them!

Alan Levine aka CogDog says:

December 23, 2013 at 8:05 am

I figured it out! They are digitizing a picture book with a lot of numbers. IT’S A COOKBOOK!!!!!

Pingback: reCAPTCHA : What’s Google doing now?? | O(lol n)

Bill Smith says:

January 18, 2014 at 12:01 pm

Hey, thanks for informing us on this topic! Now I have fun (especially on the Daily Create submissions) typing in incorrect responses for the image. It will be accepted if the computer generated number is correct. Just one out of a few million entries, but I’m hacking it!

Slow down the machine!

Pingback: Why I hate CAPTCHA (or “I am a Human!) | MarkjOwen's Blog
