When I first came across the reCAPTCHA project, then at Carnegie Mellon and now a Google-owned product, I thought the concept was one of the most clever things ever. Those two scrambled words you have to type into a box on a web form were there to make it harder for bots to pretend to be people and flood web sites or spam web forms.

Yes, it's annoying, but the repeated effort by millions of people was actually helping to improve the digitization of books. One word was a control word, whose digitization was already known, and if enough people agreed on the other, its digitization could be considered fairly reliable (see the examples of its accuracy and a worthy TED Talk by Luis von Ahn about reCAPTCHA and some fun misplaced captchas).
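Here is a rough Python sketch of how I picture that control-word trick working. To be clear, this is my own toy version: the function names, the three-vote minimum, and the 70% agreement cutoff are all made up for illustration and are not taken from reCAPTCHA itself.

```python
from collections import Counter


def passes_control(control_truth, control_guess):
    """A person passes if they type the known (control) word correctly."""
    return control_truth.strip().lower() == control_guess.strip().lower()


def accepted_reading(votes, min_votes=3, min_share=0.7):
    """Return the agreed reading of the unknown word, or None if there is no consensus yet.

    min_votes and min_share are invented thresholds, not reCAPTCHA's real ones.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


# Each submission is (guess for the control word, guess for the unknown word).
submissions = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mourning", "upon"),  # control word wrong, so this vote is ignored
    ("morning", "upon"),
    ("morning", "up0n"),
]

votes = Counter()
for control_guess, unknown_guess in submissions:
    if passes_control("morning", control_guess):
        votes[unknown_guess.strip().lower()] += 1

print(accepted_reading(votes))
```

Only guesses from people who got the control word right are counted, which is the whole point of pairing a known word with an unknown one.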

I even played around in 2008 with taking these pairs of words and making a story out of them and later made a ds106 assignment along these lines.

But those were the good ole captcha days, when you actually got words to untangle. In the last two years or so, I have noticed that what we were presented was not even full words, but word fragments; I guess to make it harder for those pesky bots.

And then the images of numbers started appearing. And now, I see that all we get are numbers. What the captcha is going on?

Apparently all the books are done, and the numbers are actually snippets from Google Street View, so what we are doing in proving our humanness to captchas is helping Google better identify street addresses.

Maybe your address.

But it’s even more sophisticated than that. If I read it correctly, Google is tracking activity before and after the captcha to better “know” you (whether you are a person or a bot), and sending you easier captchas if it thinks you are human:

The reCAPTCHA team has been performing extensive research and making steady improvements to learn how to better protect users from attackers. As a result, reCAPTCHA is now more adaptive and better-equipped to distinguish legitimate users from automated software.

The updated system uses advanced risk analysis techniques, actively considering the user’s entire engagement with the CAPTCHA: before, during and after they interact with it. That means that today the distorted letters serve less as a test of humanity and more as a medium of engagement to elicit a broad range of cues that characterize humans and bots.

As part of this, we’ve recently released an update that creates different classes of CAPTCHAs for different kinds of users. This multi-faceted approach allows us to determine whether a potential user is actually a human or not, and serve our legitimate users CAPTCHAs that most of them will find easy to solve. Bots, on the other hand, will see CAPTCHAs that are considerably more difficult and designed to stop them from getting through.
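In toy form, I imagine the serving logic looks something like the Python below. This is only my guess at the shape of it: the 0-to-1 risk score, the 0.5 cutoff, and the challenge names are all invented, and the quote does not say how the decision is actually made.

```python
def pick_challenge(risk_score, threshold=0.5):
    """Choose a captcha class from a risk score.

    risk_score is a hypothetical value from 0.0 (surely human) to 1.0 (surely a bot).
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    # Low risk gets the easy captcha; everyone else gets the hard one.
    return "easy_numbers" if risk_score < threshold else "hard_distorted_text"


print(pick_challenge(0.1))
print(pick_challenge(0.9))
```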

So if you see a captcha that looks like:

[Image: no more words]

Congrats! Google thinks you are a plain human, and you get the easy captcha (and again, you are helping them confirm the machine reading of street addresses).

I do wonder what you have to do for Google to toss you a bot challenge captcha.

But you can rest assured knowing that all the books have been digitized. They do need to update the web site, though; the book digitizing era is gone.

26645249! 4563285! 4822259!

The post "Apparently reCAPTCHA has Digitized All the Books" was originally squeezed out of the bottom of an old rusted tube of toothpaste at CogDogBlog (http://cogdogblog.com/2013/12/recaptcha-all-the-books/) on December 18, 2013.
