October 17, 2013

CAPTCHA on its Last Legs for Authentication


Whether or not you are familiar with the name "captcha," we have all encountered, loved, and hated them regularly.  A "captcha" is one of those images containing squiggly and distorted text that one must enter in order to perform some function.  For instance, you generally see them when you sign up for e-mail services for the first time or when you establish a new user name and password, such as for a new e-commerce account. They are particularly evident in blogs and other discussion forums, as shown in the figure below.  In fact, an older Webtorials TechNote was exactly what led to our discovery that this is a test that will soon become meaningless.  And after further study, we are currently testing the elimination of captchas in the hope that removing this annoying and ineffective feature will facilitate discussion.

captcha-sample.JPG
Our Experience

First, a little background.  According to many sources, "captcha" is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." The idea, of course, is to ensure that a form is being submitted by a person rather than a bot or other software operation.

We love to have comments on our articles at Webtorials.  However, we greatly prefer that the comments be from real people.  So I was puzzled early this week when one of our contributors noted that she was getting a lot of spam from us.

As it turns out, the messages were indeed generated by Webtorials, but the system was behaving as designed.  Whenever a comment is entered, the Webtorials back office sends an e-mail to our staff for comment approval.  If the comment is deemed to be "real," it gets posted, sometimes with a little editing for clarity.  And if the comment is spam, then we simply ignore it.

In this case, a comment was entered, including a valid captcha, and sent to the author for approval.  On further examination, we saw that there were literally hundreds of entries, with the frequency escalating, over the period of less than a week.  The figure included here shows an example of what we saw on our system.

 
captcha2-redacted.jpg

So, how much of this is real information?  The email address along with other information is redacted because we think it might be valid. (I sent an email and it did not bounce.  Not very convincing, but...) 

Concerning the validity of the IP address, our colleague Lisa Phifer at Core Competence notes that the "IP address isn't false but rather useless as a method of directly fingering a real spammer. Spammers use bots and redirection more often than IP spoofing. In this case, it's easy to find the IP on lists of comment spammers ( https://www.projecthoneypot.org/ip_117.26.195.122). Whois on the IP shows it belongs to http://www.fjtelecom.com and so is probably a DHCP pool address assigned to a residential broadband subscriber device (probably infected with a bot)."

Since these comments were for an older TechNote, and in fact in an unused directory, remediation was simple.  I turned off the ability to comment.  And in the future we may have to limit the comment period to perhaps 90 days simply in order to prevent this type of abuse.
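A cutoff like this is simple to express in code. The following is only a minimal sketch and does not describe how the Webtorials back office actually works; the function name `comments_open` and the `COMMENT_WINDOW` constant are hypothetical, and the 90-day value is just the "perhaps 90 days" figure mentioned above.

```python
from datetime import date, timedelta

# Hypothetical cutoff, taken from the "perhaps 90 days" mentioned in the text.
COMMENT_WINDOW = timedelta(days=90)


def comments_open(published: date, today: date,
                  window: timedelta = COMMENT_WINDOW) -> bool:
    """Return True while an article is still inside its comment window."""
    return published <= today <= published + window


if __name__ == "__main__":
    published = date(2013, 10, 17)
    print(comments_open(published, date(2013, 11, 1)))  # True: two weeks after publication
    print(comments_open(published, date(2014, 6, 1)))   # False: well past 90 days
```

A comment form would check this before accepting a submission, so old and forgotten pages stop being an open target.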

Can the Problem Be Solved?

But the bigger lesson is that with advances in OCR (Optical Character Recognition) technology, captchas have to be more and more complex in order to be effective.  Perhaps you have already noticed that the captchas are more difficult to read, and this is precisely why.

So this began the next quest.  Exactly how difficult is it to read a generic captcha, such as the one displayed below?  Thank goodness, it was not trivial.  I tried several methods, such as asking Adobe Acrobat to use its built-in OCR, and failed.  A trial of several Internet-based OCR systems likewise failed.  Ironically, one of the online sites itself had a rather advanced captcha, with an apology that they knew the captcha was inconvenient, but hackers had already been using their site for...  reading captchas!  The good news is that even with a reasonable amount of effort (about an hour), I was not able to read the test captcha using an optical reader.  

But then I had a "duh" (as opposed to "aha") moment.  I realized that if I googled "captcha reader" then the number of readily available tools is astounding. (Lisa also suggested "captcha breaker.")

One of the options for most captcha implementations is to offer an audio version so you may "listen" to the code.  (We do not offer this at Webtorials.)  However, this is hardly a solution to the issue because speech recognition technology is likewise advancing quickly - even for garbled speech.  And, of course, you have to be able to hear the output.  So, while you are investigating this, try googling "audio captcha breaker."

For an overview of the efficacy of various captcha methods, check out this site

At Webtorials, we find it difficult to understand why somebody would spend the energy to fill in comment forms with bogus information in the first place.  We love our information at Webtorials, but we're giving it out for free, so it's not exactly like breaking into a bank account. Nevertheless, it's difficult to find a way to stop a particular behavior without a clear indication as to why the behavior is taking place in the first place.

Action Items

Unfortunately, this is not a problem with a ready solution.  At least from my perspective, the more advanced captcha images are already becoming quite difficult to decipher, and I find myself requesting several variations to get a usable set of images.

This may very well turn out to be one of those ideas that seemed good at the time, but ultimately a different solution is needed.  We are quickly approaching a point where OCR capabilities will be so advanced that they will equal the ability of humans to "read" a code.

So, perhaps Scientific American author David Pogue got it right when he wrote about eighteen months ago that captcha really should stand for "Computers Annoying People with Time-Wasting Challenges That Howl for Alternatives." 


8 Comments

"Eight Alternatives to the Hated Captcha" and, for the more technically oriented, "10 Things to Check Before Using a CAPTCHA" give some inventive suggestions. I would love to hear from our readers as to how you are planning to deal with authentication alternatives.

For a few non-CAPTCHA methods of deterring comment spam, look here.

People do not "waste" their time doing this at all: it's normally all done by software (search for ScrapeBox or GSA Search Engine Ranker), and the idea is that people leave a comment and the comment will contain a link to their website.

So I am very surprised at that comment in this article "At Webtorials, we find it difficult to understand why somebody would spend the energy to fill in comment forms with bogus information in the first place"

Often the username will be the keyword they are trying to rank for. It's simply a way to try to improve a website's ranking. Google sees the comment and the link to the website, and it's a backlink. Simple as that, really.

Not every site moderates comments like you do.

As an aside... I haven't had to enter a captcha to leave this comment.

The motivation in this case appears to be advertising. Take a look at the URLs in the comment text of the post. For example, http://www(dot)ca57(dot)com/-c-8.html, is advertising women's handbags - probably counterfeit handbags.

[This is in reference to the comment "We love our information at Webtorials, but we're giving it out for free, so it's not exactly like breaking into a bank account."]

An alternative to proving that end user is human is to check and sanitize the input for comments to eliminate javascript, binary characters, etc. Essentially, consider that comments should be text only, not other stuff. This is the approach that Web Application Firewalls (WAF) take. You do have a WAF protecting your site don't you?
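As a rough illustration of the "text only" idea in this comment, here is a minimal sketch using only the Python standard library. It is not a WAF and not a robust HTML sanitizer (regex-based tag stripping can be fooled by malformed markup); the function name `sanitize_comment` and the `max_len` limit are assumptions made for the example.

```python
import html
import re
import unicodedata

# Remove <script>/<style> elements together with their contents.
_SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Remove any remaining tags, keeping the text between them.
_TAG_RE = re.compile(r"<[^>]*>")


def sanitize_comment(raw: str, max_len: int = 5000) -> str:
    """Reduce a submitted comment to plain, HTML-escaped text."""
    # Drop control and other non-printable characters ("binary" input),
    # keeping ordinary newlines and tabs.
    text = "".join(
        ch for ch in raw
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = _SCRIPT_RE.sub("", text)
    text = _TAG_RE.sub("", text)
    # Truncate before escaping so an entity is never cut in half.
    text = text.strip()[:max_len]
    # Escape whatever is left so it renders as text, not markup.
    return html.escape(text, quote=True)


if __name__ == "__main__":
    print(sanitize_comment("Nice post!<script>alert(1)</script>"))
    print(sanitize_comment("1 < 2 & 3"))
```

Escaping on output is the step that matters most here; the stripping passes mainly keep stored comments readable.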

Indeed, SEO (Search Engine Optimization) does seem to be the primary motivation for spamming comment fields, so thanks to both Karl and David!

No, Karl, we do not have a WAF per se. There's not enough traffic. But we do have some pretty stringent restrictions set to make sure that links in comments don't get searched. However, this does seem appropriate for larger enterprises. Also, you will notice that I edited your comment to use (dot) as the separator so the URL did not get searched.
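The (dot) edit described above was made by hand. For anyone who wants to automate something similar, the sketch below shows one possible approach; it is an illustration only, not the Webtorials implementation, and the function name `defang_urls` and the regular expression are assumptions. It rewrites only the host portion of each http or https URL, matching the style of the edited comment.

```python
import re

# Capture the scheme and the host portion (everything up to the first slash).
_URL_HOST_RE = re.compile(r"(https?://)([^\s/<>\"']+)", re.IGNORECASE)


def _defang(match: "re.Match[str]") -> str:
    scheme, host = match.group(1), match.group(2)
    # Leave a sentence-ending period alone; only dots inside the host change.
    stripped = host.rstrip(".")
    trailing = host[len(stripped):]
    return scheme + stripped.replace(".", "(dot)") + trailing


def defang_urls(text: str) -> str:
    """Replace dots in URL host names with "(dot)" so links are not live."""
    return _URL_HOST_RE.sub(_defang, text)


if __name__ == "__main__":
    print(defang_urls("For example, http://www.ca57.com/-c-8.html is an ad."))
```

Because the path is left untouched, the result is still readable by a person but is no longer a well-formed link for a crawler to follow.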

Of course, this raises the entire question of both the efficacy and the ethics of SEO as a common business practice. I had intended to mention this topic as a point-of-interest only, but I am becoming convinced that it deserves a TechNote of its own.

What if you required each person to have an account and to log in to the account before being able to post a comment? Wouldn't this method work? I don't think that bots have accounts.

Requiring usernames and accounts would VERY significantly if not totally stop spammed comments. You are precisely correct.

However, over the past two years, we have made significant attempts to eliminate the need for usernames and passwords, both for accessing Webtorials and for commenting at Webtorials.

Our reasoning is twofold. First, we are trying to create an open community where people can participate easily. Requiring that users remember yet another username/password (and our having to maintain username/password files, lookups, etc.) is in contrast to our trying to promote this open community.

Second, there is a security risk that we would like to avoid. Many people use the same username and password for several or even many sites. If any one of the sites where usernames and passwords are stored is compromised, then there is a possible security breach for many sites.

For more on advanced authentication methods, we just published "Practical Methods for Improving Authentication."

I read somewhere that some organizations were using CAPTCHAs as a crowd-sourcing means of validating the machine (sic!) digitalization of historical (typed?) documents to expand the content of historical material available online. (Images typically in courier type-face, admittedly rarely used in today's CAPTCHAs). So whilst CAPTCHAs might be annoying, and ineffective as an authentication method, at least I could see that there was some benefit to them.



Webtorial® is a registered servicemark of Distributed Networking Associates. The Webtorial logo is a servicemark of Distributed Networking Associates. Copyright 1999-2018, Distributed Networking Associates, Inc.