JAW Speak

Jonathan Andrew Wolter

One Unit Test should have Prevented Google from Categorizing the Entire Internet as Malware

with 14 comments

Reading time: 2 – 3 minutes

The massive Google-wide glitch this morning, which categorized nearly every search result as malware, could have been prevented by unit testing. This is a wonderful example of why even “silly little scripts” should be test driven.

This is what happened: a file listing the sites Google warns are dangerous malware sites was checked in with a ‘/’ in it. When the file rolled out to the data centers, it caused search results to be rendered with a warning message next to almost every result.

One could have written unit tests asserting that typos (such as a bare ‘/’) are rejected rather than parsed, so the entire web is never accidentally miscategorized. I really hope to read more about this on the Google Testing Blog; it’s a prime opportunity for the company to take leadership and further promote unit testing, and even test-driven development. Miško, care to take that post up?
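As an illustration of the kind of test I mean, here is a sketch. The real file format and parser aren’t public, so `parse_blacklist` and its rejection rules are assumptions for illustration only:

```python
import unittest


def parse_blacklist(lines):
    """Parse malware-site entries, rejecting any value that would
    expand to all URLs (a hypothetical parser for illustration)."""
    entries = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        # A bare path with no host ('/' or '/com') matches every URL.
        if entry.startswith("/"):
            raise ValueError("entry matches all URLs: %r" % entry)
        entries.append(entry)
    return entries


class BlacklistParserTest(unittest.TestCase):
    def test_rejects_bare_slash(self):
        with self.assertRaises(ValueError):
            parse_blacklist(["example.com/badpage", "/"])

    def test_accepts_normal_entries(self):
        self.assertEqual(parse_blacklist(["evil.example/"]),
                         ["evil.example/"])
```

A test like this would have turned the bad check-in into a failing build rather than a production incident.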

They handled it well, and Marissa Mayer made a post on the Official Google Blog:

We periodically update that list and released one such update to the site this morning. Unfortunately (and here’s the human error), the URL of ‘/’ was mistakenly checked in as a value to the file and ‘/’ expands to all URLs.

I say let’s use this as a publicity moment for testing. I am certain that testing has literally prevented many disasters in complex systems; however, because those tests did their work, they got no publicity. Let’s see promotion of “lessons learned,” Google!

Where have you been saved by tests? Or, where was your big blunder that a test could have fixed?


Written by Jonathan

January 31st, 2009 at 4:30 pm

Posted in code, testability

14 Responses to 'One Unit Test should have Prevented Google from Categorizing the Entire Internet as Malware'


  1. It’s an interesting point. ‘/’ is not a typo, so how would that test get written?

    Including ‘/’ is disastrous, but it’s neither syntactically nor semantically wrong. Someone would have to write a test that says that this URL is valid but disallowed. That’s a kind of thinking about testing that’s rather subtle and infrequently practised, especially amongst those who think that the job of tests is to trap defects.

    On the other hand, if we believe that the job of tests is to show by example that the right thing happens then we might be in with a chance. Just so long as “not blacklisting the whole web” occurs to someone as being correct behaviour.

    Keith Braithwaite

    31 Jan 09 at 4:48 pm

  2. Yes, I say that ‘/’ is a typo. It is incorrect and produces behavior that Google clearly never wants to have happen. It is as wrong as code saying that 1 + 1 = 3.

    The larger issue is how data files are dealt with and trusted, or not trusted. Just like we should never trust user input from the web, it is not advisable to trust config files when the downside is too large.

    When we have a behavior that we want to assert, I find tests to be the best way to do this — and run them in an automated testing suite.

    I think the file parser should have read in the settings and, for ‘/’ (and perhaps others such as ‘/com’ or ‘/com/google’), either ignored them or thrown an exception. This would have been detected immediately as the site reliability engineers deployed the file, and the file would have been rolled back right away.

    Another way to catch this is with an integration test: the system could have blown up as soon as it read such an invalid file.

    Jonathan

    31 Jan 09 at 6:22 pm

  3. The question is – even with a TDD approach, would this have been considered “something to test”?

    TDD says to stop testing when you can’t think of anything more to test. Humans aren’t perfect; we make mistakes and sometimes we don’t think of every possibility.

    Robert

    31 Jan 09 at 7:06 pm

  4. Yes, and unit testing should have prevented 9/11 too! Come on! You testing zealots are driving me nuts. I am certain that testing has literally CAUSED many disasters in complex systems, because of the false sense of security that it gives to poor developers using poor technologies. I say let’s use this as a publicity moment for humility and technologies that minimize human intervention and error. I can’t believe they’re editing a typo-ready text file to list malware sites!

  5. @Robert,

    Good point. But if, as Marissa Mayer’s post indicates, Google manages very important things such as this with a checked-in file, then I think the team should feel compelled to be paranoid.

    For me, testing replaces Fear with Boredom.

    @Sebastien,

    Great to hear some passion in the comments! In my experience a well-tested codebase is much more flexible. Code that works magically and is only observed in a full integration test, or by starting it up and using it, is very brittle. Big code always needs refactoring, sooner or later. Tests give us the power to actually refactor.

    It is interesting that a text file is used to list these sites. You would be expecting something like a database with associated controls and tests around what can possibly be valid data, right? Well, if you built a whole app to do it — I think tests would be sensible. Just because it is a simple text file does not mean it should not be tested.

    There is a complexity branch for every one of the URLs in that list. We could think of this data as highly complex, and warranting extensive testing. Especially if there isn’t another application on top of it all (which would probably have tests around the models).

    Jonathan

    31 Jan 09 at 9:33 pm

  6. Accidents happen. You can’t test everything, even when you are one of the biggest companies, like Google.

    rishi

    1 Feb 09 at 1:50 pm

  7. I’ll be a monkey’s uncle if Google doesn’t already do some validation of their configuration files. That’s a kind of testing that I haven’t seen many teams do. I guarantee that there have been some additional tests written for that file now.

    How many other teams validate production configuration before they deploy it?

    Julian Simpson

    2 Feb 09 at 11:28 am

  8. It is very easy to look at the result and say they should do this and do that.

    Defining good test cases for every possibility is not easy. People and companies learn from mistakes; let’s hope that they will do better. However, what I don’t like is Google’s monopoly power. Somebody needs to do something so we will NOT depend on Google alone.

    Emily

    2 Feb 09 at 4:12 pm

  9. The ‘/’ case is extremely obscure, and I would not blame anyone for failing to test for it. In fact, I’m pretty sure there are other string combos in the system that will go nuts.

    A test that would make more sense is to run a list of trusted sites (e.g. google.com itself) through the filter. If any sites fail then there is likely a major problem in the classification code. It’s not foolproof, but would go a long way to detect the sort of catastrophic failure we saw.
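That sanity check might look something like this (a sketch only; `is_malware` is a stand-in for the real classifier, which isn’t public, and the trusted-site list is invented):

```python
# Smoke test: known-good sites must never be flagged as malware.
TRUSTED_SITES = ["google.com/", "wikipedia.org/", "example.com/"]


def is_malware(url, blacklist):
    """Flag a URL if any blacklist prefix matches it (stand-in logic)."""
    return any(url.startswith(prefix.rstrip("/")) for prefix in blacklist)


def smoke_test(blacklist):
    """Fail loudly if the blacklist catches any trusted site."""
    flagged = [s for s in TRUSTED_SITES if is_malware(s, blacklist)]
    assert not flagged, "trusted sites flagged as malware: %r" % flagged
```

With a ‘/’ entry, `rstrip("/")` leaves an empty prefix that matches everything, so this smoke test fails immediately, which is exactly the catastrophic case it is meant to catch.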

    Jerry

    2 Feb 09 at 4:20 pm

  10. “One Unit Test should have Prevented Google from Categorizing the Entire Internet as Malware”

    And a company of engineers, a distributed database, a web crawler, PageRank, a monetization system to fund it, and a bunch of datacenters is all that prevented us armchair TDD quarterbacks from having a hit search engine. :P

    They obviously do a bunch of testing already. Maybe if they’d spent any more effort doing testing early on, some butterfly-hurricane chain would have caused them never to have gotten so big so fast.

    It’s easy to criticize. But we wouldn’t have anything to criticize if they hadn’t first built the one website that basically directs the entire web today. Why aren’t we praising the little-fish search engines for winning the battle but losing the war?

    Ken

    2 Feb 09 at 6:57 pm

  11. I agree with @Jerry, using information that Google has about the web, such as a whitelist of trusted sites, would be more useful than trying to validate the filter by manually writing test cases.

    Casper Kuijjer

    3 Feb 09 at 4:31 pm

  12. You crazy unit testing bastard. Why don’t you go work for Google if you’re so smart!

    Dave C

    3 Feb 09 at 11:35 pm


