How can Google tell if a site may be hacked?
I have worked with Google privately on some of their technology, and I was one of the few people they called for help when they themselves were hacked by China and had source code stolen.
Google have quite a few tools in their arsenal, and more have likely been added since.
To name a few:
- They acquired and now fully host VirusTotal, one of the biggest AV engine aggregators in the world, which can scan a file with 40–50 antivirus engines simultaneously, looking for signs of malware.
- They have a network of independent proxies that load a webpage and run it inside a custom sandbox. Any unexpected system-level modification detected inside that sandbox is used to automatically flag malicious behavior.
- They perform malware detection natively on the sites they crawl, running them through simple algorithmic checks for well-known malicious third-party scripts and embedded code.
- They partner with third-party data providers that feed them potentially malicious seed data.
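To give a feel for the "simple algorithmic checks" idea, here is a minimal sketch in Python. It is not Google's code: the blocklist `KNOWN_BAD_HOSTS`, the class `ScriptSourceCollector`, and the function `find_suspicious_sources` are all hypothetical names I made up for illustration. It just pulls the `src` of every `<script>` and `<iframe>` tag on a page and compares the hostname against a list of known-bad hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist. In practice the seed data would come from
# crawl results or third-party feeds, not a hard-coded set.
KNOWN_BAD_HOSTS = {"malware.example", "evil-cdn.example"}


class ScriptSourceCollector(HTMLParser):
    """Collects the src attribute of every <script> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("script", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_suspicious_sources(html, bad_hosts=KNOWN_BAD_HOSTS):
    """Return embedded sources whose hostname is on the blocklist."""
    collector = ScriptSourceCollector()
    collector.feed(html)
    flagged = []
    for src in collector.sources:
        # hostname is None for relative URLs such as "/js/app.js".
        host = urlparse(src).hostname
        if host and host in bad_hosts:
            flagged.append(src)
    return flagged
```

A real crawler would also have to deal with obfuscated inline code, redirects, and scripts injected at runtime, which is where the sandbox approach comes in.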
It is Google's job to present the highest-quality data for the browsing and searching experience. Helping prevent attacks against its users is very much part of that tenet.
Quite a few tools also help web administrators deal with such problems proactively, either by reporting malicious content per AS (BGP autonomous-system-level reporting) or by notifying the administrator directly through tools such as Webmaster Tools.
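As a rough illustration of what AS-level reporting means, the sketch below groups flagged URLs by the autonomous system that announces the hosting IP. Everything here is an assumption for the example: the `PREFIX_TO_ASN` table uses documentation-only prefixes and AS numbers, and `asn_for` and `report_by_asn` are hypothetical helpers, not anything Google exposes.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-AS table built from documentation-only ranges
# (RFC 5737 prefixes, RFC 5398 AS numbers). Real data would come from BGP feeds.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def asn_for(ip):
    """Return the AS number of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for network, asn in PREFIX_TO_ASN.items():
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, asn)
    return best[1] if best else None


def report_by_asn(flagged):
    """Group (url, ip) pairs of flagged content by originating AS number."""
    report = defaultdict(list)
    for url, ip in flagged:
        report[asn_for(ip)].append(url)
    return dict(report)
```

A network operator could then look up their own AS number in such a report and see every flagged URL hosted inside their address space.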