CloudFlare not minifying code for Googlebot / Crawlers

Update (12/27/2017): I got an email from Scott at CloudFlare today letting me know that a fix had been deployed. After checking, it does now seem that CloudFlare is correctly minifying pages for Googlebot and W3C.

In doing some Website debugging recently, I noticed that CloudFlare is not “minifying” HTML code for some clients. Unfortunately, one such client is the Googlebot crawler/robot.

This is problematic because I had been using the CloudFlare auto-minification feature on the premise that it would improve page load speed, which has been a component of Google’s ranking algorithm since at least 2010.

However, as you can see here, Google’s crawlers are receiving the un-minified version of the HTML code (notice the highlighted HTML comments and the leading whitespace):

HTML Code as Fetched by Google

Whereas my Web browser (Chrome) is receiving minified HTML code:

HTML Code in Web Browser
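For anyone who wants to check this without eyeballing screenshots, here is a rough sketch of the two signals I looked for: HTML comments and indented lines. The function names are my own invention for illustration, and the check is only a heuristic. A minifier may legitimately leave some comments (conditional comments, for instance) or whitespace inside pre blocks, so treat the result as a hint and not a verdict.

```python
import re

# Ordinary HTML comments; IE conditional comments (<!--[if ...]>) are skipped
# because minifiers commonly preserve them.
COMMENT_RE = re.compile(r"<!--(?!\[if).*?-->", re.DOTALL)


def minification_signals(html: str) -> dict:
    """Count the two tell-tale signs of un-minified HTML (heuristic only)."""
    comments = len(COMMENT_RE.findall(html))
    # A non-blank line that starts with a space or tab counts as indented.
    indented = sum(
        1
        for line in html.splitlines()
        if line.strip() and line[:1] in (" ", "\t")
    )
    return {"comments": comments, "indented_lines": indented}


def looks_minified(html: str) -> bool:
    """True when neither comments nor indented lines are present."""
    signals = minification_signals(html)
    return signals["comments"] == 0 and signals["indented_lines"] == 0
```

Running `looks_minified()` on the source saved from Google's fetch tool and on the source saved from the browser should give different answers if the two really are being served different code.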

I did submit a support ticket to CloudFlare seeking clarification. Their response was:

“Our minification service is particularly conservative, and will break at the first perceived error.”

Understandable. They recommended that I use the W3C validator to validate my HTML code (which, of course, I had already done, verifying that the code was 100% compliant). Oh, and by the way, the W3C validator also receives the un-minified HTML code, comments and whitespace included.

But even if there were HTML errors, the real conundrum here is why two different clients are receiving different code: a regular browser such as Chrome receives minified/optimized code, yet Googlebot and the W3C validator do not.

Another observation I have made is that “spoofing” the user-agent to Googlebot from my computer does not return un-minified code. Hence, the determining factor is not solely the client’s user-agent; it is more likely based on the client’s IP address or a reverse DNS lookup.
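To make that concrete, here is a minimal sketch of both halves: fetching a page with a spoofed Googlebot user-agent, and the reverse-then-forward DNS check that Google recommends for confirming that an IP really belongs to its crawler. I have no idea whether CloudFlare does anything like this internally. It is only an illustration of how a server could tell a real Googlebot from a spoofed one. The function names and the hostname suffix list are my own assumptions, not anything CloudFlare has confirmed.

```python
import socket
import urllib.request

# Googlebot's well-known desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Assumed suffixes for genuine crawler hostnames; adjust if Google's guidance changes.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def fetch_as(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch a page while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def is_google_hostname(hostname: str) -> bool:
    """True if the hostname sits under one of the assumed Google domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip: str) -> bool:
    """Reverse lookup, suffix check, then forward lookup back to the same IP (IPv4 only)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not is_google_hostname(hostname):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers failed reverse or forward lookups.
        return False
    return ip in addresses
```

A request sent with `fetch_as(url, GOOGLEBOT_UA)` from a home connection carries the right header but fails the DNS check, which would be consistent with my spoofed requests still getting the minified page.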

For now, I am operating on the assumption that CloudFlare intentionally bypasses minification for certain high-traffic clients, such as Web crawlers. This makes sense from a resource utilization standpoint because minification consumes additional CPU cycles on each request. Googlebot alone probably makes tens of millions of requests to the CloudFlare servers every day. Simply disabling that feature for Googlebot would be advantageous.

Unfortunately, that which is advantageous for CloudFlare is disadvantageous for me in terms of Google rankings. So, it may now be time to install mod_pagespeed and/or research other options (which will probably mean moving away from CloudFlare).

Sad Face.