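Preface
Caching avoids unnecessary server-side processing and rendering on every page view. For a blog, where most content is static articles, the benefit is especially clear.
I have long used the WP Super Cache plugin to cache my WordPress blog, and once I put Cloudflare in front of it I smugly assumed my caching was in good shape.
The problem
That was until I recently inspected the traffic and noticed this line in my blog's response headers: Cf-Cache-Status: DYNAMIC.
Huh? Once Super Cache has cached a page as an html file, the origin server should be returning that html file, not content rendered by php on the fly.
This is clearly a static file, so why does Cloudflare treat it like a dynamic API request instead of reporting HIT or MISS? In this state, Cloudflare goes back to the origin for every html page request, and the origin serves the static file it has cached locally.
It turns out that, to avoid unintentionally caching pages with variable content, Cloudflare does not cache html files by default: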
Cloudflare does not cache HTML resources automatically. This prevents us from unintentionally caching pages that often contain dynamic elements. For example, the content on certain HTML pages may change based on specific visitor characteristics, such as authentication, personalization, and shopping cart information.
But the same documentation also points the way:
However, you can configure HTML caching through specific Cloudflare Page Rules settings. The degree of HTML caching flexibility varies based on your domain plan as described in the best practice sections below.
So we still need to set up cache rules by hand to make Cloudflare cache the pages.
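The header check described above can be scripted, which is handy for re-testing after the rules change. Below is a minimal sketch using only the Python standard library. The function names `classify_cache_status` and `fetch_headers` are my own, and the grouping of status values is a simplification of the values Cloudflare documents for Cf-Cache-Status.

```python
import urllib.request

# Served from Cloudflare's cache (possibly stale or revalidated).
SERVED_FROM_EDGE = {"HIT", "STALE", "UPDATING", "REVALIDATED"}
# Eligible for caching, but this response came from the origin.
FETCHED_FROM_ORIGIN = {"MISS", "EXPIRED"}
# Not cached at all: not considered eligible, or caching was bypassed.
NOT_CACHED = {"DYNAMIC", "BYPASS"}


def classify_cache_status(headers):
    """Map a response header dict to 'edge', 'origin', 'uncached' or 'unknown'."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("cf-cache-status", "").strip().upper()
    if status in SERVED_FROM_EDGE:
        return "edge"
    if status in FETCHED_FROM_ORIGIN:
        return "origin"
    if status in NOT_CACHED:
        return "uncached"
    return "unknown"


def fetch_headers(url):
    """Fetch a URL and return its response headers as a plain dict."""
    request = urllib.request.Request(url, headers={"User-Agent": "cache-check/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Calling `classify_cache_status(fetch_headers("https://example.com/"))` twice in a row (with your own URL in place of the placeholder) gives a quick picture: a page that stays "uncached" is going back to the origin on every request.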
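Configuring WP Super Cache
Review WP Super Cache's local caching rules again and explicitly add a Cache-Control header, which helps Cloudflare act as a cache in front of the origin.
I use a one-month cache lifetime (max-age=2592000) for common asset files and half a month (max-age=1296000) for html files. Below is an excerpt from the nginx virtual host configuration: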
map $http_cf_visitor $cf_scheme {
    '{"scheme":"http"}'  '';
    '{"scheme":"https"}' 'on';
    default              '';
}

set_real_ip_from 0.0.0.0/0;
set_real_ip_from ::/0;
set_real_ip_from unix:;
real_ip_header CF-Connecting-IP;
real_ip_recursive off;

server {
    listen 80;
    root <hidden>;
    index index.html index.htm index.php;
    server_name <hidden>;

    location = /favicon.ico { log_not_found off; access_log off; }
    location = /robots.txt  { log_not_found off; access_log off; }

    # Static assets: cache for 30 days (the dot before the extension is escaped
    # so that it only matches a literal ".")
    location ~* \.(ogg|ogv|svg|svgz|eot|otf|woff|woff2|tiff|mp4|ttf|css|rss|atom|js|jpg|jpeg|gif|png|ico|zip|tgz|gz|rar|bz2|doc|xls|exe|ppt|tar|mid|midi|wav|bmp|rtf)$ {
        add_header Cache-Control "public, max-age=2592000";
        log_not_found off;
        access_log off;
    }

    set $cache_uri $request_uri;

    if ($request_method = POST) {
        set $cache_uri 'null cache';
    }
    if ($query_string != "") {
        set $cache_uri 'null cache';
    }

    # Don't cache uris containing the following segments
    if ($request_uri ~* "(/wp-admin/|/xmlrpc.php|/wp-(app|cron|login|register|mail).php|wp-.*.php|/feed/|index.php|wp-comments-popup.php|wp-links-opml.php|wp-locations.php|sitemap(_index)?.xml|[a-z0-9_-]+-sitemap([0-9]+)?.xml)") {
        set $cache_uri 'null cache';
    }

    # Don't use the cache for logged in users or recent commenters
    if ($http_cookie ~* "comment_author|wp-postpass|wordpress_logged_in") {
        set $cache_uri 'null cache';
    }

    # HTML pages: cache for 15 days, serving the Super Cache file when it exists
    location / {
        add_header Cache-Control "public, max-age=1296000";
        try_files /wp-content/cache/supercache/$http_host/$cache_uri/index-https.html $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_param HTTPS $cf_scheme if_not_empty;
        fastcgi_pass unix:/run/php/php-fpm.sock;
    }

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    location ~ /\.ht {
        deny all;
    }
}
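To make the `$cache_uri` logic in the configuration easier to follow, here is a rough Python model of it. It is only a sketch for reasoning about which requests can be served from the Super Cache file. The function name `supercache_candidate` and the default host `example.com` are made up for illustration, and the two regular expressions are copied from the `if` blocks as written.

```python
import posixpath
import re

STATIC_MAX_AGE = 30 * 24 * 60 * 60  # 2592000 seconds, used for asset files
HTML_MAX_AGE = 15 * 24 * 60 * 60    # 1296000 seconds, used for html pages

# Same patterns as the nginx "if" blocks; "~*" means case-insensitive.
EXCLUDED_URI = re.compile(
    r"(/wp-admin/|/xmlrpc.php|/wp-(app|cron|login|register|mail).php|wp-.*.php"
    r"|/feed/|index.php|wp-comments-popup.php|wp-links-opml.php|wp-locations.php"
    r"|sitemap(_index)?.xml|[a-z0-9_-]+-sitemap([0-9]+)?.xml)",
    re.IGNORECASE,
)
EXCLUDED_COOKIE = re.compile(
    r"comment_author|wp-postpass|wordpress_logged_in", re.IGNORECASE
)


def supercache_candidate(method, request_uri, cookie="", host="example.com"):
    """Return the cached file path nginx would try first, or None.

    None stands for the 'null cache' case: the path cannot exist, so
    try_files falls through to $uri, $uri/ and finally /index.php.
    """
    _, _, query = request_uri.partition("?")
    if method == "POST" or query != "":
        return None
    if EXCLUDED_URI.search(request_uri) or EXCLUDED_COOKIE.search(cookie):
        return None
    raw = f"/wp-content/cache/supercache/{host}/{request_uri}/index-https.html"
    # Collapse the duplicate slashes the concatenation produces.
    return posixpath.normpath(raw)
```

For example, `supercache_candidate("GET", "/hello-world/")` yields a path under `/wp-content/cache/supercache/example.com/hello-world/`, while the same request with a `wordpress_logged_in` cookie yields `None` and ends up in php.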
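Configuring Cloudflare
In the Cloudflare dashboard, under "Caching" -> "Cache Rules":
First, add the BYPASS rule. After all, nobody wants a page rendered for a logged-in user to be cached and served to everyone 🤯.
Create the following "Custom filter expression" and apply Bypass cache to these cases (php files, common WordPress functionality, robots.txt, sitemaps, RSS feeds, logged-in users, and users who have commented on the site).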
(
  (http.host eq "iedon.com") and (
    (starts_with(http.request.uri.path, "/wp-admin")) or
    (http.request.uri.path.extension eq "php") or
    (http.request.uri.path eq "/feed") or
    (http.cookie contains "comment_author") or
    (http.cookie contains "wp-postpass") or
    (http.cookie contains "wordpress_logged_in") or
    (starts_with(http.user_agent, "WordPress/"))
  )
)
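To sanity-check which requests this expression catches, it can be mirrored in a few lines of Python. This is an approximation of the rule for offline testing, not Cloudflare's own evaluator. The `Request` class and `should_bypass` function are hypothetical names, and the sketch assumes that `http.request.uri.path.extension` is the lowercased extension without the dot.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Request:
    host: str
    path: str
    cookie: str = ""
    user_agent: str = ""


def should_bypass(req):
    """Approximate the bypass expression above for a single request."""
    if req.host != "iedon.com":
        return False
    # Assumed equivalent of http.request.uri.path.extension.
    extension = posixpath.splitext(req.path)[1].lstrip(".").lower()
    return (
        req.path.startswith("/wp-admin")
        or extension == "php"
        or req.path == "/feed"
        or "comment_author" in req.cookie
        or "wp-postpass" in req.cookie
        or "wordpress_logged_in" in req.cookie
        or req.user_agent.startswith("WordPress/")
    )
```

One thing the sketch makes visible: the `eq "/feed"` comparison is exact, so a request for `/feed/` with a trailing slash is not matched by that clause.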
Finally, add the main rule, set to "Eligible for cache" with "Use cache-control header if present, use default Cloudflare caching behavior if not":
(http.host eq "iedon.com")
Once the rules are deployed, they look like this:
Pitfalls
2024/08/04: discovered that the rule order was wrong
For conflicting settings (for example, bypass cache versus eligible for cache), the last matching rule wins. For example, if cache rule #1 is set to cache everything on example.com/images and cache rule #2 is set to bypass cache on example.com, then cache will be bypassed for all URLs that match example.com, since rule #2 is the last matching rule.
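The passage above is from the official docs. It says that all matching rules are taken into account and, when their settings conflict, the last matching rule wins. That is not what I had assumed beforehand, namely that rules run in their listed Order and evaluation stops at the first rule that matches.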
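The difference between "first match stops" and "last match wins" is easy to see in a toy model. The sketch below reproduces the example from the quoted documentation. `resolve_cache_action` and `DOC_EXAMPLE_RULES` are illustrative names, and each rule is reduced to a predicate on a "host/path" string plus an action label.

```python
def resolve_cache_action(rules, url):
    """Return the action of the LAST rule whose predicate matches, or None."""
    action = None
    for matches, rule_action in rules:
        if matches(url):
            # No early return: a later matching rule overrides this one.
            action = rule_action
    return action


# Rule #1 caches example.com/images, rule #2 bypasses cache on example.com.
DOC_EXAMPLE_RULES = [
    (lambda url: url.startswith("example.com/images"), "cache"),
    (lambda url: url.startswith("example.com"), "bypass"),
]
```

With the rules in this order, `resolve_cache_action(DOC_EXAMPLE_RULES, "example.com/images/a.png")` returns "bypass", because rule #2 matches last. Reversing the list returns "cache" for the same URL. The same reasoning applies to this blog: the broad "Eligible for cache" rule has to come first and the bypass rule after it.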
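The correct setup should look like this (so that I do not forget later, I deliberately labeled each rule with its actual execution order):
References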
- Customize Caching with Cloudflare Rules, https://developers.cloudflare.com/cache/troubleshooting/customize-caching/
- Guide to HTML Full Page Caching with Cloudflare for WordPress, https://servebolt.com/help/cloudflare/guide-to-html-full-page-caching-with-cloudflare-for-wordpress/