I’m running a self-hosted version of FTR (3.9.11) and believe I’ve stumbled upon a bug with fingerprints. Per the docs, load order should be:
- Custom/standard site configs
- Fingerprint matches
- Global rules
But I’m seeing:
- Custom/standard site configs
- Global rules
- Fingerprint matches
As best as I can tell, FTR loads and caches global rules when searching for headers/single page links. Fingerprints get tacked on afterwards, once HTML content is retrieved — so global rules are prioritized.
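To show why the order matters, here is a minimal sketch. It assumes that rules from each config are appended to one list and that the first title rule wins. The `mergeConfigs` helper and the `//title` global rule are hypothetical stand-ins, not FTR's actual code or the real contents of the global config.

```php
<?php
// Hypothetical model, not FTR's classes: each config is a list of title
// rules, and the first rule in the merged list takes priority.
function mergeConfigs(array ...$configs): array
{
    $merged = [];
    foreach ($configs as $config) {
        foreach ($config as $rule) {
            $merged[] = $rule;
        }
    }
    return $merged;
}

$site        = [];                    // no host-specific config matched
$fingerprint = ["'Load me first'"];   // rule reached via a fingerprint match
$global      = ['//title'];           // assumed stand-in for a global rule

$documented = mergeConfigs($site, $fingerprint, $global);
$observed   = mergeConfigs($site, $global, $fingerprint);

echo $documented[0], "\n"; // 'Load me first'
echo $observed[0], "\n";   // //title
```

With the documented order the fingerprint rule comes first. With the order I'm seeing, the global rule sits ahead of it.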
Tried turning off caching, but that didn’t help. Please let me know if I’ve misconfigured something.
We’ll take a look at this.
Could you please provide an example where the loading order is interfering with the config rules you’re trying to set up?
Using a fresh install of 3.9.11:
- Add `title: 'Load me first'` to the top of the standard `.medium.com.txt` site config
- This makes `Load me first` the title of every Medium post that doesn’t require a fingerprint match: https://blog.medium.com/weve-acquired-knowable-to-accelerate-our-growth-and-bring-audio-to-medium-2b7898e06891
- But it doesn’t work for those that do: https://levelup.gitconnected.com/automate-whatsapp-messages-with-python-instantly-3e08d6600612
This change to line 134 of `ContentExtractor.php` shows how global rules are being loaded/cached before FTR has retrieved the HTML for fingerprints. It delays loading the global config until we have HTML, which “fixes” the fingerprint issue above:
// load global config?
if ($config->autodetect_on_failure() && $html != "") {
This is only a quick hack, not the desired solution, since it prevents global rules from being used for headers/single-page links.
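For anyone trying to follow the reasoning, here is a rough two-pass model of what I think is happening. It assumes the config is built once without HTML (headers/single-page links) and again once HTML is available. `buildConfig`, `run`, and the `$deferGlobal` flag are hypothetical names for illustration, not FTR's API.

```php
<?php
// Hypothetical two-pass model of config building; not FTR's real code.
function buildConfig(array $config, string $html, bool $deferGlobal): array
{
    // Fingerprint rules can only be found once HTML is available.
    if ($html !== '' && !in_array('fingerprint', $config, true)) {
        $config[] = 'fingerprint';
    }
    // Global rules load on any pass, or only once HTML exists if deferred.
    $globalAllowed = !$deferGlobal || $html !== '';
    if ($globalAllowed && !in_array('global', $config, true)) {
        $config[] = 'global';
    }
    return $config;
}

function run(bool $deferGlobal): array
{
    $config = ['site'];
    // Pass 1: headers/single-page link lookup, no HTML yet.
    $config = buildConfig($config, '', $deferGlobal);
    // Pass 2: HTML has been retrieved.
    return buildConfig($config, '<html>...</html>', $deferGlobal);
}

echo implode(' > ', run(false)), "\n"; // site > global > fingerprint
echo implode(' > ', run(true)), "\n";  // site > fingerprint > global
```

The second line matches the documented order, but only because the first pass ends up with no global rules at all, which is the trade-off mentioned above.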
Hope this helps!