Global configs load before fingerprints

I’m running a self-hosted version of FTR (3.9.11) and believe I’ve stumbled upon a bug with fingerprints. Per the docs, the load order should be:

  1. Custom/standard site configs
  2. Fingerprint matches
  3. Global rules

But I’m seeing:

  1. Custom/standard site configs
  2. Global rules
  3. Fingerprint matches

As far as I can tell, FTR loads and caches the global rules while it is searching for headers/single page links. Fingerprint configs are only appended afterwards, once the HTML content has been retrieved, so the global rules end up taking priority.
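To illustrate why the order matters, here is a minimal PHP sketch. It assumes that title rules from all loaded configs are merged in load order and that the first rule that matches wins. `mergeRules()` and `firstTitle()` are hypothetical helpers, not FTR's API, and each rule is represented by the title it would extract rather than by a real XPath expression.

```php
<?php
// Hypothetical sketch: NOT FTR's actual code. Each "rule" stands in for
// the title a config's XPath expression would extract from the page.

// Merge rule lists in the order the configs are loaded, dropping duplicates.
// Earlier lists keep priority because extraction tries rules front to back.
function mergeRules(array ...$ruleLists): array
{
    $merged = [];
    foreach ($ruleLists as $rules) {
        foreach ($rules as $rule) {
            if (!in_array($rule, $merged, true)) {
                $merged[] = $rule;
            }
        }
    }
    return $merged;
}

// Return the first non-empty rule, i.e. the first "match" (null if none).
function firstTitle(array $rules): ?string
{
    foreach ($rules as $rule) {
        if ($rule !== '') {
            return $rule;
        }
    }
    return null;
}

$site        = [];                  // no site config for the custom domain
$fingerprint = ['Load me first'];   // .medium.com.txt, matched via fingerprint
$global      = ['Title from //h1']; // assumed generic rule from the global config

$documented = mergeRules($site, $fingerprint, $global); // order per the docs
$observed   = mergeRules($site, $global, $fingerprint); // order I'm seeing

echo firstTitle($documented), PHP_EOL; // Load me first
echo firstTitle($observed), PHP_EOL;   // Title from //h1
```

With the documented order the fingerprint config's rule is tried first; with the observed order the global rule matches before the fingerprint config is ever consulted.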

Tried turning off caching, but that didn’t help. Please let me know if I’ve misconfigured something.

We’ll take a look at this.

Could you please provide an example where the loading order is interfering with the config rules you’re trying to set up?

Using a fresh install of 3.9.11:

  • Add title: 'Load me first' to the top of the standard .medium.com.txt site config
  • This makes 'Load me first' the title of every Medium post that doesn’t require a fingerprint match: https://blog.medium.com/weve-acquired-knowable-to-accelerate-our-growth-and-bring-audio-to-medium-2b7898e06891
  • But it doesn’t work for those that do: https://levelup.gitconnected.com/automate-whatsapp-messages-with-python-instantly-3e08d6600612

This change to line 134 of ContentExtractor.php shows that the global rules are being loaded/cached before FTR has retrieved the HTML needed for fingerprint matching. It delays loading the global config until we have HTML, which “fixes” the fingerprint issue above.

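		// load global config?
		// Added `&& $html != ""`: defer loading the global config until HTML
		// is available, so fingerprint configs get merged in first.
		if ($config->autodetect_on_failure() && $html != "") {
			// ...
		}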

This is obviously just a quick hack rather than a proper solution, since it prevents global rules from being used for headers/single page links.

Hope this helps!