Project Lighthouse

Hash matching is the foundation of CSAM detection. It allows platforms to identify known child sexual abuse material without any human ever viewing it.

A perceptual hash is a digital fingerprint derived from the visual structure of an image or video - its gradients, edges, and luminance patterns. When CSAM is confirmed by an analyst at an organization like NCMEC, C3P, or the IWF, a perceptual hash is generated and added to a database. Your platform generates a hash for every image or video a user uploads and compares it against that database. If the upload's hash matches an entry, which in practice means it falls within a small distance threshold of that entry, the content is known CSAM.
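The comparison step can be sketched roughly as follows. This is a minimal illustration, not how any particular tool works: it assumes hashes are plain fixed-width integers compared by Hamming distance, and the names `hamming_distance`, `find_match`, and `max_distance` are hypothetical. Production tools such as PhotoDNA use their own hash formats and distance metrics.

```python
from typing import Iterable, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two equal-width hashes differ."""
    return bin(a ^ b).count("1")


def find_match(upload_hash: int, known_hashes: Iterable[int],
               max_distance: int = 4) -> Optional[int]:
    """Return the closest known hash within max_distance, or None.

    max_distance is an assumed threshold chosen for illustration only.
    """
    best = None
    best_distance = max_distance + 1
    for known in known_hashes:
        distance = hamming_distance(upload_hash, known)
        if distance < best_distance:
            best, best_distance = known, distance
    return best


# Hypothetical 16-bit hashes, for illustration only.
known_hashes = {0b1111000011110000, 0b0000111100001111}
near_copy = 0b1111000011110001   # one bit differs from the first entry
unrelated = 0b1010101010101010   # eight bits differ from both entries

print(find_match(near_copy, known_hashes) is not None)  # match found
print(find_match(unrelated, known_hashes) is None)      # no match
```

A linear scan like this is only workable for small lists; a database with millions of entries would need an index built for nearest-neighbor lookup, which is one reason most platforms rely on a hosted service.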

Unlike a traditional checksum, which changes completely when a single byte changes, a perceptual hash survives common modifications. A resized, cropped, recompressed, or color-shifted copy of a known image will usually still match, as long as the changes are not drastic. This is what makes perceptual hashing essential: offenders routinely modify images to evade detection, and perceptual hashing is the primary defense against that.
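The difference from a checksum can be seen with a toy example. The sketch below uses a simple "average hash" (one bit per pixel, set when the pixel is brighter than the image mean) on a tiny grayscale grid. This is a deliberately simplified stand-in chosen for illustration, not the algorithm used by PhotoDNA or any of the tools described here, and the function names and pixel values are made up.

```python
import hashlib


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def checksum(pixels):
    """Traditional checksum (SHA-256) over the raw pixel bytes."""
    flat = bytes(value for row in pixels for value in row)
    return hashlib.sha256(flat).hexdigest()


# A 4x4 grayscale "image": dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]

# A modified copy: every pixel brightened by 20.
brightened = [[min(255, value + 20) for value in row] for row in original]

print(average_hash(original) == average_hash(brightened))  # same fingerprint
print(checksum(original) == checksum(brightened))          # different checksum
```

The brightened copy keeps the same dark-left, bright-right structure, so the toy perceptual hash is unchanged, while the SHA-256 checksum is entirely different.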

Why it matters

Hash matching is the only detection method that is fully automated and still produces near-zero false positives. The hash databases maintained by NCMEC, C3P, and the IWF contain only material that has been reviewed and confirmed by trained analysts, typically with independent confirmation from two or three people. A match against these databases means the content is known CSAM. Its classification is not in question.

Your platform can detect and remove CSAM at upload time with no human review required for the initial detection. Your legal exposure is minimal - acting on a confirmed hash match is always defensible.

Hash matching does not detect new, previously unseen CSAM. That requires classifier-based detection, which is effectively a non-starter for most platforms right now - the models are expensive to build, difficult to evaluate without access to illegal material, and carry significant false positive risk. Only the largest platforms have the resources and legal cover to operate classifiers in production. The majority of CSAM in circulation is redistributed known material, and hash matching alone will catch the bulk of it.

Available tools

Project Arachnid is the place to start. Operated by the Canadian Centre for Child Protection (C3P), it provides a free API that checks submitted images and videos against C3P's hash database. C3P is the easiest organization to work with and the fastest to get going. Arachnid's database is built from C3P's own web crawling and from contributions by a network of 18 hotlines and child protection organizations across 17 countries. The database contains material that may not be in NCMEC's or the IWF's databases. If you do nothing else, start here.

PhotoDNA is the most widely deployed perceptual hashing tool for CSAM detection, developed by Microsoft and donated to NCMEC. It covers images only - not video. Microsoft offers a hosted version, the PhotoDNA Cloud Service, which is free for qualifying platforms. Platforms can also self-host the PhotoDNA library, though access to the self-hosted version is harder to obtain and requires you to source your own hash database. For most platforms, the hosted API is the right choice.

IWF Image Intercept is the newest entrant, funded by the UK Home Office and aimed primarily at smaller platforms and startups. It checks uploads against the IWF's hash database of approximately 2.9 million hashes. It's designed to be simple to integrate - a code plugin rather than a full API integration. Access is currently limited to eligible smaller businesses. The IWF also licenses its hash list separately to members.

Use all of them. Each organization maintains its own database built from its own sources. NCMEC's is the largest. C3P's includes material found by Arachnid's web crawler that may never have been reported to NCMEC. The IWF's includes hashes from the UK Home Office's Child Abuse Image Database, which contains material from police investigations that may never have appeared online. A hash in one database may not be in another. Running all three maximizes your detection coverage, and the cost of running multiple tools is negligible.
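To make the coverage argument concrete, here is a minimal sketch of checking one upload against several sources and recording which ones matched. Everything in it is hypothetical: the `Checker` type, `check_all`, and the stub sources stand in for real service calls, and the stubs use exact byte equality purely so the example runs without any external service.

```python
from typing import Callable, Dict, List, Set

# A checker takes the raw upload bytes and returns True on a match.
Checker = Callable[[bytes], bool]


def check_all(upload: bytes, checkers: Dict[str, Checker]) -> List[str]:
    """Run every checker and return the names of the sources that matched."""
    return [name for name, check in checkers.items() if check(upload)]


def make_stub(known: Set[bytes]) -> Checker:
    """Build a stand-in checker backed by an in-memory set (illustration only)."""
    return lambda upload: upload in known


# Three hypothetical sources with partially overlapping contents.
checkers = {
    "source_a": make_stub({b"item-1", b"item-2"}),
    "source_b": make_stub({b"item-2", b"item-3"}),
    "source_c": make_stub({b"item-4"}),
}

print(check_all(b"item-2", checkers))  # found by two sources
print(check_all(b"item-3", checkers))  # found by only one source
print(check_all(b"item-9", checkers))  # found by none
```

An upload that only one source recognizes, like `item-3` here, would be missed entirely by a platform that queried just the other two.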

Lighthouse connects to all three databases and handles the full flow - hash matching at upload, match verification, and reporting to NCMEC - through a single integration. See Integrating Lighthouse for setup.
