Duplication Detector

Duplication Detector, created for Wikipedia:Copyright problems on the English Wikipedia, is a tool used to compare any two web pages to identify text which has been copied from one to the other. Either, neither, or both pages may be current or old revisions of a Wikipedia article.

Please supply the URLs of the two web pages to compare; the advanced version also lets you upload either document from your computer. The tool supports text, HTML, and PDF documents. For other types of documents, check Google's cache for an HTML version by doing a Google search for "cache:URL". To make the tool run faster on very large documents, increase the minimum number of words to 3. For source documents containing scattered numerals, you may have to check "Remove numbers" to get the best matches.
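To give a feel for what the "Minimum number of words", "Minimum number of characters", and "Remove numbers" options control, here is a minimal sketch in Python. It is not the tool's actual PHP code: the function names, the tokenization, the default values, and the word-level longest-common-substring approach are all assumptions chosen for illustration.

```python
import re

def tokenize(text, remove_numbers=False):
    """Lowercase the text and split it into words, optionally dropping digits."""
    if remove_numbers:
        # Assumed behaviour of "Remove numbers": strip digit runs before matching.
        text = re.sub(r"\d+", " ", text)
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_phrases(doc1, doc2, min_words=2, min_chars=0, remove_numbers=False):
    """Return maximal word sequences found in both documents, longest first.

    Hypothetical helper: min_words and min_chars mirror the form fields,
    but the defaults here are illustrative, not the tool's real defaults.
    """
    a = tokenize(doc1, remove_numbers)
    b = tokenize(doc2, remove_numbers)
    matches = set()
    # prev[j] = length of the common word run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # The run is maximal if either text ends or the next words differ.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and cur[j] >= min_words:
                    phrase = " ".join(a[i - cur[j]:i])
                    if len(phrase) >= min_chars:
                        matches.add(phrase)
        prev = cur
    return sorted(matches, key=lambda p: (-len(p.split()), p))

if __name__ == "__main__":
    source = "The city 1 was founded 2 in spring"
    article = "The city was founded in spring"
    print(shared_phrases(source, article))
    print(shared_phrases(source, article, remove_numbers=True))
```

In this sketch, scattered numerals (such as footnote markers) break one long match into several short ones, which is why removing numbers can give better matches. Raising the minimum number of words discards short matches, and a real implementation could also use that threshold to prune work on very large documents.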

Duplication Detector can see article text hidden by templates like {{copyvio}}, since the text is still in the HTML page source, but it cannot see text that has been removed from the article. In that case, use the URL of an old revision that still contains the text.
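An old revision of a Wikipedia article has a permanent URL of the form index.php?title=...&oldid=..., which you can copy from the article's history page. As a small sketch, the Python helper below builds such a URL; the function name and the example title and revision ID are hypothetical.

```python
from urllib.parse import urlencode

def old_revision_url(title, oldid, host="en.wikipedia.org"):
    """Build the permanent URL of one revision of a wiki article.

    Hypothetical helper: `oldid` is the revision ID shown in the
    article's history; spaces in the title become underscores.
    """
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"https://{host}/w/index.php?{query}"

if __name__ == "__main__":
    # Placeholder title and revision ID, for illustration only.
    print(old_revision_url("Example article", 123456))
```

The resulting URL can be pasted into either document field.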

Simple version (generates pages that can be linked to):

Document 1 (URL):
Document 2 (URL):

Minimum number of words:
Minimum number of characters:
Remove quotations:
Remove numbers:


Advanced version (allows uploads):


Document 1 (URL):
(or) Document 1 (Upload):

Document 2 (URL):
(or) Document 2 (Upload):

Minimum number of words:
Minimum number of characters:
Remove quotations:
Remove numbers:


Things to do in the future:

If you have any questions about Duplication Detector, please contact its author Derrick Coetzee at his talk page on English Wikipedia.

The PHP source for Duplication Detector is available under the Simplified BSD License. It does not require Toolserver to run, so feel free to download it and run it yourself on your own web server or with the PHP command-line tool. Downloads: (.tar.gz) (.zip). The latest version is available from GitHub.