Have you ever read anything online and thought “that looks familiar” only to find it was lifted directly from your website? That’s exactly what happened to me recently, so I asked around to see what tools were out there to identify plagiarised copy and the potential impact on search results.
I am a fairly prolific publisher of online content. As well as my day job writing corporate copy for clients, I post regularly here and publish something new at least once a week on my football culture blog, Outside Write. In the nearly two decades I've been producing online content, I've only twice had to pull someone up for using my work – photography, in both cases – and it was pure luck that I spotted them, not a dedicated search.
But then recently a friend of mine who had written a guest post messaged me: a website had lifted an entire paragraph of his post on my site, word for word, into its own copy. Worse still, it was being used to sell a product.
Luckily, I knew the site's editor from the circuit, so I flagged it privately by email; he apologised and promised to raise it with his writer, and the copy was changed that day. Not long after, I read about a whole book on GDPR that its publisher had pulled after complaints of basic errors and alleged plagiarism. As someone who's written around 100,000 words on GDPR for clients over the last two years, I found these incidents got me thinking about plagiarism and how to deal with it.
Does plagiarism impact search engine results?
Before looking into plagiarism detection tools, I wanted to understand the potential impact any existing copy lifting may have on my websites. Outside Write in particular relies heavily on search traffic – nearly three-quarters (73%) comes via search engines due to the evergreen nature of its content.
I asked Scott Colenutt of Brighton-based consultancy Site Visibility for his take. Colenutt told me that it’s in Google’s best interests to deal with plagiarism as best it can algorithmically to keep its search results relevant and trusted by users. However, he added, it is still possible for pages with plagiarised content to outrank the original content in search results.
Colenutt recommends contacting the website administrator and asking for the duplicated content to be removed, much as I had done in my case. If you don't know the website admins, tools like hunter.io and voilanorbert.com can help you find the right email address to contact.
“As a further measure to help protect your website from plagiarism, you can add something called a self-referencing canonical tag to each page,” Colenutt explains. “Though this won’t stop manual plagiarism, it can combat the effectiveness of automatic content scrapers, as the canonical tag is published along with the offending content and will help Google understand that you are the original source.”
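To make that suggestion concrete, here's a minimal sketch in Python of how you might verify that a page declares a self-referencing canonical tag. It assumes the requests and beautifulsoup4 libraries are installed, and the URL is a placeholder for one of your own pages.

```python
import requests
from bs4 import BeautifulSoup

def has_self_canonical(url):
    """Check whether the page at `url` names itself as the canonical source."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("link", rel="canonical")
    if tag is None:
        return False
    # Ignore trailing-slash differences, which many CMSs introduce.
    return tag.get("href", "").rstrip("/") == url.rstrip("/")

# Placeholder URL - substitute a page from your own site.
print(has_self_canonical("https://www.example.com/my-article/"))
```

If this prints False for your pages, it's worth checking whether your CMS or SEO plugin has an option to emit canonical tags automatically.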
Plagiarism detection tools
In the course of my research, I found that copyright applies from the point of publication; if you feel another site has pinched your copy, you can use the Internet Archive to prove that yours came first.
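The Internet Archive also has a simple availability API for its Wayback Machine, so you can check when a page was first captured without clicking around the site. Here's a rough sketch in Python: asking for the snapshot closest to a very early date effectively returns the oldest capture the Archive holds (the URL is a placeholder).

```python
import requests

def earliest_snapshot(url):
    """Ask the Wayback Machine for the capture closest to an early date,
    which in practice is the oldest snapshot it holds."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": "19960101"},
        timeout=10,
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    if closest:
        return closest["timestamp"], closest["url"]
    return None  # the page has never been captured

# Placeholder URL - substitute the page you want to date.
print(earliest_snapshot("https://www.example.com/my-article/"))
```

Bear in mind the Archive doesn't crawl everything, so the oldest snapshot may still post-date your actual publication.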
I experimented with a number of tools recommended by my community to see if I could find any further instances of plagiarism. The easiest one I tested was Copyscape, which lets you enter a URL and searches the web for matching content. The catch is that you have to enter each URL individually.
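Since each URL has to be checked one at a time, it helps to have a complete list of your pages to work through. Most CMSs publish a sitemap; this sketch pulls every URL out of a standard sitemap.xml (the sitemap location is an assumption, as some sites nest their sitemaps or keep them elsewhere).

```python
import requests
from xml.etree import ElementTree

def sitemap_urls(sitemap_url):
    """Return every <loc> entry from a standard sitemap.xml."""
    xml = requests.get(sitemap_url, timeout=10).content
    root = ElementTree.fromstring(xml)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

# Placeholder - many sites serve their sitemap at /sitemap.xml.
for url in sitemap_urls("https://www.example.com/sitemap.xml"):
    print(url)
```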
Copyscape also offers banners to place on your site to ward off would-be copycats, as well as a companion service called Copysentry that, for $4.95 per month, emails you alerts whenever copies of your content appear online.
Other plagiarism tools I looked at included:
The only issue with these is that they require you to copy and paste chunks of text or upload documents. Many online plagiarism detection tools are designed for academics and students, so I find they tend to be very document-specific.
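If you do end up using one of these paste-in tools, a short script can at least pull a page's readable text out for you. A rough sketch, again assuming requests and beautifulsoup4; the crude tag-stripping here won't be as clean as a proper article extractor.

```python
import requests
from bs4 import BeautifulSoup

def page_text(url):
    """Extract a page's visible text for pasting into a document-based checker."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    # Remove script and style content, which get_text() would otherwise include.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

# Placeholder URL - substitute the page whose text you want to check.
print(page_text("https://www.example.com/my-article/"))
```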
I need your suggestions – what would you recommend as a tool that searches your content via a URL rather than by copying and pasting chunks of text? Please leave a comment below or ping me on Twitter.