The piracy of educational books: a dilemma that calls for awareness and protective measures

We at Smart Publishing Protection provide protection for 27,000 online publications, of which 4,000 correspond to textbooks and university-level academic materials.

Ninety-five percent (95%) of the websites that advertise books for download do not actually host the files

We estimate that there are some 15,000 illegal book downloads around the world each minute. Although the piracy affecting the publishing industry does not receive as much attention in the media as do movies or music, the figures are alarming. It has been estimated that 24% of users of online books engage in piracy; that is, practically a quarter of all readers download books illegally.
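To put the per-minute estimate in perspective, the short Python sketch below scales it to longer periods. It is plain arithmetic on the 15,000 figure quoted above and assumes, for illustration only, a constant rate and a 365-day year; the function name is hypothetical.

```python
DOWNLOADS_PER_MINUTE = 15_000  # estimate quoted in the article


def scale_downloads(per_minute: int) -> dict:
    """Scale a per-minute rate to hourly, daily and yearly totals.

    Assumes the rate is constant, which is a simplification.
    """
    per_hour = per_minute * 60
    per_day = per_hour * 24
    per_year = per_day * 365
    return {"hour": per_hour, "day": per_day, "year": per_year}


totals = scale_downloads(DOWNLOADS_PER_MINUTE)
for period, count in totals.items():
    print(f"per {period}: {count:,}")
# per hour: 900,000
# per day: 21,600,000
# per year: 7,884,000,000
```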

The illegal downloads affecting the publishing industry are unevenly distributed across different types of publications. For example, in the textbook and university publishing segment, piracy is considerably less common than it is for fiction and other literary works. However, in the case of educational publications, the way in which the materials are illegally hosted and distributed online is more sophisticated.

Of the 27,000-plus works registered on our Smart Publishing Protection technology platform, almost 15% of the books and audiobooks among them are academic publications.

Another variable that one must take into account is the title’s date of publication. According to data from the 2018 Piracy Observatory conducted by GFK, the greatest portion of illegal downloads (42%) corresponds to publications released less than a year prior. Thirty-four percent (34%) of the total of illicit accesses corresponds to titles released between one and three years prior to the download, while 24% are publications released more than 3 years prior. In short, a novel published during the last year is more likely to be pirated than are educational titles released in 2015.

Although the level of piracy of educational books is lower than that of fiction, the dilemma that it represents for educational publishers is no less pressing: the investment that textbook and university-level materials require is much greater, so the impact of piracy on ROI is more severe.

Counterfeit websites and false advertising can be just as harmful, but cannot be legally acted upon

With 15,000 illegal downloads happening every minute, it’s hard to think of anything that could be more damaging to the health of the publishing industry. However, now imagine a reader trying to illegally download the latest title by a favorite author. Typing the name of the book into Google search returns dozens of links related to that title. The reader will click on the top-ranked URLs and will most likely browse dozens of counterfeit sites (so-called “fake websites”), clicking on buttons or fake links and being shown all kinds of advertising in pursuit of a file that, in the end, does not exist on those sites. This tactic is referred to as false piracy; its level of activity surpasses even that of book piracy, and it is similarly harmful.

A scant 5% of websites that claim to list a book actually host the file for download or online reading. The remaining 95% are fake websites, which only contain advertising and profit by baiting users to click on the links. At Smart Publishing Protection, we consider such websites to be equally illegal and we report them to Google so that they do not appear in search results.

In this manner, we act to better position the brand image and products of our client publishers. Such counterfeit sites entice users with titles that they do not in fact host, so one cannot act against them from a legal standpoint. However, Smart Publishing Protection, thanks to its TCRP agreement with Google, can act to delist these URLs in the search engine, as they can be reported for employing deceitful advertising, promising content they do not have. Via such delisting, the problem can be solved in a matter of hours.

How does one go about fighting book piracy? By simulating user behavior using machine learning technology.

Smart Publishing Protection’s technology is based on semantic searches: we seek and identify links to sites whose content contains the name of the work we wish to protect together with options to download or view it online. To automate these searches, we adopt the would-be downloaders’ own perspective: how would they conduct a search for it? Or even more importantly: how would they try to find it as quickly as possible? We employ a large number of variables that we then study, configure and reconfigure as we learn from the data. For example, is the title of the work long or short? Is it well-known or relatively unknown? Could the title lead to ambiguous search results? Is there a website with that name? Our system takes into account every parameter that an illegal site could use to pass itself off as legitimate.
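The idea of searching the way a would-be downloader would can be illustrated with a minimal Python sketch. It is an assumption-laden example, not the platform's actual implementation: the keyword list, the two-word threshold for a "short" title and the function name build_queries are all hypothetical.

```python
from typing import List, Optional

# Hypothetical download-intent keywords; the real vocabulary is not described in the article.
INTENT_TERMS = ["download", "pdf", "epub", "free", "read online"]


def build_queries(title: str, author: Optional[str] = None) -> List[str]:
    """Return search queries a would-be downloader might plausibly type."""
    title = " ".join(title.split())  # normalize whitespace
    quoted = f'"{title}"'
    # Assumption: titles of one or two words are ambiguous, so they are
    # always quoted and paired with the author when one is known.
    is_short = len(title.split()) <= 2
    if is_short:
        bases = [f"{quoted} {author}"] if author else [quoted]
    else:
        bases = [title, quoted]
        if author:
            bases.append(f"{quoted} {author}")
    queries = [f"{base} {term}" for base in bases for term in INTENT_TERMS]
    return list(dict.fromkeys(queries))  # de-duplicate while keeping order


for query in build_queries("Emma", "Jane Austen"):
    print(query)
```

A short, ambiguous title produces fewer but more tightly constrained queries, while a long title is also searched unquoted, since it is unlikely to collide with unrelated results.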

To enhance its efficacy over time, our algorithm is continuously learning and adapts to the means of access under scrutiny. Most of the results we capture are indexed by Google search, whose listings return a large number of websites dedicated to false piracy. As a trusted Google partner, we have achieved a delisting rate of 100% across the more than 11 million URLs located so far.

One of the ways in which we control the spread of piracy is through the use of blacklists and whitelists provided by customers, or those generated by the system itself, but our efforts are not limited to this technique by any means. Many instances of piracy occur on social networks, on which we deploy similar algorithms tailored for each specific platform. We at Smart Protection also have agreements in place with social networks for the immediate takedown of illegal posts.
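As a rough illustration of how blacklists and whitelists can be applied, the Python sketch below classifies a URL by its domain. The domain names, the three labels and the function names are hypothetical, and matching is deliberately limited to exact hostnames (after dropping a leading "www."); it is a sketch under those assumptions rather than a description of the production system.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(url: str, whitelist: set, blacklist: set) -> str:
    """Label a URL using customer- or system-provided domain lists."""
    domain = domain_of(url)
    if domain in whitelist:
        return "legitimate"  # authorized seller or the publisher's own site
    if domain in blacklist:
        return "report"      # known offender: report for removal
    return "review"          # unknown domain: needs further checks


# Hypothetical lists using reserved example domains.
whitelist = {"publisher.example"}
blacklist = {"pirate.example"}

print(classify("https://www.publisher.example/books/1", whitelist, blacklist))
print(classify("http://pirate.example/dl/1", whitelist, blacklist))
print(classify("https://unknown.example/page", whitelist, blacklist))
```

Checking the whitelist first means an authorized domain is never reported by mistake, even if it was also added to a blacklist in error.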

On video hosting platforms, such as YouTube, Vimeo or Dailymotion, it is common to find fake videos with links to illegal websites in the content descriptions themselves. We act on this type of piracy immediately, seeking removal of the offending material, storing the illegal domain data and capturing the download files, which we also report for removal.
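Pulling links out of video descriptions and recording the domains they point to might look like the following Python sketch. It assumes the description text has already been obtained by some other means; the regular expression is a simple approximation of a URL, and the function names and sample descriptions are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlparse

# Simplified URL pattern: http(s) followed by non-space, non-quote characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(description: str) -> List[str]:
    """Return the distinct links found in a description, in order of appearance."""
    links = [m.rstrip(".,;:!?)") for m in URL_PATTERN.findall(description)]
    return list(dict.fromkeys(links))


def domain_counts(descriptions: Iterable[str]) -> Counter:
    """Count how often each linked domain appears across descriptions."""
    counts = Counter()
    for text in descriptions:
        for link in extract_links(text):
            host = (urlparse(link).hostname or "").lower()
            if host:
                counts[host] += 1
    return counts


# Hypothetical descriptions using reserved example domains.
descriptions = [
    "Full book here: http://pirate.example/dl/1.",
    "Mirror http://pirate.example/dl/2 and http://other.example/x",
]
print(domain_counts(descriptions).most_common())
```

Domains that recur across many descriptions are natural candidates for a blacklist.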

How do we approach websites that host both legal and illegal content? By deploying an expert team with analysts specialized in each outlet.

On some platforms, such as the Blogspot blogging platform, downloads are offered more directly, and the links can even be located within completely legal blogs. For example, a Blogspot reader could find an illegal download link within a blog devoted to literary criticism. These cases are similar to those found on the Scribd domain, which may contain offers for both authorized and unauthorized content.

When we discover results on websites that mix legal and illegal content, we set up a filter and conduct a more rigorous check, in this case performed personally by our analysts. In this manner, we are training the system so that over the long term it can learn to differentiate where it should or should not capture information.
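The combination of a filter, a manual check and a system that learns from the outcome can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name, its fields and the idea of storing analyst verdicts as labeled examples are hypothetical, and no actual learning step is shown.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ReviewPipeline:
    """Route results from mixed-content domains to analysts and keep their verdicts."""

    mixed_domains: Set[str]
    analyst_queue: List[str] = field(default_factory=list)
    labeled: List[Tuple[str, bool]] = field(default_factory=list)

    def submit(self, url: str, domain: str) -> str:
        """Queue the URL for manual review if its domain hosts mixed content."""
        if domain in self.mixed_domains:
            self.analyst_queue.append(url)
            return "queued"
        return "automatic"

    def record_verdict(self, url: str, infringing: bool) -> None:
        """Store an analyst's decision as a labeled example for later training.

        Raises ValueError if the URL was never queued.
        """
        self.analyst_queue.remove(url)
        self.labeled.append((url, infringing))


# Hypothetical domains and URLs using reserved example names.
pipeline = ReviewPipeline(mixed_domains={"blogs.example", "docs.example"})
print(pipeline.submit("https://blogs.example/review-of-novel", "blogs.example"))
print(pipeline.submit("http://pirate.example/dl/1", "pirate.example"))
pipeline.record_verdict("https://blogs.example/review-of-novel", infringing=False)
print(pipeline.labeled)
```

The labeled pairs are the kind of data a classifier could later be trained on, so that fewer results need a manual check over time.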

Lastly, it is important to bear in mind the copyright regulation approved by the European Parliament in February, which creates a legislative framework between major content platforms (such as Google or social networks) and copyright owners to facilitate the removal of content that does not represent legitimate use. In addition, it was determined that social media platforms will be obliged to share profits whenever an author legitimately claims any content as their own; Smart Publishing Protection is currently developing functionality to support this.

Greater awareness of piracy affecting academic materials is critically important

It is obvious that the editing and production of educational books and audiobooks, in any format, represent a significant investment, not only of economic resources, but also of creative effort. From the very moment of creation onward, it is therefore necessary to consider and deploy mechanisms for copyright protection.

In this regard, academic and textbook publishing is affected not only by the forms of piracy described above. At times, through lack of familiarity with the law or through intentional fraud, many uses greatly exceed the limits of legitimate use for citations and reviews for educational purposes, with the result that such academic works are “shared” massively without oversight. In light of this, it becomes everyone’s job to promote the idea in society that educational books, like works of fiction, are also subject to copyright.

Do you want to know if your editorial content is being pirated online?

Your publications could be at risk from digital piracy. Request your free Content Scan today…