Crisis and Challenge of Algorithm-led Digital Content Control System – Copyright Risk of ChatGPT Content

Introduction

The way we think about AI is changing along with the AI-led restructuring of different fields. Restructuring is the process by which long-established systems are broken down and take on new forms of organization. In recent years, the development of AI has brought about a great many new things. The list goes on, from AI systems defeating top human players at chess to Stable Diffusion’s leap forward in image generation in 2022. What we are seeing is a shift in which AI increasingly drives anything that is digital or data-based. Our expectations of the future are being technologically reproduced by artificial intelligence. As Crawford points out, AI is not an all-around, unbiased, or objective computing method that makes decisions on its own without human guidance. Its systems are shaped by the people, institutions, and imperatives that determine how it works, and they are enmeshed in social, political, cultural, and economic realms (Crawford, 2021).

ChatGPT is the best-known AI tool of recent times. Millions of people around the world have discovered that ChatGPT can answer users’ questions and perform the specific tasks they request, opening new prospects for the application of artificial intelligence. However, we also need to consider the legal and ethical implications of the content ChatGPT generates. That content raises issues of plagiarism, copyright, freedom of speech, and personal privacy (Panagopoulou et al., 2023). ChatGPT also brings new moral and ethical problems, such as generated disinformation, that demand our attention.

Image Source: https://www.thefountaininstitute.com/blog/chat-gpt-ethics

Digital Content and ChatGPT Mechanism

ChatGPT is a transformer-based language model that is pre-trained on large text corpora and then fine-tuned on specialized data (Ouyang et al., 2022). This enables it to generate coherent, context-aware text and to perform a variety of tasks such as text completion, text generation, and conversational AI (Wang et al., 2023). The basic principle of ChatGPT is to generate human-like responses given an input prompt or a conversation history (Harry, 2023). It does this by drawing on large-scale pre-training over text corpora collected from the Internet. ChatGPT appears to have information from across the Web, but at the time of writing its knowledge mainly covers 2021 and earlier, with limited information available after that date.

To generate responses in a conversational context, ChatGPT relies on the prompt it is given; the practice of wording that prompt well is known as “prompt engineering”. The input to the model includes the user’s message or prompt as well as the earlier turns of the conversation, including the model’s previous responses. This allows the model to generate contextually relevant responses conditioned on the entire conversation (Hillary, 2023). During fine-tuning, ChatGPT is trained on specific datasets and carefully curated examples to make it more useful and safer. OpenAI uses reinforcement learning from human feedback (RLHF) to adjust the model (Figure 1): human AI trainers compare and rank alternative responses generated by the model. This process helps align the model with human values and is intended to make its responses more relevant and appropriate. Thus, ChatGPT is able to use natural language processing to hold human-like conversations and exchange ideas.
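The idea that the model’s input combines the new message with the conversation history can be illustrated with a minimal sketch. The `Turn` class, the `build_model_input` function, the role labels, and the plain-text layout are all hypothetical and chosen for illustration only; they do not describe OpenAI’s actual input format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    role: str     # "user" or "assistant" (hypothetical labels)
    content: str  # text of that turn


def build_model_input(history: List[Turn], new_message: str, max_turns: int = 6) -> str:
    """Join the most recent turns and the new user message into one prompt string."""
    # Keep only the last `max_turns` turns so the input stays bounded.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = [f"{turn.role}: {turn.content}" for turn in recent]
    lines.append(f"user: {new_message}")
    lines.append("assistant:")  # the model would continue the text from here
    return "\n".join(lines)


history = [
    Turn("user", "What is copyright?"),
    Turn("assistant", "Copyright protects original works of authorship."),
]
print(build_model_input(history, "Does it also protect facts?"))
```

Because the earlier turns are part of the input, a follow-up such as “Does it also protect facts?” can be resolved against the preceding exchange, which is what conditioning on the entire conversation means in practice.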

Figure 1. Visualizing the development process of ChatGPT

Image Source: https://huyenchip.com/2023/05/02/rlhf.html
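The ranking step of RLHF can be sketched in a few lines. In the approach described by Ouyang et al. (2022), a human ranking of several responses is broken into pairwise comparisons, and a reward model is trained so that the preferred response in each pair scores higher. The sketch below assumes toy scalar scores in place of a learned reward model; the function names and the example scores are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(r_preferred - r_rejected)), computed in a numerically stable way."""
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def comparisons_from_ranking(ranked: List[str]) -> List[Tuple[str, str]]:
    """Turn a best-first ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked, 2))


def mean_ranking_loss(ranked: List[str], scores: Dict[str, float]) -> float:
    """Average the pairwise loss over every comparison implied by the ranking."""
    pairs = comparisons_from_ranking(ranked)
    total = sum(pairwise_ranking_loss(scores[a], scores[b]) for a, b in pairs)
    return total / len(pairs)


# A trainer ranked three candidate responses from best to worst.
ranking = ["response_a", "response_b", "response_c"]

# Hypothetical reward scores: one set agrees with the trainer, one disagrees.
agreeing = {"response_a": 2.0, "response_b": 0.5, "response_c": -1.0}
disagreeing = {"response_a": -1.0, "response_b": 0.5, "response_c": 2.0}

print(mean_ranking_loss(ranking, agreeing))
print(mean_ranking_loss(ranking, disagreeing))
```

The loss is lower when the scores agree with the human ranking, so minimizing it pushes the reward model toward the trainers’ preferences; that reward model is then used to adjust the language model itself.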

The Crisis to Digital Content

Copyright is a type of intellectual property that protects original works; a work is original when it is created independently by a human author with a minimum of creativity. Copyright law covers many different types of works, including paintings, musical compositions, books, poetry, and blog posts (U.S. Copyright Office, n.d.). Copyright refers to the exclusive right of authors, copyright holders, and/or performers to control their creative works for a certain period of time, and in Malaysia it is governed by the Copyright Act 1987. Any party who uses a copyrighted work without the consent or authorization of the author, copyright owner, and/or performer may be sued for infringement under the Act (Fraiwan & Khasawneh, 2023). However, ChatGPT is trained on a large dataset from the Internet, which includes a large number of texts from websites, articles, books, social media posts, and academic papers. Importantly, ChatGPT does not know the source of the data; it only sees the text and learns the patterns and relationships between words, phrases, and sentences. Most of the text used to train ChatGPT is of course protected by copyright, with the exception of facts and discoveries, which copyright does not protect, and works in the public domain, such as those whose copyright protection has expired (Shawn et al., 2023).

Image Source: https://fernandoamaral.org/chatgpt-copyright/

While answering user questions, ChatGPT may process and store user interactions to improve its performance. If sensitive or personally identifiable information is collected without proper consent or appropriate security measures, the result may be a privacy breach. In some cases, there is also a risk that user data will be misused or shared with third parties without proper consent, or for purposes of which the user is unaware, which amounts to an invasion of privacy (Gutierrez et al., 2022).

Bias in training data, limits on contextual comprehension, complexity and model misjudgment, and missing or incomplete data are some of the reasons why ChatGPT generates erroneous information. When the training data contains bias, inaccuracy, or misinformation, the model learns and repeats it. Because ChatGPT’s comprehension of context is weak, it may fail to give accurate answers to questions that depend on background or context. Complicated issues and unclear circumstances can lead the model to misinterpret the question and generate inaccurate information (Hurst, 2023). Furthermore, missing and incomplete data leave the model without the precise knowledge that particular domains or themes require (Harry, 2023), which results in the generation of misleading information (Hurst, 2023). This may mislead users and negatively affect the public. For example, if ChatGPT answers a medical question with wrong treatment advice, users may act on it in ways that endanger their health.

ChatGPT’s content generation may raise copyright infringement and piracy issues, mainly for the following reasons. First, ChatGPT’s training data usually contains a large amount of text from the Internet, which may include copyrighted works. If ChatGPT uses these copyrighted works without authorization when generating content, it may constitute copyright infringement (Bailey, 2023). Second, since ChatGPT can generate text that resembles human writing styles, it can produce content that is similar or even identical to existing works. This creates an opportunity for piracy, that is, repackaging, republishing, or distributing someone else’s work without authorization from or payment to the original author. This may lead to the widespread dissemination of pirated works and damage the rights and interests of copyright holders (Bailey, 2023). For instance, if content generated by ChatGPT contains copyrighted music, literary works, or other creations without the authorization of the copyright holder, this constitutes copyright infringement. In addition, if text generated by ChatGPT is so similar to an existing work that it amounts to copying or imitation, and it is published or disseminated without the permission of the original author, this is an act of piracy.

Image Source: https://www.theipmatters.com/post/chatgpt-and-copyright-issues
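How “very similar” a generated text is to an existing work can be roughly screened by measuring shared word sequences. The sketch below is one simple heuristic, assuming word trigrams as the unit of comparison; the function names are hypothetical, and a high score is only a signal for human review, not a legal test of infringement.

```python
import re
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Split text into lowercase words and return the set of n-word sequences."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(generated: str, reference: str, n: int = 3) -> float:
    """Share of the generated text's n-grams that also occur in the reference (0.0 to 1.0)."""
    generated_ngrams = word_ngrams(generated, n)
    if not generated_ngrams:
        return 0.0  # text too short to compare
    reference_ngrams = word_ngrams(reference, n)
    return len(generated_ngrams & reference_ngrams) / len(generated_ngrams)


reference_work = "The quick brown fox jumps over the lazy dog"
generated_text = "The quick brown fox sleeps"
print(ngram_overlap(generated_text, reference_work))
```

A score near 1.0 means almost every three-word sequence in the generated text already appears in the reference work, which would suggest copying rather than independent wording.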

The rise of ChatGPT and other forms of generative artificial intelligence has caused writers and artists to worry about copyright violations. Artists and performers have historically earned their living from their creative works and relied on intellectual property rights to protect them. The development of generative artificial intelligence, however, has made copying and imitating original works far easier and more frequent. New content can be created from protected training materials without authorization, which would infringe the original author’s copyright. For instance, ChatGPT can generate content that looks like news stories, novels, or articles, which can lead to copyright violations and other forms of piracy (Crawford, 2021). The income and intellectual property of artists and performers may therefore be significantly affected by this unlawful use and possible copyright violation. The value and originality of the original work will be diminished if generative artificial intelligence can quickly produce content that is identical or closely similar to it. Furthermore, because generative AI can replicate content at scale, vast volumes of material may be produced and shared rapidly, which makes it harder to identify and respond to copyright infringement.

Thus, to guarantee that artists and performers receive just compensation and recognition, action must be taken to safeguard their copyrights (Copyright Basic – The Official Portal of Intellectual Property Corporation, n.d.). This might entail closer oversight of generative AI technologies as well as stronger regulations and policies to ensure that the application of generative AI complies with copyright law. To further safeguard the rights and interests of artists and performers, efficient copyright protection systems and technologies must be established to detect and prevent unlawful use and piracy. As generative AI technology advances, service providers must take on greater accountability, including for content produced and shared by AI, particularly with regard to infringements of intellectual property, discrimination, misinformation, and abuses of personal rights (Flew, 2021).

The need for AI regulation: Associations and unions in the news industry are calling, in an open letter, for greater regulation of generative AI, particularly of the providers of the underlying models. They argue that generative AI should be at the heart of any meaningful AI market regulation to ensure its legal, transparent, and responsible use.
– Buchanan, 2018

The letter also mentions restrictions on underlying model providers that operate central platform services to distribute digital content. This reflects questions about how current digital content distribution platforms operate and suggests a need for stricter regulation of these providers. As it stands, the intellectual property and copyright status of ChatGPT-generated content remains blurry, so it is important to understand the legal implications of the technology. First, can the copyright owners of the text used to train ChatGPT file a copyright infringement claim against OpenAI? Second, can the output of ChatGPT be copyrighted? If so, who owns the copyright?

Maybe We Need to Know More?

Based on the above analysis, existing legal frameworks need to be improved. According to Gutierrez et al. (2022), although the existing AI Act has tried to catch up with the pace of development of AI technologies, the recently introduced rules on general-purpose AI systems (GPAIS) fail to do justice to the characteristics of large artificial intelligence models. In view of this, the relevant rules of the AI Act should clarify the specific types of AI models they govern. For example, since the definition in Article 3(1b) of the AI Act is considered overly inclusive, definitions of AI should be made specific instead of trying to cover all types of AI models.

Regarding ethical frameworks, some policy initiatives should be carried out. For example, before permitting users to access key financial information about companies via ChatGPT, the platform should ask users to fill in a form and confirm that this financial information will be used only as guidance, not as the sole basis for final financial decisions (Fishman, 2016).

In addition, to ensure the accuracy of content produced on the ChatGPT platform, the platform should seek multi-party participation and cooperation. In particular, content generators, copyright holders, users, and governments should work with one another. Specifically, to promote lifelong learning in schools, content generators, users, and governments should collaborate to ensure adequate respect for privacy and fair access to the same types of content via ChatGPT, regardless of regional differences. In the meantime, governments should also ensure that the content included in ChatGPT’s database is non-discriminatory and transparent (Mhlanga, 2023). When discriminatory content is produced and spread via ChatGPT, users and governments should play an active role in reporting and banning it.

References

Aziz, O. S., Samuel, D. R., & Azami, N. A. (2020). Privacy Law in Malaysia. Corporate Communications, Azmi & Associates. https://www.azmilaw.com/insights/privacy-law-in-malaysia/

Bailey, B. (2023). ChatGPT and copyright infringement: What every business should know. Red Points. https://www.redpoints.com/blog/chatgpt-and-copyright/

Buchanan, K. (2018). Malaysia: Anti-Fake News Act Comes into Force. Library of Congress. https://www.loc.gov/item/global-legal-monitor/2018-04-19/malaysia-anti-fake-news-act-comes-into-force/

Copyright Basic – The Official Portal of Intellectual Property Corporation of Malaysia. (n.d.). https://www.myipo.gov.my/en/copyright-basic/

Crawford, K. (2021). The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (1st ed.). Yale University Press, pp. 211–227. https://doi.org/10.2307/j.ctv1ghv45t

Fishman, B.J. (2016). Possible futures for online teacher professional development. In C. Dede, A. Eisenkraft, K. Frumin, & A. Hartley (Eds.), Teacher learning in the digital age. Online professional development in STEM education. USA: Harvard Education Press, 3-31.

Flew, T. (2021). Regulating platforms. Cambridge: Polity Press, pp. 79-86.

Fraiwan, M., & Khasawneh, N. (2023). A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions. arXiv.Org. https://doi.org/10.48550/arxiv.2305.00237

Gutierrez, C.I., Aguirre, A., Uuk, R., Boine, C.C., & Franklin, M.A. (2022). A proposal for a definition of general-purpose artificial intelligence systems. Future of Life Institute – Working Paper. http://ssrn.com/abstract=4238951

Harry, G. (2023). How does ChatGPT work? Here’s the human-written answer for how ChatGPT works. Zapier. https://zapier.com/blog/how-does-chatgpt-work/

Hillary, N. (2023). How to Communicate with ChatGPT – A Guide to Prompt Engineering. Free Code Camp. https://www.freecodecamp.org/news/how-to-communicate-with-ai-tools-prompt-engineering/

Hurst, L. (2023). The rapid growth of ‘news’ sites using AI tools like ChatGPT is driving the spread of misinformation. Euronews. https://www.euronews.com/next/2023/05/02/rapid-growth-of-news-sites-using-ai-tools-like-chatgpt-is-driving-the-spread-of-misinforma

Mhlanga, D. (2023). Open AI in Education, the Responsible and Ethical Use of ChatGPT Towards Lifelong Learning. SSRN Electronic Journal, 2(3), 1-10. https://doi.org/10.1007/978-3-031-37776-1_17

Muftić, F., Kadunić, M., Mušinbegović, A., & Abd Almisreb, A. (2023). Exploring Medical Breakthroughs: A Systematic Review of ChatGPT Applications in Healthcare. Southeast Europe Journal of Soft Computing, 12(1), 13-41.

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv.Org. https://doi.org/10.48550/arxiv.2203.02155

Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121–154. https://doi.org/10.1016/j.iotcps.2023.04.003

Shawn, H., Jason, K., McDermott, W., & Emery. (2023). Copyrights, Professional Perspective – Copyright Chaos: Legal Implications of Generative AI. Bloomberg Law. https://www.bloomberglaw.com/external/document/XDDQ1PNK000000/

The Global Gender Equality Constitutional Database. (n.d.). Federal Constitution of Malaysia 1963, as amended to 2019 – Equality and Non-Discrimination. https://constitutions.unwomen.org/en/countries/asia/malaysia?provisioncategory=b21e8a4f9df246429cf4e8746437e5ac

Wang, F.-Y., Miao, Q., Li, X., Wang, X., & Lin, Y. (2023). What Does ChatGPT Say: The DAO from Algorithmic Intelligence to Linguistic Intelligence. IEEE/CAA Journal of Automatica Sinica, 10(3), 575–579. https://doi.org/10.1109/JAS.2023.123486

Panagopoulou, F., Parpoula, C., & Karpouzis, K. (2023). Legal and ethical considerations regarding the use of ChatGPT in education. arXiv.Org. https://doi.org/10.48550/arxiv.2306.10037

U.S. Copyright Office. (n.d.). What is Copyright? Copyright.gov. https://www.copyright.gov/what-is-copyright/

Images Reference

Chip, H. (2023). RLHF: Reinforcement Learning from Human Feedback. https://huyenchip.com/2023/05/02/rlhf.html

Fernando, A. (2023). Copyright and ChatGPT: Who owns AI-generated content? https://fernandoamaral.org/chatgpt-copyright/

Matt, G. S. (2023). ChatGPT Creator Faces Multiple Lawsuits Over Copyright & Privacy Violations. Search Engine Journal. https://www.searchenginejournal.com/chatgpt-creator-faces-multiple-lawsuits-over-copyright-privacy-violations/490686/

Pete, A. (2023). 5 Ethics Issues for ChatGPT and Design. The Fountain Institute. https://www.thefountaininstitute.com/blog/chat-gpt-ethics

Shreya, S. (2023). ChatGPT and Copyright Issues. IP Matters. https://www.theipmatters.com/post/chatgpt-and-copyright-issues