
The “Spooky” Truth Behind AI: IP and Privacy Concerns in Training Bioscience AI Models



To say artificial intelligence, or “AI,” is a hot topic is an understatement. AI, which traces its roots back to Alan Turing’s work in the 1950s, is far from new, and consumers have been using AI in some form for years. Asking Siri for directions to your favorite coffee shop, the uncanny ability of your Instagram feed to show you posts you’re interested in, and having Alexa play your favorite songs on demand are all examples of AI in action.

But AI has an important role for the greater good, too, as its capabilities are increasingly being harnessed in myriad scientific disciplines, including bioscience. Generally speaking, AI enables machines, like computers, to perform tasks that traditionally were accomplished by humans, permitting repeatability, increasing accuracy, and, of course, greatly increasing the speed of analysis and, as a result, of new developments.

A subset of AI called machine learning (ML) enables computers to learn from data, developing rules directly from established training input and implementing those rules in the ML algorithms via statistical methods. ML enables quick recognition of patterns from large amounts of data, providing findings more reliably (and faster) than through human analysis and prediction. A subset of ML called deep learning (DL) uses a deeply layered neural network, seeking to mimic human handling of information and draw conclusions similar to how a human might.

It should come as no surprise, then, that bioscience is using ML and DL models to take great leaps forward, in everything from image analysis helpful for disease diagnosis, to health data analysis helpful for finding new drug targets. Of course, as with any AI model, these models must be “trained” on datasets to “learn” how to analyze new data. When those datasets include patient data, either in image or personal health information form, there are intellectual property and privacy concerns to consider.

IP Considerations

Intellectual property includes patents, trademarks, and copyrights. Copyrights are the most pertinent IP right when considering what can (or cannot) be done with datasets used to train ML and DL models in the biosciences.

Copyrights cover a surprisingly broad swath of works, including, in some instances, the data in a training set. While ideas cannot be copyrighted, the tangible expression of an idea can. So, the idea of a microscopic image of a cancer cell cannot be copyrighted, but the image taken by a microscopist of the cancer cell can be copyrighted. Note that for copyright, very minimal creativity is required. If an original selection and arrangement of listings in a directory can be copyrighted (and it can, even though a bare alphabetical phone listing cannot), so, too, can the image resulting from the microscopist’s choice of field, contrast, and exposure in imaging a cancer cell. Further, datasets consisting of patient personal health information, such as those reportedly used by COVID researchers in China, can also be copyrighted, at least as to the specific selection and arrangement of the dataset.

The copyright status of training data matters in determining how training data can be used, both in training an ML or DL model, and in downstream use of the model. Copyright is a bundle of rights that include, among other things, the ability to create copies of the work, the ability to create derivative works, and the ability to sell or lease the work. Those rights initially vest with a work’s author, but in many cases are owned by the institution providing the dataset for use in AI model development.

Some bioscience companies avoid potential copyright issues by creating their own training datasets in-house. But for others using datasets obtained from a third party, copyright may be an issue. Biomedical datasets originating from universities or research hospitals, even if copyrightable, may be provided under an open-source license that would permit any copying incidental to their use in training, as well as any inclusion of portions of the dataset in downstream derivative products—but not necessarily. Further, as “big data” continues to grow, for-profit institutions may offer datasets that can only be used for a limited purpose.

When it comes to copyrights, there is no “one size fits all” answer for how a dataset can be used. Ask your lawyer for legal advice.

Privacy Considerations

Bioscience companies may obtain information from hospitals and other “covered entities” through various means. It is important to ascertain what laws may cover this information and for what purposes the bioscience company obtained the information.

Does the Health Insurance Portability and Accountability Act (HIPAA) apply?

This is always the first question when dealing with health information, but it’s not the last. HIPAA applies to specific entities, including a health care provider who transmits health information in electronic form for certain transactions (a “covered entity”), and vendors that assist those entities with a number of varied processing, administrative, management, or analysis tasks (a “business associate”). For example, if a doctor’s office hires a company to analyze patient data, this likely falls under HIPAA and there would be a business associate agreement between the doctor’s office and the company to—among other things—limit the company’s re-use of this data.

Certain information that is shared is eligible for re-use by the business associate. For example, health information can be used for certain purposes if the patient agrees to the use and the business associate agreement permits the use. Another category of data that can be used, and is no longer considered covered by HIPAA, is de-identified information. De-identified information does not identify an individual and cannot be used to identify an individual. If the information can be re-identified, it is not de-identified and falls under HIPAA.
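To make the idea of removing identifiers concrete, here is a minimal Python sketch. It assumes each patient record is a flat dictionary, and every field name in it is hypothetical. It only illustrates stripping obvious direct identifiers before data reaches a training pipeline. Dropping fields this way does not by itself establish that data is de-identified under HIPAA, since the remaining fields may still allow re-identification.

```python
from typing import Any

# Hypothetical field names treated as direct identifiers for this illustration.
# A real dataset needs its own review; this set is an assumption, not a legal list.
DIRECT_IDENTIFIER_FIELDS = frozenset(
    {"name", "email", "phone", "ssn", "medical_record_number", "street_address"}
)


def strip_direct_identifiers(
    record: dict[str, Any],
    identifier_fields: frozenset[str] = DIRECT_IDENTIFIER_FIELDS,
) -> dict[str, Any]:
    """Return a copy of the record without the listed identifier fields.

    The input record is left unchanged.
    """
    return {
        key: value
        for key, value in record.items()
        if key not in identifier_fields
    }


# Example record with made-up values.
patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "C50.9",
    "stain": "H&E",
}

cleaned = strip_direct_identifiers(patient)
print(cleaned)
```

Keeping the identifier list as an explicit parameter means the same function can be reused when a contract or a privacy review calls for a different set of fields.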

If a business associate receives data from a covered entity, it must agree to take certain security measures to protect the data through the business associate agreement. It also must enter into business associate agreements of its own with its vendors that have access to the data.

What if I get health information from an app?

Unless the app is tied to a covered entity, it likely does not fall under HIPAA. However, the Federal Trade Commission (FTC) has been taking a close interest in the Health Breach Notification Rule, which applies to non-HIPAA-covered health information. In the past year, the FTC has taken multiple enforcement actions for a “breach of security” when there was an unauthorized acquisition of identifiable health information. That includes cases where the app company itself disclosed the information to other companies, some of the world's largest international technology firms among them.

Can I use health information to train my product?

It depends. Ask your lawyer for legal advice.

Factors include:

  • Legal obligations. For example, whether the information falls under HIPAA, the Health Breach Notification Rule, any of the comprehensive state privacy laws, Illinois’ Biometric Information Privacy Act, or Washington’s My Health My Data Act. Remember that health information is more than a diagnosis and prescription for treatment; it can also include images and videos.
  • Contractual obligations. What does the contract permit? What disclosures were made to the individuals whose health information is being used? Are there other contractual limitations, such as through separate agreements with government agencies?
  • How closed the training is. For example, is a hospital’s data only training the hospital’s AI or is it being used to train a product that will be shared with other hospitals and non-covered entities?

Key Takeaways When Training AI Models

AI has varied uses that can enhance analysis and development in bioscience. However, since AI models need to be trained on datasets, there are intellectual property and privacy concerns to consider when implementing the training to protect both bioscience companies and the individuals’ information that is being used to train the models.

If you need help determining the IP and privacy risks of using health information to train product lines, contact our intellectual property team or our privacy & data security team.

This article is provided for informational purposes only—it does not constitute legal advice and does not create an attorney-client relationship between the firm and the reader. Readers should consult legal counsel before taking action relating to the subject matter of this article.
