Machine Learning May Soon Be An Alternative To Animal Testing

In biotechnology and chemical research, it is common to test the effect of certain chemicals or materials on animals first, instead of humans. Although these studies are controversial (animals may be harmed or killed in the process), they have proven to be the most effective method in certain areas of research, especially toxicology.

But now, as machine learning changes the technology landscape, researchers in the biosciences are trying novel methods based on ML and related fields to develop an alternative testing technique. If successful in the long run, computers and ML could replace animal studies in chemical safety projects.

Combining Machine Learning And Big Data For RASAR

Professor Thomas Hartung and his team at Johns Hopkins University designed a novel method called read-across structure activity relationships (abbreviated as ‘RASAR’), which builds on the earlier read-across approaches. RASAR combines chemical similarity with supervised learning.

Chemical similarity is determined in two steps: each chemical is first encoded as a binary fingerprint, and the Jaccard distance between fingerprints is then used to measure how similar two chemicals are. According to Hartung and team, ML has a significant impact once chemical similarity has been established.
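As a rough illustration of the similarity measure, the sketch below computes the Jaccard similarity and distance between binary fingerprints. It is not the study's code: the fingerprints are made-up sets of "on" bit positions, and the function names are chosen here for illustration only.

```python
from typing import Set


def jaccard_similarity(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 1.0  # convention used here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical fingerprints (not real chemical data)
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9}
fp_c = {2, 3}

print(jaccard_similarity(fp_a, fp_b))  # 3 shared bits out of 5 distinct bits
print(jaccard_distance(fp_a, fp_c))    # no shared bits, so the distance is maximal
```

A distance of 0 means the two fingerprints are identical, and a distance of 1 means they share no bits at all.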

Supervised learning methods then provide a statistical model of the insights deliverable from chemical similarity. Due to automation, the approach can be applied to large datasets and thus validated according to common statistical methods such as cross-validation. Supervised learning models built on chemical similarity also allow assignment of confidence to individual predictions.

This means that, instead of conducting extensive animal tests, ML could draw on chemical similarities across a large database of chemicals whose toxicity information has already been collected.

To demonstrate this, the researchers built an ML model called ‘Simple RASAR’, which uses logistic regression to predict each chemical's hazard from its similarities to other chemicals. Chemicals are labelled either negative (not hazardous) or positive (hazardous) based on their similarity to chemicals already in the database.
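The idea of predicting a hazard label from similarity values can be sketched with a small logistic regression. This is a minimal sketch, not the study's implementation: it is written in plain Python with gradient descent, and the training data (each row being a chemical's similarity to its closest positive and closest negative analogue) is invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a chemical is hazardous (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: (similarity to nearest positive, similarity to nearest negative)
X = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.1), (0.2, 0.9), (0.3, 0.8), (0.1, 0.7)]
y = [1, 1, 1, 0, 0, 0]  # 1 = hazardous, 0 = not hazardous

w, b = train_logistic(X, y)
print(predict_proba(w, b, (0.9, 0.1)))  # close to a known hazardous analogue
print(predict_proba(w, b, (0.1, 0.9)))  # close to a known non-hazardous analogue
```

Because the output is a probability rather than a bare label, each prediction carries its own confidence value.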

The model was tested against the European Union’s REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) regulation as well as nine standard toxicology methods. Before testing Simple RASAR, the study also evaluated test reproducibility under the OECD guidelines on chemical testing. In addition, the effect of structural analogues was studied.

For RASAR’s database, chemical data was collected from REACH together with PubChem. Over 80,000 chemicals were analysed, which resulted in more than 800,000 chemical labels. These labels were inferred from other standard chemical hazard regulations.

Simple RASAR And Data Fusion RASAR

Before looking at the ML models themselves, note that a RASAR is constructed in two steps:

  • Unsupervised Learning
  • Supervised Learning

In the first step, chemical similarities are established through locality-sensitive hashing. This creates local graphs for all the chemicals, which in turn are used to generate feature vectors through k-nearest neighbours. The second step applies supervised learning (logistic regression in this case) to the output of the first, unsupervised step. This forms the core of Simple RASAR, in which logistic regression acts as the aggregation function. Simple RASAR therefore works with 2D vectors, one component for positive and one for negative chemical analogues, while Data Fusion RASAR is an extension of this model that trains a random forest using the generated vectors. A detailed illustration can be found here.
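The 2D feature vector described above can be sketched as follows. This is a simplified illustration under stated assumptions: a brute-force search over all labelled chemicals stands in for the locality-sensitive hashing and k-nearest-neighbour steps, the two features are taken to be the similarity to the closest positive and the closest negative analogue, and all chemical names and fingerprints are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two fingerprints (sets of 'on' bit positions)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rasar_features(
    query: Set[int], labelled: Dict[str, Tuple[Set[int], int]]
) -> Tuple[float, float]:
    """Return (similarity to nearest positive, similarity to nearest negative).

    Brute force is used here for clarity; it replaces the hashing and
    nearest-neighbour search that the article describes.
    """
    best_pos, best_neg = 0.0, 0.0
    for fingerprint, label in labelled.values():
        sim = jaccard(query, fingerprint)
        if label == 1:
            best_pos = max(best_pos, sim)
        else:
            best_neg = max(best_neg, sim)
    return best_pos, best_neg


# Hypothetical labelled chemicals: name -> (fingerprint, hazard label)
labelled = {
    "chem_a": ({1, 2, 3, 4}, 1),
    "chem_b": ({1, 2, 5}, 1),
    "chem_c": ({6, 7, 8}, 0),
    "chem_d": ({3, 6, 7}, 0),
}

features = rasar_features({1, 2, 3}, labelled)
print(features)  # one number per class of analogue
```

A vector like this, computed for every chemical, is what a downstream classifier such as logistic regression would take as input.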

Model Training And Evaluation

Here, the feature vectors differ between the two models, which is why the supervised learning models differ as well (logistic regression in Simple RASAR and random forests in Data Fusion RASAR). The study used the spark.ml library package to train these models. After more than 300,000 training iterations, the models were evaluated through five-fold cross-validation.
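Five-fold cross-validation can be sketched in a few lines. The example below is plain Python rather than spark.ml, and the toy threshold classifier and data are assumptions made purely to keep the sketch self-contained; only the fold logic reflects the evaluation scheme named above.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Toy stand-in model (hypothetical): threshold halfway between the class means
def fit(X_train, y_train):
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


X = list(range(20))                    # made-up one-dimensional feature
y = [1 if x >= 10 else 0 for x in X]   # made-up labels

scores = cross_validate(X, y, fit, predict)
print(scores)                     # one accuracy value per fold
print(sum(scores) / len(scores))  # mean cross-validated accuracy
```

Each sample is held out exactly once, so the mean score estimates how the model behaves on chemicals it has not seen during training.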

Conclusion

The study reports three main results. The first is test reproducibility with respect to the OECD guidelines, while the second and third are the performance of the Simple RASAR and Data Fusion RASAR models respectively. Reproducibility was high in terms of both accuracy and chemical specificity (more than 90 percent). This means it is on par with animal tests.

Similarly, Simple RASAR and Data Fusion RASAR achieve a cross-validated accuracy in the range of 80-95 percent. Taken together, this suggests that ML is close to being able to predict chemical hazards by comparing the properties of a large collection of dangerous chemicals.

This new study is an example of how ML could make animal tests redundant, saving both cost and time. However, there is still a lot to achieve before it becomes a full-fledged method.

Abhishek Sharma

I research and cover latest happenings in data science. My fervent interests are in latest technology and humor/comedy (an odd combination!). When I'm not busy reading on these subjects, you'll find me watching movies or playing badminton.