How Machine Learning Is Revolutionising The Study Of Galaxies With Image Classification

There are billions of galaxies in space. Morphological classification groups galaxies by their shape and visual appearance, and it matters to astronomers because a galaxy's morphology offers clues about its composition and evolution.

The most popular morphological classification scheme is the Tuning Fork, proposed by the astronomer Edwin Hubble. In this classification (refer to the image), E stands for elliptical, S for spiral and SB for barred spiral. The number attached to an elliptical galaxy indicates its ellipticity, and the letter attached to a spiral galaxy is based on its compactness.

Sky survey programs generate millions of celestial images every year. For example, the Sloan Digital Sky Survey (SDSS) produces one million images of galaxies. Since analysing these images manually is a time-consuming process, machine learning algorithms are used to process the galaxy images and classify them.

The traditional manual classification method of galaxies has two distinct disadvantages:

1) It is a very slow process, because astronomers have to scrutinise each galaxy image and classify the galaxy by its shape

2) The process is susceptible to human error

Classification Algorithm

The basic idea of ML for morphological galaxy classification involves three major steps:

1.Image pre-processing:

Each galaxy image acts as the input. The raw image contains noise in the form of intensity fluctuations. In this step, the noise is removed by applying a predetermined threshold to the pixel intensities, and a secondary image is formed from the remaining pixels. The centroid of this secondary image is then computed using centroiding techniques. Pixels that deviate by more than a chosen number of standard deviations can also be removed, which improves image quality by eliminating extra-bright, unwanted objects. Next, the image is scaled to a uniform size. Overall, the pre-processing stage scales, rotates, crops and centres the galaxy images, reducing each image to the region of interest and producing a cleaner input for feature extraction.
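The thresholding, centroiding and cropping steps can be sketched roughly as follows. This is a minimal illustration in plain Python on a nested-list image, not the pipeline of any particular survey: the function names, the threshold value and the choice of an intensity-weighted centroid are all assumptions made for the example.

```python
def threshold_image(image, threshold):
    """Set pixels at or below the threshold to 0 (treated as background noise)."""
    return [[p if p > threshold else 0 for p in row] for row in image]


def centroid(image):
    """Return the intensity-weighted centroid as (row, column)."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("no pixels left above the threshold")
    r = sum(i * p for i, row in enumerate(image) for p in row) / total
    c = sum(j * p for row in image for j, p in enumerate(row)) / total
    return r, c


def crop_centered(image, center, half_size):
    """Crop a square of side 2*half_size + 1 around center, zero-padding the edges."""
    r0, c0 = round(center[0]), round(center[1])
    n_rows, n_cols = len(image), len(image[0])
    cropped = []
    for i in range(r0 - half_size, r0 + half_size + 1):
        row = []
        for j in range(c0 - half_size, c0 + half_size + 1):
            inside = 0 <= i < n_rows and 0 <= j < n_cols
            row.append(image[i][j] if inside else 0)
        cropped.append(row)
    return cropped


# Hypothetical 5x5 image: faint noise plus a bright blob in the lower right.
raw = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 5, 9, 5],
    [0, 1, 9, 20, 9],
    [1, 0, 5, 9, 5],
]
clean = threshold_image(raw, threshold=2)
center = centroid(clean)
stamp = crop_centered(clean, center, half_size=1)
```

Rotation and rescaling to a uniform size are left out here; in practice an image-processing library would handle both.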

2.Feature extraction:

The pre-processed images are compressed using analysis methods such as Principal Component Analysis (PCA), which reduce their dimensionality. In the case of PCA, each galaxy is represented as a row vector. The basic idea is to find the eigenvectors of the covariance matrix of the galaxy images: these principal components capture the variation across the images, and the corresponding eigenvalues give the amount of variance each component explains. A principal-component basis is calculated from all the images, and each image is then projected onto this basis; the resulting coefficients serve as a set of galaxy features.

Appearance-related (morphic) features of the galaxy, such as its elongation, form-factor, convexity and asymmetry index, are also extracted and analysed. Elongation is defined only for elliptical galaxies, as the ratio of the difference between the semi-major and semi-minor axes to their sum. Form-factor is the ratio of the galaxy's area to its perimeter: in other words, the number of pixels in the galaxy divided by the number of edge pixels found by Canny edge detection, a multi-stage algorithm for detecting edges in images. Convexity is the ratio of the galaxy perimeter to the perimeter of the minimum bounding rectangle. The asymmetry index is obtained by rotating the image by 180 degrees and comparing the pixel intensities of the rotated image with those of the original. An ellipse is fitted to the convex structure of the remaining pixels to calculate the morphic features. The image is then compressed in a way that preserves most of its original information.
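Some of these features can be sketched in a few lines of plain Python. The block below is illustrative only and rests on several assumptions: the asymmetry index is normalised by the total intensity (one of several conventions in use), the elongation takes the fitted semi-axes as given, and the PCA part finds only the leading principal component, using power iteration in place of a full eigen-decomposition. All function names are hypothetical.

```python
import math


def asymmetry_index(image):
    """Sum of |I - I_rotated| over sum of |I|; 0 for a perfectly symmetric image."""
    rotated = [row[::-1] for row in image[::-1]]  # rotate by 180 degrees
    diff = sum(abs(p - q)
               for row, rot_row in zip(image, rotated)
               for p, q in zip(row, rot_row))
    total = sum(abs(p) for row in image for p in row)
    return diff / total if total else 0.0


def elongation(semi_major, semi_minor):
    """(a - b) / (a + b) for a fitted ellipse with semi-axes a >= b."""
    return (semi_major - semi_minor) / (semi_major + semi_minor)


def first_principal_component(rows, iterations=200):
    """Return (mean, unit vector) for the leading principal component.

    Each entry of rows is one flattened galaxy image. Power iteration
    repeatedly applies the (unnormalised) covariance matrix to a start vector.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    vec = [float(k + 1) for k in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        # Apply X^T X to vec without building the d x d matrix explicitly.
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        new = [sum(scores[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0:
            break  # no variance along the current direction
        vec = [v / norm for v in new]
    return mean, vec


def project(row, mean, component):
    """Coefficient of one image along a principal component (a PCA feature)."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))
```

A full PCA would keep several components rather than one; a linear-algebra library is the usual way to obtain them all at once.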


3.Classification:

This is the final and the most crucial step in the morphology process. Various machine learning classifiers can be trained for the task, such as Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), Locally Weighted Regression (LWR) and Naïve Bayes (NB). These classifiers are trained on training data, namely the feature-extracted galaxy images, and learn to classify the galaxies. For each of these algorithms, the galaxies are sorted into either 3 or 7 classes, using the PCA features and the morphic features individually as well as combined. All the methods have been trained with different kinds of galaxy images. When a new galaxy image arrives, the trained classifier compares it with the galaxy images it was trained on and returns the closest-matching class.
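The idea of assigning a new galaxy to the closest-matching class can be illustrated with a one-nearest-neighbour rule on feature vectors. This is a deliberately simple stand-in rather than any of the classifiers named above, and the feature values and labels are invented for the example.

```python
import math


def nearest_neighbour_label(train_features, train_labels, query):
    """Return the label of the training example closest to query (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical (elongation, asymmetry index) pairs for labelled galaxies.
train_features = [(0.10, 0.05), (0.15, 0.08), (0.40, 0.35), (0.45, 0.40)]
train_labels = ["elliptical", "elliptical", "spiral", "spiral"]

predicted = nearest_neighbour_label(train_features, train_labels, (0.42, 0.30))
```

Classifiers such as SVM or Random Forest learn a decision rule from the same kind of labelled feature table instead of comparing against every stored example.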


ML algorithms can deal with the millions of images that deep sky surveys yield. Using them in the field makes classification simpler, quicker and less prone to human error, freeing astronomers to spend their time on more intricate analysis in their research.
