
Top 8 Cybersecurity Datasets For Your Next Machine Learning Project


Machine learning techniques play a critical role in detecting serious threats in networks. A good dataset helps build robust machine learning systems that address a range of problems, including network security, malware attacks, phishing, and host intrusion. For instance, real-world cybersecurity datasets can support projects such as network intrusion detection systems and network packet inspection systems built with machine learning models.

Here is a list of eight top cybersecurity datasets you can use for your next machine learning project.

(The list is in no particular order)

1| ADFA Intrusion Detection Datasets

About: The ADFA Intrusion Detection Datasets are designed for evaluating host-based intrusion detection systems (HIDS) that rely on system calls. The datasets cover both Linux and Windows and support anomaly-based intrusion detection on both platforms. They are used as a benchmark for traditional HIDS.

Know more here.
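Host-based detectors built on system call data usually treat each trace as a sequence of system call IDs and compare its short subsequences with those seen during normal operation. The sketch below illustrates that idea with a simple unseen-n-gram detector, one common baseline for this kind of data rather than a method the dataset prescribes. It assumes each trace is stored as whitespace-separated integer system call numbers (verify this against the files you download); the function names and the toy traces are hypothetical.

```python
from collections import Counter


def syscall_ngrams(trace, n=3):
    """Count overlapping n-grams in a whitespace-separated system call trace."""
    calls = [int(token) for token in trace.split()]
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def unseen_ngram_ratio(trace, normal_ngrams, n=3):
    """Fraction of the trace's n-grams never observed in the normal training traces.

    A higher ratio suggests the trace deviates from learned normal behaviour.
    """
    grams = syscall_ngrams(trace, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    unseen = sum(count for gram, count in grams.items() if gram not in normal_ngrams)
    return unseen / total


# Hypothetical toy traces standing in for real dataset files.
normal_traces = ["6 6 63 6 42 120 6", "6 63 6 42 120 6 195"]
normal_ngrams = set()
for normal_trace in normal_traces:
    normal_ngrams.update(syscall_ngrams(normal_trace))

print(unseen_ngram_ratio("6 63 6 42 120 6", normal_ngrams))
print(unseen_ngram_ratio("11 45 33 192 33 5", normal_ngrams))
```

A trace whose ratio exceeds a threshold tuned on validation data would be flagged as a possible intrusion; the window length `n` trades sensitivity against false alarms.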

2| ISOT Botnet and Ransomware Detection Datasets

About: The ISOT Botnet dataset combines several existing, publicly available malicious and non-malicious datasets. The ISOT Ransomware Detection dataset consists of over 420 GB of execution traces from ransomware and benign programs. The ISOT HTTP Botnet dataset comprises two traffic captures: malicious DNS data for nine different botnets and benign DNS data for 19 different well-known software applications.

Know more here.

3| FakeNewsNet

About: FakeNewsNet is a fake news data repository containing two comprehensive datasets with diverse features covering news content, social context, and spatiotemporal information. The data was collected using an end-to-end system called FakeNewsTracker. The repository can facilitate research on various open problems related to fake news.

Know more here.

4| Malicious URLs Dataset

About: The Malicious URLs dataset consists of about 2.4 million URLs (examples) and 3.2 million features. The data is available in two formats: MATLAB and SVM-light. In the MATLAB format, the file url.mat contains FeatureTypes, a list of column indices in the data matrices that correspond to real-valued features. In the SVM-light format, FeatureTypes is a text file listing the indices of the real-valued features.

Know more here.
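Each line of an SVM-light file holds a label followed by sparse `index:value` pairs, optionally ending in a `#` comment. The sketch below parses one such line in plain Python and separates the real-valued features (those listed in FeatureTypes) from the rest. It assumes integer labels such as +1/-1, that FeatureTypes has already been read into a set of integers, and that FeatureTypes uses the same indexing as the SVM-light file; the function names, the sample line, and the index set are hypothetical. For loading whole files, scikit-learn's `sklearn.datasets.load_svmlight_file` returns a sparse feature matrix and a label array directly.

```python
def parse_svmlight_line(line):
    """Parse '<label> <index>:<value> ...'; an optional trailing '# comment' is dropped."""
    body = line.split("#", 1)[0].strip()
    label_token, *pairs = body.split()
    label = int(float(label_token))
    features = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def split_by_feature_type(features, real_valued_indices):
    """Separate real-valued features from the remaining (here assumed binary) ones."""
    real = {i: v for i, v in features.items() if i in real_valued_indices}
    other = {i: v for i, v in features.items() if i not in real_valued_indices}
    return real, other


# Hypothetical sample line and FeatureTypes contents, for illustration only.
label, feats = parse_svmlight_line("+1 4:0.0788 5:0.1241 16:1 32:1 # toy example")
real, other = split_by_feature_type(feats, real_valued_indices={4, 5})
print(label, real, other)
```

Keeping the two groups apart makes it easy to scale only the real-valued columns while leaving sparse indicator features untouched.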

5| ISOT Cloud Intrusion Detection (ISOT CID) Dataset

About: The ISOT Cloud IDS (ISOT CID) dataset consists of over 8 TB of data collected in a real cloud environment. It gathers data from various cloud layers, including guest hosts, hypervisors, and networks, and combines multiple formats and data sources: network traffic at the VM and hypervisor levels, memory dumps, system logs, system call traces, and resource utilisation and performance data (e.g., CPU utilisation).

Know more here.

6| Behavioral Biometric Datasets

About: The ISOT Behavioral Biometric dataset comprises four datasets: a mouse dynamics dataset, a mouse gesture dynamics dataset, a combined mouse/keystroke dynamics/site actions dataset, and a mobile keystroke dynamics OTP dataset. The ISOT Mouse Dynamics dataset contains mouse dynamics data for 48 users collected over several months. The Mouse Gesture Dynamics dataset contains genuine gesture data drawn by 41 individuals and forgery data targeting 25 different individuals.

The Combined Mouse/Keystroke Dynamics/Site Actions dataset contains mouse, keystroke, and site-action (menu) data for 24 users who visited a website and used it freely (in continuous rather than static mode). It includes both genuine samples and attack data, in which some users tried to forge the sessions of genuine users. Lastly, the Mobile Keystroke Dynamics OTP dataset contains mobile keystroke dynamics data collected from about 100 users who entered both a fixed password and a one-time password (OTP) during login.

Know more here.
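Keystroke dynamics data is usually turned into timing features before modelling. Two widely used ones are dwell time (how long a key is held down) and flight time (the gap between releasing one key and pressing the next). The sketch below computes their averages for one typing sample. The `(key, press_ms, release_ms)` event layout, the function name, and the sample values are hypothetical and do not reflect the dataset's actual file format.

```python
from statistics import mean


def keystroke_timing_features(events):
    """Summarise one typing sample from (key, press_ms, release_ms) tuples in typing order.

    Expects at least one event.
    """
    # Dwell time: how long each key stays pressed.
    dwell = [release - press for _key, press, release in events]
    # Flight time: next key's press minus current key's release
    # (negative when the next key is pressed before the current one is released).
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {
        "mean_dwell_ms": mean(dwell),
        "mean_flight_ms": mean(flight) if flight else 0.0,
    }


sample = [("p", 0, 90), ("a", 150, 240), ("s", 280, 370)]
print(keystroke_timing_features(sample))
```

Per-user vectors of such features could then be compared against an enrolled profile to flag sessions that look forged, like the attack data described above.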

7| ISOT Fake News Dataset

About: The ISOT Fake News dataset is a compilation of several thousand fake and truthful news articles collected from real-world sources. Truthful articles were obtained by crawling Reuters.com, a legitimate news site, while fake articles came from sites flagged as unreliable by Politifact.com. The dataset thus contains two types of articles: fake and real news.

Know more here.

8| Dynamic Malware Analysis Kernel and User-Level Calls

About: The Dynamic Malware Analysis Kernel and User-Level Calls dataset contains data collected with the Cuckoo sandbox and a kernel driver after running 1,000 malicious and 1,000 clean samples. The Kernel Driver folder contains subfolders holding the API calls recorded for the clean and malicious samples.

Know more here.
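A common first step with API call traces like these is a bag-of-calls representation: count how often each API name appears and place the counts in a fixed-length vector that a classifier can consume. The sketch below assumes each sample's calls have already been read into a list of API name strings; the actual file layout inside the Kernel Driver subfolders may differ, and the helper names and toy traces are hypothetical.

```python
from collections import Counter


def build_vocabulary(traces):
    """Assign a column index to every API call name seen in the training traces."""
    names = sorted({call for trace in traces for call in trace})
    return {name: i for i, name in enumerate(names)}


def vectorize(trace, vocab):
    """Return an API call frequency vector; calls outside the vocabulary are ignored."""
    counts = Counter(trace)
    vector = [0] * len(vocab)
    for name, count in counts.items():
        if name in vocab:
            vector[vocab[name]] = count
    return vector


# Hypothetical toy traces standing in for parsed clean and malicious samples.
train = [
    ["NtCreateFile", "NtWriteFile", "NtClose"],
    ["RegOpenKeyExW", "RegSetValueExW", "NtWriteFile", "NtWriteFile"],
]
vocab = build_vocabulary(train)
print(vectorize(["NtWriteFile", "NtWriteFile", "NtClose", "Unknown"], vocab))
```

Building the vocabulary from training data only keeps unseen calls in test samples from leaking into the feature space; the resulting vectors can be fed to any standard classifier trained on the clean and malicious labels.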


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.