Home

INDEX

  1. Introduction
  2. Deployment
  3. Design
  4. Sniffer
  5. FlowMeter
  6. Security

Welcome to the Vision ML-NIDS documentation! Here you will learn everything about me!

ML-NIDS

A Network Intrusion Detection System (NIDS) is a tool capable of analyzing every packet crossing your local network and detecting anomalies or attack patterns. This is anything but new, of course… But what if we add some Machine Learning stuff?

The next video shows a full demo of the application.

And here you can see its attack-blocking features!

About the project

As part of my Degree Final Project I tried to develop an innovative tool and, as I am kind of an InfoSec nerd, this NIDS was intended to become my little baby.

This awesome project has two isolated systems: the Machine Learning AI and a full stack server providing a Command & Control system for our NIDS.

The main goal is to train a model using different ML algorithms over a big dataset and use this model to classify a flow of sniffed packets in order to know whether an attack is being performed.

Machine Learning AI

Common NIDS can be classified as Signature Based or Anomaly Based. The first kind depends on a set of rules written by the administrator, which the IDS uses to filter the sniffed packets. The Anomaly Based kind, on the other hand, tries to model normal user behavior and raises an alarm whenever something falls outside that model.
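To make the difference concrete, here is a minimal sketch of both ideas. It is not Vision's code: the rule, the field names (`dst_port`, `connection_attempts`, `bytes_per_second`) and the 3-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical signature rules: each one is a predicate an admin would write by hand.
SIGNATURE_RULES = {
    "ssh-bruteforce": lambda flow: flow["dst_port"] == 22
    and flow["connection_attempts"] >= 5,
}


def signature_alerts(flow):
    """Return the names of every admin-written rule that the flow matches."""
    return [name for name, rule in SIGNATURE_RULES.items() if rule(flow)]


def anomaly_alert(baseline, value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return sigma > 0 and abs(value - mu) / sigma > threshold


# Illustrative (made-up) observations.
flow = {"dst_port": 22, "connection_attempts": 8, "bytes_per_second": 950.0}
normal_rates = [100.0, 120.0, 90.0, 110.0, 105.0]

print(signature_alerts(flow))                               # rule-based decision
print(anomaly_alert(normal_rates, flow["bytes_per_second"]))  # baseline-based decision
```

The signature check only fires on patterns someone has already described, while the anomaly check only needs a picture of what "normal" looks like.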

The first version of this project will try to enhance the Signature Based approach, replacing those admin-defined rules with an ML-trained classifier.

From the beginning, the dataset used has been the CSE-CIC-IDS2018. You can find all the information about it in the link. This dataset includes more than 100 GB of .pcaps, so I have reduced the number of packets to keep the training time down.

However, the results obtained were not what we expected. Maybe the CSE-CIC-IDS2018 is not as scalable as its creators say. So we decided to create our own dataset, called VISION-IDS2020, which includes SSH and FTP bruteforce attacks and DoS attacks generated with Hydra and the HULK tool. This time, the results were much better.

Train your own model

Of course, you may want to train your own model in order to increase the accuracy rate of Vision on your network. There is an .ipynb notebook in the folder model_training/ to help you train your model. You will need your own dataset, too.

Note that the notebook is designed to be run from a Google Colab server.

Server

The server is written in NodeJS. It includes a REST API and some modules that repeatedly sniff traffic every n seconds, write it to a .pcap file and use the CICFlowMeter to create flows from bunches of packets and write them to a .csv file (which will be the input of our ML Classifier).
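The cycle described above can be sketched as follows. The real server is written in NodeJS; this is a Python illustration only, and every name in it (`run_capture_cycles`, `capture`, `extract_flows`, `classify`, the file naming) is hypothetical. The actual sniffer and CICFlowMeter invocations are left as injected callables because this page does not specify them.

```python
import tempfile
from pathlib import Path


def run_capture_cycles(capture, extract_flows, classify, workdir, interval_s=20, cycles=1):
    """Run `cycles` rounds of: capture -> .pcap -> flows .csv -> classification."""
    workdir = Path(workdir)
    results = []
    for i in range(cycles):
        pcap_path = workdir / f"capture_{i}.pcap"
        csv_path = workdir / f"capture_{i}.csv"
        capture(pcap_path, interval_s)      # expected to sniff for interval_s seconds
        extract_flows(pcap_path, csv_path)  # stand-in for the CICFlowMeter step
        results.append(classify(csv_path))  # stand-in for the ML classifier
    return results


# Stubs so the sketch runs without a network interface or CICFlowMeter installed.
def fake_capture(pcap_path, seconds):
    pcap_path.write_bytes(b"")  # a real sniffer would write packets here


def fake_extract(pcap_path, csv_path):
    csv_path.write_text("flow_duration,total_bytes\n1.5,3000\n")


def fake_classify(csv_path):
    rows = csv_path.read_text().splitlines()[1:]  # skip the header row
    return ["benign" for _ in rows]


with tempfile.TemporaryDirectory() as tmp:
    print(run_capture_cycles(fake_capture, fake_extract, fake_classify, tmp, cycles=2))
```

Passing the three steps in as functions keeps the loop itself trivial and makes each stage replaceable, which is roughly what "some modules" in the server description suggests.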

I know… I know… There are trillions of better ways to capture live traffic but, as I am using the FlowMeter and it needs .pcap files, a completely live capture is just not possible. So I will be happy using packets captured every, let's say… 20 seconds.

And why is that? CICFlowMeter merges related packets (same IP, same port…) and creates a flow. A flow is an object containing statistical information about those related packets, like RTT, amount of bytes transmitted/received, bytes per second… Which is PERFECT for an ML algorithm. We avoid lots of categorical values by replacing them with statistical, numeric data.
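As a rough illustration of what "merging related packets into a flow" means, the sketch below groups packets by a 5-tuple and computes a few numeric statistics per group. It is a simplification and not CICFlowMeter's implementation: the packet dictionaries, the field names and the choice of statistics are assumptions, and each direction of a connection is treated as a separate flow here.

```python
from collections import defaultdict


def build_flows(packets):
    """Group packets by (src_ip, dst_ip, src_port, dst_port, protocol) and summarise each group."""
    groups = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        groups[key].append(pkt)

    flows = {}
    for key, pkts in groups.items():
        times = [p["timestamp"] for p in pkts]
        total_bytes = sum(p["length"] for p in pkts)
        duration = max(times) - min(times)
        flows[key] = {
            "packet_count": len(pkts),
            "total_bytes": total_bytes,
            "duration_s": duration,
            # Single-packet flows have zero duration, so avoid dividing by zero.
            "bytes_per_second": total_bytes / duration if duration > 0 else 0.0,
        }
    return flows


# Made-up packets: three belong to one SSH flow, one to an FTP flow.
ssh = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 51000, "dst_port": 22, "protocol": "TCP"}
ftp = {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.9", "src_port": 40000, "dst_port": 21, "protocol": "TCP"}
packets = [
    {**ssh, "timestamp": 0.0, "length": 100},
    {**ssh, "timestamp": 1.0, "length": 300},
    {**ftp, "timestamp": 0.5, "length": 80},
    {**ssh, "timestamp": 2.0, "length": 200},
]

for key, stats in build_flows(packets).items():
    print(key, stats)
```

Every value in the resulting dictionaries is a number, which is the property that makes flows convenient classifier input.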