
How to Build a SOC Assistant with TensorFlow and Keras 2026

Practical tutorial: detect threats with AI by building a SOC assistant

Alexia Torres · May 8, 2026 · 11 min read · 2,133 words

Building a SOC Assistant That Actually Works: TensorFlow and Keras in the Age of AI-Driven Security

The security operations center is drowning. Every day, SOC analysts face an impossible firehose of alerts, log entries, and network telemetry—most of it noise, some of it the signal that could stop a breach before it starts. Traditional rule-based systems, the kind that have been the backbone of intrusion detection for decades, are cracking under the weight of modern adversarial tactics. Attackers don't follow static signatures anymore; they morph, they hide, they blend in.

Enter deep learning. Not as a buzzword, but as a practical, deployable layer of intelligence that can sit atop your existing monitoring stack and learn what "normal" actually looks like. In this guide, we're going to build a real-time SOC assistant using TensorFlow and Keras—a neural network trained on historical security data that can flag anomalous traffic patterns the moment they appear. This isn't a toy. This is a production-ready architecture that any engineering team can implement today.

The architecture we'll implement is deceptively simple but deeply effective. It breaks down into three core components: data preprocessing, where raw log data gets cleaned and transformed into a machine-learning-ready format; model training, using TensorFlow [4] and Keras to build a classifier that distinguishes normal from malicious behavior; and real-time detection, where the trained model integrates with live monitoring tools to push instant alerts. We're going to walk through every line of code, every design decision, and every edge case you need to worry about when taking this from a notebook to a live SOC environment.

This approach is particularly relevant as cybersecurity threats become more sophisticated, requiring advanced analytical capabilities beyond traditional rule-based systems. The days of static regex patterns are numbered. The future belongs to models that learn.

The Neural Network Primer You Actually Need

Before we dive into the code, let's talk about why a neural network is the right tool for this job. If you're already comfortable with deep learning fundamentals, feel free to skip ahead—but for the rest of us, a quick refresher is worth the time.

At its core, a neural network is a function approximator. You feed it data—in our case, network traffic features like protocol type, packet size, connection duration—and it learns to map those inputs to an output: normal or anomalous. The "learning" happens through layers of interconnected neurons, each applying a weighted transformation followed by a non-linear activation function. The magic is that the network discovers patterns in the data that no human could explicitly encode.

For a SOC assistant, this is transformative. Traditional intrusion detection systems rely on predefined signatures. If an attacker uses a novel technique, the signature doesn't fire. A neural network, on the other hand, learns the statistical distribution of normal traffic. Anything that falls outside that distribution—even something the model has never seen before—gets flagged. This is anomaly detection at its most powerful.

The architecture we're building uses a feedforward network with three hidden layers: 128, 64, and 32 neurons respectively, each using ReLU activation. The output layer uses a sigmoid activation to produce a probability score between 0 and 1. Anything above 0.5 is classified as anomalous. This is a binary classification problem, but the same principles apply to multi-class scenarios where you might want to classify different types of attacks.

If you want a deeper visual explanation of how these networks actually learn, 3Blue1Brown's neural network video series is one of the best resources out there. It's the kind of intuition that will serve you well when you start tuning hyperparameters or debugging why your model isn't converging.

Setting Up Your Environment for Production-Grade ML

Let's get practical. You need a Python environment with the right dependencies. This isn't a place to cut corners—version mismatches can silently break your pipeline in production.

The essential packages are TensorFlow, Keras (which now ships as part of TensorFlow), Pandas for data manipulation, and Scikit-Learn for preprocessing utilities like feature scaling. Install them with a single command:

pip install tensorflow pandas scikit-learn

A critical note: ensure you have TensorFlow version 2.10 or later. Earlier versions lack some of the performance optimizations and API stability we rely on here. Pandas and Scikit-Learn should be at their latest stable releases to benefit from the most recent improvements in data handling and preprocessing. If you're deploying on a server with GPU support, consider installing the GPU-enabled version of TensorFlow—training time will drop from minutes to seconds.
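Before going further, it's worth sanity-checking the environment. The short snippet below simply prints the installed TensorFlow version and whether a GPU is visible; both tf.__version__ and tf.config.list_physical_devices are standard TensorFlow APIs, and the snippet is safe to run on CPU-only machines.

import tensorflow as tf

# Confirm the runtime meets the 2.10+ requirement and report GPU visibility
print(f"TensorFlow version: {tf.__version__}")
print(f"GPUs detected: {tf.config.list_physical_devices('GPU')}")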

For those building more advanced systems, you might also want to explore how vector databases can store embeddings of known attack patterns for similarity search, or how open-source LLMs can augment your SOC assistant with natural language explanation of flagged events. But for now, let's focus on the core detection engine.

From Raw Logs to Training Data: The Preprocessing Pipeline

Data preprocessing is where most projects fail. It's not glamorous, but it's the single most important step in building a reliable model. Garbage in, garbage out has never been more true than in cybersecurity ML.

We start by loading a CSV of network logs. In a real deployment, this data would come from your SIEM, from tools like Zeek or Suricata, or from custom packet capture pipelines. The key is that the dataset must be labeled—each row needs a "label" column indicating whether the traffic was normal (0) or malicious (1). Without labels, you can't do supervised learning. If you don't have labeled data, consider using unsupervised techniques like autoencoders for anomaly detection, but that's a topic for another article.

Here's the preprocessing code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data = pd.read_csv('network_logs.csv')

# Preprocess data: remove non-predictive columns, encode categorical features
data.drop(['timestamp', 'user_id'], axis=1, inplace=True)
# Note: any protocol other than tcp/udp becomes NaN here; extend the mapping
# (or one-hot encode the column) if your logs contain additional protocols
data['protocol'] = data['protocol'].map({'tcp': 0, 'udp': 1})

# Split into training and testing sets, stratified to preserve the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    data.drop('label', axis=1), data['label'],
    test_size=0.2, stratify=data['label'], random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Let's break down what's happening. We drop timestamp and user_id because they're not predictive features—timestamps are sequential and can confuse the model, and user IDs are high-cardinality categorical variables that would require heavy encoding. We map the protocol column from strings to integers because neural networks can't process text directly. Then we split the data, reserving 20% for testing, and apply standard scaling to normalize all features to zero mean and unit variance. This is critical because features with larger magnitudes (like packet size in bytes) would otherwise dominate the gradient updates during training.

One edge case to watch for: if your dataset is highly imbalanced (99% normal traffic, 1% malicious), your model will learn to predict "normal" for everything and still achieve 99% accuracy. You need to handle this with techniques like class weighting, oversampling the minority class, or using a different loss function. For this tutorial, we assume a reasonably balanced dataset, but in production, always check your class distribution.
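If you do find yourself with a skewed distribution, one simple mitigation is class weighting. Here is a minimal sketch, assuming the y_train series produced by the split above: it computes inverse-frequency weights and passes them to model.fit (a standard Keras parameter) so mistakes on the rare malicious class cost more.

import numpy as np

# Count normal (0) and malicious (1) examples, then weight each class by its inverse frequency
neg, pos = np.bincount(y_train.astype(int))
total = neg + pos
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}

# Pass the weights when fitting the model (see the training step in the next section):
# model.fit(X_train_scaled, y_train, epochs=50, batch_size=64,
#           validation_split=0.2, class_weight=class_weight)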

Training the Neural Network: Architecture, Compilation, and Convergence

With our data preprocessed, we can define and train the model. This is where TensorFlow and Keras shine—the API is clean, the abstractions are powerful, and the debugging tools are mature.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the neural network architecture
model = Sequential([
    Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train_scaled, y_train, epochs=50, batch_size=64, validation_split=0.2)

The architecture is a standard feedforward network. The input layer has a number of neurons equal to the number of features in your dataset (after preprocessing). We use three hidden layers with decreasing width—this is a common pattern that allows the network to learn increasingly abstract representations. ReLU activation is used in the hidden layers because it mitigates the vanishing gradient problem and is computationally efficient. The output layer uses sigmoid to squash the final value into a probability.

We compile with the Adam optimizer, which adapts the learning rate during training and generally converges faster than vanilla stochastic gradient descent. Binary cross-entropy is the appropriate loss function for binary classification. We track accuracy as a metric, but remember: accuracy can be misleading on imbalanced datasets. Always check precision, recall, and the F1 score in production.
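If you want those extra metrics tracked during training rather than computed after the fact, a hedged variant of the compile step looks like this; Precision and Recall are built-in Keras metrics.

from tensorflow.keras.metrics import Precision, Recall

# Track precision and recall alongside accuracy; both matter far more than raw
# accuracy when malicious traffic is rare
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy', Precision(name='precision'), Recall(name='recall')])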

Training for 50 epochs with a batch size of 64 is a reasonable starting point. The validation split of 20% means the model will hold out a portion of the training data to evaluate performance after each epoch, helping you detect overfitting early. If you see the validation loss start to increase while training loss continues to decrease, you're overfitting—reduce the number of epochs, add dropout layers, or increase regularization.
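One low-effort guard against overfitting is an EarlyStopping callback, sketched below: it watches the validation loss and restores the best weights once improvement stalls, so you don't have to guess the right number of epochs.

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train_scaled, y_train, epochs=50, batch_size=64,
                    validation_split=0.2, callbacks=[early_stop])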

For those interested in deeper AI tutorials on model optimization, techniques like learning rate scheduling, early stopping callbacks, and hyperparameter tuning with Keras Tuner can dramatically improve performance. But for a first iteration, this setup will get you a working model.

Deploying Real-Time Anomaly Detection in Your SOC

Training a model is only half the battle. The real value comes when it's integrated into your live monitoring pipeline, analyzing traffic as it flows and pushing alerts when something looks wrong.

Here's a minimal real-time detection loop that listens for incoming data points over a TCP socket:

import socket

import numpy as np

def predict_anomalies(features):
    # Apply the same scaling used during training, then score the single row
    processed_data = scaler.transform(np.array(features).reshape(1, -1))

    # Predict using the trained model
    prediction = model.predict(processed_data, verbose=0)
    return prediction[0][0] > 0.5

# Example usage with a real-time monitoring tool like Snort or Suricata
def monitor_network():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('localhost', 9999))
    sock.listen(1)

    while True:
        conn, addr = sock.accept()
        raw = conn.recv(4096)  # Receive one network traffic log entry as bytes
        conn.close()
        # Parse the raw bytes into numeric features (here: a comma-separated line
        # matching the column order used during training)
        features = [float(value) for value in raw.decode().strip().split(',')]
        if predict_anomalies(features):
            print("Anomaly detected!")

This is a simplified example, but it illustrates the core pattern. In production, you'd replace the raw socket with a proper message queue like Kafka or RabbitMQ, and you'd push alerts to a SIEM or a ticketing system instead of printing to stdout. The predict_anomalies function handles preprocessing—applying the same scaler and encoding that you used during training—before making a prediction.

One critical detail: the scaler object must be saved and loaded alongside the model. You can use joblib or pickle to serialize it. Without the scaler, your predictions will be meaningless because the input features won't be normalized to the same scale the model expects.
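A minimal sketch of that persistence step is below, using the model and scaler objects from earlier and illustrative file names. model.save and joblib are standard APIs; the native .keras format assumes a recent TensorFlow release, so fall back to an .h5 path on older versions.

import joblib
from tensorflow.keras.models import load_model

# Persist both artifacts after training
model.save('soc_model.keras')
joblib.dump(scaler, 'scaler.joblib')

# Load them together at inference time so preprocessing matches training exactly
model = load_model('soc_model.keras')
scaler = joblib.load('scaler.joblib')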

Production Optimization and the Hard Lessons

Deploying a machine learning model in a SOC environment comes with challenges that don't appear in a Jupyter notebook. Here are the configurations and edge cases you need to address.

Batch Processing: For large datasets, process data in batches to manage memory usage effectively. TensorFlow's model.predict() can handle batches natively—pass a numpy array of multiple rows instead of a single row. This is especially important if you're processing historical logs for retraining.
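As a rough sketch, scoring a batch of historical logs looks like the following; historical_logs is a hypothetical DataFrame with the same preprocessed feature columns used for training.

# Score many rows in one call; batch_size controls memory use during prediction
scores = model.predict(scaler.transform(historical_logs.values), batch_size=256)
flagged = historical_logs[scores[:, 0] > 0.5]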

Asynchronous Processing: Use asynchronous libraries like asyncio for handling multiple network connections efficiently. A blocking socket will become a bottleneck if you're monitoring hundreds of endpoints simultaneously. Consider using an async web framework like FastAPI to expose your model as a REST endpoint.
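Here is a minimal, hedged sketch of that REST pattern with FastAPI; the endpoint path, field names, and payload shape are assumptions rather than part of the original pipeline, and the app expects the trained model and scaler to be loaded at startup.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class LogEntry(BaseModel):
    features: list[float]  # numeric features in the same order as training

@app.post("/score")
async def score(entry: LogEntry):
    # Reuse the trained scaler and model for every incoming request
    processed = scaler.transform([entry.features])
    probability = float(model.predict(processed, verbose=0)[0][0])
    return {"anomaly": probability > 0.5, "score": probability}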

GPU/CPU Optimization: Depending on your hardware setup, optimize model training and prediction using GPU acceleration or parallel CPU processing. TensorFlow automatically detects GPUs, but you may need to set environment variables to limit memory growth. For inference, CPUs are often sufficient for low-latency requirements, but batch inference on GPUs can handle higher throughput.
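A common configuration tweak, sketched below, enables memory growth so TensorFlow only claims GPU memory as needed; it is a no-op on CPU-only machines.

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)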

Error Handling: Implement robust error handling to manage unexpected issues such as data corruption or network failures. A single malformed log entry shouldn't crash your entire detection pipeline.

try:
    # Score the entry, but never let a single bad record take down the pipeline
    is_anomaly = predict_anomalies(features)
except Exception as e:
    print(f"Error occurred while scoring a log entry: {e}")
    is_anomaly = False

Security Risks: Ensure that your system is secure against potential threats like prompt injection if using large language models (LLMs). Validate all inputs and sanitize data before processing. If you extend your SOC assistant with natural language capabilities, treat every input as untrusted.

What's Next: Scaling, Retraining, and the Road Ahead

By following this tutorial, you have built a SOC assistant capable of detecting anomalies in real-time network traffic logs. This is a functional, deployable system—but it's also a foundation. The real power comes from iteration.

Future steps could include scaling the system across multiple nodes for distributed processing, integrating additional features like machine learning model retraining with new data, or implementing more sophisticated anomaly detection algorithms like variational autoencoders or temporal convolutional networks. You could also add a feedback loop: when analysts confirm or reject an alert, that label feeds back into the training data, continuously improving the model.

This project demonstrates how AI can be effectively utilized to enhance cybersecurity operations, providing a robust solution against evolving threats. The attackers aren't standing still. Neither should your defenses.

