Learning Objectives
By the end of this lesson, you will be able to:
- Establish behavioral baselines that enable effective anomaly detection
- Implement event monitoring systems that provide visibility without overwhelming your team
- Create detection rules and alerting mechanisms that minimize false positives while catching real threats
- Build basic threat hunting capabilities that scale with your organization’s growth
- Develop escalation and response procedures for detected anomalies and events
Introduction: Seeing the Invisible
The best attackers are invisible. They use legitimate credentials, move slowly to avoid detection, and blend their malicious activities with normal business operations. Traditional security controls that focus on prevention (firewalls, antivirus, access controls) often miss these attacks entirely.
Detection is about seeing the invisible: identifying subtle patterns that indicate something isn’t quite right. It’s noticing that a user is accessing files they’ve never touched before, that network traffic patterns have changed, or that system behavior is slightly different from the established baseline.
For startups, detection presents unique challenges. You need visibility into sophisticated threats but can’t afford to drown your small team in false alarms. You need detection capabilities that work with cloud-native architectures and distributed teams. Most importantly, you need detection that actually leads to effective response—not just more data.
This lesson shows you how to build detection capabilities that provide real security value without overwhelming your startup’s limited resources.
Understanding DE.AE: Adverse Event Analysis
NIST CSF 2.0 DE.AE Outcomes
DE.AE-02: Potentially adverse events are analyzed to better understand associated activities
DE.AE-03: Information is correlated from multiple sources
DE.AE-04: The estimated impact and scope of adverse events are understood
DE.AE-06: Information on adverse events is provided to authorized staff and tools
DE.AE-07: Cyber threat intelligence and other contextual information are integrated into the analysis
DE.AE-08: Incidents are declared when adverse events meet the defined incident criteria
Note: CSF 1.1's DE.AE-01 (network baselines) and DE.AE-05 (incident alert thresholds) were folded into other outcomes in CSF 2.0, but both practices remain foundational and are covered in depth in this lesson.
Detection Philosophy for Startups
Quality Over Quantity:
- Focus on high-confidence detections that require investigation
- Prefer fewer, actionable alerts over comprehensive coverage
- Prioritize detection of business-impactful threats
- Build detection capabilities gradually and deliberately
Baseline-Driven Detection:
- Establish normal behavior patterns before detecting anomalies
- Focus on deviations that matter for your specific environment
- Use business context to prioritize detection efforts
- Continuously refine baselines as the business evolves
Automation with Human Oversight:
- Automate data collection and initial analysis
- Use human judgment for context and decision-making
- Create clear escalation paths for detected events
- Build in feedback loops to improve detection accuracy
Establishing Behavioral Baselines
Network Baseline Development
Network Traffic Analysis:
## Network Baseline Categories
### Internal Traffic Patterns
- **User-to-Server:** Normal application access patterns
- **Server-to-Server:** Inter-service communication flows
- **Administrative:** Management and maintenance traffic
- **Backup and Sync:** Scheduled data transfer activities
### External Traffic Patterns
- **Web Browsing:** User internet access behaviors
- **SaaS Applications:** Cloud service access patterns
- **Partner Communications:** B2B integrations and APIs
- **Update Services:** Software and security updates
### Traffic Characteristics
- **Volume:** Bytes transferred per time period
- **Frequency:** Connection frequency and timing
- **Protocols:** Application protocols and ports used
- **Geographic:** Source and destination locations
- **Timing:** Business hours vs. off-hours patterns
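To make these characteristics concrete, here is a minimal Python sketch of a volume baseline: it learns the mean and standard deviation of bytes transferred per host for each hour of the day, then flags traffic well above that norm. The flow-record field names are illustrative and not tied to any particular monitoring tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

def build_baseline(flows):
    """flows: iterable of dicts such as
    {"src_host": "app-01", "bytes": 123456, "timestamp": "2024-03-01T14:05:00"}
    Returns {(host, hour_of_day): (mean_bytes, stdev_bytes)}."""
    buckets = defaultdict(list)
    for flow in flows:
        hour = datetime.fromisoformat(flow["timestamp"]).hour
        buckets[(flow["src_host"], hour)].append(flow["bytes"])
    return {key: (mean(v), pstdev(v)) for key, v in buckets.items()}

def is_anomalous(baseline, host, hour, observed_bytes, sigmas=3.0):
    """Flag traffic more than `sigmas` standard deviations above normal."""
    stats = baseline.get((host, hour))
    if stats is None:
        return True  # no baseline yet: treat an unseen host/hour as notable
    avg, sd = stats
    return observed_bytes > avg + sigmas * max(sd, 1.0)  # floor avoids sd == 0
```

In practice you would rebuild the baseline on a schedule, following the methodology below, so it tracks legitimate changes in traffic.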
Baseline Collection Process:
## Network Baseline Methodology
### Week 1-2: Initial Data Collection
- Deploy network monitoring tools
- Collect raw traffic data
- Document known legitimate activities
- Identify major traffic sources and destinations
### Week 3-4: Pattern Analysis
- Analyze traffic by time of day/week
- Identify user and system behavior patterns
- Document normal vs. unusual activities
- Begin developing detection thresholds
### Week 5-6: Validation and Refinement
- Test detection rules against historical data
- Adjust thresholds to minimize false positives
- Document exceptions and special circumstances
- Create baseline documentation
### Week 7-8: Production Deployment
- Deploy refined detection rules
- Monitor for accuracy and effectiveness
- Establish baseline review and update schedule
- Train team on baseline interpretation
User Behavior Baselines
User Activity Patterns:
## User Behavior Analysis Framework
### Access Patterns
- **Login Times:** Typical work hours and locations
- **Application Usage:** Most frequently used applications
- **Data Access:** Files and systems regularly accessed
- **Administrative Actions:** Privilege usage patterns
### Communication Patterns
- **Email Behavior:** Typical recipients and timing
- **File Sharing:** Normal sharing partners and methods
- **Collaboration:** Team interaction patterns
- **External Communications:** Customer and vendor interactions
### Device and Location Patterns
- **Device Usage:** Primary devices and applications
- **Location Patterns:** Office, home, travel locations
- **Network Connections:** WiFi and VPN usage patterns
- **Time Zone Consistency:** Expected geographic locations
### Anomaly Indicators
- **Access Anomalies:** Unusual times, locations, or systems
- **Volume Anomalies:** Significantly more or less activity
- **Pattern Changes:** New behaviors without business justification
- **Privilege Anomalies:** Unusual use of administrative rights
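The same idea applies to people. This illustrative sketch learns each user's typical login hours and source countries during a baseline window, then reports deviations; the event shape and anomaly reasons are assumptions for the example.

```python
from collections import defaultdict

class UserLoginBaseline:
    def __init__(self):
        self.hours = defaultdict(set)      # user -> login hours seen in baseline
        self.countries = defaultdict(set)  # user -> source countries seen

    def learn(self, user, hour, country):
        self.hours[user].add(hour)
        self.countries[user].add(country)

    def check(self, user, hour, country):
        """Return a list of anomaly reasons (an empty list means normal)."""
        reasons = []
        if hour not in self.hours[user]:
            reasons.append(f"login at unusual hour {hour:02d}:00")
        if country not in self.countries[user]:
            reasons.append(f"first login from {country}")
        return reasons

baseline = UserLoginBaseline()
baseline.learn("alice", 9, "US")
baseline.learn("alice", 14, "US")
print(baseline.check("alice", 3, "RO"))  # flags both the hour and the country
```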
System Behavior Baselines
System Performance Baselines:
## System Baseline Metrics
### Performance Baselines
- **CPU Usage:** Normal utilization patterns
- **Memory Usage:** Typical memory consumption
- **Disk I/O:** Read/write patterns and volumes
- **Network I/O:** Bandwidth utilization patterns
### Application Baselines
- **Response Times:** Normal application performance
- **Error Rates:** Typical error frequency and types
- **User Connections:** Normal concurrent user counts
- **Transaction Volumes:** Typical business transaction patterns
### Security Event Baselines
- **Authentication Events:** Normal login patterns and failures
- **File Access:** Typical file system activity
- **Network Connections:** Normal network communication
- **Process Activity:** Standard process execution patterns
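System metrics drift over time, so a rolling baseline often works better than a fixed one. Here is a minimal sketch using an exponentially weighted moving average (EWMA) and variance; the alpha and the three-sigma threshold are illustrative starting points, not tuned values.

```python
class EwmaBaseline:
    """Rolling baseline for a single metric (e.g., CPU utilization %)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest sample
        self.avg = None     # exponentially weighted moving average
        self.var = 0.0      # exponentially weighted moving variance

    def update(self, value):
        """Feed one sample; return True if it deviates > 3 sigma from the EWMA."""
        if self.avg is None:
            self.avg = value
            return False
        diff = value - self.avg
        anomalous = self.var > 0 and abs(diff) > 3 * self.var ** 0.5
        # update the running estimates after the check
        self.avg += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous

cpu = EwmaBaseline()
for sample in [20, 22, 19, 21, 20, 95]:  # steady ~20%, then a spike
    if cpu.update(sample):
        print("anomalous CPU sample:", sample)  # fires on 95
```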
Event Collection and Correlation
Multi-Source Event Collection
Event Source Categories:
## Comprehensive Event Source Inventory
### Infrastructure Events
- **Network Devices:** Routers, switches, firewalls, access points
- **Servers:** Windows, Linux, cloud instances
- **Storage Systems:** File servers, databases, backup systems
- **Virtualization:** Hypervisors, container platforms
### Application Events
- **Web Applications:** Access logs, error logs, transaction logs
- **Business Applications:** CRM, ERP, custom applications
- **Communication:** Email, messaging, collaboration platforms
- **Development Tools:** Version control, CI/CD, deployment systems
### Security Tool Events
- **Endpoint Protection:** Antivirus, EDR, mobile device management
- **Network Security:** IDS/IPS, web filters, email security
- **Identity Systems:** Authentication servers, privilege management
- **Cloud Security:** CSPM, CASB, cloud provider logs
### User Activity Events
- **Workstation Activity:** Login, application usage, file access
- **Mobile Device Activity:** App usage, location, security events
- **SaaS Application Usage:** Cloud application access and activities
- **Physical Access:** Badge readers, security cameras, visitor logs
Event Normalization and Correlation
Event Processing Pipeline:
```mermaid
graph TD
    A[Raw Events] --> B[Collection Agents]
    B --> C[Normalization Engine]
    C --> D[Enrichment Engine]
    D --> E[Correlation Engine]
    E --> F[Alert Generation]
    F --> G[Incident Creation]
```
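Normalization is the step that makes correlation possible, so it is worth a concrete illustration. The sketch below maps raw events from two hypothetical sources (a VPN concentrator and an identity provider) into one common schema; every field name here is an assumption for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class NormalizedEvent:
    timestamp: datetime
    source: str      # originating sensor, e.g. "vpn" or "idp"
    event_type: str  # normalized verb, e.g. "login_failure"
    user: str
    src_ip: str

def from_vpn_log(raw: dict) -> NormalizedEvent:
    """Hypothetical VPN record: {"epoch": ..., "result": "DENY", ...}"""
    return NormalizedEvent(
        timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        source="vpn",
        event_type="login_failure" if raw["result"] == "DENY" else "login_success",
        user=raw["username"].lower(),  # lowercase so sources agree on identity
        src_ip=raw["client_ip"],
    )

def from_idp_log(raw: dict) -> NormalizedEvent:
    """Hypothetical IdP record that already uses our event verbs."""
    return NormalizedEvent(
        timestamp=datetime.fromisoformat(raw["time"]),
        source="idp",
        event_type=raw["action"],
        user=raw["actor"].lower(),
        src_ip=raw["ip"],
    )
```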
Event Correlation Techniques:
## Correlation Rule Examples
### Temporal Correlation
**Rule:** Multiple failed logins followed by successful login
**Logic:** IF failed_logins > 5 AND successful_login WITHIN 10 minutes
**Action:** Alert: Potential credential brute force
### Spatial Correlation
**Rule:** User login from impossible locations
**Logic:** IF login_location_1 AND login_location_2 WITHIN 1 hour AND distance > 500 miles
**Action:** Alert: Impossible travel detected
### Behavioral Correlation
**Rule:** Unusual data download volume
**Logic:** IF data_downloaded > (baseline_average * 3) IN 24 hours
**Action:** Alert: Potential data exfiltration
### Cross-Source Correlation
**Rule:** VPN login without endpoint check-in
**Logic:** IF vpn_login AND NO endpoint_heartbeat WITHIN 30 minutes
**Action:** Alert: Potential compromised credentials
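As a worked example, here is a minimal implementation of the temporal rule above: alert when a successful login follows more than five failures for the same user within ten minutes. The event shape is illustrative and matches the normalization sketch earlier.

```python
from collections import defaultdict, deque, namedtuple
from datetime import datetime, timedelta

Event = namedtuple("Event", "timestamp event_type user src_ip")
WINDOW = timedelta(minutes=10)
FAILURE_THRESHOLD = 5

failures = defaultdict(deque)  # user -> timestamps of recent login failures

def process(event):
    """Feed events in time order; return an alert dict or None."""
    q = failures[event.user]
    if event.event_type == "login_failure":
        q.append(event.timestamp)
    elif event.event_type == "login_success":
        # discard failures older than the window, then test the threshold
        while q and event.timestamp - q[0] > WINDOW:
            q.popleft()
        if len(q) > FAILURE_THRESHOLD:
            q.clear()
            return {"alert": "possible credential brute force",
                    "user": event.user, "src_ip": event.src_ip}
    return None

start = datetime(2024, 3, 1, 9, 0)
for i in range(6):
    process(Event(start + timedelta(seconds=i), "login_failure", "alice", "203.0.113.9"))
print(process(Event(start + timedelta(minutes=1), "login_success", "alice", "203.0.113.9")))
```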
Detection Rule Development
Rule Creation Methodology
Detection Rule Categories:
## Detection Rule Framework
### Signature-Based Rules
- **Known Bad Indicators:** IOCs, malware hashes, C&C domains
- **Attack Patterns:** Specific techniques and tactics (MITRE ATT&CK)
- **Policy Violations:** Explicit security policy breaches
- **Compliance Rules:** Regulatory requirement violations
### Anomaly-Based Rules
- **Statistical Anomalies:** Deviations from statistical baselines
- **Behavioral Anomalies:** Changes in user or system behavior
- **Volume Anomalies:** Unusual data or transaction volumes
- **Temporal Anomalies:** Activities at unusual times
### Threshold-Based Rules
- **Count Thresholds:** Number of events exceeding limits
- **Volume Thresholds:** Data transfer exceeding limits
- **Rate Thresholds:** Event frequency exceeding normal
- **Percentage Thresholds:** Error rates exceeding baselines
### Machine Learning Rules
- **Classification Models:** Categorizing events as normal/abnormal
- **Clustering Models:** Identifying outlier behaviors
- **Prediction Models:** Forecasting potential security events
- **Deep Learning:** Complex pattern recognition
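Threshold rules are usually the first ones worth writing because they are easy to reason about and tune. This sketch implements a generic sliding-window rate threshold; the limit and window are placeholders that should come from your baselines, not defaults to copy.

```python
from collections import deque

class RateThresholdRule:
    """Alert when more than `limit` matching events arrive within the window."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.times = deque()  # epoch seconds of recent matching events

    def observe(self, epoch_seconds):
        """Record one matching event; return True when the rate is exceeded."""
        self.times.append(epoch_seconds)
        while self.times and epoch_seconds - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.limit
```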
Rule Tuning and Optimization
False Positive Reduction:
## Rule Tuning Process
### Phase 1: Initial Rule Deployment (Monitor Mode)
- Deploy rule in monitoring-only mode
- Collect 2-4 weeks of alert data
- Analyze false positive patterns
- Document legitimate business activities
### Phase 2: Threshold Adjustment
- Adjust thresholds based on false positive analysis
- Add business context to rule conditions
- Create exception lists for known-good activities
- Test adjustments against historical data
### Phase 3: Production Enablement
- Enable rule for active alerting
- Monitor alert volume and accuracy
- Establish alert handling procedures
- Create feedback loop for continuous improvement
### Phase 4: Continuous Optimization
- Regular review of rule effectiveness
- Seasonal and business change adjustments
- New threat adaptation
- Baseline updates and refreshes
Rule Performance Metrics:
- Detection Rate: Percentage of actual threats detected
- False Positive Rate: Percentage of alerts that are false positives
- Alert Volume: Number of alerts generated per day/week
- Investigation Time: Average time to investigate alerts
- Rule Coverage: Percentage of attack vectors covered
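These metrics are easy to compute once analysts record a disposition for every alert. A minimal sketch, assuming each triaged alert is labeled as a true or false positive and confirmed threats are tracked in your incident records:

```python
def rule_metrics(alerts, known_threats):
    """alerts: list of dicts like {"rule": "r1", "true_positive": True}
    known_threats: count of confirmed threats in the period (from IR records)."""
    total = len(alerts)
    true_positives = sum(1 for a in alerts if a["true_positive"])
    return {
        "alert_volume": total,
        "false_positive_rate": (total - true_positives) / total if total else 0.0,
        "detection_rate": true_positives / known_threats if known_threats else None,
    }

print(rule_metrics(
    [{"rule": "r1", "true_positive": True},
     {"rule": "r1", "true_positive": False},
     {"rule": "r2", "true_positive": False}],
    known_threats=2,
))  # one of two known threats caught; two of three alerts were noise
```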
Threat Hunting Capabilities
Proactive Threat Hunting
Hunting Methodology:
## Threat Hunting Process
### 1. Hypothesis Development
- **Threat Intelligence:** Current threats targeting your industry
- **Attack Techniques:** MITRE ATT&CK framework mapping
- **Business Context:** High-value assets and attack paths
- **Historical Analysis:** Previous incidents and near-misses
### 2. Data Collection Planning
- **Data Sources:** Identify relevant log sources and datasets
- **Time Windows:** Define investigation timeframes
- **Search Criteria:** Develop initial search parameters
- **Baseline Comparison:** Plan baseline vs. current comparisons
### 3. Hunt Execution
- **Initial Searches:** Broad searches to identify patterns
- **Iterative Analysis:** Drill down into interesting findings
- **Correlation Analysis:** Connect related events across sources
- **Timeline Construction:** Build event timelines for analysis
### 4. Analysis and Validation
- **False Positive Elimination:** Filter out known-good activities
- **Threat Validation:** Confirm malicious activity indicators
- **Impact Assessment:** Understand scope and potential damage
- **Evidence Collection:** Gather evidence for response actions
### 5. Response and Improvement
- **Incident Response:** Initiate response for confirmed threats
- **Detection Rule Creation:** Create rules for future detection
- **Process Improvement:** Refine hunting procedures
- **Knowledge Sharing:** Document findings and techniques
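One widely used hunt technique is "stacking": count how often each value occurs across the fleet and investigate the rarest ones. The sketch below stacks process command lines from a hypothetical event list; a binary seen on only one or two hosts is a classic hunting lead.

```python
from collections import defaultdict

def stack_rarest(events, field="cmdline", max_hosts=2):
    """Return field values observed on `max_hosts` or fewer distinct hosts."""
    hosts_per_value = defaultdict(set)
    for event in events:
        hosts_per_value[event[field]].add(event["host"])
    rare = [v for v, hosts in hosts_per_value.items() if len(hosts) <= max_hosts]
    return sorted(rare, key=lambda v: len(hosts_per_value[v]))

events = [
    {"host": "ws-01", "cmdline": "chrome.exe"},
    {"host": "ws-02", "cmdline": "chrome.exe"},
    {"host": "ws-07", "cmdline": "powershell -enc aQBlAHgA"},  # illustrative outlier
]
print(stack_rarest(events))  # the encoded PowerShell stacks out as rare
```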
Hunt Use Cases for Startups
High-Value Hunt Scenarios:
## Startup Threat Hunting Priorities
### Insider Threat Detection
- **Unusual data access patterns**
- **After-hours administrative activities**
- **Anomalous file sharing behaviors**
- **Privilege escalation attempts**
### Advanced Persistent Threats
- **Long-term dormant accounts**
- **Slow data exfiltration patterns**
- **Lateral movement indicators**
- **Command and control communications**
### Cloud Security Hunting
- **Unusual API usage patterns**
- **Privilege escalation in cloud environments**
- **Resource creation anomalies**
- **Data storage access patterns**
### Supply Chain Attacks
- **Vendor access pattern changes**
- **Third-party application anomalies**
- **Software update irregularities**
- **Partner communication changes**
Hunting Tools and Techniques
Hunting Technology Stack:
## Threat Hunting Tool Categories
### Data Analysis Platforms
- **Splunk:** Comprehensive search and analysis
- **Elastic Stack:** Open source analytics platform
- **Microsoft Sentinel:** Cloud-native SIEM with hunting
- **Jupyter Notebooks:** Custom analysis and visualization
### Specialized Hunting Tools
- **YARA Rules:** Malware detection and classification
- **Sigma Rules:** Generic signature format for SIEM systems
- **MISP:** Threat intelligence sharing platform
- **OpenIOC:** Indicator of compromise framework
### Cloud Hunting Tools
- **AWS CloudTrail Insights:** Unusual API activity detection
- **Microsoft Sentinel Hunting:** Cloud-native hunting capabilities (formerly Azure Sentinel)
- **Google Chronicle:** Cloud security analytics platform
- **Panther:** Cloud-native security analytics
### Open Source Tools
- **osquery:** SQL-based system instrumentation
- **Velociraptor:** Advanced digital forensics platform
- **Fleet:** osquery fleet management (formerly Kolide Fleet)
- **Wazuh:** Open source security monitoring platform
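To give a feel for ad-hoc hunting with osquery, this hedged example shells out to osqueryi in JSON output mode (it assumes osquery is installed locally) and looks for processes running out of /tmp, a common hunting lead on Linux hosts.

```python
import json
import subprocess

QUERY = "SELECT pid, name, path FROM processes WHERE path LIKE '/tmp/%';"

result = subprocess.run(
    ["osqueryi", "--json", QUERY],
    capture_output=True, text=True, check=True,
)
for proc in json.loads(result.stdout):
    print(proc["pid"], proc["name"], proc["path"])
```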
Alert Management and Escalation
Alert Classification and Prioritization
Alert Severity Framework:
## Alert Classification System
### Critical Alerts (Immediate Response Required)
- **Active data breach indicators**
- **Successful privilege escalation**
- **Confirmed malware execution**
- **System compromise evidence**
### High Alerts (Response Within 4 Hours)
- **Suspicious privilege usage**
- **Unusual data access patterns**
- **Failed authentication anomalies**
- **Network intrusion attempts**
### Medium Alerts (Response Within 24 Hours)
- **Policy violations**
- **Unusual user behaviors**
- **System performance anomalies**
- **Non-critical security events**
### Low Alerts (Response Within 72 Hours)
- **Information security events**
- **System health notifications**
- **Compliance monitoring alerts**
- **Baseline deviation notifications**
### Alert Enrichment
- **Asset criticality context**
- **User risk scoring**
- **Historical alert context**
- **Business impact assessment**
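Enrichment can start as simply as a priority score: begin with the rule's severity and raise it for critical assets and risky users. The weights and field names below are illustrative assumptions, not a standard formula.

```python
SEVERITY_BASE = {"critical": 90, "high": 70, "medium": 40, "low": 10}

def priority_score(alert, asset_criticality, user_risk):
    """asset_criticality and user_risk range from 0.0 (benign) to 1.0 (highest)."""
    score = SEVERITY_BASE[alert["severity"]]
    score += 10 * asset_criticality  # bump alerts touching crown-jewel assets
    score += 10 * user_risk          # bump alerts on already-risky users
    return min(int(score), 100)

alert = {"severity": "medium", "rule": "unusual_download_volume"}
print(priority_score(alert, asset_criticality=1.0, user_risk=0.5))  # 55
```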
Escalation Procedures
Alert Response Workflow:
```mermaid
graph TD
    A[Alert Generated] --> B[Initial Triage]
    B --> C{Severity Level}
    C -->|Critical| D[Immediate Response Team]
    C -->|High| E[Security Team Lead]
    C -->|Medium| F[On-Duty Analyst]
    C -->|Low| G[Queue for Review]
    D --> H[Incident Response Process]
    E --> I[Investigation and Analysis]
    F --> I
    G --> J[Batch Processing]
    I --> K[Response Actions]
    J --> K
    K --> L[Documentation and Closure]
```
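In code, the routing step of this workflow reduces to a severity-to-owner lookup with a response-time target attached. A minimal sketch with illustrative queue names and SLAs matching the classification tiers above:

```python
ROUTING = {
    "critical": ("immediate-response-team", 15),    # respond within 15 minutes
    "high":     ("security-team-lead",      240),   # 4 hours
    "medium":   ("on-duty-analyst",         1440),  # 24 hours
    "low":      ("review-queue",            4320),  # 72 hours
}

def route(alert):
    """Attach the owning queue and SLA to a triaged alert."""
    owner, sla_minutes = ROUTING[alert["severity"]]
    return {**alert, "owner": owner, "sla_minutes": sla_minutes}
```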
Hands-On Exercise: Build Your Detection Program
Step 1: Current Detection Assessment
Existing Detection Capabilities:
- Network monitoring: [Yes/No/Partial]
- Endpoint detection: [Yes/No/Partial]
- User behavior monitoring: [Yes/No/Partial]
- Application monitoring: [Yes/No/Partial]
Detection Gaps:
- High priority gaps: _______________
- Medium priority gaps: _______________
- Future enhancement needs: _______________
Step 2: Baseline Development Plan
Critical Baselines to Establish:
- _________________ (Timeline: _____ weeks)
- _________________ (Timeline: _____ weeks)
- _________________ (Timeline: _____ weeks)
Data Sources for Baselines:
- Network traffic: _______________
- User activities: _______________
- System behaviors: _______________
- Application usage: _______________
Step 3: Detection Rule Strategy
Initial Detection Rules (Top 5 Priorities):
1. _______________
2. _______________
3. _______________
4. _______________
5. _______________
Rule Development Timeline:
- Month 1: Basic signature rules
- Month 2: Threshold-based rules
- Month 3: Behavioral anomaly rules
- Month 4: Advanced correlation rules
Step 4: Alert Management Design
Alert Volume Targets:
- Critical alerts: <___ per week
- High alerts: <___ per day
- Medium alerts: <___ per day
- Total alert volume: <___ per day
Response Time Targets:
- Critical: _____ minutes
- High: _____ hours
- Medium: _____ hours
- Low: _____ hours
Real-World Example: FinTech Startup Detection Program
Company: 35-employee digital payment platform
Challenge: Regulatory requirements, sophisticated threats, limited security team
Phase 1: Foundation (Months 1-3)
Baseline Development:
- Network traffic baselines for payment processing
- User behavior baselines for financial operations
- API usage patterns for partner integrations
- Transaction volume and timing baselines
Initial Detection Rules:
- Failed authentication anomalies
- Unusual transaction patterns
- Administrative privilege usage
- Network intrusion attempts
Results:
- 15 detection rules deployed
- 95% reduction in false positives after tuning
- Mean time to detect: 4 hours
- Alert volume: 12 actionable alerts/day
Phase 2: Enhancement (Months 4-9)
Advanced Capabilities:
- Machine learning anomaly detection
- Cross-source event correlation
- Automated alert enrichment
- Basic threat hunting program
Detection Improvements:
- User entity behavior analytics
- Financial transaction fraud detection
- Advanced persistent threat hunting
- Insider threat detection capabilities
Outcomes:
- Mean time to detect: 45 minutes
- Detected 3 serious threats before impact
- 100% regulatory compliance in detection
- 60% improvement in investigation efficiency
Phase 3: Maturation (Months 10-18)
Sophisticated Detection:
- Custom machine learning models
- Threat intelligence integration
- Automated response capabilities
- Predictive security analytics
Business Impact:
- Zero successful cyberattacks in 12 months
- $2.3M in prevented fraud losses
- Passed all regulatory security examinations
- Customer trust and retention improvements
Investment and ROI:
- Annual detection program cost: $120,000
- Prevented losses: $2,300,000+
- Regulatory compliance value: $500,000
- ROI: 2,200% annually
Key Success Factors:
- Started with business-critical baselines
- Focused on quality over quantity of alerts
- Regular tuning and optimization
- Strong integration with business processes
- Continuous improvement and learning
Key Takeaways
- Baselines Enable Detection: You can’t detect anomalies without understanding normal behavior
- Quality Over Quantity: Fewer, high-confidence alerts are better than alert floods
- Context Is Critical: Business context determines alert priority and response
- Continuous Improvement: Detection capabilities must evolve with threats and business changes
- Human + Machine: Combine automated detection with human analysis and judgment
Knowledge Check
1. What’s the most important foundation for anomaly detection?
- A) Advanced machine learning algorithms
- B) Comprehensive log collection
- C) Established behavioral baselines
- D) Real-time monitoring dashboards
2. How should startups approach alert volume management?
- A) Generate as many alerts as possible for comprehensive coverage
- B) Focus on high-confidence, actionable alerts with business context
- C) Only alert on critical events to avoid noise
- D) Outsource all alert management to security providers
3. What’s the primary goal of threat hunting for startups?
- A) Replace automated detection systems
- B) Proactively find threats that automated systems missed
- C) Demonstrate security maturity to customers
- D) Keep security team busy during quiet periods
Additional Resources
- Next Lesson: DETECT - Continuous Monitoring (DE.CM)
- Detection rule examples and templates (coming soon)
- Baseline development guides (coming soon)
- Threat hunting playbooks for startups (coming soon)
In the next lesson, we’ll explore how to establish continuous security monitoring capabilities that provide ongoing visibility into your startup’s security posture.