Learning Objectives
By the end of this lesson, you will be able to:
- Establish behavioral baselines that enable effective anomaly detection
- Implement event monitoring systems that provide visibility without overwhelming your team
- Create detection rules and alerting mechanisms that minimize false positives while catching real threats
- Build basic threat hunting capabilities that scale with your organization’s growth
- Develop escalation and response procedures for detected anomalies and events
Introduction: Seeing the Invisible
The best attackers are invisible. They use legitimate credentials, move slowly to avoid detection, and blend their malicious activities with normal business operations. Traditional security controls that focus on prevention (firewalls, antivirus, access controls) often miss these attacks entirely.
Detection is about seeing the invisible: identifying subtle patterns that indicate something isn’t quite right. It’s noticing that a user is accessing files they’ve never touched before, that network traffic patterns have changed, or that system behavior is slightly different from the established baseline.
For startups, detection presents unique challenges. You need visibility into sophisticated threats but can’t afford to drown your small team in false alarms. You need detection capabilities that work with cloud-native architectures and distributed teams. Most importantly, you need detection that actually leads to effective response—not just more data.
This lesson shows you how to build detection capabilities that provide real security value without overwhelming your startup’s limited resources.
Understanding DE.AE: Adverse Event Analysis
NIST CSF 2.0 DE.AE Outcomes
DE.AE-02: Potentially adverse events are analyzed to better understand associated activities
DE.AE-03: Information is correlated from multiple sources
DE.AE-04: The estimated impact and scope of adverse events are understood
DE.AE-06: Information on adverse events is provided to authorized staff and tools
DE.AE-07: Cyber threat intelligence and other contextual information are integrated into the analysis
DE.AE-08: Incidents are declared when adverse events meet the defined incident criteria
Note: CSF 1.1's DE.AE-01 (network baselines) and DE.AE-05 (incident alert thresholds) were folded into other outcomes in CSF 2.0, but both practices remain foundational and are covered in depth in this lesson.
Detection Philosophy for Startups
Quality Over Quantity:
- Focus on high-confidence detections that require investigation
- Prefer fewer, actionable alerts over comprehensive coverage
- Prioritize detection of business-impactful threats
- Build detection capabilities gradually and deliberately
Baseline-Driven Detection:
- Establish normal behavior patterns before detecting anomalies
- Focus on deviations that matter for your specific environment
- Use business context to prioritize detection efforts
- Continuously refine baselines as the business evolves
Automation with Human Oversight:
- Automate data collection and initial analysis
- Use human judgment for context and decision-making
- Create clear escalation paths for detected events
- Build in feedback loops to improve detection accuracy
Establishing Behavioral Baselines
Network Baseline Development
Network Traffic Analysis:
## Network Baseline Categories
### Internal Traffic Patterns
- **User-to-Server:** Normal application access patterns
- **Server-to-Server:** Inter-service communication flows
- **Administrative:** Management and maintenance traffic
- **Backup and Sync:** Scheduled data transfer activities
### External Traffic Patterns
- **Web Browsing:** User internet access behaviors
- **SaaS Applications:** Cloud service access patterns
- **Partner Communications:** B2B integrations and APIs
- **Update Services:** Software and security updates
### Traffic Characteristics
- **Volume:** Bytes transferred per time period
- **Frequency:** Connection frequency and timing
- **Protocols:** Application protocols and ports used
- **Geographic:** Source and destination locations
- **Timing:** Business hours vs. off-hours patterns
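To make these characteristics concrete, here is a minimal Python sketch of a volume baseline: it learns the mean and standard deviation of bytes transferred per host for each hour of the day, then flags traffic well above that norm. The flow-record field names are illustrative and not tied to any particular monitoring tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

def build_baseline(flows):
    """flows: iterable of dicts such as
    {"src_host": "app-01", "bytes": 123456, "timestamp": "2024-03-01T14:05:00"}
    Returns {(host, hour_of_day): (mean_bytes, stdev_bytes)}."""
    buckets = defaultdict(list)
    for flow in flows:
        hour = datetime.fromisoformat(flow["timestamp"]).hour
        buckets[(flow["src_host"], hour)].append(flow["bytes"])
    return {key: (mean(v), pstdev(v)) for key, v in buckets.items()}

def is_anomalous(baseline, host, hour, observed_bytes, sigmas=3.0):
    """Flag traffic more than `sigmas` standard deviations above normal."""
    stats = baseline.get((host, hour))
    if stats is None:
        return True  # no baseline yet: treat an unseen host/hour as notable
    avg, sd = stats
    return observed_bytes > avg + sigmas * max(sd, 1.0)  # floor avoids sd == 0
```

In practice you would rebuild the baseline on a schedule, following the methodology below, so it tracks legitimate changes in traffic.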
Baseline Collection Process:
## Network Baseline Methodology
### Week 1-2: Initial Data Collection
- Deploy network monitoring tools
- Collect raw traffic data
- Document known legitimate activities
- Identify major traffic sources and destinations
### Week 3-4: Pattern Analysis
- Analyze traffic by time of day/week
- Identify user and system behavior patterns
- Document normal vs. unusual activities
- Begin developing detection thresholds
### Week 5-6: Validation and Refinement
- Test detection rules against historical data
- Adjust thresholds to minimize false positives
- Document exceptions and special circumstances
- Create baseline documentation
### Week 7-8: Production Deployment
- Deploy refined detection rules
- Monitor for accuracy and effectiveness
- Establish baseline review and update schedule
- Train team on baseline interpretation
User Behavior Baselines
User Activity Patterns:
## User Behavior Analysis Framework
### Access Patterns
- **Login Times:** Typical work hours and locations
- **Application Usage:** Most frequently used applications
- **Data Access:** Files and systems regularly accessed
- **Administrative Actions:** Privilege usage patterns
### Communication Patterns
- **Email Behavior:** Typical recipients and timing
- **File Sharing:** Normal sharing partners and methods
- **Collaboration:** Team interaction patterns
- **External Communications:** Customer and vendor interactions
### Device and Location Patterns
- **Device Usage:** Primary devices and applications
- **Location Patterns:** Office, home, travel locations
- **Network Connections:** WiFi and VPN usage patterns
- **Time Zone Consistency:** Expected geographic locations
### Anomaly Indicators
- **Access Anomalies:** Unusual times, locations, or systems
- **Volume Anomalies:** Significantly more or less activity
- **Pattern Changes:** New behaviors without business justification
- **Privilege Anomalies:** Unusual use of administrative rights
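The same idea applies to people. This illustrative sketch learns each user's typical login hours and source countries during a baseline window, then reports deviations; the event shape and anomaly reasons are assumptions for the example.

```python
from collections import defaultdict

class UserLoginBaseline:
    def __init__(self):
        self.hours = defaultdict(set)      # user -> login hours seen in baseline
        self.countries = defaultdict(set)  # user -> source countries seen

    def learn(self, user, hour, country):
        self.hours[user].add(hour)
        self.countries[user].add(country)

    def check(self, user, hour, country):
        """Return a list of anomaly reasons (an empty list means normal)."""
        reasons = []
        if hour not in self.hours[user]:
            reasons.append(f"login at unusual hour {hour:02d}:00")
        if country not in self.countries[user]:
            reasons.append(f"first login from {country}")
        return reasons

baseline = UserLoginBaseline()
baseline.learn("alice", 9, "US")
baseline.learn("alice", 14, "US")
print(baseline.check("alice", 3, "RO"))  # flags both the hour and the country
```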
System Behavior Baselines
System Performance Baselines:
## System Baseline Metrics
### Performance Baselines
- **CPU Usage:** Normal utilization patterns
- **Memory Usage:** Typical memory consumption
- **Disk I/O:** Read/write patterns and volumes
- **Network I/O:** Bandwidth utilization patterns
### Application Baselines
- **Response Times:** Normal application performance
- **Error Rates:** Typical error frequency and types
- **User Connections:** Normal concurrent user counts
- **Transaction Volumes:** Typical business transaction patterns
### Security Event Baselines
- **Authentication Events:** Normal login patterns and failures
- **File Access:** Typical file system activity
- **Network Connections:** Normal network communication
- **Process Activity:** Standard process execution patterns
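System metrics drift over time, so a rolling baseline often works better than a fixed one. Here is a minimal sketch using an exponentially weighted moving average (EWMA) and variance; the alpha and the three-sigma threshold are illustrative starting points, not tuned values.

```python
class EwmaBaseline:
    """Rolling baseline for a single metric (e.g., CPU utilization %)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest sample
        self.avg = None     # exponentially weighted moving average
        self.var = 0.0      # exponentially weighted moving variance

    def update(self, value):
        """Feed one sample; return True if it deviates > 3 sigma from the EWMA."""
        if self.avg is None:
            self.avg = value
            return False
        diff = value - self.avg
        anomalous = self.var > 0 and abs(diff) > 3 * self.var ** 0.5
        # update the running estimates after the check
        self.avg += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous

cpu = EwmaBaseline()
for sample in [20, 22, 19, 21, 20, 95]:  # steady ~20%, then a spike
    if cpu.update(sample):
        print("anomalous CPU sample:", sample)  # fires on 95
```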
Event Collection and Correlation
Multi-Source Event Collection
Event Source Categories:
## Comprehensive Event Source Inventory
### Infrastructure Events
- **Network Devices:** Routers, switches, firewalls, access points
- **Servers:** Windows, Linux, cloud instances
- **Storage Systems:** File servers, databases, backup systems
- **Virtualization:** Hypervisors, container platforms
### Application Events
- **Web Applications:** Access logs, error logs, transaction logs
- **Business Applications:** CRM, ERP, custom applications
- **Communication:** Email, messaging, collaboration platforms
- **Development Tools:** Version control, CI/CD, deployment systems
### Security Tool Events
- **Endpoint Protection:** Antivirus, EDR, mobile device management
- **Network Security:** IDS/IPS, web filters, email security
- **Identity Systems:** Authentication servers, privilege management
- **Cloud Security:** CSPM, CASB, cloud provider logs
### User Activity Events
- **Workstation Activity:** Login, application usage, file access
- **Mobile Device Activity:** App usage, location, security events
- **SaaS Application Usage:** Cloud application access and activities
- **Physical Access:** Badge readers, security cameras, visitor logs
Event Normalization and Correlation
Event Processing Pipeline:
```mermaid
graph TD
    A[Raw Events] --> B[Collection Agents]
    B --> C[Normalization Engine]
    C --> D[Enrichment Engine]
    D --> E[Correlation Engine]
    E --> F[Alert Generation]
    F --> G[Incident Creation]
```
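Normalization is the step that makes correlation possible, so it is worth a concrete illustration. The sketch below maps raw events from two hypothetical sources (a VPN concentrator and an identity provider) into one common schema; every field name here is an assumption for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class NormalizedEvent:
    timestamp: datetime
    source: str      # originating sensor, e.g. "vpn" or "idp"
    event_type: str  # normalized verb, e.g. "login_failure"
    user: str
    src_ip: str

def from_vpn_log(raw: dict) -> NormalizedEvent:
    """Hypothetical VPN record: {"epoch": ..., "result": "DENY", ...}"""
    return NormalizedEvent(
        timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        source="vpn",
        event_type="login_failure" if raw["result"] == "DENY" else "login_success",
        user=raw["username"].lower(),  # lowercase so sources agree on identity
        src_ip=raw["client_ip"],
    )

def from_idp_log(raw: dict) -> NormalizedEvent:
    """Hypothetical IdP record that already uses our event verbs."""
    return NormalizedEvent(
        timestamp=datetime.fromisoformat(raw["time"]),
        source="idp",
        event_type=raw["action"],
        user=raw["actor"].lower(),
        src_ip=raw["ip"],
    )
```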
Event Correlation Techniques:
## Correlation Rule Examples
### Temporal Correlation
**Rule:** Multiple failed logins followed by successful login
**Logic:** IF failed_logins > 5 AND successful_login WITHIN 10 minutes
**Action:** Alert: Potential credential brute force
### Spatial Correlation
**Rule:** User login from impossible locations
**Logic:** IF login_location_1 AND login_location_2 WITHIN 1 hour AND distance > 500 miles
**Action:** Alert: Impossible travel detected
### Behavioral Correlation
**Rule:** Unusual data download volume
**Logic:** IF data_downloaded > (baseline_average * 3) IN 24 hours
**Action:** Alert: Potential data exfiltration
### Cross-Source Correlation
**Rule:** VPN login without endpoint check-in
**Logic:** IF vpn_login AND NO endpoint_heartbeat WITHIN 30 minutes
**Action:** Alert: Potential compromised credentials
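As a worked example, here is a minimal implementation of the temporal rule above: alert when a successful login follows more than five failures for the same user within ten minutes. The event shape is illustrative and matches the normalization sketch earlier.

```python
from collections import defaultdict, deque, namedtuple
from datetime import datetime, timedelta

Event = namedtuple("Event", "timestamp event_type user src_ip")
WINDOW = timedelta(minutes=10)
FAILURE_THRESHOLD = 5

failures = defaultdict(deque)  # user -> timestamps of recent login failures

def process(event):
    """Feed events in time order; return an alert dict or None."""
    q = failures[event.user]
    if event.event_type == "login_failure":
        q.append(event.timestamp)
    elif event.event_type == "login_success":
        # discard failures older than the window, then test the threshold
        while q and event.timestamp - q[0] > WINDOW:
            q.popleft()
        if len(q) > FAILURE_THRESHOLD:
            q.clear()
            return {"alert": "possible credential brute force",
                    "user": event.user, "src_ip": event.src_ip}
    return None

start = datetime(2024, 3, 1, 9, 0)
for i in range(6):
    process(Event(start + timedelta(seconds=i), "login_failure", "alice", "203.0.113.9"))
print(process(Event(start + timedelta(minutes=1), "login_success", "alice", "203.0.113.9")))
```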
Detection Rule Development
Rule Creation Methodology
Detection Rule Categories:
## Detection Rule Framework
### Signature-Based Rules
- **Known Bad Indicators:** IOCs, malware hashes, C&C domains
- **Attack Patterns:** Specific techniques and tactics (MITRE ATT&CK)
- **Policy Violations:** Explicit security policy breaches
- **Compliance Rules:** Regulatory requirement violations
### Anomaly-Based Rules
- **Statistical Anomalies:** Deviations from statistical baselines
- **Behavioral Anomalies:** Changes in user or system behavior
- **Volume Anomalies:** Unusual data or transaction volumes
- **Temporal Anomalies:** Activities at unusual times
### Threshold-Based Rules
- **Count Thresholds:** Number of events exceeding limits
- **Volume Thresholds:** Data transfer exceeding limits
- **Rate Thresholds:** Event frequency exceeding normal
- **Percentage Thresholds:** Error rates exceeding baselines
### Machine Learning Rules
- **Classification Models:** Categorizing events as normal/abnormal
- **Clustering Models:** Identifying outlier behaviors
- **Prediction Models:** Forecasting potential security events
- **Deep Learning:** Complex pattern recognition
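Threshold rules are usually the first ones worth writing because they are easy to reason about and tune. This sketch implements a generic sliding-window rate threshold; the limit and window are placeholders that should come from your baselines, not defaults to copy.

```python
from collections import deque

class RateThresholdRule:
    """Alert when more than `limit` matching events arrive within the window."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.times = deque()  # epoch seconds of recent matching events

    def observe(self, epoch_seconds):
        """Record one matching event; return True when the rate is exceeded."""
        self.times.append(epoch_seconds)
        while self.times and epoch_seconds - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.limit
```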
Rule Tuning and Optimization
False Positive Reduction:
## Rule Tuning Process
### Phase 1: Initial Rule Deployment (Monitor Mode)
- Deploy rule in monitoring-only mode
- Collect 2-4 weeks of alert data
- Analyze false positive patterns
- Document legitimate business activities
### Phase 2: Threshold Adjustment
- Adjust thresholds based on false positive analysis
- Add business context to rule conditions
- Create exception lists for known-good activities
- Test adjustments against historical data
### Phase 3: Production Enablement
- Enable rule for active alerting
- Monitor alert volume and accuracy
- Establish alert handling procedures
- Create feedback loop for continuous improvement
### Phase 4: Continuous Optimization
- Regular review of rule effectiveness
- Seasonal and business change adjustments
- New threat adaptation
- Baseline updates and refreshes
Rule Performance Metrics:
- Detection Rate: Percentage of actual threats detected
- False Positive Rate: Percentage of alerts that are false positives
- Alert Volume: Number of alerts generated per day/week
- Investigation Time: Average time to investigate alerts
- Rule Coverage: Percentage of attack vectors covered
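These metrics are easy to compute once analysts record a disposition for every alert. A minimal sketch, assuming each triaged alert is labeled as a true or false positive and confirmed threats are tracked in your incident records:

```python
def rule_metrics(alerts, known_threats):
    """alerts: list of dicts like {"rule": "r1", "true_positive": True}
    known_threats: count of confirmed threats in the period (from IR records)."""
    total = len(alerts)
    true_positives = sum(1 for a in alerts if a["true_positive"])
    return {
        "alert_volume": total,
        "false_positive_rate": (total - true_positives) / total if total else 0.0,
        "detection_rate": true_positives / known_threats if known_threats else None,
    }

print(rule_metrics(
    [{"rule": "r1", "true_positive": True},
     {"rule": "r1", "true_positive": False},
     {"rule": "r2", "true_positive": False}],
    known_threats=2,
))  # one of two known threats caught; two of three alerts were noise
```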
Threat Hunting Capabilities
Proactive Threat Hunting
Hunting Methodology:
## Threat Hunting Process
### 1. Hypothesis Development
- **Threat Intelligence:** Current threats targeting your industry
- **Attack Techniques:** MITRE ATT&CK framework mapping
- **Business Context:** High-value assets and attack paths
- **Historical Analysis:** Previous incidents and near-misses
### 2. Data Collection Planning
- **Data Sources:** Identify relevant log sources and datasets
- **Time Windows:** Define investigation timeframes
- **Search Criteria:** Develop initial search parameters
- **Baseline Comparison:** Plan baseline vs. current comparisons
### 3. Hunt Execution
- **Initial Searches:** Broad searches to identify patterns
- **Iterative Analysis:** Drill down into interesting findings
- **Correlation Analysis:** Connect related events across sources
- **Timeline Construction:** Build event timelines for analysis
### 4. Analysis and Validation
- **False Positive Elimination:** Filter out known-good activities
- **Threat Validation:** Confirm malicious activity indicators
- **Impact Assessment:** Understand scope and potential damage
- **Evidence Collection:** Gather evidence for response actions
### 5. Response and Improvement
- **Incident Response:** Initiate response for confirmed threats
- **Detection Rule Creation:** Create rules for future detection
- **Process Improvement:** Refine hunting procedures
- **Knowledge Sharing:** Document findings and techniques
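One widely used hunt technique is "stacking": count how often each value occurs across the fleet and investigate the rarest ones. The sketch below stacks process command lines from a hypothetical event list; a binary seen on only one or two hosts is a classic hunting lead.

```python
from collections import defaultdict

def stack_rarest(events, field="cmdline", max_hosts=2):
    """Return field values observed on `max_hosts` or fewer distinct hosts."""
    hosts_per_value = defaultdict(set)
    for event in events:
        hosts_per_value[event[field]].add(event["host"])
    rare = [v for v, hosts in hosts_per_value.items() if len(hosts) <= max_hosts]
    return sorted(rare, key=lambda v: len(hosts_per_value[v]))

events = [
    {"host": "ws-01", "cmdline": "chrome.exe"},
    {"host": "ws-02", "cmdline": "chrome.exe"},
    {"host": "ws-07", "cmdline": "powershell -enc aQBlAHgA"},  # illustrative outlier
]
print(stack_rarest(events))  # the encoded PowerShell stacks out as rare
```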
Hunt Use Cases for Startups
High-Value Hunt Scenarios:
## Startup Threat Hunting Priorities
### Insider Threat Detection
- **Unusual data access patterns**
- **After-hours administrative activities**
- **Anomalous file sharing behaviors**
- **Privilege escalation attempts**
### Advanced Persistent Threats
- **Long-term dormant accounts**
- **Slow data exfiltration patterns**
- **Lateral movement indicators**
- **Command and control communications**
### Cloud Security Hunting
- **Unusual API usage patterns**
- **Privilege escalation in cloud environments**
- **Resource creation anomalies**
- **Data storage access patterns**
### Supply Chain Attacks
- **Vendor access pattern changes**
- **Third-party application anomalies**
- **Software update irregularities**
- **Partner communication changes**
Hunting Tools and Techniques
Hunting Technology Stack:
## Threat Hunting Tool Categories
### Data Analysis Platforms
- **Splunk:** Comprehensive search and analysis
- **Elastic Stack:** Open source analytics platform
- **Microsoft Sentinel:** Cloud-native SIEM with hunting
- **Jupyter Notebooks:** Custom analysis and visualization
### Specialized Hunting Tools
- **YARA Rules:** Malware detection and classification
- **Sigma Rules:** Generic signature format for SIEM systems
- **MISP:** Threat intelligence sharing platform
- **OpenIOC:** Indicator of compromise framework
### Cloud Hunting Tools
- **AWS CloudTrail Insights:** Unusual API activity detection
- **Microsoft Sentinel Hunting:** Cloud-native hunting capabilities (formerly Azure Sentinel)
- **Google Chronicle:** Cloud security analytics platform
- **Panther:** Cloud-native security analytics
### Open Source Tools
- **osquery:** SQL-based system instrumentation
- **Velociraptor:** Advanced digital forensics platform
- **Fleet:** osquery fleet management (formerly Kolide Fleet)
- **Wazuh:** Open source security monitoring platform
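To give a feel for ad-hoc hunting with osquery, this hedged example shells out to osqueryi in JSON output mode (it assumes osquery is installed locally) and looks for processes running out of /tmp, a common hunting lead on Linux hosts.

```python
import json
import subprocess

QUERY = "SELECT pid, name, path FROM processes WHERE path LIKE '/tmp/%';"

result = subprocess.run(
    ["osqueryi", "--json", QUERY],
    capture_output=True, text=True, check=True,
)
for proc in json.loads(result.stdout):
    print(proc["pid"], proc["name"], proc["path"])
```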
Alert Management and Escalation
Alert Classification and Prioritization
Alert Severity Framework:
## Alert Classification System
### Critical Alerts (Immediate Response Required)
- **Active data breach indicators**
- **Successful privilege escalation**
- **Confirmed malware execution**
- **System compromise evidence**
### High Alerts (Response Within 4 Hours)
- **Suspicious privilege usage**
- **Unusual data access patterns**
- **Failed authentication anomalies**
- **Network intrusion attempts**
### Medium Alerts (Response Within 24 Hours)
- **Policy violations**
- **Unusual user behaviors**
- **System performance anomalies**
- **Non-critical security events**
### Low Alerts (Response Within 72 Hours)
- **Information security events**
- **System health notifications**
- **Compliance monitoring alerts**
- **Baseline deviation notifications**
### Alert Enrichment
- **Asset criticality context**
- **User risk scoring**
- **Historical alert context**
- **Business impact assessment**
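Enrichment can start as simply as a priority score: begin with the rule's severity and raise it for critical assets and risky users. The weights and field names below are illustrative assumptions, not a standard formula.

```python
SEVERITY_BASE = {"critical": 90, "high": 70, "medium": 40, "low": 10}

def priority_score(alert, asset_criticality, user_risk):
    """asset_criticality and user_risk range from 0.0 (benign) to 1.0 (highest)."""
    score = SEVERITY_BASE[alert["severity"]]
    score += 10 * asset_criticality  # bump alerts touching crown-jewel assets
    score += 10 * user_risk          # bump alerts on already-risky users
    return min(int(score), 100)

alert = {"severity": "medium", "rule": "unusual_download_volume"}
print(priority_score(alert, asset_criticality=1.0, user_risk=0.5))  # 55
```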
Escalation Procedures
Alert Response Workflow:
```mermaid
graph TD
    A[Alert Generated] --> B[Initial Triage]
    B --> C{Severity Level}
    C -->|Critical| D[Immediate Response Team]
    C -->|High| E[Security Team Lead]
    C -->|Medium| F[On-Duty Analyst]
    C -->|Low| G[Queue for Review]
    D --> H[Incident Response Process]
    E --> I[Investigation and Analysis]
    F --> I
    G --> J[Batch Processing]
    I --> K[Response Actions]
    J --> K
    K --> L[Documentation and Closure]
```
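In code, the routing step of this workflow reduces to a severity-to-owner lookup with a response-time target attached. A minimal sketch with illustrative queue names and SLAs matching the classification tiers above:

```python
ROUTING = {
    "critical": ("immediate-response-team", 15),    # respond within 15 minutes
    "high":     ("security-team-lead",      240),   # 4 hours
    "medium":   ("on-duty-analyst",         1440),  # 24 hours
    "low":      ("review-queue",            4320),  # 72 hours
}

def route(alert):
    """Attach the owning queue and SLA to a triaged alert."""
    owner, sla_minutes = ROUTING[alert["severity"]]
    return {**alert, "owner": owner, "sla_minutes": sla_minutes}
```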
Hands-On Exercise: Build Your Detection Program
Step 1: Current Detection Assessment
Existing Detection Capabilities:
- Network monitoring: [Yes/No/Partial]
- Endpoint detection: [Yes/No/Partial]
- User behavior monitoring: [Yes/No/Partial]
- Application monitoring: [Yes/No/Partial]
Detection Gaps:
- High priority gaps: _______________
- Medium priority gaps: _______________
- Future enhancement needs: _______________
Step 2: Baseline Development Plan
Critical Baselines to Establish:
- _________________ (Timeline: _____ weeks)
- _________________ (Timeline: _____ weeks)
- _________________ (Timeline: _____ weeks)
Data Sources for Baselines:
- Network traffic: _______________
- User activities: _______________
- System behaviors: _______________
- Application usage: _______________
Step 3: Detection Rule Strategy
Initial Detection Rules (Top 5 Priorities):
1. _______________
2. _______________
3. _______________
4. _______________
5. _______________
Rule Development Timeline:
- Month 1: Basic signature rules
- Month 2: Threshold-based rules
- Month 3: Behavioral anomaly rules
- Month 4: Advanced correlation rules
Step 4: Alert Management Design
Alert Volume Targets:
- Critical alerts: <___ per week
- High alerts: <___ per day
- Medium alerts: <___ per day
- Total alert volume: <___ per day
Response Time Targets:
- Critical: _____ minutes
- High: _____ hours
- Medium: _____ hours
- Low: _____ hours
Real-World Example: FinTech Startup Detection Program
Company: 35-employee digital payment platform
Challenge: Regulatory requirements, sophisticated threats, limited security team
Phase 1: Foundation (Months 1-3)
Baseline Development:
- Network traffic baselines for payment processing
- User behavior baselines for financial operations
- API usage patterns for partner integrations
- Transaction volume and timing baselines
Initial Detection Rules:
- Failed authentication anomalies
- Unusual transaction patterns
- Administrative privilege usage
- Network intrusion attempts
Results:
- 15 detection rules deployed
- 95% reduction in false positives after tuning
- Mean time to detect: 4 hours
- Alert volume: 12 actionable alerts/day
Phase 2: Enhancement (Months 4-9)
Advanced Capabilities:
- Machine learning anomaly detection
- Cross-source event correlation
- Automated alert enrichment
- Basic threat hunting program
Detection Improvements:
- User entity behavior analytics
- Financial transaction fraud detection
- Advanced persistent threat hunting
- Insider threat detection capabilities
Outcomes:
- Mean time to detect: 45 minutes
- Detected 3 serious threats before impact
- 100% regulatory compliance in detection
- 60% improvement in investigation efficiency
Phase 3: Maturation (Months 10-18)
Sophisticated Detection:
- Custom machine learning models
- Threat intelligence integration
- Automated response capabilities
- Predictive security analytics
Business Impact:
- Zero successful cyberattacks in 12 months
- $2.3M in prevented fraud losses
- Passed all regulatory security examinations
- Customer trust and retention improvements
Investment and ROI:
- Annual detection program cost: $120,000
- Prevented losses: $2,300,000+
- Regulatory compliance value: $500,000
- ROI: 2,200% annually
Key Success Factors:
- Started with business-critical baselines
- Focused on quality over quantity of alerts
- Regular tuning and optimization
- Strong integration with business processes
- Continuous improvement and learning
Key Takeaways
- Baselines Enable Detection: You can’t detect anomalies without understanding normal behavior
- Quality Over Quantity: Fewer, high-confidence alerts are better than alert floods
- Context Is Critical: Business context determines alert priority and response
- Continuous Improvement: Detection capabilities must evolve with threats and business changes
- Human + Machine: Combine automated detection with human analysis and judgment
Knowledge Check
1. What’s the most important foundation for anomaly detection?
- A) Advanced machine learning algorithms
- B) Comprehensive log collection
- C) Established behavioral baselines
- D) Real-time monitoring dashboards
2. How should startups approach alert volume management?
- A) Generate as many alerts as possible for comprehensive coverage
- B) Focus on high-confidence, actionable alerts with business context
- C) Only alert on critical events to avoid noise
- D) Outsource all alert management to security providers
3. What’s the primary goal of threat hunting for startups?
- A) Replace automated detection systems
- B) Proactively find threats that automated systems missed
- C) Demonstrate security maturity to customers
- D) Keep security team busy during quiet periods
Additional Resources
- Next Lesson: DETECT - Continuous Monitoring (DE.CM)
- Detection rule examples and templates (coming soon)
- Baseline development guides (coming soon)
- Threat hunting playbooks for startups (coming soon)
In the next lesson, we’ll explore how to establish continuous security monitoring capabilities that provide ongoing visibility into your startup’s security posture.