Data Lake: KQL Jobs & Summary Rules
Not all your data sits in the hot tier. Learn how to query archived data with KQL search jobs and create Summary rule tables for efficient long-term hunting.
Why hunt in archived data?
Sometimes the evidence of an attack is 6 months old. The attacker got in quietly, hid for months, and only now have you discovered them. Your recent data shows the damage, but the initial breach? That is buried in archived logs.
The Data lake tier stores historical data cheaply — but it is slower to query than the Analytics tier. You cannot just run a quick KQL query against months of archived data the same way you query recent events.
Instead, you use KQL search jobs (asynchronous queries that run in the background) and Summary rule tables (pre-aggregated data that makes long-term hunting efficient). Together, they let you hunt across months or years of data without breaking the bank.
KQL search jobs
How search jobs work
- Write a KQL query for the data you need (just like a normal query)
- Submit as a search job — the query runs asynchronously in the background
- Results stored in a new table: {OriginalTable}_SRCH (e.g., SecurityEvent_SRCH)
- Query the results table — fast interactive queries against the pre-computed results
- Results retained for the configured period (default 30 days)
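The lifecycle above can be sketched as a toy Python simulation (this is not the Sentinel API — the in-memory workspace and `submit_search_job` helper are invented for illustration): a query runs in the background, and its results land in a small `{table}_SRCH` results table that is then cheap to query interactively.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory "workspace": table name -> list of row dicts.
# Illustrative only; real search jobs are submitted via the Azure
# portal or the Log Analytics API, not this hypothetical helper.
workspace = {
    "SecurityEvent": [
        {"EventID": 4625, "TargetUserName": "admin", "IpAddress": "198.51.100.7"},
        {"EventID": 4624, "TargetUserName": "admin", "IpAddress": "198.51.100.7"},
        {"EventID": 4625, "TargetUserName": "svc-db", "IpAddress": "203.0.113.5"},
    ],
}

executor = ThreadPoolExecutor()

def submit_search_job(table: str, predicate) -> str:
    """Run the query in the background; store results in {table}_SRCH."""
    results_table = f"{table}_SRCH"

    def run():
        workspace[results_table] = [r for r in workspace[table] if predicate(r)]

    # Block on the future only so this demo is deterministic; a real
    # search job keeps running while you do other work.
    executor.submit(run).result()
    return results_table

# "Submit" a search job for failed logons (EventID 4625) by admin.
srch = submit_search_job(
    "SecurityEvent",
    lambda r: r["EventID"] == 4625 and r["TargetUserName"] == "admin",
)

# Interactive queries now hit the small results table, not the raw data.
print(srch, workspace[srch])
```

The point of the pattern: the expensive scan happens once, asynchronously; every follow-up question is answered from the compact results table.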
When to use search jobs
| Scenario | Why Search Jobs |
|---|---|
| Historical breach investigation | Query 6+ months of sign-in logs for a compromised account |
| Threat intelligence retrohunt | Check if a newly discovered IOC appeared in historical data |
| Compliance audit | Search archived data for policy violations |
| Long-range pattern analysis | Identify slow-and-low attacks across months of data |
Search job example
SecurityEvent
| where TimeGenerated between (ago(180d) .. ago(90d))
| where EventID == 4625 // Failed logon
| where TargetUserName == "admin"
| summarize FailedCount = count() by SourceIP = IpAddress, bin(TimeGenerated, 1d)
Submit this as a search job → results appear in SecurityEvent_SRCH → query the results table interactively.
Scenario: Tyler retrohunts a new IOC
A threat intelligence report reveals that a C2 IP address (203.0.113.99) was used in attacks targeting tech companies 4 months ago. Tyler at CipherStack needs to check if any device connected to this IP during that period.
The data is in the Data lake tier (Tyler moved network logs older than 90 days there to save costs).
Tyler submits a search job:
DeviceNetworkEvents
| where TimeGenerated between (ago(180d) .. ago(90d))
| where RemoteIP == "203.0.113.99"
| project TimeGenerated, DeviceName, LocalIP, RemoteIP, RemotePort
The search job runs for 15 minutes, then results appear in DeviceNetworkEvents_SRCH. Tyler queries the results and finds 3 connections from a developer workstation 4 months ago. Investigation begins.
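The filtering semantics of Tyler's retrohunt — a time window of 180 to 90 days ago plus an exact IOC match — can be sketched in plain Python (the sample rows and device names below are hypothetical):

```python
from datetime import datetime, timedelta

IOC_IP = "203.0.113.99"
now = datetime(2025, 6, 1)  # fixed "now" so the example is deterministic

# Hypothetical archived DeviceNetworkEvents rows.
events = [
    {"TimeGenerated": now - timedelta(days=120), "DeviceName": "dev-ws-01",
     "RemoteIP": "203.0.113.99", "RemotePort": 443},   # in window, IOC match
    {"TimeGenerated": now - timedelta(days=30), "DeviceName": "dev-ws-02",
     "RemoteIP": "203.0.113.99", "RemotePort": 443},   # too recent
    {"TimeGenerated": now - timedelta(days=100), "DeviceName": "dev-ws-03",
     "RemoteIP": "8.8.8.8", "RemotePort": 53},         # different IP
]

# Equivalent of:
#   | where TimeGenerated between (ago(180d) .. ago(90d))
#   | where RemoteIP == "203.0.113.99"
hits = [
    e for e in events
    if now - timedelta(days=180) <= e["TimeGenerated"] <= now - timedelta(days=90)
    and e["RemoteIP"] == IOC_IP
]
print(hits)  # only dev-ws-01 matches both the window and the IOC
```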
Summary rule tables
What are summary rules?
Summary rules are scheduled KQL queries that aggregate data into compact summary tables on a recurring basis (e.g., hourly or daily). They pre-compute common hunting and reporting patterns.
Why use summary rules?
| Problem | Summary Rule Solution |
|---|---|
| Querying raw data is slow and expensive | Summary tables are small and fast to query |
| Daily/weekly reports need the same aggregation | Pre-compute once, query many times |
| Hunting patterns repeat | Summarise common patterns (failed logins per user, network connections per device) for instant lookup |
| Data lake queries time out | Summary rules run against Analytics tier before data ages out, storing results in a custom table with configurable retention |
How to create a summary rule
- Write the aggregation query — what data to summarise
- Set the schedule — how often to run (hourly, daily)
- Define the destination table — where to store results (custom table)
- Configure retention — how long to keep summarised data
Example: daily failed login summary
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != "0" // Failed sign-ins
| summarize
FailedCount = count(),
UniqueIPs = dcount(IPAddress),
Countries = make_set(LocationDetails.countryOrRegion)
by UserPrincipalName
| where FailedCount > 5
This runs daily and stores results in a summary table. Hunting against the summary is instant — no need to scan millions of raw SigninLogs rows.
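What the summarize step computes can be modelled in Python — group failed sign-ins by user, count them, count distinct IPs, collect countries, then keep only users above the threshold. The sample rows and accounts are invented for illustration; in Sentinel this aggregation is the KQL above, run on a schedule:

```python
from collections import defaultdict

# Hypothetical failed sign-in rows (already filtered to ResultType != "0").
failed = (
    [{"UserPrincipalName": "alice@cipherstack.com",
      "IPAddress": f"203.0.113.{i}", "Country": "US"} for i in range(8)]
    + [{"UserPrincipalName": "bob@cipherstack.com",
        "IPAddress": "198.51.100.1", "Country": "DE"}] * 3
)

# Equivalent of: summarize count(), dcount(IPAddress), make_set(Country) by UPN
groups = defaultdict(lambda: {"FailedCount": 0, "IPs": set(), "Countries": set()})
for row in failed:
    g = groups[row["UserPrincipalName"]]
    g["FailedCount"] += 1
    g["IPs"].add(row["IPAddress"])
    g["Countries"].add(row["Country"])

# Equivalent of: | where FailedCount > 5 — bob's 3 failures drop out
summary = {
    upn: {"FailedCount": g["FailedCount"],
          "UniqueIPs": len(g["IPs"]),
          "Countries": sorted(g["Countries"])}
    for upn, g in groups.items() if g["FailedCount"] > 5
}
print(summary)
```

The summary dict here plays the role of the custom summary table: a handful of rows per day instead of millions, which is why lookups against it are effectively instant.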
| Feature | KQL Search Jobs | Summary Rule Tables |
|---|---|---|
| Purpose | Query historical/archived data on demand | Pre-aggregate data on a schedule for fast lookups |
| Execution | Asynchronous (background) | Scheduled (recurring) |
| Data source | Any table including Data lake | Analytics tier tables (before data ages out) |
| Results storage | _SRCH table (temporary, 30 days default) | Custom summary table (retention configurable, not permanent) |
| Best for | One-time historical investigation, retrohunting | Ongoing aggregation, reporting, fast hunting lookups |
| Cost impact | Pay per query execution | Pay once for the scheduled run; queries against summary are cheap |
Tyler discovers a new IOC that was active 4 months ago. The relevant network data has been moved to the Data lake tier. How should he search for this IOC in historical data?
🎬 Video coming soon
Next up: The final module — hunting with Notebooks and connecting to the Sentinel MCP Server for advanced investigation workflows.