Data Lake: KQL Jobs & Summary Rules
Not all your data sits in the hot tier. Learn how to query archived data with KQL search jobs and create Summary rule tables for efficient long-term hunting.
Why hunt in archived data?
Sometimes the evidence of an attack is 6 months old. The attacker got in quietly, hid for months, and only now have you discovered them. Your recent data shows the damage, but the initial breach? That is buried in archived logs.
The Data lake tier stores historical data cheaply — but it is slower to query than the Analytics tier. You cannot just run a quick KQL query against months of archived data the same way you query recent events.
Instead, you use KQL search jobs (asynchronous queries that run in the background) and Summary rule tables (pre-aggregated data that makes long-term hunting efficient). Together, they let you hunt across months or years of data without breaking the bank.
KQL search jobs
How search jobs work
- Write a KQL query for the data you need (just like a normal query)
- Submit as a search job — the query runs asynchronously in the background
- Results stored in a new table: {OriginalTable}_SRCH (e.g., SecurityEvent_SRCH)
- Query the results table — fast interactive queries against the pre-computed results
- Results retained for the configured period (default 30 days)
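The lifecycle above can be sketched as a toy Python simulation (this is not the Sentinel API — the in-memory workspace and `submit_search_job` helper are invented for illustration): a query runs in the background, and its results land in a small `{table}_SRCH` results table that is then cheap to query interactively.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory "workspace": table name -> list of row dicts.
# Illustrative only; real search jobs are submitted via the Azure
# portal or the Log Analytics API, not this hypothetical helper.
workspace = {
    "SecurityEvent": [
        {"EventID": 4625, "TargetUserName": "admin", "IpAddress": "198.51.100.7"},
        {"EventID": 4624, "TargetUserName": "admin", "IpAddress": "198.51.100.7"},
        {"EventID": 4625, "TargetUserName": "svc-db", "IpAddress": "203.0.113.5"},
    ],
}

executor = ThreadPoolExecutor()

def submit_search_job(table: str, predicate) -> str:
    """Run the query in the background; store results in {table}_SRCH."""
    results_table = f"{table}_SRCH"

    def run():
        workspace[results_table] = [r for r in workspace[table] if predicate(r)]

    # Block on the future only so this demo is deterministic; a real
    # search job keeps running while you do other work.
    executor.submit(run).result()
    return results_table

# "Submit" a search job for failed logons (EventID 4625) by admin.
srch = submit_search_job(
    "SecurityEvent",
    lambda r: r["EventID"] == 4625 and r["TargetUserName"] == "admin",
)

# Interactive queries now hit the small results table, not the raw data.
print(srch, workspace[srch])
```

The point of the pattern: the expensive scan happens once, asynchronously; every follow-up question is answered from the compact results table.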
When to use search jobs
| Scenario | Why Search Jobs |
|---|---|
| Historical breach investigation | Query 6+ months of sign-in logs for a compromised account |
| Threat intelligence retrohunt | Check if a newly discovered IOC appeared in historical data |
| Compliance audit | Search archived data for policy violations |
| Long-range pattern analysis | Identify slow-and-low attacks across months of data |
Search job example
SecurityEvent
| where TimeGenerated between (ago(180d) .. ago(90d))
| where EventID == 4625 // Failed logon
| where TargetUserName == "admin"
| summarize FailedCount = count() by SourceIP = IpAddress, bin(TimeGenerated, 1d)
Submit this as a search job → results appear in SecurityEvent_SRCH → query the results table interactively.
Scenario: Tyler retrohunts a new IOC
A threat intelligence report reveals that a C2 IP address (203.0.113.99) was used in attacks targeting tech companies 4 months ago. Tyler at CipherStack needs to check if any device connected to this IP during that period.
The data is in the Data lake tier (Tyler moved network logs older than 90 days there to save costs).
Tyler submits a search job:
DeviceNetworkEvents
| where TimeGenerated between (ago(180d) .. ago(90d))
| where RemoteIP == "203.0.113.99"
| project TimeGenerated, DeviceName, LocalIP, RemoteIP, RemotePort
The search job runs for 15 minutes, then results appear in DeviceNetworkEvents_SRCH. Tyler queries the results and finds 3 connections from a developer workstation 4 months ago. Investigation begins.
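The filtering semantics of Tyler's retrohunt — a time window of 180 to 90 days ago plus an exact IOC match — can be sketched in plain Python (the sample rows and device names below are hypothetical):

```python
from datetime import datetime, timedelta

IOC_IP = "203.0.113.99"
now = datetime(2025, 6, 1)  # fixed "now" so the example is deterministic

# Hypothetical archived DeviceNetworkEvents rows.
events = [
    {"TimeGenerated": now - timedelta(days=120), "DeviceName": "dev-ws-01",
     "RemoteIP": "203.0.113.99", "RemotePort": 443},   # in window, IOC match
    {"TimeGenerated": now - timedelta(days=30), "DeviceName": "dev-ws-02",
     "RemoteIP": "203.0.113.99", "RemotePort": 443},   # too recent
    {"TimeGenerated": now - timedelta(days=100), "DeviceName": "dev-ws-03",
     "RemoteIP": "8.8.8.8", "RemotePort": 53},         # different IP
]

# Equivalent of:
#   | where TimeGenerated between (ago(180d) .. ago(90d))
#   | where RemoteIP == "203.0.113.99"
hits = [
    e for e in events
    if now - timedelta(days=180) <= e["TimeGenerated"] <= now - timedelta(days=90)
    and e["RemoteIP"] == IOC_IP
]
print(hits)  # only dev-ws-01 matches both the window and the IOC
```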
Summary rule tables
What are summary rules?
Summary rules are scheduled KQL queries that aggregate data into compact summary tables on a recurring basis (e.g., hourly or daily). They pre-compute common hunting and reporting patterns.
Why use summary rules?
| Problem | Summary Rule Solution |
|---|---|
| Querying raw data is slow and expensive | Summary tables are small and fast to query |
| Daily/weekly reports need the same aggregation | Pre-compute once, query many times |
| Hunting patterns repeat | Summarise common patterns (failed logins per user, network connections per device) for instant lookup |
| Data lake queries time out | Summary rules run against Analytics tier before data ages out, storing results in a custom table with configurable retention |
How to create a summary rule
- Write the aggregation query — what data to summarise
- Set the schedule — how often to run (hourly, daily)
- Define the destination table — where to store results (custom table)
- Configure retention — how long to keep summarised data
Example: daily failed login summary
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != "0" // Failed sign-ins
| summarize
FailedCount = count(),
UniqueIPs = dcount(IPAddress),
Countries = make_set(LocationDetails.countryOrRegion)
by UserPrincipalName
| where FailedCount > 5
This runs daily and stores results in a summary table. Hunting against the summary is instant — no need to scan millions of raw SigninLogs rows.
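What the summarize step computes can be modelled in Python — group failed sign-ins by user, count them, count distinct IPs, collect countries, then keep only users above the threshold. The sample rows and accounts are invented for illustration; in Sentinel this aggregation is the KQL above, run on a schedule:

```python
from collections import defaultdict

# Hypothetical failed sign-in rows (already filtered to ResultType != "0").
failed = (
    [{"UserPrincipalName": "alice@cipherstack.com",
      "IPAddress": f"203.0.113.{i}", "Country": "US"} for i in range(8)]
    + [{"UserPrincipalName": "bob@cipherstack.com",
        "IPAddress": "198.51.100.1", "Country": "DE"}] * 3
)

# Equivalent of: summarize count(), dcount(IPAddress), make_set(Country) by UPN
groups = defaultdict(lambda: {"FailedCount": 0, "IPs": set(), "Countries": set()})
for row in failed:
    g = groups[row["UserPrincipalName"]]
    g["FailedCount"] += 1
    g["IPs"].add(row["IPAddress"])
    g["Countries"].add(row["Country"])

# Equivalent of: | where FailedCount > 5 — bob's 3 failures drop out
summary = {
    upn: {"FailedCount": g["FailedCount"],
          "UniqueIPs": len(g["IPs"]),
          "Countries": sorted(g["Countries"])}
    for upn, g in groups.items() if g["FailedCount"] > 5
}
print(summary)
```

The summary dict here plays the role of the custom summary table: a handful of rows per day instead of millions, which is why lookups against it are effectively instant.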
| Feature | KQL Search Jobs | Summary Rule Tables |
|---|---|---|
| Purpose | Query historical/archived data on demand | Pre-aggregate data on a schedule for fast lookups |
| Execution | Asynchronous (background) | Scheduled (recurring) |
| Data source | Any table including Data lake | Analytics tier tables (before data ages out) |
| Results storage | _SRCH table (temporary, 30 days default) | Custom summary table (retention configurable, not permanent) |
| Best for | One-time historical investigation, retrohunting | Ongoing aggregation, reporting, fast hunting lookups |
| Cost impact | Pay per query execution | Pay once for the scheduled run; queries against summary are cheap |
Tyler discovers a new IOC that was active 4 months ago. The relevant network data has been moved to the Data lake tier. How should he search for this IOC in historical data?
🎬 Video coming soon
Next up: The final module — hunting with Notebooks and connecting to the Sentinel MCP Server for advanced investigation workflows.