Pattern Matching Pipeline
runok uses a 3-layer pipeline to match rule patterns against shell commands: Lexer → Parser → Matcher. This page describes each layer in detail.
Overview
Section titled “Overview”The pipeline has two independent input paths that converge at the Matcher:
Rule pattern path:
- Lexer — splits the pattern string into a list of tokens (literals, wildcards, alternations, etc.)
- Parser — transforms the token list into a structured pattern with command name, flags, and arguments
Command path:
- Command Parser — tokenizes and structurally parses the shell command string into a command name, flags, and positional arguments
Convergence:
- The Matcher receives the structured pattern and the parsed command, compares them, and returns a match result (with optional captures for wrapper placeholder extraction).
Layer 1: Lexer
Section titled “Layer 1: Lexer”Source: src/rules/pattern_lexer.rs
The lexer converts a pattern string into a flat list of LexToken values. It handles:
| Token type | Example | Description |
|---|---|---|
Literal | git | Plain string token |
QuotedLiteral | 'WIP*' | Quoted string (preserves content) |
Alternation | -X|--request | Pipe-separated alternatives (no spaces around |) |
Wildcard | * | Matches any number of arguments |
Negation | !prod | Must not match the value |
NegationAlternation | !POST|PUT | Must not match any of the alternatives |
OpenBracket / CloseBracket | [--verbose] | Optional group markers |
Placeholder | <cmd> | Wrapper command placeholder |
MultiWordAlternation | "npx prettier"|prettier | Multi-word alternatives |
The lexer splits on whitespace, then classifies each token by examining prefix characters (!, <, [, ]), pipe separators, and quote boundaries.
Example
Section titled “Example”Pattern: curl -X|--request POST|PUT *
Lexer output:
[Literal("curl"), Alternation(["-X", "--request"]), Alternation(["POST", "PUT"]), Wildcard]Layer 2: Parser
Section titled “Layer 2: Parser”Source: src/rules/pattern_parser.rs
The parser transforms Vec<LexToken> into a structured Pattern:
struct Pattern { command: CommandPattern, // Command name (Literal, Alternation, or Wildcard) tokens: Vec<PatternToken>, // Argument patterns}Key transformations
Section titled “Key transformations”Flag-value association: When a flag alternation is followed by a non-flag token, the parser combines them into a FlagWithValue:
Alternation(["-X", "--request"]) + Alternation(["POST", "PUT"]) → FlagWithValue { aliases: ["-X", "--request"], value: Alternation(["POST", "PUT"]) }This is how deny: "curl -X POST" teaches runok that -X/--request is a flag that takes a value, not a boolean flag.
Multi-word expansion: parse_multi() expands MultiWordAlternation tokens into multiple independent Pattern instances. For example, "npx prettier"|prettier produces two patterns: one for npx prettier and one for prettier.
Optional groups: [--verbose] is parsed as an Optional token, meaning the pattern matches with or without that flag.
FlagSchema inference
Section titled “FlagSchema inference”The parser’s output drives FlagSchema construction in the rule engine. By examining which patterns have FlagWithValue tokens, runok infers which flags take values and which are boolean — without needing an explicit flag definition file. This is core to the Pattern-Driven Parsing design principle.
Layer 3: Matcher
Section titled “Layer 3: Matcher”Source: src/rules/pattern_matcher.rs
The matcher compares a Pattern against a ParsedCommand and returns whether they match (optionally with captures for placeholder extraction).
Matching algorithm
Section titled “Matching algorithm”The core algorithm in match_tokens_core() uses recursive matching with backtracking:
- Walk through pattern tokens and command tokens in parallel.
- For each pattern token, attempt to match against the current command position.
- On failure, backtrack to the last wildcard and try consuming one more command token.
A step counter (MAX_MATCH_STEPS = 10,000) prevents exponential blowup on pathological patterns.
Matching rules by token type
Section titled “Matching rules by token type”| Pattern token | Matching behavior |
|---|---|
Literal | Skips leading flag tokens to find the first positional argument and matches against it. Order-independent relative to flags; positional order among non-flag tokens is preserved. After --, matches at the current position with no skipping |
Wildcard | Matches zero or more remaining tokens (greedy with backtracking) |
Alternation | Alternations containing any flag alternative (starting with -) scan the entire token list (order-independent). Purely non-flag alternations skip leading flags like Literal. After --, matches at the current position |
FlagWithValue | Scans the entire token list for the flag, then checks the next token matches the value. Also matches =-joined forms (e.g. --flag=value) and fused short flag forms (e.g. -n3). Order-independent: the flag can appear anywhere in the command |
Negation | Positional negation: skips leading flags to find the first positional token and checks it does not equal the value. Flag-only negation (all alternatives start with -): scans the entire token list for the forbidden flag without consuming a positional token. Passes when the flag is absent, including when no tokens remain |
Optional | Tries matching with the optional tokens included, falls back to without |
Placeholder | Captures remaining tokens for wrapper command re-evaluation |
PathRef | Expands definitions.paths entries and matches against them |
Order-independent matching
Section titled “Order-independent matching”Several token types use order-independent matching so that flag placement does not affect rule evaluation:
FlagWithValue: Scans the entire token list for the flag regardless of position.curl -X POST *matches bothcurl -X POST urlandcurl url -X POST.Literaland purely non-flagAlternation: Skip over leading flag tokens (and their values for known value-flags) to find the first positional argument.gh api -X GET *matchesgh -X GET api /reposbecauseapiis found after skipping-X GET. Positional order among non-flag tokens is still preserved.- Flag-containing
Alternationand flag-onlyNegation: Scan all tokens for a matching flag, independent of position. Mixed alternations like-v|verbosealso scan all tokens so that both the flag and non-flag variants can match.
The bare -- separator disables flag-skipping for subsequent Literal and Alternation matches, preserving the POSIX end-of-options semantics.
=-joined flag splitting
Section titled “=-joined flag splitting”FlagWithValue also matches =-joined command tokens (e.g. --sort=value for a pattern --sort value). The splitting only recognizes standard flag forms:
- Short flags:
-X=val(single dash + one character) - Long flags:
--flag=val(double dash)
Non-standard forms like -Denv=prod (Java) or -DFOO=bar (GCC) are not split. These are treated as single tokens because the portion before = (-Denv, -DFOO) is not a standard flag name — it is a fused flag-key combination specific to the command.
Fused short flag splitting
Section titled “Fused short flag splitting”FlagWithValue also matches fused short flag tokens where the value is directly attached to the flag character (e.g. -n3 for a pattern -n *). The splitting extracts the first two bytes as the flag part (-n) and the remainder as the value (3). This is only attempted when:
- The token starts with
-(not--) and is longer than 2 bytes - The second byte is an ASCII character (non-ASCII characters are rejected to avoid slicing inside multi-byte UTF-8 characters)
- The extracted flag matches one of the
FlagWithValuealiases
This design avoids false splits on combined boolean flags like -rf, since splitting only occurs when the pattern explicitly declares the flag takes a value.
Wrapper command matching
Section titled “Wrapper command matching”When a pattern contains a <cmd> placeholder (defined in definitions.wrappers), the matcher captures all tokens at that position. The captured tokens are then reassembled into a command string and recursively evaluated by the rule engine.
Example: With wrapper sudo <cmd>, the command sudo rm -rf / matches, and rm -rf / is captured for recursive evaluation.
Command Parser
Section titled “Command Parser”Source: src/rules/command_parser.rs
The command parser operates on the input command (not the pattern). It provides:
Tokenization
Section titled “Tokenization”tokenize() splits a shell command string into tokens, respecting:
- Single and double quotes
- Backslash escapes
- Empty quoted strings (preserved as empty tokens)
Structural parsing
Section titled “Structural parsing”parse_command() uses the FlagSchema (inferred from rule patterns) to classify tokens:
struct ParsedCommand { command: String, // Command name flags: HashMap<String, Option<String>>, // Recognized flags and their values args: Vec<String>, // Positional arguments raw_tokens: Vec<String>, // Original token list}Unrecognized flags default to boolean (no value consumed). This is a deliberate fallback — runok does not need a complete flag specification because the FlagSchema is derived from what rules actually reference.
Compound command splitting
Section titled “Compound command splitting”extract_commands() uses tree-sitter-bash to parse compound shell expressions and extract individual commands from:
- Pipelines (
cmd1 | cmd2) - Logical operators (
cmd1 && cmd2,cmd1 || cmd2) - Semicolons (
cmd1; cmd2) - Subshells (
(cmd),$(cmd)) - Loops and conditionals (
for,while,if,case) - Redirections (stripped, command extracted)
- Variable assignments with commands