If you are consolidating all of the logs from your servers, firewalls, IDS sensors and other devices into text files on a protected server, how are you going to search all that data? Or if you have textual output from analysis or scanning tools, how can you extract just the lines that match at least one of the regular expression patterns from your set of patterns?
Search-TextLog.ps1
The PowerShell script Search-TextLog.ps1 reads a file containing one or more regular expression patterns (signatures.txt) and compares every line of a log file (iis.log) against every one of those patterns. The script can search any text file, not just log files, as long as the file has only one entry per line.
.\Search-TextLog.ps1 -Path IIS.log -PatternsFile Signatures.txt
Each pattern in the patterns file comes with a description of what a match to that pattern indicates. When the search completes, a summary report shows the number of matches to each pattern. The output is an array of objects with three properties on which you can filter: Count, Description, and the regex Pattern itself.
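To make the summary mode concrete, here is a minimal sketch of the idea. It is not the actual code of Search-TextLog.ps1: the function name Get-PatternMatchSummary is hypothetical, it assumes the patterns are already loaded as objects with Pattern and Description properties, and it assumes case-insensitive matching (as with PowerShell's -match operator), which the real script may or may not use.

```powershell
# Minimal sketch of the summary mode: count how many lines match each pattern.
# Get-PatternMatchSummary is a hypothetical name, not code from Search-TextLog.ps1.
function Get-PatternMatchSummary {
    param(
        [string[]] $Lines,     # text to search, one entry per line
        [object[]] $Patterns   # objects with Pattern and Description properties
    )

    # Assumption: case-insensitive matching, like PowerShell's -match operator.
    $opts = [System.Text.RegularExpressions.RegexOptions] 'Compiled, IgnoreCase'

    # Build each Regex object once and reuse it for every line.
    $regexes = @(foreach ($p in $Patterns) { [regex]::new($p.Pattern, $opts) })
    $counts  = [int[]]::new($regexes.Count)

    # Every line is tested against every pattern.
    foreach ($line in $Lines) {
        for ($i = 0; $i -lt $regexes.Count; $i++) {
            if ($regexes[$i].IsMatch($line)) { $counts[$i]++ }
        }
    }

    # One output object per pattern, with the three filterable properties.
    for ($i = 0; $i -lt $regexes.Count; $i++) {
        [PSCustomObject]@{
            Count       = $counts[$i]
            Description = $Patterns[$i].Description
            Pattern     = $Patterns[$i].Pattern
        }
    }
}
```

Because each result is an object, you can filter and sort it like any other PowerShell output, for example (where $patterns holds the loaded pattern objects):
Get-PatternMatchSummary -Lines (Get-Content .\iis.log) -Patterns $patterns | Where-Object Count -gt 0 | Sort-Object Count -Descending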
Get the script from the SEC505 zip file at BlueTeamPowerShell.com (look in the Day4\IIS\Log_Analysis folder inside the zip). All of my SANS PowerShell scripts are in the public domain. The zip file is a simple download, with no email registration or other hassles. The zip file also includes a sample log file to search (iis.log) and a sample patterns file (signatures.txt) to search it with. (Please note that the sample patterns and their descriptions are just for illustration, so please don't get vexed about their details.)
Alternatively, if you use the -ShowMatchedLines switch, the summary is not shown; instead, every line from the log that matched at least one pattern is output, and each line is output only once, no matter how many other patterns it might also match. This is much faster because once a line matches a pattern, the remaining patterns do not have to be tested against that line.
.\Search-TextLog.ps1 -Path IIS.log -PatternsFile Signatures.txt -ShowMatchedLines
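The speed-up comes from short-circuiting: a line is emitted at the first pattern it matches, and no further patterns are tried against it. The sketch below illustrates that logic only; Select-MatchedLine is a hypothetical name rather than code from Search-TextLog.ps1, and case-insensitive matching is an assumption.

```powershell
# Minimal sketch of the -ShowMatchedLines idea: output each matching line once
# and stop testing a line as soon as one pattern matches it.
# Select-MatchedLine is a hypothetical name, not code from Search-TextLog.ps1.
function Select-MatchedLine {
    param(
        [string[]] $Lines,    # text to search, one entry per line
        [string[]] $Pattern   # regex patterns, most frequently matched first
    )

    # Assumption: case-insensitive matching, like PowerShell's -match operator.
    $opts    = [System.Text.RegularExpressions.RegexOptions] 'Compiled, IgnoreCase'
    $regexes = @(foreach ($p in $Pattern) { [regex]::new($p, $opts) })

    foreach ($line in $Lines) {
        foreach ($rx in $regexes) {
            if ($rx.IsMatch($line)) {
                $line   # emit the line a single time...
                break   # ...and skip the remaining patterns for this line
            }
        }
    }
}
```

For example:
Select-MatchedLine -Lines (Get-Content .\iis.log) -Pattern 'cmd\.exe', '\.\./'
The earlier in the list a line's first matching pattern appears, the fewer IsMatch calls that line costs, which is why pattern order matters in this mode.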
The Patterns File (signatures.txt)
The file with the regular expression patterns does not have to be named "signatures.txt". In real life, you would have a different patterns file for each type of log you wanted to search, e.g., syslog, web, FTP, SMTP, firewall, etc. You will look for different things in different types of logs, hence you'll have different regular expressions for each. Each line in the patterns file must have the format "<regexpattern> <tab> <description>", that is, the regex, a tab character, then the description. Blank lines and lines that begin with a hash mark (#) or semicolon (;) are ignored.
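As an illustration of that format, here is a sketch of a reader that turns a patterns file into objects. Read-PatternsFile is a hypothetical name, not code from Search-TextLog.ps1, and two behaviors are assumptions: a line with no tab is treated as a pattern with an empty description, and the pattern text is kept verbatim rather than trimmed.

```powershell
# Sketch of a reader for lines in the form "<regexpattern><tab><description>".
# Read-PatternsFile is a hypothetical name, not code from Search-TextLog.ps1.
function Read-PatternsFile {
    param([Parameter(Mandatory)] [string] $Path)

    foreach ($raw in Get-Content -LiteralPath $Path) {
        $trimmed = $raw.Trim()

        # Skip blank lines and comment lines that begin with # or ;
        if ($trimmed -eq '' -or $trimmed.StartsWith('#') -or $trimmed.StartsWith(';')) {
            continue
        }

        # Split on the first tab only; the pattern is kept verbatim so that
        # meaningful leading or trailing spaces in a regex are not lost.
        $parts = $raw -split "`t", 2

        # Assumption: a line with no tab is a pattern with an empty description.
        $description = ''
        if ($parts.Count -gt 1) { $description = $parts[1].Trim() }

        [PSCustomObject]@{
            Pattern     = $parts[0]
            Description = $description
        }
    }
}
```

Splitting on only the first tab means a description may itself contain tabs, but a regex cannot contain a literal tab character; use \t inside the regex instead.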
If you are going to use the -ShowMatchedLines switch a lot, you can optimize your searches by putting the patterns which will match the most entries near the top of the patterns file.
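One way to apply this tip is to run a summary search over a representative log and then rewrite the patterns file in descending order of Count. The sketch below assumes summary objects with Count, Pattern and Description properties, as described for the script's output; Export-PatternsByHitCount is a hypothetical name. It writes a new file and does not carry over the original comment lines.

```powershell
# Sketch: rewrite a patterns file so the most frequently matched patterns come first.
# Export-PatternsByHitCount is a hypothetical name, not part of the SEC505 scripts.
function Export-PatternsByHitCount {
    param(
        [Parameter(Mandatory)] [object[]] $Summary,  # objects with Count, Pattern, Description
        [Parameter(Mandatory)] [string] $OutFile     # new patterns file to write
    )

    # Highest Count first, then one "<pattern><tab><description>" line per entry.
    $Summary |
        Sort-Object -Property Count -Descending |
        ForEach-Object { "{0}`t{1}" -f $_.Pattern, $_.Description } |
        Set-Content -LiteralPath $OutFile
}
```

For example:
$summary = .\Search-TextLog.ps1 -Path IIS.log -PatternsFile Signatures.txt
Export-PatternsByHitCount -Summary $summary -OutFile .\signatures-sorted.txt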
Performance
PowerShell code is object-oriented and JIT-compiled, and the language is designed for friendliness, but the price of this friendliness is slower performance. The sample signatures.txt file has 35 regular expression patterns. Using PowerShell 7.1 on an Intel Core i7-4790 at 3.6 GHz, a log file with 2 million lines can be searched with these patterns at a rate of about 50,000 lines per second, or about 40 seconds total.
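To check figures like these on your own hardware, time a search and divide the line count by the elapsed seconds (for example, 2,000,000 lines / 40 seconds = 50,000 lines per second). The sketch below is one way to do that; Measure-LinesPerSecond is a hypothetical helper, not part of the SEC505 scripts, and it counts the lines before timing starts so the counting does not skew the result.

```powershell
# Sketch: time a search command and report throughput in lines per second.
# Measure-LinesPerSecond is a hypothetical helper, not part of the SEC505 scripts.
function Measure-LinesPerSecond {
    param(
        [Parameter(Mandatory)] [string] $Path,          # the file being searched
        [Parameter(Mandatory)] [scriptblock] $Command   # the search to time
    )

    # .NET resolves relative paths against the process directory, so resolve first.
    $fullPath = Convert-Path -LiteralPath $Path

    # Count lines with a streaming reader (outside the timed section).
    $lineCount = 0
    foreach ($l in [System.IO.File]::ReadLines($fullPath)) { $lineCount++ }

    # Measure-Command discards the command's output and returns a TimeSpan.
    $elapsed = Measure-Command -Expression $Command
    $seconds = $elapsed.TotalSeconds

    $rate = $null
    if ($seconds -gt 0) { $rate = [math]::Round($lineCount / $seconds) }

    [PSCustomObject]@{
        Lines          = $lineCount
        Seconds        = [math]::Round($seconds, 2)
        LinesPerSecond = $rate
    }
}
```

For example:
Measure-LinesPerSecond -Path .\iis.log -Command { .\Search-TextLog.ps1 -Path IIS.log -PatternsFile Signatures.txt }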
In the zip file mentioned above, in the same folder as the Search-TextLog.ps1 script, is another script, written in Julia, named "JuliaSearchTextLog.jl". This script does the same thing as the PowerShell script and does it (mostly) the same way. Julia is usually faster than both PowerShell and Python. On the same computer, with the same signatures.txt file and the same 2-million-line log file, the Julia script processes about 82,000 lines per second, or about 25 seconds total, a 64% increase in throughput.
A compiled executable written in C/C++ or Rust would be even faster. And hand-crafted Assembly, optimized for your particular CPU and GPU, would be faster still.
So are Julia, Rust and Assembly better than PowerShell? The raw performance would be better, but what if you need to customize the source code and you don't know anything about those other languages? There is a trade-off for every choice. PowerShell is optimized for admin friendliness, not maximum performance.
The above Julia script, by the way, outputs its search summary as JSON text. In a PowerShell terminal, then, that means you can convert the JSON to proper objects like this:
julia.exe JuliaSearchTextLog.jl iis.log signatures.txt | ConvertFrom-Json
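Here is a small, self-contained illustration of that pattern. The JSON text is made-up sample data shaped like the PowerShell script's summary (Count, Description, Pattern); whether JuliaSearchTextLog.jl uses exactly these field names is an assumption, so check its actual output before filtering on them.

```powershell
# Made-up sample data: an array of summary records in JSON form.
# The field names are an assumption, not the documented output of JuliaSearchTextLog.jl.
$json = @'
[
  { "Count": 12, "Description": "Directory traversal", "Pattern": "\\.\\./" },
  { "Count": 0,  "Description": "Command shell",       "Pattern": "cmd\\.exe" }
]
'@

# Assign first, then pipe: Windows PowerShell 5.1 emits a JSON array from
# ConvertFrom-Json as a single pipeline object, so this keeps both versions alike.
$summary = $json | ConvertFrom-Json

# Once converted, the records filter and sort like any other PowerShell objects.
$summary | Where-Object { $_.Count -gt 0 } | Sort-Object -Property Count -Descending
```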
PowerShell can orchestrate other tools, especially when those tools output JSON, CSV, or XML data. This is where PowerShell shines: quickly "gluing" together other scripts and executables without a lot of fuss.
History
26.Jul.2021: Added some performance optimizations and wrote the Julia script for illustration.