Wrangle Your PowerShell Transcript Logs with Apache NiFi

Enterprise defenders implementing PowerShell transcript logging face a practical challenge: managing large volumes of flat text files containing significant application noise. This post demonstrates using Apache NiFi to extract relevant data and route it to logging systems like Splunk.

Installation Setup

Download two components: OpenJDK and Apache NiFi.

For Windows systems, the OpenJDK files should be extracted to C:\Java, with the bin directory added to the environment variables. After NiFi is extracted, running run-nifi.bat from its bin directory launches the service, and the web interface becomes accessible at http://localhost:8080/nifi.

Building the Data Flow

The workflow involves connecting three main processors:

GetFile Processor: Reads PowerShell transcript logs recursively from a specified directory. The “Keep Source File” option defaults to false, so NiFi deletes each file from the source directory once it has been picked up.
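The pick-up-and-delete behaviour is easy to overlook, so here is a minimal Python sketch of it. This is not NiFi code: the function name get_files, its keep_source_file parameter, and the utf-8-sig encoding choice are assumptions made for illustration only.

```python
from pathlib import Path


def get_files(input_dir, keep_source_file=False):
    """Yield (relative_path, text) for every file under input_dir, recursively.

    Hypothetical stand-in for the GetFile behaviour described above: unless
    keep_source_file is True, each file is deleted after it has been read.
    """
    root = Path(input_dir)
    # Build the full listing first so deletions do not disturb the walk.
    paths = sorted(p for p in root.rglob("*") if p.is_file())
    for path in paths:
        # Assumption: transcripts are UTF-8, possibly with a byte order mark.
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        if not keep_source_file:
            path.unlink()
        yield path.relative_to(root).as_posix(), text
```

Calling list(get_files(some_dir)) with the default setting empties the directory tree of files, which is why testing against a copy of the transcripts is the safer choice.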

ExtractText Processor: Performs regex pattern matching against file contents. Custom properties define detection patterns for potentially suspicious PowerShell commands, with DOTALL and Multiline modes enabled.
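The matching step can be approximated outside NiFi to prototype patterns before adding them as processor properties. The sketch below is an assumption-laden illustration: the property names and regular expressions in PATTERNS are hypothetical and are not the ones used in the original flow, and Python's re module stands in for NiFi's Java regex engine, whose syntax differs in places. Only the first capture group is stored per property, which is a simplification of how the processor names its output attributes.

```python
import re

# Hypothetical detection patterns, keyed by the custom property name.
# Each pattern must contain at least one capture group.
PATTERNS = {
    "username": r"^Username: ([^\r\n]+)",
    "encoded_command": r"-enc(?:odedcommand)?\s+([A-Za-z0-9+/=]+)",
    "download_string": r"Net\.WebClient.*?DownloadString\((.*?)\)",
}

# DOTALL lets '.' cross line breaks; MULTILINE makes '^' match at each line start.
# IGNORECASE is an extra assumption, since PowerShell is case-insensitive.
FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE


def extract_text(content, patterns=PATTERNS):
    """Return (relationship, attributes) for one transcript's text."""
    attributes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, content, FLAGS)
        if match is not None:
            # Simplification: keep only the first capture group per property.
            attributes[name] = match.group(1)
    relationship = "matched" if attributes else "unmatched"
    return relationship, attributes
```

A transcript with no hits comes back as ("unmatched", {}), which corresponds to the content that never needs to reach the logging system.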

Output Processor: Routes matched results onward. The example uses PutSplunk for direct integration, though PutFile suits development testing.

Data Handling

Matched data passes through an AttributesToJSON processor before transmission. The resulting JSON contains the values extracted by the regex capture groups, plus file metadata. In Splunk, the custom attribute names defined in ExtractText become searchable fields.
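To make the shape of the transmitted event concrete, the conversion can be sketched in Python as below. The function attributes_to_json and its include parameter are hypothetical names for illustration, and the attribute values in the usage note are invented sample data, not output from the original flow.

```python
import json


def attributes_to_json(attributes, include=None):
    """Serialise a dict of attributes to a JSON object string.

    include is an optional list of attribute names to keep; when it is
    None, every attribute is written out.
    """
    if include is None:
        selected = dict(attributes)
    else:
        selected = {k: attributes[k] for k in include if k in attributes}
    # sort_keys gives stable output, which is convenient when diffing events.
    return json.dumps(selected, sort_keys=True)
```

For example, passing {"filename": "transcript.txt", "path": "host1/", "encoded_command": "SQBFAFgA"} yields one JSON object whose keys are the names that would appear as fields on the receiving side.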

Key Takeaways

This approach filters transcript logs before ingestion, reducing the volume that counts against ingestion quotas and the associated cost. While not production-tested at scale, preliminary testing suggests NiFi processes large transcript volumes efficiently, with no performance degradation observed.