
Shrink your Windows event log license costs with ingest actions!

  • Writer: Gabriel Vasseur
  • Oct 27
  • 10 min read

Windows events make up a large part of the log volume ingested in a lot of Splunk deployments. Wouldn't it be cool if we could shrink them so they don't eat up so much precious, precious license?


In this post I'll walk through how I rebuilt Windows Event Logs (WELs) into a compact, Splunk-friendly format, cutting their size by up to 60% without breaking field extractions.


Key takeaways


  • With a few targeted ingest actions and props/transforms tweaks, you can shrink Windows logs dramatically, without losing data.

  • My Conf Manager app is awesome! It was key in seeing before/after diffs, as well as navigating the search-time configuration to figure out what props and transforms needed tweaks.

  • Building on my testing methodology, you too may gain confidence in making this daunting but very rewarding change in your production environment.


The idea


Let’s start with a typical Windows Security 4624 log event and see just how much dead weight it carries:

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}'/><EventID>4624</EventID><Version>3</Version><Level>0</Level><Task>12544</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2025-10-21T10:18:26.8317410Z'/><EventRecordID>645139</EventRecordID><Correlation ActivityID='{03ee45ab-4261-0006-3f46-ee036142dc01}'/><Execution ProcessID='604' ThreadID='2800'/><Channel>Security</Channel><Computer>Gabs</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>NT AUTHORITY\SYSTEM</Data><Data Name='SubjectUserName'>GABS$</Data><Data Name='SubjectDomainName'>WORKGROUP</Data><Data Name='SubjectLogonId'>0x3e7</Data><Data Name='TargetUserSid'>NT AUTHORITY\SYSTEM</Data><Data Name='TargetUserName'>SYSTEM</Data><Data Name='TargetDomainName'>NT AUTHORITY</Data><Data Name='TargetLogonId'>0x3e7</Data><Data Name='LogonType'>5</Data><Data Name='LogonProcessName'>Advapi</Data><Data Name='AuthenticationPackageName'>Negotiate</Data><Data Name='WorkstationName'>-</Data><Data Name='LogonGuid'>{00000000-0000-0000-0000-000000000000}</Data><Data Name='TransmittedServices'>-</Data><Data Name='LmPackageName'>-</Data><Data Name='KeyLength'>0</Data><Data Name='ProcessId'>0x3f0</Data><Data Name='ProcessName'>C:\Windows\System32\services.exe</Data><Data Name='IpAddress'>-</Data><Data Name='IpPort'>-</Data><Data Name='ImpersonationLevel'>%%1833</Data><Data Name='RestrictedAdminMode'>-</Data><Data Name='RemoteCredentialGuard'>-</Data><Data Name='TargetOutboundUserName'>-</Data><Data Name='TargetOutboundDomainName'>-</Data><Data Name='VirtualAccount'>%%1843</Data><Data Name='TargetLinkedLogonId'>0x0</Data><Data Name='ElevatedToken'>%%1842</Data></EventData></Event>

It is 1823 bytes long.

In our first example we'll aim to lose no information. We can rewrite the event as follows:

<S>Provider_Name=Microsoft-Windows-Security-Auditing<Provider_Guid={54849625-5478-4994-a5ba-3e3b0328c30d}<EventID=4624<Version=3<Level=0<Task=12544<Opcode=0<Keywords=0x8020000000000000<TimeCreated_SystemTime=2025-10-21T10:18:26.8317410Z<EventRecordID=645139<Correlation_ActivityID={03ee45ab-4261-0006-3f46-ee036142dc01}<Execution_ProcessID=604<Execution_ThreadID=2800<Channel=Security<Computer=Gabs</S><ED>SubjectUserSid=NT AUTHORITY\SYSTEM<SubjectUserName=GABS$<SubjectDomainName=WORKGROUP<SubjectLogonId=0x3e7<TargetUserSid=NT AUTHORITY\SYSTEM<TargetUserName=SYSTEM<TargetDomainName=NT AUTHORITY<TargetLogonId=0x3e7<LogonType=5<LogonProcessName=Advapi<AuthenticationPackageName=Negotiate<WorkstationName=-<LogonGuid={00000000-0000-0000-0000-000000000000}<TransmittedServices=-<LmPackageName=-<KeyLength=0<ProcessId=0x3f0<ProcessName=C:\Windows\System32\services.exe<IpAddress=-<IpPort=-<ImpersonationLevel=%%1833<RestrictedAdminMode=-<RemoteCredentialGuard=-<TargetOutboundUserName=-<TargetOutboundDomainName=-<VirtualAccount=%%1843<TargetLinkedLogonId=0x0<ElevatedToken=%%1842</ED>

It is now 1084 bytes. This is a saving of 739 bytes or 40.5%! And no data was lost!


The key idea starts with noticing that this format wastes a lot of space:

<Data Name='key'>value</Data>

My main idea (and the easiest to implement too) is to replace it with:

key=value<

This is still easily and reliably parsable by Splunk at search time because:

  • I wouldn't expect key to ever include an =, so anything until the first = is the key

  • value cannot include a <, otherwise it wouldn't be valid XML. So it's safe to use < as a separator for the next key-value pair.
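To make those two parsing arguments concrete, here is a minimal Python sketch (not the ingest action itself; the names `compact_event_data` and `parse_compact` are hypothetical) that rewrites `<Data Name='key'>value</Data>` elements into `key=value<` pairs and parses them back. It assumes values never contain a raw `<`, which holds because Windows escapes it as `&lt;`.

```python
import re

# One <Data Name='key'>value</Data> element, as rendered in XmlWinEventLog.
DATA_ELEMENT = re.compile(r"<Data Name='(\w+)'>([^<]*)</Data>")


def compact_event_data(xml_fragment: str) -> str:
    """Rewrite each <Data Name='k'>v</Data> element as k=v<."""
    return DATA_ELEMENT.sub(r"\1=\2<", xml_fragment)


def parse_compact(pairs: str) -> dict:
    """Parse k=v<k=v<... back into a dict.

    '<' can only be a separator (a literal '<' in a value arrives as &lt;),
    and the key is everything up to the FIRST '=', so values may contain '='.
    """
    fields = {}
    for chunk in pairs.split("<"):
        if not chunk:  # the trailing separator leaves an empty chunk
            continue
        key, _, value = chunk.partition("=")
        fields[key] = value
    return fields


sample = ("<Data Name='LogonType'>5</Data>"
          "<Data Name='CommandLine'>cmd /c sort a=b &lt; in.txt</Data>")
compact = compact_event_data(sample)
print(compact)                 # LogonType=5<CommandLine=cmd /c sort a=b &lt; in.txt<
print(parse_compact(compact))  # {'LogonType': '5', 'CommandLine': 'cmd /c sort a=b &lt; in.txt'}
```

Note how the `a=b` inside the command line survives: only the first `=` of each chunk separates key from value.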


By the way, have you ever wondered what happens to a '<' inside the data of a Windows event? Microsoft changes it to &lt;. The Splunk Windows TA does not convert it back, so if you ever search your data for a '<' you'll have to adapt your search accordingly.


The <System>...</System> area is trickier to rewrite, but we can still apply similar principles.


Speaking of the System area... in my opinion it contains a lot of dead weight that doesn't really tell us anything about the actual event covered in the <EventData>...</EventData> section. So, pushing things further, we can rewrite the same event as:

2025-10-21T10:18:26.8317410Z EventID=4624 Keywords=0x8020000000000000 Computer=Gabs<ED>SubjectUserSid=NT AUTHORITY\SYSTEM<SubjectUserName=GABS$<SubjectDomainName=WORKGROUP<SubjectLogonId=0x3e7<TargetUserSid=NT AUTHORITY\SYSTEM<TargetUserName=SYSTEM<TargetDomainName=NT AUTHORITY<TargetLogonId=0x3e7<LogonType=5<LogonProcessName=Advapi<AuthenticationPackageName=Negotiate<WorkstationName=-<LogonGuid={00000000-0000-0000-0000-000000000000}<TransmittedServices=-<LmPackageName=-<KeyLength=0<ProcessId=0x3f0<ProcessName=C:\Windows\System32\services.exe<IpAddress=-<IpPort=-<ImpersonationLevel=%%1833<RestrictedAdminMode=-<RemoteCredentialGuard=-<TargetOutboundUserName=-<TargetOutboundDomainName=-<VirtualAccount=%%1843<TargetLinkedLogonId=0x0<ElevatedToken=%%1842</ED>

This is now 765 bytes, which means a 58% size reduction compared to the original! Sure, some low-value data was lost. If that bothers you, you can stop at the previous step or tweak the approach to save the bits you want to keep.


This new rewrite has a hidden extra advantage. Because whitespace is a hard token delimiter, EventID=4624 now becomes an indexed token... This means you can now run tstats searches that effectively work as if EventCode were an indexed field...! *mind blown*

EventCode effectively now an indexed field at no extra cost!
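Here is a rough Python illustration of why that works. It splits events on a *simplified subset* of Splunk's default major breakers (an assumption for illustration only; the authoritative list is the `MAJOR` setting in segmenters.conf) and checks whether `EventID=4624` survives as one whole segment. In the original XML, the `<` and `>` around 4624 split the key and the value into separate terms.

```python
import re

# ASSUMPTION: a simplified subset of Splunk's default major breakers
# (whitespace plus some punctuation). The real list is the MAJOR setting in
# segmenters.conf; '=' is only a minor breaker, so it does not split terms here.
MAJOR_BREAKERS = re.compile(r"""[\s<>\[\](){}|!;,'"*&?+]+""")


def major_segments(raw: str) -> set:
    """Split an event on the simplified major breakers."""
    return {seg for seg in MAJOR_BREAKERS.split(raw) if seg}


old = "<System><EventID>4624</EventID><Computer>Gabs</Computer></System>"
new = "2025-10-21T10:18:26.8317410Z EventID=4624 Keywords=0x8020000000000000 Computer=Gabs<ED>"

print("EventID=4624" in major_segments(old))  # False: < and > split key and value apart
print("EventID=4624" in major_segments(new))  # True: only whitespace surrounds it
```

Since the whole `EventID=4624` string is kept as a term, it is the kind of thing tstats can match on (for instance with `TERM(EventID=4624)`).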

And yes, we can go further still, although now it gets a bit more destructive: if a value is empty or '-', do we really need it in the raw event? We would end up with this:


2025-10-21T10:18:26.8317410Z EventID=4624 Keywords=0x8020000000000000 Computer=Gabs<ED>SubjectUserSid=NT AUTHORITY\SYSTEM<SubjectUserName=GABS$<SubjectDomainName=WORKGROUP<SubjectLogonId=0x3e7<TargetUserSid=NT AUTHORITY\SYSTEM<TargetUserName=SYSTEM<TargetDomainName=NT AUTHORITY<TargetLogonId=0x3e7<LogonType=5<LogonProcessName=Advapi<AuthenticationPackageName=Negotiate<LogonGuid={00000000-0000-0000-0000-000000000000}<KeyLength=0<ProcessId=0x3f0<ProcessName=C:\Windows\System32\services.exe<ImpersonationLevel=%%1833<VirtualAccount=%%1843<TargetLinkedLogonId=0x0<ElevatedToken=%%1842</ED>

Which is 590 bytes: a 68% size reduction compared to the original event! Granted, that last step is destructive. You can always implement it only for specific fields of specific EventCodes, though. Something to keep in mind.
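Here is a Python sketch of that last, destructive step, plus the arithmetic behind the percentages quoted in this section. The regex is my own illustration rather than an ingest action from this post: it removes any `key=value<` pair whose value is empty or a lone `-`, provided the pair starts straight after a `<` or `>`. It relies on a lookbehind, which Python supports; check that your ingest action regex engine does too before reusing it there.

```python
import re

# Drop a key=value< pair whose value is empty or a lone '-', when the pair
# starts right after '<' (previous pair) or '>' (the <ED> tag).
PLACEHOLDER_PAIR = re.compile(r"(?<=[<>])\w+=-?<")


def drop_placeholders(compact: str) -> str:
    return PLACEHOLDER_PAIR.sub("", compact)


def reduction_pct(before_bytes: int, after_bytes: int) -> float:
    """Size reduction as a percentage of the original size."""
    return 100 * (before_bytes - after_bytes) / before_bytes


ed = "<ED>LogonType=5<WorkstationName=-<IpPort=-<KeyLength=0<ElevatedToken=%%1842</ED>"
print(drop_placeholders(ed))  # <ED>LogonType=5<KeyLength=0<ElevatedToken=%%1842</ED>

# Byte counts quoted in this section, against the 1823-byte original.
for label, size in [("zero-loss", 1084), ("optimised", 765), ("no placeholders", 590)]:
    print(f"{label}: {reduction_pct(1823, size):.1f}%")
# zero-loss: 40.5%, optimised: 58.0%, no placeholders: 67.6%
```

Because the lookbehind checks the original string, consecutive placeholder pairs are all removed in one pass, and a placeholder right before `</ED>` is handled too.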


Disclaimers


Do this at your own risk!


The moment you save an ingest action it starts modifying your data BEFORE it is written to an index, and once it's in there it can never be changed.


If you want to do something like this, be very careful and run your own tests before, during and after, to convince yourself you're safe. In the rest of this article, I go through the methodology I used to convince myself to make the changes while minimising the impact on production. It's up to you to define your own tests and methodology to satisfy yourself in your environment.


Of course you'll want to put the props & transforms in place BEFORE you save the ingest actions.


You *should* be able to run historical searches spanning both the before and after eras and the results should be seamlessly what you expect. Again, I'm giving zero guarantees.


Ingest actions cause your indexers to do more work at index time. This might have performance implications. Be warned.


With that out of the way...


Step 1: Ingest Actions


The first step is to decide how you want your data to end up looking. This will dictate what regular expressions to use in your ingest actions, and the work you have to do to make sure fields are still extracted as before.


The easiest way to start is to play in the search bar. You can safely do this in your production environment, either with your actual data or with run-anywhere searches such as this one:


| stats count as _raw
| eval _raw="<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}'/><EventID>4624</EventID><Version>3</Version><Level>0</Level><Task>12544</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2025-10-21T10:18:26.8317410Z'/><EventRecordID>645139</EventRecordID><Correlation ActivityID='{03ee45ab-4261-0006-3f46-ee036142dc01}'/><Execution ProcessID='604' ThreadID='2800'/><Channel>Security</Channel><Computer>Gabs</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>NT AUTHORITY\SYSTEM</Data><Data Name='SubjectUserName'>GABS$</Data><Data Name='SubjectDomainName'>WORKGROUP</Data><Data Name='SubjectLogonId'>0x3e7</Data><Data Name='TargetUserSid'>NT AUTHORITY\SYSTEM</Data><Data Name='TargetUserName'>SYSTEM</Data><Data Name='TargetDomainName'>NT AUTHORITY</Data><Data Name='TargetLogonId'>0x3e7</Data><Data Name='LogonType'>5</Data><Data Name='LogonProcessName'>Advapi</Data><Data Name='AuthenticationPackageName'>Negotiate</Data><Data Name='WorkstationName'>-</Data><Data Name='LogonGuid'>{00000000-0000-0000-0000-000000000000}</Data><Data Name='TransmittedServices'>-</Data><Data Name='LmPackageName'>-</Data><Data Name='KeyLength'>0</Data><Data Name='ProcessId'>0x3f0</Data><Data Name='ProcessName'>C:\Windows\System32\services.exe</Data><Data Name='IpAddress'>-</Data><Data Name='IpPort'>-</Data><Data Name='ImpersonationLevel'>%%1833</Data><Data Name='RestrictedAdminMode'>-</Data><Data Name='RemoteCredentialGuard'>-</Data><Data Name='TargetOutboundUserName'>-</Data><Data Name='TargetOutboundDomainName'>-</Data><Data Name='VirtualAccount'>%%1843</Data><Data Name='TargetLinkedLogonId'>0x0</Data><Data Name='ElevatedToken'>%%1842</Data></EventData></Event>"
| rex mode=sed "s/^<Event\b[^>]*>(.)/\1/"

``` SYSTEM ZERO LOSS ```
| rex mode=sed "s/<(\w+) (\w+)='([^'>]*)'(.*?\/>)/<\1_\2>\3<\/\1_\2><\1\4/g"
| rex mode=sed "s/<(\w+) (\w+)='([^'>]*)'(.*?\/>)/<\1_\2>\3<\/\1_\2><\1\4/g"
| rex mode=sed "s/<(\w+) (\w+)='([^'>]*)'(.*?\/>)/<\1_\2>\3<\/\1_\2><\1\4/g"
| rex mode=sed "s/<\w+\/>//g"
| rex mode=sed "s/<(\w+)>([^<]*)<\/\w+>/\1=\2</g"
| rex mode=sed "s/<System>/<S>/"
| rex mode=sed "s/<<\/System>/<\/S>/"

``` SYSTEM OPTIMISED ```
```| rex mode=sed "s/(<System><Provider [^>]+EventSourceName='([^']+)')/ProviderSrc='\2' \1/"
| rex mode=sed "s/(<System>.{0,200}<EventID (\w+)='([^']+)')/\2='\3' \1/"
| rex mode=sed "s/<System><Provider Name='Microsoft-Windows-PowerShell.{0,100}?><EventID>([^<]+)<.{0,200}?><Keywords>([^<]+)<\/Keywords>.{0,50}?<TimeCreated SystemTime='([^']+)'.{0,50}?><EventRecordID>([^<]+).{0,250}?><Computer>([^<]+).{0,150}?<\/System>/\3 EventID=\1 EventRecordID=\4 Keywords=\2 Computer=\5/"
| rex mode=sed "s/^(.*?)<System><Provider Name='([^']*)'.{0,200}?><EventID[^>]*>([^<]+)<.{0,200}?><Keywords>([^<]+)<\/Keywords>.{0,50}?<TimeCreated SystemTime='([^']+)'.{0,250}?><Channel>(?:System|Application)<\/Channel.{0,50}<Computer>([^<]+).{0,150}?<\/System>/\5 EventID=\3 Provider='\2' Keywords=\4 \1Computer=\6/"
| rex mode=sed "s/<System>.{0,200}?><EventID>([^<]+)<.{0,200}?><Keywords>([^<]+)<\/Keywords>.{0,50}?<TimeCreated SystemTime='([^']+)'.{0,300}?><Computer>([^<]+).{0,150}?<\/System>/\3 EventID=\1 Keywords=\2 Computer=\4/"
```

```THE REST```
| rex mode=sed "s/<Data Name='(\w+)'>([^<]*?)<\/Data>/\1=\2</g"
| rex mode=sed "s/(.)<\/Event>/\1/"
| rex mode=sed "s/<?<(\/?)EventData>/<\1ED>/g"

This gives you both the "zero-loss" and the "optimised" approaches (move the comment markers accordingly). You can use either as a starting point, or just ignore the whole <System>...</System> area and only implement the more straightforward substitutions; that alone is already a significant saving.
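If you'd rather iterate on these regexes outside Splunk, here is a Python port of the sed pipeline above, written as a sketch. The function names are hypothetical, `rewrite_optimised` ports only the generic fallback line of the "optimised" block (not the PowerShell or System/Application variants), and Python's `re` is close to, but not identical to, the PCRE flavour Splunk uses, so re-test anything you port back. Each `re.sub` mirrors one `rex mode=sed` line; `count=1` stands in for a substitution without the `g` flag.

```python
import re


def _strip_event_tag(raw: str) -> str:
    # s/^<Event\b[^>]*>(.)/\1/ : drop the <Event xmlns=...> opening tag
    return re.sub(r"^<Event\b[^>]*>(.)", r"\1", raw, count=1)


def _rewrite_event_data(s: str) -> str:
    # THE REST: <Data Name='k'>v</Data> -> k=v<, drop </Event>, EventData -> ED
    s = re.sub(r"<Data Name='(\w+)'>([^<]*?)</Data>", r"\1=\2<", s)
    s = re.sub(r"(.)</Event>", r"\1", s, count=1)
    return re.sub(r"<?<(/?)EventData>", r"<\1ED>", s)


def rewrite_zero_loss(raw: str) -> str:
    s = _strip_event_tag(raw)
    # Peel one attribute per pass off each self-closing tag, e.g.
    # <Provider Name='a' Guid='b'/> -> <Provider_Name>a</Provider_Name><Provider Guid='b'/>
    for _ in range(3):
        s = re.sub(r"<(\w+) (\w+)='([^'>]*)'(.*?/>)", r"<\1_\2>\3</\1_\2><\1\4", s)
    s = re.sub(r"<\w+/>", "", s)                       # drop the now-empty tags
    s = re.sub(r"<(\w+)>([^<]*)</\w+>", r"\1=\2<", s)  # <k>v</k> -> k=v<
    s = s.replace("<System>", "<S>", 1).replace("<</System>", "</S>", 1)
    return _rewrite_event_data(s)


def rewrite_optimised(raw: str) -> str:
    # Only the generic fallback sed line: keep timestamp, EventID, Keywords, Computer.
    s = _strip_event_tag(raw)
    s = re.sub(
        r"<System>.{0,200}?><EventID>([^<]+)<.{0,200}?><Keywords>([^<]+)</Keywords>"
        r".{0,50}?<TimeCreated SystemTime='([^']+)'.{0,300}?><Computer>([^<]+)"
        r".{0,150}?</System>",
        r"\3 EventID=\1 Keywords=\2 Computer=\4", s, count=1)
    return _rewrite_event_data(s)


# Shortened 4624 event, same shape as the one at the top of this post.
raw = ("<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System>"
       "<Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}'/>"
       "<EventID>4624</EventID><Version>3</Version><Keywords>0x8020000000000000</Keywords>"
       "<TimeCreated SystemTime='2025-10-21T10:18:26.8317410Z'/>"
       "<Execution ProcessID='604' ThreadID='2800'/><Channel>Security</Channel>"
       "<Computer>Gabs</Computer><Security/></System><EventData>"
       "<Data Name='TargetUserName'>SYSTEM</Data><Data Name='LogonType'>5</Data>"
       "</EventData></Event>")
print(rewrite_zero_loss(raw))
print(rewrite_optimised(raw))
```

Having the pipeline as plain functions makes it easy to run it over a few hundred exported raw events and eyeball the output before touching any ingest action.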


Note: the "mask" ingest action can only replace text; it cannot simply remove it. So you'll often see (.) in the regex, which captures a single character, and \1 in the replacement expression, which puts that captured character back.


One thing to make sure of before you reach the stage of enabling your ingest actions (after all the testing we're about to go into): specify the sourcetype explicitly in your inputs.conf, for instance:

[WinEventLog://Security]
source = XmlWinEventLog:Security
sourcetype = XmlWinEventLog
renderXml = True

Even if you think you don't need to do this, it will probably save you a world of pain wondering why your ingest actions don't work (source/sourcetype rewrites are *evil*; best to get them right at the source).


Step 2: set up the test environment


This is how I did it.


I set up a Windows machine with:

  • sysmon installed and configured

  • tweaked group policies to enable logging of PowerShell and other specific events, such as accounts added to a security group, new processes created with their command line (4688), etc.

  • anything else you can do to generate events similar to the XmlWinEventLog events you have in production

  • the free Splunk trial installed

  • my Conf Manager app: don't forget the dependency, and remember to enable the K.O. updating searches.

  • the Windows TA

  • the sysmon TA


Step 3: get a baseline for testing


I perform a few actions designed to generate at least some of the specific events I'm interested in:

  • reboot the machine

  • screw up my password when logging in at least once

  • run some basic commands in powershell

  • create a new account and make it an administrator

  • etc.


I share some dashboards below that make the following steps much easier and more scalable, so don't panic. But let's assume a manual approach first, just to go through the idea:


In Splunk, I randomly pick one event for each of the key EventCodes I'm worried about across sysmon, powershell, security, application and system logs. For each event I write a very simple Splunk search containing just the precise timestamp of the event (as it appears in _raw) and optionally some other specific keywords featured in the event. The point of the search is to return only that one event, and it should keep working even after my ingest actions have run, even if the field extractions are all screwed up. For each search, I save all the fields it returns (including all the extracted, enriched and calculated fields) in a BEFORE lookup.


Here's an example:

source=XmlWinEventLog:Security 4624 2025-06-13T12:03:05.7947746Z 
| sort - _indextime
| head 1
| table *
| outputlookup before_security_4624.csv

Note the use of _indextime to get only the most recently ingested version of the event.


Step 4: Implement the ingest actions


Implement the ingest actions for sourcetype=XmlWinEventLog. You can copy-paste the regex and the replace expression from the s/.../.../ expression of the sed commands you were playing with in Step 1. Here's an example:



Apply and save.


Now we need to re-ingest all the events.

  • stop splunk

  • cd to $SPLUNK_HOME

  • cd to .\var\lib\splunk\modinputs\WinEventLog\

  • delete all the files there (del * or rm *)

  • start splunk


Hopefully that will re-ingest all the same events, but this time the ingest actions will rewrite them before they are indexed.


Step 5: Search-time configuration iterations


Now we can re-run the same search as earlier, but this time saving the results to an AFTER lookup:

source=XmlWinEventLog:Security 4624 2025-06-13T12:03:05.7947746Z
| sort - _indextime
| head 1
| table *
| outputlookup after_security_4624.csv

Then I can use my Conf Manager app to compare the two lookups. Go to Tools > Diff search results. Enter | inputlookup before_security_4624.csv for the old SPL and | inputlookup after_security_4624.csv for the new SPL.


Here's an example for another event code:


Let's list a few interesting things to notice:

  • _raw is smaller! This means our ingest actions worked. You can see that in this example we went for the lossless approach.

  • a number of fields, such as dest and process_name, are missing from the AFTER lookup. This is because the field extractions don't yet know how to extract them from the new format.

  • some internal fields have changed, such as _bkt, _indextime, etc. This is because of the re-ingest and is not related to what's in the event

  • automatically generated fields like punct can't help but be different.
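Conf Manager's diff does this comparison for you. If you want something scriptable to run over many BEFORE/AFTER pairs, a minimal Python sketch could compare two lookup CSVs exported with `outputlookup`. It assumes exactly one event per CSV, treats an empty cell as a missing field, and ignores a hypothetical `IGNORED` set of internal fields that are expected to change on re-ingest; adjust that set to taste.

```python
import csv
import io

# Fields expected to differ after a re-ingest, or by design (_raw is what we
# shrank). This set is an assumption; adjust it for your environment.
IGNORED = {"_raw", "_bkt", "_cd", "_indextime", "_serial", "_si", "punct"}


def load_row(csv_text: str) -> dict:
    """Read a lookup written by `... | head 1 | table * | outputlookup`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != 1:
        raise ValueError(f"expected exactly one event, got {len(rows)}")
    return rows[0]


def diff_fields(before: dict, after: dict, ignored=IGNORED):
    """Return (missing, added, changed) field names, skipping ignored and empty fields."""
    b = {k: v for k, v in before.items() if k not in ignored and v != ""}
    a = {k: v for k, v in after.items() if k not in ignored and v != ""}
    missing = sorted(b.keys() - a.keys())
    added = sorted(a.keys() - b.keys())
    changed = sorted(k for k in b.keys() & a.keys() if b[k] != a[k])
    return missing, added, changed


before = load_row("_raw,_indextime,EventCode,dest,process_name\nold,1,4688,Gabs,cmd.exe\n")
after = load_row("_raw,_indextime,EventCode,EventData_Xml\nnew,2,4688,NewProcessId=0x1\n")
print(diff_fields(before, after))  # (['dest', 'process_name'], ['EventData_Xml'], [])
```

The "missing" list is the one to drive to empty as you iterate on props and transforms.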


Now we need to figure out which props and transforms to add to restore the field extractions. Go to Conf Manager, Search K.O.s > Search time configuration. Search for the sourcetype:


This will show you all the search-time configuration in the order it's applied at search-time. For instance:


Notice:

  • the REPORT-0xml_block_extract prop is listing a few transforms names, among which is eventdata_xml_block (vertical green arrow)

  • the eventdata_xml_block transform definition is listed below (horizontal green arrow)

  • its purpose is to extract the EventData_Xml field from _raw

  • the next prop is REPORT-0xml_kv_extract and among others it lists the eventdata_xml_data transform (blue arrows)...

  • ...whose purpose is clearly to extract the key value pairs from the traditional format of the <EventData>...</EventData> area of the old-school windows logs


So a good place to start is to add to this so that the new style of event also gets the EventData_Xml field extracted. And while we're at it, let's add the field extractions inside it.


This is what that could look like in props:

REPORT-0xml_block_extract_gabs = eventdata_xml_block_gabs

REPORT-0xml_kv_extract = system_props_xml_kv, system_props_xml_attributes, eventdata_xml_data, eventdata_xml_data_gabs, rendering_info_xml_data

And in transforms:

[eventdata_xml_block_gabs]
FORMAT = EventData_Xml::$1
MV_ADD = 1
REGEX = (?ms)<ED>(.*?)<\/ED>

[eventdata_xml_data_gabs]
CLEAN_KEYS = 0
FORMAT = $1::$2
MV_ADD = 1
REGEX = (?<=<|^)(\w+)=([^<]*)(?:<|$)
SOURCE_KEY = EventData_Xml

Something similar will be needed for System_Props_Xml.
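To sanity-check these two transforms before deploying them, here is a rough Python emulation of the REPORT chain: first `eventdata_xml_block_gabs` carves out `EventData_Xml`, then `eventdata_xml_data_gabs` runs over that field. This is an approximation, not how Splunk evaluates REPORTs: lists stand in for `MV_ADD = 1`, and because Python's `re` requires fixed-width lookbehinds, `(?<=<|^)` has to be rewritten as the equivalent `(?:^|(?<=<))`.

```python
import re
from collections import defaultdict

# [eventdata_xml_block_gabs]: REGEX = (?ms)<ED>(.*?)<\/ED>
BLOCK = re.compile(r"(?ms)<ED>(.*?)</ED>")
# [eventdata_xml_data_gabs]: REGEX = (?<=<|^)(\w+)=([^<]*)(?:<|$)
# Python needs fixed-width lookbehinds, hence (?:^|(?<=<)) instead of (?<=<|^).
PAIR = re.compile(r"(?:^|(?<=<))(\w+)=([^<]*)(?:<|$)")


def extract_event_data(raw: str) -> dict:
    """Emulate the two REPORT transforms: block first, then pairs within it."""
    fields = defaultdict(list)  # a list per field stands in for MV_ADD = 1
    for block in BLOCK.findall(raw):
        fields["EventData_Xml"].append(block)
        for key, value in PAIR.findall(block):
            fields[key].append(value)
    return dict(fields)


raw = ("2025-10-21T10:18:26.8317410Z EventID=4624 Computer=Gabs"
       "<ED>TargetUserName=SYSTEM<LogonType=5<LogonProcessName=Advapi</ED>")
for name, values in extract_event_data(raw).items():
    print(name, values)
```

Repeated keys end up as multiple values under the same field, which is the behaviour `MV_ADD = 1` asks for.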


After making search-time config changes, simply repeat Step 5 until all the fields appear as close to the original as possible. If you have to tweak the ingest actions (to be more aggressive, or because you took out too much; for instance, the first time I didn't appreciate that Keywords is important for some events), then first disable the ingest actions and repeat from Step 3.


To complement this approach, you should also explore the props and transforms in the Windows and sysmon TAs and ask yourself: would this still work with the new format?


Step 6: list all the changes needed


These are the local/props.conf and local/transforms.conf I ended up with, for inspiration only. I do not guarantee they are fully compatible with the ingest actions suggested above as I went through a number of iterations and frankly I can't remember.


Windows TA props:

[XmlWinEventLog]
REPORT-0xml_kv_extract = system_props_xml_kv,system_props_xml_attributes,system_props_xml_gabs,eventdata_xml_data,eventdata_xml_data_gabs,rendering_info_xml_data
EVAL-TimeCreated = coalesce( TimeCreated, TimeCreated_SystemTime)
REPORT-0xml_block_extract_gabs = system_xml_block_gabs,eventdata_xml_block_gabs

[source::XmlWinEventLog:Microsoft-Windows-PowerShell/Operational]
EXTRACT-dest_for_microsoft_windows_powershell_gabs = Computer=(?<dest>[^<]+)<
REPORT-contextinfo_fields_gabs = contextinfo_fields_extraction_gabs

[source::XmlWinEventLog:Security]
EXTRACT-1IpAddress_for_windows_security_from_xml_gabs = [<>]IpAddress=(?!\:\:1)(?!127\.0\.0\.1)(?<src_ip>[^\<]+)<
EXTRACT-dest_for_windows_security_from_xml_gabs = EventID=(5156|5157) .*[<>]DestAddress=(?<dest>(?<dest_ip>[^<]+))<
EXTRACT-app_for_windows_security_from_xml_gabs = EventID=(5156|5157) .*[<>]Application=(?<app>[^<]+)<
EXTRACT-dest_for_windows_security_4798_gabs = EventID=4798 .*[<>]TargetDomainName=(?<dest>[^<]+)<
EXTRACT-rule_for_windows_security_from_xml_gabs = EventID=(5156|5157) .*[<>]FilterRTID=(?<rule>[^<]+)<
EXTRACT-src_ip_for_windows_security_from_xml_gabs = EventID=(5156|5157) .*[<>]SourceAddress=(?<src_ip>[^<]+)<
EXTRACT-dest_port_for_windows_security_from_xml_gabs = [<>]DestPort=(?<dest_port>[^<]+)<
EXTRACT-object_attrs_for_windows_security_from_xml_gabs = [<>]RuleName=(?<object_attrs>[^<]+)<
EXTRACT-process_for_windows_security_from_xml_gabs = [<>]ProcessName=(?<process>[^<]+)<
EXTRACT-new_process_for_windows_security_from_xml_gabs = [<>]NewProcessName=(?<new_process>[^<]+)<
EXTRACT-parent_process_for_windows_security_from_xml_gabs = [<>]ParentProcessName=(?<parent_process>[^<]+)<
EXTRACT-new_process_id_for_windows_security_from_xml_gabs = [<>]NewProcessId=(?<new_process_id>[^<]+)<
EXTRACT-process_id_for_windows_security_from_xml_gabs = [<>](ProcessI(d|D))=(?<process_id>[^<]+)<
EXTRACT-process_command_line_for_xml_gabs = [<>]CommandLine=(?<Process_Command_Line>[^<]+)<
EXTRACT-dest_for_windows_security_gabs = Computer=(?<dest>[^<]+)<

[source::XmlWinEventLog:System]
EXTRACT-bestmatch_for_windows_system_xml_gabs = Computer=(?<dest>[^<]+)<

Windows TA transforms:

[rendering_info_xml_data_gabs]
CLEAN_KEYS = 0
FORMAT = $1::$2
MV_ADD = 1
REGEX = (?<=<\s|^)(\w+)=(?s)([^<]*)<\s
SOURCE_KEY = RenderingInfo_Xml

[contextinfo_fields_extraction_gabs]
CLEAN_KEYS = 0
FORMAT = $1::$2
MV_ADD = 1
REGEX = (?<=[\r\n]|^)(\w+)=([^\r\n]*)

[system_xml_block_gabs]
FORMAT = System_Props_Xml::$1
MV_ADD = 1
REGEX = ^\d{4}\S+ (.*?)<ED>

[eventdata_xml_block_gabs]
FORMAT = EventData_Xml::$1
MV_ADD = 1
REGEX = (?ms)<ED>(.*?)<\/ED>

[eventdata_xml_data_gabs]
CLEAN_KEYS = 0
FORMAT = $1::$2
MV_ADD = 1
REGEX = (?<=<|^)(\w+)=([^<]*)(?:<|$)
SOURCE_KEY = EventData_Xml

[system_props_xml_gabs]
CLEAN_KEYS = 0
FORMAT = $1::$2
MV_ADD = 1
REGEX = (?:^| )(\w+)='?([^'=]+?)'?(?= \w+=|$)
SOURCE_KEY = System_Props_Xml

Sysmon TA props:

[source::XmlWinEventLog:Microsoft-Windows-Sysmon/Operational]
EVAL-EventChannel = coalesce( EventChannel, Channel )
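Before moving on to Step 7, a cheap extra safety net is to regression-test some of the Step 6 regexes offline against a sample event in the rewritten format. The Python sketch below does that for `system_xml_block_gabs`, `system_props_xml_gabs` and two of the EXTRACTs. Assumptions: the sample event is made up, the helper names are hypothetical, and `pcre_to_python` only handles the named-group syntax these particular patterns need; it is not a general PCRE converter.

```python
import re


def pcre_to_python(pattern: str) -> str:
    """Turn PCRE named groups (?<name>...) into Python's (?P<name>...).

    Lookbehinds (?<= and (?<! are left alone. Only good enough for the
    patterns below, not a general PCRE-to-Python converter.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


# Patterns copied from the Step 6 props/transforms.
SYSTEM_BLOCK = r"^\d{4}\S+ (.*?)<ED>"                    # system_xml_block_gabs
SYSTEM_PAIRS = r"(?:^| )(\w+)='?([^'=]+?)'?(?= \w+=|$)"  # system_props_xml_gabs
SRC_IP = r"[<>]IpAddress=(?!\:\:1)(?!127\.0\.0\.1)(?<src_ip>[^\<]+)<"
PROCESS_ID = r"[<>](ProcessI(d|D))=(?<process_id>[^<]+)<"


def system_props(raw: str) -> dict:
    """Emulate System_Props_Xml extraction followed by its key=value pairs."""
    block = re.search(SYSTEM_BLOCK, raw)
    return dict(re.findall(SYSTEM_PAIRS, block.group(1))) if block else {}


def extract(pcre: str, raw: str, name: str):
    """Run one EXTRACT regex and return its named group, or None."""
    match = re.search(pcre_to_python(pcre), raw)
    return match.group(name) if match else None


# Made-up sample in the "optimised" format.
event = ("2025-10-21T10:18:26.8317410Z EventID=4624 Keywords=0x8020000000000000 "
         "Computer=Gabs<ED>SubjectUserName=GABS$<LogonType=3<ProcessId=0x3f0"
         "<IpAddress=10.0.0.5<IpPort=49152</ED>")

print(system_props(event))  # {'EventID': '4624', 'Keywords': '0x8020000000000000', 'Computer': 'Gabs'}
print(extract(SRC_IP, event, "src_ip"))          # 10.0.0.5
print(extract(PROCESS_ID, event, "process_id"))  # 0x3f0
```

A nice side effect: because the System fields were renamed (Execution_ProcessID), the `[<>]ProcessI(d|D)=` extraction can no longer pick up the logging process's ID by mistake.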


Step 7: check compatibility with old


Repeat Steps 2 and 3 to get a fresh test environment.

Do NOT add the ingest actions. And no need to re-ingest the logs.

Instead, add all the search-time tweaks listed in Step 6.

Do Step 5: does it look the same?


This tests that the new search-time configuration and the old-format events can cohabit without causing issues.


This means you can now safely implement the search-time config in production.


Step 8: one last check


If you're paranoid (and you should be), run Steps 4 and 5 again in the new environment: everything should be perfect without any tweaks needed.


Here's an example for Sysmon event 13:


Notice:

  • EventData_Xml and System_Props_Xml are still there but smaller.

  • Ambiguous fields such as Guid, Name and ProcessID are gone. They sound important, but they were actually just fluff from the <System>...</System> area. For instance, ProcessID has nothing to do with ProcessId! Instead we now have clearer fields like Provider_Guid, Provider_Name and Execution_ProcessID.

Once you've checked as many different EventCodes as you can, you're about as prepared as you can be.


You are now ready to implement the ingest actions in production. (Assuming you've already implemented the search time configuration in the previous step). Then check all your use cases: they should still work happily.


Good luck!


Dashboards


As I mentioned, here are the dashboards I made for myself to test as many events as possible across all the different WELs without going insane. They also recap the steps, so they should be easy to follow. Start with wel_rewrite_test_generator.xml and follow its instructions.



Again, they are provided without any guarantee or support. Just as inspiration. The idea is: if you know what you're doing, they'll just save you time and you can turn them into whatever you need. And if you don't know what you're doing, hopefully they won't work for you and keep you safe from yourself.



Conclusion


With a bit of regex-fu, you can cut your Windows Event Logs license usage by about half, if not more. That means:

  • you can ingest more logs without more license

  • and/or you can keep the same logs longer without more storage


With a rigorous testing methodology, you can reduce the risks of this major change to an acceptable level and make the transition successfully with minimal impact to production.



©2021 by Gabriel Vasseur.
