
Transforming nested arrays in Office 365 logs into table

I want to release a simple tool based on jq that transforms some common Office 365 Unified Audit Log events into a tabular report, but I'm having trouble with the way certain key arrays are nested. In particular, when I get down to Folders[], where each element holds an Id, a Path, and a FolderItems[] array with rows of message IDs and sizes, I can't find a way to keep the related values from the arrays in sync. Instead I get massive combinations of every value, as though I'm unintentionally iterating through each array independently. Here's some sample data:
{"CreationTime":"2024-02-06T12:13:14","Id":"abcdabcd-1234-1234-5555-888888888888","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"admin@example.com","ClientIPAddress":"5.5.5.5","Folders":[{"FolderItems":[{"InternetMessageId":"","SizeInBytes":12345},{"InternetMessageId":"","SizeInBytes":11122},{"InternetMessageId":"","SizeInBytes":88888}],"Id":"EEEEEEEE","Path":"\\Outbox"},{"FolderItems":[{"InternetMessageId":"","SizeInBytes":44444},{"InternetMessageId":"","SizeInBytes":100000},{"InternetMessageId":"","SizeInBytes":109000},{"InternetMessageId":"","SizeInBytes":22000},{"InternetMessageId":"","SizeInBytes":333333}],"Id":"FFFFFFFFFFFFFFFFFAB","Path":"\\Inbox"}]}
{"CreationTime":"2024-02-06T20:00:00","Id":"abcdabcd-1234-1234-6666-9999999999999","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"other@other.cc","ClientIPAddress":"7.7.7.7","Folders":{"FolderItems":[{"InternetMessageId":"","SizeInBytes":77777},{"InternetMessageId":"","SizeInBytes":888888},{"InternetMessageId":"","SizeInBytes":99999}],"Id":"12341234","Path":"\\Temp"}}
Desired output:

| CreationTime | Id | UserId | ClientIPAddress | FolderId | FolderPath | InternetMessageId | SizeInBytes |
| ------------------- | ------------------------------------ | ----------------- | --------------- | ------------------- | ---------- | ------------------------------ | ----------- |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | EEEEEEEE | \\Outbox | abc1234@aaabbbccc.whatever.com | 12345 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | EEEEEEEE | \\Outbox | cccccc@aaabbbccc.whatever.com | 11122 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | EEEEEEEE | \\Outbox | final@host.gmail.com | 88888 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \\Inbox | otherid@host.com | 44444 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \\Inbox | furtherhost@xyz.host2.com | 100000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \\Inbox | id7@bbb.outlook.com | 109000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \\Inbox | other@hosthost.com | 22000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | admin@example.com | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \\Inbox | junk@b8b8b8.newhost.com | 333333 |
| 2024-02-06T20:00:00 | 12341234 | other@other.cc | 7.7.7.7 | 12341234 | \\Temp | example@aaabbbccc.whatever.com | 77777 |
| 2024-02-06T20:00:00 | 12341234 | other@other.cc | 7.7.7.7 | 12341234 | \\Temp | cccccc4@aaabbbccc.whatever.com | 888888 |
| 2024-02-06T20:00:00 | 12341234 | other@other.cc | 7.7.7.7 | 12341234 | \\Temp | final3@host.gmail.com | 99999 |

Note that the .Folders element can sometimes arrive as a JSON-encoded string, but I was able to load that case conditionally using fromjson. For example:
[...]"Folders": "[{\"FolderItems\":[{\"InternetMessageId\":\""Fo\",\"SizeInBytes\":12345},[...]
Code so far:
cat | jq '
    if has("Folders") then
        if(.Folders | type=="string") and .Folders != "" then .Folders |= fromjson  end |
        if(.Folders | type=="string") and .Folders == "" then .Folders = null end
    end | .' |     # works up to here at least
    jq '
if has("Item") then .Item |= (if type=="string" and .!="" then fromjson else {} end) else .Item|={}  end |
    if has("Item") then
            if .Item | has("Id") then .ItemId = .Item.Id else .ItemId={} end |
            if .Item | has("ParentFolder") then
                .ItemParentFolderId=.Item.ParentFolder.Id? |
                    .ItemParentFolderPath=.Item.ParentFolder.Path? |
                    .ItemParentFolderName=.Item.ParentFolder.Name?
            end
        end | . ' |     # works up to here at least
    jq '
    if has("Folders") then
        if (.Folders | select(type=="array")) then
            .Folders[].Id? |
            .FoldersPath=.Folders[].Path? |
            .FoldersFolderItems=.Folders[].FolderItems?
        else . end
    end
    ' |
jq -r '. | (.TimeGenerated // .CreationTime) as $EventTime |
.ClientIP = if .ClientIP == "" then null else .ClientIP end |
.ClientIP_ = if .ClientIP_ == "" then null else .ClientIP_ end |
.Client_IPAddress = if .Client_IPAddress == "" then null else .Client_IPAddress end |
.ClientIPAddress = if .ClientIPAddress == "" then null else .ClientIPAddress end |
.ActorIpAddress = if .ActorIpAddress == "" then null else .ActorIpAddress end |
(.ClientIP // .ClientIP_ // .Client_IPAddress // .ClientIPAddress // .ActorIpAddress) as $IPAddress |
(.UserId // .UserId_) as $LogonUser |
.FFIIMI as $InternetMessageId |
.FFISIB as $SizeInBytes |
{EventTime: $EventTime, IPAddress: $IPAddress, LogonUser: $LogonUser, InternetMessageId: $InternetMessageId, SizeInBytes: $SizeInBytes} + . |
[.Id, .EventTime, .IPAddress, .LogonUser, .MailboxOwnerUPN, .Operation, .InternetMessageId, .SizeInBytes] | @csv'
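
The "massive combinations" described above are what jq produces when several independent `.Folders[]` iterations appear in one expression: each `[]` is its own generator, so the results multiply into a Cartesian product. One possible way around this, sketched here under assumptions rather than offered as a definitive fix, is to bind each nesting level to a variable with `as` before descending, so every output row sees exactly one folder and one item. The function name `flatten_folders` and the helper `folders` are hypothetical, and the column list simply follows the desired output table. The sketch also assumes `.Folders` may be an array, a single object, a JSON-encoded string, an empty string, or absent, as in the sample data.

```shell
#!/bin/sh
# Hypothetical helper: reads one JSON event per line on stdin and
# prints one CSV row per (event, folder, folder item) triple.
flatten_folders() {
  jq -r '
    # Normalise .Folders to an array (assumed input shapes: array,
    # single object, JSON-encoded string, empty string, or missing).
    def folders:
      (.Folders // [])
      | if type == "string" then (if . == "" then [] else fromjson end) else . end
      | if type == "object" then [.] else . end;

    # Bind each level once. Nested bindings keep Id, Path and the
    # items of the same folder together instead of recombining them.
    . as $event
    | folders[] as $folder
    | ($folder.FolderItems // [])[] as $item
    | [ $event.CreationTime, $event.Id, $event.UserId, $event.ClientIPAddress,
        $folder.Id, $folder.Path, $item.InternetMessageId, $item.SizeInBytes ]
    | @csv
  '
}
```

It could be called as `flatten_folders < audit.jsonl > report.csv`, where `audit.jsonl` is a placeholder file name. With this approach, events that have no folders or no folder items produce no rows at all; if those events should still appear in the report, a fallback row would have to be added separately.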
Asked by whitepaws (121 rep)
Feb 11, 2024, 06:31 AM
Last activity: Feb 8, 2025, 12:06 AM