Authored by Joshua Wright | josh@willhackforsushi.com
JSON has become an increasingly important file format in many areas: as a data source for computer programs, as a flexible data structure for engineering projects, and as a logging format for many enterprise security tools. To work effectively with JSON data, we need a tool that parses and extracts the information that is useful to us as analysts. One of the most powerful tools for processing JSON data is JQ.
In its simplest form, JQ is a JSON beautifier, turning ugly JSON content:
slingshot:~$ cat package.json
{"name": "metroezpark","version": "1.0.0","description": "Metro EZ Park Gate Status","main": "server.js","dependencies": {"npm": "^6.0.1","socket.io": "^2.0.1"},"devDependencies": {}, "scripts": {"test": "echo \"Error: no test specified\" && exit 1","start": "node server.js"},"author": "", "license": "ISC"}
... into something nicer to look at:
slingshot:~$ cat package.json | jq
{
  "name": "metroezpark",
  "version": "1.0.0",
  "description": "Metro EZ Park Gate Status",
  "main": "server.js",
  "dependencies": {
    "npm": "^6.0.1",
    "socket.io": "^2.0.1"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "start": "node server.js"
  },
  "author": "",
  "license": "ISC"
}
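If you prefer to name the file as an argument, passing jq the identity filter (a lone .) does the same thing: the identity filter simply passes its input through, and jq pretty-prints the result. This should produce the same output as above:

slingshot:~$ jq '.' package.json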
Zeek and JSON
Zeek (formerly Bro) is a network security monitoring system. Among other things, it allows us to take a packet capture and summarize the network events into several different log files. By default, Zeek exports the logging data in a tab-delimited format. With a little tweaking, Zeek can also export logs in JSON format:
slingshot:~$ bro -Cr merged.pcap -e 'redef LogAscii::use_json=T;'
slingshot:~$ head -1 conn.log
{"ts":1554410064.698965,"uid":"CMreaf3tGGK2whbqhh","id.orig_h":"192.168.144.130","id.orig_p":64277,"id.resp_h":"192.168.144.2","id.resp_p":53,"proto":"udp","service":"dns","duration":0.320463,"orig_bytes":94,"resp_bytes":316,"conn_state":"SF","missed_bytes":0,"history":"Dd","orig_pkts":2,"orig_ip_bytes":150,"resp_pkts":2,"resp_ip_bytes":372,"tunnel_parents":[]}
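If you are using a newer Zeek release where the binary is named zeek rather than bro, the same conversion should still work. As a sketch (not from the original capture session), Zeek also accepts the option assignment directly on the command line:

slingshot:~$ zeek -Cr merged.pcap LogAscii::use_json=T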
Once the Zeek logs are in JSON format, we're ready to start extracting data using JQ!
JQ and Zeek Object Access
Zeek's conn.log file is used to summarize TCP/UDP/ICMP connections. We can use JQ to examine the fields in the connection objects:
slingshot:~$ head -1 conn.log | jq
{
  "ts": 1554410064.698965,
  "uid": "CMreaf3tGGK2whbqhh",
  "id.orig_h": "192.168.144.130",
  "id.orig_p": 64277,
  "id.resp_h": "192.168.144.2",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.320463,
  "orig_bytes": 94,
  "resp_bytes": 316,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 2,
  "orig_ip_bytes": 150,
  "resp_pkts": 2,
  "resp_ip_bytes": 372,
  "tunnel_parents": []
}
I used head -1 here just to look at the first conn.log record. The Zeek log summarizes the connection, including source and destination addresses, ports, protocol (TCP, UDP, or ICMP), service (DNS, HTTP, etc.), packets transferred, bytes exchanged, and more.
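If you just want an inventory of the field names, jq's built-in keys function returns them as an alphabetically sorted array; with the record above and compact output (-c), the result should look roughly like this:

slingshot:~$ head -1 conn.log | jq -c 'keys'
["conn_state","duration","history","id.orig_h","id.orig_p","id.resp_h","id.resp_p","missed_bytes","orig_bytes","orig_ip_bytes","orig_pkts","proto","resp_bytes","resp_ip_bytes","resp_pkts","service","ts","tunnel_parents","uid"]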
With JQ you can select specific fields from the Zeek log in your query. For example, to obtain the duration value for all connections, add the '.duration' argument:
slingshot:~$ head -10 conn.log | jq '.duration'
0.320463
0.000602
0.000923
0.00061
0.000602
0.00106
0.271645
0.000756
0.001645
0.001305
(For brevity, I've limited the output here to 10 records. We'll change that shortly.)
Notice here that I've taken the field name duration and prefixed it with a dot (.) to reference it with JQ (.duration). This is necessary to access the object member in the JSON record produced by Zeek.
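The same dot syntax works on any JSON object, not just Zeek logs; here is a minimal standalone example (the JSON snippet is made up for illustration):

slingshot:~$ echo '{"duration": 0.320463, "proto": "udp"}' | jq '.duration'
0.320463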
So far, you might wonder if this is terribly useful, since we could probably accomplish similar functionality with grep. However, consider adding additional fields to the query:
slingshot:~$ head -10 conn.log | jq -j '.duration, ", ", .proto, "\n"'
0.320463, udp
0.000602, udp
0.001859, udp
0.000654, udp
0.019871, udp
0.001863, udp
0.000951, udp
0.037681, tcp
0.000341, tcp
0.00068, udp
Here I've added an argument to jq, -j, which causes the output to be joined together without adding a newline. I've also added a delimiter of ", " and a newline at the end of the query.
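If you would rather have machine-friendly delimited output, one alternative (my own aside, not the article's approach) is to build an array of the fields and hand it to jq's @csv formatter along with -r for raw output. Given the values above, the first line should look something like 0.320463,"udp":

slingshot:~$ head -10 conn.log | jq -r '[.duration, .proto] | @csv'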
This isn't terribly useful yet, particularly without adding the originating and responding IP addresses. Because Zeek includes a . in these field names, the syntax for accessing these members is a little different:
slingshot:~$ head -10 conn.log | jq -j '.duration, ", ", .proto, ", ", \
.["id.orig_h"], ":", .["id.orig_p"], ", ", \
.["id.resp_h"], ":", .["id.resp_p"], "\n"'
0.320463, udp, 192.168.144.130:64277, 192.168.144.2:53
0.000602, udp, 192.168.144.130:55106, 192.168.144.2:53
0.001859, udp, 192.168.144.130:53881, 192.168.144.2:53
0.000654, udp, 192.168.144.130:53785, 192.168.144.2:53
0.019871, udp, 192.168.144.130:60696, 192.168.144.2:53
0.001863, udp, 192.168.144.130:59251, 192.168.144.2:53
0.000951, udp, 192.168.144.130:58172, 192.168.144.2:53
0.037681, tcp, 192.168.52.130:49965, 216.58.217.35:443
0.000341, tcp, 192.168.52.130:49960, 173.194.152.39:80
0.00068, udp, 192.168.52.130:57233, 192.168.52.2:53
Note that in this example I've broken up this long command into multiple lines with a backslash at the end of each line. If you type this on one long line, omit the backslashes.
In order to reference JSON object fields that include a ., we have to use the familiar leading-dot syntax, followed by square brackets and quotation marks. (For example, accessing the id.orig_h field shown above is denoted as .["id.orig_h"].)
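As an aside, jq also accepts a quoted key directly after the dot (."id.orig_h"), and its string interpolation syntax can tidy up the formatting. This is an alternative sketch rather than the article's approach, but it should produce the same lines as the example above:

slingshot:~$ head -10 conn.log | jq -r '"\(.duration), \(.proto), \(."id.orig_h"):\(."id.orig_p"), \(."id.resp_h"):\(."id.resp_p")"'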
Now that you know the basics of accessing Zeek JSON objects with JQ, let's take a look at using functions.
JQ Functions and Zeek
The JQ select function allows us to perform a Boolean operation on an identified field, returning the record if the operation returns true. For example, we can select all of the records where the number of response bytes (resp_bytes) is greater than a specified value:
sans@slingshot:~$ cat conn.log | jq 'select(.resp_bytes > 300000)'
{
  "ts": 1555622865.402479,
  "uid": "ClfquL1f58gwiJGY32",
  "id.orig_h": "192.168.52.132",
  "id.orig_p": 8,
  "id.resp_h": "13.107.21.200",
  "id.resp_p": 0,
  "proto": "icmp",
  "duration": 20135.048521,
  "orig_bytes": 607936,
  "resp_bytes": 600096,
  "conn_state": "OTH",
  "missed_bytes": 0,
  "orig_pkts": 18998,
  "orig_ip_bytes": 1139880,
  "resp_pkts": 18753,
  "resp_ip_bytes": 1125180,
  "tunnel_parents": []
}
The Boolean expression accepts the and and or operators to add additional query elements. Here we apply a similar query, limiting the results to TCP streams:
sans@slingshot:~$ cat conn.log | jq 'select(.resp_bytes > 100000 and .proto == "tcp")'
{
  "ts": 1555622836.884612,
  "uid": "CHq8ln1G4itLTu76d2",
  "id.orig_h": "192.168.52.130",
  "id.orig_p": 49970,
  "id.resp_h": "216.58.217.54",
  "id.resp_p": 443,
  "proto": "tcp",
  "service": "ssl",
  "duration": 9.978087,
  "orig_bytes": 1276,
  "resp_bytes": 296403,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "ShADadFf",
  "orig_pkts": 98,
  "orig_ip_bytes": 5208,
  "resp_pkts": 225,
  "resp_ip_bytes": 305407,
  "tunnel_parents": []
}
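select also composes with the field-extraction syntax from earlier. As a sketch of my own (not from the original post), you could report just the responder address and byte count for each matching stream, one tab-separated line per connection, using jq's @tsv formatter:

sans@slingshot:~$ cat conn.log | jq -r 'select(.resp_bytes > 100000 and .proto == "tcp") | [.["id.resp_h"], .resp_bytes] | @tsv'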
Another useful JQ feature is the sort_by function, which sorts the query results into a predictable order. In this example I sort the Zeek log records by the stream duration; note the added -s (slurp) argument, which tells jq to read all of the input records into a single array so that sort_by has a list to work with:
slingshot:~$ cat conn.log | jq -s 'sort_by(.duration)'
[
  {
    "ts": 1555622836.134114,
    "uid": "C6qkkP27JNdwc63GKf",
    "id.orig_h": "192.168.52.130",
    "id.orig_p": 49960,
    "id.resp_h": "173.194.152.39",
    "id.resp_p": 80,
    "proto": "tcp",
    "duration": 0.000341,
    "orig_bytes": 0,
    "resp_bytes": 0,
    "conn_state": "SF",
    "missed_bytes": 0,
    "history": "fAFa",
    "orig_pkts": 2,
    "orig_ip_bytes": 80,
    "resp_pkts": 2,
    "resp_ip_bytes": 80,
    "tunnel_parents": []
  },
  {
    "ts": 1554410461.862738,
    "uid": "Chblb02SMQnKBiPPdg",
    "id.orig_h": "192.168.144.130",
    "id.orig_p": 55106,
... snip
If you only want the first record, you can pipe the results (within the JQ expression) to .[0], indexing into the sorted array (0 is the first record, 1 is the second, and so on):
slingshot:~$ cat conn.log | jq -s 'sort_by(.duration) | .[0]'
{
  "ts": 1555622836.134114,
  "uid": "C6qkkP27JNdwc63GKf",
  "id.orig_h": "192.168.52.130",
  "id.orig_p": 49960,
  "id.resp_h": "173.194.152.39",
  "id.resp_p": 80,
  "proto": "tcp",
  "duration": 0.000341,
  "orig_bytes": 0,
  "resp_bytes": 0,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "fAFa",
  "orig_pkts": 2,
  "orig_ip_bytes": 80,
  "resp_pkts": 2,
  "resp_ip_bytes": 80,
  "tunnel_parents": []
}
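If you want the first few records rather than just one, jq's array slice syntax should also work here; as an aside (not part of the original walkthrough), .[0:3] returns an array holding the first three entries of the sorted output:

slingshot:~$ cat conn.log | jq -s 'sort_by(.duration) | .[0:3]'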
In this sort order, JQ is showing us the smallest duration first. Piping the results to the reverse function will reverse the sort order:
slingshot:~$ cat conn.log | jq -s 'sort_by(.duration) | reverse | .[0]'
{
  "ts": 1554410064.698965,
  "uid": "CMreaf3tGGK2whbqhh",
  "id.orig_h": "192.168.144.130",
  "id.orig_p": 64277,
  "id.resp_h": "192.168.144.2",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.320463,
  "orig_bytes": 94,
  "resp_bytes": 316,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 2,
  "orig_ip_bytes": 150,
  "resp_pkts": 2,
  "resp_ip_bytes": 372,
  "tunnel_parents": []
}
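Beyond sorting, jq's group_by and length functions can produce quick summary statistics. As a hedged example of my own (not from the article), counting the connections seen for each responding host and listing the busiest hosts first might look like this:

slingshot:~$ cat conn.log | jq -s 'group_by(.["id.resp_h"]) | map({host: .[0]["id.resp_h"], count: length}) | sort_by(.count) | reverse'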
Conclusion
JQ is a powerful tool, and a little experience in using it to access JSON data can be a valuable asset in your toolbox. The best way to learn JQ is to experiment: one terminal open with a JSON file and jq, and a browser window open to the JQ manual.
Got ideas or questions about JQ? Leave a comment or email me.