Transforming Logs Like a Pro with Vector.dev#
Vector is a high-performance observability data pipeline that enables you to collect, transform, and route logs with ease. In this post, we’ll walk through how to use Vector to transform logs for better observability and operational efficiency.
🧠 Why Log Transformation Matters#
Log data is often messy, inconsistent, or too verbose. Before sending logs to your SIEM, log aggregator, or storage system, it’s often necessary to:
- Normalize field names and formats
- Redact sensitive information
- Enrich logs with metadata
- Reduce noise by filtering unimportant logs
⚙️ What is Vector?#
Vector.dev is a lightweight, ultra-fast tool written in Rust that lets you build streaming data pipelines. Key features:
- Sources: Where data comes from (e.g., files, syslog, journald, Kubernetes)
- Transforms: Modify logs using VRL (Vector Remap Language)
- Sinks: Where logs go (e.g., Elasticsearch, Loki, S3, Kafka)
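To make these concepts concrete, here is a minimal sketch of a standalone Vector config wiring a source, a transform, and a sink together. The file path and field names are illustrative only; the walkthrough below deploys Vector with Helm rather than a local config file.
sources:
  app_logs:
    type: "file"                 # tail local log files
    include:
      - "/var/log/myapp/*.log"   # hypothetical path
transforms:
  enriched:
    type: "remap"                # modify events with VRL
    inputs:
      - app_logs
    source: |
      .service = "myapp"         # enrich with a static field
      del(.password)             # drop a sensitive field if present
sinks:
  out:
    type: "console"              # print events to stdout
    inputs:
      - enriched
    encoding:
      codec: "json"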
Prerequisites#
This article assumes some familiarity with Kubernetes and Helm, though deep knowledge of either is not required to finish the walkthrough.
Tools used in this article#
- k3d
- kubectl
- Helm
- jq
- Vector
- Keycloak (Bitnami Helm chart)
🛠️ Gathering Logs from Kubernetes Pods#
For this walkthrough, we will deploy a dev Keycloak instance to gather logs from.
We will use Vector's kubernetes_logs source to collect logs from our pods.
Create a k3d cluster#
- Create our k3d cluster
k3d cluster create keycloak-cluster
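By default, k3d switches your kubectl context to the new cluster; you can confirm it is up with:
kubectl get nodes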
Deploy and Configure Keycloak#
- Deploy Keycloak in dev mode, configured to stream logs in JSON format at debug level.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install keycloak bitnami/keycloak -n keycloak --create-namespace --set logging.output=json --set logging.level=debug
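One way to wait for Keycloak to become ready is to watch the StatefulSet rollout (the chart deploys Keycloak as a StatefulSet named keycloak, as the log metadata later confirms):
kubectl rollout status statefulset/keycloak -n keycloak --timeout=5m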
- We can now take a look at the JSON logs from Keycloak to get an idea of what data we are getting back. It could take a few minutes for Keycloak to start up and stream logs.
kubectl logs -n keycloak -l app.kubernetes.io/name=keycloak -c keycloak | jq .
output:
{
  "timestamp": "2025-04-22T15:11:48.22487318Z",
  "sequence": 60413,
  "loggerClassName": "org.jboss.logging.Logger",
  "loggerName": "org.keycloak.transaction.JtaTransactionWrapper",
  "level": "DEBUG",
  "message": "JtaTransactionWrapper end. Request Context: HTTP GET /realms/master",
  "threadName": "executor-thread-1",
  "threadId": 24,
  "mdc": {},
  "ndc": "",
  "hostName": "keycloak-0",
  "processName": "/opt/bitnami/java/bin/java",
  "processId": 1
}
As we can see, the log output from Keycloak is not bad, but we can enrich it with some more useful information.
Deploy and Configure Vector#
- We can now deploy Vector to our cluster to start gathering logs, using the Vector Helm chart. During installation you may see the following error; it can be safely ignored.
Error: INSTALLATION FAILED: 1 error occurred: * Service "vector" is invalid: spec.ports: Required value
helm repo add vector https://helm.vector.dev
helm repo update
cat <<-'VALUES' > values.yaml
role: Agent
customConfig:
  data_dir: "vector-console-data/"
  sources:
    pod_logs:
      type: "kubernetes_logs"
  sinks:
    standard_out:
      type: "console"
      inputs:
        - pod_logs
      encoding:
        codec: "json"
extraVolumeMounts:
  - name: vector-console-data
    mountPath: /vector-console-data
extraVolumes:
  - name: vector-console-data
    emptyDir: {}
VALUES
helm install vector vector/vector -n vector --create-namespace --values values.yaml
- In the values above, we configure a kubernetes_logs source named pod_logs and send its events to a console sink, which we can read from the Vector pod's own logs.
- After waiting a couple of minutes, we can fetch the logs of our Vector pod and see that it is indeed collecting logs from all pods running in our cluster.
kubectl logs -n vector --selector=app.kubernetes.io/name=vector | jq .
{
  "file": "/var/log/pods/kube-system_coredns-ccb96694c-fpqvr_cf31755d-9f6f-4749-bbe0-2dec6b633769/coredns/0.log",
  "kubernetes": {
    "container_id": "containerd://1066c3be6fa76983bb72231bc9975e3c7c65d28dc6bc60abf7168b9f289b0539",
    "container_image": "rancher/mirrored-coredns-coredns:1.12.0",
    "container_image_id": "docker.io/rancher/mirrored-coredns-coredns@sha256:82979ddf442c593027a57239ad90616deb874e90c365d1a96ad508c2104bdea5",
    "container_name": "coredns",
    "namespace_labels": {
      "kubernetes.io/metadata.name": "kube-system"
    },
    "node_labels": {
      "beta.kubernetes.io/arch": "arm64",
      "beta.kubernetes.io/instance-type": "k3s",
      "beta.kubernetes.io/os": "linux",
      "kubernetes.io/arch": "arm64",
      "kubernetes.io/hostname": "k3d-keycloak-cluster-server-0",
      "kubernetes.io/os": "linux",
      "node-role.kubernetes.io/control-plane": "true",
      "node-role.kubernetes.io/master": "true",
      "node.kubernetes.io/instance-type": "k3s"
    },
    "pod_ip": "10.42.0.3",
    "pod_ips": [
      "10.42.0.3"
    ],
    "pod_labels": {
      "k8s-app": "kube-dns",
      "pod-template-hash": "ccb96694c"
    },
    "pod_name": "coredns-ccb96694c-fpqvr",
    "pod_namespace": "kube-system",
    "pod_node_name": "k3d-keycloak-cluster-server-0",
    "pod_owner": "ReplicaSet/coredns-ccb96694c",
    "pod_uid": "cf31755d-9f6f-4749-bbe0-2dec6b633769"
  },
  "message": "[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server",
  "source_type": "kubernetes_logs",
  "stream": "stdout",
  "timestamp": "2025-04-22T14:13:27.503695712Z"
}
{
  "file": "/var/log/pods/keycloak_keycloak-0_4b293248-6e0e-46d3-ab3d-75dfc0d379e8/keycloak/0.log",
  "kubernetes": {
    "container_id": "containerd://364e92023c3c31ae10497683cfe66ef91c45e5f559dafeedb1f5ad702c3d1d7e",
    "container_image": "docker.io/bitnami/keycloak:26.2.0-debian-12-r2",
    "container_image_id": "docker.io/bitnami/keycloak@sha256:eb39d4ec77208b724167d183a89c37612edd8efb3e6c0395ad5abb608d52362b",
    "container_name": "keycloak",
    "namespace_labels": {
      "kubernetes.io/metadata.name": "keycloak",
      "name": "keycloak"
    },
    "node_labels": {
      "beta.kubernetes.io/arch": "arm64",
      "beta.kubernetes.io/instance-type": "k3s",
      "beta.kubernetes.io/os": "linux",
      "kubernetes.io/arch": "arm64",
      "kubernetes.io/hostname": "k3d-keycloak-cluster-server-0",
      "kubernetes.io/os": "linux",
      "node-role.kubernetes.io/control-plane": "true",
      "node-role.kubernetes.io/master": "true",
      "node.kubernetes.io/instance-type": "k3s"
    },
    "pod_annotations": {
      "checksum/configmap-env-vars": "f0fa0260c40367946f68087eb6c2ea8768ce554818b5e2475e10d14a1ba47240",
      "checksum/secrets": "83224322b8ddbe1a5a3114b90e493f7b2522a29956749247581ea63241a2abfb"
    },
    "pod_ip": "10.42.0.7",
    "pod_ips": [
      "10.42.0.7"
    ],
    "pod_labels": {
      "app.kubernetes.io/app-version": "26.2.0",
      "app.kubernetes.io/component": "keycloak",
      "app.kubernetes.io/instance": "keycloak",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "keycloak",
      "app.kubernetes.io/version": "26.2.0",
      "apps.kubernetes.io/pod-index": "0",
      "controller-revision-hash": "keycloak-654c49d649",
      "helm.sh/chart": "keycloak-24.5.7",
      "statefulset.kubernetes.io/pod-name": "keycloak-0"
    },
    "pod_name": "keycloak-0",
    "pod_namespace": "keycloak",
    "pod_node_name": "k3d-keycloak-cluster-server-0",
    "pod_owner": "StatefulSet/keycloak",
    "pod_uid": "4b293248-6e0e-46d3-ab3d-75dfc0d379e8"
  },
  "message": "{\"timestamp\":\"2025-04-22T14:11:38.202612884Z\",\"sequence\":57936,\"loggerClassName\":\"org.jboss.logging.Logger\",\"loggerName\":\"org.keycloak.transaction.JtaTransactionWrapper\",\"level\":\"DEBUG\",\"message\":\"JtaTransactionWrapper end. Request Context: HTTP GET /realms/master\",\"threadName\":\"executor-thread-1\",\"threadId\":24,\"mdc\":{},\"ndc\":\"\",\"hostName\":\"keycloak-0\",\"processName\":\"/opt/bitnami/java/bin/java\",\"processId\":1}",
  "source_type": "kubernetes_logs",
  "stream": "stdout",
  "timestamp": "2025-04-22T14:11:38.202753467Z"
}
- As you can now see, we get much more data in our logs, but some of it is not useful to us, and we are receiving logs from every pod in the cluster.
- Let's filter this down to just the Keycloak logs.
- We will add a filter transform that keeps only logs from the Keycloak pod, matching on the pod name (a label-based alternative is shown after the install commands below).
cat <<-'VALUES' > values.yaml
role: Agent
customConfig:
  data_dir: "vector-console-data/"
  sources:
    pod_logs:
      type: "kubernetes_logs"
  transforms:
    keycloak_logs:
      type: "filter"
      inputs:
        - pod_logs
      condition: '.kubernetes.pod_name == "keycloak-0"'
  sinks:
    standard_out:
      type: "console"
      inputs:
        - keycloak_logs
      encoding:
        codec: "json"
extraVolumeMounts:
  - name: vector-console-data
    mountPath: /vector-console-data
extraVolumes:
  - name: vector-console-data
    emptyDir: {}
VALUES
helm upgrade --install vector vector/vector -n vector --create-namespace --values values.yaml
kubectl rollout restart daemonset vector -n vector
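The condition above matches on the pod name, which works for this single-replica StatefulSet. If you would rather match on labels so any Keycloak replica is captured, a condition along these lines should also work (a sketch based on the app.kubernetes.io/name label visible in the output above):
condition: '.kubernetes.pod_labels."app.kubernetes.io/name" == "keycloak"'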
- We can now see that we are only getting logs from the Keycloak pod. It can take a moment for Vector to come back up after the restart.
kubectl logs -n vector --selector=app.kubernetes.io/name=vector | jq .
{
  "file": "/var/log/pods/keycloak_keycloak-0_4b293248-6e0e-46d3-ab3d-75dfc0d379e8/keycloak/0.log",
  "kubernetes": {
    "container_id": "containerd://364e92023c3c31ae10497683cfe66ef91c45e5f559dafeedb1f5ad702c3d1d7e",
    "container_image": "docker.io/bitnami/keycloak:26.2.0-debian-12-r2",
    "container_image_id": "docker.io/bitnami/keycloak@sha256:eb39d4ec77208b724167d183a89c37612edd8efb3e6c0395ad5abb608d52362b",
    "container_name": "keycloak",
    "namespace_labels": {
      "kubernetes.io/metadata.name": "keycloak",
      "name": "keycloak"
    },
    "node_labels": {
      "beta.kubernetes.io/arch": "arm64",
      "beta.kubernetes.io/instance-type": "k3s",
      "beta.kubernetes.io/os": "linux",
      "kubernetes.io/arch": "arm64",
      "kubernetes.io/hostname": "k3d-keycloak-cluster-server-0",
      "kubernetes.io/os": "linux",
      "node-role.kubernetes.io/control-plane": "true",
      "node-role.kubernetes.io/master": "true",
      "node.kubernetes.io/instance-type": "k3s"
    },
    "pod_annotations": {
      "checksum/configmap-env-vars": "f0fa0260c40367946f68087eb6c2ea8768ce554818b5e2475e10d14a1ba47240",
      "checksum/secrets": "83224322b8ddbe1a5a3114b90e493f7b2522a29956749247581ea63241a2abfb"
    },
    "pod_ip": "10.42.0.7",
    "pod_ips": [
      "10.42.0.7"
    ],
    "pod_labels": {
      "app.kubernetes.io/app-version": "26.2.0",
      "app.kubernetes.io/component": "keycloak",
      "app.kubernetes.io/instance": "keycloak",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "keycloak",
      "app.kubernetes.io/version": "26.2.0",
      "apps.kubernetes.io/pod-index": "0",
      "controller-revision-hash": "keycloak-654c49d649",
      "helm.sh/chart": "keycloak-24.5.7",
      "statefulset.kubernetes.io/pod-name": "keycloak-0"
    },
    "pod_name": "keycloak-0",
    "pod_namespace": "keycloak",
    "pod_node_name": "k3d-keycloak-cluster-server-0",
    "pod_owner": "StatefulSet/keycloak",
    "pod_uid": "4b293248-6e0e-46d3-ab3d-75dfc0d379e8"
  },
  "message": "{\"timestamp\":\"2025-04-22T14:36:39.317766759Z\",\"sequence\":59036,\"loggerClassName\":\"org.slf4j.impl.Slf4jLogger\",\"loggerName\":\"org.jgroups.protocols.dns.DNS_PING\",\"level\":\"DEBUG\",\"message\":\"keycloak-0-28856: sending discovery requests to hosts [10.42.0.7:0] on ports [7800 .. 7810]\",\"threadName\":\"jgroups-9,keycloak-0-28856\",\"threadId\":169,\"mdc\":{},\"ndc\":\"\",\"hostName\":\"keycloak-0\",\"processName\":\"/opt/bitnami/java/bin/java\",\"processId\":1}",
  "source_type": "kubernetes_logs",
  "stream": "stdout",
  "timestamp": "2025-04-22T14:36:39.317930801Z"
}
- We can see that we still have a lot of data bloat and noise to filter out.
- We can reshape the logs with Vector Remap Language (VRL).
cat <<-'VALUES' > values.yaml
role: Agent
customConfig:
  data_dir: "vector-console-data/"
  sources:
    pod_logs:
      type: "kubernetes_logs"
  transforms:
    keycloak_logs:
      type: "filter"
      inputs:
        - pod_logs
      condition: '.kubernetes.pod_name == "keycloak-0"'
    keycloak_logs_filtered:
      type: remap
      inputs:
        - keycloak_logs
      source: |
        .pod_ip = .kubernetes.pod_ip
        .node_name = .kubernetes.node_labels."kubernetes.io/hostname"
        .message = parse_json!(.message)
  sinks:
    standard_out:
      type: "console"
      inputs:
        - keycloak_logs_filtered
      encoding:
        codec: "json"
        only_fields:
          - message
          - pod_ip
          - node_name
extraVolumeMounts:
  - name: vector-console-data
    mountPath: /vector-console-data
extraVolumes:
  - name: vector-console-data
    emptyDir: {}
VALUES
helm upgrade --install vector vector/vector -n vector --create-namespace --values values.yaml
kubectl rollout restart daemonset vector -n vector
- Fetch the Vector pod logs again:
kubectl logs -n vector --selector=app.kubernetes.io/name=vector | jq .
- As you can see, we have renamed the fields we care about and kept only the message, pod IP, and node name.
{
  "message": {
    "hostName": "keycloak-0",
    "level": "DEBUG",
    "loggerClassName": "org.jboss.logging.Logger",
    "loggerName": "org.keycloak.transaction.JtaTransactionWrapper",
    "mdc": {},
    "message": "JtaTransactionWrapper commit. Request Context: HTTP GET /realms/master",
    "ndc": "",
    "processId": 1,
    "processName": "/opt/bitnami/java/bin/java",
    "sequence": 59680,
    "threadId": 24,
    "threadName": "executor-thread-1",
    "timestamp": "2025-04-22T14:52:58.218502337Z"
  },
  "node_name": "k3d-keycloak-cluster-server-0",
  "pod_ip": "10.42.0.7"
}
{
  "message": {
    "hostName": "keycloak-0",
    "level": "DEBUG",
    "loggerClassName": "org.jboss.logging.Logger",
    "loggerName": "org.keycloak.transaction.JtaTransactionWrapper",
    "mdc": {},
    "message": "JtaTransactionWrapper end. Request Context: HTTP GET /realms/master",
    "ndc": "",
    "processId": 1,
    "processName": "/opt/bitnami/java/bin/java",
    "sequence": 59681,
    "threadId": 24,
    "threadName": "executor-thread-1",
    "timestamp": "2025-04-22T14:52:58.218646421Z"
  },
  "node_name": "k3d-keycloak-cluster-server-0",
  "pod_ip": "10.42.0.7"
}
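One thing to keep in mind about the remap above: parse_json! is the fallible form of the function, so if .message is ever not valid JSON (for example, a plain-text startup line), that event produces a runtime error instead of being parsed. If you want to handle that case explicitly, a more forgiving sketch of the remap source captures the error and only replaces the message on success:
.pod_ip = .kubernetes.pod_ip
.node_name = .kubernetes.node_labels."kubernetes.io/hostname"
parsed, err = parse_json(.message)
if err == null {
  .message = parsed
}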
- We can now use this same method to standardize logs across the various applications in your organization, as sketched below.
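This is only a sketch; the severity and app field names are a convention you would pick for your org. An additional remap transform under transforms could promote each application's log level to a common top-level field:
normalize_fields:
  type: "remap"
  inputs:
    - keycloak_logs_filtered
  source: |
    # Give every application's logs the same top-level shape downstream.
    .severity = downcase(string!(.message.level))
    .app = "keycloak"
Other applications would get their own remap that sets the same severity and app fields from their native log formats.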
Cleaning Up#
k3d cluster delete keycloak-cluster
Conclusion#
We have seen how to use Vector to transform logs for better observability and operational efficiency. Vector is a very powerful tool and can be extended well beyond what we have seen here to completely transform the way your organization handles data.
What's Next?#
I will be adding more content to this series, such as shipping Vector data to Loki and visualizing it with Grafana. Check back soon for more.
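As a rough preview (a sketch with an assumed in-cluster Loki address), swapping the console sink for Vector's loki sink would look something like this:
sinks:
  loki_out:
    type: "loki"
    inputs:
      - keycloak_logs_filtered
    endpoint: "http://loki.loki.svc.cluster.local:3100"   # assumed Loki service address
    labels:
      app: "keycloak"
    encoding:
      codec: "json"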
- Learn more about Vector
- Learn more about Kubernetes
- Learn more about Helm

