Volga Topics
The system offers a set of built-in volga topics that applications can consume.
Events
An item on an event topic is an event record. It's a JSON object with the following structure:
Definition of events sent on volga streams to tenants.
Name | Type | Description |
---|---|---|
event | string | The name of the specific event. |
occurred-at | date-and-time | Time when the event occurred. |
id | string | Unique identifier of the event instance. |
tenant | string | The event occurred in the context of this tenant. |
data | JSON Object | Event-specific data. |
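A consumer usually turns each raw item into a typed record before dispatching on the `event` name. The following is a minimal Python sketch, not an official client. It assumes the item arrives as a JSON string and that `occurred-at` is an RFC 3339 timestamp. The `EventRecord` and `parse_event` names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class EventRecord:
    """One item read from an event topic (fields as in the table above)."""
    event: str
    occurred_at: datetime
    id: str
    tenant: str
    data: Dict[str, Any]


def parse_event(raw: str) -> EventRecord:
    """Parse a JSON event record; raises KeyError if a required field is missing."""
    obj = json.loads(raw)
    # Assumption: occurred-at is RFC 3339; older Pythons reject a trailing "Z",
    # so normalise it to an explicit UTC offset first.
    ts = obj["occurred-at"].replace("Z", "+00:00")
    return EventRecord(
        event=obj["event"],
        occurred_at=datetime.fromisoformat(ts),
        id=obj["id"],
        tenant=obj["tenant"],
        data=obj.get("data", {}),
    )
```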
Alerts
An item on an alert topic is an alert record. It's a JSON object with the following structure:
Definition of alerts sent on volga streams to tenants.
Name | Type | Description |
---|---|---|
alert | string | The name of the specific alert. |
time | date-and-time | The time the alert was generated. |
id | string | The site unique id of the alert. |
site | name | The name of the site where the alert was generated. |
severity | enumeration | The severity of the alert. |
description | string | The description of the reason behind the alert. |
expiry-time | date-and-time | After this date and time this alert should be considered expired. |
cleared | boolean | Is true if the alert has been cleared, otherwise false. |
data | JSON Object | Alert-specific data. |
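The `cleared` and `expiry-time` fields together decide whether an alert still needs attention. The Python sketch below uses hypothetical names and a few assumptions. Records are taken to be decoded into dicts. Timestamps are taken to be RFC 3339. An alert without `expiry-time` is treated as never expiring.

```python
from datetime import datetime, timezone
from typing import Optional


def _parse_ts(value: str) -> datetime:
    # Assumption: timestamps are RFC 3339 strings, possibly ending in "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def alert_is_active(alert: dict, now: Optional[datetime] = None) -> bool:
    """Return True if an alert record is neither cleared nor expired."""
    if alert.get("cleared", False):
        return False
    expiry = alert.get("expiry-time")
    if expiry is None:
        # Assumption: an alert without expiry-time never expires.
        return True
    now = now or datetime.now(timezone.utc)
    # The alert counts as expired only *after* expiry-time.
    return now <= _parse_ts(expiry)
```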
Threshold alerts
For the threshold alerts container-layer-threshold-reached, ephemeral-volume-threshold-reached, persistent-volume-threshold-reached and disk-threshold-reached, the severity levels and their thresholds are:
- CRITICAL: 100%
- MAJOR: 90%
- WARNING: 80%
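This mapping can be mirrored on the client, for example to label a `percentage-used` value from the metrics topic. The Python sketch assumes that a level applies once usage reaches its threshold, and that the severity strings match the list above. The function name is hypothetical.

```python
from typing import Optional

# Thresholds from the list above, checked from highest to lowest.
# Assumption: a level applies once percentage-used reaches its threshold.
THRESHOLDS = [(100.0, "CRITICAL"), (90.0, "MAJOR"), (80.0, "WARNING")]


def threshold_severity(percentage_used: float) -> Optional[str]:
    """Map a percentage-used value to the expected alert severity, or None."""
    for limit, severity in THRESHOLDS:
        if percentage_used >= limit:
            return severity
    return None
```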
Tenant specific events
An item on a tenant specific events topic is an event record. It's a JSON object described in the Events section.
Topic: system:scheduler-events
Available on every site.
This topic contains scheduler related events on the local site.
Event: application-status-changed
This event is generated when the oper-status of an application changes. The oper-status of an application is an aggregated status of all service instances in the application.
In some cases, this event may be generated even if the oper-status hasn't been changed. Clients need to be prepared for that.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
oper-status | enumeration | The oper-status of the application. |
application-version | string | The version of the application. |
application-deployment | string | The name of the application deployment to which the application belongs. |
site | name | The name of the site where the application is running. |
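This event can arrive even when oper-status has not changed, so a client may keep the last status it saw for each site and application. The Python sketch below assumes each record has already been decoded from JSON into a dict with the fields listed above. The class name and the status values in the usage are hypothetical.

```python
from typing import Dict, Tuple


class AppStatusTracker:
    """Tracks the last known oper-status per (site, application)."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, str], str] = {}

    def handle(self, event: dict) -> bool:
        """Return True only if the event reports a real status change."""
        if event.get("event") != "application-status-changed":
            return False
        data = event["data"]
        key = (data["site"], data["application"])
        status = data["oper-status"]
        if self._last.get(key) == status:
            return False  # duplicate: oper-status did not actually change
        self._last[key] = status
        return True
```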
Event: container-completed
This event is generated when an init container has completed successfully.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
Event: container-failed
This event is generated when the system detects that a container has failed; either because it exited, or because its startup or liveness probe failed.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
reason | enumeration | |
restart-after | duration | Time until the container is restarted. A duration in years, days, hours, minutes and seconds. Format is [<digits>y][<digits>d][<digits>h][<digits>m][<digits>s]. Examples: 1y2d5h, 5h or 10m30s. |
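A client that wants to schedule something around a restart needs restart-after in seconds. The Python sketch below follows the format given in the table. One assumption is not stated there: a year is counted as 365 days. The function name is hypothetical.

```python
import re

# Grammar from the table: [<digits>y][<digits>d][<digits>h][<digits>m][<digits>s]
_DURATION = re.compile(r"^(?:(\d+)y)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")
# Assumption: one year is counted as 365 days.
_UNIT_SECONDS = (365 * 86400, 86400, 3600, 60, 1)


def duration_seconds(value: str) -> int:
    """Convert a duration such as '10m30s' to a number of seconds."""
    match = _DURATION.match(value)
    if not value or match is None:
        raise ValueError(f"invalid duration: {value!r}")
    # Each group is None when that unit is absent from the string.
    return sum(int(g) * unit for g, unit in zip(match.groups(), _UNIT_SECONDS) if g)
```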
Event: container-ready
This event is generated when a container's readiness probe is successful. If no readiness probe is configured, this event is generated as soon as the container has started.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
Event: container-starting
This event is generated when the container has been created, and just before it is started.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
Event: container-stopped
This event is generated when the system has stopped the container for any reason.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
Event: container-unready
This event is generated when a container's readiness probe fails.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | name | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
container-name | name | The name of the container. |
container-id | string | The internal id of the container. |
container-image | name | The name of the image that the container is running. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
Event: service-instance-creation-failed
This event is generated when no service instances could be created for an application.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
site | name | The name of the site where the event was generated. |
error-message | string | Additional information about the failure to create service instances. |
Event: service-instance-failed
This event is generated when a container in a service instance has failed, or when the service instance could not be scheduled or started.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
container-name | name | The name of the failing container. |
reason | enumeration | |
Event: service-instance-ready
This event is generated when all containers in a service instance are ready, i.e., their readiness probes have succeeded.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
Event: service-instance-starting
This event is generated just before the containers in a service instance are started.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
Event: service-instance-stopped
This event is generated when all containers in a service instance have been stopped.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
reason | enumeration | |
Event: service-instance-unready
This event is generated when one or more containers in a service instance are unready, i.e., their readiness probes have failed.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
Event: service-instance-updated
This event is generated when a service instance's specification has been updated.
Event-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. |
service-name | name | The name of the service. |
service-instance | name | The name of the service instance. |
site | name | The name of the site where the event was generated. |
hostname | name | The name of the host where the event was generated. |
application-ips | array of ip-address | The IP addresses assigned to this service instance on the application network. |
ingress-ips | array of ip-address | Ingress IP addresses assigned to this service instance. |
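The service-instance events above can be folded into a local view of which instances are currently ready. The Python sketch below uses hypothetical names. It assumes that failed and stopped instances count as not ready. It also assumes that records are decoded dicts. Events without a service-instance field, such as service-instance-creation-failed, are ignored.

```python
from collections import defaultdict
from typing import Dict, Set

# Assumption: failed and stopped instances are treated as not ready.
NOT_READY = {
    "service-instance-unready",
    "service-instance-failed",
    "service-instance-stopped",
}


class ReadyInstances:
    """Keeps the set of ready service instances per application on one site."""

    def __init__(self) -> None:
        self.ready: Dict[str, Set[str]] = defaultdict(set)

    def handle(self, event: dict) -> None:
        name = event.get("event")
        data = event.get("data", {})
        app, instance = data.get("application"), data.get("service-instance")
        if app is None or instance is None:
            return  # e.g. service-instance-creation-failed has no instance
        if name == "service-instance-ready":
            self.ready[app].add(instance)
        elif name in NOT_READY:
            self.ready[app].discard(instance)
```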
Topic: system:deployment-events
Available only on the top site.
This topic contains application deployment related events.
Event: application-summary-status-changed
Sent when a site has reported that an application deployed to the site has changed its oper-status.
Event-specific data:
Name | Type | Description |
---|---|---|
application-deployment | name | The name of the application deployment. |
application | name | The name of the application. |
application-version | string | The version of the application. |
is-canary | empty | Present if the event relates to deployment to a canary site. |
sites | array of name | The affected sites. |
oper-status | enumeration | This is an aggregated status of the application. |
error-message | string | Gives additional information when oper-status is error. |
Event: deployment-failed
Sent when a set of sites has reported that the application has failed to be deployed at the sites.
These sites are moved to the deploy-failed list in the application deployment's state.
Event-specific data:
Name | Type | Description |
---|---|---|
application-deployment | name | The name of the application deployment. |
application | name | The name of the application. |
application-version | string | The version of the application. |
is-canary | empty | Present if the event relates to deployment to a canary site. |
sites | array of name | The affected sites. |
error-message | string | Error message describing the failure. |
Event: deployment-initiated
Sent when a set of sites has been instructed to deploy a specific version of an application.
These sites are moved to the deploying-to list in the application deployment's state.
Event-specific data:
Name | Type | Description |
---|---|---|
application-deployment | name | The name of the application deployment. |
application | name | The name of the application. |
application-version | string | The version of the application. |
is-canary | empty | Present if the event relates to deployment to a canary site. |
sites | array of name | The affected sites. |
Event: deployment-sites-removed
Sent when an application is no longer deployed to a set of sites.
When this event is sent, the sites have been instructed to remove the application.
Event-specific data:
Name | Type | Description |
---|---|---|
application-deployment | name | The name of the application deployment. |
application | name | The name of the application. |
application-version | string | The version of the application. |
is-canary | empty | Present if the event relates to deployment to a canary site. |
sites | array of name | The affected sites. |
Event: deployment-succeeded
Sent when a set of sites has reported that the application has been successfully deployed at the sites. Successfully deployed means that all service instances are in state running.
These sites are moved to the deployed-to list in the application deployment's state.
Event-specific data:
Name | Type | Description |
---|---|---|
application-deployment | name | The name of the application deployment. |
application | name | The name of the application. |
application-version | string | The version of the application. |
is-canary | empty | Present if the event relates to deployment to a canary site. |
sites | array of name | The affected sites. |
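The deployment events describe how sites move between the lists in an application deployment's state. A client can mirror those lists locally, as in the Python sketch below. The sketch makes several assumptions. A site is in at most one list at a time. deployment-sites-removed drops the sites from every list. Records are decoded dicts. The class name is hypothetical.

```python
from typing import Dict, Set

# List a site is moved to for each deployment event (from the text above).
TARGET_LIST = {
    "deployment-initiated": "deploying-to",
    "deployment-succeeded": "deployed-to",
    "deployment-failed": "deploy-failed",
}


class DeploymentView:
    """Client-side mirror of one application deployment's site lists."""

    def __init__(self) -> None:
        self.lists: Dict[str, Set[str]] = {name: set() for name in TARGET_LIST.values()}

    def handle(self, event: dict) -> None:
        name = event.get("event")
        if name not in TARGET_LIST and name != "deployment-sites-removed":
            return  # e.g. application-summary-status-changed
        sites = set(event.get("data", {}).get("sites", []))
        for members in self.lists.values():
            members -= sites  # assumption: a site is in at most one list
        if name in TARGET_LIST:
            self.lists[TARGET_LIST[name]] |= sites
```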
Topic: system:config-events
Available only on the top site.
This topic contains configuration change events.
Event: config-created
Sent when a new config object was created.
Event-specific data:
Name | Type | Description |
---|---|---|
username | string | Identifies the user that initiated the config change. |
object | name | Kind of config object. |
path | string | Unique identifier of the object. |
Event: config-deleted
Sent when a config object was deleted.
Event-specific data:
Name | Type | Description |
---|---|---|
username | string | Identifies the user that initiated the config change. |
object | name | Kind of config object. |
path | string | Unique identifier of the object. |
Event: config-updated
Sent when a config object was updated.
Event-specific data:
Name | Type | Description |
---|---|---|
username | string | Identifies the user that initiated the config change. |
object | name | Kind of config object. |
path | string | Unique identifier of the object. |
Topic: system:events
Available only on the top site.
This topic contains important events from the system.
Event: site-connected
This event is generated at the top site when an edge site connects.
Event-specific data:
Name | Type | Description |
---|---|---|
site | name | The name of the edge site. |
Event: site-disconnected
This event is generated at the top site when the connection to an edge site is lost.
Event-specific data:
Name | Type | Description |
---|---|---|
site | name | The name of the edge site. |
Tenant specific application metrics
Topic: system:application-metrics
Available on every site except the top site.
This topic contains application related metrics for the tenant.
JSON Object
The system samples metrics related to CPU, memory and network traffic once every 10 seconds. Every 30 seconds, related samples are collected into this object.
Metrics are sampled at three different levels: per container, per service instance, and for the whole application on a host.
Name | Type | Description |
---|---|---|
site | name | The name of the site where the metrics originated from. |
tenant | name | The name of the tenant. |
entries | array of metrics-entry JSON Object | A set of samples for this tenant. |
metrics-entry JSON Object
Name | Type | Description |
---|---|---|
time | date-and-time | The time the sample was taken. |
host | string | The host where the sample was taken. |
application | name | The name of the application that was sampled. |
per-container OR per-service OR per-application | per-container-metrics, per-service-metrics or per-application-metrics JSON Object | Metric data for the level at which the sample was taken. |
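Each entry carries metric data for one of the three levels, so a consumer first works out which level it is looking at. The Python sketch below assumes decoded dicts. It also assumes, based on the OR in the table, that exactly one of the three keys is present. The function name is hypothetical.

```python
from typing import Tuple

METRIC_LEVELS = ("per-container", "per-service", "per-application")


def entry_level(entry: dict) -> Tuple[str, dict]:
    """Return (level, metrics) for one item of the entries array."""
    present = [level for level in METRIC_LEVELS if level in entry]
    # Assumption: exactly one level key appears in each entry.
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {METRIC_LEVELS}, got {present}")
    return present[0], entry[present[0]]
```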
per-container-metrics JSON Object
Name | Type | Description |
---|---|---|
service-instance | name | The name of the service instance. |
container | name | The name of the container. |
memory | memory-metric JSON Object | Memory metric data. |
cpu | cpu-metric JSON Object | CPU metric data. |
container-layer | disk-metric JSON Object | Container layer storage metric data. These metrics are only available if the underlying file system supports quotas. |
memory-metric JSON Object
Name | Type | Description |
---|---|---|
used | uint64 | The memory used by the container in bytes, at the time the sample was taken. |
total | uint64 | The total memory available in bytes for the container. |
percentage-used | percent | Percentage of memory used by the container, i.e., used divided by total. A percent value with up to 2 fractional digits, for example 12.77%. |
cpu-metric JSON Object
Name | Type | Description |
---|---|---|
nanoseconds | uint64 | The total number of CPU nanoseconds used by the container. |
cpus | decimal64 | The CPU limit for the container, i.e., the maximum number of CPUs the container can use. |
shares | uint16 | The CPU shares limit for the container. |
percentage-used | percent | Percentage of CPU used in relation to the limits. A percent value with up to 2 fractional digits, for example 12.77%. |
disk-metric JSON Object
Name | Type | Description |
---|---|---|
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | Percentage used, i.e., used divided by size. A percent value with up to 2 fractional digits, for example 12.77%. |
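The disk figures are counted in 1K-blocks of 1024 bytes. The Python sketch below converts them to bytes and recomputes percentage-used as used divided by size. Rounding to two fractional digits with Python's `round` is an assumption about how the reported value is derived. The function name is hypothetical.

```python
def disk_summary(metric: dict) -> dict:
    """Derive byte counts and percentage-used from a disk-metric object."""
    block = 1024  # one 1K-block is 1024 bytes
    size, used = metric["size"], metric["used"]
    return {
        "size-bytes": size * block,
        # free is taken as reported rather than computed as size - used.
        "free-bytes": metric["free"] * block,
        # Assumption: rounded to 2 fractional digits like percentage-used.
        "percentage-used": round(100 * used / size, 2) if size else 0.0,
    }
```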
per-service-metrics JSON Object
Name | Type | Description |
---|---|---|
service-instance | name | The name of the service instance. |
ephemeral-volumes | array of disk-volume-metric JSON Object | Ephemeral volume storage parameters. These are only available for volumes where the underlying file system supports quotas. |
persistent-volumes | array of disk-volume-metric JSON Object | Persistent volume storage parameters. These are only available for volumes where the underlying file system supports quotas. |
disk-volume-metric JSON Object
Name | Type | Description |
---|---|---|
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | Percentage used, i.e., used divided by size. A percent value with up to 2 fractional digits, for example 12.77%. |
volume-name | string | The name of the volume. |
per-application-metrics JSON Object
Per-application metrics, aggregated for all service instances that are part of the same application.
Name | Type | Description |
---|---|---|
gateway-network | gateway-network-metrics JSON Object | |
hosts | array of host-metric JSON Object | Resource metrics aggregated per application per host. |
gateway-network-metrics JSON Object
Name | Type | Description |
---|---|---|
tx-packets | uint64 | Transmitted external traffic on the gateway network, i.e., traffic leaving the host. Note that transmitted traffic is counted before bandwidth restrictions are applied. This is usually not a problem for protocols with flow control, but if an application sends an uncontrollably large amount of traffic to destinations outside the host, all of that traffic is included here. |
tx-bytes | uint64 | Transmitted external traffic on the gateway network, i.e., traffic leaving the host. The value represents frame payload; the Layer 2 header is not included. Note that transmitted traffic is counted before bandwidth restrictions are applied. This is usually not a problem for protocols with flow control, but if an application sends an uncontrollably large amount of traffic to destinations outside the host, all of that traffic is included here. |
rx-packets | uint64 | Received external traffic on the gateway network, i.e., traffic originating outside the host. Only traffic allowed by the firewall is included, i.e., traffic on open ingress ports and on established connections. Note that received traffic is counted before bandwidth restrictions are applied. This is usually not a problem for protocols with flow control, but if an external party sends an uncontrollably large amount of traffic to the application, all of that traffic is included here. |
rx-bytes | uint64 | Received external traffic on the gateway network, i.e., traffic originating outside the host. The value represents frame payload; the Layer 2 header is not included. Only traffic allowed by the firewall is included, i.e., traffic on open ingress ports and on established connections. Note that received traffic is counted before bandwidth restrictions are applied. This is usually not a problem for protocols with flow control, but if an external party sends an uncontrollably large amount of traffic to the application, all of that traffic is included here. |
tx-packets-per-second | uint64 | Intensity of transmitted external traffic in packets per second. This value is the average intensity over the interval between the last two reported samples, so it is not reported when there are not enough recent samples available. |
tx-bytes-per-second | uint64 | Intensity of transmitted external traffic in bytes per second. This value is the average intensity over the interval between the last two reported samples, so it is not reported when there are not enough recent samples available. |
rx-packets-per-second | uint64 | Intensity of received external traffic in packets per second. This value is the average intensity over the interval between the last two reported samples, so it is not reported when there are not enough recent samples available. |
rx-bytes-per-second | uint64 | Intensity of received external traffic in bytes per second. This value is the average intensity over the interval between the last two reported samples, so it is not reported when there are not enough recent samples available. |
upstream-bandwidth-utilization | decimal64 | If an upstream-bandwidth-per-host is configured for this application, this value indicates the fraction of the available bandwidth used by the application. It is based on the tx-bytes-per-second metric. |
downstream-bandwidth-utilization | decimal64 | If a downstream-bandwidth-per-host is configured for this application, this value indicates the fraction of the available bandwidth used by the application. It is based on the rx-bytes-per-second metric. |
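The per-second values are averages between the last two samples of a cumulative counter, and the utilization values relate such a rate to a configured limit. The Python sketch below illustrates that arithmetic. It is not the system's implementation. It assumes the configured limit is expressed in bytes per second, which the table does not state. Both function names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Sample = Tuple[datetime, int]  # (sample time, cumulative tx-bytes counter)


def bytes_per_second(samples: List[Sample]) -> Optional[int]:
    """Average rate between the last two samples, or None if too few."""
    if len(samples) < 2:
        return None  # mirrors "not reported" when samples are missing
    (t0, b0), (t1, b1) = samples[-2], samples[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        return None
    return int((b1 - b0) / seconds)


def utilization(rate: int, limit_bytes_per_second: int) -> float:
    """Fraction of a configured limit in use (assumed to be in bytes/s)."""
    return rate / limit_bytes_per_second
```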
host-metric JSON Object
Name | Type | Description |
---|---|---|
host | string | The application metrics are from this host. |
memory-percentage-used | percent | Percentage of aggregated memory used by the application on this host in relation to total available memory. A percent value with up to 2 fractional digits, for example 12.77%. |
cpu-percentage-used | percent | Percentage of maximum CPU used among all containers for the application on this host in relation to total available CPUs. A percent value with up to 2 fractional digits, for example 12.77%. |
disk-percentage-used | percent | Percentage of maximum disk used among all container layers, ephemeral and persistent volumes with quota support for the application on this host. A percent value with up to 2 fractional digits, for example 12.77%. This metric is not available if none of the application's file systems supports quotas. |
Tenant specific audit trail log
The audit trail log records all operations performed by a tenant. The log includes the access token, if provided, the operation, and any parameters provided.
In order to protect sensitive data, e.g., tokens and secrets, all such data is hashed using a tenant-specific HMAC before being logged. To search for specific sensitive data in the logs, e.g., operations performed using a specific access token, the plain text version of the data can be hashed using the strongbox/audit/hmac operation.
Audit trail logs are streamed upwards from edge sites to the top site. This allows inspection of audit logs even if a site is compromised.
Topic: system:audit-trail-log
Available on every site.
This log contains all authenticated requests for the tenant.
JSON Object
Name | Type | Description |
---|---|---|
occurred-at | date-and-time | Time when access occurred. |
request-time-ms | gauge64 | Time to process request in milliseconds. |
user | string | User or approle that performed the access. |
path | string | Path that was accessed. |
query | string | Any supplied URL query parameters. |
method | string | HTTP method that was used. |
status | uint32 | HTTP response status. |
status-info | string | Text representation of status. |
request-parameters | JSON Object | The parameters included in the request, if any, i.e., the request body. |
client-ip | string | Address from which the client accessed the host. |
x-forwarded-for | string | If the request went through a load balancer, the client's real IP address may appear in x-forwarded-for. It may be a list of addresses, where the first address should be the client's original address. |
x-real-ip | string | If the request went through a load balancer, the client's real IP address may appear in x-real-ip. It may be a list of addresses, where the first address should be the client's original address. |
site | string | Site where the access occurred. |
host | string | Host on which the access occurred. |
tenant | name | Tenant that performed the access. |
token | string | Hashed representation of the access token. It can be used when identifying a specific session, and all accesses/operations using the same token. |
accessor | string | Token accessor that can be used to terminate an ongoing session. |
from | string | The 'from' request header from the originating HTTP request. |
user-agent | string | User agent that performed the request, e.g., curl or Firefox. |
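Because tokens are stored only as HMAC hashes, finding a session's activity means comparing the token field with a hash obtained from the strongbox/audit/hmac operation. The Python sketch below shows only the local filtering step. It does not call that operation. It assumes each log item is available as a JSON string. The function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def entries_for_token(lines: Iterable[str], token_hash: str) -> Iterator[dict]:
    """Yield audit-trail entries whose hashed token equals token_hash.

    token_hash is assumed to have been produced by strongbox/audit/hmac.
    """
    for line in lines:
        entry = json.loads(line)
        if entry.get("token") == token_hash:
            yield entry
```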
Tenant specific logs
An item on a log topic is a string.
Topic: system:container-logs:CONTAINER-ID
Available on the host where the container is running.
There is one topic per running container. It contains all output from standard output and standard error from the container.
The CONTAINER-ID is a string in the format APPLICATIONNAME.SERVICENAME-IX.CONTAINERNAME, where IX is a numeric index, one per service instance (replica).
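When subscribing to or interpreting a container log topic, the CONTAINER-ID can be split into its parts. The Python sketch below makes two assumptions. The application, service and container names contain no dots. The service name may contain hyphens, so IX is taken from after the last hyphen. The names in the usage are hypothetical.

```python
from typing import NamedTuple


class ContainerId(NamedTuple):
    application: str
    service: str
    index: int
    container: str


def parse_container_id(container_id: str) -> ContainerId:
    """Split APPLICATIONNAME.SERVICENAME-IX.CONTAINERNAME into its parts."""
    # Assumption: no dots inside the individual names.
    application, service_ix, container = container_id.split(".")
    # The service name may itself contain hyphens; IX follows the last one.
    service, ix = service_ix.rsplit("-", 1)
    return ContainerId(application, service, int(ix), container)
```

For example, `parse_container_id("shop.web-frontend-2.nginx")` yields application `shop`, service `web-frontend`, index `2` and container `nginx`, whose log topic would be `system:container-logs:shop.web-frontend-2.nginx`.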
By default, if a container is rescheduled to another host, the container log topic for the previous container is deleted. However, if container-log-archive is set to true in the application specification, the container log instead has a timestamp appended and becomes read-only.
An archived container log continues to exist on the original host for the number of days given by container-log-archive-days.
Topic: system:logs
Available on every site.
This topic contains tenant-specific log info generated by the Avassa system. Each log item is in the format
<LEVEL> (TENANT) DATE TIME HOSTNAME SRCFILE PID
where LEVEL is one of EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, NOTICE, INFO or DEBUG.
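A common use of this topic is to keep only items at or above a given severity. The Python sketch below assumes the LEVEL is written literally between angle brackets at the start of each item, as in the format line. It also assumes the levels are ordered from most to least severe as listed. The function name is hypothetical.

```python
import re

# Levels from most to least severe, in the order listed above.
LOG_LEVELS = ["EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"]
# Assumption: each item starts with the level in angle brackets, e.g. "<ERROR>".
_PREFIX = re.compile(r"^<([A-Z]+)>")


def at_least(line: str, minimum: str = "WARNING") -> bool:
    """True if the log item's LEVEL is as severe as `minimum` or more."""
    match = _PREFIX.match(line)
    if match is None or match.group(1) not in LOG_LEVELS:
        return False
    return LOG_LEVELS.index(match.group(1)) <= LOG_LEVELS.index(minimum)
```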
Tenant specific alerts
An item on a tenant-specific alert topic is an alert record, a JSON object described in the Alerts section.
Topic: system:alerts
Available on every site.
This topic contains important alerts from the system.
Alert: application-error
Alert signaling when an application ends up in an erroneous oper-status.
Alert-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. Note that this is the application version that triggered the change of the application oper-status. |
application-deployment | string | The name of the application deployment to which the application belongs. |
error-message | string | Additional information about the application failure. |
Alert: container-layer-threshold-reached
Alert signaling that a container-layer reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | name | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
container | name | The name of the container. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | Percentage used, i.e., used divided by size. A percent value with up to 2 fractional digits, for example 12.77%. |
Alert: custom-alert
A custom alert issued and controlled by the tenant.
Alert-specific data:
Name | Type | Description |
---|---|---|
custom-id | string | The custom identifier of the alert. This could be a unique identifier of the entity affected by the fault causing the alert, for example a unique identifier of the failing service instance (replica) of an application on a site. |
custom-name | string | The custom name of the specific alert. This should be a unique name describing the alert. |
application | name | The name of the application. This attribute is only available if the alert or clear operation was executed from within the application container using an approle. |
service-instance | name | The name of the service instance. This attribute is only available if the alert or clear operation was executed from within the application container using an approle. |
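A consumer of custom alerts usually needs to know which alerts are still active. The sketch below keeps an in-memory map of raised custom alerts and drops an entry when a record with cleared set to true arrives. Keying on the pair (custom-name, custom-id) is an assumption, as is the hypothetical function name apply_custom_alert; reading records from the topic is not shown.

```python
from typing import Dict, Tuple

# Hypothetical key: (custom-name, custom-id) is assumed to identify one
# custom alert instance; adjust if your tenant uses a different scheme.
AlertKey = Tuple[str, str]


def apply_custom_alert(active: Dict[AlertKey, dict], record: dict) -> None:
    """Update the set of active custom alerts from one decoded alert record."""
    if record.get("alert") != "custom-alert":
        return  # ignore all other alert types
    data = record.get("data", {})
    key = (data.get("custom-name", ""), data.get("custom-id", ""))
    if record.get("cleared", False):
        active.pop(key, None)  # a cleared alert is no longer active
    else:
        active[key] = record   # raised or re-raised: keep the latest record


active: Dict[AlertKey, dict] = {}
data = {"custom-name": "db-lag", "custom-id": "replica-1"}
apply_custom_alert(active, {"alert": "custom-alert", "cleared": False, "data": data})
apply_custom_alert(active, {"alert": "custom-alert", "cleared": True, "data": data})
print(len(active))  # prints 0
```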
Alert: ephemeral-volume-threshold-reached
Alert signaling that an ephemeral volume reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | name | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
volume-name | string | The ephemeral volume name. |
Alert: invalid-auto-cert-configuration
An auto-cert has been configured with a TTL that exceeds either the renew-threshold or the activate-threshold of the CA certificate.
Alert-specific data:
Name | Type | Description |
---|---|---|
secret-name | name | Name of secret with invalid auto-cert configuration. |
ca-name | name | Name of CA used to generate certificate. |
Alert: persistent-volume-threshold-reached
Alert signaling that a persistent volume reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | string | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
volume-name | string | The persistent volume name. |
Alert: unwrap-failure
Issued when multiple unwraps of a secret are attempted. This may indicate a security breach.
Alert-specific data:
Name | Type | Description |
---|---|---|
id | name | Id of the token subject to multiple unwraps. |
meta | string | |
peer-ip | ip-address | Peer IP of this unwrap attempt. |
successful-peer-ip | ip-address | Peer IP of the successful unwrap (the first unwrap attempt). |
successful-time | date-and-time | Time of successful unwrap. |
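When triaging an unwrap-failure alert, useful questions are whether the repeated attempt came from a different peer than the first unwrap, and how long after it. The sketch below assumes that the alert's top-level time is close to the repeated attempt and that timestamps are RFC 3339 strings; fractional seconds with other than 3 or 6 digits need Python 3.11 or later. The function names are hypothetical.

```python
from datetime import datetime


def parse_time(value: str) -> datetime:
    """Parse an RFC 3339 date-and-time; a trailing 'Z' is rewritten for older Pythons."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize_unwrap_failure(record: dict) -> dict:
    """Condense a decoded unwrap-failure alert record into a few triage facts."""
    data = record["data"]
    first = parse_time(data["successful-time"])
    # Assumption: the alert's own time is close to the repeated unwrap attempt.
    repeated = parse_time(record["time"])
    return {
        "token-id": data["id"],
        # A peer other than the first unwrapper may hint at a leaked token.
        "other-peer": data["peer-ip"] != data["successful-peer-ip"],
        "seconds-since-first-unwrap": (repeated - first).total_seconds(),
    }
```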
Topic: system:notifications - DEPRECATED
This topic is deprecated and will be removed in a future release. Use system:alerts instead.
Alert: application-error
Alert signaling when an application ends up in an erroneous oper-status.
Alert-specific data:
Name | Type | Description |
---|---|---|
application | name | The name of the application. |
application-version | string | The version of the application. Note that this is the application version that triggered the change of the application oper-state. |
application-deployment | string | The name of the application deployment to which the application belongs. |
error-message | string | Additional information about the application failure. |
Alert: container-layer-threshold-reached
Alert signaling that a container-layer reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | name | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
container | name | The name of the container. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
Alert: ephemeral-volume-threshold-reached
Alert signaling that an ephemeral volume reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | name | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
volume-name | string | The ephemeral volume name. |
Alert: persistent-volume-threshold-reached
Alert signaling that a persistent volume reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
hostname | string | The hostname of the host where the alert arose. |
application | name | The name of the application. |
service-instance | name | The name of the service instance. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
volume-name | string | The persistent volume name. |
Alert: unwrap-failure
Issued when multiple unwraps of a secret are attempted. This may indicate a security breach.
Alert-specific data:
Name | Type | Description |
---|---|---|
id | name | Id of the token subject to multiple unwraps. |
meta | string | |
peer-ip | ip-address | Peer IP of this unwrap attempt. |
successful-peer-ip | ip-address | Peer IP of the successful unwrap (the first unwrap attempt). |
successful-time | date-and-time | Time of successful unwrap. |
Site provider specific events
An item on a topic readable by site providers is an event record. It's a JSON object described in the Events section.
Topic: system:all-scheduler-events
Available on every site.
This topic contains the union of all scheduler related events on the local site for all tenants, and is readable only by site providers.
The events are the same as for the topic system:scheduler-events.
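Since every event record carries a tenant field, a site provider can summarize system:all-scheduler-events per tenant. The sketch below counts event names per tenant over records that have already been decoded from JSON; the consumer client is not shown and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def events_by_tenant(records: Iterable[dict]) -> Dict[str, Counter]:
    """Count event names per tenant across decoded event records."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        counts[record["tenant"]][record["event"]] += 1
    return dict(counts)
```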
Site provider specific audit trail log
Topic: system:unauthenticated-audit-trail-log
Available only on the top site.
This log is available to the top site provider, and contains all unauthenticated requests to the system. The format is the same as for the system:audit-trail-log.
Site provider specific host metrics
This topic contains host-related metrics.
Topic: system:host-metrics
Available on every site, but only readable by the sys tenant on top sites.
JSON Object
The system samples host metrics related to CPU, memory and disk once every 30 seconds.
Name | Type | Description |
---|---|---|
time | date-and-time | The time the sample was taken. |
site | name | The name of the site where the event was generated. |
cluster-hostname | name | The cluster hostname of the host where the event was generated. |
hostname | domain-name | The hostname of the host where the event was generated. |
cpu | JSON Object see cpu-params JSON Object | CPU metrics. |
memory | JSON Object see mem-params JSON Object | Memory metrics. |
loadavg | JSON Object see loadavg-params JSON Object | Average load metrics. |
disk | array of JSON Object see disk-entry JSON Object | A set of disk usage parameters for file systems used by either the Edge Enforcer (supd) or by any application managed by the Edge Enforcer. All file systems listed are seen from inside the Edge Enforcer container. Metrics with the mount point reported as CONTAINER-ROOT report the Edge Enforcer container's root file system usage, not the host's root file system. |
disk-io | array of JSON Object see disk-io-entry JSON Object | A set of disk I/O metrics for disk partitions used by either the Edge Enforcer (supd) or by any application managed by the Edge Enforcer. |
cpus | array of JSON Object see mpstat-cpu-entry JSON Object | CPU related statistics. Entries can be either aggregated for all CPUs or per CPU. |
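A host-metrics sample can be scanned for file systems that are filling up. The sketch below lists disk entries at or above a limit, defaulting to 80% to mirror the WARNING level of disk-threshold-reached. It converts values with float() because this section does not state whether 64-bit numeric types arrive as JSON numbers or, as in the RFC 7951 YANG JSON encoding, as strings. The function name is hypothetical.

```python
from typing import List, Tuple


def disks_over(sample: dict, limit: float = 80.0) -> List[Tuple[str, float]]:
    """Return (mount, percentage-used) for disk entries at or above `limit`."""
    result = []
    for entry in sample.get("disk", []):
        # float() accepts both JSON numbers and numeric strings.
        percent = float(entry["percentage-used"])
        if percent >= limit:
            result.append((entry["mount"], percent))
    return result
```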
cpu-params JSON Object
CPU metrics taken from /proc/cpuinfo.
Name | Type | Description |
---|---|---|
vcpus | uint32 | Total number of CPUs. |
mem-params JSON Object
Memory metrics taken from /proc/meminfo.
Name | Type | Description |
---|---|---|
total | uint64 | Total usable RAM memory in bytes. |
free | uint64 | Free usable RAM memory in bytes. |
available | uint64 | An estimate of how much RAM memory in bytes is available for starting new applications, without swapping. |
used | uint64 | Used RAM memory in bytes, calculated as: (MemTotal + SwapTotal) - (MemFree + SwapFree) - Buffers - (Cached + SReclaimable). |
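The used formula can be reproduced from the text of /proc/meminfo, where values are reported in kB (1024 bytes). The sketch below parses that text into byte counts and applies the formula above. It is an illustration under that assumption, not the system's own code, and the function names are hypothetical; on Linux, pass it the contents of /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into a dict of byte counts (kB values * 1024)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip malformed or empty lines
        amount = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024  # /proc/meminfo "kB" means 1024 bytes
        values[key.strip()] = amount
    return values


def used_bytes(m: dict) -> int:
    """The documented formula for the mem-params `used` field."""
    return ((m["MemTotal"] + m["SwapTotal"]) - (m["MemFree"] + m["SwapFree"])
            - m["Buffers"] - (m["Cached"] + m["SReclaimable"]))
```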
loadavg-params JSON Object
Average load metrics taken from /proc/loadavg.
Metrics avg1, avg5 and avg15 reflect the average load over time periods of 1, 5 and 15 minutes respectively, counting processes that are:
- queued for execution
- executing
- sleeping uninterruptibly, typically waiting for I/O
Name | Type | Description |
---|---|---|
avg1 | decimal64 | Average load of processes over the last minute. |
avg5 | decimal64 | Average load of processes over the last 5 minutes. |
avg15 | decimal64 | Average load of processes over the last 15 minutes. |
running | gauge64 | Number of currently runnable kernel scheduling entities (processes, threads). |
total | gauge64 | Number of kernel scheduling entities (processes, threads) that currently exist on the system. |
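The fields above correspond to /proc/loadavg, whose first three columns are the load averages, whose fourth column has the form running/total, and whose fifth column is the most recently created PID. A minimal parsing sketch; the function name is hypothetical.

```python
def parse_loadavg(text: str) -> dict:
    """Map the columns of /proc/loadavg onto the loadavg-params names."""
    avg1, avg5, avg15, entities, _last_pid = text.split()
    running, total = entities.split("/")  # e.g. "1/80" = runnable/existing
    return {
        "avg1": float(avg1),
        "avg5": float(avg5),
        "avg15": float(avg15),
        "running": int(running),
        "total": int(total),
    }


print(parse_loadavg("0.20 0.18 0.12 1/80 11206\n"))
# {'avg1': 0.2, 'avg5': 0.18, 'avg15': 0.12, 'running': 1, 'total': 80}
```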
disk-entry JSON Object
Name | Type | Description |
---|---|---|
filesystem | string | The source of the mount point, usually a device. |
type | string | File system type. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
mount | string | The mount point. |
disk-io-entry JSON Object
Name | Type | Description |
---|---|---|
device-name | string | The device name. |
reads-completed | gauge64 | The total number of reads completed successfully. |
sectors-read | gauge64 | The total number of sectors read successfully. |
time-spent-reading | gauge64 | This is the total number of milliseconds spent by all reads. |
writes-completed | gauge64 | The total number of writes completed successfully. |
sectors-written | gauge64 | The total number of sectors written successfully. |
time-spent-writing | gauge64 | This is the total number of milliseconds spent by all writes. |
ios-in-progress | gauge64 | This is the number of I/Os currently in progress. |
time-spent-on-io | gauge64 | This is the total number of milliseconds spent doing I/Os, that is, the time during which ios-in-progress is non-zero. |
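Although typed gauge64, these fields are described as cumulative totals, in the style of /proc/diskstats, so rates come from the difference between two consecutive samples for the same device. The sketch assumes the counters were not reset between the samples and defaults to the nominal 30-second sampling interval; passing the difference of the two samples' time fields is more accurate. The function name is hypothetical.

```python
def io_rates(prev: dict, curr: dict, interval_s: float = 30.0) -> dict:
    """Per-second rates between two disk-io-entry samples of one device."""
    if prev["device-name"] != curr["device-name"]:
        raise ValueError("samples belong to different devices")

    def delta(field: str) -> int:
        # int() accepts both JSON numbers and numeric strings.
        return int(curr[field]) - int(prev[field])

    return {
        "reads-per-s": delta("reads-completed") / interval_s,
        "writes-per-s": delta("writes-completed") / interval_s,
        # Share of the interval during which ios-in-progress was non-zero.
        "busy-percent": 100.0 * delta("time-spent-on-io") / (interval_s * 1000.0),
    }
```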
mpstat-cpu-entry JSON Object
Name | Type | Description |
---|---|---|
cpu | string | CPU number, or all for the entry aggregated over all CPUs. |
usr | decimal64 | Percentage of CPU utilization that occurred while executing at the user level (application). |
nice | decimal64 | Percentage of CPU utilization that occurred while executing at the user level with nice priority. |
sys | decimal64 | Percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts. |
iowait | decimal64 | Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request. |
irq | decimal64 | Percentage of time spent by the CPU or CPUs to service hardware interrupts. |
soft | decimal64 | Percentage of time spent by the CPU or CPUs to service software interrupts. |
steal | decimal64 | Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor. |
guest | decimal64 | Percentage of time spent by the CPU or CPUs to run a virtual processor. |
idle | decimal64 | Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. |
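Because iowait and idle both describe idle time, the non-idle share of the CPUs can be derived from the aggregated entry whose cpu field is all. A minimal sketch; the function name is hypothetical, and values are converted with float() in case they arrive as strings.

```python
def busy_percent(cpus: list) -> float:
    """Non-idle CPU share from the aggregated ("all") mpstat-cpu-entry."""
    for entry in cpus:
        if entry["cpu"] == "all":
            # idle and iowait are both idle time; everything else is busy.
            return 100.0 - float(entry["idle"]) - float(entry["iowait"])
    raise ValueError("no aggregated 'all' entry in cpus")
```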
Site provider specific supd metrics
This topic contains supd-related metrics.
Topic: system:supd-metrics
Available on every site, but only readable by the sys tenant on top sites.
JSON Object
The system samples CPU and memory metrics for the supd container once every 30 seconds.
Name | Type | Description |
---|---|---|
time | date-and-time | The time the sample was taken. |
site | name | The name of the site where the event was generated. |
cluster-hostname | name | The cluster hostname of the host where the event was generated. |
hostname | domain-name | The hostname of the host where the event was generated. |
cpu | JSON Object see supd-cpu-metric JSON Object | CPU metric data. |
memory | JSON Object see supd-memory-metric JSON Object | Memory metric data. |
long-gcs | array of JSON Object see gc-metric JSON Object | Long GC times. |
large-heaps | array of JSON Object see gc-metric JSON Object | Large heap sizes when GC. |
long-schedules | array of JSON Object see schedule-metric JSON Object | Long schedule times. |
busy-ports | array of JSON Object see busy-port JSON Object | Busy ports. |
busy-dist-ports | array of JSON Object see busy-port JSON Object | Busy distribution ports. |
supd-cpu-metric JSON Object
Name | Type | Description |
---|---|---|
nanoseconds | uint64 | The total CPU usage of all tasks in the SUPD container, measured in nanoseconds. |
cpus | decimal64 | The CPUs limit for the SUPD container. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of CPU used by the SUPD container in relation to its limits. |
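One plausible reading of percentage-used is the growth of the nanoseconds counter between two samples divided by the CPU time available at the cpus limit over the same interval. This section does not confirm that the system computes it this way, so treat the sketch as an approximation; the 30-second default interval comes from the sampling rate above and the function name is hypothetical.

```python
def supd_cpu_percent(prev_ns, curr_ns, cpus, interval_s: float = 30.0) -> float:
    """CPU time used between two samples relative to the container's CPU limit."""
    used_ns = int(curr_ns) - int(prev_ns)         # cumulative counter delta
    capacity_ns = interval_s * 1e9 * float(cpus)  # ns available at the limit
    return round(100.0 * used_ns / capacity_ns, 2)
```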
supd-memory-metric JSON Object
Name | Type | Description |
---|---|---|
used | uint64 | The memory used by the SUPD container in bytes, at the time the sample was taken. |
total | uint64 | The total memory available in bytes for the SUPD container. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Memory used by the SUPD container divided by total. |
gc-metric JSON Object
Name | Type | Description |
---|---|---|
pid | string | The identifier of the suspended Erlang process. |
registered-name | string | The registered name of the suspended Erlang process. |
milliseconds | uint64 | The GC time in milliseconds. |
heap-size | uint64 | The size of the used part of the heap. |
heap-block-size | uint64 | The size of the memory block used for storing the heap and the stack. |
old-heap-size | uint64 | The size of the used part of the old heap. |
old-heap-block-size | uint64 | The size of the memory block used for storing the old heap. |
stack-size | uint64 | The size of the stack. |
mbuf-size | uint64 | The combined size of message buffers associated with the process. |
schedule-metric JSON Object
Name | Type | Description |
---|---|---|
id | string | The Erlang process or port identifier. |
registered-name | string | The Erlang process's registered name. |
milliseconds | uint64 | The schedule time in milliseconds. |
busy-port JSON Object
Name | Type | Description |
---|---|---|
pid | string | The identifier of the suspended Erlang process. |
registered-name | string | The registered name of the suspended Erlang process. |
port | string | The busy port identifier. |
Site provider specific alerts
An item on a site provider specific alert topic is an alert record. It's a JSON object described in the Alerts section.
Topic: system:site-alerts
Available on every site.
This topic contains important site-related alerts from the system.
Alert: disk-threshold-reached
Alert signaling that a disk reached an alerting threshold.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |
filesystem | string | The source of the mount point, usually a device. |
type | string | File system type. |
size | uint64 | Total number of 1K-blocks where K is 1024 bytes. |
used | gauge64 | Number of used 1K-blocks where K is 1024 bytes. |
free | gauge64 | Number of available (free) 1K-blocks where K is 1024 bytes. |
percentage-used | percent | A percent value with up to 2 fractional digits, for example 12.77%. Percentage of used divided by size. |
mount | string | The mount point. |
Alert: host-down
Alert signaling that a host is down.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |
Alert: host-in-disaster-mode
Alert signaling that a host ended up in disaster mode.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |
Alert: host-in-distress
Alert signaling that a host ended up in distress.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |
distress | enumeration | Indicates the level of distress in the host. |
distress-info | string | Detailed information about the distress. |
Alert: no-space-left-on-disk
Alert signaling that a host has no space left on a critical disk.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |
Alert: supd-down
Alert signaling that supd is down.
Alert-specific data:
Name | Type | Description |
---|---|---|
cluster-hostname | name | The cluster hostname of the host where the alert arose. |
hostname | domain-name | The hostname of the host where the alert arose. |