While conventional centralized monitoring systems offer many legacy alarm and
notification services, a quick response to critical issues is often not possible
because these alarms and events are processed manually. In most cases, network problems
generate local logs or raise error messages to the network management system, where
backend processing or manual analysis is performed on the collected data. Failing to
detect problems at an early stage, or to respond quickly to critical issues, may
cause a network outage or long delays in recovery. This article presents a Smart Edge
monitoring solution that can be used to build rapid response logic and
improve network reliability.
Edge Computing
Edge monitoring is related to the more generic concept of Edge Computing but
focuses on measuring and monitoring network quality. Edge Computing is a relatively
new concept emerging from Cloud Computing. It consists of shifting computing
applications, data and services away from centralized nodes towards the edge of the
network. Shifting computing resources near the edge has several advantages,
especially for applications that suffer from the poor scalability of the centralized
paradigm. Some of the advantages of Edge Computing are: 1) lower network latency and
faster response times; 2) better use of resources, lower cost of scaling and faster
data delivery; 3) less dependency on the corporate data center as a single point of
failure in the infrastructure, hence improved service availability[1].
Enhanced Network Reliability with Edge Monitoring on Industrial Routers
Network monitoring applications typically
collect data remotely from many nodes
across geo-distributed locations and
run backend analytic programs on it.
This type of application can also benefit
from Edge Computing, especially from
faster response times and better scalability.
The following sections explore Edge
monitoring in more detail and look
into some of the implementation
challenges.
Smart Edge Monitoring
Applying Edge Computing usually requires
some application redesign.
Whether it is a monitoring application or a
manufacturing automation application, we
must decide which part of the application
logic can be pushed to the edge of the network
and which part remains at the central
location. This can be a hard decision for
applications that require heavy analytic
capacity. For example, we need to find out
how much of the analytic computation can
be pushed to the edge of the network,
given the hardware resources available
on the edge devices.
For monitoring applications, redesigning
the application is a somewhat easier task.
Considering the huge amount of data that
can be collected from many devices
in the network, moving part of the
application closer to the source of
information just makes sense. In addition,
there is demand for allowing
custom logic and third party
applications on edge devices. This comes from
the fact that in every network the end user has
the best knowledge of the system, its pitfalls
and its maintenance procedures. End users may
therefore be interested in participating in
network monitoring and recovery action plans.
This brings some design challenges that we
look at in this article.
Network Performance Metrics
• Bandwidth is the maximum rate at which
information can be transferred,
measured in bits/second.
• Throughput is the actual rate at which
information is transferred.
• Latency refers to the amount of time
(usually measured in milliseconds) it
takes for data to travel from one location
to another across a network. It is also
referred to as delay, because the
software is often waiting to execute
some function while data travels back
and forth across the network.
• Jitter is defined as the variation in the
delay of received packets. The sending
side transmits packets in a continuous
stream and spaces them evenly apart.
But due to network congestion,
improper queuing, or configuration
errors, the delay between packets can
vary instead of remaining constant.
• Error Rate is the number of corrupted
packets expressed as a percentage or
fraction of the total sent.
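The metric definitions above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of any product API. Jitter is estimated here as the mean absolute difference between consecutive packet delays, which is one simple estimator among several and is an assumption of this sketch.

```python
from statistics import mean


def throughput_bps(bytes_transferred, seconds):
    """Actual transfer rate in bits per second over a measurement interval."""
    return bytes_transferred * 8 / seconds


def mean_latency_ms(delays_ms):
    """Average one-way or round-trip delay, in milliseconds."""
    return mean(delays_ms)


def jitter_ms(delays_ms):
    """Mean absolute difference between consecutive packet delays.

    Assumption: this is a simple estimator chosen for illustration.
    """
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


def error_rate(corrupted_packets, total_sent):
    """Fraction of sent packets that arrived corrupted."""
    return corrupted_packets / total_sent if total_sent else 0.0
```

For example, `jitter_ms([10, 12, 11, 15])` averages the differences 2, 1 and 4 between neighbouring samples.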
Network Monitoring
Network Monitoring is the process of measuring and analyzing the values of
these network performance metrics (NPMs).
NPMs are measured by various network
monitoring technologies. The common
measuring and monitoring techniques are
active, passive and SNMP based monitoring.
The Active Monitoring method obtains the
current status of the network by setting
up a test machine at the point
one wishes to measure, and then sending
traffic from one machine to another. NPMs
can be measured simply by using tools such
as “ping” and “traceroute”. With this method,
the test traffic may impose a burden on the
network.
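As a rough illustration of active monitoring, the sketch below times a TCP connection setup as a stand-in for “ping”. This is a substitution made for the example: ICMP echo normally needs raw sockets and elevated privileges, whereas a TCP connect does not. The function names, the default sample count and the probe interval are assumptions.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Time one TCP connection setup; return None if the probe fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


def probe(host, port, count=5, interval_s=0.2):
    """Send `count` probes spaced `interval_s` apart and collect the samples.

    Each probe adds a small amount of test traffic, which is the burden
    that active monitoring places on the network.
    """
    samples = []
    for i in range(count):
        samples.append(tcp_connect_latency_ms(host, port))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Failed probes appear as `None` in the sample list, so the caller can derive both a latency figure and a loss ratio from the same run.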
Passive Monitoring methods obtain
the current status of the network by
capturing live traffic on the network.
Passive Monitoring can monitor the
network without additional traffic burden.
In the SNMP based method, the SNMP
agent running on the device collects
various measurements and makes them
available to the Network Management
System (NMS).
White paper | Enhanced Network Reliability with Edge Monitoring on Industrial Routers | 20 April 2017
There are various standard RFCs for remote
network monitoring, such as the Remote
Network Monitoring (RMON) MIB[4]. The
SNMP based solution is easy to use and
scales well with the number of
nodes in the network.
Industrial routers play a key role in
mission critical systems. Whether the
application is power system management,
transportation or industrial automation,
the network must be stable and reliable to
run critical applications. Outages and
downtime are not an option, and this is
a key requirement of mission-critical
connected systems.
As discussed, with conventional
monitoring network operators use
different techniques to collect the data,
but this is usually done in reaction to a
system error, after the fact. Having access
to various data on the edge device enables
Smart Edge Monitoring to provide a
proactive solution that can prevent, help
troubleshoot and even predict difficult
network failures. The table below
summarizes the pros/cons of each method.
The goal for Smart Edge Monitoring is to
shift part of the analytic logic closer to the
source of information where it has access
to system data.
This type of monitoring can be done
in different modes:
Analytic mode: In this mode, the goal is
to collect and analyze as much related
data as possible in response to an error
condition to help the later investigation,
for example CPU usage, operating
temperature, error keywords in logs and
background traffic.
Fault Isolation mode: In this mode, in
addition to the above, the application makes
a best effort to isolate the critical failure
before it can destabilize the entire network,
for example by disabling suspected ports,
protocols or software features.
Fault Prediction mode: In this mode, in
addition to the above, the application tries
to predict a failure and provide proper
warnings and alarms about ongoing
problems by finding patterns (e.g. time
related, traffic type or hardware related)
in fault conditions.
For example, in analytic mode, CPU usage
could be the subject of smart monitoring.
The monitoring application tries to
find the underlying issue by analyzing
the system logs and correlating them with the
current operation and the tasks that occupy
the CPU the most. This gives the network
operator the chance to find the root cause
before the system becomes unavailable.
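The CPU usage example can be sketched as a small Python function that builds a diagnostic snapshot once usage crosses a threshold. Everything here is illustrative: the function name, the 90% threshold, the keyword list and the input shapes (a task-to-CPU mapping and a list of log lines) are assumptions, not a description of any product implementation.

```python
def analyze_cpu_event(cpu_percent, task_cpu, log_lines, threshold=90.0,
                      keywords=("error", "fail", "timeout"), top_n=3):
    """Return a diagnostic snapshot when CPU usage reaches the threshold.

    cpu_percent: overall CPU usage of the device (0-100)
    task_cpu:    mapping of task name -> CPU share of that task
    log_lines:   recent system log lines to scan for error keywords
    """
    if cpu_percent < threshold:
        return None  # no error condition, nothing to collect

    # Tasks that occupy the CPU the most, highest first.
    top_tasks = sorted(task_cpu.items(), key=lambda kv: kv[1],
                       reverse=True)[:top_n]

    # Log lines that contain any of the error keywords (case-insensitive).
    suspicious = [line for line in log_lines
                  if any(k in line.lower() for k in keywords)]

    return {
        "cpu_percent": cpu_percent,
        "top_tasks": top_tasks,
        "suspicious_logs": suspicious,
    }
```

The snapshot keeps the heaviest tasks and the matching log lines together so that they can be correlated during a later investigation.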
Monitoring Method | Mechanism | Pros/Cons
Active Monitoring | Generate test traffic periodically or on-demand and measure the performance. Backend analytic process. | Not scalable
Passive Monitoring | Capture the current traffic and analyze the performance. Backend analytic process. | Not scalable
SNMP based Monitoring | Using existing SNMP agent to collect measurements and analyze the performance. Backend analytic process. | Scalable and limited to specific measurements
Smart Edge Monitoring (proactive monitoring) | Shift some of the monitoring application on the edge device, collect and analyze performance and failures | Scalable, more efficient, can prevent or predict faults
In another example, a protocol state change
can be the subject of smart monitoring.
Here the monitoring application tries
to find and isolate an issue related to a
topology state change or a flapping
condition. This can be done by receiving
the state change notifications from the core
application and checking whether their number
exceeds a threshold level. In that case the
monitoring application can isolate the problem
by disabling a port. This action could prevent
the faulty unit from destabilizing the
entire network.
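The flapping check described above can be sketched as a sliding-window counter in Python. The class name, the default threshold of five state changes and the ten-second window are assumptions chosen for illustration; how the port is actually disabled is left to the caller.

```python
from collections import deque


class FlapDetector:
    """Flag a port when it changes state too often within a time window."""

    def __init__(self, max_changes=5, window_s=10.0):
        self.max_changes = max_changes  # tolerated changes per window
        self.window_s = window_s        # window length in seconds
        self._events = {}               # port -> deque of change timestamps

    def record(self, port, timestamp):
        """Record one state change; return True if the port should be isolated."""
        events = self._events.setdefault(port, deque())
        events.append(timestamp)
        # Drop state changes that have fallen out of the window.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        return len(events) > self.max_changes
```

A caller would feed each state change notification into `record()` and disable the port when it returns `True`. Keeping one window per port means that a single flapping port does not affect the decision for any other port.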
Integrating Edge Monitoring into Industrial Routers
Using Smart Edge Monitoring, operators can capitalize on remote monitoring
applications with ongoing analysis of
system data. With this, system performance
trends are revealed and system failures can
be predicted and prevented before
any alarm is raised. However, there are some
challenges in integrating such functionality,
especially if we consider a solution where
the end users can also develop and deploy
monitoring applications based on their needs
and requirements. This requires a solution
that provides an ecosystem of development
tools, mass deployment and configuration
with end-to-end security. In this section we
only look into the technology stack needed
for this integration on the target device.
The basic requirement for deploying a
custom application on any edge device is
to make sure that it has no negative impact
on the device's core functionality, by providing
proper resource isolation. This could be
isolation of resources such as CPU, disk storage
and memory. There are well-known
technologies for this in Linux-based systems,
such as Virtual Machine (VM)
or Linux Container (LXC)[5] solutions.
One of the obstacles for any custom or third
party monitoring application is access to
system data. The need for sharing system
data with third party applications is often
not considered in typical products. So some
application redesign is required to provide
proper access to system data and real
time events.
The picture above shows a proposed
integration solution with a VM as the base
platform, where the custom application is
deployed in its own container. The VM
provides the maximum resource isolation
and the container provides application
packaging. The internal data bus provides
access to system data via an Embedded
Monitoring Agent residing in the core
software. A publisher/subscriber protocol
like the Data Distribution Service (DDS)[6] or
a similar protocol is a good option to emulate
a data bus over which the target application can
communicate with the Monitoring Agent.
The DDS protocol provides automatic discovery
of publisher and subscriber nodes,
without the need for any configuration.
The Embedded Monitoring Agent is part
of the core software running on the host
Linux. It provides access to real-time system
data by publishing events such as the
following examples. A registered application
can subscribe to these event groups and
receive the events in real time.
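The publish/subscribe relationship between the Embedded Monitoring Agent and a registered application can be illustrated with a minimal in-process event bus in Python. This is not DDS and does not use any DDS API; it only mimics the pattern. The class name, the event group names and the event fields are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a publisher/subscriber data bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event group -> callbacks

    def subscribe(self, group, callback):
        """Register a callback for all events published to `group`."""
        self._subscribers[group].append(callback)

    def publish(self, group, event):
        """Deliver `event` to every subscriber of `group`; return the count."""
        callbacks = list(self._subscribers.get(group, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)


# Usage: the agent publishes, the monitoring application subscribes.
bus = EventBus()
received = []
bus.subscribe("cpu", received.append)            # hypothetical event group
bus.publish("cpu", {"usage_percent": 93.0})      # delivered to the subscriber
bus.publish("port", {"name": "port1", "state": "down"})  # no subscriber yet
```

In a real deployment the bus would span the host and the container, and discovery of publishers and subscribers would be handled by the chosen protocol rather than by direct registration.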
The applications can be developed using
any Linux scripting language such as Perl or
a compiled language such as C, C++ or Java.
The applications can register with the
Embedded Monitoring Agent to receive
system events. They can also request that a
command be executed in response to a
system event. The second table below
provides some command examples. All
commands issued by the applications must
be authorized before they are executed.
The applications can communicate with a
central management application via a
standard protocol such as HTTP
or MQTT[7].
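The requirement that every command be authorized before execution can be sketched as a per-application allowlist in Python. The class name, the application name and the command strings are hypothetical; the actual commands and authorization mechanism are not specified here.

```python
class CommandGate:
    """Execute a requested command only if the application is allowed to run it."""

    def __init__(self, allowed):
        # allowed: mapping of application name -> set of permitted commands
        self._allowed = {app: set(cmds) for app, cmds in allowed.items()}

    def request(self, app, command, executor):
        """Authorize `command` for `app`, then run it through `executor`."""
        if command not in self._allowed.get(app, set()):
            raise PermissionError(f"{app} is not authorized to run {command!r}")
        return executor(command)


# Usage with hypothetical names: only 'disable-port' is granted to this app.
gate = CommandGate({"flap-monitor": {"disable-port"}})
executed = []
gate.request("flap-monitor", "disable-port", executed.append)
```

Denying by default means that an unknown application, or an unknown command, is rejected without any extra configuration.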
The RUGGEDCOM RX1400 product is an
ideal platform for Smart Edge
Monitoring. Virtualization
is already supported on this platform
through the Virtual Processing Engine
(VPE) feature, which provides
a platform for hosting third party
applications. The picture above illustrates a
sample monitoring application that was
developed on a RUGGEDCOM RX1400 device
to collect and analyze CPU, memory and
application information using the design
solution described above. The information is
displayed in a web page running on the VPE.
Conclusion
Industrial routers are deployed in mission
critical applications where network outages
are not an option and every attempt must
be made to prevent or isolate network
problems. In this article we discussed
Smart Edge Monitoring as a proactive
solution that can facilitate troubleshooting
of complex issues and can prevent or even
predict network failures. We also looked
at some of the challenges of integrating
this functionality on industrial routers.
The information provided in this document contains merely general descriptions or
characteristics of performance which in case of actual use do not always apply as
described or which may change as a result of further development of the products.
An obligation to provide the respective characteristics shall only exist if expressly
agreed in the terms of contract.
In order to protect plants, systems, machines and networks against cyber threats, it
is necessary to implement – and continuously maintain – a holistic, state-of-the-art
industrial security concept. Siemens’ products and solutions only form one element
of such a concept. For more information about industrial security, please visit
www.siemens.com/industrialsecurity
siemens.com/ruggedcom
Siemens AG
Process Industries and Drives
Process Automation
Postfach 48 48
90026 Nürnberg
Germany
Siemens Canada Limited
300 Applewood Crescent
Concord, Ontario, L4K 5C7
Canada
© Siemens AG 2017
Subject to change without prior notice
Whitepaper
Produced in Canada
References
[1] H. H. Pang and K.-L. Tan, “Authenticating Query
Results in Edge Computing,” 20th International Conference
on Data Engineering, 2004
[2] Network Performance Metrics
[3] https://tools.ietf.org/html/rfc6703
[4] Remote Network Monitoring (RMON), RFC 3273
[5] Linux Containers, linuxcontainers.org
[6] Data Distribution Service (DDS) from OMG
[7] MQ Telemetry Transport from http://mqtt.org/
Figure 2: Sample Edge Monitoring
application running on RUGGEDCOM
RX1400 + VPE1400.