Juniper Paragon Insights (formerly HealthBot) User guide

Paragon Insights User Guide
Published 2023-07-26
Release 4.3.0
Juniper Networks, Inc.
1133 Innovaon Way
Sunnyvale, California 94089
USA
408-745-2000
www.juniper.net
Juniper Networks, the Juniper Networks logo, Juniper, and Junos are registered trademarks of Juniper Networks, Inc.
in the United States and other countries. All other trademarks, service marks, registered marks, or registered service
marks are the property of their respective owners.
Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right
to change, modify, transfer, or otherwise revise this publication without notice.
Paragon Insights User Guide
4.3.0
Copyright © 2023 Juniper Networks, Inc. All rights reserved.
The informaon in this document is current as of the date on the tle page.
YEAR 2000 NOTICE
Juniper Networks hardware and software products are Year 2000 compliant. Junos OS has no known time-related
limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
END USER LICENSE AGREEMENT
The Juniper Networks product that is the subject of this technical documentation consists of (or is intended for use
with) Juniper Networks software. Use of such software is subject to the terms and conditions of the End User License
Agreement ("EULA") posted at https://support.juniper.net/support/eula/. By downloading, installing or using such
software, you agree to the terms and conditions of that EULA.
Table of Contents
About This Guide | viii
1 Introduction to Paragon Insights
Paragon Insights Overview | 2
Paragon Insights Concepts | 6
Paragon Insights Data Collection Methods | 7
Paragon Insights Topics | 9
Paragon Insights Rules - Basics | 10
Paragon Insights Rules - Deep Dive | 12
Paragon Insights Playbooks | 31
Paragon Insights Tagging | 33
Overview | 33
Types of Tagging | 40
Add a Tagging Profile | 48
Apply a Tagging Profile | 53
Delete a Tagging Profile | 55
Paragon Insights Time Series Database (TSDB) | 57
Paragon Insights Machine Learning (ML) | 65
Paragon Insights Machine Learning Overview | 65
Understanding Paragon Insights Anomaly Detection | 66
Understanding Paragon Insights Outlier Detection | 68
Understanding Paragon Insights Predict | 72
Paragon Insights Rule Examples | 73
Frequency Profiles and Offset Time | 87
Frequency Profiles | 87
Oset Time Unit | 94
2
Paragon Insights Management and Monitoring
Manage Paragon Insights Users and Groups | 107
Manage Devices, Device Groups, and Network Groups | 121
Adding a Device | 122
Eding a Device | 129
Adding a Device Group | 129
Eding a Device Group | 136
Conguring a Retenon Policy for the Time Series Database | 136
Adding a Network Group | 137
Eding a Network Group | 140
Paragon Insights Rules and Playbooks | 141
Add a Pre-Dened Rule | 141
Create a New Rule Using the Paragon Insights GUI | 142
Edit a Rule | 159
Add a Pre-Dened Playbook | 159
Create a New Playbook Using the Paragon Insights GUI | 160
Edit a Playbook | 161
Clone a Playbook | 162
Manage Playbook Instances | 163
Monitor Device and Network Health | 172
Dashboard | 172
Health | 183
Network Health | 192
Graphs Page | 192
Understand Resources and Dependencies | 206
About the Resources Page | 209
Add Resources for Root Cause Analysis | 212
Configure Dependency in Resources | 215
Example Configuration: OSPF Resource and Dependency | 221
Edit Resources and Dependencies | 232
Edit a Resource | 232
Edit Resource Dependency | 233
Filter Resources | 234
Upload Resources | 235
Download Resources | 236
Clone Resources | 236
Delete Resources and Dependencies | 237
Delete a Resource | 238
Delete Resource Dependency | 239
Monitor Network Device Health Using Grafana | 239
Grafana Overview | 239
Access the Grafana UI | 240
Run a Query | 240
View Prepopulated Graphs | 242
Back Up and Restore Grafana Data | 243
Understanding Action Engine Workflows | 244
Manage Action Engine Workflows | 244
Alerts and Notifications | 252
Generate Alert Notifications | 252
Configure a Notification Profile | 253
Enable Alert Notifications for a Device Group or Network Group | 259
Manage Alerts Using Alert Manager | 260
Viewing Alerts | 260
Manage Individual Alerts | 262
Configure Alert Blackouts | 263
Stream Sensor and Field Data from Paragon Insights | 264
Configure the Notification Type for Publishing | 264
Publish Data for a Device Group or Network Group | 267
Generate Reports | 268
Use Exim4 for E-Mails | 282
Configure the Exim4 Agent to Send E-mail | 283
Manage Audit Logs | 284
Filter Audit Logs | 284
Export Audit Logs | 285
Paragon Insights Commands and Audit Logs | 286
Configure a Secure Data Connection for Paragon Insights Devices | 286
Configure Data Summarization | 290
Modify the UDA, UDF, and Workflow Engines | 299
Commit or Roll Back Configuration Changes in Paragon Insights | 307
Logs for Paragon Insights Services | 309
Troubleshooting | 312
Paragon Insights Configuration – Backup and Restore | 321
Back Up the Configuration | 322
Restore the Configuration | 322
Backup or Restore the Time Series Database (TSDB) | 323
3 License Management
Paragon Insights Licensing Overview | 326
View, Add, or Delete Paragon Insights Licenses | 326
Add a Paragon Insights License | 327
Delete a Paragon Insights License | 328
View Paragon Insights Licensing Features | 328
View Status and Details of Paragon Insights License | 329
About This Guide
Use this guide to understand the features you can configure and the tasks you can perform from the
Paragon Insights (formerly HealthBot) GUI.
CHAPTER 1
Introduction to Paragon Insights
Paragon Insights Overview | 2
Paragon Insights Concepts | 6
Paragon Insights Tagging | 33
Paragon Insights Time Series Database (TSDB) | 57
Paragon Insights Machine Learning (ML) | 65
Frequency Proles and Oset Time | 87
Paragon Insights Overview
IN THIS SECTION
Main Components of Paragon Insights | 2
Closed-Loop Automaon | 4
Benets of Paragon Insights | 6
Paragon Insights (formerly HealthBot) is a highly automated and programmable device-level diagnostics
and network analytics tool that provides consistent and coherent operational intelligence across
network deployments. Paragon Insights integrates multiple data collection methods (such as Junos
Telemetry Interface (JTI), NETCONF, syslog, and SNMP) to aggregate and correlate large volumes of
time-sensitive telemetry data, thereby providing a multidimensional and predictive view of the network.
Additionally, Paragon Insights translates troubleshooting, maintenance, and real-time analytics into an
intuitive user experience to give network operators actionable insights into the health of individual
devices and of the overall network.
Main Components of Paragon Insights
Paragon Insights consists of two main components:
• Health Monitoring, to view an abstracted, hierarchical representation of device and network-level
health and to define the health parameters of key network elements through customizable key
performance indicators (KPIs), rules, and playbooks. A playbook is a collection of rules. You can
create a playbook and apply the playbook to a device group or a network group. For more
information on rules and playbooks, see "Paragon Insights Rules and Playbooks" on page 141.
• Root Cause Analysis (RCA), which helps you find the root cause of a device or network-level issue
when Paragon Insights detects a problem with a network element.
Paragon Insights Health Monitoring
The Challenge
With increasing data traffic generated by cloud-native applications and emerging technologies, service
providers and enterprises need a network analytics solution to analyze large volumes of telemetry data,
offer insights into overall network health, and produce actionable intelligence. Although telemetry-based
techniques have existed for years, the growing number of protocols, data formats, and KPIs from
diverse networking devices has made data analysis complex and costly. Traditional CLI-based interfaces
require specialized skills to extract business value from telemetry data, creating a barrier to entry for
network analytics.
How Paragon Insights Health Monitoring Helps
By aggregang and correlang raw telemetry data from mulple sources, the Paragon Insights Health
Monitoring component provides a muldimensional view of network health that reports current status,
as well as projected threats to the network infrastructure and its workloads.
Health status determination is tightly integrated with the Paragon Insights RCA component, which can
make use of system log data received from the network and its devices. Paragon Insights Health
Monitoring provides status indicators that alert you when a network resource is currently operating
outside a user-defined performance policy. Paragon Insights Health Monitoring performs a risk analysis
using historical trends and predicts whether a resource may become unhealthy in the future. Paragon
Insights Health Monitoring not only offers a fully customizable view of the current health of network
elements, but also automatically initiates remedial actions based on predefined service level agreements (SLAs).
Dening the health of a network element, such as broadband network gateway (BNG), provider edge
(PE), core, and leaf-spine, is highly contextual. Each element plays a dierent role in a network, with
unique KPIs to monitor. Given that there is no single denion for network health across all use cases,
Paragon Insights provides a highly customizable framework to allow you to dene your own health
proles.
Paragon Insights Root Cause Analysis
The Challenge
For some network issues, it can be challenging for network operators to figure out what caused a
networking device to stop working properly. In such cases, an operator must call on a specialist (with
knowledge built from years of experience) to troubleshoot the problem and find the root cause.
How Paragon Insights RCA Helps
The Paragon Insights RCA component simplifies the process of finding the root cause of a network issue.
Paragon Insights RCA captures the troubleshooting knowledge of specialists in a knowledge base
in the form of Paragon Insights rules. These rules are evaluated either on demand by a specific trigger or
periodically in the background to ascertain the health of a networking component, such as a routing
protocol, system, interface, or chassis, on the device.
To illustrate the benefits of Paragon Insights RCA, consider the problem of OSPF flapping. Figure 1
on page 4 highlights the workflow sequence involved in debugging OSPF flapping. A network
operator troubleshooting this issue would need to perform manual debugging steps for each tile (step)
of the workflow sequence in order to find the root cause of the OSPF flapping. In contrast, the
Paragon Insights RCA application troubleshoots the issue automatically by using an RCA bot. The
RCA bot tracks all of the telemetry data collected by Paragon Insights and translates the information
into graphical status indicators (displayed in the Paragon Insights web GUI) that correlate to different
parts of the workflow sequence shown in Figure 1 on page 4.
Figure 1: High-level workflow to debug OSPF flapping
When you congure Paragon Insights, each le of the workow sequence (shown in Figure 1 on page 4)
can be dened by one or more rules. For example, the RPD-OSPF le could be dened as two rule
condions: one to check if "hello-transmied" counters are incremenng and the other to check if
"hello-received" counters are incremenng. Based on these user-dened rules, Paragon Insights provides
status indicators, alarm nocaons, and an alarm management tool to inform and alert you of specic
network condions that could lead to OSPF apping.
By isolang a problem area in the workow, Paragon Insights RCA proacvely guides you in determining
the appropriate correcve acon to take to x a pending issue or avoid a potenal one.
Closed-Loop Automaon
Paragon Insights oers closed-loop automaon. The automaon workow can be divided into seven
main steps (see Figure 2 on page 5):
1. Dene—The user denes the health parameters of key network elements through customizable key
performance indicators (KPIs), rules, and playbooks, by using the tools provided by Paragon Insights.
4
2. Collect—Paragon Insights collects rule-based telemetry data from mulple devices using the
collecon methods specied for the dierent network devices.
3. Store—Paragon Insights stores me-sensive telemetry data in a me-series database (TSDB). This
allows users to query, perform operaons on, and write new data back to the database, days, or even
weeks aer the inial storage.
4. Analyze—Paragon Insights analyzes telemetry data based on the specied KPIs, rules, and playbooks.
5. Visualize—Paragon Insights provides mulple ways for you to visualize the aggregated telemetry data
through its web-based GUI to gain aconable and predicve insights into the health of your devices
and the overall network.
6. Nofy—Paragon Insights noes you through the GUI and nocaon alarms when problems in
individual devices or in the network are detected.
7. Act Paragon Insights performs user-dened acons to help resolve and proacvely prevent
network problems.
Figure 2: Paragon Insights Closed-Loop Automation Workflow
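The loop below is a minimal Python sketch of one pass through this workflow, assuming a single numeric threshold per KPI. The class and method names are hypothetical, an in-memory dictionary stands in for the TSDB, and the Visualize step is omitted because it has no meaningful text-only equivalent.

```python
from dataclasses import dataclass, field


@dataclass
class KPI:
    name: str
    threshold: float  # readings above this value are treated as unhealthy


@dataclass
class ClosedLoop:
    kpis: list[KPI]  # 1. Define: the KPIs and thresholds to watch
    tsdb: dict[str, list[tuple[int, float]]] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)

    def run_once(self, timestamp: int, readings: dict[str, float]) -> None:
        # 2. Collect: `readings` stands in for data delivered by a sensor.
        for kpi in self.kpis:
            value = readings.get(kpi.name)
            if value is None:
                continue  # nothing collected for this KPI in this pass
            # 3. Store: append a (time, value) point to the KPI's series.
            self.tsdb.setdefault(kpi.name, []).append((timestamp, value))
            # 4. Analyze: compare the latest value with the KPI threshold.
            if value > kpi.threshold:
                # 6. Notify: record an alert message.
                self.notifications.append(f"{kpi.name}={value} exceeds {kpi.threshold}")
                # 7. Act: record a (hypothetical) remediation action.
                self.actions.append(f"remediate:{kpi.name}")


loop = ClosedLoop(kpis=[KPI("cpu", 80.0), KPI("memory", 90.0)])
loop.run_once(timestamp=0, readings={"cpu": 50.0, "memory": 95.0})
```

After this pass, both readings are stored, but only the memory KPI produces a notification and an action.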
Benets of Paragon Insights
Customizaon—Provides a framework to dene and customize health proles, allowing truly
aconable insights for the specic device or network being monitored.
Automaon—Automates root cause analysis and log le analysis, streamlines diagnosc workows,
and provides self-healing and remediaon capabilies.
Greater network visibility—Provides advanced muldimensional analycs across network elements,
giving you a clearer understanding of network behavior to establish operaonal benchmarks, improve
resource planning, and minimize service downme.
Intuive GUI—Oers an intuive web-based GUI for policy management and easy data consumpon.
Open integraon—Lowers the barrier of entry for telemetry and analycs by providing open source
data pipelines, nocaon capabilies, and third-party device support.
Mulple data collecon methods—Includes support for JTI, OpenCong, NETCONF, CLI, Syslog,
NetFlow, and SNMP.
RELATED DOCUMENTATION
Paragon Insights Geng Started Guide
Paragon Insights Concepts
IN THIS SECTION
Paragon Insights Data Collecon Methods | 7
Paragon Insights Topics | 9
Paragon Insights Rules - Basics | 10
Paragon Insights Rules - Deep Dive | 12
Paragon Insights Playbooks | 31
Paragon Insights (formerly HealthBot) is a highly programmable telemetry-based analytics application.
With it, you can diagnose network issues and find their root causes, detect network anomalies, predict
potential network issues, and create real-time remedies for any issues that come up.
To accomplish this, network devices and Paragon Insights have to be configured to send and receive
large amounts of data, respectively. Device configuration is covered throughout this and other sections
of the guide.
Conguring Paragon Insights, or any applicaon, to read and react to incoming telemetry data requires a
language that describes several elements that are specic to the systems and data under analysis. This
type of language is called a Domain Specic Language (DSL), i.e., a language that is specic to one
domain. Any DSL is built to help answer quesons. For Paragon Insights, these quesons are:
Q: What components make up the systems that are sending data?
A: Network devices are made up of memory, CPU, interfaces, protocols, and so on. In Paragon Insights,
these are called topics (see "Paragon Insights Topics" on page 9).
Q: How do we gather, filter, process, and analyze all of this incoming telemetry data?
A: Paragon Insights uses rules (see "Paragon Insights Rules - Basics" on page 10) that consist of
information blocks called sensors, fields, variables, triggers, and more.
Q: How do we determine what to look for?
A: It depends on the problem you want to solve or the question you want to answer. Paragon
Insights uses playbooks (see "Paragon Insights Playbooks" on page 31) to create collections of specific
rules and apply them to specific groups of devices in order to accomplish specific goals. For example,
part of the system-kpis-playbook can alert a user when system memory usage crosses a user-defined
threshold.
This secon covers these key concepts and more, which you need to understand before using Paragon
Insights.
Paragon Insights Data Collecon Methods
IN THIS SECTION
Data Collecon - ’Push’ Model | 8
Data Collecon - ’Pull’ Model | 8
In order to provide visibility into the state of your network devices, Paragon Insights first needs to
collect their telemetry data and other status information. It does this using sensors.
Paragon Insights supports sensors that “push” data from the device to Paragon Insights and sensors that
require Paragon Insights to “pull” data from the device using periodic polling.
Data Collection - ’Push’ Model
As the number of objects in the network, and the metrics they generate, have grown, gathering
operational statistics for monitoring the health of a network has become an ever-increasing challenge.
Traditional ’pull’ data-gathering models, like SNMP and the CLI, require additional processing to
periodically poll the network element, and can directly limit scaling.
The ’push’ model overcomes these limits by delivering data asynchronously, which eliminates polling.
With this model, the Paragon Insights server can make a single request to a network device to stream
periodic updates. As a result, the ’push’ model is highly scalable and can support the monitoring of
thousands of objects in a network. Junos devices support this model in the form of the Junos Telemetry
Interface (JTI).
Paragon Insights currently supports ve ‘push’ ingest types.
Nave GPB
• NetFlow
• sFlow
OpenCong
• Syslog
These push-model data collection (or ingest) methods are explained in detail in the Paragon Insights
Data Ingest Guide.
Data Collection - ’Pull’ Model
While the ’push’ model is the preferred approach for its efficiency and scalability, there are still cases
where the ’pull’ data collection model is appropriate. With the ’pull’ model, Paragon Insights requests
data from network devices at periodic intervals.
Paragon Insights currently supports two ‘pull’ ingest types:
• iAgent (CLI/NETCONF)
• SNMP
These pull-model data collection (or ingest) methods are explained in detail in the Paragon Insights
Data Ingest Guide.
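To illustrate the scaling argument for the push model, the hypothetical function below counts collector-initiated requests and delivered updates over a time window. It uses a deliberately simplified model: every device produces one update per interval, push costs one subscription request per device, pull costs one poll per update, and reconnects, retries, and per-request cost differences are ignored.

```python
def collector_load(model: str, devices: int, window_s: int, interval_s: int) -> tuple[int, int]:
    """Return (collector-initiated requests, updates received) over window_s seconds.

    Simplified model: one update per device every interval_s seconds.
    Push: one subscription request per device, then updates stream in.
    Pull: one poll request per update.
    """
    if devices < 0 or window_s < 0 or interval_s <= 0:
        raise ValueError("devices and window_s must be >= 0 and interval_s must be > 0")
    updates = devices * (window_s // interval_s)
    if model == "push":
        return devices, updates
    if model == "pull":
        return updates, updates
    raise ValueError(f"unknown model: {model!r}")
```

With 1,000 devices reporting every 60 seconds for an hour, both models deliver 60,000 updates, but under these assumptions pull needs 60,000 requests while push needs 1,000.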
Paragon Insights Topics
Network devices are made up of a number of components and systems, from CPUs and memory to
interfaces, protocol stacks, and more. In Paragon Insights, a topic is the construct used to address
those different device components. The Topic block is used to create namespaces that define what
needs to be modeled. Each Topic block is made up of one or more Rule blocks, which, in turn, consist of
Field blocks, Function blocks, Trigger blocks, and so on. See "Paragon Insights Rules - Deep Dive" on
page 12 for details. Each rule created in Paragon Insights must be part of a topic. Juniper has curated a
number of these system components into a list of topics such as:
• chassis
• class-of-service
• external
• firewall
• interfaces
• kernel
• linecard
• logical-systems
• protocol
roung-opons
• security
• service
• system
You can create sub-topics underneath any of the Juniper topic names by appending .<sub-topic> to the
topic name. For example, kernel.tcpip or system.cpu.
Any pre-dened rules provided by Juniper t within one of the Juniper topics with the excepon of
external
, The
external
topic is reserved for user-created rules. In the Paragon Insights web GUI, when
you create a new rule, the Topics eld is automacally populated with the
external
topic name.
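The sub-topic naming convention can be sketched in Python as follows. The helper names are hypothetical, and the topic set contains only the examples listed above, so it should not be read as the complete list of topics that Paragon Insights accepts.

```python
# Example topics copied from the list above; not an authoritative or complete list.
EXAMPLE_JUNIPER_TOPICS = frozenset({
    "chassis", "class-of-service", "external", "firewall", "interfaces",
    "kernel", "linecard", "logical-systems", "protocol", "routing-options",
    "security", "service", "system",
})


def split_topic(topic: str) -> tuple[str, list[str]]:
    """Split 'base.sub1.sub2' into ('base', ['sub1', 'sub2'])."""
    base, *subtopics = topic.split(".")
    if base not in EXAMPLE_JUNIPER_TOPICS:
        raise ValueError(f"unknown base topic: {base!r}")
    if any(not part for part in subtopics):
        raise ValueError(f"empty sub-topic in {topic!r}")
    return base, subtopics


def is_user_topic(topic: str) -> bool:
    """True if the topic sits under 'external', the topic reserved for user-created rules."""
    return split_topic(topic)[0] == "external"
```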
Paragon Insights Rules - Basics
Paragon Insights’ primary function is collecting and reacting to telemetry data from network devices.
Defining how to collect the data, and how to react to it, is the role of a rule.
Paragon Insights ships with a set of default rules, which can be seen on the Configuration > Rules page
of the Paragon Insights GUI, as well as in GitHub in the healthbot-rules repository. You can also create
your own rules.
The structure of a Paragon Insights rule looks like this:
To keep rules organized, Paragon Insights organizes them into topics. Topics can be very general, like
system, or they can be more granular, like protocol.bgp. Each topic contains one or more rules.
As described above, a rule contains all the details and instructions to define how to collect and handle
the data. Each rule contains the following required elements:
• The sensor defines the parameters for collecting the data. This typically includes which data
collection method to use (as discussed above in "Paragon Insights Data Collection Methods" on page
7), some guidance on which data to ingest, and how often to push or pull the data. In any given rule,
a sensor can be defined directly within the rule or it can be referenced from another rule.
Example: Using the SNMP sensor, poll the network device every 60 seconds to collect all the
device data in the Juniper SNMP MIB table jnxOperatingTable.
• The sensor typically ingests a large set of data, so fields provide a way to filter or manipulate that
data, allowing you to identify and isolate the specific pieces of information you care about. Fields can
also act as placeholder values, like a static threshold value, to help the system perform data analysis.
Example: Extract, isolate, and store the jnxOperating15MinLoadAvg (CPU 15-minute average
utilization) value from the SNMP table specified above in the sensor.
• Triggers periodically bring together the fields with other elements to compare data and determine
current device status. A trigger includes one or more ’when-then’ statements, which include the
parameters that define how device status is visualized on the health pages.
Example: Every 90 seconds, check the CPU 15-minute average utilization value, and if it goes above a
defined threshold, set the device’s status to red on the device health page and display a message
showing the current value.
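The sensor, field, and trigger examples above can be modeled together in a short Python sketch. This is not the Paragon Insights rule DSL: the class names, the "cpu-15min" field name, and the threshold of 80 are assumptions for illustration, and the sensor and trigger frequencies are recorded but no scheduling is performed.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    method: str        # data collection method, e.g. "snmp"
    table: str         # which data to ingest, e.g. an SNMP MIB table
    frequency_s: int   # how often to poll, in seconds (not scheduled here)


@dataclass
class Field:
    name: str           # name used inside the rule
    source_column: str  # column extracted from the sensor data


@dataclass
class Trigger:
    frequency_s: int    # how often the trigger runs, in seconds (not scheduled here)
    field_name: str     # field that the when-then statement inspects
    threshold: float    # static threshold (could come from a variable)

    def evaluate(self, values: dict[str, float]) -> tuple[str, str]:
        value = values[self.field_name]
        # when: value above threshold -> then: red status and a message with the value
        if value > self.threshold:
            return "red", f"{self.field_name} is {value}, above threshold {self.threshold}"
        return "green", f"{self.field_name} is {value}"


@dataclass
class Rule:
    topic: str
    sensor: Sensor
    fields: list[Field]
    trigger: Trigger

    def evaluate(self, sensor_row: dict[str, float]) -> tuple[str, str]:
        # Fields isolate only the columns this rule cares about.
        values = {f.name: sensor_row[f.source_column] for f in self.fields}
        return self.trigger.evaluate(values)


cpu_rule = Rule(
    topic="system.cpu",
    sensor=Sensor(method="snmp", table="jnxOperatingTable", frequency_s=60),
    fields=[Field(name="cpu-15min", source_column="jnxOperating15MinLoadAvg")],
    trigger=Trigger(frequency_s=90, field_name="cpu-15min", threshold=80.0),
)
```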
The rule can also contain the following optional elements:
• Vectors allow you to leverage existing elements to avoid the need to repeatedly configure the same
elements across multiple rules.
Examples: A rule with a configured sensor, plus a vector to a second sensor from another rule; a
rule with no sensors, and vectors to fields from other rules.
• Variables can be used to provide additional supporting parameters needed by the required elements
above.
Examples: The string “ge-0/0/0”, used within a field collecting status for all interfaces, to filter the
data down to just the one interface; an integer, such as “80”, referenced in a field to use as a static
threshold value.
• Functions allow you to provide instructions (in the form of a Python script) on how to further interact
with data, and how to react to certain events.
Examples: A rule that monitors input and output packet counts, using a function to compare the
count values; a rule that monitors system storage, invoking a function to clean up temp and log
files if storage utilization goes above a defined threshold.
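Because functions are Python scripts, the packet-count comparison example might look roughly like the sketch below. The function names, the percentage-of-the-larger-count metric, and the 5% default tolerance are assumptions; the exact signature Paragon Insights expects for user-defined functions is not described in this section.

```python
def packet_difference_pct(input_packets: int, output_packets: int) -> float:
    """Difference between two packet counts as a percentage of the larger count."""
    larger = max(input_packets, output_packets)
    if larger == 0:
        return 0.0  # no packets in either direction: treat as no difference
    return abs(input_packets - output_packets) * 100.0 / larger


def packets_match(input_packets: int, output_packets: int, tolerance_pct: float = 5.0) -> bool:
    """True if the two counts differ by no more than tolerance_pct percent."""
    return packet_difference_pct(input_packets, output_packets) <= tolerance_pct
```

A trigger could then compare the function's result against a threshold held in a variable, in the same way as the CPU example above.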
NOTE: Rules, on their own, don’t actually do anything. To make use of rules you need to add
them to "Paragon Insights Playbooks" on page 31.
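The relationship between rules, playbooks, and device groups can be sketched as follows, assuming a playbook is simply a named collection of rule names and that applying it to a group evaluates every rule on every device in that group. The rule names, group name, and device names are hypothetical; only the playbook name system-kpis-playbook appears in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceGroup:
    name: str
    devices: tuple[str, ...]


@dataclass(frozen=True)
class Playbook:
    name: str
    rules: tuple[str, ...]  # rules do nothing until the playbook is applied

    def apply(self, group: DeviceGroup) -> list[tuple[str, str]]:
        """Return every (device, rule) pair that would be evaluated for the group."""
        return [(device, rule) for device in group.devices for rule in self.rules]


kpis = Playbook(
    name="system-kpis-playbook",
    rules=("system.cpu/check-cpu", "system.memory/check-memory"),  # hypothetical rule names
)
core = DeviceGroup(name="core-routers", devices=("r1", "r2"))  # hypothetical group
```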
Paragon Insights Rules - Deep Dive
IN THIS SECTION
Rules | 13
Sensors | 16
Fields | 16
Vectors | 18
Variables | 19
Funcons | 19
Triggers | 20
Tagging | 24
Rule Properes | 24
Pre/Post Acon | 24
Mulple Sensors per Device | 25
Sensor Precedence | 28
A rule is a package of components, or blocks, needed to extract specific information from the network
or from a Junos device. Rules conform to a specifically tailored domain-specific language (DSL) for
analytics applications. The DSL is designed to allow rules to capture:
• The minimum set of input data that the rule needs to be able to operate
• The minimum set of telemetry sensors that need to be configured on the device(s)
• The fields of interest from the configured sensors
• The reporting or polling frequency
• The set of triggers that operate on the collected data
• The conditions or evaluations needed for triggers to kick in
• The actions or notifications that need to be performed when a trigger kicks in
The details around rules, topics, and playbooks are presented in the following sections.