Juniper Paragon Insights (formerly HealthBot) User guide

Paragon Insights User Guide

Published 2023-07-26

Release 4.3.0

Juniper Networks, Inc.

1133 Innovaon Way

Sunnyvale, California 94089

USA

408-745-2000

www.juniper.net

Juniper Networks, the Juniper Networks logo, Juniper, and Junos are registered trademarks of Juniper Networks, Inc. in the United States and other countries. All other trademarks, service marks, registered marks, or registered service marks are the property of their respective owners.

Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.


The informaon in this document is current as of the date on the tle page.

YEAR 2000 NOTICE

Juniper Networks hardware and soware products are Year 2000 compliant. Junos OS has no known me-related

limitaons through the year 2038. However, the NTP applicaon is known to have some diculty in the year 2036.

END USER LICENSE AGREEMENT

The Juniper Networks product that is the subject of this technical documentation consists of (or is intended for use with) Juniper Networks software. Use of such software is subject to the terms and conditions of the End User License Agreement ("EULA") posted at https://support.juniper.net/support/eula/. By downloading, installing or using such software, you agree to the terms and conditions of that EULA.


Table of Contents

About This Guide | viii

1

Introducon to Paragon Insights

Paragon Insights Overview | 2

Paragon Insights Concepts | 6

Paragon Insights Data Collecon Methods | 7

Paragon Insights Topics | 9

Paragon Insights Rules - Basics | 10

Paragon Insights Rules - Deep Dive | 12

Paragon Insights Playbooks | 31

Paragon Insights Tagging | 33

Overview | 33

Types of Tagging | 40

Add a Tagging Prole | 48

Apply a Tagging Prole | 53

Delete a Tagging Prole | 55

Paragon Insights Time Series Database (TSDB) | 57

Paragon Insights Machine Learning (ML) | 65

Paragon Insights Machine Learning Overview | 65

Understanding Paragon Insights Anomaly Detecon | 66

Understanding Paragon Insights Outlier Detecon | 68

Understanding Paragon Insights Predict | 72

Paragon Insights Rule Examples | 73

Frequency Proles and Oset Time | 87

Frequency Proles | 87


Oset Time Unit | 94

2

Paragon Insights Management and Monitoring

Manage Paragon Insights Users and Groups | 107

Manage Devices, Device Groups, and Network Groups | 121

Adding a Device | 122

Eding a Device | 129

Adding a Device Group | 129

Eding a Device Group | 136

Conguring a Retenon Policy for the Time Series Database | 136

Adding a Network Group | 137

Eding a Network Group | 140

Paragon Insights Rules and Playbooks | 141

Add a Pre-Dened Rule | 141

Create a New Rule Using the Paragon Insights GUI | 142

Edit a Rule | 159

Add a Pre-Dened Playbook | 159

Create a New Playbook Using the Paragon Insights GUI | 160

Edit a Playbook | 161

Clone a Playbook | 162

Manage Playbook Instances | 163

Monitor Device and Network Health | 172

Dashboard | 172

Health | 183

Network Health | 192

Graphs Page | 192

Understand Resources and Dependencies | 206


About the Resources Page | 209

Add Resources for Root Cause Analysis | 212

Congure Dependency in Resources | 215

Example Conguraon: OSPF Resource and Dependency | 221

Edit Resources and Dependencies | 232

Edit a Resource | 232

Edit Resource Dependency | 233

Filter Resources | 234

Upload Resources | 235

Download Resources | 236

Clone Resources | 236

Delete Resources and Dependencies | 237

Delete a Resource | 238

Delete Resource Dependency | 239

Monitor Network Device Health Using Grafana | 239

Grafana Overview | 239

Access the Grafana UI | 240

Run a Query | 240

View Prepopulated Graphs | 242

Back Up and Restore Grafana Data | 243

Understanding Acon Engine Workows | 244

Manage Acon Engine Workows | 244

Alerts and Nocaons | 252

Generate Alert Nocaons | 252

Congure a Nocaon Prole | 253

Enable Alert Nocaons for a Device Group or Network Group | 259


Manage Alerts Using Alert Manager | 260

Viewing Alerts | 260

Manage Individual Alerts | 262

Congure Alert Blackouts | 263

Stream Sensor and Field Data from Paragon Insights | 264

Congure the Nocaon Type for Publishing | 264

Publish Data for a Device Group or Network Group | 267

Generate Reports | 268

Use Exim4 for E-Mails | 282

Congure the Exim4 Agent to Send E-mail | 283

Manage Audit Logs | 284

Filter Audit Logs | 284

Export Audit Logs | 285

Paragon Insights Commands and Audit Logs | 286

Congure a Secure Data Connecon for Paragon Insights Devices | 286

Congure Data Summarizaon | 290

Modify the UDA, UDF, and Workow Engines | 299

Commit or Roll Back Conguraon Changes in Paragon Insights | 307

Logs for Paragon Insights Services | 309

Troubleshoong | 312

Paragon Insights Conguraon – Backup and Restore | 321

Back Up the Conguraon | 322

Restore the Conguraon | 322

Backup or Restore the Time Series Database (TSDB) | 323

3

License Management

Paragon Insights Licensing Overview | 326

View, Add, or Delete Paragon Insights Licenses | 326


Add a Paragon Insights License | 327

Delete a Paragon Insights License | 328

View Paragon Insights Licensing Features | 328

View Status and Details of Paragon Insights License | 329


About This Guide

Use this guide to understand the features you can configure and the tasks you can perform from the Paragon Insights (formerly HealthBot) GUI.


CHAPTER 1

Introduction to Paragon Insights

Paragon Insights Overview | 2

Paragon Insights Concepts | 6

Paragon Insights Tagging | 33

Paragon Insights Time Series Database (TSDB) | 57

Paragon Insights Machine Learning (ML) | 65

Frequency Proles and Oset Time | 87

Paragon Insights Overview

IN THIS SECTION

Main Components of Paragon Insights | 2

Closed-Loop Automaon | 4

Benets of Paragon Insights | 6

Paragon Insights (formerly HealthBot) is a highly automated and programmable device-level diagnostics and network analytics tool that provides consistent and coherent operational intelligence across network deployments. Paragon Insights integrates multiple data collection methods (such as Junos Telemetry Interface (JTI), NETCONF, syslog, and SNMP) to aggregate and correlate large volumes of time-sensitive telemetry data, thereby providing a multidimensional and predictive view of the network. Additionally, Paragon Insights translates troubleshooting, maintenance, and real-time analytics into an intuitive user experience to give network operators actionable insights into the health of individual devices and of the overall network.

Main Components of Paragon Insights

Paragon Insights consists of two main components:

• Health Monitoring, to view an abstracted, hierarchical representation of device and network-level health, and define the health parameters of key network elements through customizable key performance indicators (KPIs), rules, and playbooks. A playbook is a collection of rules. You can create a playbook and apply the playbook to a device group or a network group. For more information on rules and playbooks, see "Paragon Insights Rules and Playbooks" on page 141.

• Root Cause Analysis, which helps you find the root cause of a device or network-level issue when Paragon Insights detects a problem with a network element.

Paragon Insights Health Monitoring

The Challenge


With increasing data trac generated by cloud-nave applicaons and emerging technologies, service

providers and enterprises need a network analycs soluon to analyze volumes of telemetry data, oer

insights into overall network health, and produce aconable intelligence. While although telemetry-

based techniques have existed for years, the growing number of protocols, data formats, and KPIs from

diverse networking devices has made data analysis complex and costly. Tradional CLI-based interfaces

require specialized skills to extract business value from telemetry data, creang a barrier to entry for

network analycs

How Paragon Insights Health Monitoring Helps

By aggregang and correlang raw telemetry data from mulple sources, the Paragon Insights Health

Monitoring component provides a muldimensional view of network health that reports current status,

as well as projected threats to the network infrastructure and its workloads.

Health status determinaon is ghtly integrated with the Paragon Insights RCA component, which can

make use of system log data received from the network and its devices. Paragon Insights Health

Monitoring provides status indicators that alert you when a network resource is currently operang

outside a user-dened performance policy. Paragon Insights Health Monitoring does a risk analysis using

historical trends and predicts whether a resource may be unhealthy in the future. Paragon Insights

Health Monitoring not only oers a fully customizable view of the current health of network elements,

but also automacally iniates remedial acons based on predened service level agreements (SLAs).

Dening the health of a network element, such as broadband network gateway (BNG), provider edge

(PE), core, and leaf-spine, is highly contextual. Each element plays a dierent role in a network, with

unique KPIs to monitor. Given that there is no single denion for network health across all use cases,

Paragon Insights provides a highly customizable framework to allow you to dene your own health

proles.

Paragon Insights Root Cause Analysis

The Challenge

For some network issues, it can be challenging for network operators to figure out what caused a networking device to stop working properly. In such cases, an operator must call on a specialist (with knowledge built from years of experience) to troubleshoot the problem and find the root cause.

How Paragon Insights RCA Helps

The Paragon Insights RCA component simplies the process of nding the root cause of a network issue.

Paragon Insights’s RCA captures the troubleshoong knowledge of specialists and has a knowledge base

in the form of Paragon Insights rules. These rules are evaluated either on demand by a specic trigger or

periodically in the background to ascertain the health of a networking component, such as roung

protocol, system, interface, or chassis, on the device.

To illustrate the benets of Paragon Insights RCA, let us consider the problem of OSPF apping. Figure 1

on page 4 highlights the workow sequence involved in debugging OSPF apping. A network

3

operator troubleshoong this issue would need to perform manual debugging steps for each le (step)

of the workow sequence in order to nd the root cause of the OSPF apping. On the other hand,

theParagon Insights RCA applicaon troubleshoots the issue automacally by using an RCA bot. The

RCA bot tracks all of the telemetry data collected by the Paragon Insights and translates the informaon

into graphical status indicators (displayed in the Paragon Insights web GUI) that correlate to dierent

parts of the workow sequence shown in Figure 1 on page 4.

Figure 1: High-level workow to debug OSPF-apping

When you congure Paragon Insights, each le of the workow sequence (shown in Figure 1 on page 4)

can be dened by one or more rules. For example, the RPD-OSPF le could be dened as two rule

condions: one to check if "hello-transmied" counters are incremenng and the other to check if

"hello-received" counters are incremenng. Based on these user-dened rules, Paragon Insights provides

status indicators, alarm nocaons, and an alarm management tool to inform and alert you of specic

network condions that could lead to OSPF apping.

By isolang a problem area in the workow, Paragon Insights RCA proacvely guides you in determining

the appropriate correcve acon to take to x a pending issue or avoid a potenal one.

Closed-Loop Automaon

Paragon Insights oers closed-loop automaon. The automaon workow can be divided into seven

main steps (see Figure 2 on page 5):

1. Dene—The user denes the health parameters of key network elements through customizable key

performance indicators (KPIs), rules, and playbooks, by using the tools provided by Paragon Insights.


2. Collect—Paragon Insights collects rule-based telemetry data from multiple devices using the collection methods specified for the different network devices.

3. Store—Paragon Insights stores time-sensitive telemetry data in a time-series database (TSDB). This allows users to query, perform operations on, and write new data back to the database days or even weeks after the initial storage.

4. Analyze—Paragon Insights analyzes telemetry data based on the specied KPIs, rules, and playbooks.

5. Visualize—Paragon Insights provides multiple ways for you to visualize the aggregated telemetry data through its web-based GUI to gain actionable and predictive insights into the health of your devices and the overall network.

6. Nofy—Paragon Insights noes you through the GUI and nocaon alarms when problems in

individual devices or in the network are detected.

7. Act—Paragon Insights performs user-defined actions to help resolve and proactively prevent network problems.

Figure 2: Paragon Insights Closed-Loop Automation Workflow
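To make the loop concrete, the following minimal Python sketch walks a single CPU reading through the Define, Collect, Store, Analyze, Notify, and Act steps. Everything in it is hypothetical and is not a Paragon Insights API:

- the 80 percent threshold
- the list standing in for the TSDB
- the `notify` and `act` callbacks

The Visualize step is omitted because it belongs to the GUI.

```python
from typing import Callable

CPU_THRESHOLD = 80.0  # Define: a hypothetical user-defined KPI threshold (percent)


def run_cycle(
    timestamp: int,
    cpu_percent: float,
    tsdb: list,
    notify: Callable[[str], None],
    act: Callable[[], None],
) -> bool:
    """Run one pass of a simplified Collect-Store-Analyze-Notify-Act loop.

    Returns True when the reading breaches the threshold.
    """
    # Collect: cpu_percent is one reading already collected from a device.
    # Store: append (timestamp, value) to a list standing in for the TSDB.
    tsdb.append((timestamp, cpu_percent))
    # Analyze: compare the reading against the defined KPI.
    unhealthy = cpu_percent > CPU_THRESHOLD
    if unhealthy:
        # Notify: report the problem (the product uses the GUI and alarms).
        notify(f"t={timestamp}: CPU {cpu_percent}% exceeds {CPU_THRESHOLD}%")
        # Act: run a user-defined remediation action.
        act()
    return unhealthy


# Usage: two readings, only the second one breaches the threshold.
tsdb: list = []
messages: list = []
actions: list = []
for t, reading in enumerate([40.0, 92.5]):
    run_cycle(t, reading, tsdb, messages.append, lambda: actions.append("remediate"))
print(messages)  # ['t=1: CPU 92.5% exceeds 80.0%']
```

Passing `notify` and `act` in as callbacks keeps the analysis independent of how alerts are delivered or what remediation runs. This loosely mirrors how the steps in the workflow are separable.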


Benets of Paragon Insights

•Customizaon—Provides a framework to dene and customize health proles, allowing truly

aconable insights for the specic device or network being monitored.

•Automaon—Automates root cause analysis and log le analysis, streamlines diagnosc workows,

and provides self-healing and remediaon capabilies.

• Greater network visibility—Provides advanced muldimensional analycs across network elements,

giving you a clearer understanding of network behavior to establish operaonal benchmarks, improve

resource planning, and minimize service downme.

•Intuive GUI—Oers an intuive web-based GUI for policy management and easy data consumpon.

• Open integraon—Lowers the barrier of entry for telemetry and analycs by providing open source

data pipelines, nocaon capabilies, and third-party device support.

•Mulple data collecon methods—Includes support for JTI, OpenCong, NETCONF, CLI, Syslog,

NetFlow, and SNMP.

RELATED DOCUMENTATION

Paragon Insights Geng Started Guide

Paragon Insights Concepts

IN THIS SECTION

Paragon Insights Data Collecon Methods | 7

Paragon Insights Topics | 9

Paragon Insights Rules - Basics | 10

Paragon Insights Rules - Deep Dive | 12

Paragon Insights Playbooks | 31


Paragon Insights (formerly HealthBot) is a highly programmable telemetry-based analytics application. With it, you can diagnose network issues and find their root causes, detect network anomalies, predict potential network issues, and create real-time remedies for any issues that come up.

To accomplish this, network devices and Paragon Insights have to be configured to send and receive large amounts of data, respectively. Device configuration is covered throughout this and other sections of the guide.

Conguring Paragon Insights, or any applicaon, to read and react to incoming telemetry data requires a

language that describes several elements that are specic to the systems and data under analysis. This

type of language is called a Domain Specic Language (DSL), i.e., a language that is specic to one

domain. Any DSL is built to help answer quesons. For Paragon Insights, these quesons are:

• Q: What components make up the systems that are sending data?

A: Network devices are made up of memory, CPU, interfaces, protocols, and so on. In Paragon Insights, these are called topics (see "Paragon Insights Topics" on page 9).

• Q: How do we gather, lter, process, and analyze all of this incoming telemetry data?

A: Paragon Insights uses "Paragon Insights Rules - Basics" on page 10 that consist of informaon

blocks called sensors, elds, variables, triggers, and more.

• Q: How do we determine what to look for?

A: It depends on the problem you want to solve or the question you want to answer. Paragon Insights uses playbooks (see "Paragon Insights Playbooks" on page 31) to create collections of specific rules and apply them to specific groups of devices in order to accomplish specific goals. For example, part of the system-kpis-playbook can alert a user when system memory usage crosses a user-defined threshold.

This secon covers these key concepts and more, which you need to understand before using Paragon

Insights.

Paragon Insights Data Collecon Methods

IN THIS SECTION

Data Collecon - ’Push’ Model | 8

Data Collecon - ’Pull’ Model | 8


In order to provide visibility into the state of your network devices, Paragon Insights first needs to collect their telemetry data and other status information. It does this using sensors.

Paragon Insights supports sensors that “push” data from the device to Paragon Insights and sensors that

require Paragon Insights to “pull” data from the device using periodic polling.

Data Collecon - ’Push’ Model

As the number of objects in the network, and the metrics they generate, have grown, gathering operational statistics for monitoring the health of a network has become an ever-increasing challenge. Traditional ’pull’ data-gathering models, like SNMP and the CLI, require additional processing to periodically poll the network element, and can directly limit scaling.

The ’push’ model overcomes these limits by delivering data asynchronously, which eliminates polling.

With this model, the Paragon Insights server can make a single request to a network device to stream

periodic updates. As a result, the ’push’ model is highly scalable and can support the monitoring of

thousands of objects in a network. Junos devices support this model in the form of the Junos Telemetry

Interface (JTI).

Paragon Insights currently supports ve ‘push’ ingest types.

•Nave GPB

• NetFlow

• sFlow

•OpenCong

• Syslog

These push-model data collecon—or

ingest

—methods are explained in detail in the Paragon Insights

Data Ingest Guide.

Data Collecon - ’Pull’ Model

While the ’push’ model is the preferred approach for its efficiency and scalability, there are still cases where the ’pull’ data collection model is appropriate. With the ’pull’ model, Paragon Insights requests data from network devices at periodic intervals.

Paragon Insights currently supports two ‘pull’ ingest types.

• iAgent (CLI/NETCONF)

• SNMP


These pull-model data collecon—or

ingest

—methods are explained in detail in the Paragon Insights

Data Ingest Guide.
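The scaling argument for the push model comes down to how many requests the collector must send. The following back-of-the-envelope sketch in Python counts only the requests the collector sends. It assumes every device uses the same interval and that there are no retries or resubscriptions. It does not count the data messages returned, which are comparable in both models. The function names are illustrative only.

```python
def pull_requests(window_s: int, interval_s: int, devices: int) -> int:
    """Requests the collector sends under the pull model.

    The collector must ask every device for data once per polling interval.
    """
    return devices * (window_s // interval_s)


def push_requests(devices: int) -> int:
    """Requests the collector sends under the push model.

    One subscription request per device; the device then streams periodic
    updates on its own, so the count does not grow with the window length.
    """
    return devices


# One hour of data from 1,000 devices at a 60-second interval.
print(pull_requests(3600, 60, 1000))  # 60000
print(push_requests(1000))            # 1000
```

The pull count grows with both the observation window and the polling rate. The push count stays fixed once the subscriptions are in place, which is why the push model scales to thousands of monitored objects.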

Paragon Insights Topics

Network devices are made up of a number of components and systems, from CPUs and memory to interfaces, protocol stacks, and more. In Paragon Insights, a topic is the construct used to address those different device components. The Topic block is used to create namespaces that define what needs to be modeled. Each Topic block is made up of one or more Rule blocks, which, in turn, consist of Field blocks, Function blocks, Trigger blocks, and so on. See "Paragon Insights Rules - Deep Dive" on page 12 for details. Each rule created in Paragon Insights must be part of a topic. Juniper has curated a number of these system components into a list of topics such as:

• chassis

• class-of-service

• external

• firewall

• interfaces

• kernel

• linecard

• logical-systems

• protocol

•roung-opons

• security

• service

• system

You can create sub-topics underneath any of the Juniper topic names by appending .<sub-topic> to the topic name, for example, kernel.tcpip or system.cpu.

Any pre-dened rules provided by Juniper t within one of the Juniper topics with the excepon of

external

, The

external

topic is reserved for user-created rules. In the Paragon Insights web GUI, when

you create a new rule, the Topics eld is automacally populated with the

external

topic name.
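The naming convention above can be checked mechanically. The following minimal Python sketch splits a name such as kernel.tcpip into a top-level topic and a sub-topic and confirms that the top-level part is one of the Juniper topics listed above. The validation rules are assumptions made for illustration, not the checks Paragon Insights itself performs. In particular, the sketch treats everything after the first "." as the sub-topic. `DEFAULT_USER_TOPIC` simply records that the GUI pre-fills new rules with external.

```python
from typing import Optional, Tuple

# Juniper-curated top-level topics, as listed above.
JUNIPER_TOPICS = frozenset({
    "chassis", "class-of-service", "external", "firewall", "interfaces",
    "kernel", "linecard", "logical-systems", "protocol", "routing-options",
    "security", "service", "system",
})

# Per the text, the GUI pre-fills new user-created rules with this topic.
DEFAULT_USER_TOPIC = "external"


def split_topic(name: str) -> Tuple[str, Optional[str]]:
    """Split a topic name such as "kernel.tcpip" into ("kernel", "tcpip").

    Hypothetical validation: the part before the first "." must be one of
    the Juniper topics; everything after it is treated as the sub-topic.
    """
    base, dot, sub = name.partition(".")
    if base not in JUNIPER_TOPICS:
        raise ValueError(f"unknown top-level topic: {base!r}")
    if dot and not sub:
        raise ValueError(f"empty sub-topic in {name!r}")
    return base, (sub if dot else None)


print(split_topic("kernel.tcpip"))  # ('kernel', 'tcpip')
print(split_topic("system"))        # ('system', None)
```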


Paragon Insights Rules - Basics

Paragon Insights’ primary funcon is collecng and reacng to telemetry data from network devices.

Dening how to collect the data, and how to react to it, is the role of a

rule

.

Paragon Insights ships with a set of default rules, which can be seen on the Configuration > Rules page of the Paragon Insights GUI, as well as in GitHub in the healthbot-rules repository. You can also create your own rules.

The structure of a Paragon Insights rule looks like this:

To keep rules organized, Paragon Insights organizes them into topics. Topics can be very general, like system, or they can be more granular, like protocol.bgp. Each topic contains one or more rules.

As described above, a rule contains all the details and instructions to define how to collect and handle the data. Each rule contains the following required elements:

• The sensor defines the parameters for collecting the data. This typically includes which data collection method to use (as discussed above in "Paragon Insights Data Collection Methods" on page 7), some guidance on which data to ingest, and how often to push or pull the data. In any given rule, a sensor can be defined directly within the rule or it can be referenced from another rule.

• Example: Using the SNMP sensor, poll the network device every 60 seconds to collect all the device data in the Juniper SNMP MIB table jnxOperatingTable.

• The sensor typically ingests a large set of data, so fields provide a way to filter or manipulate that data, allowing you to identify and isolate the specific pieces of information you care about. Fields can also act as placeholder values, like a static threshold value, to help the system perform data analysis.


• Example: Extract, isolate, and store the jnxOperating15MinLoadAvg (CPU 15-minute average utilization) value from the SNMP table specified above in the sensor.

• Triggers periodically bring together the fields with other elements to compare data and determine current device status. A trigger includes one or more ’when-then’ statements, which include the parameters that define how device status is visualized on the health pages.

• Example: Every 90 seconds, check the CPU 15-minute average utilization value, and if it goes above a defined threshold, set the device’s status to red on the device health page and display a message showing the current value.
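The three required elements and their examples fit together as sketched below. This is a hypothetical Python model, not the Paragon Insights rule DSL or its file format. The classes are illustrative stand-ins:

- `Sensor` describes what to collect and how often.
- `Field` isolates jnxOperating15MinLoadAvg from each row.
- `Trigger` applies a single 'when-then' check.

The 80.0 threshold and the sample rows are made up. The jnxOperatingDescr key used to label each row is also an assumption about the polled data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sensor:
    method: str       # data collection method, e.g. "snmp"
    table: str        # which data to ingest, e.g. an SNMP MIB table
    frequency_s: int  # how often to poll or receive the data


@dataclass
class Field:
    name: str
    column: str  # the value to isolate from each row the sensor returns


@dataclass
class Trigger:
    frequency_s: int  # how often the trigger is evaluated
    threshold: float  # a static threshold, e.g. supplied by a variable

    def evaluate(self, value: float) -> Tuple[str, str]:
        # One 'when-then' pair: when the value exceeds the threshold,
        # then report red; otherwise report green.
        if value > self.threshold:
            return "red", f"CPU 15-min average {value} exceeds {self.threshold}"
        return "green", f"CPU 15-min average {value} is within limits"


@dataclass
class Rule:
    sensor: Sensor
    field: Field
    trigger: Trigger

    def run(self, rows: List[Dict]) -> List[Tuple[str, str, str]]:
        """Apply the field and trigger to rows returned by one sensor poll."""
        results = []
        for row in rows:
            value = float(row[self.field.column])  # field isolates one value
            status, message = self.trigger.evaluate(value)
            results.append((row.get("jnxOperatingDescr", "?"), status, message))
        return results


rule = Rule(
    sensor=Sensor(method="snmp", table="jnxOperatingTable", frequency_s=60),
    field=Field(name="cpu-15min-avg", column="jnxOperating15MinLoadAvg"),
    trigger=Trigger(frequency_s=90, threshold=80.0),
)

# Made-up rows standing in for one SNMP poll of jnxOperatingTable.
rows = [
    {"jnxOperatingDescr": "Routing Engine 0", "jnxOperating15MinLoadAvg": 85},
    {"jnxOperatingDescr": "Routing Engine 1", "jnxOperating15MinLoadAvg": 12},
]
for result in rule.run(rows):
    print(result)
```

In this sketch the sensor is inert data. In the product, the sensor's method and frequency drive the actual collection, and the trigger's frequency controls how often the evaluation runs.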

The rule can also contain the following oponal elements:

• Vectors allow you to leverage existing elements to avoid the need to repeatedly configure the same elements across multiple rules.

• Examples: A rule with a congured sensor, plus a vector to a second sensor from another rule; a

rule with no sensors, and vectors to elds from other rules

• Variables can be used to provide additional supporting parameters needed by the required elements above.

• Examples: The string “ge-0/0/0”, used within a field collecting status for all interfaces, to filter the data down to just the one interface; an integer, such as “80”, referenced in a field to use as a static threshold value.

• Functions allow you to provide instructions (in the form of a Python script) on how to further interact with data, and how to react to certain events.

• Examples: A rule that monitors input and output packet counts, using a function to compare the count values; a rule that monitors system storage, invoking a function to clean up temp and log files if storage utilization goes above a defined threshold.

NOTE: Rules, on their own, don’t actually do anything. To make use of rules you need to add

them to "Paragon Insights Playbooks" on page 31.


Paragon Insights Rules - Deep Dive

IN THIS SECTION

Rules | 13

Sensors | 16

Fields | 16

Vectors | 18

Variables | 19

Funcons | 19

Triggers | 20

Tagging | 24

Rule Properes | 24

Pre/Post Acon | 24

Mulple Sensors per Device | 25

Sensor Precedence | 28

A rule is a package of components, or blocks, needed to extract specific information from the network or from a Junos device. Rules conform to a specifically tailored domain-specific language (DSL) for analytics applications. The DSL is designed to allow rules to capture:

• The minimum set of input data that the rule needs to be able to operate

• The minimum set of telemetry sensors that need to be configured on the device(s)

• The fields of interest from the configured sensors

• The reporting or polling frequency

• The set of triggers that operate on the collected data

• The conditions or evaluations needed for triggers to kick in

• The actions or notifications that need to be performed when a trigger kicks in

The details around rules, topics, and playbooks are presented in the following sections.
