Centreon: Monitoring an HP Server

Hi everyone 🙂

Today we will see how to monitor all the components reported by the iLO card of an HP server. This lets us check the state of the hardware that makes up our server, namely: the network cards, the hard disks, the fans, the power supplies, the server temperature and the processor(s).


1 – Prerequisites

For this article I am using a Centreon server running version 20.04.2, and I will monitor three HP ProLiant DL360 Gen9 servers.

The Centreon server and the iLO cards of the HP servers are, of course, configured with static IP addresses.

On your Centreon server, the Centreon plugin packs must have been updated from GitHub.


2 – Configuring the iLO card

We start by logging in to the iLO cards of our servers.

Then go to Administration > Management.

On this page, enter the read community name and the IP address of the Centreon server.

Once the changes are made, click Apply to save them.

Now that the iLO is ready to answer our SNMP requests, we can log in to the Centreon server to configure our new services.


3 – Adding the host in Centreon

Go to the Hosts > Hosts menu to add our servers.

Click Add.

Fill in the new host's details: the IP address of your server's iLO card, the community name and the check interval. Finally, select the base_host_alive command (a ping command) as the host check.

Before creating our commands in Centreon, we will connect to the Centreon server over SSH to test that they work.


4 – Testing the commands

I go to the plugin installation directory over SSH, using PuTTY.

cd /usr/lib/centreon/plugins

Enter the following command

./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --snmp-community=networks-it --verbose

This command should return a large number of results

UNKNOWN: da controller accelerator '0' is other | 'psu_power_0.1'=49W;;500;; 'psu_voltage_0.1'=230V;;;; 'psu_power_0.2'=29W;;500;; 'psu_voltage_0.2'=230V;;;; 'temp_0.1_ambient'=20C;;0:42;; 'temp_0.2_cpu'=40C;;0:70;; 'temp_0.3_cpu'=40C;;0:70;; 'temp_0.4_memory'=24C;;0:89;; 'temp_0.5_memory'=24C;;0:89;; 'temp_0.6_memory'=28C;;0:89;; 'temp_0.7_memory'=25C;;0:89;; 'temp_0.8_system'=35C;;0:60;; 'temp_0.10_system'=30C;;0:105;; 'temp_0.11_powerSupply'=31C;;0:0;; 'temp_0.12_powerSupply'=31C;;0:0;; 'temp_0.13_system'=30C;;0:115;; 'temp_0.14_system'=34C;;0:115;; 'temp_0.15_system'=26C;;0:115;; 'temp_0.16_system'=26C;;0:115;; 'temp_0.17_system'=30C;;0:115;; 'temp_0.18_system'=29C;;0:115;; 'temp_0.19_powerSupply'=40C;;0:0;; 'temp_0.20_powerSupply'=40C;;0:0;; 'temp_0.24_ioBoard'=51C;;0:100;; 'temp_0.27_ambient'=22C;;0:65;; 'temp_0.28_system'=28C;;0:75;; 'temp_0.29_system'=27C;;0:75;; 'temp_0.30_system'=32C;;0:90;; 'temp_0.31_ioBoard'=25C;;0:70;; 'temp_0.32_ioBoard'=26C;;0:70;; 'temp_0.34_ioBoard'=28C;;0:70;; 'temp_0.35_system'=25C;;0:75;; 'temp_0.37_powerSupply'=27C;;0:100;; 'count_cpu'=2;;;; 'count_daacc'=1;;;; 'count_dactl'=1;;;; 'count_daldrive'=1;;;; 'count_dapdrive'=4;;;; 'count_fan'=7;;;; 'count_ilo'=1;;;; 'count_lnic'=2;;;; 'count_pnic'=4;;;; 'count_psu'=2;;;; 'count_temperature'=29;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking cpu
cpu '0' [slot: 0, unit: 0, name: Intel Xeon, socket: 1] status is ok.
cpu '1' [slot: 0, unit: 0, name: Intel Xeon, socket: 2] status is ok.
Checking ide controllers
Checking ide logical drives
Checking ide physical drives
Checking power converters
Checking power supplies
powersupply '0.1' status is ok [redundance: 3, redundant partner: 0] (status noError).
powersupply '0.2' status is ok [redundance: 3, redundant partner: 0] (status noError).
Checking sas controllers
Checking sas logical drives
Checking sas physical drives
Checking scsi controllers
Checking scsi logical drives
Checking scsi physical drives
Checking fca host controller
Checking fca external controller
Checking fca external accelerator boards
Checking fca logical drives
Checking fca physical drives
Checking da controller
da controller '0' [slot: 0, model: unknown] status is ok.
Checking da accelerator boards
da controller accelerator '0' [status: invalid, battery status: not present] condition is other.
Checking da logical drives
da logical drive '0.1' [fault tolerance: distribDataGuard, condition: ok] status is ok.
Checking da physical drives
da physical drive '0.0' [status: ok] condition is ok.
da physical drive '0.1' [status: ok] condition is ok.
da physical drive '0.2' [status: ok] condition is ok.
da physical drive '0.3' [status: ok] condition is ok.
Checking fans
fan '0.1' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 2].
fan '0.2' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 3].
fan '0.3' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 4].
fan '0.4' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 5].
fan '0.5' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 6].
fan '0.6' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 7].
fan '0.7' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 1].
Checking physical nics
physical nic '1' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '2' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '3' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '4' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
Checking logical nics
logical nic '1' [adapter count: 0, description: Software Loopback Interface 1, status: unknown] condition is other.
logical nic '2' [adapter count: 4, description: Switch Independent Team, status: ok] condition is ok.
Checking temperatures
'0.1' ambient temperature is 20C (42 max) (status is ok).
'0.2' cpu temperature is 40C (70 max) (status is ok).
'0.3' cpu temperature is 40C (70 max) (status is ok).
'0.4' memory temperature is 24C (89 max) (status is ok).
'0.5' memory temperature is 24C (89 max) (status is ok).
'0.6' memory temperature is 28C (89 max) (status is ok).
'0.7' memory temperature is 25C (89 max) (status is ok).
'0.8' system temperature is 35C (60 max) (status is ok).
'0.10' system temperature is 30C (105 max) (status is ok).
'0.11' powerSupply temperature is 31C (0 max) (status is ok).
'0.12' powerSupply temperature is 31C (0 max) (status is ok).
'0.13' system temperature is 30C (115 max) (status is ok).
'0.14' system temperature is 34C (115 max) (status is ok).
'0.15' system temperature is 26C (115 max) (status is ok).
'0.16' system temperature is 26C (115 max) (status is ok).
'0.17' system temperature is 30C (115 max) (status is ok).
'0.18' system temperature is 29C (115 max) (status is ok).
'0.19' powerSupply temperature is 40C (0 max) (status is ok).
'0.20' powerSupply temperature is 40C (0 max) (status is ok).
'0.24' ioBoard temperature is 51C (100 max) (status is ok).
'0.27' ambient temperature is 22C (65 max) (status is ok).
'0.28' system temperature is 28C (75 max) (status is ok).
'0.29' system temperature is 27C (75 max) (status is ok).
'0.30' system temperature is 32C (90 max) (status is ok).
'0.31' ioBoard temperature is 25C (70 max) (status is ok).
'0.32' ioBoard temperature is 26C (70 max) (status is ok).
'0.34' ioBoard temperature is 28C (70 max) (status is ok).
'0.35' system temperature is 25C (75 max) (status is ok).
'0.37' powerSupply temperature is 27C (100 max) (status is ok).
Checking ilo
ilo status is ok [message = ].

To filter the results, we use the --component option (the COMPONENT macro once we are in Centreon) to monitor specific elements.
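As a rough sketch of what filtering by component looks like in practice, the script below only prints (dry run) one command line per component. The directory, IP address and community are the example values from this article, and the component names are the ones used in the sections that follow; `build_check` is a made-up helper name.

```shell
#!/bin/sh
# Example values from this article; adjust them to your own setup.
PLUGIN_DIR=/usr/lib/centreon/plugins
HOST=192.168.1.30
COMMUNITY=networks-it

# Hypothetical helper: print the command line for one component.
# Nothing is executed here, the line is only displayed.
build_check() {
    printf '%s/centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=%s --component=%s --snmp-community=%s --verbose\n' \
        "$PLUGIN_DIR" "$HOST" "$1" "$COMMUNITY"
}

# One line per component tested in this article.
for component in nic cpu storage fan temp psu; do
    build_check "$component"
done
```

Once the printed lines look right, they can be pasted into the SSH session one by one.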


4 – a ) Network cards

To monitor the network cards, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=nic --snmp-community=networks-it --verbose

Result

UNKNOWN: physical nic '2' is other - physical nic '3' is other - physical nic '4' is other | 'count_lnic'=5;;;; 'count_pnic'=4;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking physical nics
physical nic '1' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '2' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '3' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
physical nic '4' [duplex: full, role: unknown, state: ok, status: ok] condition is ok.
Checking logical nics
logical nic '1' [adapter count: 0, description: Software Loopback Interface 1, status: unknown] condition is other.
logical nic '2' [adapter count: 4, description: Switch Independent Team, status: ok] condition is ok.
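The first word of the first output line (UNKNOWN here) is the state that will be shown for the service. Monitoring plugins conventionally pair it with an exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The small sketch below maps the word to the code; `state_to_code` is a name I made up for the illustration, it is not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: take a plugin's first output line and print the
# conventional exit code for the state word found before the first ":".
state_to_code() {
    case "${1%%:*}" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN or anything unexpected
    esac
}

# First lines copied from the results shown in this article.
state_to_code "UNKNOWN: physical nic '2' is other - physical nic '3' is other - physical nic '4' is other | 'count_lnic'=5;;;; 'count_pnic'=4;;;;"
state_to_code "OK: All 2 components are ok [2/2 cpus]. | 'count_cpu'=2;;;;"
```

When running the plugin by hand, typing `echo $?` right after the command shows the real exit code.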

4 – b ) CPU

To monitor the CPUs, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=cpu --snmp-community=networks-it --verbose

Result

OK: All 2 components are ok [2/2 cpus]. | 'count_cpu'=2;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking cpu
cpu '0' [slot: 0, unit: 0, name: Intel Xeon, socket: 1] status is ok.
cpu '1' [slot: 0, unit: 0, name: Intel Xeon, socket: 2] status is ok.

4 – c ) Disks

To monitor the disks, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=storage --snmp-community=networks-it --verbose

Result

UNKNOWN: da controller accelerator '0' is other | 'count_daacc'=1;;;; 'count_dactl'=1;;;; 'count_daldrive'=1;;;; 'count_dapdrive'=4;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking ide controllers
Checking ide logical drives
Checking ide physical drives
Checking sas controllers
Checking sas logical drives
Checking sas physical drives
Checking scsi controllers
Checking scsi logical drives
Checking scsi physical drives
Checking fca host controller
Checking fca external controller
Checking fca external accelerator boards
Checking fca logical drives
Checking fca physical drives
Checking da controller
da controller '0' [slot: 0, model: unknown] status is ok.
Checking da accelerator boards
da controller accelerator '0' [status: invalid, battery status: not present] condition is other.
Checking da logical drives
da logical drive '0.1' [fault tolerance: distribDataGuard, condition: ok] status is ok.
Checking da physical drives
da physical drive '0.0' [status: ok] condition is ok.
da physical drive '0.1' [status: ok] condition is ok.
da physical drive '0.2' [status: ok] condition is ok.
da physical drive '0.3' [status: ok] condition is ok.

4 – d ) Fans

To monitor the fans, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=fan --snmp-community=networks-it --verbose

Result

OK: All 7 components are ok [7/7 fans]. | 'count_fan'=7;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking fans
fan '0.1' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 2].
fan '0.2' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 3].
fan '0.3' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 4].
fan '0.4' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 5].
fan '0.5' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 6].
fan '0.6' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 7].
fan '0.7' status is ok, speed is normal [location: system, redundance: redundant, redundant partner: 1].

4 – e ) Temperatures

To monitor the temperature of the server components, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=temp --snmp-community=networks-it --verbose

Result

OK: All 29 components are ok [29/29 temperatures]. | 'temp_0.1_ambient'=20C;;0:42;; 'temp_0.2_cpu'=40C;;0:70;; 'temp_0.3_cpu'=40C;;0:70;; 'temp_0.4_memory'=24C;;0:89;; 'temp_0.5_memory'=24C;;0:89;; 'temp_0.6_memory'=28C;;0:89;; 'temp_0.7_memory'=25C;;0:89;; 'temp_0.8_system'=35C;;0:60;; 'temp_0.10_system'=30C;;0:105;; 'temp_0.11_powerSupply'=31C;;0:0;; 'temp_0.12_powerSupply'=31C;;0:0;; 'temp_0.13_system'=29C;;0:115;; 'temp_0.14_system'=32C;;0:115;; 'temp_0.15_system'=26C;;0:115;; 'temp_0.16_system'=26C;;0:115;; 'temp_0.17_system'=30C;;0:115;; 'temp_0.18_system'=29C;;0:115;; 'temp_0.19_powerSupply'=40C;;0:0;; 'temp_0.20_powerSupply'=40C;;0:0;; 'temp_0.24_ioBoard'=51C;;0:100;; 'temp_0.27_ambient'=22C;;0:65;; 'temp_0.28_system'=28C;;0:75;; 'temp_0.29_system'=27C;;0:75;; 'temp_0.30_system'=32C;;0:90;; 'temp_0.31_ioBoard'=25C;;0:70;; 'temp_0.32_ioBoard'=26C;;0:70;; 'temp_0.34_ioBoard'=28C;;0:70;; 'temp_0.35_system'=25C;;0:75;; 'temp_0.37_powerSupply'=27C;;0:100;; 'count_temperature'=29;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking temperatures
'0.1' ambient temperature is 20C (42 max) (status is ok).
'0.2' cpu temperature is 40C (70 max) (status is ok).
'0.3' cpu temperature is 40C (70 max) (status is ok).
'0.4' memory temperature is 24C (89 max) (status is ok).
'0.5' memory temperature is 24C (89 max) (status is ok).
'0.6' memory temperature is 28C (89 max) (status is ok).
'0.7' memory temperature is 25C (89 max) (status is ok).
'0.8' system temperature is 35C (60 max) (status is ok).
'0.10' system temperature is 30C (105 max) (status is ok).
'0.11' powerSupply temperature is 31C (0 max) (status is ok).
'0.12' powerSupply temperature is 31C (0 max) (status is ok).
'0.13' system temperature is 29C (115 max) (status is ok).
'0.14' system temperature is 32C (115 max) (status is ok).
'0.15' system temperature is 26C (115 max) (status is ok).
'0.16' system temperature is 26C (115 max) (status is ok).
'0.17' system temperature is 30C (115 max) (status is ok).
'0.18' system temperature is 29C (115 max) (status is ok).
'0.19' powerSupply temperature is 40C (0 max) (status is ok).
'0.20' powerSupply temperature is 40C (0 max) (status is ok).
'0.24' ioBoard temperature is 51C (100 max) (status is ok).
'0.27' ambient temperature is 22C (65 max) (status is ok).
'0.28' system temperature is 28C (75 max) (status is ok).
'0.29' system temperature is 27C (75 max) (status is ok).
'0.30' system temperature is 32C (90 max) (status is ok).
'0.31' ioBoard temperature is 25C (70 max) (status is ok).
'0.32' ioBoard temperature is 26C (70 max) (status is ok).
'0.34' ioBoard temperature is 28C (70 max) (status is ok).
'0.35' system temperature is 25C (75 max) (status is ok).
'0.37' powerSupply temperature is 27C (100 max) (status is ok).
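The part of the first line after the `|` is the performance data, one `'label'=value;warning;critical;min;max` entry per sensor. As a small sketch (not something the plugin provides), the function below reads that first line and lists each temperature next to its critical upper bound. `temp_margins` is a name chosen for the example, and the sample line is a shortened copy of the result above.

```shell
#!/bin/sh
# Hypothetical helper: read plugin output on stdin and print
# "label value critical_max" for every temperature metric.
temp_margins() {
    # Keep only the perfdata of line 1, one metric per line, quotes removed.
    sed -n '1s/^[^|]*|//p' | tr ' ' '\n' | tr -d "'" | awk -F'[=;]' '
        /^temp_/ {
            value = $2; sub(/C$/, "", value)   # "20C"  -> "20"
            crit  = $4; sub(/^.*:/, "", crit)  # "0:42" -> "42"
            print $1, value, crit
        }'
}

# Shortened sample taken from the result above.
printf '%s\n' "OK: All 29 components are ok [29/29 temperatures]. | 'temp_0.1_ambient'=20C;;0:42;; 'temp_0.2_cpu'=40C;;0:70;; 'count_temperature'=29;;;;" | temp_margins
```

Sensors reported with "0 max" in the result come out with a critical bound of 0, which simply means the iLO gave no limit for them.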

4 – f ) Power supplies

To monitor the power supplies, enter the following command

[root@centreon plugins]# ./centreon_plugins.pl --plugin=hardware::server::hp::proliant::snmp::plugin --mode=hardware --hostname=192.168.1.30 --component=psu --snmp-community=networks-it --verbose

Result

OK: All 2 components are ok [2/2 power supplies]. | 'psu_power_0.1'=53W;;500;; 'psu_voltage_0.1'=231V;;;; 'psu_power_0.2'=32W;;500;; 'psu_voltage_0.2'=230V;;;; 'count_psu'=2;;;;
Product Name: ProLiant DL360 Gen9, Serial: DFX1234RWZ, Rom Version: P89 v2.00 (12/27/2015)
Checking power supplies
powersupply '0.1' status is ok [redundance: 3, redundant partner: 0] (status noError).
powersupply '0.2' status is ok [redundance: 3, redundant partner: 0] (status noError).

5 – Configuring the check command in Centreon

We will now create our check command in Centreon. Go to the Commands > Checks menu.

Click Add.

Name your check command and describe the macros.

$USER2$/centreon_plugins.pl --plugin=$_SERVICEPLUGIN$ --mode=$_SERVICEMODE$ --hostname=$HOSTADDRESS$ --component=$_SERVICECOMPONENT$ --snmp-community=$_HOSTSNMPCOMMUNITY$ $_SERVICEOPTION$
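To see what Centreon ends up running, here is a small sketch that replaces each macro of the command line above with an example value. It assumes that $USER2$ points to /usr/lib/centreon/plugins (check your resources in Centreon) and reuses the IP address and community from the tests; `expand_macros` is a made-up helper, Centreon does this substitution itself.

```shell
#!/bin/sh
# Command line as entered in Centreon; single quotes keep the $ signs literal.
TEMPLATE='$USER2$/centreon_plugins.pl --plugin=$_SERVICEPLUGIN$ --mode=$_SERVICEMODE$ --hostname=$HOSTADDRESS$ --component=$_SERVICECOMPONENT$ --snmp-community=$_HOSTSNMPCOMMUNITY$ $_SERVICEOPTION$'

# Hypothetical helper: $1 = COMPONENT macro, $2 = OPTION macro.
# Assumption: $USER2$ is /usr/lib/centreon/plugins on this server.
# Values containing "|" or "&" would need escaping for sed.
expand_macros() {
    printf '%s\n' "$TEMPLATE" | sed \
        -e 's|\$USER2\$|/usr/lib/centreon/plugins|' \
        -e 's|\$_SERVICEPLUGIN\$|hardware::server::hp::proliant::snmp::plugin|' \
        -e 's|\$_SERVICEMODE\$|hardware|' \
        -e 's|\$HOSTADDRESS\$|192.168.1.30|' \
        -e 's|\$_SERVICECOMPONENT\$|'"$1"'|' \
        -e 's|\$_HOSTSNMPCOMMUNITY\$|networks-it|' \
        -e 's|\$_SERVICEOPTION\$|'"$2"'|'
}

expand_macros cpu --verbose
```

The printed line should match the command tested by hand in section 4, which is a quick way to check the macros before saving.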

6 – Adding the services

Now that the check command is saved, we need to create the check services and link them to the hosts. Go to Configuration > Services > Services by host.

Click Add.


6 – a ) Network card service

Let us fill in the form for the new service. We start with the service that monitors the network cards.

In every case, the check command to select is the one created in the previous step.

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : nic
OPTION : --verbose --threshold-overload='daacc,OK,other'

6 – b ) CPU service

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : cpu
OPTION : --verbose

6 – c ) Disk service

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : storage
OPTION : --verbose --threshold-overload='daacc,OK,other'
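The --threshold-overload value follows, as far as I understand the plugin's help, the syntax section,[instance,]status,regexp: here it tells the plugin to report OK when a daacc component (the disk array accelerator that came back as "other" during the tests) has a condition matching "other". The sketch below is a deliberately simplified model of a single rule without the optional instance field. It is not the plugin's code, and `apply_overload` is a name invented for the example.

```shell
#!/bin/sh
# Simplified model of one rule "section,status,regexp".
# $1 = rule, $2 = component section, $3 = reported condition,
# $4 = state the plugin would return without the rule.
apply_overload() {
    rule=$1 section=$2 condition=$3 default=$4
    rule_section=${rule%%,*}      # text before the first comma
    rest=${rule#*,}
    rule_status=${rest%%,*}       # text between the two commas
    rule_regexp=${rest#*,}        # everything after the second comma
    if [ "$section" = "$rule_section" ] &&
       printf '%s\n' "$condition" | grep -Eq "$rule_regexp"; then
        echo "$rule_status"
    else
        echo "$default"
    fi
}

apply_overload 'daacc,OK,other' daacc other UNKNOWN   # rule applies
apply_overload 'daacc,OK,other' pnic  other UNKNOWN   # other section, unchanged
```

The second call shows that the rule only touches the section it names, so the network cards keep their default behaviour.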

6 – d ) Fan service

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : fan
OPTION : --verbose

6 – e ) Temperature of the server components

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : temp
OPTION : --verbose

6 – f ) Power supplies

Custom macros
PLUGIN : hardware::server::hp::proliant::snmp::plugin
MODE : hardware
COMPONENT : psu
OPTION : --verbose

We now need to restart the poller to apply the changes.

Select the poller and click Export configuration.

Check the first 4 options and click Export.

Once the export is complete, the service status details show, after a few minutes, that the new service is operational.

That's it, you can now monitor your servers from Centreon.