
njmon Older Release Notes


njmon version 50+ release:

  • After a lot of user testing, njmon v53 and tools v50 seem pretty robust to me. Just run a test in your environment before a massive upgrade.
  • Special thanks to Pablo Daniel Martinez, an IBMer in Argentina, who wrote the prototype njmond.py and offered the code to me to Open Source. It's in Python 3, using multiple threads and a master/slave model with queues to share the work out. It is a masterpiece of engineering.
  • Make sure you check the version: njmon -? | grep -i version

Downloads:

  • njmon for v52 AIX/VIOS: REMOVED & REMOVED
    • 24th Jan 2020: Third attempt to fix a further uptime output style - it happens when there are zero hours in the uptime!
    • For AIX 6, 7.1, 7.2 TL2 & 7.2 TL3+ plus VIOS 2.2.6 and VIOS 3.1+
  • njmon for Linux: REMOVED & REMOVED
    • This includes the RHEL7 NVIDIA GPU version, if you have the NVIDIA GPU library installed at /usr/lib64/libnvidia-ml.so.1
    • For RHEL 7 & 8, SLES 12 & 15, Ubuntu 18 on ppc64 and AMD64 (x86_64)
  • njmon Central Server Tools: njmon_tools_v50.zip
    • Includes: njmond.py, njmond.conf, njmon2influx.py (for batch loading files) plus JSON filters line2pretty.py, pretty2line.py & njmonold2line.py
  • NEW turbo njmon2influx: Moved to njmon_tools_v55.zip
    • Use this to quickly batch load njmon JSON files. It loads via the multi-threaded & queuing njmond.py. Tested to 3500 records a second.
    • You must be running njmond.py. Set "workers" in njmond.conf to 10 or more to go faster. Use njmon2influxturbo -h for more information.

Bug fixes (not the main focus of version 50 but some large users requested these):

  1. Fixes in njmon for Linux v53: -P for process stats memory SIGSEGV bug, handle network with two+ IP addresses & removed devices in "Not Supported" states, removed control chars in JSON strings, removed commented out code and field hardening in debug mode.
  2. Oracle raw disk sizes (normally impossible as the disks are not in a Volume group but can be done via the bootinfo command!), so you can find the RDBMS database disk size. Might help other raw disk uses.
  3. AIX uptime API is a 32 bit counter, so overflows at 497 days - doh!
    • Now njmon for AIX and Linux uses the uptime command, but that outputs the uptime in at least six formats - doh! Now you get identical data for AIX and Linux:
    • "uptime": {
           "days": 1211,
           "hours": 4,
           "minutes": 27,
           "users": 1
      }
      
    • Is 1211 days a record? Let me know.
    • Also note we have the number of logged on Users - Could be VERY interesting to detect unexpected Users or hack attacks.
  4. AIX running within Nutanix has a Machine-ID bug (control characters in the resulting string - now removed) - this crashes the injector as it is invalid JSON.
    • I think it is returned by the Nutanix emulation layer of the POWER firmware but that is a guess.
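
The uptime fix above boils down to reading several output styles and always producing the same four fields. A minimal sketch in Python, assuming a handful of common uptime styles; the sample lines in the usage note are illustrative, not the exact six formats njmon handles, and njmon itself does this in C:

```python
import re

def parse_uptime(text):
    """Turn one line of uptime output into days/hours/minutes/users.

    Fields that are missing from the output (for example zero hours)
    simply stay at 0.
    """
    result = {"days": 0, "hours": 0, "minutes": 0, "users": 0}
    # Keep only the part after "up" and before the load averages.
    match = re.search(r"\bup\s+(.*?)(?:,\s*load average.*)?$", text.strip())
    if not match:
        raise ValueError("no 'up' field found in uptime output")
    for field in (part.strip() for part in match.group(1).split(",")):
        if (hm := re.fullmatch(r"(\d+):(\d+)", field)):
            result["hours"], result["minutes"] = int(hm[1]), int(hm[2])
        elif (d := re.fullmatch(r"(\d+)\s+days?", field)):
            result["days"] = int(d[1])
        elif (h := re.fullmatch(r"(\d+)\s+(?:hrs?|hours?)", field)):
            result["hours"] = int(h[1])
        elif (m := re.fullmatch(r"(\d+)\s+(?:mins?|minutes?)", field)):
            result["minutes"] = int(m[1])
        elif (u := re.fullmatch(r"(\d+)\s+users?", field)):
            result["users"] = int(u[1])
    return result
```

For example, parse_uptime("10:15AM   up 1211 days,  4:27,  1 user,  load average: 0.10, 0.20, 0.30") gives the same four values as the JSON shown above, and a line such as "up 3 days, 27 mins, 2 users" comes back with hours set to 0.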

Details . . .

  • This version 50 release does NOT require any InfluxDB changes, if running version 40+.
  • This release delivers simpler njmon admin, reduced resources during injecting on the Influx server and drastically fewer sockets in use.
  • You will have to replace BOTH njmon endpoint AND central server at the SAME time:
    • All njmon endpoints must be version 50 (but all the CLI options are the same), so it's an executable file change.
    • njmon will run forever, so you could start it at boot time as a restartable service, or just "nohup" it, or
      use an /etc/rc.local script or an hourly crontab entry using the -k option.
    • The central server Collector and Injector are replaced by a single Python program, njmond.py.
    • njmond.py has a small config file (in JSON) - this is also used by the bulk loading JSON file njmon2influx.py as it has the InfluxDB connection details.

njmon & nimon for AIX version 62 - Updated 16 April 2020

  • nimon is new: I worked out how, in the C language, to send data directly to the InfluxDB service in Line Protocol format
  • nimon is like njmon but sends the performance stats data straight to InfluxDB in Line Protocol format
  • Compatible with your njmon 50+ InfluxDB databases & uses the regular njmon Grafana templates
  • AIX download REMOVED & include njmon & nimon REMOVED
    • Compiling: read the Makefile, it now creates njmon and nimon binaries. Build with: make [61|71|722|723|vios2|vios3]
  • Installing: new root user install script ninstall:
    • Works out which version to install depending on your AIX version / Linux
    • Installs to /usr/lbin
    • Installs first attempt at man pages for njmon and nimon into /usr/share/man/man1.
  • Fixes for VIOS: skip zero length descriptions, client_part_name & volumegroup name.
  • To check the command version: /usr/lbin/nimon -@
  • Run: nimon -s 30 -c 2880 -i <influxdb-host> -p 8086 -x databasename -y username -z password
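
For readers who have not seen InfluxDB Line Protocol, here is a rough Python sketch of how one record is laid out: measurement, comma-separated tags, a space, comma-separated fields and an optional nanosecond timestamp. The measurement name, tag names and values in the usage note are made up for illustration; they are not taken from nimon's real output.

```python
def escape_tag(text):
    """Escape the separators Line Protocol reserves in tag/field keys and tag values."""
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats or quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one Line Protocol record from plain Python dictionaries."""
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key, value in sorted(tags.items()):
        head += f",{escape_tag(key)}={escape_tag(value)}"
    body = ",".join(f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Calling to_line("cpu_total", {"host": "aix1", "os": "AIX"}, {"user": 12.5, "nprocs": 3}, 1600000000000000000) produces cpu_total,host=aix1,os=AIX user=12.5,nprocs=3i 1600000000000000000, which is the shape of text InfluxDB accepts on its write endpoint.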

njmon & nimon for Linux version 62 - Updated 20 April 2020

  • Linux download REMOVED & include njmon & nimon REMOVED
    • Compiling: read the Makefile, it now creates njmon and nimon binaries.
    • The Makefile works off the Linux Distro, Linux Distro version and hardware platform
    • Covers njmon and nimon binaries for: ppc64le and x86_64 for RHEL 7 + 8, SLES 12 + 15, Ubuntu 18 + 20 (for x86_64).
  • Installing: new root user install script ninstall:
    • Works out which version to install depending on your Linux version
    • Installs to /usr/local/bin
    • Man pages - perhaps next time.
  • Feedback to nigelargriffiths at hotmail DOT com

nmeasure for AIX and Linux version 2 - Updated 20 Aug 2020
Any data you can get in a script you can send to the njmon database with the njmon tags, so you can graph it along with the njmon OS level stats.

  • Renamed measure to nmeasure - avoids possible name clash, better parameter checks and optional user/passwd (if InfluxDB config does not demand them).
  • AIX: binary, code & Makefile nmeasure_aix_v3.zip - tested on AIX 7, 7.1 & 7.2
  • Linux: binary, code & Makefile nmeasure_linux_v3.zip - POWER and AMD versions tested on RHEL 7/8, SLES 12/15 & Ubuntu 18/20
  • I suggest running it regularly from crontab. Use nmeasure -h for options & hints. Simple Line Protocol data format. Takes ~1/100th of a second.
  • Example: nmeasure g rdbms -G commits=1234.56,rollbacks=123.0,hitratio=98.4 -i influxhost -p 8086 -x njmon
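
If you would rather send the same kind of data from a script without the nmeasure binary, the sketch below shows roughly what such a request looks like in Python. It assumes an InfluxDB 1.x server and its /write endpoint with db, u and p query parameters; whether nmeasure builds its request exactly this way is not stated here, and the function name is made up for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_write_request(host, port, database, lines, user=None, password=None):
    """Prepare (but do not send) an InfluxDB 1.x write request."""
    params = {"db": database}
    if user is not None:
        # Only needed when the InfluxDB config demands authentication.
        params["u"] = user
        params["p"] = password or ""
    url = f"http://{host}:{port}/write?{urlencode(params)}"
    return Request(url, data=lines.encode("utf-8"), method="POST")

request = build_write_request(
    "influxhost", 8086, "njmon",
    "rdbms commits=1234.56,rollbacks=123.0,hitratio=98.4",
)

# To actually send it (needs a reachable InfluxDB 1.x server):
# with urlopen(request, timeout=5) as response:
#     print(response.status)   # InfluxDB 1.x answers 204 on success
```

The measurement and field names are the ones from the nmeasure example above; a real record would normally carry the njmon tags as well so that it lines up with the OS level stats.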

njmon & nimon for AIX version 65 - Updated 7 July 2020
njmon & nimon for Linux version 64 - Updated 12 July 2020

  1. Time-Series database Prometheus is supported via Telegraf (not available on POWER yet),
    see Blog Using nimon with Prometheus
  2. Internal data collector = add your stats via small C program, see Blog njmon/nimon Internal Data Collector
  3. rPerf stats (AIX only) for performance rating your LPAR/VM. Good for server consolidation & LPAR LPM migrating between POWER generations.
  4. Option -R to reduce the number of stats (drops logical CPU stats & netbuffers) for a smaller database. If you have more than 32 logical CPUs, it is near impossible to graph them all.
  5. Option -a file to supply the command line options from a file to hide passwords etc.
    • Example: If /home/nigel/nimon.conf contains: -s 60 -k -i influx -p 8086 -x njmon
      Then start on AIX using nimon with: /usr/lbin/nimon -a /home/nigel/nimon.conf
      Or start/restart via crontab: 0 * * * * /usr/lbin/nimon -a /home/nigel/nimon.conf 1>/dev/null
      Also note the user and password are not required if the InfluxDB config file is not set to demand them.
    • Don't include the program name in the file.
  6. Quick install script: ninstall
    • Installs the correct versions of njmon & nimon, sets permissions plus installs man pages: man njmon -or- man nimon
    • Run ./ninstall as the root user or use sudo ./ninstall
    • Installs binary to AIX /usr/lbin and Linux /usr/local/bin
  7. Option -b for no PID in process names - user requested.
  8. On startup the child PID is printed. Allows stopping the njmon/nimon later but crontabs will need a >/dev/null to ignore it.
  • Linux: code + Makefile REMOVED
    and programs + ninstall + man pages REMOVED
    • Fixed filesystem stats default back to on. Added NJMON_PID_FILE/NIMON_PID_FILE shell variables for the -k option for running multiple njmon or nimon.
  • AIX: code + Makefile REMOVED
    and programs + ninstall + man pages REMOVED
    • Final bug fix for the -a option (fingers crossed)

njmon & nimon for AIX version 68 - Updated 1 Oct 2020
njmon & nimon for Linux version 67 - Updated 1 Oct 2020

  1. Fixes to remove statistics that are infinite (normally due to maths with values set to zero)
  2. Fixes for crazy system admin guys having short hostnames and FQDN only set in the network aliases - NOT RECOMMENDED!
  3. Fixes for GPFS installed but not running. If that is not automatically detected, add export NOGPFS=1 to your environment before starting njmon/nimon.
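
Why infinite statistics matter: strict JSON has no way to write infinity or NaN, so one bad value can make a whole record invalid. The Python sketch below shows the general idea of dropping such values before output; the function and the stat names are hypothetical, and njmon does its own checks in C.

```python
import json
import math

def drop_non_finite(stats):
    """Return a copy of stats without infinite or NaN floating point values."""
    return {
        name: value
        for name, value in stats.items()
        if not isinstance(value, float) or math.isfinite(value)
    }

# In C, dividing a double by zero quietly yields inf or nan.
# Python raises an error instead, so the bad values are built directly here.
raw = {"read_mbps": 12.5, "busy_ratio": float("inf"), "avg_wait": float("nan"), "ops": 42}
clean = drop_non_finite(raw)

# allow_nan=False makes json.dumps refuse anything strict JSON cannot hold.
record = json.dumps(clean, allow_nan=False)
```

Here clean keeps only read_mbps and ops, and the json.dumps call succeeds where it would have raised ValueError on the raw dictionary.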

njmon & nimon for AIX version 66 - Updated 1 Sept 2020
njmon & nimon for Linux version 66 - Updated 1 Sept 2020

  1. Fix for GPFS = Spectrum Scale. If GPFS is switched off with njmon/nimon running, they will continue to run.
    • But the GPFS stats do not restart when GPFS is restarted.
  2. Improvements in the manual pages
  3. Reminder quick install script: ninstall
    • Installs the correct versions of njmon & nimon, sets permissions plus installs man pages: man njmon -or- man nimon
    • Run ./ninstall as the root user or use sudo ./ninstall
    • Installs binaries to AIX /usr/lbin and Linux /usr/local/bin
  4. Option -K pidfile = same as the -k option but the user decides the pidfile name.
    • For example: /var/log/nimon.pid. This only works for the root user, of course.
    • This allows multiple njmon/nimon to run at the same time with different pidfiles.
      • Reminder, -k option checks to see if the previous njmon/nimon is still running. If yes, it quietly exits.
      • This allows, for example, hourly cron based restarts that do nothing, if all is well.
  5. Option -H for nimon: this forces the full hostname (FQDN) as the Influx host tag
    • Important, if you have multiple VMs with the same short hostname in different domains.
    • IMHO this is bad practice and NOT recommended on AIX (see smitty tcpip + help).
    • For njmon you need to change the njmond.py code: replace hostname -> fullhostname in the tags.
  6. Fix: When a short hostname is the default, the function gethostbyname() is now used to get the full hostname (FQDN).
  7. Linux: C code + Makefile njmon_linux_code_v66.zip
    and programs + ninstall + man pages njmon_linux_binaries_v66.zip
  8. AIX: C code + Makefile njmon_aix_code_v66.zip
    and programs + ninstall + man pages njmon_aix_binaries_v66.zip
  9. Version 67 has corrections for finding Fully Qualified Domain Names on badly configured AIX systems (hostname and DNS hostnames are short names and FQDN is an alias).

News in 2020:

  • March: InfluxDB available on AIX
  • March: njmon version 50 allows a back-end restart at any time and fewer processes with njmond.py
  • April: Released nimon = njmon but straight to InfluxDB (no intermediate JSON)
  • April: Ten njmon videos on njmon YouTube Playlist covering njmond.py & nimon: Click Here
  • June: njmon version 63+ for AIX - Prometheus support, add your own stats (C function), rPerf stats (AIX only) -a option for a config file and smaller items
  • July: njmon version 63+ for Linux - Prometheus support, add your own stats (C function), -a option for a config file and smaller items
  • July: New "measure" lets you add your own stats to the njmon database and graph them alongside the OS stats.
  • July: Listing the measures & stats in the njmon/nimon data to aid graphing Article & nimon_list_stats Shell script
  • July: Sending data to Prometheus - Click Here
  • August: Two videos for the njmon YouTube Playlist 11: Tags, Measure, Stats organisation & 12: Adding your own stats
  • August: Updated measure to nmeasure, better argument checking & user/passwd is optional (depends on your InfluxDB config).
  • September: njmon/nimon for AIX and AMD64 version 66. FQDN hostnames and users can decide the pidfile with -K option.
  • October: Further improvements for FQDN hostnames (-H) and "export NOGPFS=1" to ignore offline GPFS. Fixed in AIX v67 & Linux v67.
  • October: Updated njmon_tools_v55.zip with improved njmond.py (if InfluxDB stops, data is queued), now includes njmon2influxturbo.py
    • Fixed njmond thread failure if InfluxDB stops, failed injects retried every 15 seconds and data is cached until InfluxDB is up.
  • November:
    • njmond.py in njmon_tools updated. Fixed a crash if InfluxDB stopped; now it also retries sending data and caches 1000s of records.
      • A big "thank you" to Donatas Rimkus (IBMer) for pointing out the bug, fix and other ideas.
    • njmon/nimon for AIX v69: added -n option for no-PID output on start-up and improved AIX manual pages: njmon manual page and nimon manual page
    • njmon/nimon for Linux v69: added -n option for no-PID output on start-up and Linux manual pages: njmon Linux manual page and nimon Linux manual page
    • Noticed: like the telegraf plugin for Prometheus, there are also plugins for Splunk & elastic = very cool. Can someone test them?
    • njmon Hands-On Workshop - Creating your first Grafana Dashboard Graphs

New Version 71 Released - 16 Dec 2020

For testing new function:

  • GPFS improvements for massive number handling and sscanf() error checking
  • nimon reverse proxy POST with correct hostname and port, second attempt (-i host is passed on)
  • -W for ignoring warning messages like problem file system
  • Retested -t threshold option, AIX and Linux default 0.01% CPU
  • -X secret is deprecated = ignored
  • nimon -p port defaults to 8086
  • nimon file output (use -ff) includes timestamps for loading into InfluxDB
  • Linux on POWER reports same SerialNo as AIX
  • Linux on Mainframe/Z is now supported (Red Hat RHEL 7). Use: make Z
  • -k -r -K fixed and will stop if PID file exists but can't access it
  • AIX added voluntary/involuntary context switch + virtual, affinity=rdisp_sd0 to sd5 stats
  • All Debug output to stderr

New Version 73 AIX Released - 16 Feb 2021

  • Bug fix: Option -P did not switch on the Processes stats

New Version 74 AIX Released - 15 April 2021

  • Using AIX microsleep function (usleep()) for more accurate snapshot time keeping.
  • Refined uptime - yet more output styles found and corrected a bug.
  • Added synthetic transaction monitoring for VM under stress warning
    • See measure "timestamp" now includes statistics: sleeping, execute_time and sleep_overrun.
    • Larger execute and overruns times highlight a lack of CPU time to run njmon/nimon.
  • Correction to units for disks: read and write "mbps" are now MB/s (previously KB/s).
  • Massive Terabyte filesystems size calculations corrected - using double instead of float.
  • nimon sub-resources for processes was called "processe" - now "process".
  • nimon tag correction when vFC client_part_name is missing, it uses "none" to make the line protocol format valid.
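
The synthetic transaction stats above can be pictured as a snapshot loop that times itself. The Python sketch below is one plausible reading of the three names: execute_time is how long the data collection took, sleeping is how long the program asked to sleep to stay on schedule, and sleep_overrun is how much later than that it actually woke up. These definitions are assumptions made for illustration, not taken from the njmon source.

```python
import time

def timed_snapshot(collect, interval):
    """Run one snapshot, sleep out the rest of the interval and time both parts."""
    start = time.monotonic()
    collect()                                   # gather the stats
    execute_time = time.monotonic() - start

    sleeping = max(0.0, interval - execute_time)
    before = time.monotonic()
    time.sleep(sleeping)
    actually_slept = time.monotonic() - before

    return {
        "sleeping": sleeping,
        "execute_time": execute_time,
        # A starved VM wakes up late, so this grows under CPU stress.
        "sleep_overrun": max(0.0, actually_slept - sleeping),
    }

stats = timed_snapshot(lambda: None, 0.05)
```

On an idle machine execute_time and sleep_overrun stay close to zero; when the VM is short of CPU both creep up, which is what makes them a handy stress indicator.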

New Version 75 AIX Released - not released

New Version 76 AIX Released - 16 May 2021

  1. Reduced to just four binaries
    • AIX6 and AIX7 binaries plus VIOS2 and VIOS3 binaries
    • This is possible because a recent AIX 7.1 binary works on AIX 7.2, provided both are recent releases of libperfstat
    • On old AIX 7.1 or 7.2 levels it might fail to start - compile directly on your system (you might need to remove some stats) and let me know
  2. Merged njmon and nimon in to one binary.
    • If binary filename starts with njmon = njmon mode = JSON output
    • If binary filename starts with nimon then it is InfluxDB Line Protocol output
    • Added njmon/nimon CLI options -I and -J to force the other mode. So -I forces njmon to use Influx output and -J forces nimon to use JSON output
  3. Improved the njmon version details, build date and Makefile simplified.
  4. Added elapsed time to timestamp measure
  5. Added compiler directive -D RAWSTATS to aid debugging unknown stat units and calculations.
    • not normally needed
  6. Changed calculation for tx_queue_size as it is not a rate but point in time number
  7. All cpu_logical stats renamed cpu_logicals as integers are now double floating point
  8. All cpu_physical stats renamed cpu_physicals as integers are now double floating point
    • the rename works around the InfluxDB limit that the type can't change
  9. Added runocc_average & swocc_average
  10. File system stats now use fstat64()
    • Use statfs_buffer.f_bsize for MB calculations
    • Fixes filesystem size for multi TB filesystems
    • Added file count and free files in the file system stats
  11. Finally removed the secret number / cookie
  12. Hacked the Hints output for mixed njmon and nimon syntax
  13. Merged njmon and nimon manual pages but still installed as both the njmon and nimon manual pages
  14. Added ps_disk_flush_minmax() function so the disk stats min + max get reset for every snapshot
  15. Change disk rserv_min/max/avg and wserv_min/max/avg calculations with HW Ticks
  16. Added AIX error report (errpt) number of errors to the "server" measure
    • The number is problematic - the number saved depends on the errpt buffering space
    • I found on my server that it got to 52 and then removed 1 error for each new one arriving
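
Item 2 above, where one binary picks its output mode from its own filename, is an old Unix trick. A small Python sketch of the decision, for illustration only: the real code is C, and the two details marked as assumptions below are not described in these notes.

```python
import os

def output_mode(argv0, options=()):
    """Pick "json" (njmon mode) or "influx" (nimon mode) for one invocation."""
    # Explicit overrides win over the filename.
    # Assumption: if both are given, -I is checked first.
    if "-I" in options:
        return "influx"
    if "-J" in options:
        return "json"
    name = os.path.basename(argv0)
    if name.startswith("nimon"):
        return "influx"
    if name.startswith("njmon"):
        return "json"
    # Assumption: what the real binary does with any other name is not documented here.
    raise ValueError(f"cannot tell the mode from the name {name!r}")
```

So output_mode("/usr/lbin/nimon") gives "influx", output_mode("/usr/lbin/njmon_aix7") gives "json", and output_mode("/usr/lbin/njmon", ["-I"]) gives "influx". Shipping one file under two names (or links) keeps the install simple while old command lines keep working.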

New Version 78 Linux Released - 20 September 2021

  • Max CPUs=240 with SMT=8 for the IBM Power10 E1080.
  • Merged njmon & nimon in to one binary like that for AIX including merges of manual pages and changes to Makefile/ninstall.
    • njmon/nimon mode determined by the start of the binary filename or using override options -J (njmon mode) or -I (nimon mode).
  • Added synthetic transaction monitoring of njmon running - execute_time and sleep_overrun = excellent indicator of VM stress = needing additional resources.
  • Double -ff now saves each JSON record in a separate file, with a 6 digit number in the filename.
  • Extra fixes for uptime.
  • -! Outputs the njmon version number.