 | DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>dmidecode<br>wmic memorychip get Capacity | - | **Collection Args:**<br>- `skip_sudo`: bool | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
 | DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | - | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
 | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error Status: `(pcieport [\w:.]+: AER: aer_status:[^\n]*(?:\n[...`<br>- PCIe AER Correctable Error Status: `(.*aer_cor_status: 0x[0-9a-fA-F]+, aer_cor_mask...`<br>- PCIe AER Uncorrectable Error Status: `(.*aer_uncor_status: 0x[0-9a-fA-F]+, aer_uncor_...`<br>- PCIe AER Uncorrectable Error Severity with TLP Header: `(.*aer_uncor_severity: 0x[0-9a-fA-F]+.*)(\n.*TL...`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- MMP Error: `Failed to load MMP firmware qat_4xxx_mmp.bin`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- RAS Poison Consumed: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`<br>- RAS Poison created: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`<br>- Bad page threshold exceeded: `(amdgpu: Saved bad pages (\d+) reaches threshol...`<br>- RAS Hardware Error: `Hardware error from APEI Generic Hardware Error...`<br>- Error Address: `Error Address.*(?:\s.*)`<br>- RAS EDR Event: `EDR: EDR event received`<br>- DPC Event: `DPC: .*`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | **Collection Args:**<br>- `collect_rotated_logs`: bool<br>- `skip_sudo`: bool<br>- `log_dmesg_data`: bool | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
-| FabricsPlugin | ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>mst start<br>mst status -v<br>ofed_info -s | - | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
+| FabricsPlugin | lspci \| grep -i cassini<br>lsmod \| grep cxi<br>cxi_stat<br>ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>fi_info -p cxi<br>mst start<br>mst status -v<br>ip link show<br>ofed_info -s | - | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
 | JournalPlugin | journalctl --no-pager --system --output=short-iso<br>journalctl --no-pager --system --output=json | **Analyzer Args:**<br>- `check_priority`: Optional[int]<br>- `group`: bool | **Collection Args:**<br>- `boot`: Optional[int] | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | [JournalAnalyzer](#Data-Analyzer-Class-JournalAnalyzer) |
 | KernelPlugin | sh -c 'uname -a'<br>sh -c 'cat /proc/sys/kernel/numa_balancing'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `exp_numa`: Optional[int]<br>- `regex_match`: bool | - | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
 | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | - | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
 | RdmaPlugin | rdma link -j<br>rdma dev<br>rdma link<br>rdma statistic -j | - | - | [RdmaDataModel](#RdmaDataModel-Model) | [RdmaCollector](#Collector-Class-RdmaCollector) | [RdmaAnalyzer](#Data-Analyzer-Class-RdmaAnalyzer) |
 | RocmPlugin | {rocm_path}/opencl/bin/*/clinfo<br>env \| grep -Ei 'rocm\|hsa\|hip\|mpi\|openmp\|ucx\|miopen'<br>ls /sys/class/kfd/kfd/proc/<br>grep -i -E 'rocm' /etc/ld.so.conf.d/*<br>{rocm_path}/bin/rocminfo<br>ls -v -d {rocm_path}*<br>ls -v -d {rocm_path}-[3-7]* \| tail -1<br>ldconfig -p \| grep -i -E 'rocm'<br>grep . -r {rocm_path}/.info/* | **Analyzer Args:**<br>- `exp_rocm`: Union[str, list]<br>- `exp_rocm_latest`: str<br>- `exp_rocm_sub_versions`: dict[str, Union[str, list]] | **Collection Args:**<br>- `rocm_path`: str | [RocmDataModel](#RocmDataModel-Model) | [RocmCollector](#Collector-Class-RocmCollector) | [RocmAnalyzer](#Data-Analyzer-Class-RocmAnalyzer) |
 | StoragePlugin | sh -c 'df -lH -B1 \| grep -v 'boot''<br>wmic LogicalDisk Where DriveType="3" Get DeviceId,Size,FreeSpace | - | **Collection Args:**<br>- `skip_sudo`: bool | [StorageDataModel](#StorageDataModel-Model) | [StorageCollector](#Collector-Class-StorageCollector) | [StorageAnalyzer](#Data-Analyzer-Class-StorageAnalyzer) |
-| SysSettingsPlugin | cat /sys/{}<br>ls -1 /sys/{} | **Analyzer Args:**<br>- `checks`: Optional[list[nodescraper.plugins.inband.sys_settings.analyzer_args.SysfsCheck]] | **Collection Args:**<br>- `paths`: list[str]<br>- `directory_paths`: list[str] | [SysSettingsDataModel](#SysSettingsDataModel-Model) | [SysSettingsCollector](#Collector-Class-SysSettingsCollector) | [SysSettingsAnalyzer](#Data-Analyzer-Class-SysSettingsAnalyzer) |
+| SysSettingsPlugin | cat /sys/{}<br>ls -1 /sys/{}<br>ls -l /sys/{} | **Analyzer Args:**<br>- `checks`: Optional[list[nodescraper.plugins.inband.sys_settings.analyzer_args.SysfsCheck]] | **Collection Args:**<br>- `paths`: list[str]<br>- `directory_paths`: list[str] | [SysSettingsDataModel](#SysSettingsDataModel-Model) | [SysSettingsCollector](#Collector-Class-SysSettingsCollector) | [SysSettingsAnalyzer](#Data-Analyzer-Class-SysSettingsAnalyzer) |
 | SysctlPlugin | sysctl -n | **Analyzer Args:**<br>- `exp_vm_swappiness`: Optional[int]<br>- `exp_vm_numa_balancing`: Optional[int]<br>- `exp_vm_oom_kill_allocating_task`: Optional[int]<br>- `exp_vm_compaction_proactiveness`: Optional[int]<br>- `exp_vm_compact_unevictable_allowed`: Optional[int]<br>- `exp_vm_extfrag_threshold`: Optional[int]<br>- `exp_vm_zone_reclaim_mode`: Optional[int]<br>- `exp_vm_dirty_background_ratio`: Optional[int]<br>- `exp_vm_dirty_ratio`: Optional[int]<br>- `exp_vm_dirty_writeback_centisecs`: Optional[int]<br>- `exp_kernel_numa_balancing`: Optional[int] | - | [SysctlDataModel](#SysctlDataModel-Model) | [SysctlCollector](#Collector-Class-SysctlCollector) | [SysctlAnalyzer](#Data-Analyzer-Class-SysctlAnalyzer) |
 | SyslogPlugin | ls -1 /var/log/syslog* 2>/dev/null \| grep -E '^/var/log/syslog(\.[0-9]+(\.gz)?)?$' \|\| true | - | - | [SyslogData](#SyslogData-Model) | [SyslogCollector](#Collector-Class-SyslogCollector) | - |
 | UptimePlugin | uptime | - | - | [UptimeDataModel](#UptimeDataModel-Model) | [UptimeCollector](#Collector-Class-UptimeCollector) | - |
@@ -257,18 +257,28 @@ Collect InfiniBand/RDMA fabrics configuration details
 - **CMD_OFED_INFO**: `ofed_info -s`
 - **CMD_MST_START**: `mst start`
 - **CMD_MST_STATUS**: `mst status -v`
+- **CMD_CASSINI_PCI**: `lspci | grep -i cassini`
+- **CMD_NET_LINK**: `ip link show`
+- **CMD_LIBFABRIC_INFO**: `fi_info -p cxi`
+- **CMD_CXI_STAT**: `cxi_stat`
+- **CMD_CXI_MODULES**: `lsmod | grep cxi`

 ### Provides Data

 FabricsDataModel

 ### Commands

+- lspci | grep -i cassini
+- lsmod | grep cxi
+- cxi_stat
 - ibstat
 - ibv_devinfo
 - ls -l /sys/class/infiniband/*/device/net
+- fi_info -p cxi
 - mst start
 - mst status -v
+- ip link show
 - ofed_info -s

 ## Collector Class JournalCollector
@@ -729,6 +739,7 @@ Collect sysfs settings from user-specified paths.
 - **SUPPORTED_OS_FAMILY**: `{<OSFamily.LINUX: 3>}`
 - **CMD**: `cat /sys/{}`
 - **CMD_LS**: `ls -1 /sys/{}`
+- **CMD_LS_LONG**: `ls -l /sys/{}`

 ### Provides Data

@@ -738,6 +749,7 @@ SysSettingsDataModel

 - cat /sys/{}
 - ls -1 /sys/{}
+- ls -l /sys/{}

 ## Collector Class SysctlCollector

@@ -928,6 +940,7 @@ Complete InfiniBand/RDMA fabrics configuration data
 - **ibdev_netdev_mappings**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbdevNetdevMapping]`
 - **ofed_info**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.OfedInfo]`
 - **mst_status**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.MstStatus]`
+- **slingshot_data**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.SlingshotData]`

 ## JournalData Model
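
For readers who want to try the patterns in the DmesgPlugin row by hand, here is a minimal sketch that applies a few of the built-in regexes to log text, plus the rotated-log filename filter from the same row. It is not the `DmesgAnalyzer` implementation: the function name `scan_dmesg`, the sample log lines, and the use of Python's default `re` flags are all assumptions made for illustration. Only patterns that the table shows in full are used; entries truncated with `...` are left out.

```python
import re

# Patterns copied from the DmesgPlugin row of the table. Only entries shown
# in full are included; truncated ones ("...") are deliberately skipped.
BUILTIN_REGEXES = {
    "Out of memory error": r"(?:oom_kill_process.*)|(?:Out of memory.*)",
    "I/O Page Fault": r"IO_PAGE_FAULT",
    "Kernel Panic": r"\bkernel panic\b.*",
    "SQ Interrupt": r"sq_intr",
    "Page Fault": r"page fault for address.*",
}

# Filename filter from the collector command that lists rotated dmesg logs.
ROTATED_DMESG = re.compile(r"^/var/log/dmesg(\.[0-9]+(\.gz)?)?$")


def scan_dmesg(text, patterns=BUILTIN_REGEXES):
    """Return (description, matched_text) pairs for every pattern hit.

    Hypothetical helper: default regex flags are assumed, so matching is
    case-sensitive and '.' stops at the end of a line.
    """
    hits = []
    for description, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            hits.append((description, match.group(0)))
    return hits


# Made-up sample input in the style of `dmesg --time-format iso -x`.
SAMPLE = (
    "2024-01-01T00:00:01,000000+00:00 kern  :err   : "
    "Out of memory: Killed process 1234 (python)\n"
    "2024-01-01T00:00:02,000000+00:00 kern  :info  : "
    "usb 1-1: new high-speed USB device\n"
    "2024-01-01T00:00:03,000000+00:00 kern  :err   : "
    "amdgpu 0000:03:00.0: page fault for address 0x1000\n"
)

if __name__ == "__main__":
    for description, matched in scan_dmesg(SAMPLE):
        print(f"{description}: {matched}")
    for path in ("/var/log/dmesg", "/var/log/dmesg.1.gz", "/var/log/dmesg.old"):
        print(path, bool(ROTATED_DMESG.match(path)))
```

Because default flags are assumed, a line such as `Kernel panic - not syncing` (capital K) would not match the `Kernel Panic` pattern in this sketch; whether the real analyzer compiles its patterns with case-insensitive or multiline flags is not stated in the table.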