Describe the bug
Observed panics due to segmentation faults in the ruler.
To Reproduce
Steps to reproduce the behavior:
Run Cortex 1.10.0 and run the ruler (example launch command below)
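For reference, the ruler is started by pointing Cortex at the config file our manifests mount; the path below is only illustrative, and the config itself (included under Additional Context) already sets target: ruler:

cortex -config.file=/etc/cortex/cortex.yaml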
Expected behavior
Ruler should not panic
Environment:
- Infrastructure: kubernetes - AKS
- Deployment tool: customized yaml manifests
- Storage Engine: blocks
Additional Context
We are seeing consistent panics from the ruler, with errors like
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e20df3]
goroutine 14595 [running]:
github.com/cortexproject/cortex/pkg/querier.querier.Select(0x2bdf130, 0xc0047b2820, 0xc0046fc440, 0x2, 0x2, 0x28908e0, 0x2bdeaa0, 0xc00206be60, 0x17ba6ec5d74, 0x17ba7234bf4, ...)
	/__w/cortex/cortex/pkg/querier/querier.go:323 +0x193
github.com/cortexproject/cortex/pkg/querier/lazyquery.LazyQuerier.Select.func1(0xc0020e13e0, 0x2be0da0, 0xc000157c00, 0xc004122900, 0x0, 0xc004122900, 0xa, 0x10)
	/__w/cortex/cortex/pkg/querier/lazyquery/lazyquery.go:52 +0x72
created by github.com/cortexproject/cortex/pkg/querier/lazyquery.LazyQuerier.Select
	/__w/cortex/cortex/pkg/querier/lazyquery/lazyquery.go:51 +0xad
Below is the configuration diff from the defaults, as emitted by the ruler.
Note that I also tried with blocks_storage.bucket_store.index_header_lazy_loading_enabled: false and experienced the same error.
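For clarity, that override corresponds to the following fragment of the config (same path as in the full dump below, with only the value flipped):

blocks_storage:
  bucket_store:
    index_header_lazy_loading_enabled: false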
alertmanager:
  enable_api: true
  external_url: https://alertmanager.cluster-monitor.*******.com/alertmanager
  sharding_enabled: true
  sharding_ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-alertmanagers/
      store: etcd
alertmanager_storage:
  s3:
    access_key_id: ******
    bucket_name: cortex-alertmanager
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
api:
  response_compression_enabled: true
blocks_storage:
  bucket_store:
    bucket_index:
      enabled: true
    chunks_cache:
      backend: memcached
      memcached:
        addresses: dnssrv+_memcached._tcp.chunks-cache.cluster-monitor-cortex.svc.cluster.local
    index_cache:
      backend: memcached
      memcached:
        addresses: dnssrv+_memcached._tcp.index-cache.cluster-monitor-cortex.svc.cluster.local
    index_header_lazy_loading_enabled: true
    metadata_cache:
      backend: memcached
      bucket_index_content_ttl: 2m0s
      memcached:
        addresses: dnssrv+_memcached._tcp.metadata-cache.cluster-monitor-cortex.svc.cluster.local
      metafile_doesnt_exist_ttl: 2m0s
      tenant_blocks_list_ttl: 2m0s
    sync_interval: 5m0s
  s3:
    access_key_id: *****
    bucket_name: cortex
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
  tsdb:
    close_idle_tsdb_timeout: 15m0s
    dir: /var/cortex/tsdb
    max_exemplars: 1000
compactor:
  block_deletion_marks_migration_enabled: false
  cleanup_interval: 5m0s
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-ha-tracker/
      store: etcd
  ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-collectors/
      store: etcd
  shard_by_all_labels: true
frontend:
  grpc_client_config:
    grpc_compression: snappy
  log_queries_longer_than: 1s
  query_stats_enabled: true
frontend_worker:
  frontend_address: query-frontend.cluster-monitor-cortex.svc.cluster.local:9095
  grpc_client_config:
    grpc_compression: snappy
    max_send_msg_size: 33554432
ingester:
  lifecycler:
    availability_zone: westeurope-2
    observe_period: 3s
    ring:
      kvstore:
        etcd:
          endpoints:
          - client.etcd.svc.cluster.local:2379
        prefix: cortex-collectors/
        store: etcd
  walconfig:
    wal_enabled: true
ingester_client:
  grpc_client_config:
    grpc_compression: snappy
limits:
  accept_ha_samples: true
  ingestion_burst_size: 75000
  ingestion_rate: 55000
  max_series_per_metric: 70000
querier:
  at_modifier_enabled: true
  query_store_for_labels_enabled: true
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      memcached:
        expiration: 12h0m0s
      memcached_client:
        addresses: dnssrv+_memcached._tcp.index-cache.cluster-monitor-cortex.svc.cluster.local
  split_queries_by_interval: 24h0m0s
ruler:
  alertmanager_url: http://alertmanager.cluster-monitor-cortex.svc.cluster.local:3100/alertmanager
  enable_api: true
  enable_sharding: true
  external_url: https://alertmanager.cluster-monitor.******.com
  ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-rulers/
      store: etcd
  ruler_client:
    grpc_compression: snappy
ruler_storage:
  s3:
    access_key_id: ********
    bucket_name: cortex-ruler
    endpoint: s3.storage.svc.cluster.local:9000
    insecure: true
    secret_access_key: '********'
server:
  http_listen_port: 3100
  log_level: debug
storage:
  engine: blocks
store_gateway:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      etcd:
        endpoints:
        - client.etcd.svc.cluster.local:2379
      prefix: cortex-collectors/
      store: etcd
    zone_awareness_enabled: true
target: ruler