Elasticsearch error: index read-only

Background

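The Elasticsearch service on our production servers started throwing a large number of errors. Queries still worked, but any request that created or updated data returned the following error: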

{
  "error": {
    "root_cause": [
      {
        "type": "cluster_block_exception",
        "reason": "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type": "cluster_block_exception",
    "reason": "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status": 403
}
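When this error comes back from application code, it helps to tell it apart from other 403 responses. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict; the function name `is_read_only_block` is hypothetical and not part of any Elasticsearch client.

```python
def is_read_only_block(response: dict) -> bool:
    """Return True if an Elasticsearch error body reports the
    read-only / allow-delete index block shown above."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False
    # Check the top-level error and every entry under root_cause.
    causes = [error] + list(error.get("root_cause", []))
    return any(
        cause.get("type") == "cluster_block_exception"
        and "index read-only / allow delete" in cause.get("reason", "")
        for cause in causes
    )


# The error body from this article, used as sample input.
sample = {
    "error": {
        "root_cause": [
            {
                "type": "cluster_block_exception",
                "reason": "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];",
            }
        ],
        "type": "cluster_block_exception",
        "reason": "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];",
    },
    "status": 403,
}
print(is_read_only_block(sample))
```

A check like this lets a writer pause and alert instead of retrying blindly, since retries cannot succeed until the block is cleared.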

Cause

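Disk space was running low, which triggered Elasticsearch's disk protection and forced every index with a shard on the affected node (on a single-node cluster, all of them) into a read-only state. The relevant parameters are described in the official documentation: Disk-based Shard Allocation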

The main parameters are:

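  • cluster.routing.allocation.disk.threshold_enabled: whether disk-based shard allocation decisions are enabled. Defaults to true.
  • cluster.routing.allocation.disk.watermark.low: the low watermark for disk usage. Defaults to 85%. Once a node's disk usage exceeds 85%, Elasticsearch stops allocating new shards to that node. This does not affect the primary shards of newly created indices, but their replicas will not be allocated there.
  • cluster.routing.allocation.disk.watermark.high: the high watermark for disk usage. Defaults to 90%. Once a node's disk usage exceeds 90%, Elasticsearch tries to relocate some of that node's shards to other nodes.
  • cluster.routing.allocation.disk.watermark.flood_stage: the flood-stage watermark. Defaults to 95%. Once a node's disk usage exceeds 95%, Elasticsearch applies a read-only block (index.blocks.read_only_allow_delete) to every index that has a shard on that node. This is the last line of defense. In the version used here (6.8.0) the block has to be removed manually; from 7.4 onward it is released automatically once disk usage drops back below the high watermark.
  • cluster.info.update.interval: how often disk usage is checked. Defaults to 30s.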
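The three watermarks above amount to a simple threshold ladder. The following is a minimal sketch of that ladder using the default percentages; the function name `watermark_stage` is hypothetical, and the sketch ignores the fact that the watermark settings can also be configured as absolute byte values.

```python
# Default watermark percentages from the settings listed above.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0
FLOOD_STAGE_WATERMARK = 95.0


def watermark_stage(used_percent: float,
                    low: float = LOW_WATERMARK,
                    high: float = HIGH_WATERMARK,
                    flood: float = FLOOD_STAGE_WATERMARK) -> str:
    """Classify a node's disk usage (in percent) against the watermarks."""
    if used_percent > flood:
        return "flood_stage"  # indices on the node become read-only
    if used_percent > high:
        return "high"         # shards are relocated away from the node
    if used_percent > low:
        return "low"          # no new shards are allocated to the node
    return "ok"


# Usage percentages derived from the "free" figures in this article's logs.
for free_percent in (14.6, 9.4, 4.9):
    print(free_percent, watermark_stage(100 - free_percent))
```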

To reproduce this locally, I simulated a disk running out of space. When disk usage exceeds 85%, Elasticsearch writes the following log entry:

[2020-05-16T02:22:38,049][INFO ][o.e.c.r.a.DiskThresholdMonitor] [DYUbwoP] low disk watermark [85%] exceeded on [DYUbwoPHSAWB08omk4MmEA][DYUbwoP][/opt/elasticsearch-6.8.0/data/nodes/0] free: 929.3mb[14.6%], replicas will not be assigned to this node

When disk usage exceeds 90%, Elasticsearch writes the following log entry:

[2020-05-16T02:26:38,124][WARN ][o.e.c.r.a.DiskThresholdMonitor] [DYUbwoP] high disk watermark [90%] exceeded on [DYUbwoPHSAWB08omk4MmEA][DYUbwoP][/opt/elasticsearch-6.8.0/data/nodes/0] free: 599.9mb[9.4%], shards will be relocated away from this node

When disk usage exceeds 95%, Elasticsearch writes the following log entries:

[2020-05-16T02:27:38,155][WARN ][o.e.c.r.a.DiskThresholdMonitor] [DYUbwoP] flood stage disk watermark [95%] exceeded on [DYUbwoPHSAWB08omk4MmEA][DYUbwoP][/opt/elasticsearch-6.8.0/data/nodes/0] free: 316.1mb[4.9%], all indices on this node will be marked read-only
[2020-05-16T02:29:11,336][WARN ][o.e.x.m.e.l.LocalExporter] [DYUbwoP] unexpected error while indexing monitoring document
org.elasticsearch.xpack.monitoring.exporter.ExportException: ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:125) ~[?:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_161]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_161]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_161]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_161]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_161]
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_161]
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_161]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_161]
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_161]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:126) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:108) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction$BulkRequestModifier.lambda$wrapActionListenerIfNeeded$0(TransportBulkAction.java:665) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:470) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:465) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:91) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:945) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:785) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:171) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:99) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:121) ~[?:?]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:440) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:553) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:256) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.bulk.TransportBulkAction.lambda$processBulkIndexIngestRequest$4(TransportBulkAction.java:607) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.ingest.IngestService$4.doRun(IngestService.java:411) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.0.jar:6.8.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:208) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.blockExceptions(TransportReplicationAction.java:254) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:99) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:779) ~[elasticsearch-6.8.0.jar:6.8.0]
... 19 more
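The DiskThresholdMonitor messages above all follow the same pattern, so they are easy to pick out of a log file for alerting. The following is a rough sketch that extracts the watermark level and the remaining free space; the regular expression is derived only from the 6.8.0 log lines shown in this article and may need adjusting for other versions, and the function name `parse_watermark_event` is hypothetical.

```python
import re

# Pattern derived from the DiskThresholdMonitor lines shown in this article.
# re.DOTALL lets it match even if the message is wrapped across two lines.
_WATERMARK_RE = re.compile(
    r"(low|high|flood stage) disk watermark \[(\d+(?:\.\d+)?)%\] exceeded on "
    r".*?free: (\S+?)\[(\d+(?:\.\d+)?)%\]",
    re.DOTALL,
)


def parse_watermark_event(log_text: str):
    """Return the watermark level and free-space figures, or None if absent."""
    match = _WATERMARK_RE.search(log_text)
    if match is None:
        return None
    level, threshold, free_size, free_percent = match.groups()
    return {
        "level": level,
        "threshold_percent": float(threshold),
        "free": free_size,
        "free_percent": float(free_percent),
    }


# The flood-stage log line from this article, used as sample input.
sample = (
    "[2020-05-16T02:27:38,155][WARN ][o.e.c.r.a.DiskThresholdMonitor] [DYUbwoP] "
    "flood stage disk watermark [95%] exceeded on "
    "[DYUbwoPHSAWB08omk4MmEA][DYUbwoP][/opt/elasticsearch-6.8.0/data/nodes/0] "
    "free: 316.1mb[4.9%], all indices on this node will be marked read-only"
)
print(parse_watermark_event(sample))
```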

Solution

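Expand the Elasticsearch disk as an emergency measure. Once the expansion is complete, run the following request to clear the read-only state on the indices: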

PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
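The request above is written in Kibana console syntax. Outside Kibana, the same call can be made with any HTTP client. The following is a minimal sketch using only the Python standard library; it assumes an unauthenticated cluster reachable at a base URL such as `http://localhost:9200` (an assumption, not something stated in this article), and the helper names are hypothetical.

```python
import json
import urllib.request


def build_clear_block_request(base_url: str, index: str = "_all") -> urllib.request.Request:
    """Build the PUT <index>/_settings request that resets
    index.blocks.read_only_allow_delete to null."""
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Example (requires a reachable cluster; the URL is an assumption):
# print(send(build_clear_block_request("http://localhost:9200")))
```

Building the request separately from sending it keeps the sketch easy to inspect before anything touches a production cluster.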

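If the disk cannot be expanded in time, you can first disable the disk-based allocation protection so that the last 5% of disk space buys a little time, and then expand the disk.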

  1. Disable the disk-based allocation protection

    PUT _cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.threshold_enabled": false
      }
    }
  2. Clear the read-only state on the indices

    PUT _all/_settings
    {
      "index.blocks.read_only_allow_delete": null
    }
  3. After the disk has been expanded, re-enable the disk-based allocation protection

    PUT _cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.threshold_enabled": true
      }
    }
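The order of the three steps matters: the protection has to be off before the block is cleared, otherwise the next disk check can reapply it. The following sketch simply encodes the three requests above as data in that order, for example to feed into a runbook script; the function name `recovery_plan` is hypothetical and nothing here talks to a cluster.

```python
import json

THRESHOLD_SETTING = "cluster.routing.allocation.disk.threshold_enabled"


def recovery_plan():
    """Return the three requests from the steps above, in order,
    as (method, path, body) tuples."""
    return [
        # 1. Disable the disk-based allocation protection.
        ("PUT", "_cluster/settings", {"transient": {THRESHOLD_SETTING: False}}),
        # 2. Clear the read-only block on all indices.
        ("PUT", "_all/_settings", {"index.blocks.read_only_allow_delete": None}),
        # 3. Only after the disk has been expanded: re-enable the protection.
        ("PUT", "_cluster/settings", {"transient": {THRESHOLD_SETTING: True}}),
    ]


for method, path, body in recovery_plan():
    print(method, path, json.dumps(body))
```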