Elasticsearch Aggregations: Bucket Aggregations

Notes by 祈雨

2018-09-21

Histogram aggregation

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "field1",
        "interval": 5
      }
    }
  }
}

The response shows 1 value in the interval [15, 20), 0 values in [20, 25), 1 value in [25, 30), and 1 value in [30, 35).

{
  "aggregations": {
    "test_histogram": {
      "buckets": [
        {
          "key": 15,
          "doc_count": 1
        },
        {
          "key": 20,
          "doc_count": 0
        },
        {
          "key": 25,
          "doc_count": 1
        },
        {
          "key": 30,
          "doc_count": 1
        }
      ]
    }
  }
}

1. Bucketing rule

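For example, suppose a price field holds the price of each product, and you want one bucket for every interval of 5 so that you can count how many documents (products) fall into each interval.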

A product priced at 32 goes into the bucket with key 30. The bucket key is computed as follows:

rem = value % interval
if (rem < 0) {
    rem += interval
}
bucket_key = value - rem

This rule determines which bucket each document belongs to.

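There is a catch, though. The rule above is written for integer data, so a floating-point field is first cast to an integer and then run through the same calculation. Positive values are unaffected, but negative values can land in the wrong bucket. For example, with an interval of 2, the value -4.5 is cast to -4 and is placed in the -4 bucket, when it actually belongs in the -6 bucket, which covers [-6, -4).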
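The rule and its floating-point pitfall can be sketched in Python. This is an illustration only, not Elasticsearch's implementation: the function names are hypothetical, `math.fmod` stands in for a Java-style remainder (it keeps the sign of the value), and the interval of 2 in the negative example is an assumption chosen so that -4 and -6 are both bucket keys.

```python
import math


def bucket_key(value, interval):
    """Apply the rule above: remainder, shifted to be non-negative."""
    rem = math.fmod(value, interval)  # keeps the sign of value, like Java's %
    if rem < 0:
        rem += interval
    return value - rem


def bucket_key_after_int_cast(value, interval):
    """Same rule, but the value is first truncated toward zero."""
    return bucket_key(int(value), interval)


price_key = bucket_key(32, 5)                        # price example from the text
correct_key = bucket_key(-4.5, 2)                    # rule applied to the raw float
truncated_key = bucket_key_after_int_cast(-4.5, 2)   # cast to -4 first
```

Comparing `correct_key` with `truncated_key` shows the discrepancy: truncation moves -4.5 up to -4 before the remainder is taken, so the value skips the bucket it should fall into.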

2. extended_bounds

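extended_bounds forces the histogram aggregation to create buckets from the specified minimum up to the specified maximum, even for intervals that contain no documents.

extended_bounds does not filter buckets. If documents fall outside the min/max range, the histogram still creates buckets out to the actual minimum or maximum value in the data.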

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "field1",
        "interval": 5,
        "extended_bounds":{
          "min": 0,
          "max": 20
        }
      }
    }
  }
}
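The effect of extended_bounds on the list of bucket keys can be sketched as follows. This is a simplified model, not Elasticsearch code: the function name `histogram_keys` is hypothetical, values are assumed to be integers, and empty buckets are assumed to be returned (as in the sample response earlier, which contains a bucket with doc_count 0).

```python
def histogram_keys(values, interval, bounds_min=None, bounds_max=None):
    """Return every bucket key the histogram would contain.

    The bounds only widen the key range; they never drop buckets that
    the data itself produces.
    """
    data_keys = [(v // interval) * interval for v in values]
    low, high = min(data_keys), max(data_keys)
    if bounds_min is not None:
        low = min(low, (bounds_min // interval) * interval)
    if bounds_max is not None:
        high = max(high, (bounds_max // interval) * interval)
    return list(range(low, high + interval, interval))


values = [15, 27, 33]  # made-up sample data
without_bounds = histogram_keys(values, 5)
with_bounds = histogram_keys(values, 5, bounds_min=0, bounds_max=20)
```

With these sample values, the lower bound of 0 adds the empty buckets 0, 5 and 10, while the upper bound of 20 changes nothing because the data already reaches the bucket with key 30.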

3. Ordering

Order by bucket key:

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "field1",
        "interval": 5,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

Order by bucket document count:

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "field1",
        "interval": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
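As a rough client-side analogue of the two orderings, the buckets from the sample response earlier can be sorted in Python. This only mirrors the idea of `_key` and `_count` ordering. How Elasticsearch breaks ties between buckets with equal counts is not covered here.

```python
# Buckets copied from the sample histogram response.
buckets = [
    {"key": 15, "doc_count": 1},
    {"key": 20, "doc_count": 0},
    {"key": 25, "doc_count": 1},
    {"key": 30, "doc_count": 1},
]

# Analogue of "order": {"_key": "asc"}
by_key = sorted(buckets, key=lambda b: b["key"])

# Analogue of "order": {"_count": "desc"}
by_count = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
```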

4. Offset

By default, bucket keys start at 0 and advance in steps of interval. The offset parameter shifts where the buckets start.

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_histogram": {
      "histogram": {
        "field": "field1",
        "interval": 5,
        "offset": 8
      }
    }
  }
}
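A small sketch of how an offset shifts the bucket boundaries, using the floor-based formula given in the Elasticsearch histogram reference, `floor((value - offset) / interval) * interval + offset`. The function name is hypothetical, and the sketch makes no claim about which offset values a given Elasticsearch version accepts.

```python
import math


def shifted_bucket_key(value, interval, offset=0):
    """Bucket key for value when the bucket grid is shifted by offset."""
    return math.floor((value - offset) / interval) * interval + offset


no_offset = shifted_bucket_key(32, 5)        # grid ..., 25, 30, 35, ...
offset_8 = shifted_bucket_key(32, 5, 8)      # grid ..., 23, 28, 33, ...
offset_3 = shifted_bucket_key(32, 5, 3)      # same grid as offset 8
```

Under this formula an offset of 8 with an interval of 5 produces the same boundaries as an offset of 3, since the grid repeats every interval.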

Date histogram aggregation

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_date_histogram": {
      "date_histogram": {
        "field": "field1",
        "interval": "1M",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

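The interval accepts the following expressions: year, quarter, month, week, day, hour, minute, second. The 1M used in the request above is the time-unit shorthand for one month.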
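To illustrate what a monthly date histogram computes, here is a minimal Python sketch that groups dates by calendar month and formats each key as yyyy-MM-dd. The function name and sample dates are made up, time zones are ignored, and unlike a real histogram the sketch does not fill in empty months.

```python
from collections import Counter
from datetime import date


def monthly_histogram(dates):
    """Count dates per calendar month, keyed by the first day of the month."""
    counts = Counter(d.replace(day=1) for d in dates)
    return {key.strftime("%Y-%m-%d"): n for key, n in sorted(counts.items())}


sample = [date(2018, 9, 21), date(2018, 9, 3), date(2018, 8, 15)]
buckets = monthly_histogram(sample)
```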

Date range aggregation

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_date_range": {
      "date_range": {
        "field": "field1",
        "format": "yyyy-MM-dd", 
        "ranges": [
          {
            "from": "now-10M/M",
            "to": "now"
          }
        ]
      }
    }
  }
}
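The date math expression `now-10M/M` in the request above reads as "now minus 10 months, rounded down to the start of the month". A sketch of that calculation in Python, with hypothetical function names and a fixed timestamp standing in for `now`:

```python
from datetime import datetime


def months_ago_floor(now, months):
    """Subtract whole months from now, then round down to the month start."""
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    return datetime(year, month_index + 1, 1)


def in_range(ts, start, end):
    """Range buckets include the lower bound and exclude the upper bound."""
    return start <= ts < end


now = datetime(2018, 9, 21, 10, 30)   # stand-in for "now"
start = months_ago_floor(now, 10)
inside = in_range(datetime(2018, 1, 5), start, now)
at_upper_bound = in_range(now, start, now)
```

Because the rounding happens after the subtraction, the lower bound is always midnight on the first day of a month, whatever the current day is.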

Range aggregation

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_range": {
      "range": {
        "field": "field1",
        "ranges": [
          {
            "from": 0,
            "to": 10
          }
        ]
      }
    }
  }
}
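For a range bucket, the from value is inclusive and the to value is exclusive. A minimal sketch of the count such a bucket would report, using made-up sample values and a hypothetical function name:

```python
def range_doc_count(values, start, end):
    """Count values in the half-open interval [start, end)."""
    return sum(1 for v in values if start <= v < end)


values = [0, 5, 10, 12]  # made-up field1 values
count = range_doc_count(values, 0, 10)
```

Here the value 10 is not counted, because it equals the exclusive upper bound.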

Filter aggregation

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_filter": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "field1",
            "interval": 10
          }
        }
      },
      "filter": {
        "range": {
          "field2": {
            "gte": 10
          }
        }
      }
    }
  }
}
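The request above first narrows the documents to those with field2 >= 10 and then builds a histogram over field1 for the matches only. A simplified Python model of that two-step evaluation, with made-up documents and a hypothetical function name (integer values assumed, empty buckets not filled in):

```python
from collections import Counter

docs = [  # made-up sample documents
    {"field1": 3, "field2": 5},
    {"field1": 14, "field2": 12},
    {"field1": 18, "field2": 30},
    {"field1": 27, "field2": 10},
]


def filter_then_histogram(docs, interval, min_field2):
    """Keep docs with field2 >= min_field2, then bucket their field1 values."""
    matched = [d for d in docs if d["field2"] >= min_field2]
    hist = Counter((d["field1"] // interval) * interval for d in matched)
    return {"doc_count": len(matched), "buckets": dict(sorted(hist.items()))}


result = filter_then_histogram(docs, interval=10, min_field2=10)
```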

Filters aggregation

Equivalent to running several filter aggregations in one request: each named filter produces its own bucket.

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_filters": {
      "aggs": {
        "test_histogram": {
          "histogram": {
            "field": "field1",
            "interval": 10
          }
        }
      },
      "filters": {
        "filters": {
          "test_range": {
            "range": {
              "field2": {
                "gte": 10
              }
            }
          },
          "test_range2" :{
            "range": {
              "field2": {
                "lte": 20
              }
            }
          }
        }
      }
    }
  }
}
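A rough Python model of the request above: each named filter selects its own set of documents, and the histogram sub-aggregation is computed separately inside each bucket. The documents and the function name are made up for illustration. Note that the two filters overlap, so a document with field2 between 10 and 20 is counted in both buckets.

```python
from collections import Counter

docs = [  # made-up sample documents
    {"field1": 3, "field2": 5},
    {"field1": 14, "field2": 12},
    {"field1": 18, "field2": 30},
    {"field1": 27, "field2": 10},
]

named_filters = {
    "test_range": lambda d: d["field2"] >= 10,
    "test_range2": lambda d: d["field2"] <= 20,
}


def filters_with_histogram(docs, named_filters, interval):
    """One bucket per named filter, each with its own field1 histogram."""
    result = {}
    for name, predicate in named_filters.items():
        matched = [d for d in docs if predicate(d)]
        hist = Counter((d["field1"] // interval) * interval for d in matched)
        result[name] = {
            "doc_count": len(matched),
            "buckets": dict(sorted(hist.items())),
        }
    return result


result = filters_with_histogram(docs, named_filters, interval=10)
```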

Missing aggregation

GET /testindex/testtype/_search
{
  "size": 0, 
  "aggs": {
    "test_missing": {
      "missing": {
        "field": "field1"
      }
    }
  }
}
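The missing aggregation produces a single bucket containing the documents that have no value for the field. A minimal sketch of that count, treating both an absent field and an explicit null as missing (sample documents and function name are made up):

```python
def missing_doc_count(docs, field):
    """Count documents where the field is absent or explicitly null."""
    return sum(1 for d in docs if d.get(field) is None)


docs = [{"field1": 1}, {}, {"field1": None}, {"field1": 0}]
count = missing_doc_count(docs, "field1")
```

A value of 0 is a real value, so the last document is not counted as missing.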

Terms aggregation

Groups documents into buckets by the values of the specified field.

GET /index/type/_search
{
  "size": 0, 
  "aggs": {
    "test_terms": {
      "terms": {
        "field": "field1"
      }
    }
  }
}

1. Size

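The size parameter sets how many buckets are returned. Setting size to 0 makes the size Integer.MAX_VALUE, that is, all buckets are returned. This applies to older releases; Elasticsearch 5.0 and later no longer accept size 0 for this aggregation.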

2. Ordering

The order parameter customizes how buckets are sorted. By default, buckets are sorted by doc_count in descending order.

GET /testindex/testtype/_search
{
  "size": 0, 
  "aggs": {
    "test_terms": {
      "terms": {
        "field": "field1",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
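On a single in-memory list, the combination of size and `_count` descending order behaves much like `Counter.most_common`. The sketch below is only an analogy with made-up values. It does not model how Elasticsearch merges per-shard results.

```python
from collections import Counter


def top_terms(values, size):
    """Return the size most frequent values with their counts, highest first."""
    return Counter(values).most_common(size)


values = ["water", "fire", "water", "earth", "water", "fire"]
top = top_terms(values, size=2)
```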

3. Filtering

GET /testindex/testtype/_search
{
  "size": 0, 
  "aggs": {
    "test_terms": {
      "terms": {
        "field": "field1",
        "size": 10,
        "include": ".*",
        "exclude": "water.*"
      }
    }
  }
}
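The include and exclude parameters take regular expressions that are matched against the whole term, and exclude takes precedence when both match. The sketch below approximates this with Python's `re.fullmatch`, assuming the include pattern is meant as the match-all regex `.*`. Python and Lucene regex syntax differ in general, but coincide for these simple patterns. The function name and sample terms are made up.

```python
import re


def filter_terms(terms, include=".*", exclude=None):
    """Keep terms that fully match include and do not fully match exclude."""
    kept = []
    for term in terms:
        if not re.fullmatch(include, term):
            continue
        if exclude is not None and re.fullmatch(exclude, term):
            continue
        kept.append(term)
    return kept


terms = ["water", "water_sports", "fire", "earth"]
kept = filter_terms(terms, include=".*", exclude="water.*")
```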