Switching up the style this time ^_^

A Case of GC overhead limit exceeded Triggered by Hive Multi Insert


Suppose you need to compute several different aggregates from one Hive table and land each result in its own summary table. The natural tool is Hive's Multi Insert statement, since a single Multi Insert scans the source table once instead of once per output. This post records a GC overhead limit exceeded error I ran into with exactly such a statement.

Problem Description

My task was to aggregate a domain-statistics table along several dimension combinations and write each aggregate into its own table. Here is my SQL:

FROM qbox_bi_gold.domain_info INPUT
             INSERT OVERWRITE TABLE 5min PARTITION (day="20151130")
                  SELECT cast(time/300000 as bigint)*300000 AS time  , SUM(flow) AS flow,SUM(hits) AS hits
                  WHERE day="20151130"
                  GROUP BY cast(time/300000 as bigint)*300000
             INSERT OVERWRITE TABLE prov_5min PARTITION (day="20151130")
                   SELECT cast(time/300000 as bigint)*300000 AS time ,prov, SUM(flow) AS flow,SUM(hits) AS hits
                   WHERE day="20151130"
                   GROUP BY cast(time/300000 as bigint)*300000 ,prov
             INSERT OVERWRITE TABLE prov_uid_5min PARTITION (day="20151130")
                    SELECT cast(time/300000 as bigint)*300000 AS time ,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                    WHERE day="20151130"
                    GROUP BY cast(time/300000 as bigint)*300000 ,prov,uid
              INSERT OVERWRITE TABLE bucket_prov_uid_5min PARTITION (day="20151130")
                    SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                    WHERE day="20151130"
                    GROUP BY cast(time/300000 as bigint)*300000 ,bucket,prov,uid
              INSERT OVERWRITE TABLE bucket_domain_prov_uid_5min PARTITION (day="20151130")
                    SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,domain,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                    WHERE day="20151130"
                    GROUP BY cast(time/300000 as bigint)*300000 ,bucket,domain,prov,uid
              INSERT OVERWRITE TABLE bucket_city_domain_prov_uid_5min PARTITION (day="20151130")
                    SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,city,domain,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                    WHERE day="20151130"
                    GROUP BY cast(time/300000 as bigint)*300000 ,bucket,city,domain,prov,uid

The statement above produces six MapReduce jobs. You can see how Hive breaks it down by running EXPLAIN on the query.
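A minimal sketch of doing that (EXPLAIN simply prefixes the statement and prints the stage plan without executing anything; only the first branch is spelled out here, the other five follow exactly as above):

EXPLAIN
FROM qbox_bi_gold.domain_info INPUT
     INSERT OVERWRITE TABLE 5min PARTITION (day="20151130")
          SELECT cast(time/300000 as bigint)*300000 AS time, SUM(flow) AS flow, SUM(hits) AS hits
          WHERE day="20151130"
          GROUP BY cast(time/300000 as bigint)*300000
     -- ... the remaining five INSERT branches, exactly as above ...
;

For the full statement the output is: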

STAGE DEPENDENCIES:
  Stage-6 is a root stage
  Stage-0 depends on stages: Stage-6
  Stage-7 depends on stages: Stage-0
  Stage-8 depends on stages: Stage-6
  Stage-1 depends on stages: Stage-8
  Stage-9 depends on stages: Stage-1
  Stage-10 depends on stages: Stage-6
  Stage-2 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-2
  Stage-12 depends on stages: Stage-6
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-3
  Stage-14 depends on stages: Stage-6
  Stage-4 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-4
  Stage-16 depends on stages: Stage-6
  Stage-5 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-6
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: input
            Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: time (type: bigint), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: bigint)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: bigint)
                  Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint), _col2 (type: bigint)
            Select Operator
              expressions: time (type: bigint), prov (type: string), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, prov, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), prov (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator
              expressions: time (type: bigint), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, prov, uid, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), prov (type: string), uid (type: int)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator
              expressions: time (type: bigint), bucket (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, bucket, prov, uid, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), prov (type: string), uid (type: int)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator
              expressions: time (type: bigint), bucket (type: string), domain (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, bucket, domain, prov, uid, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), domain (type: string), prov (type: string), uid (type: int)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Select Operator
              expressions: time (type: bigint), bucket (type: string), city (type: string), domain (type: string), prov (type: string), uid (type: int), flow (type: bigint), hits (type: bigint)
              outputColumnNames: time, bucket, city, domain, prov, uid, flow, hits
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(flow), sum(hits)
                keys: (UDFToLong((time / 300000)) * 300000) (type: bigint), bucket (type: string), city (type: string), domain (type: string), prov (type: string), uid (type: int)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.5min

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.5min

  Stage: Stage-7
    Stats-Aggr Operator

  Stage: Stage-8
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint), _col1 (type: string)
              sort order: ++
              Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string)
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col2 (type: bigint), _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint), KEY._col1 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: bigint), _col3 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.prov_5min

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.prov_5min

  Stage: Stage-9
    Stats-Aggr Operator

  Stage: Stage-10
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: int)
              sort order: +++
              Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: int)
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col3 (type: bigint), _col4 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.prov_uid_5min

  Stage: Stage-2
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.prov_uid_5min

  Stage: Stage-11
    Stats-Aggr Operator

  Stage: Stage-12
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int)
              sort order: ++++
              Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int)
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col4 (type: bigint), _col5 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: int), _col4 (type: bigint), _col5 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.bucket_prov_uid_5min

  Stage: Stage-3
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.bucket_prov_uid_5min

  Stage: Stage-13
    Stats-Aggr Operator

  Stage: Stage-14
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)
              sort order: +++++
              Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col5 (type: bigint), _col6 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: bigint), _col6 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.bucket_domain_prov_uid_5min

  Stage: Stage-4
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.bucket_domain_prov_uid_5min

  Stage: Stage-15
    Stats-Aggr Operator

  Stage: Stage-16
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int)
              sort order: ++++++
              Map-reduce partition columns: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int)
              Statistics: Num rows: 248368802 Data size: 106301849426 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col6 (type: bigint), _col7 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1)
          keys: KEY._col0 (type: bigint), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: string), KEY._col5 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
          Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col6 (type: bigint), _col7 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
            Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 124184401 Data size: 53150924713 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: domain_areav1.bucket_city_domain_prov_uid_5min

  Stage: Stage-5
    Move Operator
      tables:
          partition:
            day 20151130
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: domain_areav1.bucket_city_domain_prov_uid_5min

  Stage: Stage-17
    Stats-Aggr Operator

As the plan shows, Stage-6 is a root stage: it is the first job that must complete, and every other stage hangs off it. And that is exactly where things went wrong: GC overhead limit exceeded! The failed job's history shows the failure occurred in the map phase.

...
map = 99%,  reduce = 33%, Cumulative CPU 9676.12 sec
map = 100%,  reduce = 100%, Cumulative CPU 9686.12 sec

So the failure is indeed in the map phase. Let's look at the stack trace:

2015-12-01 18:21:02,424 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:250)
	at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:230)
	at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:69)
	at java.io.InputStreamReader.<init>(InputStreamReader.java:74)
	at java.io.FileReader.<init>(FileReader.java:72)
	at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:381)
	at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:162)
	at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:839)
	at org.apache.hadoop.mapred.Task.updateCounters(Task.java:978)
	at org.apache.hadoop.mapred.Task.access$500(Task.java:77)
	at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:727)
	at java.lang.Thread.run(Thread.java:745)

An OutOfMemoryError: GC overhead limit exceeded in the map phase. (Note that the trace points at the task's communication thread, which merely happened to be allocating when the heap ran out; the memory itself is being consumed by the map-side work.)

Problem Analysis

Everyone knows the generic cure for an OOM: add more memory! Well, we are not made of money, and more memory does not always fix an OOM anyway, so let's look for the underlying cause. How should we approach an OOM? First, be clear about the possible causes: 1. the program genuinely needs more memory than it has been given; 2. the program leaks memory or uses it inefficiently. Anyone who aspires to be a serious programmer should start with the second. So let's analyze:

The Hive runtime environment

46 machines running Ubuntu 12.04, each with 8 cores and 32 GB of RAM; Hadoop 2.2.0 and Hive 0.12; 100 GB+ of text data. The queue we use is capped at roughly 40% of total cluster capacity. The statement above launches around 380 map tasks and 120 reduce tasks, which by rights should not be a heavy load, yet it really did OOM. Since it is Hive itself doing the work rather than code we wrote, a memory leak is unlikely; the suspect is the HiveQL, and the first thing that comes to mind is the cost of Multi Insert. To test this, I ran the INSERT branches as separate single-insert statements, i.e. deleted all but one branch at a time.
The test code:

FROM qbox_bi_gold.domain_info INPUT
                      INSERT OVERWRITE TABLE 5min PARTITION (day="20151130")
                            SELECT cast(time/300000 as bigint)*300000 AS time  , SUM(flow) AS flow,SUM(hits) AS hits
                            WHERE day="20151130"
                            GROUP BY cast(time/300000 as bigint)*300000

FROM qbox_bi_gold.domain_info INPUT
           INSERT OVERWRITE TABLE prov_5min PARTITION (day="20151130")
                 SELECT cast(time/300000 as bigint)*300000 AS time ,prov, SUM(flow) AS flow,SUM(hits) AS hits
                 WHERE day="20151130"
                 GROUP BY cast(time/300000 as bigint)*300000 ,prov

FROM qbox_bi_gold.domain_info INPUT
         INSERT OVERWRITE TABLE prov_uid_5min PARTITION (day="20151130")
                            SELECT cast(time/300000 as bigint)*300000 AS time ,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                            WHERE day="20151130"
                            GROUP BY cast(time/300000 as bigint)*300000 ,prov,uid

FROM qbox_bi_gold.domain_info INPUT
          INSERT OVERWRITE TABLE bucket_prov_uid_5min PARTITION (day="20151130")
                            SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                            WHERE day="20151130"
                            GROUP BY cast(time/300000 as bigint)*300000 ,bucket,prov,uid

FROM qbox_bi_gold.domain_info INPUT
          INSERT OVERWRITE TABLE bucket_domain_prov_uid_5min PARTITION (day="20151130")
                            SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,domain,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                            WHERE day="20151130"
                            GROUP BY cast(time/300000 as bigint)*300000 ,bucket,domain,prov,uid
FROM qbox_bi_gold.domain_info INPUT
          INSERT OVERWRITE TABLE bucket_city_domain_prov_uid_5min PARTITION (day="20151130")
                            SELECT cast(time/300000 as bigint)*300000 AS time ,bucket,city,domain,prov,uid, SUM(flow) AS flow,SUM(hits) AS hits
                            WHERE day="20151130"
                            GROUP BY cast(time/300000 as bigint)*300000 ,bucket,city,domain,prov,uid

Every single-insert version ran to completion. In other words, the SQL itself is not broken; it is the Multi Insert form that is memory-hungry enough to OOM. The plan above shows why: in Stage-6 a single map task runs all six hash-mode Group By operators side by side, each holding its own in-memory hash table of partial aggregates, so the map heap carries roughly six aggregations' worth of state at once. The remaining cause, then, is that the memory we give each MapReduce task is too small. Let's check what is actually configured; in the hive CLI:

hive> set mapreduce.map.java.opts;
mapreduce.map.java.opts=-Xmx1500m

hive> set mapreduce.reduce.java.opts;
mapreduce.reduce.java.opts=-Xmx2048m

hive> set mapreduce.map.memory.mb;
mapreduce.map.memory.mb=2048

hive> set mapreduce.reduce.memory.mb;
mapreduce.reduce.memory.mb=3072

Our failure is a map-phase OOM, so it is the map-side heap that is too small (mapreduce.map.java.opts=-Xmx1500m, i.e. a 1.5 GB heap). The fix is to set it higher, but it must stay below the map container limit, mapreduce.map.memory.mb (2 GB here), or YARN will kill the container.
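As a quick sketch of the headroom arithmetic (the figures come from the settings above; how much non-heap overhead a task really needs is workload-dependent):

-- The map container is mapreduce.map.memory.mb = 2048 MB. The JVM heap must
-- fit inside it together with thread stacks, permgen and native buffers,
-- so -Xmx has to stay below 2048m. Raising it from 1500m to 1800m still
-- leaves 2048 - 1800 = 248 MB of non-heap headroom.
set mapreduce.map.java.opts=-Xmx1800m;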

Summary:

For an OOM caused by memory pressure, there are two angles of attack:

  • Does the program leak memory?
  • Is the configured memory genuinely too small?

Start with the first and rule out a problem in the program. In the case above, the Multi Insert left the maps without enough memory for GC to make progress. At this point you may ask: why GC overhead limit exceeded rather than Java heap space?

What GC overhead limit exceeded means:

1. The error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

2. Explanation:
An error type introduced in JDK 6. It is thrown when the JVM spends an excessive share of its time in garbage collection while reclaiming almost no space (by default, over 98% of time in GC recovering less than 2% of the heap). It usually means the heap is too small for the workload; the root cause is simply not enough memory.

3. Remedies:
1. Check for code that retains large amounts of memory, or a loop that allocates without bound.
2. Disable the check with the JVM flag -XX:-UseGCOverheadLimit; note this does not grant more memory, it only suppresses this particular error (the job may then fail later with Java heap space instead).

So for this case, my fix was:

set mapreduce.map.java.opts=-Xmx1800m -XX:-UseGCOverheadLimit
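Note this takes effect for the current hive session only (put it in your script before the query, or in the cluster config to make it permanent). You can verify what is in effect the same way as above:

hive> set mapreduce.map.java.opts;
mapreduce.map.java.opts=-Xmx1800m -XX:-UseGCOverheadLimit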

This post is not about JVM internals, so I will not go deeper here. As for optimizing Multi Insert itself: when every branch inserts into the same table, the Multi Insert can be replaced with a UNION ALL; that does not apply to this case, where each branch writes to a different table. For more on the trade-off, see 《合理使用union all与multi insert (Hive 优化)》.
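For illustration, a hypothetical sketch of that rewrite. The target table stats_5min and its dim_type/dim_value discriminator columns are assumptions of mine, not from this post, and in Hive 0.12 the UNION ALL must sit inside a subquery:

INSERT OVERWRITE TABLE stats_5min PARTITION (day="20151130")    -- assumed table
SELECT u.dim_type, u.dim_value, u.time, u.flow, u.hits
FROM (
    SELECT 'total' AS dim_type, 'ALL' AS dim_value,
           cast(time/300000 as bigint)*300000 AS time,
           SUM(flow) AS flow, SUM(hits) AS hits
    FROM qbox_bi_gold.domain_info
    WHERE day="20151130"
    GROUP BY cast(time/300000 as bigint)*300000
    UNION ALL
    SELECT 'prov' AS dim_type, prov AS dim_value,
           cast(time/300000 as bigint)*300000 AS time,
           SUM(flow) AS flow, SUM(hits) AS hits
    FROM qbox_bi_gold.domain_info
    WHERE day="20151130"
    GROUP BY cast(time/300000 as bigint)*300000, prov
) u;

Note the trade-off: each UNION ALL branch re-reads the source table, so this is not automatically cheaper than a Multi Insert; measure both.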

Done!

References

深入理解OutOfMemoryError
合理使用union all与multi insert (Hive 优化)
Hadoop Yarn memory settings in HDInsight
7 Tips for Improving MapReduce Performance
