【Oracle数据库指南】第03篇:Oracle SQL分组统计与排序——GROUP BY、HAVING与ORDER BY深度解析
上一篇【第02篇】Oracle SQL查询高级技巧——条件与函数下一篇【第04篇】Oracle多表查询与连接操作——JOIN的全面解析摘要本文详细讲解Oracle SQL中的分组统计功能包括分组函数COUNT、SUM、AVG、MAX、MIN等的用法、GROUP BY子句的多列分组技巧、HAVING对分组结果的过滤以及ORDER BY的高级排序技巧。通过丰富的实战案例帮助读者掌握数据汇总与报表生成的核心技能。一、引言数据分析的核心需求之一是汇总统计。无论是统计各部门员工人数、计算各地区销售额还是找出最高薪资都离不开Oracle的分组统计功能。Oracle分组统计的核心语句结构如下SELECT 分组列, 分组函数 FROM 表名 WHERE 行级过滤条件分组前过滤 GROUP BY 分组列 HAVING 分组过滤条件分组后过滤 ORDER BY 排序列理解这六个子句的执行顺序是掌握分组查询的关键FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY二、分组函数详解2.1 基础分组函数-- COUNT计数SELECTCOUNT(*)FROMemployees;-- 统计总行数包含NULLSELECTCOUNT(commission_pct)FROMemployees;-- 统计非NULL的值的个数SELECTCOUNT(DISTINCTdepartment_id)FROMemployees;-- 统计不重复的部门数-- SUM求和忽略NULLSELECTSUM(salary)FROMemployees;SELECTSUM(salary*NVL(commission_pct,0))AStotal_commissionFROMemployees;-- AVG平均值忽略NULLSELECTAVG(salary)FROMemployees;SELECTAVG(commission_pct)FROMemployees;-- 只计算有提成的员工-- 与包含NULL的平均值对比SELECTSUM(commission_pct)/COUNT(*)ASavg_including_nullFROMemployees;-- MAX/MIN最大值/最小值忽略NULL可用于日期和字符串SELECTMAX(salary),MIN(salary)FROMemployees;SELECTMAX(hire_date),MIN(hire_date)FROMemployees;-- 最近/最早入职日期SELECTMAX(last_name)FROMemployees;-- 字母顺序最后的姓名-- 同时使用多个分组函数SELECTCOUNT(*)AStotal_emp,COUNT(commission_pct)ASemp_with_comm,SUM(salary)AStotal_salary,ROUND(AVG(salary),2)ASavg_salary,MAX(salary)ASmax_salary,MIN(salary)ASmin_salary,MAX(salary)-MIN(salary)ASsalary_rangeFROMemployees;2.2 LISTAGG函数11g新增LISTAGG是Oracle 11g引入的强大聚合函数可以将多行数据合并为一行字符串。-- 将各部门的员工姓名合并为逗号分隔的字符串SELECTdepartment_id,COUNT(*)ASemp_count,LISTAGG(last_name,, )WITHINGROUP(ORDERBYlast_name)ASemp_namesFROMemployeesGROUPBYdepartment_idORDERBYdepartment_id;-- 输出示例-- DEPARTMENT_ID EMP_COUNT EMP_NAMES-- 10 1 Whalen-- 20 2 Fay, Hartstein-- 30 6 Baida, Colmenares, Himuro, Khoo, Raphaely, Tobias2.3 STDDEV和VARIANCE统计函数-- STDDEV标准差样本标准差SELECTdepartment_id,ROUND(AVG(salary),2)ASavg_salary,ROUND(STDDEV(salary),2)ASsalary_stddev,ROUND(VARIANCE(salary),2)ASsalary_varianceFROMemployeesGROUPBYdepartment_idHAVINGCOUNT(*)2ORDERBYdepartment_id;三、GROUP BY子句3.1 单列分组-- 按部门统计员工数量和平均薪资SELECTdepartment_id,COUNT(*)ASemp_count,ROUND(AVG(salary),2)ASavg_salaryFROMemployeesGROUPBYdepartment_idORDERBYdepartment_id;3.2 多列分组-- 按部门和职位分组统计SELECTdepartment_id,job_id,COUNT(*)ASemp_count,SUM(salary)AStotal_salary,ROUND(AVG(salary),2)ASavg_salaryFROMemployeesGROUPBYdepartment_id,job_idORDERBYdepartment_id,job_id;-- 注意SELECT中非聚合列必须出现在GROUP BY中-- 以下写法错误job_id未在GROUP BY中-- SELECT department_id, job_id, COUNT(*) FROM employees GROUP BY department_id; -- ERROR!3.3 在表达式上分组-- 按入职年份分组统计SELECTTO_CHAR(hire_date,YYYY)AShire_year,COUNT(*)ASemp_count,ROUND(AVG(salary),2)ASavg_salaryFROMemployeesGROUPBYTO_CHAR(hire_date,YYYY)ORDERBYhire_year;-- 按薪资范围分组需要CASE表达式SELECTCASEWHENsalary5000THEN低薪5000以下WHENsalaryBETWEEN5000AND10000THEN中薪5000-10000ELSE高薪10000以上ENDASsalary_range,COUNT(*)ASemp_count,ROUND(AVG(salary),2)ASavg_in_rangeFROMemployeesGROUPBYCASEWHENsalary5000THEN低薪5000以下WHENsalaryBETWEEN5000AND10000THEN中薪5000-10000ELSE高薪10000以上ENDORDERBYemp_countDESC;3.4 ROLLUP与CUBE小计与总计ROLLUP和CUBE是GROUP BY的扩展用于生成多层次的汇总报表。-- ROLLUP层次化小计-- 生成部门-职位明细 部门小计 总计SELECTNVL(TO_CHAR(department_id),所有部门)ASdept,NVL(job_id,所有职位)ASjob,COUNT(*)ASemp_count,SUM(salary)AStotal_salaryFROMemployeesGROUPBYROLLUP(department_id,job_id)ORDERBYdepartment_id,job_id;-- CUBE生成所有可能组合的汇总-- 生成部门-职位明细 部门小计 职位小计 总计SELECTNVL(TO_CHAR(department_id),所有)ASdept,NVL(job_id,所有)ASjob,COUNT(*)ASemp_count,SUM(salary)AStotal_salaryFROMemployeesGROUPBYCUBE(department_id,job_id)ORDERBYdepartment_id NULLSLAST,job_id NULLSLAST;-- GROUPING函数标识某列是否是汇总行1汇总行0明细行SELECTCASEGROUPING(department_id)WHEN1THEN【总计】ELSETO_CHAR(department_id)ENDASdept_display,CASEGROUPING(job_id)WHEN1THEN【小计】ELSEjob_idENDASjob_display,COUNT(*)ASemp_count,SUM(salary)AStotal_salaryFROMemployeesGROUPBYROLLUP(department_id,job_id);3.5 GROUPING SETS指定分组集合-- 只生成指定的汇总维度比CUBE更精确SELECTdepartment_id,job_id,manager_id,COUNT(*)ASemp_count,SUM(salary)AStotal_salaryFROMemployeesGROUPBYGROUPING SETS((department_id,job_id),-- 按部门职位(department_id),-- 只按部门()-- 总计)ORDERBYdepartment_id,job_id;四、HAVING子句4.1 HAVING vs WHERE这是初学者最容易混淆的地方特性WHEREHAVING过滤时机分组前过滤行分组后过滤组可使用分组函数不可以可以执行顺序在GROUP BY之前在GROUP BY之后-- WHERE过滤单行分组前-- HAVING过滤分组结果SELECTdepartment_id,COUNT(*)ASemp_count,AVG(salary)ASavg_salaryFROMemployeesWHEREsalary3000-- 先过滤排除薪资≤3000的员工GROUPBYdepartment_idHAVINGCOUNT(*)3-- 再过滤只显示员工数≥3的部门ANDAVG(salary)6000-- 且平均薪资6000的部门ORDERBYavg_salaryDESC;-- 错误写法不能在WHERE中使用分组函数-- SELECT department_id, COUNT(*) FROM employees WHERE COUNT(*) 3 GROUP BY department_id; -- ERROR!4.2 HAVING的典型使用场景-- 找出有两个以上员工具有相同薪资的薪资值可能是薪资标准化的结果SELECTsalary,COUNT(*)ASemp_countFROMemployeesGROUPBYsalaryHAVINGCOUNT(*)1ORDERBYsalary;-- 找出平均薪资高于公司整体平均薪资的部门SELECTdepartment_id,ROUND(AVG(salary),2)ASdept_avg_salaryFROMemployeesGROUPBYdepartment_idHAVINGAVG(salary)(SELECTAVG(salary)FROMemployees)ORDERBYdept_avg_salaryDESC;-- 找出至少有一名员工有提成的部门SELECTdepartment_id,COUNT(*)AStotal_emp,COUNT(commission_pct)ASemp_with_commFROMemployeesGROUPBYdepartment_idHAVINGCOUNT(commission_pct)0ORDERBYemp_with_commDESC;五、ORDER BY高级排序5.1 基础排序-- 单列排序SELECTemployee_id,first_name,salaryFROMemployeesORDERBYsalary;-- 默认升序ASCORDERBYsalaryDESC;-- 降序-- 多列排序先按第一列排相同时按第二列排SELECTdepartment_id,salary,last_nameFROMemployeesORDERBYdepartment_idASC,salaryDESC,last_nameASC;-- 使用列号排序不推荐可读性差但在某些场景有用SELECTdepartment_id,salary,last_nameFROMemployeesORDERBY1,2DESC;5.2 NULL值的排序位置-- 默认情况下ASC排序时NULL排在最后DESC排序时NULL排在最前-- 使用NULLS FIRST/NULLS LAST控制NULL的位置SELECTemployee_id,commission_pctFROMemployeesORDERBYcommission_pctASCNULLSFIRST;-- NULL排最前SELECTemployee_id,commission_pctFROMemployeesORDERBYcommission_pctDESCNULLSLAST;-- NULL排最后5.3 自定义排序顺序-- 使用DECODE或CASE实现自定义排序顺序-- 需求按照IT、HR、Finance、Sales、其他的顺序排列部门SELECTd.department_name,COUNT(e.employee_id)ASemp_countFROMdepartments dLEFTJOINemployees eONd.department_ide.department_idGROUPBYd.department_nameORDERBYDECODE(d.department_name,IT,1,Human Resources,2,Finance,3,Sales,4,99)-- 其他部门排最后;5.4 ORDER BY与ROWNUM的配合分页查询-- Oracle 11g中实现分页查询的标准写法-- 查询按薪资降序排列的第11-20名员工SELECT*FROM(SELECTe.*,ROWNUMASrnFROM(SELECTemployee_id,first_name,last_name,salaryFROMemployeesORDERBYsalaryDESC)eWHEREROWNUM20-- 先取前20行)WHERErn11;-- 再取第11行及以后-- 注意不能直接 WHERE ROWNUM BETWEEN 11 AND 20因为ROWNUM是伪列-- 必须使用嵌套查询的方式-- 参数化分页每页显示pageSize行取第pageNum页-- pageNum 2, pageSize 10SELECT*FROM(SELECTe.*,ROWNUMASrnFROM(SELECTemployee_id,first_name,salaryFROMemployeesORDERBYsalaryDESC)eWHEREROWNUM2*10-- pageNum * pageSize)WHERErn(2-1)*10;-- (pageNum - 1) * pageSize六、综合实战案例案例一部门薪资分析报表-- 生成完整的部门薪资分析报表SELECTNVL(TO_CHAR(e.department_id),未分配)AS部门编号,NVL(d.department_name,未分配)AS部门名称,COUNT(e.employee_id)AS员工人数,TO_CHAR(SUM(e.salary),999,999)AS薪资总额,TO_CHAR(ROUND(AVG(e.salary),2),99,999.99)AS平均薪资,TO_CHAR(MAX(e.salary),99,999)AS最高薪资,TO_CHAR(MIN(e.salary),99,999)AS最低薪资,TO_CHAR(MAX(e.salary)-MIN(e.salary),99,999)AS薪资差距,ROUND(STDDEV(e.salary),2)AS薪资标准差,COUNT(e.commission_pct)AS有提成人数,ROUND(COUNT(e.commission_pct)*100.0/COUNT(*),1)||%AS有提成比例FROMemployees eLEFTJOINdepartments dONe.department_idd.department_idGROUPBYe.department_id,d.department_nameHAVINGCOUNT(e.employee_id)0ORDERBYSUM(e.salary)DESC;案例二员工入职趋势分析-- 按年份统计员工入职趋势SELECTTO_CHAR(hire_date,YYYY)AS入职年份,COUNT(*)AS入职人数,ROUND(AVG(salary),2)AS平均起薪,-- 生成简单的柱状图RPAD(█,COUNT(*),█)AS趋势图,-- 计算当年入职人数占总人数的百分比ROUND(COUNT(*)*100.0/(SELECTCOUNT(*)FROMemployees),1)||%AS占比FROMemployeesGROUPBYTO_CHAR(hire_date,YYYY)ORDERBY入职年份;案例三TOP N分析各部门薪资前3名-- 找出每个部门薪资前3名的员工-- 使用RANK()或ROW_NUMBER()分析函数将在进阶篇详细讲解SELECT*FROM(SELECTdepartment_id,employee_id,first_name|| ||last_nameASfull_name,salary,RANK()OVER(PARTITIONBYdepartment_idORDERBYsalaryDESC)ASsalary_rankFROMemployees)WHEREsalary_rank3ORDERBYdepartment_id,salary_rank;七、常见问题与陷阱Q1GROUP BY中能用列别名吗-- 错误GROUP BY不能使用SELECT中定义的别名SELECTTO_CHAR(hire_date,YYYY)AShire_year,COUNT(*)FROMemployeesGROUPBYhire_year;-- ORA-00904: HIRE_YEAR: invalid identifier-- 正确必须重复表达式SELECTTO_CHAR(hire_date,YYYY)AShire_year,COUNT(*)FROMemployeesGROUPBYTO_CHAR(hire_date,YYYY);Q2COUNT(*) vs COUNT(列名) vs COUNT(1)-- COUNT(*)统计所有行数包括NULL值的行-- COUNT(列名)统计该列非NULL值的行数-- COUNT(1)与COUNT(*)等价性能也相同优化器会做等价转换-- COUNT(DISTINCT 列名)统计不重复的非NULL值的个数SELECTCOUNT(*)AStotal_rows,COUNT(manager_id)ASrows_with_manager,COUNT(*)-COUNT(manager_id)ASrows_without_managerFROMemployees;Q3HAVING能不能引用WHERE已过滤的列-- HAVING可以引用任何可以在查询中使用的表达式-- 但注意WHERE先执行所以HAVING看到的数据已经是WHERE过滤后的结果SELECTdepartment_id,AVG(salary)FROMemployeesWHEREjob_id!SA_REP-- 先排除销售代表GROUPBYdepartment_idHAVINGAVG(salary)7000;-- 对排除后的数据再做汇总过滤八、最佳实践先WHERE后HAVING能用WHERE过滤的条件不要放在HAVING里。WHERE在分组前过滤减少参与分组的数据量性能更好。分组键的索引GROUP BY列上的索引可以显著提升分组排序性能。ROLLUP替代多次查询需要小计和合计时使用ROLLUP一次查询比分别写多个SQL高效。避免分组过多GROUP BY字段过多会导致结果集膨胀要根据实际业务需求设计合理的分组粒度。ORDER BY放最后ORDER BY是最后执行的操作排序代价较高确保只在必要时使用。九、总结本文系统讲解了Oracle SQL分组统计的核心知识分组函数COUNT、SUM、AVG、MAX、MIN、LISTAGG等GROUP BY单列分组、多列分组、ROLLUP/CUBE/GROUPING SETS高级分组HAVING分组结果的过滤与WHERE的本质区别ORDER BY多列排序、NULL值处理、自定义排序、分页查询理解执行顺序FROM→WHERE→GROUP BY→HAVING→SELECT→ORDER BY是正确编写分组查询的基础。下一篇将进入多表查询与连接操作的深度解析。上一篇【第02篇】Oracle SQL查询高级技巧——条件与函数下一篇【第04篇】Oracle多表查询与连接操作——JOIN的全面解析参考资料《Oracle 11g数据库管理员指南》— 刘宪军著Oracle官方文档SQL Language Reference - SELECT StatementOracle官方文档SQL Language Reference - Aggregate Functions