PostgreSQL数据类型，太多了，记不住，看这一篇就够

菜小编

数据结构与算法

2022-9-21

本文将介绍PostgreSQL以下数据类型

整数类型（integer types）
任意精度类型（arbitrary precision numbers）
浮点数类型（floating-point types）
序列类型（serial types）
金额类型（monetary types）
字符类型（character types）
二进制数据类型（binary data types）
日期/时间类型（date/time types）
布尔类型（boolean type）
枚举类型（enumerated types）
几何类型（geometric types）
文本搜索类型（text search types）
JSON 类型（JSON types）
数组类型（Array types）
组合类型（Composite types）
范围类型（range types）
领域类型（domain types）

长文预警-------------------

整数类型（integer types）

integer，4字节，应该成为数字类型的首选，例如存储金额，可以以分为单位存储
smallint，2字节，更节省磁盘空间
bigint，8字节，能够存储的数字范围更大

任意精度类型（arbitrary precision numbers）

关键特性：占用存储空间可变

语法

NUMERIC(precision, scale)

precision，总精度，可以存储的总位数
scale，小数点后的位数

NUMERIC类型适用于要求精确的场景，例如金融领域。

对NUMERIC类型的计算，如加、减、乘是精确的，但精确度的提升带来的是速度的下降，相比INTEGER类型和FLOAT类型，NUMERIC类型的计算速度十分缓慢。

对于小数位超过scale的情况，超过的部分会被舍入，对于整数部分超过限制的情况，会引发错误。

浮点数类型（floating-point types）

占用空间

float4，单精度，4字节
float8，双精度，8字节

浮点数，精确度低于NUMERIC，浮点数的精确性问题普遍存在于各类编程语言中。

序列类型（serial types）

占用空间

smallserial，2字节
serial，4字节
bigserial，8字节

序列类型通常用于自增ID，最好为这类字段加上UNIQUE，或PRIMARY KEY约束，这些约束不是自动生成的。

SERIAL类型的字段可能出现“空洞”，即使没有删除过任何数据。通过nextval()获取到的值会被消耗掉，即使获取到的值没有成功写入，例如当事务回滚了。

金额类型（monetary types）

语法 money 存储空间 8字节描述存储货币金额精确度同NUMERIC，默认有2位小数

字符类型（character types）

varchar(n)，有长度限制的字符串
char(n)，定长字符串，长度不足则向后填充空白字符
text，不限长度

char(n)类型，长度不足时向后填充空白，存储和展示都包含空白。

比较两个char类型的数据时，空白字符会被忽略。

mydb=# SELECT '12'::char(10);
  bpchar  
------------
12        
(1 row)

mydb=# SELECT '12'::char(20)='12'::char(2);
?column?
----------
t
(1 row)

mydb=# SELECT '12'::char(20)='12'::char(10);
?column?
----------
t

将char类型的数据转为varchar或text时，尾部的空白会被删除。

mydb=# SELECT '12'::char(10)::varchar(3);
varchar
---------
12
(1 row)

字符串类型的字段需要额外的字节存储，126字节以内需要1字节，更长的需要4字节。

超长的字符串会被存储在“隐藏表”（background table），所以不会影响对于短字段的频繁读取。

长字符串会被自动压缩。最长可以存储长为1GB的字符串。

这三种数据类型没有性能差异。

varchar(n)可以由数据库校验字符串长度，超长会引发错误。通常情况下，char(n)是三种类型中最慢的，因为它需要额外的存储消耗（填充空白字符）。相比char，varchar和text更加常用。

二进制数据类型（binary data types）

TODO

日期/时间类型（date/time types）

timestamp with time zone 简写为 timestamptz ：存储时以UTC时间存储，展示时转换为当前系统时区或指定时区。
timestamp without time zone 简写为 timestamp，没有时区，展示时需要指定时区。
time with time zone有些尴尬，它只记录的当天的时间点而不包含当天的日期，当做时区转换时，时间点可能会跨到昨天或明天，这样看起来是容易混淆的。PostgreSQL不建议使用time with time zone。
time without time zone，目前还没看出它适用的场景。
date，日期类型，不能指定time zone。想象一下+8时区的某一天的早上5点，是-8时区前一天的下午，这也是个问题。
interval，时间间隔，可以按年月日时分秒计算

pgsql 时区配置位于 postgresql.conf

timezone = 'Asia/Shanghai'

可以通过 SET TIME ZONE命令设置当前会话的时区

mydb=# create table test_datetime (
ts timestamp, 
tstz timestamp with time zone, 
period interval
);
mydb=# \d test_datetime;
                    Table "public.test_datetime"
Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
ts     | timestamp without time zone |           |          |
tstz   | timestamp with time zone    |           |          |
period | interval                    |           |          |
mydb=# insert into test_datetime (ts,tstz,period) values 
('2019-12-12 11:30:30', '2019-12-12 11:30:30', 'P0000-00-00T00:10:00');
mydb=# select * from test_datetime;
        ts          |          tstz          |  period  
---------------------+------------------------+----------
2019-12-12 11:30:30 | 2019-12-12 11:30:30+08 | 00:10:00
mydb=# select tstz AT TIME ZONE 'UTC' from test_datetime;
     timezone      
---------------------
2019-12-12 03:30:30
mydb=# select ts AT TIME ZONE 'UTC' from test_datetime;
       timezone        
------------------------
2019-12-12 19:30:30+08
mydb=# select ts AT TIME ZONE 'America/New_York' from test_datetime;
       timezone        
------------------------
2019-12-13 00:30:30+08
mydb=# select tstz AT TIME ZONE 'America/New_York' from test_datetime;
     timezone      
---------------------
2019-12-11 22:30:30
mydb=# select tstz AT TIME ZONE 'UTC' from test_datetime;
     timezone      
---------------------
2019-12-12 03:30:30

timestamp without time zone，通过AT TIME ZONE转换时，时间会被认为是指定的time zone的当前时间，例如

mydb=# select ts AT TIME ZONE 'America/New_York' from test_datetime;
       timezone        
------------------------
2019-12-13 00:30:30+08

会被认为是纽约当地时间 2019-12-12 11:30:30，转换成+8时区时间就是 2019-12-13 00:30:30+08。

由此可见，使用timestamptz是更合理的选择，前提是确保postgresql.conf中的timezone配置是正确的。

timestamp(p)用于控制精度，p的取值范围0-6，最大精度微秒，p为0时精确到秒。

布尔类型（boolean type）

用于表示true或false

mydb=# \d test_bool;
            Table "public.test_bool"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
flag   | boolean |           |          |
remark | text    |           |          |
mydb=# insert into test_bool (flag, remark) values ('t', 'True'), ('f', 'False');
mydb=# select * from test_bool;
flag | remark
------+--------
t    | True
f    | False
mydb=# select * from test_bool where flag IS TRUE;
flag | remark
------+--------
t    | True
mydb=# select * from test_bool where flag IS FALSE;
flag | remark
------+--------
f    | False
mydb=# select * from test_bool where flag;
flag | remark
------+--------
t    | True

枚举类型（enumerated types）

枚举类型在PostgreSQL中属于自定义类型，包含一组静态且有序的值。

mydb=# CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
mydb=# CREATE TABLE person (name text, current_mood mood);
mydb=# INSERT INTO person (name, current_mood) VALUES ('me', 'happy');
mydb=# SELECT * FROM person;
name | current_mood
------+--------------
me   | happy
mydb=# INSERT INTO person (name, current_mood) VALUES ('me', 'happier');
ERROR:  invalid input value for enum mood: "happier"
LINE 1: ...RT INTO person (name, current_mood) VALUES ('me', 'happier')...

枚举类型支持排序、比较

mydb=# SELECT * FROM person;
name | current_mood
------+--------------
me   | happy
Moe  | ok
Mary | sad
mydb=# SELECT * FROM person ORDER BY current_mood;
name | current_mood
------+--------------
Mary | sad
Moe  | ok
me   | happy
mydb=# SELECT * FROM person WHERE current_mood>='ok';
name | current_mood
------+--------------
me   | happy
Moe  | ok

枚举类型支持所有比较操作符和MIN()、MAX()等聚合函数。

枚举值的大小，和定义时指定的先后顺序相同。

枚举是类型安全的，不同的枚举类型之间相互独立，不能直接比较

mydb=# CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic');
mydb=# CREATE TABLE holidays (
mydb(#     num_weeks integer,
mydb(#     happiness happiness
mydb(# );
mydb=# SELECT person.name, holidays.num_weeks FROM person, holidays
mydb-#   WHERE person.current_mood = holidays.happiness;
ERROR:  operator does not exist: mood = happiness
LINE 2:   WHERE person.current_mood = holidays.happiness;
                                   ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

显式转换成字符串类型后可以做比较

mydb=# SELECT person.name, holidays.num_weeks FROM
mydb-# person, holidays
mydb-# WHERE person.current_mood::text = holidays.happiness::text;
name | num_weeks
------+-----------
me   |         4
(1 row)

枚举类型的值对大小写敏感，例如happy与Happy是不同的值。

可以修改枚举类型的定义，重命名枚举值，添加新的值。

已经存在的枚举值不可删除或改变顺序。

mydb=# ALTER TYPE mood ADD VALUE IF NOT EXISTS 'happier' AFTER 'happy';
mydb=# \dT+ mood;
                                     List of data types
Schema | Name | Internal name | Size | Elements |  Owner   | Access privileges | Description
--------+------+---------------+------+----------+----------+-------------------+-------------
public | mood | mood          | 4    | sad     +| postgres |                   |
       |      |               |      | ok      +|          |                   |
       |      |               |      | happy   +|          |                   |
       |      |               |      | happier  |          |                   |
mydb=# ALTER TYPE mood RENAME VALUE 'ok' TO 'fine';
mydb=# \dT+ mood;
                                     List of data types
Schema | Name | Internal name | Size | Elements |  Owner   | Access privileges | Description
--------+------+---------------+------+----------+----------+-------------------+-------------
public | mood | mood          | 4    | sad     +| postgres |                   |
       |      |               |      | fine    +|          |                   |
       |      |               |      | happy   +|          |                   |
       |      |               |      | happier  |          |                   |
mydb=# ALTER TYPE mood ADD VALUE IF NOT EXISTS 'ok' BEFORE 'fine';
mydb=# \dT+ mood;
                                     List of data types
Schema | Name | Internal name | Size | Elements |  Owner   | Access privileges | Description
--------+------+---------------+------+----------+----------+-------------------+-------------
public | mood | mood          | 4    | sad     +| postgres |                   |
       |      |               |      | ok      +|          |                   |
       |      |               |      | fine    +|          |                   |
       |      |               |      | happy   +|          |                   |
       |      |               |      | happier  |          |                   |

几何类型（geometric types）

point，二维平面上的点，用(x, y)表示，x、y分别表示x轴、y轴的坐标值

mydb=# create table test_point (name text, pt point);
mydb=# INSERT INTO test_point (name, pt) VALUES ('p1', '(1.1,22.0)');
mydb=# INSERT INTO test_point (name, pt) VALUES ('p2', '(10.1,22.0)');
mydb=# INSERT INTO test_point (name, pt) VALUES ('p3', '(1.1,2.0)');
mydb=# SELECT * FROM test_point;
name |    pt    
------+-----------
p1   | (1.1,22)
p2   | (10.1,22)
p3   | (1.1,2)
mydb=# SELECT * FROM test_point WHERE pt ?- '(0,22)';
name |    pt    
------+-----------
p1   | (1.1,22)
p2   | (10.1,22)
mydb=# SELECT * FROM test_point WHERE pt ?| '(1.1,0)';
name |    pt    
------+----------
p1   | (1.1,22)
p3   | (1.1,2)

操作符

?-：是否横向齐平（y坐标是否相同）
?|：是否纵向齐平（x坐标是否相同）

circle，圆，用圆心和半径表示，如 x, y, r，其中(x，y)为圆心，r为半径。

mydb=# CREATE table test_circle (name text, cc circle);
mydb=# INSERT INTO test_circle (name, cc) VALUES ('c1', '1,2,2');
mydb=# INSERT INTO test_circle (name, cc) VALUES ('c1', '10.2,-15,10');
mydb=# SELECT * FROM test_circle WHERE point '(1,2)' <@ cc;
name |    cc    
------+-----------
c1   | <(1,2),2>
mydb=# SELECT * FROM test_circle WHERE cc @> point '(1,2)';
name |    cc    
------+-----------
c1   | <(1,2),2>
mydb=# SELECT * FROM test_circle WHERE cc << circle '10,6,3';
name |    cc    
------+-----------
c1   | <(1,2),2>
mydb=# SELECT * FROM test_circle WHERE cc >> circle '0,0,0.1';
name |       cc        
------+-----------------
c1   | <(10.2,-15),10>
mydb=# SELECT circle '((0,0),1)' <-> circle '((5,0),1)';
?column?
----------
       3

操作符

<@ 点是否位于圆内
@> 圆是否包含点
<< 圆是否严格在圆的左侧
>> 圆是否严格在圆的右侧
<-> 两个圆之间的距离

line，线，在坐标系上用Ax + By + C = 0表示，也就是一个二元一次方程，A、B表示斜率，C表示偏移量。

mydb=# CREATE TABLE test_line (name text, info line);=
mydb=# INSERT INTO test_line VALUES ('c1', '(0,1), (2,3)');
mydb=# INSERT INTO test_line VALUES ('c2', '(-5,6), (3,7)');
mydb=# SELECT * FROM test_line;
name |       info      
------+------------------
c1   | {1,-1,1}
c2   | {0.125,-1,6.625}
mydb=# SELECT * FROM test_line WHERE info ?|| '{1,-1,10}';
name |   info  
------+----------
c1   | {1,-1,1}
mydb=# SELECT * FROM test_line WHERE info ?-| '{1,1,10}';
name |   info  
------+----------
c1   | {1,-1,1}

操作符

?|| 两条线是否平行
?-| 两条线是否垂直

文本搜索类型（text search types）

tsvector，顾名思义，该数据类型存储的是一个vector（向量），包含了文本中的所有词元，vector中的元素是经过排序的。

tsquery，存储一组需要搜索的词元。

 mydb=# CREATE TABLE test_tsvector (name text, raw text, vector tsvector);

可以看出中文不支持自动分词（需要安装插件zhparser）。

可以在应用程序做好分词后再写入，这样可以降低数据库的运算压力。

mydb=# INSERT INTO test_tsvector (name, raw, vector) VALUES ('allen', 'I love swimming', to_tsvector('I love swimming'));
mydb=# INSERT INTO test_tsvector (name, raw, vector) VALUES ('李雷', '我喜欢游泳', to_tsvector('我喜欢游泳'));
mydb=# INSERT INTO test_tsvector (name, raw, vector) VALUES ('bob', 'I love football', to_tsvector('I love football'));
mydb=# INSERT INTO test_tsvector (name, raw, vector) VALUES ('kate', 'I love badminton', to_tsvector('I love badminton'));
mydb=# INSERT INTO test_tsvector (name,raw,vector) VALUES ('李雷','喜欢足球','喜欢:2 足球:3'::tsvector);
mydb=# SELECT * FROM test_tsvector;
name  |       raw        |         vector        
-------+------------------+------------------------
allen | I love swimming  | 'love':2 'swim':3
李雷  | 我喜欢游泳       | '我喜欢游泳':1
bob   | I love football  | 'footbal':3 'love':2
kate  | I love badminton | 'badminton':3 'l

条件查询

mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('喜欢 & 足球');
name |   raw    |      vector      
------+----------+-------------------
李雷 | 喜欢足球 | '喜欢':2 '足球':3
mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('足球');
name |   raw    |      vector      
------+----------+-------------------
李雷 | 喜欢足球 | '喜欢':2 '足球':3
mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('足球 & 喜欢');
name |   raw    |      vector      
------+----------+-------------------
李雷 | 喜欢足球 | '喜欢':2 '足球':3

非标准化输入

可以看出，大写会被忽略，变形单词会被提取为原词

mydb=# INSERT INTO test_tsvector (name,raw,vector) VALUES ('mary','love dancing',to_tsvector('Love Dancing'));
mydb=# SELECT * FROM test_tsvector WHERE name='mary';
name |     raw      |      vector      
------+--------------+-------------------
mary | love dancing | 'danc':2 'love':1
mydb=# INSERT INTO test_tsvector (name,raw,vector) VALUES ('annie','loves dancing',to_tsvector('Loves Dancing'));
mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('dancing');
name  |      raw      |      vector      
-------+---------------+-------------------
mary  | love dancing  | 'danc':2 'love':1
annie | loves dancing | 'danc':2 'love':1
mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('dance');
name  |      raw      |      vector      
-------+---------------+-------------------
mary  | love dancing  | 'danc':2 'love':1
annie | loves dancing | 'danc':2 'love':1
mydb=# SELECT * FROM test_tsvector WHERE vector @@ to_tsquery('足球 | dancing');
name  |      raw      |      vector      
-------+---------------+-------------------
李雷  | 喜欢足球      | '喜欢':2 '足球':3
mary  | love dancing  | 'danc':2 'love':1
annie | loves dancing | 'danc':2 'love':1

操作符及函数

@@： tsvector是否匹配tsquery
to_tsvector()：将文本转换为tsvector
to_tsquery()：将文本转换为tsquery

tsquery相关操作符

&&，与
||，或
!!，非
<->，跟随，用于有序匹配，例如to_tsquery('love') <-> to_tsquery('dance')

mydb=# SELECT * FROM test_tsvector WHERE vector @@ (to_tsquery('dance') || to_tsquery('足球'));
name  |      raw      |      vector      
-------+---------------+-------------------
李雷  | 喜欢足球      | '喜欢':2 '足球':3
mary  | love dancing  | 'danc':2 'love':1
annie | loves dancing | 'danc':2 'love':1
mydb=# SELECT * FROM test_tsvector WHERE vector @@ (to_tsquery('dance') && to_tsquery('足球'));
name | raw | vector
------+-----+--------
mydb=# SELECT * FROM test_tsvector WHERE vector @@ !!to_tsquery('dance');
name  |       raw        |         vector        
-------+------------------+------------------------
allen | I love swimming  | 'love':2 'swim':3
李雷  | 我喜欢游泳       | '我喜欢游泳':1
bob   | I love football  | 'footbal':3 'love':2
kate  | I love badminton | 'badminton':3 'love':2
李雷  | 喜欢足球         | '喜欢':2 '足球':3
mydb=# SELECT * FROM test_tsvector WHERE vector @@ (to_tsquery('love') <-> to_tsquery('dance'));
name  |      raw      |      vector      
-------+---------------+-------------------
mary  | love dancing  | 'danc':2 'love':1
annie | loves dancing | 'danc':2 'love':1
mydb=# SELECT * FROM test_tsvector WHERE vector @@ (to_tsquery('dance') <-> to_tsquery('love'));
name | raw | vector
------+-----+--------

创建索引

 mydb=# CREATE INDEX vector_idx ON test_tsvector USING GIN (vector);

TODO https://www.postgresql.org/docs/12/textsearch.html

JSON 类型（JSON types）

json，输入被存储为文本，不做任何处理，输出和输入完全相同。

jsonb：输入经过预处理，忽略一些不重要的因素，如空白字符。输出可能和输入不同

适用json、jsonb的操作符

-> int：选择array JSON中的元素，索引从0开始，参数为负数时从尾端开始查找
-> text：选择object JSON中的key
以上操作符返回json或jsonb，变体
->> int|text，返回text
#> text[]，按路径查找，例如 #> '{a,b}'，将查找 json.a.b，返回json|jsonb
#>> text[]，按路径查找，返回text

仅适用jsonb的操作符

@>： 是否包含，仅检查顶层key
<@： 是否被包含，仅检查顶层key
? text： 文本是否是顶层key，或JSON array中的元素
?| text[]：数组中的任一元素是否是顶层key，或JSON array中的元素
?& text[]：数组中的全部元素是否都是顶层key，或JSON array中的元素
@? jsonpath：JSON数据中是否有jsonpath匹配的元素
@@ jsonpath： 返回jsonpath对JSON数据的断言检查结果，只考虑JSON中第一个匹配的元素，如果返回值不是Boolean类型，则返回NULL

索引

GIN索引可以加快对JSON数据的查询，GIN的默认选项为jsonb_ops，支持?, ?&, ?| 和 @>

 CREATE INDEX idxgin ON api USING GIN (jdoc);

索引选项jsonb_path_ops仅支持 @>，@@，@? 。相比默认选项，该选项性能更好，索引的体积更小。

 CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);

示例数据，表名api

{
    "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
    "name": "Angela Barton",
    "is_active": true,
    "company": "Magnafone",
    "address": "178 Howard Place, Gulf, Washington, 702",
    "registered": "2009-11-07T08:53:22 +08:00",
    "latitude": 19.793713,
    "longitude": 86.513373,
    "tags": [
        "enim",
        "aliquip",
        "qui"
    ]
}

创建索引

 CREATE INDEX idxgin ON api USING GIN (jdoc);

以下查询可以用上索引 idxgin

SELECT jdoc->'guid',jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';

以下查询无法用上索引 idxgin，因为索引不是创建于jdoc->'tags'

 SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';

需要为jdoc->'tags'单独创建索引

 CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));

以下查询同样可以用上idxgintags索引

SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] ? (@ == "qui")'; 

SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';

按需创建索引，如单独为jdoc->'tags'创建索引，相比为jdoc整体创建索引，前者体积更小、速度更快。

官方解释：

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data. [6] Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it; for example to index {"foo": {"bar": "baz"}}, a single index item would be created incorporating all three of foo, bar, and baz into the hash value. Thus a containment query looking for this structure would result in an extremely specific index search; but there is no way at all to find out whether foo appears as a key. On the other hand, a jsonb_ops index would create three index items representing foo, bar, and baz separately; then to do the containment query, it would look for rows containing all three of these items. While GIN indexes can perform such an AND search fairly efficiently, it will still be less specific and slower than the equivalent jsonb_path_ops search, especially if there are a very large number of rows containing any single one of the three index items.
A disadvantage of the jsonb_path_ops approach is that it produces no index entries for JSON structures not containing any values, such as {"a": {}}. If a search for documents containing such a structure is requested, it will require a full-index scan, which is quite slow. jsonb_path_ops is therefore ill-suited for applications that often perform such searches.

jsonpath

用于定位JSON中的数据，示例

 {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}}

定位segments

 '$.track.segments'

定位segments包含的所有location，[*]将segments数组展开，类似于循环处理

 '$.track.segments[*].location'

定位segments包含的第一个location

 '$.track.segments[0].location'

计算segments元素数量

 '$.track.segments.size()'

jsonpath可以包含过滤语句，过滤语句以?开始，condition为过滤表达式

 ? (condition)

过滤HR大于100的结果

 '$.track.segments[*] ? (@.HR > 100)'

过滤HR大于100且小于120的结果

'$.track.segments[*] ? (@.HR > 100 && @.HR < 120)'

mydb=# CREATE TABLE test_json (name text, gps_json json, gps_jsonb jsonb);
mydb=# INSERT INTO test_json (name, gps_json, gps_jsonb) VALUES ('First', '{
mydb'#   "track": {
mydb'#     "segments": [
mydb'#       {
mydb'#         "location":   [ 47.763, 13.4034 ],
mydb'#         "start time": "2018-10-14 10:05:14",
mydb'#         "HR": 73
mydb'#       },
mydb'#       {
mydb'#         "location":   [ 47.706, 13.2635 ],
mydb'#         "start time": "2018-10-14 10:39:21",
mydb'#         "HR": 135
mydb'#       }
mydb'#     ]
mydb'#   }
mydb'# }'::json, '{
mydb'#   "track": {
mydb'#     "segments": [
mydb'#       {
mydb'#         "location":   [ 47.763, 13.4034 ],
mydb'#         "start time": "2018-10-14 10:05:14",
mydb'#         "HR": 73
mydb'#       },
mydb'#       {
mydb'#         "location":   [ 47.706, 13.2635 ],
mydb'#         "start time": "2018-10-14 10:39:21",
mydb'#         "HR": 135
mydb'#       }
mydb'#     ]
mydb'#   }
mydb'# }'::jsonb);
mydb=# SELECT gps_json FROM test_json;
-[ RECORD 1 ]------------------------------------------
gps_json | {                                           +
        |   "track": {                                +
        |     "segments": [                           +
        |       {                                     +
        |         "location":   [ 47.763, 13.4034 ],  +
        |         "start time": "2018-10-14 10:05:14",+
        |         "HR": 73                            +
        |       },                                    +
        |       {                                     +
        |         "location":   [ 47.706, 13.2635 ],  +
        |         "start time": "2018-10-14 10:39:21",+
        |         "HR": 135                           +
        |       }                                     +
        |     ]                                       +
        |   }                                         +
        | }
mydb=# SELECT gps_jsonb FROM test_json;
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gps_jsonb | {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}}
mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @? '$.track.segments[*] ? (@.HR > 100)';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
?column? | [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @? '$.track.segments[*] ? (@.HR > 180)';
(0 rows)
mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @@ '$.track.segments[0].HR > 10';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
?column? | [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]

@@ 和 @? 的区别

@? 用于判断jsonpath是否能够匹配到元素，例如下面的语句判断是有HR>100的元素。

mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @? '$.track.segments[*] ? (@.HR > 100)';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
?column? | [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @@ '$.track.segments[*] ? (@.HR > 100)';
(0 rows)

@@ 用于判断jsonpath的断言是否返回TRUE，例如下面的语句判断segments的元素数量是否大于3

mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @@ '$.track.segments.size()>3';
(0 rows)
mydb=# SELECT gps_jsonb->'track'->'segments' FROM test_json WHERE gps_jsonb @@ '$.track.segments.size()<3';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
?column? | [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]

jsonpath变量

$ 代表被查询的JSON
@ 用于filter condition，代表jsonpath匹配的元素

jsonpath accessors

.key: JSON的某个域
.*: 当前对象的所有顶层成员
.**: 当前对象的所有成员，递归获取，忽略嵌套层级
[*]: array对象的所有元素

数组类型（Array types）

一维数组

-- 整数型 
dates integer[] 
-- 字符串类型 names text[]

二维数组

dates integer[][] 
names text[][]

可以指定数组大小，但数据库不会对输入做任何限制，跟不设置大小没有区别

 names text[10]

创建表结构

 CREATE TABLE sal_emp (
 name text,
 pay_by_quarter integer[],
 schedule text[][]
 );

写入数据

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');
INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

查询数据

mydb=# SELECT * FROM sal_emp;
name  |      pay_by_quarter       |                 schedule                  
-------+---------------------------+-------------------------------------------
Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}

获取数组中的元素，默认情况下元素索引从1开始

mydb=# SELECT pay_by_quarter[1] FROM sal_emp;
pay_by_quarter
----------------
         10000
         20000
mydb=# SELECT * FROM sal_emp WHERE pay_by_quarter[1] != pay_by_quarter[2];
name  |      pay_by_quarter       |                 schedule                
-------+---------------------------+------------------------------------------
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}

获取数组的任意切片，以[lower-bound:upper-bound]指定索引范围，范围为闭区间

mydb=# SELECT schedule[1:2][1:1] FROM sal_emp WHERE name='Bill';
       schedule        
------------------------
{{meeting},{training}}

可以省略lower-bound或upper-bound，默认分别取1和最大索引

mydb=# SELECT schedule[:2][1:] FROM sal_emp WHERE name='Bill';
                schedule                  
-------------------------------------------
{{meeting,lunch},{training,presentation}}

不存在的索引会返回NULL，而不会引发错误

mydb=# SELECT schedule[5][5] IS NULL FROM sal_emp WHERE name='Bill';
?column?
----------
t

数组相关函数

array_dims - 返回array的维度，返回文本类型
array_lower - 返回数组某一维度的下界
array_upper - 返回数组某一维度的上界 array_length - 返回数组某一维度的长
cardinality - 返回数组的元素个数，统计所有维度

mydb=# SELECT array_dims(schedule) FROM sal_emp;
array_dims
------------
[1:2][1:2]
[1:2][1:2]
mydb=# SELECT array_lower(schedule, 1), array_lower(schedule, 2) FROM sal_emp;
array_lower | array_lower
-------------+-------------
          1 |           1
          1 |           1
mydb=# SELECT array_upper(schedule, 1), array_upper(schedule, 2) FROM sal_emp;
array_upper | array_upper
-------------+-------------
          2 |           2
          2 |           2
mydb=# SELECT array_length(schedule, 1), array_length(schedule, 2) FROM sal_emp;
array_length | array_length
--------------+--------------
           2 |            2
           2 |            2
mydb=# SELECT cardinality(pay_by_quarter), cardinality(schedule) FROM sal_emp;
cardinality | cardinality
-------------+-------------
          4 |           4
          4 |           4

array_lower存在的意义？

更新数组时可以设置索引不为1为基准，如

mydb=# UPDATE sal_emp SET pay_by_quarter[-1:0] = '{8000, 9000}' WHERE name='Bill';
mydb=# SELECT * FROM sal_emp WHERE name='Bill';
name |               pay_by_quarter               |                 schedule                  
------+--------------------------------------------+-------------------------------------------
Bill | [-1:4]={8000,9000,10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
mydb=# UPDATE sal_emp SET pay_by_quarter[-7:-6] = '{8000, 9000}' WHERE name='Bill';
mydb=# SELECT * FROM sal_emp WHERE name='Bill';
name |                              pay_by_quarter                              |                 schedule  
               
------+--------------------------------------------------------------------------+---------------------------
----------------
Bill | [-7:4]={8000,9000,NULL,NULL,NULL,NULL,8000,9000,10000,10000,10000,10000} | {{meeting,lunch},{training
,presentation}}

array拼接操作符||，可以看出，在首部和尾部拼接元素均不会改变array的lower-bound。

mydb=# SELECT 1 || '[0:1]={2,3}'::int[];
  ?column?    
---------------
[0:2]={1,2,3}
mydb=# SELECT '[0:1]={2,3}'::int[] || 1;
  ?column?    
---------------
[0:2]={2,3,1}

将一维数组拼接至二维数组时，一维数组将作为二维数组的第一维元素，以此类推至N维拼接至N+1维。

mydb=# SELECT ARRAY[1,2] || ARRAY[[3,4],[5,6]];
     ?column?      
---------------------
{{1,2},{3,4},{5,6}}
mydb=# SELECT ARRAY[[3,4],[5,6]] || ARRAY[1,2];
     ?column?      
---------------------
{{3,4},{5,6},{1,2}}

拼接函数

array_append - 拼接至尾部，适用一维数组
array_prepend - 拼接至首部，适用一维数组
array_cat - 拼接，适用多维数组

mydb=# SELECT ARRAY[1,2] || NULL;
?column?
----------
{1,2}
mydb=# SELECT array_append(ARRAY[1,2], NULL);
array_append
--------------
{1,2,NULL}
mydb=# SELECT NULL || ARRAY[1,2];
?column?
----------
{1,2}
mydb=# SELECT array_prepend(NULL, ARRAY[1,2]);
array_prepend
---------------
{NULL,1,2}
mydb=# SELECT array_cat(ARRAY[0,1], ARRAY[1,2]);
array_cat
-----------
{0,1,1,2}
mydb=# SELECT array_cat(ARRAY[0,1], ARRAY[2,NULL]);
 array_cat  
--------------
{0,1,2,NULL}

搜索数组元素

匹配数组中的任意元素

mydb=# SELECT * FROM sal_emp WHERE 20000 = ANY(pay_by_quarter);
name  |      pay_by_quarter       |                 schedule                
-------+---------------------------+------------------------------------------
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}

匹配数组中的所有元素

 mydb=# SELECT * FROM sal_emp WHERE 20000 = ALL(pay_by_quarter);
 name | pay_by_quarter | schedule
 ------+----------------+----------

判断数组是否有重合，操作符&&

mydb=# SELECT name,pay_by_quarter FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
name |                              pay_by_quarter                              
------+--------------------------------------------------------------------------
Bill | [-7:4]={8000,9000,NULL,NULL,NULL,NULL,8000,9000,10000,10000,10000,10000}
mydb=# SELECT name,pay_by_quarter FROM sal_emp WHERE pay_by_quarter && ARRAY[10000,20000];
name  |                              pay_by_quarter                              
-------+--------------------------------------------------------------------------
Carol | {20000,25000,25000,25000}
Bill  | [-7:4]={8000,9000,NULL,NULL,NULL,NULL,8000,9000,10000,10000,10000,10000}
mydb=# SELECT name,pay_by_quarter FROM sal_emp WHERE pay_by_quarter && ARRAY[30000];
name | pay_by_quarter
------+----------------

判断包含，操作符 @>

mydb=# SELECT name,pay_by_quarter FROM sal_emp WHERE pay_by_quarter @> ARRAY[10000];
name |                              pay_by_quarter                              
------+--------------------------------------------------------------------------
Bill | [-7:4]={8000,9000,NULL,NULL,NULL,NULL,8000,9000,10000,10000,10000,10000}

可以为array类型创建GIN索引

mydb=# CREATE INDEX idx_pay ON sal_emp USING GIN (pay_by_quarter);

组合类型（Composite types）

组合类型，将简单类型组合在一起，形成自定义类型。数据表的列可以定义为组合类型。创建组合类型，声明类型包含的字段和字段类型

 CREATE TYPE inventory_item AS (
 name            text,
 supplier_id     integer,
 price           numeric
 );

将表字段声明为组合类型

 CREATE TABLE on_hand (
 item      inventory_item,
 count     integer
 );
 INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000); SELECT * FROM on_hand;
 item          | count
 ------------------------+-------
 ("fuzzy dice",42,1.99) |  1000

ROW语句用于构建组合类型的值。创建数据表的同时，会创建同名的type，如

 -- 相当于同时执行了CREATE TYPE complex AS (r double precision, i double precision);
CREATE TABLE complex (r double precision, i double precision); 

CREATE TABLE my_complex (name text, value complex); 

INSERT INTO my_complex (name, value) VALUES ('one', ROW(1.0, 1.0)); 

SELECT * FROM my_complex;
 name | value
 ------+-------
 one  | (1,1)

查询组合类型，用()包围组合类型的列名，用.指向组合类型的“域”

 SELECT (value).r FROM my_complex WHERE (value).i=1;
r
---
1

创建索引

 CREATE INDEX idx_real ON my_complex (((value).r));
\d my_complex;
            Table "public.my_complex"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
name   | text    |           |          |
value  | complex |           |          |
Indexes:
   "idx_real" btree (((value).r))
CREATE INDEX idx_complex ON my_complex (value);
\d my_complex;
            Table "public.my_complex"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
name   | text    |           |          |
value  | complex |           |          |
Indexes:
   "idx_complex" btree (value)
   "idx_real" btree (((value).r))

范围类型（range types）

范围类型，用于表示基本类型的范围，如

int4range - 4字节类型范围
int8range - 8字节类型范围
numrange - numeric类型范围
tsrange - 不带timezone的时间戳范围
tstzrange - 带timezone的时间戳范围
daterange - 日期范围

 -- 创建数据表 
CREATE TABLE reservation (room int, during tstzrange);
 
-- 插入数据
INSERT INTO reservation (room, during) VALUES (101, '[2020-01-06 10:00:00, 2020-01-07 12:00:00)'); 

SELECT * FROM reservation;
 room |                       during                        
 ------+-----------------------------------------------------
 101 | ["2020-01-06 10:00:00+08","2020-01-07 12:00:00+08")

开区间、闭区间区间由(lower-bound, upper-bound)表示，“[”或"]"代表闭合，“(”或“)"代表打开 '[2020-01-06 10:00:00, 2020-01-07 12:00:00)'构造的区间，包含下界2020-01-06 10:00:00，不包含上界2020-01-07 12:00:00。

 SELECT * FROM reservation WHERE during @> '2020-01-07 12:00:00'::timestamptz;
 room | during
 ------+--------
 (0 rows) 

SELECT * FROM reservation WHERE during @> '2020-01-06 10:00:00'::timestamptz;
 room |                       during                        
 ------+-----------------------------------------------------
 101 | ["2020-01-06 10:00:00+08","2020-01-07 12:00:00+08")
 (1 row)

索引

 CREATE INDEX idx_during ON reservation USING GIST (during);

约束。以下约束将不允许任意两个during数据有重合部分。

 CREATE TABLE reservation (
 during tsrange,
 EXCLUDE USING GIST (during WITH &&)
 );

领域类型（domain types）

领域类型，基于已经存在的类型，通常用于给类型增加约束，使已存在的类型适用于特定“领域”。

CREATE DOMAIN posint AS integer CHECK (VALUE > 0);

CREATE TABLE mytable (id posint);

INSERT INTO mytable VALUES(1);   -- works

INSERT INTO mytable VALUES(-1);  -- fails

THE END

数据库(PostgreSQL)查询优化：使用联合索引

<<上一篇

API开发之RESTful 规范教程：什么是 RESTful ？该如何学习？

下一篇>>