AI Vector Search #1 - 기본 기능
AI Vector Search #2 - HNSW 벡터 인덱스
AI Vector Search #3 - IVF 벡터 인덱스
HNSW(Navigable Small World Graph) 벡터 인덱스는 계층적 탐색이 가능한 In-Memory Neighbor Graph 벡터 인덱스의 한 형태로 벡터 근사 유사도 검색에 효율적인 인덱스입니다. 이 글에서는 HNSW 벡터 인덱스의 구조와 동작에 대해 간단히 살펴보겠습니다.
테스트 버전은 아래와 같습니다.
-- 1
SELECT version_full FROM product_component_version;
VERSION_FULL
------------
23.4.0.24.05
1 row selected.
테스트를 위해 아래와 같이 테이블을 생성하겠습니다.
-- 2
DROP TABLE t1 PURGE;
CREATE TABLE t1 (c1, c2, c3, CONSTRAINT t1_pk PRIMARY KEY (c1, c2)) AS
SELECT a.c1, b.c1, TO_VECTOR (JSON_ARRAY (a.c1, b.c1))
FROM (SELECT ROWNUM AS c1 FROM XMLTABLE ('1 to 3')) a
, (SELECT ROWNUM AS c1 FROM XMLTABLE ('1 to 3')) b;
인덱스 생성에 앞서 vector_memory_size 파라미터로 벡터 메모리를 할당해야 HNSW 벡터 인덱스를 생설할 수 있습니다. 참고로 vector_memory_size 파라미터는 issys_modifiable 값이 IMMEDIATE지만 SCOPE를 SPFILE로 지정하지 않고 값을 설정하면 "ORA-51950: The Oracle Database Vector Memory size cannot be increased." 에러가 발생합니다. 벡터 메모리와 관련하여 V$VECTOR_MEMORY_POOL 뷰와 V$VECTOR_MEM_SEGMENTS_DETAIL 뷰가 추가되었습니다.
-- 3-1
ALTER SYSTEM SET vector_memory_size = 128M SCOPE = SPFILE;
SHUTDOWN
STARTUP
-- 3-2
SELECT name, value, display_value, issys_modifiable
FROM v$parameter
WHERE name = 'vector_memory_size';
NAME VALUE DISPLAY_VALUE ISSYS_MODIFIABLE
------------------ --------- ------------- ----------------
vector_memory_size 134217728 128M IMMEDIATE
1 row selected.
-- 3-3
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM v$vector_memory_pool;
POOL ALLOC_BYTES USED_BYTES POPULATE_STATUS
---------------- ----------- ---------- ---------------
1MB POOL 82837504 0 DONE
64KB POOL 33554432 0 DONE
IM POOL METADATA 16777216 16777216 DONE
3 rows selected.
-- 3-4
SELECT obj, membytes
FROM v$vector_mem_segments_detail;
no rows selected
아래와 같이 HNSW 벡터 인덱스를 생성하겠습니다. VECSYS.VECTOR$INDEX 뷰에서 벡터 인덱스 정보를 조회할 수 있습니다. 생성 구문에 대한 상세한 내용은 CREATE VECTOR INDEX 문을 참고하세요.
-- 4-1: Hierarchical Navigable Small World (HNSW)
CREATE VECTOR INDEX t1_x1 ON t1 (c3) ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE EUCLIDEAN WITH TARGET ACCURACY 95;
-- 4-2
SELECT idx_owner#, idx_name, idx_base_table_objn, idx_base_table_owner#
FROM vecsys.vector$index;
IDX_OWNER# IDX_NAME IDX_BASE_TABLE_OBJN IDX_BASE_TABLE_OWNER#
---------- -------- ------------------- ---------------------
133 T1_X1 90009 133
1 row selected.
-- 4-3
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM v$vector_memory_pool;
POOL ALLOC_BYTES USED_BYTES POPULATE_STATUS
---------------- ----------- ---------- ---------------
1MB POOL 82837504 1048576 DONE
64KB POOL 33554432 327680 DONE
IM POOL METADATA 16777216 16777216 DONE
3 rows selected.
-- 4-4
SELECT obj, membytes
FROM v$vector_mem_segments_detail;
OBJ MEMBYTES
----- --------
0 131072
90013 1245184
2 rows selected.
HNSW 벡터 인덱스의 index_type 값은 VECTOR, index_subtype 값은 INMEMORY_NEIGHBOR_GRAPH_HNSW입니다. HNSW 벡터 인덱스는 In-Memory 인덱스이므로 인덱스 세그먼트가 생성되지 않으며, 암시적으로 추가 오브젝트를 생성합니다. *_INDEXES 뷰를 조회하면 2개의 B-tree 인덱스와 3개의 테이블이 생성된 것을 확인할 수 있습니다.
-- 5-1
SELECT index_name, index_type, index_subtype
FROM user_indexes
WHERE index_name LIKE '%T1_X1%';
INDEX_NAME INDEX_TYPE INDEX_SUBTYPE
---------- ---------- ----------------------------
T1_X1 VECTOR INMEMORY_NEIGHBOR_GRAPH_HNSW
1 row selected.
-- 5-2
SELECT segment_name, segment_type
FROM user_segments
WHERE segment_name = 'T1_X1';
no rows selected
-- 5-3
SELECT index_name, index_type, table_name, uniqueness, partitioned
FROM user_indexes
WHERE table_name LIKE '%T1_X1%';
INDEX_NAME INDEX_TYPE TABLE_NAME UNIQUENESS PARTITIONED
------------------------ ---------- ------------------------------------------------------------------ ---------- -----------
SYS_C008641 NORMAL VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP UNIQUE NO
SYS_IL0000090019C00007$$ LOB VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG UNIQUE YES
PK_XID_90013 NORMAL VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS UNIQUE NO
3 rows selected.
테이블의 접두어는 VECTOR${index_name}${sequence}$HNSW_입니다. 접미어가 ROWID_VID_MAP인 테이블은 BASE 테이블의 ROWID가 저장되어 있습니다. 접미어가 SHARED_JOURNAL_CHANGE_LOG와 SHARED_JOURNAL_TRANSACTION_COMMITS인 테이블은 데이터가 존재하지 않으며 향후 DML 지원을 위한 테이블로 예상됩니다.
-- 6-1
SELECT table_name, partitioned
FROM user_tables
WHERE table_name LIKE '%T1_X1%';
TABLE_NAME PARTITIONED
------------------------------------------------------------------ -----------
VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP NO
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG YES
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS YES
3 rows selected.
-- 6-2
SELECT table_name, partitioning_type, subpartitioning_type, partition_count
FROM user_part_tables
WHERE table_name LIKE '%T1_X1%';
TABLE_NAME PARTITIONING_TYPE SUBPARTITIONING_TYPE PARTITION_COUNT
------------------------------------------------------------------ ----------------- -------------------- ---------------
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG REFERENCE NONE 1048575
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS RANGE NONE 1048575
2 rows selected.
-- 6-3
SELECT name, column_name, column_position
FROM user_part_key_columns
WHERE name LIKE '%T1_X1%';
NAME COLUMN_NAME COLUMN_POSITION
------------------------------------------------------------------ ----------- ---------------
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG USN 1
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG SLOT 2
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG SEQ 3
VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS COMMIT_SCN 1
4 rows selected.
-- 6-4
SELECT *
FROM vector$t1_x1$90009_90013_0$hnsw_rowid_vid_map;
BASE_TABLE_ROWID VERTEX_ID
------------------ ---------
AAAV+ZAAAAAACIjAAA 1
AAAV+ZAAAAAACIjAAB 2
AAAV+ZAAAAAACIjAAC 3
AAAV+ZAAAAAACIjAAD 4
AAAV+ZAAAAAACIjAAE 5
AAAV+ZAAAAAACIjAAF 6
AAAV+ZAAAAAACIjAAG 7
AAAV+ZAAAAAACIjAAH 8
AAAV+ZAAAAAACIjAAI 9
9 rows selected.
-- 6-5
SELECT *
FROM vector$t1_x1$90009_90013_0$hnsw_shared_journal_change_log;
no rows selected
-- 6-7
DESC vector$t1_x1$90009_90013_0$hnsw_shared_journal_change_log
Name Null? Type
---------------- -------- ------------
USN NOT NULL NUMBER
SLOT NOT NULL NUMBER
SEQ NOT NULL NUMBER
XCN NUMBER
BASE_TABLE_ROWID ROWID
DML_OP VARCHAR2(10)
DATA_VECTOR CLOB
-- 6-8
SELECT *
FROM vector$t1_x1$90009_90013_0$hnsw_shared_journal_transaction_commits;
no rows selected
-- 6-9
DESC vector$t1_x1$90009_90013_0$hnsw_shared_journal_transaction_commits
Name Null? Type
---------- -------- ------
USN NOT NULL NUMBER
SLOT NOT NULL NUMBER
SEQ NOT NULL NUMBER
COMMIT_SCN NOT NULL NUMBER
아래는 HNSW 벡터 인덱스를 사용한 실행 계획입니다. 실행 계획 5번에 VECTOR INDEX HNSW SCAN 오퍼레이션이 표시되고, Outline Data 항목에서 VECTOR_INDEX_SCAN 힌트가 사용된 것을 확인할 수 있습니다. 실행 계획을 살펴보면 A-Rows는 부분 범위 처리가 동작한 것으로 표시되지만 실행 계획 3번에서 소트가 발생한 것으로 보아 HNSW 벡터 인덱스로 순서와 무관한 k개의 인접한 벡터를 조회하고 인덱스에 저장된 ROWID로 테이블을 조회한 후 결과를 정렬하는 방식으로 동작하는 것을 유추할 수 있습니다.
-- 7
SELECT c1, c2, c3, VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) AS euclidean
FROM t1
ORDER BY VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN)
FETCH APPROX FIRST 5 ROWS ONLY;
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Used-Mem |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 | 1 | |
|* 1 | COUNT STOPKEY | | 1 | | 5 | 1 | |
| 2 | VIEW | | 1 | 5 | 5 | 1 | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 5 | 1 | 2048 (0)| --> SORT
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 5 | 5 | 1 | |
| 5 | VECTOR INDEX HNSW SCAN | T1_X1 | 1 | 5 | 5 | 0 | |
------------------------------------------------------------------------------------------------
Outline Data
-------------
VECTOR_INDEX_SCAN(@"SEL$1" "T1"@"SEL$1" "T1_X1")
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
아래 쿼리는 7번 쿼리에 (c1, c2) != (1, 1) 조건을 추가했습니다. 실행 계획 5번에 VECTOR INDEX HNSW SCAN IN-FILTER 오퍼레이션이 표시되고, Outline Data 항목에서 VECTOR_INDEX_TRANSFORM 힌트가 사용된 것을 확인할 수 있습니다.
-- 8
SELECT c1, c2, c3, VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) AS euclidean
FROM t1
WHERE (c1, c2) != (1, 1)
ORDER BY VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN)
FETCH APPROX FIRST 5 ROWS ONLY;
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 2 | | -- (8)
|* 1 | COUNT STOPKEY | | 1 | 5 | 2 | | -- (7)
| 2 | VIEW | | 1 | 5 | 2 | | -- (6)
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 2 | 2048 (0)| -- (5)
|* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 2 | | -- (4)
| 5 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | | -- (1)
| 6 | VIEW | VW_HIJ_BA8ECEFB | 9 | 8 | 1 | | -- (3)
|* 7 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2)
-------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
5 - SEL$22E65899 / "T1"@"SEL$1"
Outline Data
-------------
VECTOR_INDEX_TRANSFORM(@"SEL$1" "T1"@"SEL$1" "T1_X1" IN_FILTER_WITH_JOIN_BACK)
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
7 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
문서에 따르면 HNSW 벡터 인덱스를 사용하는 쿼리에 필터 조건이 존재하는 경우 필터링 순서(pre-filtering, in-filtering)와 조인 백(join-back, no-join-back) 여부에 따라 아래의 네 가지 실행 계획이 생성될 수 있습니다.
-- 9-1
/*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 IN_FILTER_WITH_JOIN_BACK) */
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 2 | | -- (8)
|* 1 | COUNT STOPKEY | | 1 | 5 | 2 | | -- (7)
| 2 | VIEW | | 1 | 5 | 2 | | -- (6)
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 2 | 2048 (0)| -- (5)
|* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 2 | | -- (4) JOIN
| 5 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | | -- (1)
| 6 | VIEW | VW_HIJ_BA8ECEFB | 9 | 8 | 1 | | -- (3)
|* 7 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2)
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
7 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
-- 9-2
/*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 IN_FILTER_WITHOUT_JOIN_BACK) */
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1 | | -- (7)
|* 1 | COUNT STOPKEY | | 1 | 5 | 1 | | -- (6)
| 2 | VIEW | | 1 | 5 | 1 | | -- (5)
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 1 | 2048 (0)| -- (4)
| 4 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | 534K (0)| -- (1)
| 5 | VIEW | VW_HIF_BA8ECEFB | 9 | 8 | 1 | | -- (3)
|* 6 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2)
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
6 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
-- 9-3
/*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 PRE_FILTER_WITH_JOIN_BACK) */
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 16 | | -- (11)
|* 1 | COUNT STOPKEY | | 1 | 5 | 16 | | -- (10)
| 2 | VIEW | | 1 | 5 | 16 | | -- (9)
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 16 | 2048 (0)| -- (8)
|* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 16 | | -- (7)
| 5 | VECTOR INDEX HNSW SCAN PRE-FILTER| T1_X1 | 1 | 5 | 15 | | -- (6)
| 6 | VIEW | VW_HPJ_7513CF2B | 1 | 8 | 15 | | -- (5)
| 7 | NESTED LOOPS OUTER | | 1 | 8 | 15 | | -- (2)
|* 8 | TABLE ACCESS FULL | T1 | 1 | 8 | 3 | | -- (1)
| 9 | TABLE ACCESS BY INDEX ROWID | VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP | 8 | 8 | 12 | | -- (4)
|* 10 | INDEX UNIQUE SCAN | SYS_C008687 | 8 | 8 | 4 | | -- (3)
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
8 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
10 - access("T1".ROWID="VTIX_RIDVID"."BASE_TABLE_ROWID")
-- 9-4
/*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 PRE_FILTER_WITHOUT_JOIN_BACK) */
-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 15 | | -- (10)
|* 1 | COUNT STOPKEY | | 1 | 5 | 15 | | -- (9)
| 2 | VIEW | | 1 | 5 | 15 | | -- (8)
|* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 15 | 2048 (0)| -- (7)
| 4 | VECTOR INDEX HNSW SCAN PRE-FILTER| T1_X1 | 1 | 5 | 15 | 530K (0)| -- (6)
| 5 | VIEW | VW_HPF_BA8ECEFB | 1 | 8 | 15 | | -- (5)
| 6 | NESTED LOOPS OUTER | | 1 | 8 | 15 | | -- (2)
|* 7 | TABLE ACCESS FULL | T1 | 1 | 8 | 3 | | -- (1)
| 8 | TABLE ACCESS BY INDEX ROWID | VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP | 8 | 8 | 12 | | -- (4)
|* 9 | INDEX UNIQUE SCAN | SYS_C008687 | 8 | 8 | 4 | | -- (3)
-------------------------------------------------------------------------------------------------------------------------------------
아쉽게도 HNSW 벡터 인덱스를 생성한 테이블은 DML을 지원하지 않는 것으로 보입니다.
-- 10
INSERT INTO t1 SELECT * FROM t1;
ORA-51928: Data Manipulation Language (DML) on tables with in-memory neighbor graph vector index is not supported.
[2024-11-28]
24.6 버전부터 HNSW 벡터 인덱스를 생성한 테이블에 대한 DML을 지원합니다.