AI Vector Search #1 - 기본 기능
AI Vector Search #2 - HNSW 벡터 인덱스
AI Vector Search #3 - IVF 벡터 인덱스
HNSW(Navigable Small World Graph) 벡터 인덱스는 계층적 탐색이 가능한 In-Memory Neighbor Graph 벡터 인덱스의 한 형태로 벡터 근사 유사도 검색에 효율적인 인덱스입니다. 이 글에서는 HNSW 벡터 인덱스의 구조와 동작에 대해 간단히 살펴보겠습니다.
테스트 버전은 아래와 같습니다.
-- 1 SELECT version_full FROM product_component_version; VERSION_FULL ------------ 23.4.0.24.05 1 row selected.
테스트를 위해 아래와 같이 테이블을 생성하겠습니다.
-- 2 DROP TABLE t1 PURGE; CREATE TABLE t1 (c1, c2, c3, CONSTRAINT t1_pk PRIMARY KEY (c1, c2)) AS SELECT a.c1, b.c1, TO_VECTOR (JSON_ARRAY (a.c1, b.c1)) FROM (SELECT ROWNUM AS c1 FROM XMLTABLE ('1 to 3')) a , (SELECT ROWNUM AS c1 FROM XMLTABLE ('1 to 3')) b;
인덱스 생성에 앞서 vector_memory_size 파라미터로 벡터 메모리를 할당해야 HNSW 벡터 인덱스를 생설할 수 있습니다. 참고로 vector_memory_size 파라미터는 issys_modifiable 값이 IMMEDIATE지만 SCOPE를 SPFILE로 지정하지 않고 값을 설정하면 "ORA-51950: The Oracle Database Vector Memory size cannot be increased." 에러가 발생합니다. 벡터 메모리와 관련하여 V$VECTOR_MEMORY_POOL 뷰와 V$VECTOR_MEM_SEGMENTS_DETAIL 뷰가 추가되었습니다.
-- 3-1 ALTER SYSTEM SET vector_memory_size = 128M SCOPE = SPFILE; SHUTDOWN STARTUP -- 3-2 SELECT name, value, display_value, issys_modifiable FROM v$parameter WHERE name = 'vector_memory_size'; NAME VALUE DISPLAY_VALUE ISSYS_MODIFIABLE ------------------ --------- ------------- ---------------- vector_memory_size 134217728 128M IMMEDIATE 1 row selected. -- 3-3 SELECT pool, alloc_bytes, used_bytes, populate_status FROM v$vector_memory_pool; POOL ALLOC_BYTES USED_BYTES POPULATE_STATUS ---------------- ----------- ---------- --------------- 1MB POOL 82837504 0 DONE 64KB POOL 33554432 0 DONE IM POOL METADATA 16777216 16777216 DONE 3 rows selected. -- 3-4 SELECT obj, membytes FROM v$vector_mem_segments_detail; no rows selected
아래와 같이 HNSW 벡터 인덱스를 생성하겠습니다. VECSYS.VECTOR$INDEX 뷰에서 벡터 인덱스 정보를 조회할 수 있습니다. 생성 구문에 대한 상세한 내용은 CREATE VECTOR INDEX 문을 참고하세요.
-- 4-1: Hierarchical Navigable Small World (HNSW) CREATE VECTOR INDEX t1_x1 ON t1 (c3) ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE EUCLIDEAN WITH TARGET ACCURACY 95; -- 4-2 SELECT idx_owner#, idx_name, idx_base_table_objn, idx_base_table_owner# FROM vecsys.vector$index; IDX_OWNER# IDX_NAME IDX_BASE_TABLE_OBJN IDX_BASE_TABLE_OWNER# ---------- -------- ------------------- --------------------- 133 T1_X1 90009 133 1 row selected. -- 4-3 SELECT pool, alloc_bytes, used_bytes, populate_status FROM v$vector_memory_pool; POOL ALLOC_BYTES USED_BYTES POPULATE_STATUS ---------------- ----------- ---------- --------------- 1MB POOL 82837504 1048576 DONE 64KB POOL 33554432 327680 DONE IM POOL METADATA 16777216 16777216 DONE 3 rows selected. -- 4-4 SELECT obj, membytes FROM v$vector_mem_segments_detail; OBJ MEMBYTES ----- -------- 0 131072 90013 1245184 2 rows selected.
HNSW 벡터 인덱스의 index_type 값은 VECTOR, index_subtype 값은 INMEMORY_NEIGHBOR_GRAPH_HNSW입니다. HNSW 벡터 인덱스는 In-Memory 인덱스이므로 인덱스 세그먼트가 생성되지 않으며, 암시적으로 추가 오브젝트를 생성합니다. *_INDEXES 뷰를 조회하면 2개의 B-tree 인덱스와 3개의 테이블이 생성된 것을 확인할 수 있습니다.
-- 5-1 SELECT index_name, index_type, index_subtype FROM user_indexes WHERE index_name LIKE '%T1_X1%'; INDEX_NAME INDEX_TYPE INDEX_SUBTYPE ---------- ---------- ---------------------------- T1_X1 VECTOR INMEMORY_NEIGHBOR_GRAPH_HNSW 1 row selected. -- 5-2 SELECT segment_name, segment_type FROM user_segments WHERE segment_name = 'T1_X1'; no rows selected -- 5-3 SELECT index_name, index_type, table_name, uniqueness, partitioned FROM user_indexes WHERE table_name LIKE '%T1_X1%'; INDEX_NAME INDEX_TYPE TABLE_NAME UNIQUENESS PARTITIONED ------------------------ ---------- ------------------------------------------------------------------ ---------- ----------- SYS_C008641 NORMAL VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP UNIQUE NO SYS_IL0000090019C00007$$ LOB VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG UNIQUE YES PK_XID_90013 NORMAL VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS UNIQUE NO 3 rows selected.
테이블의 접두어는 VECTOR${index_name}${sequence}$HNSW_입니다. 접미어가 ROWID_VID_MAP인 테이블은 BASE 테이블의 ROWID가 저장되어 있습니다. 접미어가 SHARED_JOURNAL_CHANGE_LOG와 SHARED_JOURNAL_TRANSACTION_COMMITS인 테이블은 데이터가 존재하지 않으며 향후 DML 지원을 위한 테이블로 예상됩니다.
-- 6-1 SELECT table_name, partitioned FROM user_tables WHERE table_name LIKE '%T1_X1%'; TABLE_NAME PARTITIONED ------------------------------------------------------------------ ----------- VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP NO VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG YES VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS YES 3 rows selected. -- 6-2 SELECT table_name, partitioning_type, subpartitioning_type, partition_count FROM user_part_tables WHERE table_name LIKE '%T1_X1%'; TABLE_NAME PARTITIONING_TYPE SUBPARTITIONING_TYPE PARTITION_COUNT ------------------------------------------------------------------ ----------------- -------------------- --------------- VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG REFERENCE NONE 1048575 VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS RANGE NONE 1048575 2 rows selected. -- 6-3 SELECT name, column_name, column_position FROM user_part_key_columns WHERE name LIKE '%T1_X1%'; NAME COLUMN_NAME COLUMN_POSITION ------------------------------------------------------------------ ----------- --------------- VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG USN 1 VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG SLOT 2 VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_CHANGE_LOG SEQ 3 VECTOR$T1_X1$90009_90013_0$HNSW_SHARED_JOURNAL_TRANSACTION_COMMITS COMMIT_SCN 1 4 rows selected. -- 6-4 SELECT * FROM vector$t1_x1$90009_90013_0$hnsw_rowid_vid_map; BASE_TABLE_ROWID VERTEX_ID ------------------ --------- AAAV+ZAAAAAACIjAAA 1 AAAV+ZAAAAAACIjAAB 2 AAAV+ZAAAAAACIjAAC 3 AAAV+ZAAAAAACIjAAD 4 AAAV+ZAAAAAACIjAAE 5 AAAV+ZAAAAAACIjAAF 6 AAAV+ZAAAAAACIjAAG 7 AAAV+ZAAAAAACIjAAH 8 AAAV+ZAAAAAACIjAAI 9 9 rows selected. -- 6-5 SELECT * FROM vector$t1_x1$90009_90013_0$hnsw_shared_journal_change_log; no rows selected -- 6-7 DESC vector$t1_x1$90009_90013_0$hnsw_shared_journal_change_log Name Null? Type ---------------- -------- ------------ USN NOT NULL NUMBER SLOT NOT NULL NUMBER SEQ NOT NULL NUMBER XCN NUMBER BASE_TABLE_ROWID ROWID DML_OP VARCHAR2(10) DATA_VECTOR CLOB -- 6-8 SELECT * FROM vector$t1_x1$90009_90013_0$hnsw_shared_journal_transaction_commits; no rows selected -- 6-9 DESC vector$t1_x1$90009_90013_0$hnsw_shared_journal_transaction_commits Name Null? Type ---------- -------- ------ USN NOT NULL NUMBER SLOT NOT NULL NUMBER SEQ NOT NULL NUMBER COMMIT_SCN NOT NULL NUMBER
아래는 HNSW 벡터 인덱스를 사용한 실행 계획입니다. 실행 계획 5번에 VECTOR INDEX HNSW SCAN 오퍼레이션이 표시되고, Outline Data 항목에서 VECTOR_INDEX_SCAN 힌트가 사용된 것을 확인할 수 있습니다. 실행 계획을 살펴보면 A-Rows는 부분 범위 처리가 동작한 것으로 표시되지만 실행 계획 3번에서 소트가 발생한 것으로 보아 HNSW 벡터 인덱스로 순서와 무관한 k개의 인접한 벡터를 조회하고 인덱스에 저장된 ROWID로 테이블을 조회한 후 결과를 정렬하는 방식으로 동작하는 것을 유추할 수 있습니다.
-- 7 SELECT c1, c2, c3, VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) AS euclidean FROM t1 ORDER BY VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) FETCH APPROX FIRST 5 ROWS ONLY; ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Used-Mem | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 5 | 1 | | |* 1 | COUNT STOPKEY | | 1 | | 5 | 1 | | | 2 | VIEW | | 1 | 5 | 5 | 1 | | |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 5 | 1 | 2048 (0)| --> SORT | 4 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 5 | 5 | 1 | | | 5 | VECTOR INDEX HNSW SCAN | T1_X1 | 1 | 5 | 5 | 0 | | ------------------------------------------------------------------------------------------------ Outline Data ------------- VECTOR_INDEX_SCAN(@"SEL$1" "T1"@"SEL$1" "T1_X1") Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5)
아래 쿼리는 7번 쿼리에 (c1, c2) != (1, 1) 조건을 추가했습니다. 실행 계획 5번에 VECTOR INDEX HNSW SCAN IN-FILTER 오퍼레이션이 표시되고, Outline Data 항목에서 VECTOR_INDEX_TRANSFORM 힌트가 사용된 것을 확인할 수 있습니다.
-- 8 SELECT c1, c2, c3, VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) AS euclidean FROM t1 WHERE (c1, c2) != (1, 1) ORDER BY VECTOR_DISTANCE (c3, '[1,1]', EUCLIDEAN) FETCH APPROX FIRST 5 ROWS ONLY; ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 2 | | -- (8) |* 1 | COUNT STOPKEY | | 1 | 5 | 2 | | -- (7) | 2 | VIEW | | 1 | 5 | 2 | | -- (6) |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 2 | 2048 (0)| -- (5) |* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 2 | | -- (4) | 5 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | | -- (1) | 6 | VIEW | VW_HIJ_BA8ECEFB | 9 | 8 | 1 | | -- (3) |* 7 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2) ------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 5 - SEL$22E65899 / "T1"@"SEL$1" Outline Data ------------- VECTOR_INDEX_TRANSFORM(@"SEL$1" "T1"@"SEL$1" "T1_X1" IN_FILTER_WITH_JOIN_BACK) Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5) 4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) 7 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1))
문서에 따르면 HNSW 벡터 인덱스를 사용하는 쿼리에 필터 조건이 존재하는 경우 필터링 순서(pre-filtering, in-filtering)와 조인 백(join-back, no-join-back) 여부에 따라 아래의 네 가지 실행 계획이 생성될 수 있습니다.
-- 9-1 /*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 IN_FILTER_WITH_JOIN_BACK) */ ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 2 | | -- (8) |* 1 | COUNT STOPKEY | | 1 | 5 | 2 | | -- (7) | 2 | VIEW | | 1 | 5 | 2 | | -- (6) |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 2 | 2048 (0)| -- (5) |* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 2 | | -- (4) JOIN | 5 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | | -- (1) | 6 | VIEW | VW_HIJ_BA8ECEFB | 9 | 8 | 1 | | -- (3) |* 7 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2) ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5) 4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) 7 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) -- 9-2 /*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 IN_FILTER_WITHOUT_JOIN_BACK) */ ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 5 | 1 | | -- (7) |* 1 | COUNT STOPKEY | | 1 | 5 | 1 | | -- (6) | 2 | VIEW | | 1 | 5 | 1 | | -- (5) |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 1 | 2048 (0)| -- (4) | 4 | VECTOR INDEX HNSW SCAN IN-FILTER| T1_X1 | 1 | 5 | 1 | 534K (0)| -- (1) | 5 | VIEW | VW_HIF_BA8ECEFB | 9 | 8 | 1 | | -- (3) |* 6 | TABLE ACCESS BY USER ROWID | T1 | 9 | 8 | 1 | | -- (2) ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5) 6 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) -- 9-3 /*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 PRE_FILTER_WITH_JOIN_BACK) */ -------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem | -------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 16 | | -- (11) |* 1 | COUNT STOPKEY | | 1 | 5 | 16 | | -- (10) | 2 | VIEW | | 1 | 5 | 16 | | -- (9) |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 16 | 2048 (0)| -- (8) |* 4 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 16 | | -- (7) | 5 | VECTOR INDEX HNSW SCAN PRE-FILTER| T1_X1 | 1 | 5 | 15 | | -- (6) | 6 | VIEW | VW_HPJ_7513CF2B | 1 | 8 | 15 | | -- (5) | 7 | NESTED LOOPS OUTER | | 1 | 8 | 15 | | -- (2) |* 8 | TABLE ACCESS FULL | T1 | 1 | 8 | 3 | | -- (1) | 9 | TABLE ACCESS BY INDEX ROWID | VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP | 8 | 8 | 12 | | -- (4) |* 10 | INDEX UNIQUE SCAN | SYS_C008687 | 8 | 8 | 4 | | -- (3) -------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5) 4 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) 8 - filter(("T1"."C2"<>1 OR "T1"."C1"<>1)) 10 - access("T1".ROWID="VTIX_RIDVID"."BASE_TABLE_ROWID") -- 9-4 /*+ VECTOR_INDEX_TRANSFORM(T1 T1_X1 PRE_FILTER_WITHOUT_JOIN_BACK) */ ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | Buffers | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 15 | | -- (10) |* 1 | COUNT STOPKEY | | 1 | 5 | 15 | | -- (9) | 2 | VIEW | | 1 | 5 | 15 | | -- (8) |* 3 | SORT ORDER BY STOPKEY | | 1 | 5 | 15 | 2048 (0)| -- (7) | 4 | VECTOR INDEX HNSW SCAN PRE-FILTER| T1_X1 | 1 | 5 | 15 | 530K (0)| -- (6) | 5 | VIEW | VW_HPF_BA8ECEFB | 1 | 8 | 15 | | -- (5) | 6 | NESTED LOOPS OUTER | | 1 | 8 | 15 | | -- (2) |* 7 | TABLE ACCESS FULL | T1 | 1 | 8 | 3 | | -- (1) | 8 | TABLE ACCESS BY INDEX ROWID | VECTOR$T1_X1$90009_90013_0$HNSW_ROWID_VID_MAP | 8 | 8 | 12 | | -- (4) |* 9 | INDEX UNIQUE SCAN | SYS_C008687 | 8 | 8 | 4 | | -- (3) -------------------------------------------------------------------------------------------------------------------------------------
아쉽게도 HNSW 벡터 인덱스를 생성한 테이블은 DML을 지원하지 않는 것으로 보입니다.
-- 10 INSERT INTO t1 SELECT * FROM t1; ORA-51928: Data Manipulation Language (DML) on tables with in-memory neighbor graph vector index is not supported.
[2024-11-28]
24.6 버전부터 HNSW 벡터 인덱스를 생성한 테이블에 대한 DML을 지원합니다.