ES4D: Accelerating Exact Similarity Search for High-Dimensional Vectors via Vector Slicing and In-SSD Distance Calculation
Time: Tuesday, July 12th, 6pm - 7pm PDT
Location: Level 2 Lobby
Event Type
Networking Reception
Work-in-Progress Poster
Description: Searching for the top-k nearest neighbors (kNN) based on vector similarity is a common problem in many domains. Nowadays, deep learning models are frequently used to generate high-dimensional vectors that represent various types of data. For such high-dimensional vectors, finding the exact kNN is especially challenging because traditional pruning approaches suffer from the curse of dimensionality. When the data volume is large, the dataset cannot fit in main memory and has to be stored on a storage device such as flash memory. Unlike approximate kNN search, exact kNN search needs to check a large portion of the dataset, if not all of it.

ES4D is a kNN search platform implemented near the data, on the solid-state drive (SSD). ES4D accelerates kNN search with two levels of early termination, based on pre-clustering of the dataset vectors and on vector sharding. ES4D incorporates several optimization techniques to aid these early terminations. Using near-data processing, ES4D further enhances performance and reduces energy consumption. By performing the kNN search on the SSD side with an added distance calculation module, ES4D eliminates the I/O stack overhead and improves energy efficiency. ES4D also optimizes the physical page placement of dataset vectors on the SSD to maximize search throughput. Compared to a naïve linear search on the host machine, ES4D achieves a 2.5× improvement in search performance.
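
To make the idea of two-level early termination concrete, the following is a minimal host-side sketch in Python. It is not the ES4D implementation and does not model the SSD or the distance calculation module. It assumes squared Euclidean distance, a simple one-pass pre-clustering around randomly chosen centroids, and a triangle-inequality bound for skipping clusters; the function names (`partial_sq_dist`, `build_clusters`, `search`) and the parameters `slice_len` and `seed` are hypothetical and chosen only for illustration.

```python
import heapq
import math
import random


def partial_sq_dist(query, vec, slice_len, bound):
    """Accumulate the squared L2 distance one slice at a time.

    Returns None as soon as the running sum exceeds `bound`
    (vector-level early termination), otherwise the full distance.
    """
    total = 0.0
    for start in range(0, len(vec), slice_len):
        for q, v in zip(query[start:start + slice_len],
                        vec[start:start + slice_len]):
            diff = q - v
            total += diff * diff
        if total > bound:
            return None
    return total


def build_clusters(data, n_clusters, seed=0):
    """One-pass pre-clustering (an assumption, not the ES4D method):
    pick random data points as centroids, assign every vector to its
    nearest centroid, and record each cluster's radius."""
    rng = random.Random(seed)
    centroids = rng.sample(data, n_clusters)
    members = [[] for _ in centroids]
    radii = [0.0] * n_clusters
    dim = len(data[0])
    for idx, vec in enumerate(data):
        dists = [partial_sq_dist(vec, c, dim, math.inf) for c in centroids]
        best = min(range(n_clusters), key=dists.__getitem__)
        members[best].append(idx)
        radii[best] = max(radii[best], math.sqrt(dists[best]))
    return centroids, members, radii


def search(query, data, clusters, k, slice_len):
    """Exact top-k search with cluster-level and vector-level pruning."""
    centroids, members, radii = clusters
    dim = len(query)
    # Triangle inequality: every member of cluster c is at least
    # max(0, d(query, centroid_c) - radius_c) away from the query.
    bounds = []
    for c, centroid in enumerate(centroids):
        dc = math.sqrt(partial_sq_dist(query, centroid, dim, math.inf))
        lower = max(0.0, dc - radii[c])
        bounds.append((lower * lower, c))
    bounds.sort()

    heap = []  # max-heap via negated distances, holds the best k so far
    for lower_sq, c in bounds:
        kth = -heap[0][0] if len(heap) == k else math.inf
        if lower_sq > kth:
            break  # level 1: this and all remaining clusters are too far
        for i in members[c]:
            kth = -heap[0][0] if len(heap) == k else math.inf
            d2 = partial_sq_dist(query, data[i], slice_len, kth)
            if d2 is None:
                continue  # level 2: abandoned after a partial distance
            if len(heap) < k:
                heapq.heappush(heap, (-d2, i))
            elif d2 < kth:
                heapq.heapreplace(heap, (-d2, i))
    return sorted((-neg, i) for neg, i in heap)
```

Calling `search(query, data, build_clusters(data, 8), k=5, slice_len=8)` returns a list of `(squared_distance, index)` pairs in ascending order. Both pruning steps only discard candidates whose distance is provably larger than the current k-th best, so the result remains exact in the sense that the returned distances match those of a full linear scan.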