CPPDescent 80d9539
(with uncommitted changes)
A C++ KNN-Graph creation library
|
Part of the “Software Development for Computing Systems” course of the Department of Informatics and Telecommunications, UoA for the first semester of the academic year 2023-2024.
A library that implements the K-NN graph creation algorithm described in [1], [2], [4].
Konstantinos Chousos (1115202000215, sdi2000215 at di.uoa.gr)
Anastasios-Fedon Seitanidis (1115202000179, sdi2000179 at di.uoa.gr)
getMin()
property to Priority Queueremove()
property to Priority Queuefind()
helper function for Priority QueueTo run the main executable, the following commands suffice:
$ cmake -S . -B build $ cmake --build build $ ./build/app/app -K <number of neighbors> <filepath-to-dataset>
For example:
$ ./build/app/app -K 100 ./datasets/00000200.bin -D 20 -T 5 -d 0.001 -r 0.3 For K = 100 For δ = 0.001 For ρ = 0.3 5 random projection trees are used, for D = 20 Dataset: ./datasets/00000200.bin Dimensions: 100 ---------------------------------------------------------- NN-Descent Initializing starting graph... Using random projection tree... Starting graph has been created Number of changes in the graph (c) = 1950 Number of changes in the graph (c) = 540 Number of changes in the graph (c) = 84 Number of changes in the graph (c) = 36 Number of changes in the graph (c) = 15 NN-Descent iterations: 5 NN-Descent K-NN Graph created in 3007 milliseconds for δ = 0.001, ρ = 0.3 Computing recall... Total recall is 98.9451%
Flag | Default value | Usage |
---|---|---|
-K | - | The $K$ number of neighbors for each datapoint. This value must be given. |
-d | 0.01 | The precision parameter $\delta$ used for early termination. If the updates during an iteration are less than $\delta KN$, then the algorithm concludes. |
-r | 0.5 | The sample rate $\rho$ used for sampling. Before local join, we sample $\rho K$ out of the K-NN items marked true for each object to use in local join. |
-D | 0 | The upper bound of vertices a random projection tree's leaf must have. If set to 0, then the starting graph is created randomly. Must be less than $K$. |
-T | 4 | The number of random projection trees to create. Must be at least 2. |
-q | For "quiet". If this flag is set, then the output is a line of comma separated values. Mainly used for automating experiments with the corresponding script. |
The recall percentage that is printed during execution is computed by the following formula, derived from [1, Sec. 3.2].
The recall of one object is the number of its true K-NN members found divided by K. The recall of an approximate K-NNG is the average recall of all objects.
In the wiki, you can find documentation generated by Doxygen. It is also available in pdf form.
This project uses LCOV for its code coverage needs. The results are uploaded upon commit here.
For local development, you will need Make
and CMake
. To install them on an apt-based linux distro, run the following commands:
$ sudo apt update && sudo apt upgrade $ sudo apt install make $ sudo apt install cmake
doxygen
installed.[1] W. Dong, C. Moses, and K. Li, “Efficient k-nearest neighbor graph construction for generic similarity measures,” in Proceedings of the 20th international conference on World wide web, in WWW ’11. New York, NY, USA: Association for Computing Machinery, Mar. 2011, pp. 577–586. doi: 10.1145/1963405.1963487.
[2] “How PyNNDescent works — pynndescent 0.5.0 documentation.” Accessed: Oct. 10, 2023. [Online]. Available: https://pynndescent.readthedocs.io/en/latest/how_pynndescent_works.html
[3] D. Kluser, J. Bokstaller, S. Rutz, and T. Buner, “Fast Single-Core K-Nearest Neighbor Graph Computation.” arXiv, Dec. 13, 2021. doi: 10.48550/arXiv.2112.06630.
[4] PyNNDescent Fast Approximate Nearest Neighbor Search with Numba | SciPy 2021, (Jul. 23, 2021). Accessed: Dec. 05, 2023. [Online Video]. Available: https://www.youtube.com/watch?v=xPadY4_kt3o
[5] J. Brugger, “brj0/nndescent.” Dec. 11, 2023. Accessed: Dec. 11, 2023. [Online]. Available: https://github.com/brj0/nndescent
[6] S. Dasgupta and Y. Freund, “Random projection trees and low dimensional manifolds,” in Proceedings of the fortieth annual ACM symposium on Theory of computing, Victoria British Columbia Canada: ACM, May 2008, pp. 537–546. doi: 10.1145/1374376.1374452.
[7] “GSL - GNU Scientific Library - GNU Project - Free Software Foundation.” Accessed: Jan. 15, 2024. [Online]. Available: https://www.gnu.org/software/gsl/