Finding Similar Functions with joern-toolsΒΆ
Embed functions in vector space.
- Represents functions by the API symbols used
- Applies TF-IDF weighting
- Dumps data in libsvm format
joern-stream-apiembedder
To allow this to scale to large code bases:
- database requests are chunked to not keep all results in memory at any point in time
- data is streamed onto disk
Determine nearest neighbors.
Get a list of available functions first:
joern-list-funcs
Get id of function by name:
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}'
where VLCEyeTVPluginInitialize
is the name of the function in this example.
Lookup nearest neighbors.
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}' | joern-knn
Show location name or location.
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}' | joern-knn
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}' | joern-knn | joern-location
Dump code or open in editor.
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}' | joern-knn | joern-location | joern-code
joern-list-funcs -p VLCEyeTVPluginInitialize | awk -F "\t" '{print $2}' | joern-knn | joern-location | joern-editor