Importing Code

Populating the Database

Once joern has been installed, you can begin to import code into the database by simply pointing joern.jar to the directory containing the source code:

java -jar $JOERN/bin/joern.jar $CodeDirectory

or, if you want to ensure that the JVM has access to your heap memory

java -Xmx$SIZEg -jar $JOERN/bin/joern.jar $CodeDirectory

where $SIZE is the maximum size of the Java Heap in GB. As an example, you can import $JOERN/testCode.

This will create a Neo4J database directory .joernIndex in your current working directory. Note that if the directory already exists and contains a Neo4J database, joern.jar will add the code to the existing database. You can thus import additional code at any time. If however, you want to create a new database, make sure to delete .joernIndex prior to running joern.jar.

Tainting Arguments (Optional)

Many times, an argument to a library function (e.g., the first argument to recv) is tainted by the library function. There is no way to statically determine this when the code of the library function is not available. Also, Joern does not perform inter-procedural taint-analysis and therefore, by default, symbols passed to functions as arguments are considered used but not defined.

To instruct Joern to consider arguments of a function to be tainted by calls to that function, you can use the tool argumentTainter. For example, by executing

java -jar ./bin/argumentTainter.jar recv 0

from the Joern root directory, all first arguments to recv will be considered tainted and dependency graphs will be recalculated accordingly.

Starting the Database Server

It is possible to access the graph database directly from your scripts by loading the database into memory on script startup. However, it is highly recommended to access data via the Neo4J server instead. The advantage of doing so is that the data is loaded only once for all scripts you may want to execute allowing you to benefit from Neo4J’s caching for increased speed.

To install the neo4j server, download version textbf{1.9.7} from http://www.neo4j.org/download/other_versions.

Once downloaded, unpack the archive into a directory of your choice, which we will call $Neo4jDir in the following.

Next, specificy the location of the database created by joern in your Neo4J server configuration file in $Neo4jDir/conf/neo4j-server.properties:

# neo4j-server.properties
org.neo4j.server.database.location=/$path_to_index/.joernIndex/

For example, if your .joernIndex is located in /home/user/joern/.joernIndex, your configuration file should contain the line:

# neo4j-server.properties
org.neo4j.server.database.location=/home/user/joern/.joernIndex/

Please also make sure that org.neo4j.server.database.location is set only once.

You can now start the database server by issuing the following command:

$Neo4jDir/bin/neo4j console

If your installation of Neo4J is more recent than the libraries bundled with joern, the database might fail to start and request an upgrade of the stored data. This upgrade can be performed on the fly by enabling allow_store_upgrade in neo4j.properties as follows:

# neo4j.properties
allow_store_upgrade=true

The Neo4J server offers a web interface and a web-based API (REST API) to explore and query the database. Once your database server has been launched, point your browser to http://localhost:7474/ .

Next, visit http://localhost:7474/db/data/node/0 . This is the reference node, which is the root node of the graph database. Starting from this node, the entire database contents can be accessed. In particular, you can get an overview of all existing edge types as well as the properties attached to nodes and edges.

Of course, in practice, you will not want to use your browser to query the database. Instead, you can use python-joern to access the REST API using Python as described in the following section.