Importing Code
Populating the Database
Once joern has been installed, you can import code into the database
by pointing joern.jar to the directory containing the source code:
java -jar $JOERN/bin/joern.jar $CodeDirectory
or, if you want to raise the maximum heap size available to the JVM:
java -Xmx$SIZEg -jar $JOERN/bin/joern.jar $CodeDirectory
where $SIZE is the maximum size of the Java heap in GB. As an
example, you can import $JOERN/testCode.
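The two invocations above can be assembled from a script. The following is a minimal sketch, assuming a helper named joern_import_command (hypothetical, not part of joern) and an installation under /opt/joern (also an assumption). It only builds the argument list; the commented-out last line shows how it could be executed.

```python
def joern_import_command(joern_dir, code_dir, heap_gb=None):
    """Build the argument list for importing code_dir with joern.jar.

    joern_dir plays the role of $JOERN, code_dir that of $CodeDirectory.
    This helper is illustrative only and not shipped with joern.
    """
    cmd = ["java"]
    if heap_gb is not None:
        # -Xmx<N>g sets the JVM's maximum heap size to N gigabytes.
        cmd.append("-Xmx%dg" % heap_gb)
    cmd += ["-jar", joern_dir.rstrip("/") + "/bin/joern.jar", code_dir]
    return cmd


# Example: import the bundled test code with a 4 GB heap limit.
# "/opt/joern" is an assumed installation path.
cmd = joern_import_command("/opt/joern", "/opt/joern/testCode", heap_gb=4)
print(" ".join(cmd))

# To actually run the import:
# import subprocess; subprocess.check_call(cmd)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the code directory contains spaces.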
The import creates a Neo4J database directory .joernIndex in your
current working directory. Note that if the directory already exists
and contains a Neo4J database, joern.jar will add the code to the
existing database. You can thus import additional code at any time.
If, however, you want to create a new database, make sure to delete
.joernIndex prior to running joern.jar.
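If imports are scripted, the "delete before a fresh import" step can be made explicit. The sketch below assumes a helper named reset_joern_index (hypothetical) that removes .joernIndex from a given working directory. The demonstration runs in a temporary scratch directory so that no real database is touched.

```python
import os
import shutil
import tempfile


def reset_joern_index(work_dir="."):
    """Delete work_dir/.joernIndex so the next import starts a new database.

    Returns True if a directory was removed, False if none existed.
    Illustrative helper, not part of joern.
    """
    index_dir = os.path.join(work_dir, ".joernIndex")
    if os.path.isdir(index_dir):
        shutil.rmtree(index_dir)
        return True
    return False


# Demonstration in a scratch directory with a stand-in .joernIndex.
scratch = tempfile.mkdtemp()
os.mkdir(os.path.join(scratch, ".joernIndex"))
removed_first = reset_joern_index(scratch)   # a directory existed
removed_again = reset_joern_index(scratch)   # nothing left to remove
index_still_there = os.path.exists(os.path.join(scratch, ".joernIndex"))
shutil.rmtree(scratch)
```

Skipping this step is equally valid when the intent is to add code to the existing database.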
Tainting Arguments (Optional)
Often, an argument to a library function (e.g., the first argument
to recv) is tainted by the library function. There is no way to
statically determine this when the code of the library function is
not available. Also, Joern does not perform inter-procedural taint
analysis and therefore, by default, symbols passed to functions as
arguments are considered used but not defined.
To instruct Joern to consider arguments of a function to be tainted by
calls to that function, you can use the tool argumentTainter. For
example, by executing
java -jar ./bin/argumentTainter.jar recv 0
from the Joern root directory, all first arguments to recv will be
considered tainted, and dependency graphs will be recalculated
accordingly.
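When several functions need to be tainted, the command can be generated per function. This is a rough sketch, assuming a helper named argument_tainter_command (hypothetical). The argument index is passed through unchanged; in the example above, 0 selects the first argument. The function names and indices in the list other than recv are placeholders, not recommendations.

```python
def argument_tainter_command(function_name, arg_index):
    """Build the argumentTainter invocation for one function/argument pair.

    The jar path is relative, so the command must be run from the Joern
    root directory. Illustrative helper, not part of joern.
    """
    if arg_index < 0:
        raise ValueError("argument index must not be negative")
    return ["java", "-jar", "./bin/argumentTainter.jar",
            function_name, str(arg_index)]


# Placeholder list of (function, argument index) pairs to taint.
to_taint = [("recv", 0), ("my_read_wrapper", 1)]
commands = [argument_tainter_command(name, idx) for name, idx in to_taint]
for command in commands:
    print(" ".join(command))

# To actually run them from the Joern root directory:
# import subprocess
# for command in commands: subprocess.check_call(command)
```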
Starting the Database Server
It is possible to access the graph database directly from your scripts by loading the database into memory on script startup. However, it is highly recommended to access data via the Neo4J server instead. The advantage of doing so is that the data is loaded only once for all scripts you may want to execute, which lets you benefit from Neo4J’s caching for increased speed.
To install the Neo4J server, download version 1.9.7 from http://www.neo4j.org/download/other_versions.
Once downloaded, unpack the archive into a directory of your choice,
which we will call $Neo4jDir in the following.
Next, specify the location of the database created by joern in your
Neo4J server configuration file, $Neo4jDir/conf/neo4j-server.properties:
# neo4j-server.properties
org.neo4j.server.database.location=/$path_to_index/.joernIndex/
For example, if your .joernIndex is located in
/home/user/joern/.joernIndex, your configuration file should
contain the line:
# neo4j-server.properties
org.neo4j.server.database.location=/home/user/joern/.joernIndex/
Please also make sure that org.neo4j.server.database.location is
set only once.
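Whether the property is set only once can be checked mechanically. The sketch below assumes a helper named property_values (hypothetical) and a simplified reading of the properties format: only key=value lines are recognized, and lines starting with # or ! are treated as comments. The sample text is made up to show a duplicate.

```python
def property_values(properties_text, key):
    """Return every value assigned to key in a properties-style text.

    Simplified parser: handles 'key=value' lines and skips comments;
    it does not cover the full Java properties syntax.
    """
    values = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#!":
            continue  # blank line or comment
        name, sep, value = stripped.partition("=")
        if sep and name.strip() == key:
            values.append(value.strip())
    return values


KEY = "org.neo4j.server.database.location"

# Made-up configuration text in which the key appears twice.
sample = """\
# neo4j-server.properties
org.neo4j.server.database.location=data/graph.db
org.neo4j.server.database.location=/home/user/joern/.joernIndex/
"""

found = property_values(sample, KEY)
if len(found) != 1:
    print("%s is set %d times: %s" % (KEY, len(found), found))
```

For a real check, read the text from $Neo4jDir/conf/neo4j-server.properties instead of the sample string.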
You can now start the database server by issuing the following command:
$Neo4jDir/bin/neo4j console
If your installation of Neo4J is more recent than the libraries bundled
with joern, the database might fail to start and request an upgrade of
the stored data. This upgrade can be performed on the fly by enabling
allow_store_upgrade in neo4j.properties as follows:
# neo4j.properties
allow_store_upgrade=true
The Neo4J server offers a web interface and a web-based API (REST API) to explore and query the database. Once your database server has been launched, point your browser to http://localhost:7474/ .
Next, visit http://localhost:7474/db/data/node/0 . This is the reference node, which is the root node of the graph database. Starting from this node, the entire database contents can be accessed. In particular, you can get an overview of all existing edge types as well as the properties attached to nodes and edges.
Of course, in practice, you will not want to use your browser to query the database. Instead, you can use python-joern to access the REST API using Python as described in the following section.
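Independently of python-joern, the reference node URL mentioned above can also be fetched with nothing but the Python standard library, which is a quick way to confirm that the server is reachable. This is a minimal sketch, assuming the helper names node_url and fetch_node (both hypothetical) and a server on the default address http://localhost:7474.

```python
import json
from urllib.request import Request, urlopen


def node_url(node_id, base="http://localhost:7474"):
    """Return the REST URL of the node with the given id."""
    return "%s/db/data/node/%d" % (base.rstrip("/"), node_id)


def fetch_node(node_id, base="http://localhost:7474"):
    """Fetch a node's JSON representation from a running Neo4J server."""
    request = Request(node_url(node_id, base),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(node_url(0))

# With the server running, the reference node could be inspected like this:
# node = fetch_node(0)
# print(sorted(node.keys()))
```

The call to fetch_node is left commented out because it raises an error unless the database server has been started as described above.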