25 July 2012: Showing the Dexter autotagging feature
                       The Dexter Autotagging feature
                            Felix Matenaar 2012
                             team@dexlabs.org

0. Outline
	1.		Motivation
	2.		Autotagging
	2.1.		Design overview
	2.2.		Implemented tagging aspects
	2.3.		Future tagging aspects
	3.		Example Use-Case
	4.		Limitations
	5.		Conclusion

1. Motivation

	When doing reverse engineering, it is often helpful to be provided
	with information about the code one is currently looking at. If
	documented API functionality is called, for example, it is nice to see
	parameter types and semantics. Often this information considerably
	speeds up the process of understanding what a certain code snippet
	actually does.

	However, since most native system APIs like the Windows API or the Linux
	system call interface are procedural, the information gathered by
	most RE tools boils down to resolving caller/callee relations. On the
	Android system, an additional object-oriented API is provided, which is
	also used by most legitimate applications today. This additionally allows
	information to be derived from class/interface hierarchy relations.
	The Android API also defines permissions, which an application must
	request at installation time if it wants to use them. Information about
	where exactly an application uses such a requested permission is valuable
	to the analyst.

	So far we introduced possible vectors to automatically derive information
	based on the interaction of an application with the object-oriented
	Android API. However, one can also use the well-structured Dalvik
	Executable Format to search for patterns in string constants, for example.
	Guess what a string like "SELECT user,password FROM vault where user='%s'"
	probably means.

	In the following, we will introduce the autotagging feature used in our
	Dexter tool for Android application analysis. It uses some of the
	methods described so far. Section 3 shows a classic use-case.

2. Autotagging

	Dexter is a web-based Android application audit and reverse engineering
	tool. As such the autotagging feature is designed to speed up an
	analyst's understanding of certain program parts. In the following we
	will describe the design of our autotagging subsystem. Then a selection of
	the currently supported tagging aspects is described.


	2.1 Design Overview
	The first question when we started to build the autotagging feature was:
	What algorithmic primitive do most (if not all) of these information
	gathering methods and aspects have in common?
	If we look at the above examples, they all include a fixed point such as
	an API function, a class/interface, or an object that matches a certain
	pattern, such as the SQL query string. Starting at each such object,
	information can be propagated by resolving relations to other objects like
	callers or inheriting classes, or it stays on the object itself (the SQL
	string again). The algorithmic primitive is therefore a marking algorithm.

	All we have to do in order to define a new autotagging method is to find
	an initial set of objects that matches certain criteria and is associated
	with this information, and then propagate it by resolving relations. Here
	are two simple example definitions for autotagging:

		1. Annotation(javaclassbuilder, "android.app.Activity", SubclassMarker,\
			tag="api:activity",
			doc="http://developer.android.com/reference/android/..."),
		2. Annotation(methodbuilder, ("java.lang.System","load"), XrefMarker,\
			tag="loadlibrary",
			doc="http://docs.oracle.com/javase/1.5.0/...")

	What does this actually mean? Each Annotation represents an autotagging
	definition. The first Annotation is constructed using the following
	parameters:
		javaclassbuilder
		"android.app.Activity"
		SubclassMarker
		tag="api:activity"
		doc="..."

	The "javaclassbuilder" defines which method is used to find the
	initial set of objects that are associated with the information we want
	to propagate to further objects. The second parameter,
	"android.app.Activity", defines what information is used to construct
	this initial set. In this example we want to find and tag the classes in
	an application that are Activities. In a nutshell: Activities are
	UI elements used by an application to interact with the user.
	So far, the first two parameters define how exactly to find the initial
	set of objects, which in this case is the class with the name
	"android.app.Activity". The third parameter, "SubclassMarker", defines how
	we want to propagate this information to further objects. The
	SubclassMarker takes a set of classes and returns all classes that
	transitively inherit from them. In this case, this set includes the
	activities implemented by the application being analyzed. Last but not
	least, a tag name and optionally a link to further documentation are
	assigned. The tag tells the user what type of information the autotagging
	derived and is attached to all objects returned by the marking algorithm.
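
	The propagation step of this first definition can be sketched in a few
	lines of Python. This is only an illustration under stated assumptions,
	not Dexter's actual code: the function name subclass_marker, the
	child-to-superclass mapping used as input and the com.example class
	names are all hypothetical.

```python
from collections import deque


def subclass_marker(seeds, superclass_of):
    """Return all classes that transitively inherit from any seed class.

    superclass_of maps a class name to the name of its direct superclass.
    This input format is an assumption made for the sketch.
    """
    # Invert the child -> parent mapping so the hierarchy can be walked downwards.
    children = {}
    for child, parent in superclass_of.items():
        children.setdefault(parent, set()).add(child)

    marked = set()
    worklist = deque(seeds)
    while worklist:
        current = worklist.popleft()
        for child in children.get(current, ()):
            if child not in marked:  # visit every class at most once
                marked.add(child)
                worklist.append(child)
    return marked


# Hypothetical class hierarchy of an analyzed application.
hierarchy = {
    "com.example.BaseActivity": "android.app.Activity",
    "com.example.LoginActivity": "com.example.BaseActivity",
    "com.example.Util": "java.lang.Object",
}

# Attach the tag to every class returned by the marker.
tags = {}
for cls in subclass_marker({"android.app.Activity"}, hierarchy):
    tags.setdefault(cls, set()).add("api:activity")
```

	In this sketch both BaseActivity and LoginActivity receive the tag
	"api:activity", while Util is left untagged because it does not inherit
	from the seed class.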

	The second example uses a different definition but the same concept. In
	this case we want to tag methods that call the "System.load()" function.
	The Java documentation states: "Loads a code file with the specified
	filename from the local file system as a dynamic library."
	Of course, in this case we cannot use our subclass marking algorithm
	because we are resolving a call relation. Thus we use our XrefMarker. The
	tag and doc parameters differ accordingly.
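
	A cross-reference marker can be sketched in the same spirit. Again this
	is a hypothetical illustration rather than Dexter's implementation: the
	name xref_marker, the list of (caller, callee) pairs and the com.example
	methods are assumptions, and the sketch assumes that only direct callers
	are marked.

```python
def xref_marker(targets, calls):
    """Return the methods that directly call one of the target methods.

    calls is an iterable of (caller, callee) pairs, where each method is a
    (class name, method name) tuple. This format is assumed for the sketch.
    """
    targets = set(targets)
    return {caller for caller, callee in calls if callee in targets}


# Hypothetical call relations extracted from an application.
calls = [
    (("com.example.Native", "init"), ("java.lang.System", "load")),
    (("com.example.Main", "onCreate"), ("com.example.Native", "init")),
    (("com.example.Main", "onCreate"), ("android.util.Log", "d")),
]

# Only methods that call System.load() themselves get the "loadlibrary" tag.
tagged = xref_marker([("java.lang.System", "load")], calls)
```

	Here only Native.init ends up in the tagged set. Main.onCreate reaches
	System.load() indirectly, so it is not marked under the direct-caller
	assumption.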


	2.2 Implemented tagging aspects
	So far we described the underlying concept of our autotagging feature.
	Currently we have about 80 different definitions, without counting
	the API-permission based tagging. The definitions can be divided into the
	following classes:
		Package Tagging
			known api/library packages or based on statistical measurements
		Class Tagging
			mostly inheritance/association based
		Method Tagging
			caller/callee relation based, some also class tagging related
		String Tagging
			substring/regular expression based patterns
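
	To illustrate the string tagging class, here is a minimal sketch of
	regular expression based tagging. The rule list, the tag names "sql" and
	"http" and the patterns themselves are assumptions chosen for the
	example; the actual definitions used by Dexter are not shown in this
	post.

```python
import re

# Hypothetical string tagging rules: (tag name, compiled pattern).
STRING_RULES = [
    ("sql", re.compile(
        r"\b(SELECT|INSERT|UPDATE|DELETE)\b.+\b(FROM|INTO|SET)\b",
        re.IGNORECASE)),
    ("http", re.compile(r"https?://", re.IGNORECASE)),
]


def tag_strings(strings):
    """Map each string constant to the set of tags whose pattern matches it."""
    result = {}
    for s in strings:
        matched = {tag for tag, pattern in STRING_RULES if pattern.search(s)}
        if matched:  # strings without any match stay untagged
            result[s] = matched
    return result


constants = [
    "SELECT user,password FROM vault where user='%s'",
    "http://example.org/api",
    "hello world",
]
string_tags = tag_strings(constants)
```

	Like all pattern based heuristics, such rules can produce false
	positives and negatives, so the patterns need to be tuned against real
	string constants.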


	2.3 Future tagging aspects
	Further tagging aspects may include analyses on the use of parameters or
	return values in a method. We appreciate further ideas for improvement.
	It might also make sense to extend the annotation definition schema to
	chain different marking algorithms, which would even allow transitive
	propagation over different relations. However, we do not have a use-case
	for that yet.


	This section explained the rough concept of the Dexter autotagging
	feature used to provide the analyst with additional information to speed
	up the reversing process. The next section will show the feature from
	the user's perspective.

3. Example Use-Case

	As an example use-case, we take the joyn Android app by Vodafone. Let's
	see how Dexter can help an analyst get an overview faster and build a
	list of interesting application aspects.

	After a successful analysis with Dexter, we open the general view to get
	an intuition about the complexity of joyn. In the picture you can see a
	list of permissions that are requested by the application, along with
	their descriptions if available - quite a few ;)

	

	Next we look into the list of classes and manually look for some
	interesting tags. The picture shows a snippet of the list containing all classes
	of an internal package called com.summit.vvm.provider. We can directly
	see that there are two content providers which are possible entry points
	into the application. Additionally we see that the classes with the
	obfuscated names "c" and "g" make use of the networking API. In this case
	we verified that both classes use the "java.net.URI" class.

	

	Now let's use the search functionality to get some more information about
	our analysis target. First we want to see some SQL query related strings
	so we use the "tagged objects" search and query it for "SQL". The output
	is seen in the following picture. Opening the details for each object
	enables the user to see cross-references from the code.
	
	

	As a second example for the search functionality, let's look for strings
	tagged with "http". The following picture shows the output.

	

	Last but not least it is possible to use DXQL to search for tags assigned
	to methods for example. In this case we use the query "tag of method of
	internal in set(c());" which results in the set of tags that have been
	assigned to methods belonging to classes that are defined directly in
	the dexfile.

	

	Future work would be to provide some kind of prepared search statements
	so the analyst does not have to construct the queries themselves.

4. Limitations

	Of course, the autotagging methods described above are only
	heuristics, which can lead to false positives/negatives. Anyone can
	include Activities that do not directly act as such, and thus our tags
	could provide wrong results. However, for most legitimate applications
	this isn't likely to occur. Make sure to always verify the assigned tags
	in case you use this information in your analysis.

5. Conclusion

	The Dexter autotagging feature can provide valuable hints for analysts to
	speed up the process of getting a rough idea of what an application does
	and how complex it is. We will add further tagging methods in the future,
	but we also rely on our users' feedback to provide more comfortable
	access to the autotagging information.