25 July 2012: Showing the Dexter autotagging feature
The Dexter Autotagging Feature
Felix Matenaar, 2012
email@example.com

0. Outline

1. Motivation
2. Autotagging
   2.1. Design overview
   2.2. Implemented tagging aspects
   2.3. Future tagging aspects
3. Example Use-Case
4. Limitations
5. Conclusion

1. Motivation

When reverse engineering, it helps to have information about the code one is currently looking at. If documented API functionality is called, for example, it is useful to see parameter types and semantics. Often this information substantially speeds up the process of understanding what a certain code snippet actually does. However, since most native system APIs, like the Windows API or the Linux system call interface, are procedural, the information gathered by most RE tools boils down to resolving caller/callee relations.

On Android, an additional object-oriented API is provided, which most legitimate applications use today. This additionally allows deriving information through class/interface hierarchy relations. The Android API also defines permissions which, conceptually, an application must request at installation time if it wants to use them. Information about where exactly an application uses such a requested permission is valuable to the analyst.

So far we have introduced possible vectors for automatically deriving information from the interaction of an application with the object-oriented Android API. However, one can also use the well-structured Dalvik Executable Format to search for patterns, for example in string constants. Guess what a string like "SELECT user,password FROM vault where user='%s'" probably means.

In the following, we introduce the autotagging feature used in our Dexter tool for Android application analysis. It uses some of the methods described so far. Section 3 shows a classic use-case.

2. Autotagging

Dexter is a web-based Android application audit and reverse engineering tool.
As such, the autotagging feature is designed to speed up an analyst's understanding of certain program parts. In the following, we describe the design of our autotagging subsystem. Then a selection of the currently supported tagging aspects is described.

2.1 Design Overview

The first question when we started to build the autotagging feature was: what algorithmic primitive do most (if not all) of these information gathering methods and aspects have in common? If we look at the above examples, they all include a fixed starting point: an API function, a class/interface, or an object that matches a certain pattern, such as the SQL query string. Starting at each such object, information can be propagated by resolving relations to other objects, such as callers or inheriting classes, or it can stay with the object itself (the SQL string again). The algorithmic primitive is therefore a marking algorithm. All we have to do to define a new autotagging method is to find an initial set of objects that matches certain criteria and is associated with this information, and then propagate it by resolving relations.

Here are two simple example definitions for autotagging:

1. Annotation(javaclassbuilder, "android.app.Activity", SubclassMarker,\
   tag="api:activity", doc="http://developer.android.com/reference/android/..."),
2. Annotation(methodbuilder, ("java.lang.System","load"), XrefMarker,\
   tag="loadlibrary", doc="http://docs.oracle.com/javase/1.5.0/...")

What does this actually mean? Each Annotation represents an autotagging definition. The first Annotation is constructed using the following parameters:

   javaclassbuilder
   "android.app.Activity"
   SubclassMarker
   tag="api:activity"
   doc="..."

The "javaclassbuilder" defines which method is used to find our initial set of objects, which are associated with the information we want to propagate to further objects. The second parameter, "android.app.Activity", defines what information is used to construct this initial set.
In this example, we want to find and tag the classes in an application that are Activities. In a nutshell, Activities are UI elements used by an application to interact with the user. So far, the first two parameters define how exactly to find the initial set of objects, which in this case is the class named "android.app.Activity". The third parameter, "SubclassMarker", defines how we want to propagate this information to further objects. The SubclassMarker takes a set of classes and returns all classes that transitively inherit from them. In this case, the result includes the activities implemented by the analyzed application. Last but not least, a tag name and, optionally, a link to further documentation are assigned. The tag tells the user what type of information the autotagging derived and is attached to all objects returned by the marking algorithm.

The second example uses a different definition but the same concept. In this case, we want to tag methods that call the "System.load()" function. The Java documentation states: "Loads a code file with the specified filename from the local file system as a dynamic library." Of course, we cannot use our subclass marking algorithm here because we are resolving a call relation. Thus we use our XrefMarker. The tag and doc parameters differ accordingly.

2.2 Implemented tagging aspects

So far we have described the underlying concept of our autotagging feature. Currently we have about 80 different definitions, not counting the API-permission based tagging.
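To make the marking concept from section 2.1 more concrete, here is a minimal Python sketch. It is not Dexter's implementation: the `App` model, the `TagDefinition` class, and the functions `subclass_marker`, `xref_marker` and `autotag` are hypothetical names chosen for illustration, and the initial set is given directly instead of being produced by a builder. The sketch also assumes that the subclass marker does not tag the seed class itself and that the xref marker only follows direct calls; the post does not specify either.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class App:
    """Toy model of an analyzed application (hypothetical, not Dexter's data model)."""
    superclass: Dict[str, str]    # class name -> direct superclass name
    calls: List[Tuple[str, str]]  # (caller method, callee method)


def subclass_marker(app: App, seeds: Set[str]) -> Set[str]:
    """Return all classes that transitively inherit from a seed class."""
    marked: Set[str] = set()
    changed = True
    while changed:  # repeat until no new class gets marked
        changed = False
        for cls, parent in app.superclass.items():
            if cls not in marked and (parent in seeds or parent in marked):
                marked.add(cls)
                changed = True
    return marked


def xref_marker(app: App, seeds: Set[str]) -> Set[str]:
    """Return all methods that directly call a seed method (assumption: no transitivity)."""
    return {caller for caller, callee in app.calls if callee in seeds}


@dataclass
class TagDefinition:
    """Illustrative stand-in for an autotagging definition."""
    seeds: Set[str]                              # initial set of objects
    marker: Callable[[App, Set[str]], Set[str]]  # how to propagate
    tag: str                                     # tag attached to the result


def autotag(app: App, definitions: List[TagDefinition]) -> Dict[str, Set[str]]:
    """Apply every definition and collect the tags per object."""
    tags: Dict[str, Set[str]] = {}
    for definition in definitions:
        for obj in definition.marker(app, definition.seeds):
            tags.setdefault(obj, set()).add(definition.tag)
    return tags


# Made-up example application.
app = App(
    superclass={
        "com.example.MainActivity": "android.app.Activity",
        "com.example.SettingsActivity": "com.example.MainActivity",
        "com.example.Util": "java.lang.Object",
    },
    calls=[
        ("com.example.Util.init", "java.lang.System.load"),
        ("com.example.MainActivity.onCreate", "com.example.Util.init"),
    ],
)
definitions = [
    TagDefinition({"android.app.Activity"}, subclass_marker, "api:activity"),
    TagDefinition({"java.lang.System.load"}, xref_marker, "loadlibrary"),
]
tags = autotag(app, definitions)
for name in sorted(tags):
    print(name, sorted(tags[name]))
```

In this toy model, both example activities receive the "api:activity" tag, including the one that inherits only indirectly, while only the method that calls "java.lang.System.load" directly receives "loadlibrary". Keeping the seed set and the propagation step separate is what would let one definition format cover both relations.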
The definitions can be divided into the following classes:

   Package Tagging: known API/library packages, or based on statistical measurements
   Class Tagging: mostly inheritance/association based
   Method Tagging: caller/callee relation based, some also related to class tagging
   String Tagging: substring/regular expression based patterns

2.3 Future tagging aspects

Further tagging aspects may include analyses of how parameters or return values are used in a method. We appreciate further ideas for improvement. It might also make sense to extend the annotation definition schema to chain different marking algorithms, so that information can be propagated transitively even across different relations. However, we do not have a use-case for that yet.

This section explained the rough concept of the Dexter autotagging feature, which provides the analyst with additional information to speed up the reversing process. The next section shows the feature from the user's perspective.

3. Example Use-Case

As an example use-case, we take the joyn Android app by Vodafone. Let's see how Dexter can help an analyst get an overview faster and build a list of interesting application aspects. After a successful analysis with Dexter, we open the general view to get an intuition for the complexity of joyn. In the picture you can see a list of permissions that are requested by the application, with their descriptions where available - quite a few ;)

Next, we go through the list of classes and manually look for interesting tags. The picture shows a snippet of the list containing all classes of an internal package called com.summit.vvm.provider. We can directly see that there are two content providers, which are possible entry points into the application. Additionally, we see that the classes with the obfuscated names "c" and "g" make use of the networking API. In this case we verified that both classes use the "java.net.URI" class.
Now let's use the search functionality to get more information about our analysis target. First, we want to see some SQL query related strings, so we use the "tagged objects" search and query it for "SQL". The output is shown in the following picture. Opening the details for each object lets the user see cross-references from the code.

As a second example of the search functionality, let's look for strings tagged with "http". The following picture shows the output.

Last but not least, it is possible to use DXQL to search, for example, for tags assigned to methods. In this case we use the query "tag of method of internal in set(c());", which results in the set of tags that have been assigned to methods belonging to classes that are defined directly in the dexfile. Future work would be to provide some kind of prepared search statements so that analysts do not have to construct the queries themselves.

4. Limitations

Of course, the autotagging methods just described are only heuristics and can lead to false positives and false negatives. Anyone can include Activities that do not directly act as such, and thus our tags could provide wrong results. However, for most legitimate applications this is unlikely to occur. Make sure to always verify the assigned tags if you use this information in your analysis.

5. Conclusion

The Dexter autotagging feature can provide valuable hints for analysts, speeding up the process of getting a rough idea of what an application does and how complex it is. We will add further tagging methods in the future, but we also rely on users' feedback to provide more comfortable access to the autotagging information.