mirror of
https://github.com/github/codeql.git
synced 2026-04-26 17:25:19 +02:00
Tidy up some references
@@ -3,7 +3,7 @@ Analyzing control flow in Python

You can write CodeQL queries to explore the control flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code.

To analyze the `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ of a ``Scope``, we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "Can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results, we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code, allowing the results of the query to be more easily understood.
To analyze the control-flow graph of a ``Scope``, we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "Can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results, we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code, allowing the results of the query to be more easily understood. For more information, see `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ in Wikipedia.
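As a rough illustration of these two questions, here is a minimal Python sketch on a hypothetical four-node graph. It is not how CodeQL computes reachability. The ``CFG`` dictionary and the ``reaches`` helper are assumptions made for this example. Passing ``avoid`` answers the second question, which is also the basis of dominance: A dominates B when B cannot be reached from the entry without going through A.

.. code-block:: python

    from collections import deque

    # Hypothetical control-flow graph: each node maps to its successors.
    # "entry" branches to "then" or "else"; both branches flow to "exit".
    CFG = {
        "entry": ["then", "else"],
        "then": ["exit"],
        "else": ["exit"],
        "exit": [],
    }


    def reaches(graph, src, dst, avoid=None):
        """Return True if dst can be reached from src without visiting avoid."""
        if src == avoid:
            return False
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for succ in graph[node]:
                # Never step onto the node we are asked to avoid.
                if succ != avoid and succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
        return False


    print(reaches(CFG, "entry", "exit"))                # True
    print(reaches(CFG, "entry", "exit", avoid="then"))  # True, through "else"
    print(reaches(CFG, "then", "else"))                 # False, mutually exclusive branches

Because "exit" is still reachable when "then" is avoided, "then" does not dominate "exit" in this graph.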

The ``ControlFlowNode`` class
-----------------------------
@@ -55,12 +55,12 @@ Example finding unreachable statements
where not exists(s.getAFlowNode())
select s

➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: `Unreachable code <https://lgtm.com/rules/3980095>`__.
➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: unreachable code. For more information, see `Unreachable code <https://lgtm.com/rules/3980095>`__ on LGTM.com.
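For illustration, here is a hypothetical Python function containing the kind of statement the query above is designed to find. The final ``print`` follows an unconditional ``return``, so we would expect it to have no control flow node. The function name is an assumption made for this example.

.. code-block:: python

    def parse_flag(value):
        """Hypothetical helper: map "yes" to True and anything else to False."""
        if value == "yes":
            return True
        return False
        print("never printed")  # unreachable: follows an unconditional return


    print(parse_flag("yes"))  # True
    print(parse_flag("no"))   # False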

The ``BasicBlock`` class
------------------------

The ``BasicBlock`` class represents a `basic block <http://en.wikipedia.org/wiki/Basic_block>`__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ what, but there are fewer basic blocks than control flow nodes, resulting in queries that are faster and use less memory.
The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what dominates what, but there are fewer basic blocks than control flow nodes, resulting in queries that are faster and use less memory. For more information, see `basic block <http://en.wikipedia.org/wiki/Basic_block>`__ and `dominates <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ on Wikipedia.

Example finding mutually exclusive basic blocks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -57,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a
and count(f.getAStmt()) = 1
select f, "This function is (probably) a getter."

➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in :doc:`Tutorial: Statements and expressions <statements-expressions>`.
➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Tutorial: Statements and expressions <statements-expressions>`."

Finding a call to a specific function
-------------------------------------

@@ -22,9 +22,7 @@ The CodeQL library for Python incorporates a large number of classes. Each class
Syntactic classes
-----------------

This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions, respectively; collectively, these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements, each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions, which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. For more information, see `Abstract syntax tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`__.

Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Symbolic information <http://en.wikipedia.org/wiki/Symbol_table>`__.
This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions, respectively; collectively, these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements, each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions, which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Abstract syntax tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`__ and `Symbolic information <http://en.wikipedia.org/wiki/Symbol_table>`__ in Wikipedia.

Scope
^^^^^
@@ -239,7 +237,7 @@ Other
Control flow classes
--------------------

This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into `basic blocks <http://en.wikipedia.org/wiki/Basic_block>`__.
This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into basic blocks. For more information, see `basic blocks <http://en.wikipedia.org/wiki/Basic_block>`__ in Wikipedia.

Example
^^^^^^^
@@ -309,7 +307,7 @@ For example, which ``ClassValue``\ s are iterable can be determined using the qu
where cls.hasAttribute("__iter__")
select cls

➤ `See this in the query console <https://lgtm.com/query/5151030165280978402/>`__. This query returns a list of classes for the projects analyzed. If you want to include the results for `builtin classes <http://docs.python.org/library/stdtypes.html>`__, which do not have any Python source code, show the non-source results.
➤ `See this in the query console <https://lgtm.com/query/5151030165280978402/>`__. This query returns a list of classes for the projects analyzed. If you want to include the results for ``builtin`` classes, which do not have any Python source code, show the non-source results. For more information, see `builtin classes <http://docs.python.org/library/stdtypes.html>`__ in the Python documentation.
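The ``hasAttribute("__iter__")`` test corresponds roughly to asking, in plain Python, whether a class has an ``__iter__`` attribute, including inherited ones. The sketch below uses ``hasattr`` on a hypothetical ``Countdown`` class and on two builtin classes. At runtime ``list`` is iterable but, like other builtins, has no Python source, which is why such classes only appear among the non-source results.

.. code-block:: python

    class Countdown:
        """Hypothetical source-level iterable class: it defines __iter__."""

        def __init__(self, start):
            self.start = start

        def __iter__(self):
            return iter(range(self.start, 0, -1))


    print(hasattr(Countdown, "__iter__"))  # True: defined in Python source
    print(hasattr(list, "__iter__"))       # True: builtin, no Python source
    print(hasattr(int, "__iter__"))        # False: ints are not iterable
    print(list(Countdown(3)))              # [3, 2, 1]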

Summary
^^^^^^^
@@ -320,7 +318,7 @@ Summary
- ``CallableValue``
- ``ModuleValue``

For more information about these classes, see :doc:`Pointer analysis and type inference in Python <pointsto-type-infer>`.
For more information about these classes, see ":doc:`Pointer analysis and type inference in Python <pointsto-type-infer>`."

Taint-tracking classes
----------------------
@@ -334,7 +332,7 @@ Summary
- `TaintKind <https://help.semmle.com/qldoc/python/semmle/python/dataflow/TaintTracking.qll/type.TaintTracking$TaintKind.html>`__
- `Configuration <https://help.semmle.com/qldoc/python/semmle/python/dataflow/Configuration.qll/type.Configuration$TaintTracking$Configuration.html>`__

For more information about these classes, see :doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`.
For more information about these classes, see ":doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`."


Further reading

@@ -25,9 +25,7 @@ Class hierarchy for ``Value``:
Points-to analysis and type inference
-------------------------------------

Points-to analysis, sometimes known as `pointer analysis <http://en.wikipedia.org/wiki/Pointer_analysis>`__, allows us to determine which objects an expression may "point to" at runtime.

`Type inference <http://en.wikipedia.org/wiki/Type_inference>`__ allows us to infer what the types (classes) of an expression may be at runtime.
Points-to analysis, sometimes known as pointer analysis, allows us to determine which objects an expression may "point to" at runtime. Type inference allows us to infer what the types (classes) of an expression may be at runtime. For more information, see `pointer analysis <http://en.wikipedia.org/wiki/Pointer_analysis>`__ and `Type inference <http://en.wikipedia.org/wiki/Type_inference>`__ on Wikipedia.
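To make the distinction concrete, consider the hypothetical function below. Depending on its argument, ``result`` refers to a new list or a new dict at runtime. We would therefore expect points-to analysis to report both objects for ``result``, and type inference to report the classes ``list`` and ``dict``. The function name is an assumption made for this example.

.. code-block:: python

    def make_container(as_list):
        """Hypothetical helper: return an empty list or an empty dict."""
        if as_list:
            result = []  # one object that `result` may point to
        else:
            result = {}  # the other object that `result` may point to
        return result


    print(type(make_container(True)).__name__)   # list
    print(type(make_container(False)).__name__)  # dict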

The predicate ``ControlFlowNode.pointsTo(...)`` shows which object a control flow node may "point to" at runtime.

@@ -126,7 +124,7 @@ Combining the parts of the query we get this:
)
select t, ex1, ex2

➤ `See this in the query console <https://lgtm.com/query/669950027/>`__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl <https://lgtm.com/projects/g/ytdl-org/youtube-dl/rev/39e9d524e5fe289936160d4c599a77f10f6e9061/files/devscripts/buildserver.py?sort=name&dir=ASC&mode=heatmap#L413>`__). The result is also highlighted by the standard query: `Unreachable 'except' block <https://lgtm.com/rules/7900089>`__.
➤ `See this in the query console <https://lgtm.com/query/669950027/>`__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl <https://lgtm.com/projects/g/ytdl-org/youtube-dl/rev/39e9d524e5fe289936160d4c599a77f10f6e9061/files/devscripts/buildserver.py?sort=name&dir=ASC&mode=heatmap#L413>`__). The result is also highlighted by the standard query: Unreachable 'except' block. For more information, see `Unreachable 'except' block <https://lgtm.com/rules/7900089>`__ on LGTM.com.
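The following hypothetical Python function shows the pattern this query looks for. ``ValueError`` is a subclass of ``Exception``, so the first handler already catches every ``ValueError`` and the second ``except`` block can never run. The function name is an assumption made for this example.

.. code-block:: python

    def to_int(text):
        """Hypothetical helper: parse text as an int, returning None on failure."""
        try:
            return int(text)
        except Exception:
            return None
        except ValueError:  # unreachable: ValueError is a subclass of Exception
            return -1


    print(to_int("42"))    # 42
    print(to_int("oops"))  # None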

.. pull-quote::

@@ -186,7 +184,7 @@ The ``Value`` class has a method ``getACall()`` which allows us to find calls to

If we wish to restrict the callables to actual functions, we can use the ``FunctionValue`` class, which is a subclass of ``Value`` and corresponds to function objects in Python, in much the same way as the ``ClassValue`` class corresponds to class objects in Python.

Returning to an example from :doc:`Tutorial: Functions <functions>`, we wish to find calls to the ``eval`` function.
Returning to an example from ":doc:`Tutorial: Functions <functions>`," we wish to find calls to the ``eval`` function.
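One reason to prefer points-to information here is aliasing. In the hypothetical function below, ``eval`` is called through another name. A query that only matches calls written literally as ``eval(...)`` would not see this call, whereas a query based on the value being called is intended to. The function name is an assumption made for this example. Passing untrusted input to ``eval`` is unsafe.

.. code-block:: python

    def evaluate(expression):
        """Hypothetical helper that calls eval through an alias."""
        run = eval  # `run` now refers to the builtin eval function
        return run(expression)  # a call to eval that a name-based match misses


    print(evaluate("1 + 2"))  # 3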

The original query looked like this:


@@ -178,7 +178,7 @@ If there are duplicate keys in a Python dictionary, then the second key will ove
and k1 != k2 and same_key(k1, k2)
select k1, "Duplicate key in dict literal"

➤ `See this in the query console <https://lgtm.com/query/663330305/>`__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard `Duplicate key in dict literal <https://lgtm.com/rules/3980087>`__ query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files.
➤ `See this in the query console <https://lgtm.com/query/663330305/>`__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard "Duplicate key in dict literal" query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. For more information, see `Duplicate key in dict literal <https://lgtm.com/rules/3980087>`__ on LGTM.com.
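A quick check of the behavior this query guards against: the hypothetical dictionary below repeats the ``"timeout"`` key. Python raises no error. The later value silently replaces the earlier one, while the key keeps its original position.

.. code-block:: python

    settings = {
        "timeout": 10,
        "retries": 3,
        "timeout": 30,  # duplicate key: silently overwrites the value 10 above
    }

    print(settings)  # {'timeout': 30, 'retries': 3}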

The supporting predicate ``same_key`` checks that the keys have the same identifier. Separating this part of the logic into a supporting predicate, instead of directly including it in the query, makes it easier to understand the query as a whole. The casts defined in the predicate restrict the expression to the type specified and allow predicates to be called on the type that is cast to. For example:

@@ -197,7 +197,7 @@ The short version is usually used as this is easier to read.
Example finding Java-style getters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Returning to the example from :doc:`Tutorial: Functions <functions>`, the query identified all methods with a single line of code and a name starting with ``get``.
Returning to the example from ":doc:`Tutorial: Functions <functions>`," the query identified all methods with a single line of code and a name starting with ``get``.

.. code-block:: ql


@@ -18,7 +18,7 @@ even though there is no data flow from ``path`` to ``path + "/"``.
Separate CodeQL libraries have been written to handle 'normal' data flow and taint tracking in :doc:`C/C++ <../cpp/dataflow>`, :doc:`C# <../csharp/dataflow>`, :doc:`Java <../java/dataflow>`, and :doc:`JavaScript <../javascript/dataflow>`. You can access the appropriate classes and predicates that reason about these different modes of data flow by importing the appropriate library in your query.
In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able to make the distinction between steps that preserve values and those that don't by defining additional data flow properties.
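The difference between the two kinds of step can be seen in a few lines of plain Python. In this hypothetical sketch, ``alias`` is the same object as ``path`` (a value-preserving step). By contrast, ``path + "/"`` builds a new string that merely contains the tainted data (a taint step). The names are assumptions made for this example.

.. code-block:: python

    def build_paths(path):
        """Hypothetical helper showing a value-preserving step and a taint step."""
        alias = path            # value-preserving: the same object flows on
        directory = path + "/"  # taint step: a new object derived from path
        return alias, directory


    user_path = "/tmp/uploads"  # stands in for untrusted input
    alias, directory = build_paths(user_path)
    print(alias is user_path)               # True
    print(directory is user_path)           # False
    print(directory.startswith(user_path))  # True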

For further information on data flow and taint tracking with CodeQL, see :doc:`Introduction to data flow <../intro-to-data-flow>`.
For further information on data flow and taint tracking with CodeQL, see ":doc:`Introduction to data flow <../intro-to-data-flow>`."

Fundamentals of taint tracking using data flow analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -231,8 +231,8 @@ The ``TaintKind`` itself is just a string (a QL string, not a CodeQL entity repr
which provides methods to extend flow and allow the kind of taint to change along the path.
The ``TaintKind`` class has many predicates allowing flow to be modified.
The simplest ``TaintKind`` does not override any predicates, meaning that it only flows as opaque data.
An example of this is the `Hard-coded credentials query <https://lgtm.com/query/rule:1506421276400/lang:python/>`_,
which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes.
An example of this is the "Hard-coded credentials" query,
which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes. For more information, see `Hard-coded credentials <https://lgtm.com/query/rule:1506421276400/lang:python/>`_ on LGTM.com.

.. code-block:: ql