Update main Python articles

This commit is contained in:
Felicity Chapman
2022-11-24 12:19:29 +00:00
committed by Arthur Baars
parent 8eeba92a47
commit da4c178534
7 changed files with 19 additions and 47 deletions

@@ -47,7 +47,7 @@ Example finding unreachable AST nodes
where not exists(node.getAFlowNode())
select node
`See this in the query console on LGTM.com <https://lgtm.com/query/669220024/>`__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements.
Many codebases have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements.
Example finding unreachable statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -60,7 +60,7 @@ Example finding unreachable statements
where not exists(s.getAFlowNode())
select s
`See this in the query console on LGTM.com <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard "Unreachable code" query. For more information, see `Unreachable code <https://lgtm.com/rules/3980095>`__ on LGTM.com.
This query should give fewer results. You can also find unreachable code using the standard "Unreachable code" query. For more information, see `Unreachable code <https://codeql.github.com/codeql-query-help/python/py-unreachable-statement/>`__.
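As an illustration of what such a result looks like, here is a small hypothetical Python function (the name ``first_positive`` is invented for this sketch). The final ``print`` call follows an unconditional ``return``, so it is a statement that can never execute, which is the kind of statement the query above is meant to report.

```python
def first_positive(values):
    """Return the first value greater than zero, or None if there is none."""
    for value in values:
        if value > 0:
            return value
    return None
    print("done")  # never executed: it follows an unconditional return


print(first_positive([-3, 0, 7, 9]))
```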
The ``BasicBlock`` class
------------------------
@@ -114,7 +114,7 @@ Example finding mutually exclusive blocks within the same function
)
select b1, b2
`See this in the query console on LGTM.com <https://lgtm.com/query/671000028/>`__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see ":doc:`Analyzing data flow in Python <analyzing-data-flow-in-python>`."
This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see ":doc:`Analyzing data flow in Python <analyzing-data-flow-in-python>`."
Further reading
---------------

@@ -97,11 +97,9 @@ Python has builtin functionality for reading and writing files, such as the func
call = API::moduleImport("os").getMember("open").getACall()
select call.getArg(0)
`See this in the query console on LGTM.com <https://lgtm.com/query/8635258505893505141/>`__. Two of the demo projects make use of this low-level API.
Notice the use of the ``API`` module for referring to library functions. For more information, see ":doc:`Using API graphs in Python <using-api-graphs-in-python>`."
Unfortunately this will only give the expression in the argument, not the values which could be passed to it. So we use local data flow to find all expressions that flow into the argument:
Unfortunately, this query only gives the expression in the argument, not the values that could be passed to it. So we use local data flow to find all expressions that flow into the argument:
.. code-block:: ql
@@ -115,9 +113,7 @@ Unfortunately this will only give the expression in the argument, not the values
DataFlow::localFlow(expr, call.getArg(0))
select call, expr
`See this in the query console on LGTM.com <https://lgtm.com/query/8213643003890447109/>`__. Many expressions flow to the same call.
We see that we get several data-flow nodes for an expression as it flows towards a call (notice repeated locations in the ``call`` column). We are mostly interested in the "first" of these, what might be called the local source for the file name. To restrict attention to such local sources, and to simultaneously make the analysis more performant, we have the QL class ``LocalSourceNode``. We could demand that ``expr`` is such a node:
Typically, you will see several data-flow nodes for an expression as it flows towards a call (notice repeated locations in the ``call`` column). We are mostly interested in the "first" of these, what might be called the local source for the file name. To restrict attention to such local sources, and to simultaneously make the analysis more performant, we have the QL class ``LocalSourceNode``. We could demand that ``expr`` is such a node:
.. code-block:: ql
@@ -160,9 +156,9 @@ As an alternative, we can ask more directly that ``expr`` is a local source of t
expr = call.getArg(0).getALocalSource()
select call, expr
`See this in the query console on LGTM.com <https://lgtm.com/query/6602079735954016687/>`__. All these three queries give identical results. We now mostly have one expression per call.
These three queries all give identical results. We now mostly have one expression per call.
We still have some cases of more than one expression flowing to a call, but then they flow through different code paths (possibly due to control-flow splitting, as in the second case).
We still have some cases of more than one expression flowing to a call, but then they flow through different code paths (possibly due to control-flow splitting).
We might want to make the source more specific, for example a parameter to a function or method. This query finds instances where a parameter is used as the name when opening a file:
@@ -178,7 +174,7 @@ We might want to make the source more specific, for example a parameter to a fun
DataFlow::localFlow(p, call.getArg(0))
select call, p
`See this in the query console on LGTM.com <https://lgtm.com/query/3998032643497238063/>`__. Very few results now; these could feasibly be inspected manually.
For most codebases, this returns only a few results, which can be inspected manually.
Using the exact name supplied via the parameter may be too strict. If we want to know if the parameter influences the file name, we can use taint tracking instead of data flow. This query finds calls to ``os.open`` where the filename is derived from a parameter:
@@ -194,7 +190,7 @@ Using the exact name supplied via the parameter may be too strict. If we want to
TaintTracking::localTaint(p, call.getArg(0))
select call, p
`See this in the query console on LGTM.com <https://lgtm.com/query/2129957933670836953/>`__. Now we get more results and in more projects.
Typically, this finds more results.
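The difference between the two queries is easier to see against some target code. The following is a hypothetical sketch (the function names ``open_exact``, ``open_derived`` and ``demo`` are invented here). In ``open_exact`` the parameter itself is the file name, which is what the data flow query looks for. In ``open_derived`` the file name is only computed from the parameters, so it should be found only by the taint tracking query, assuming the analysis treats ``os.path.join`` as propagating taint.

```python
import os
import tempfile


def open_exact(path):
    # The parameter itself is the file name passed to os.open.
    return os.open(path, os.O_RDONLY)


def open_derived(directory, name):
    # The file name is computed from the parameters, not equal to either.
    return os.open(os.path.join(directory, name), os.O_RDONLY)


def demo():
    """Exercise both functions against a throwaway file."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "example.txt")
        with open(target, "w") as handle:
            handle.write("hello")
        results = []
        for fd in (open_exact(target), open_derived(directory, "example.txt")):
            results.append(os.read(fd, 5))
            os.close(fd)
        return results


print(demo())
```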
Global data flow
----------------
@@ -369,8 +365,6 @@ This data flow configuration tracks data flow from environment variables to open
select fileOpen, "This call to 'os.open' uses data from $@.",
environment, "call to 'os.getenv'"
`Running this in the query console on LGTM.com <https://lgtm.com/query/6582374907796191895/>`__ unsurprisingly yields no results in the demo projects.
Further reading
---------------

@@ -16,7 +16,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
expressions-and-statements-in-python
analyzing-control-flow-in-python
- :doc:`Basic query for Python code <basic-query-for-python-code>`: Learn to write and run a simple CodeQL query using LGTM.
- :doc:`Basic query for Python code <basic-query-for-python-code>`: Learn to write and run a simple CodeQL query.
- :doc:`CodeQL library for Python <codeql-library-for-python>`: When you need to analyze a Python program, you can make use of the large collection of classes in the CodeQL library for Python.

@@ -53,7 +53,7 @@ All scopes are basically a list of statements, although ``Scope`` classes have a
where f.getScope() instanceof Function
select f
`See this in the query console on LGTM.com <https://lgtm.com/query/665620040/>`__. Many projects have nested functions.
Many codebases use nested functions.
Statement
^^^^^^^^^
@@ -95,7 +95,7 @@ As an example, to find expressions of the form ``a+2`` where the left is a simpl
where bin.getLeft() instanceof Name and bin.getRight() instanceof Num
select bin
`See this in the query console on LGTM.com <https://lgtm.com/query/669950026/>`__. Many projects include examples of this pattern.
Many codebases include examples of this pattern.
Variable
^^^^^^^^
@@ -126,7 +126,7 @@ For our first example, we can find all ``finally`` blocks by using the ``Try`` c
from Try t
select t.getFinalbody()
`See this in the query console on LGTM.com <https://lgtm.com/query/659662193/>`__. Many projects include examples of this pattern.
Many codebases include examples of this pattern.
2. Finding ``except`` blocks that do nothing
''''''''''''''''''''''''''''''''''''''''''''
@@ -157,7 +157,7 @@ Both forms are equivalent. Using the positive expression, the whole query looks
where forall(Stmt s | s = ex.getAStmt() | s instanceof Pass)
select ex
`See this in the query console on LGTM.com <https://lgtm.com/query/690010036/>`__. Many projects include pass-only ``except`` blocks.
Many codebases include pass-only ``except`` blocks.
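To make the pattern concrete, here is a rough Python analogue of the same check, written with the standard ``ast`` module rather than CodeQL. It is only an illustrative sketch: ``SAMPLE`` and ``pass_only_handlers`` are hypothetical names, and it inspects a single source string instead of a CodeQL database.

```python
import ast

# Hypothetical input: the first handler only contains `pass`, the second does not.
SAMPLE = '''
try:
    risky()
except ValueError:
    pass
try:
    risky()
except KeyError:
    log("missing")
'''


def pass_only_handlers(source):
    """Return the line numbers of except blocks whose body is only `pass`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(isinstance(stmt, ast.Pass) for stmt in node.body)
    ]


print(pass_only_handlers(SAMPLE))
```

The ``all(...)`` expression plays the same role as ``forall`` in the query: every statement in the handler body must be a ``pass`` statement.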
Summary
^^^^^^^
@@ -278,8 +278,6 @@ Using this predicate we can select the longest ``BasicBlock`` by selecting the `
where bb_length(b) = max(bb_length(_))
select b
`See this in the query console on LGTM.com <https://lgtm.com/query/666730036/>`__. When we ran it on the LGTM.com demo projects, the *openstack/nova* and *ytdl-org/youtube-dl* projects both contained source code results for this query.
.. pull-quote::
Note

@@ -54,8 +54,6 @@ The ``global`` statement in Python declares a variable with a global (module-lev
where g.getScope() instanceof Module
select g
`See this in the query console on LGTM.com <https://lgtm.com/query/686330052/>`__. None of the demo projects on LGTM.com has a global statement that matches this pattern.
The line: ``g.getScope() instanceof Module`` ensures that the ``Scope`` of ``Global g`` is a ``Module``, rather than a class or function.
Example finding 'if' statements with redundant branches
@@ -81,7 +79,7 @@ To find statements like this that could be simplified we can write a query.
and forall(Stmt p | p = l.getAnItem() | p instanceof Pass)
select i
`See this in the query console on LGTM.com <https://lgtm.com/query/672230053/>`__. Many projects have some ``if`` statements that match this pattern.
Many codebases have some ``if`` statements that match this pattern.
The line: ``(l = i.getBody() or l = i.getOrelse())`` restricts the ``StmtList l`` to branches of the ``if`` statement.
@@ -150,8 +148,6 @@ We can check for these using a query.
and cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal
select cmp
`See this in the query console on LGTM.com <https://lgtm.com/query/688180010/>`__. Two of the demo projects on LGTM.com use this pattern: *saltstack/salt* and *openstack/nova*.
The clause ``cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal`` checks that the first comparison operator is "is" and that the first comparator is a literal.
.. pull-quote::
@@ -180,7 +176,7 @@ If there are duplicate keys in a Python dictionary, then the second key will ove
and k1 != k2 and same_key(k1, k2)
select k1, "Duplicate key in dict literal"
`See this in the query console on LGTM.com <https://lgtm.com/query/663330305/>`__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard "Duplicate key in dict literal" query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. For more information, see `Duplicate key in dict literal <https://lgtm.com/rules/3980087>`__ on LGTM.com.
When we ran this query on some test codebases, we found examples of duplicate dictionary keys. The results were also highlighted as alerts by the standard "Duplicate key in dict literal" query. For more information, see `Duplicate key in dict literal <https://codeql.github.com/codeql-query-help/python/py-duplicate-key-dict-literal/>`__.
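The following short Python sketch shows why such duplicates are worth reporting. The dictionary and its keys are hypothetical example data. Python accepts the literal without complaint, and the later value silently replaces the earlier one.

```python
# Hypothetical example data: the key "timeout" appears twice.
settings = {
    "timeout": 30,
    "retries": 3,
    "timeout": 60,  # silently replaces the earlier value
}

print(settings["timeout"])
print(len(settings))
```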
The supporting predicate ``same_key`` checks that the keys have the same identifier. Separating this part of the logic into a supporting predicate, instead of directly including it in the query, makes it easier to understand the query as a whole. The casts defined in the predicate restrict the expression to the type specified and allow predicates to be called on the type that is cast-to. For example:
@@ -222,8 +218,6 @@ This basic query can be improved by checking that the one line of code is a Java
and attr.getObject() = self and self.getId() = "self"
select f, "This function is a Java-style getter."
`See this in the query console on LGTM.com <https://lgtm.com/query/669220054/>`__. Of the demo projects on LGTM.com, only the *openstack/nova* project has examples of functions that appear to be Java-style getters.
.. code-block:: ql
ret = f.getStmt(0) and ret.getValue() = attr

@@ -28,7 +28,7 @@ Using the member predicate ``Function.getName()``, we can list all of the getter
where f.getName().matches("get%")
select f, "This is a function called get..."
`See this in the query console on LGTM.com <https://lgtm.com/query/669220031/>`__. This query typically finds a large number of results. Usually, many of these results are for functions (rather than methods) which we are not interested in.
This query typically finds a large number of results. Usually, many of these results are for functions (rather than methods) which we are not interested in.
Finding all methods called "get..."
-----------------------------------
@@ -43,7 +43,7 @@ You can modify the query above to return more interesting results. As we are onl
where f.getName().matches("get%") and f.isMethod()
select f, "This is a method called get..."
`See this in the query console on LGTM.com <https://lgtm.com/query/690010035/>`__. This finds methods whose name starts with ``"get"``, but many of those are not the sort of simple getters we are interested in.
This finds methods whose name starts with ``"get"``, but many of those are not the sort of simple getters we are interested in.
Finding one line methods called "get..."
----------------------------------------
@@ -59,7 +59,7 @@ We can modify the query further to include only methods whose body consists of a
and count(f.getAStmt()) = 1
select f, "This function is (probably) a getter."
`See this in the query console on LGTM.com <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python <expressions-and-statements-in-python>`."
This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python <expressions-and-statements-in-python>`."
Finding a call to a specific function
-------------------------------------
@@ -74,8 +74,6 @@ This query uses ``Call`` and ``Name`` to find calls to the function ``eval`` - w
where call.getFunc() = name and name.getId() = "eval"
select call, "call to 'eval'."
`See this in the query console on LGTM.com <https://lgtm.com/query/6718356557331218618/>`__. Some of the demo projects on LGTM.com use this function.
The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate gets the expression being called. ``Name.getId()`` gets the identifier (as a string) of the ``Name`` expression.
Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not.
In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of the name of the variable called.
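Both limitations of a purely name-based match can be seen in a small hypothetical Python module. The local ``eval`` function and the names ``shadowed``, ``evaluate`` and ``aliased`` are invented for this sketch.

```python
import builtins


def eval(text):
    """A user-defined function that happens to share the built-in's name."""
    return text.upper()


# Syntactically this is a call of the form eval(...), so a name-based
# query selects it, although the built-in function is not involved.
shadowed = eval("1 + 1")

# This calls the real built-in, but under a name the query does not match.
evaluate = builtins.eval
aliased = evaluate("1 + 1")

print(shadowed, aliased)
```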

@@ -29,8 +29,6 @@ following snippet demonstrates.
select API::moduleImport("re")
`See this in the query console on LGTM.com <https://lgtm.com/query/1876172022264324639/>`__.
This query selects the API graph node corresponding to the ``re`` module. This node represents the fact that the ``re`` module has been imported rather than a specific location in the program where the import happens. Therefore, there will be at most one result per project, and it will not have a useful location, so you'll have to click `Show 1 non-source result` in order to see it.
To find where the ``re`` module is referenced in the program, you can use the ``getAUse`` method. The following query selects all references to the ``re`` module in the current database.
@@ -42,8 +40,6 @@ To find where the ``re`` module is referenced in the program, you can use the ``
select API::moduleImport("re").getAUse()
`See this in the query console on LGTM.com <https://lgtm.com/query/8072356519514905526/>`__.
Note that the ``getAUse`` method accounts for local flow, so that ``my_re_compile``
in the following snippet is
correctly recognized as a reference to the ``re.compile`` function.
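A minimal sketch of the kind of code this refers to is shown here. It is an illustration written for this purpose, not necessarily the snippet from the original article, and the variable ``pattern`` is hypothetical. The alias is created by an ordinary local assignment, so calling it is still a use of ``re.compile``.

```python
import re

# An alias created by a plain local assignment.
my_re_compile = re.compile

# Calling the alias is still a use of re.compile.
pattern = my_re_compile(r"\d+")

print(pattern.match("42") is not None)
```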
@@ -77,8 +73,6 @@ the above ``re.compile`` example, you can now find references to ``re.compile``.
select API::moduleImport("re").getMember("compile").getAUse()
`See this in the query console on LGTM.com <https://lgtm.com/query/7970570434725297676/>`__.
In addition to ``getMember``, you can use the ``getUnknownMember`` method to find references to API
components where the name is not known statically. You can use the ``getAMember`` method to
access all members, both known and unknown.
@@ -97,15 +91,11 @@ where the return value of ``re.compile`` is used:
select API::moduleImport("re").getMember("compile").getReturn().getAUse()
`See this in the query console on LGTM.com <https://lgtm.com/query/4346050399960356921/>`__.
Note that this includes all uses of the result of ``re.compile``, including those reachable via
local flow. To get just the *calls* to ``re.compile``, you can use ``getAnImmediateUse`` instead of
``getAUse``. As this is a common occurrence, you can use ``getACall`` instead of
``getReturn`` followed by ``getAnImmediateUse``.
`See this in the query console on LGTM.com <https://lgtm.com/query/8143347716552092926/>`__.
Note that the API graph does not distinguish between class instantiations and function calls. As far
as it's concerned, both are simply places where an API graph node is called.
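Python's own syntax makes no distinction here either, which may help explain the behavior. The sketch below uses the standard ``ast`` module on a hypothetical source string (``Greeter`` is an invented class name). Both the function call and the class instantiation parse to the same kind of node.

```python
import ast

# Hypothetical source: one function call and one class instantiation.
source = "pattern = re.compile('a+')\ngreeter = Greeter()\n"

# Collect every call expression in the parsed module.
calls = [
    node for node in ast.walk(ast.parse(source)) if isinstance(node, ast.Call)
]

print(len(calls))
```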
@@ -134,8 +124,6 @@ all subclasses of ``View``, you must explicitly include the subclasses of ``Meth
select viewClass().getAUse()
`See this in the query console on LGTM.com <https://lgtm.com/query/288293322319747121/>`__.
Note the use of the set literal ``["View", "MethodView"]`` to match both classes simultaneously.
Built-in functions and classes