Files
codeql/python/ql/test/experimental/dataflow/coverage/datamodel.py
Rasmus Wriedt Larsen 9c275c177a Python: Implement call-graph with type-trackers
This commit is a squash of 80 other commits. While developing, things
changed majorly 2-3 times, and it just wasn't feasible to go back and
write a really nice commit history.

My apologies for this HUGE commit.

Also, later on this is where I solved merge conflicts after flow-summaries
PR was merged.

For your amusement, I've included the original commit messages below.

Python: Add proper argument/parameter positions

Python: Handle normal function calls

Python: Reduce dataflow-consistency warnings

Previously there was a lot of failures for `uniqueEnclosingCallable` and
`argHasPostUpdate`

Removing the override of `getEnclosingCallable` in ParameterNode is
probably the most controversial... although from my point of view it's a
change for the better, since we're able to provide data-flow
ParameterNodes for more of the AST parameter nodes.

Python: Adjust `dataflow/calls` test

Python: Implement `isParameterOf`/`argumentOf`/`OutNode`

This makes the tests under `dataflow/basic` work as well 👍

(initially I had these as separate commits, but it felt like it was too much noise)

Python: Accept fix for `dataflow/consistency`

Python: Changes to `coverage/argumentRoutingTest.ql`

Notice we gain a few new resolved arguments.

We loose out on stuff due to:

1. not handling `*` or `**` in either arguments/parameters (yet)
2. not handling special calls (yet)

Python: Small fix for `TestUtil/RoutingTest.qll`

Since the helper predicates do not depend on this, moved outside class.

Python: Accept changes to `dataflow/coverage/NormalDataflowTest.ql`

Most of this is due to:

- not handling any kinds of methods yet
- not handling `*` or `**`

Python: Small investigation of `test_deep_callgraph`

Python: Accept changes to `coverage/localFlow.ql`

I don't fully understand why the .expected file changed.

Since we still have the desired flow, I'm not going to worry too much
about it.

with this commit, the `dataflow/coverage` tests passes 👍

Python: Minor doc update

Python: Add staticmethod/classmethod to `dataflow/calls`

Python: Handle method calls on class instances

without trying to deal with any class inheritance, or
staticmethod/classmethod at all.

Notice that with this change, we only have a DataFlowCall for the calls
that we can actually resolve. I'm not 100% sure if we need to add a
`UnresolvedCall` subclass of `DataFlowCall` for MaD in the future, but
it should be easy to do.

I'm still unsure about the value of `classesCallGraph`, but have just
accepted the changes.

Python: Handle direct method calls `C.foo(C, arg0)`

Python: Handle `@staticmethod`

Python: Handle class method calls... but the code is shit

WIP todo

Rewrite method calls to be better

also fixed a problem with `self` being an argument to the `x.staticmethod()` call :|

Python: Add subclass tests

Python: Split `class_advanced` test

Python: Rewrite call-graph tests to be inline expectation (1/2)

This adds inline expectations, next commit will remove old annotations
code... but I thought it would be easier to review like this.

Minor fixup

Python: Add simple subclass support

Python: more precise subclass lookup

Still not 100% precise.. but it's better

New ambiguous

Python: Add test for `self.m()` and `cls.m()` calls

Python: Handle `self.m()` and `cls.m()` calls

Python: Add tests for `__init__` and `__new__`

Python: Handle class calls

Python: Fix `self` argument passing for class calls

Now field-flow tests also pass 💪 (although the crosstalk
fieldflow test changes were due to this specific commit)

I also copied much of the setup for pre/post update nodes from Ruby,
specifically having the abstract `PostUpdateNodeImpl` in DataFlowPrivate
seemed like a nice change.

Same for the setup with `TNode` definition having the specification
directly in the body, instead of a `NeedsSyntheticPostUpdateNode` class.

Python: Add new crosstalk test WIP

Maybe needs a bit of refactoring, and to see how it all behaves with points-to

Python: Add `super()` call-graph tests

Python: Refactor MethodCall char-pred

In anticipation of supporting `super(MyClass, self).foo()`, where the
`self` argument doesn't come from an AttrNode, but from the second
argument to super.

Without `pragma[inline]` the optimizer found a terrible join-order --
this won't guarantee a good join-order for the future, but for now it
was just so simple and could let me move on with life.

Python: Add basic `super()` support

I debated a little (with myself) whether I should really do
`superTracker`, but I thought "why not" and just rolled with it. I did
not confirm whether it was actually needed anywhere, that is if anyone
does `ref = super; ref().foo()` -- although I certainly doubt it's very
wide-spread.

Python: InlineCallGraphTest: Allow non-unique callable name in different files

Python: more MRO tests

Python: Add MRO approximation for `super()`

Although it's not 100% accurate, it seems to be on level with the one in
points-to.

Python: Remove some spurious targets for direct calls

removal of TODO from refactoring

remove TODOs class call support

Python: Add contrived subclass call example

Python: Remove more spurious call targets

NOTE: I initially forgot to use
`findFunctionAccordingToMroKnownStartingClass` instead of
`findFunctionAccordingToMro` for __init__ and __new__, and since I did
make that mistake myself, I wanted to add something to the test to
highlight this fact, and make it viewable by PR reviewer... this will be
fixed in the next commit.

Python: Proper fix for spurious __init__ targets

Python: Add call-graph example of class decorator

Python: Support decorated classes in new call-graph

Python: Add call-graph tests for `type(obj).meth()`

Python: support `type(obj).meth()`

Python: Add test for callable defined in function

Python: Add test for callable as argument

Current'y we don't find these with type-tracking, which is super
mysterious. I did check that we have proper flow from the arguments to
the parameters.

Python: Found problem for callable as argument :| MAJOR WIP

WIP commit

IT WORKS AGAIN (but terrible performance)

remove pragma[inline]

remove oops

Fix performance problem

I tried to optimize it even further, but I didn't end up achieving anything :|

Fix call-graph comparison

add comparison version with easy lookup

incomplete missing call-graph tests

unhandled tests

trying to replicate missing call-edge due to missing imports ... but it's hard

also seems to be problems with the inline-expectation-value that I used, seems like it has both missing/unexpected results with same value

Python: Add import-problem test

Python: Add shadowing problem

some cleanup of rewrite fix

a little more cleanup

Add consistency queries to call-graph tests

Python: Add post-update nodes for `self` in implicit `super()` uses

But we do need to discuss whether this is the right approach :O

Fix for field-flow tests

This came from more precise argument passing

Fixed results in type-tracking

Comes from better argument passing with super() and handling of
functions with decorators

fix of inline call graph tests

Fixup call annotation test

Many minor cleanups/fixes

NewNormalCall -> NormalCall

Python: Major restructuring + qldoc writing

Python: Accept changes from pre/post update node .toString changes

Python: Reduce `super` complexity !! WIP !!

Python: Only pass self-reference if in same enclosing-callable

Python: Add call-graph test with nested class

This was inspired by the ImpliesDataflow test that showed missing flow
for q_super, but at least for the call-graph, I'm not able to reproduce
this missing result :|

Python: Restrict `super()` to function defined directly on class

Python: Accept fixes to ImpliesDataflow

Python: Expand field-flow crosstalk tests
2022-11-22 14:46:29 +01:00

248 lines
9.8 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# User-defined methods, both instance methods and class methods, can be called in many non-standard ways
# i.e. differently from simply `c.f()` or `C.f()`. For example, a user-defined `__await__` method on a
# class `C` will be called by the syntactic construct `await c` when `c` is an instance of `C`.
#
# These tests are based on the first part of https://docs.python.org/3/reference/datamodel.html.
# A thorough covering of methods in that document is found in classes.py.
#
# Intended sources should be the variable `SOURCE` and intended sinks should be
# arguments to the function `SINK` (see python/ql/test/experimental/dataflow/testConfig.qll).
import sys
import os
import functools
sys.path.append(os.path.dirname(os.path.dirname((__file__))))
from testlib import expects
# These are defined so that we can evaluate the test code.
NONSOURCE = "not a source"
SOURCE = "source"
arg1 = "source1"
arg2 = "source2"
arg3 = "source3"
arg4 = "source4"
arg5 = "source5"
arg6 = "source6"
arg7 = "source7"
def is_source(x):
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
def SINK(x, expected=SOURCE):
if is_source(x) or x == expected:
print("OK")
else:
print("Unexpected flow", x)
def SINK_F(x):
if is_source(x):
print("Unexpected flow", x)
else:
print("OK")
SINK1 = functools.partial(SINK, expected=arg1)
SINK2 = functools.partial(SINK, expected=arg2)
SINK3 = functools.partial(SINK, expected=arg3)
SINK4 = functools.partial(SINK, expected=arg4)
SINK5 = functools.partial(SINK, expected=arg5)
SINK6 = functools.partial(SINK, expected=arg6)
SINK7 = functools.partial(SINK, expected=arg7)
# Callable types
# These are the types to which the function call operation (see section Calls) can be applied:
# User-defined functions
# A user-defined function object is created by a function definition (see section Function definitions). It should be called with an argument list containing the same number of items as the function's formal parameter list.
def f(a, b):
return a
SINK(f(SOURCE, 3)) #$ flow="SOURCE -> f(..)"
# Instance methods
# An instance method object combines a class, a class instance and any callable object (normally a user-defined function).
class C(object):
def method(self, x, y):
SINK1(x)
SINK2(y)
@classmethod
def classmethod(cls, x, y):
SINK1(x)
SINK2(y)
@staticmethod
def staticmethod(x, y):
SINK1(x)
SINK2(y)
def gen(self, x, count):
n = count
while n > 0:
yield x
n -= 1
async def coro(self, x):
return x
c = C()
@expects(6)
def test_method_call():
# When an instance method object is created by retrieving a user-defined function object from a class via one of its instances, its __self__ attribute is the instance, and the method object is said to be bound. The new methods __func__ attribute is the original function object.
func_obj = c.method.__func__
# When an instance method object is called, the underlying function (__func__) is called, inserting the class instance (__self__) in front of the argument list. For instance, when C is a class which contains a definition for a function f(), and x is an instance of C, calling x.f(1) is equivalent to calling C.f(x, 1).
c.method(arg1, arg2) # $ func=C.method arg1 arg2
C.method(c, arg1, arg2) # $ func=C.method arg1 arg2
func_obj(c, arg1, arg2) # $ MISSING: func=C.method arg1 arg2
@expects(6)
def test_classmethod_call():
# When an instance method object is created by retrieving a class method object from a class or instance, its __self__ attribute is the class itself, and its __func__ attribute is the function object underlying the class method.
c_func_obj = C.classmethod.__func__
# When an instance method object is derived from a class method object, the “class instance” stored in __self__ will actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C,1) where f is the underlying function.
c.classmethod(arg1, arg2) # $ func=C.classmethod arg1 arg2
C.classmethod(arg1, arg2) # $ func=C.classmethod arg1 arg2
c_func_obj(C, arg1, arg2) # $ MISSING: func=C.classmethod arg1 arg2
@expects(5)
def test_staticmethod_call():
# staticmethods does not have a __func__ attribute
try:
C.staticmethod.__func__
except AttributeError:
print("OK")
# When an instance method object is derived from a class method object, the “class instance” stored in __self__ will actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C,1) where f is the underlying function.
c.staticmethod(arg1, arg2) # $ func=C.staticmethod arg1 arg2
C.staticmethod(arg1, arg2) # $ func=C.staticmethod arg1 arg2
# subclass
class SC(C):
pass
sc = SC()
@expects(6)
def test_subclass_method_call():
func_obj = sc.method.__func__
sc.method(arg1, arg2) # $ func=C.method arg1 arg2
SC.method(sc, arg1, arg2) # $ func=C.method arg1 arg2
func_obj(sc, arg1, arg2) # $ MISSING: func=C.method arg1 arg2
@expects(6)
def test_subclass_classmethod_call():
c_func_obj = SC.classmethod.__func__
sc.classmethod(arg1, arg2) # $ func=C.classmethod arg1 arg2
SC.classmethod(arg1, arg2) # $ func=C.classmethod arg1 arg2
c_func_obj(SC, arg1, arg2) # $ MISSING: func=C.classmethod arg1 arg2
@expects(5)
def test_subclass_staticmethod_call():
try:
SC.staticmethod.__func__
except AttributeError:
print("OK")
sc.staticmethod(arg1, arg2) # $ func=C.staticmethod arg1 arg2
SC.staticmethod(arg1, arg2) # $ func=C.staticmethod arg1 arg2
# Generator functions
# A function or method which uses the yield statement (see section The yield statement) is called a generator function. Such a function, when called, always returns an iterator object which can be used to execute the body of the function: calling the iterators iterator.__next__() method will cause the function to execute until it provides a value using the yield statement. When the function executes a return statement or falls off the end, a StopIteration exception is raised and the iterator will have reached the end of the set of values to be returned.
def gen(x, count):
n = count
while n > 0:
yield x
n -= 1
iter = gen(SOURCE, 1)
SINK(iter.__next__()) # $ MISSING: flow
# SINK_F(iter.__next__()) # throws StopIteration, FP
oiter = c.gen(SOURCE, 1)
SINK(oiter.__next__()) # $ MISSING: flow
# SINK_F(oiter.__next__()) # throws StopIteration, FP
# Coroutine functions
# A function or method which is defined using async def is called a coroutine function. Such a function, when called, returns a coroutine object. It may contain await expressions, as well as async with and async for statements. See also the Coroutine Objects section.
async def coro(x):
return x
import asyncio
SINK(asyncio.run(coro(SOURCE))) # $ MISSING: flow
SINK(asyncio.run(c.coro(SOURCE))) # $ MISSING: flow
class A:
def __await__(self):
# yield SOURCE -- see https://groups.google.com/g/dev-python/c/_lrrc-vp9TI?pli=1
return (yield from asyncio.coroutine(lambda: SOURCE)())
async def agen(x):
a = A()
return await a
SINK(asyncio.run(agen(SOURCE))) # $ MISSING: flow
# Asynchronous generator functions
# A function or method which is defined using async def and which uses the yield statement is called a asynchronous generator function. Such a function, when called, returns an asynchronous iterator object which can be used in an async for statement to execute the body of the function.
# Calling the asynchronous iterators aiterator.__anext__() method will return an awaitable which when awaited will execute until it provides a value using the yield expression. When the function executes an empty return statement or falls off the end, a StopAsyncIteration exception is raised and the asynchronous iterator will have reached the end of the set of values to be yielded.
# Built-in functions
# A built-in function object is a wrapper around a C function. Examples of built-in functions are len() and math.sin() (math is a standard built-in module). The number and type of the arguments are determined by the C function. Special read-only attributes: __doc__ is the functions documentation string, or None if unavailable; __name__ is the functions name; __self__ is set to None (but see the next item); __module__ is the name of the module the function was defined in or None if unavailable.
# Built-in methods
# This is really a different disguise of a built-in function, this time containing an object passed to the C function as an implicit extra argument. An example of a built-in method is alist.append(), assuming alist is a list object. In this case, the special read-only attribute __self__ is set to the object denoted by alist.
# Classes
# Classes are callable. These objects normally act as factories for new instances of themselves, but variations are possible for class types that override __new__(). The arguments of the call are passed to __new__() and, in the typical case, to __init__() to initialize the new instance.
# Class Instances
# Instances of arbitrary classes can be made callable by defining a __call__() method in their class.
# If a class sets __iter__() to None, calling iter() on its instances will raise a TypeError (without falling back to __getitem__()).
# 3.3.1. Basic customization
class Customized:
a = NONSOURCE
b = NONSOURCE
def __new__(cls):
cls.a = SOURCE
return super().__new__(cls)
def __init__(self):
self.b = SOURCE
# testing __new__ and __init__
customized = Customized()
SINK(Customized.a) #$ MISSING:flow="SOURCE, l:-8 -> customized.a"
SINK_F(Customized.b)
SINK(customized.a) #$ flow="SOURCE, l:-10 -> customized.a"
SINK(customized.b) #$ flow="SOURCE, l:-7 -> customized.b"
class Test2:
def __init__(self, arg):
self.x = SOURCE
self.y = arg
t = Test2(SOURCE)
SINK(t.x) # $ flow="SOURCE, l:-4 -> t.x"
SINK(t.y) # $ flow="SOURCE, l:-2 -> t.y"