Tutorial

Run this notebook on Binder: https://mybinder.org/v2/gh/hakkelt/FunctionOperators.jl/master?filepath=examples%2FTutorial.ipynb or view it on GitHub: https://github.com/hakkelt/FunctionOperators.jl/blob/master/examples/Tutorial.ipynb

using FunctionOperators
using BenchmarkTools

Generate some 3D data

data = [sin(i+j+k)^2 for i=1:300, j=1:300, k=1:50]
size(data)
(300, 300, 50)

Define some operators

The following constructors are available:

  • Positional constructor #1: FunctionOperator{eltype}(forw, inDims, outDims)
  • Positional constructor #2: FunctionOperator{eltype}(forw, backw, inDims, outDims)
  • Positional constructor #3: FunctionOperator{eltype}(name, forw, inDims, outDims)
  • Positional constructor #4: FunctionOperator{eltype}(name, forw, backw, inDims, outDims)
  • Keyword constructor: FunctionOperator{eltype}(;kwargs...)

where eltype is the element type enforced on the input array.

Arguments

  • name::String (Optional but strongly recommended) The operator is referenced by this string in later error messages. Warning! It is also used to check equality of (composite) FunctionOperators. Default value: OpX where X is a number incremented at each constructor call.
  • forw::Function Function defining the mapping. Must accept one or two arguments. In the two-argument case, the first argument is a preallocated buffer to write the result into (this speeds up code by avoiding repeated allocations); see the sketch after this list. In both cases, the return value must be the result of the mapping.
  • backw::Function (Optional) Same as forw, but defines the backward mapping
  • inDims::Tuple{Vararg{Int}} Size of the input array
  • outDims::Tuple{Vararg{Int}} Size of the output array
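
For illustration, here is a minimal sketch (ScaleOp and ScaleOp! are hypothetical operators, not part of this tutorial) showing the one-argument and the buffered two-argument forms of forw/backw side by side:

# One-argument form: allocates a new array for each result
ScaleOp = FunctionOperator{Float64}("ScaleOp", x -> 2 .* x, x -> x ./ 2, (4, 4), (4, 4))
# Two-argument form: writes the result into the preallocated `buffer`
ScaleOp! = FunctionOperator{Float64}("ScaleOp!",
    (buffer, x) -> buffer .= 2 .* x,
    (buffer, x) -> buffer .= x ./ 2,
    (4, 4), (4, 4))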

Squaring operator and square root as its adjoint operation ⟶ Dimension preserving

# Using the keyword constructor:
Op₁ = FunctionOperator{Float64}(name = "Op₁",
    forw = x -> x.^2, backw = x -> sqrt.(x),
    inDims = (300, 300, 50), outDims = (300, 300, 50))
FunctionOperator with eltype Float64
    Name: Op₁
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50)

A weighting operator that collapses the new dimension on adjoint operation ⟶ Changes size

weights = [sin((i-j)*l) + 1 for i=1:300, j=1:300, k=1:50, l=1:10]
# Using the positional constructor:
Op₂ = FunctionOperator{Float64}("Op₂",
    x -> reshape(x, 300, 300, 50, 1) .* weights, # broadcasting: 3D to 4D
    x -> reshape(sum(x ./ weights, dims=4), 300, 300, 50),
    (300, 300, 50), (300, 300, 50, 10))
FunctionOperator with eltype Float64
    Name: Op₂
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50, 10)

Apply these operators to the data

Apply the first operator: left multiplication by the operator is equivalent to calling the forw function

Op₁ * data == Op₁.forw(data)
true

Result of applying the second operator: the size increases

size(data), size(Op₂ * data)
((300, 300, 50), (300, 300, 50, 10))

Combine the two operators:

Op₂ * Op₁ * data == Op₂.forw(Op₁.forw(data))
true

Applying the adjoint of an operator == calling the backw function

Op₁' * Op₁ * data == Op₁.backw(Op₁.forw(data))
true

Combine operators with addition and subtraction

Op₂ * (Op₁ + Op₂'*Op₂) * Op₁ * data ==
    Op₂.forw(Op₁.forw(Op₁.forw(data)) + Op₂.backw(Op₂.forw(Op₁.forw(data))))
true

It is also possible to combine operators with UniformScaling from the LinearAlgebra library

using LinearAlgebra
Op₁ * I * data == Op₁.forw(data),
Op₁ * 3I * data == Op₁.forw(3 * data),
Op₂*(Op₁ - 2.5*I)*Op₁'*data == Op₂.forw(Op₁.forw(Op₁.backw(data))-2.5*Op₁.backw(data))
(true, true, true)

Adjoints of nested operators also work:

(Op₂ * Op₁)' * (Op₂ * Op₁) * data == (Op₂ * Op₁)' * Op₂ * Op₁ * data ==
    Op₁.backw(Op₂.backw(Op₂.forw(Op₁.forw(data))))
true

...but not with addition or subtraction:

(Op₁ + 3I)' * data
Sorry, I don't know how to calculate the adjoint of ((Op₁ + (3*I)))'

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] getPlanAddSub(::FunctionOperators.FunctionOperatorComposite{Float64}, ::FunctionOperators.Buffer, ::Bool, ::String, ::Symbol, ::Array{FunctionOperators.Buffer,1}) at /home/hakkelt/csmri/FunctionOperators/src/getPlan.jl:105
 [3] getPlan at /home/hakkelt/csmri/FunctionOperators/src/getPlan.jl:133 [inlined]
 [4] *(::FunctionOperators.FunctionOperatorComposite{Float64}, ::Array{Float64,3}) at /home/hakkelt/csmri/FunctionOperators/src/mul.jl:46
 [5] top-level scope at In[12]:1

FunctionOperators cannot derive the adjoint of a sum automatically; we will see in the Performance section how to supply such a plan manually with setPlan.

You can store a combination of some operators and apply it later to data:

comb_OP = 5I * Op₁
comb_OP' * comb_OP * data == (5I * Op₁)' * (5I * Op₁) * data ==
    Op₁.backw(conj(5)*(5*Op₁.forw(data)))
true

Note that the adjoint of scaling by a constant (in this case: 5) is scaling by the conjugate of the constant (which equals the original constant for real numbers).
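
This identity can be checked with plain UniformScaling objects from the LinearAlgebra standard library, independently of FunctionOperators:

α = 2 + 3im
(α * I)' == conj(α) * I  # true: the adjoint of scaling is scaling by the conjugate
(5 * I)' == 5 * I        # for real constants the adjoint changes nothing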

Performance

Unfortunately, our naive approach above allocates a lot of memory and is quite slow (at least compared to the speed we could achieve)...

@benchmark Op₂*(Op₁ - 2.5*I)*Op₁'*data
BenchmarkTools.Trial: 
  memory estimate:  823.99 MiB
  allocs estimate:  320
  --------------
  minimum time:     221.485 ms (1.00% GC)
  median time:      223.937 ms (1.73% GC)
  mean time:        233.019 ms (5.57% GC)
  maximum time:     325.965 ms (32.51% GC)
  --------------
  samples:          22
  evals/sample:     1

A possible reason is that Op₂ accesses a global variable (weights), which is considered bad practice (see the Performance Tips section of the Julia manual).

We can avoid that by wrapping the definition of Op₂ with a function:

function getOp₂()
    weights = [sin((i-j)*l) + 1 for i=1:300, j=1:300, k=1:50, l=1:10]
    Op₂ = FunctionOperator{Float64}(name="Op₂",
        forw = x -> reshape(x, 300, 300, 50, 1) .* weights, # broadcasting: 3D to 4D
        backw = x -> reshape(sum(x ./ weights, dims=4), 300, 300, 50),
        inDims=(300, 300, 50), outDims=(300, 300, 50, 10))
end
Op₂ = getOp₂()
FunctionOperator with eltype Float64
    Name: Op₂
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50, 10)
@benchmark Op₂*(Op₁ - 2.5*I)*Op₁'*data
BenchmarkTools.Trial: 
  memory estimate:  823.99 MiB
  allocs estimate:  318
  --------------
  minimum time:     222.509 ms (0.89% GC)
  median time:      234.822 ms (1.73% GC)
  mean time:        244.348 ms (5.53% GC)
  maximum time:     340.294 ms (31.26% GC)
  --------------
  samples:          21
  evals/sample:     1

Well, it didn't solve our problem... In fact, the main cause of the slowness is excessive memory allocation; namely, every intermediate result allocates a new array.

We can avoid that by defining the forw and backw functions a bit differently: they can also accept two arguments, where the first is a preallocated buffer (of appropriate size) that is supposed to hold the output of the operation:

function getBufferedOps()
    Op₁ = FunctionOperator{Float64}(name="Op₁",
        forw = (buffer, x) -> buffer .= x.^2,
        backw = (buffer, x) -> broadcast!(sqrt, buffer, x),
        inDims = (300, 300, 50), outDims = (300, 300, 50))
    weights = [sin((i-j)*l) + 1 for i=1:300, j=1:300, k=1:50, l=1:10]
    Op₂ = FunctionOperator{Float64}(name="Op₂",
        forw = (buffer,x) -> buffer .= reshape(x, 300, 300, 50, 1) .* weights,
        backw = (buffer,x) -> dropdims(sum!(reshape(buffer, 300, 300, 50, 1), x ./ weights), dims=4),
        inDims=(300, 300, 50), outDims=(300, 300, 50, 10))
    Op₁, Op₂
end
bOp₁, bOp₂ = getBufferedOps()
(FunctionOperator{Float64}(Op₁, (300, 300, 50), (300, 300, 50)), FunctionOperator{Float64}(Op₂, (300, 300, 50), (300, 300, 50, 10)))
@benchmark bOp₂*(bOp₁ - 2.5*I)*bOp₁'*data
BenchmarkTools.Trial: 
  memory estimate:  412.00 MiB
  allocs estimate:  312
  --------------
  minimum time:     203.125 ms (0.00% GC)
  median time:      220.364 ms (0.00% GC)
  mean time:        215.907 ms (1.00% GC)
  maximum time:     221.152 ms (0.00% GC)
  --------------
  samples:          24
  evals/sample:     1

Better, but it should still be much faster...

Let's have a look at what is under the hood!

When we combine operators, nothing special happens, just a wrapper object is created that defines the connections between the operators:

Op₂*(Op₁ - 2.5*I)*Op₁'
FunctionOperatorComposite with eltype Float64
    Name: Op₂ * (Op₁ - (2.5*I)) * Op₁'
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50, 10)
    Plan: no plan

Note the last line: "Plan: no plan" ⟶ this will become significant later...

The real magic happens when we apply this composite operator to data. To see what is going on behind the scenes, let's enable verbosity.

FunctionOperators_global_settings.verbose = true
true

Now we can see how these composite operators work: when we apply one to data, it creates a function that aggregates the functionality of all combined operators and preallocates buffers for the intermediate results.

Op₂*(Op₁ - 2.5*I)*Op₁' * data;
Allocation of buffer1, size: (300, 300, 50, 10)
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw((buffer2 .= Op₁.backw(x); broadcast!(-, buffer3, Op₁.forw(buffer2), broadcast!(*, buffer3, 2.5, buffer2))))

On the other hand, bOp₁ and bOp₂ have a slightly different aggregated function:

bOp₂*(bOp₁ - 2.5*I) * bOp₁' * data;
Allocation of buffer1, size: (300, 300, 50, 10)
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))

The good thing is that the plan (along with the preallocated buffers) is cached, so if we save the combined operator to a variable, then the plan is created only once. See the difference:

bOp₂*(bOp₁ - 2.5*I) * bOp₁' * data
bOp₂*(bOp₁ - 2.5*I) * bOp₁' * data;
Allocation of buffer1, size: (300, 300, 50, 10)
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))
Allocation of buffer1, size: (300, 300, 50, 10)
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))
combined = bOp₂*(bOp₁ - 2.5*I) * bOp₁'
combined * data
combined * data;
Allocation of buffer1, size: (300, 300, 50, 10)
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))
Allocation of buffer1, size: (300, 300, 50, 10)

Now we can see that the combined object carries the plan already created:

combined
FunctionOperatorComposite with eltype Float64
    Name: Op₂ * (Op₁ - (2.5*I)) * Op₁'
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50, 10)
    Plan: Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))

And a side note here: we can also set this plan manually if the computed one is wrong, or if FunctionOperators was unable to compute it. For example, the adjoint of an addition:

tricky = (bOp₁ + 2.5I)'
FunctionOperatorComposite with eltype Float64
    Name: ((Op₁ + (2.5*I)))'
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50)
    Plan: no plan
tricky * data
Allocation of buffer1, size: (300, 300, 50)
Sorry, I don't know how to calculate the adjoint of ((Op₁ + (2.5*I)))'

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] getPlanAddSub(::FunctionOperators.FunctionOperatorComposite{Float64}, ::FunctionOperators.Buffer, ::Bool, ::String, ::Symbol, ::Array{FunctionOperators.Buffer,1}) at /home/hakkelt/csmri/FunctionOperators/src/getPlan.jl:105
 [3] getPlan at /home/hakkelt/csmri/FunctionOperators/src/getPlan.jl:133 [inlined]
 [4] *(::FunctionOperators.FunctionOperatorComposite{Float64}, ::Array{Float64,3}) at /home/hakkelt/csmri/FunctionOperators/src/mul.jl:46
 [5] top-level scope at In[27]:1
setPlan(tricky, (buffer, x) -> @.(√(2 - x) / √(2x)), "√(2 - x) / √(2x)")
tricky
FunctionOperatorComposite with eltype Float64
    Name: ((Op₁ + (2.5*I)))'
    Input dimensions: (300, 300, 50)
    Output dimensions: (300, 300, 50)
    Plan: √(2 - x) / √(2x)
tricky * data == @. √(2 - data) / √(2data)
Allocation of buffer1, size: (300, 300, 50)
true

But back to the question of performance: if we manually preallocate an array for the output and use mul!, then we can also save the reallocation of buffer1:

combined = bOp₂ * (bOp₁ - 2.5*I) * bOp₁'
output = Array{Float64}(undef, 300, 300, 50, 10)
mul!(output, combined, data)
mul!(output, combined, data);
buffer1 = <previously allocated>
Allocation of buffer2, size: (300, 300, 50)
Allocation of buffer3, size: (300, 300, 50)
Plan calculated: buffer1 .= Op₂.forw(buffer1, (buffer2 .= Op₁.backw(buffer2, x); broadcast!(-, buffer3, Op₁.forw(buffer3, buffer2), broadcast!(*, buffer2, 2.5, buffer2))))

If we apply the combined operator multiple times, we can save a lot on computation time:

FunctionOperators_global_settings.verbose = false
@benchmark mul!(output, combined, data)
BenchmarkTools.Trial: 
  memory estimate:  256 bytes
  allocs estimate:  7
  --------------
  minimum time:     137.604 ms (0.00% GC)
  median time:      137.707 ms (0.00% GC)
  mean time:        137.705 ms (0.00% GC)
  maximum time:     137.920 ms (0.00% GC)
  --------------
  samples:          37
  evals/sample:     1

Let's compare it to a manually written function with identical functionality and optimizations:

function getAggregatedFunction()
    weights = [sin((i-j)*l) + 1 for i=1:300, j=1:300, k=1:50, l=1:10]
    buffer2 = Array{Float64}(undef, 300, 300, 50)
    buffer3 = Array{Float64}(undef, 300, 300, 50)
    buffer4 = Array{Float64}(undef, 300, 300, 50)
    (buffer, x) -> begin
        broadcast!(sqrt, buffer2, x)  # Of course, these two lines could be optimized
        buffer3 .= buffer2 .^ 2       # using (√x)^2 = |x|, but let's ignore that here
        broadcast!(-, buffer3, buffer3, broadcast!(*, buffer4, 2.5, buffer2))
        buffer .= reshape(buffer3, 300, 300, 50, 1) .* weights
    end
end
getAggregatedFunction (generic function with 1 method)
aggrFun = getAggregatedFunction()
@benchmark aggrFun(output, data)
BenchmarkTools.Trial: 
  memory estimate:  128 bytes
  allocs estimate:  2
  --------------
  minimum time:     146.955 ms (0.00% GC)
  median time:      147.060 ms (0.00% GC)
  mean time:        147.075 ms (0.00% GC)
  maximum time:     147.415 ms (0.00% GC)
  --------------
  samples:          34
  evals/sample:     1

Basically, there is no overhead of using FunctionOperators!

Syntactic sugar

Let's consider the following function:

function foo1(A, bOp₁, bOp₂)
    for i in 1:10
        C = (bOp₁ - 2.5*I) * bOp₁ * A
        B = bOp₁ * (C - 3A)
        A .= bOp₁ * (C + 2B)
        A ./= maximum(bOp₂ * A)
    end
end
foo1 (generic function with 1 method)
@benchmark foo1(copy(data), bOp₁, bOp₂)
BenchmarkTools.Trial: 
  memory estimate:  6.07 GiB
  allocs estimate:  2292
  --------------
  minimum time:     3.743 s (1.29% GC)
  median time:      3.840 s (2.67% GC)
  mean time:        3.840 s (2.67% GC)
  maximum time:     3.937 s (3.99% GC)
  --------------
  samples:          2
  evals/sample:     1

Using the methods we have seen earlier, we can quickly optimize this code, and we get something like this:

function foo2(A, bOp₁, bOp₂)
    combOp = (bOp₁ - 2.5*I) * bOp₁
    C = similar(A)
    buffer1 = similar(A)
    B = similar(A)
    buffer2 = Array{Float64}(undef, (300, 300, 50, 10))
    for i = 1:10
        mul!(C, combOp, A)
        @. buffer1 = C - 3A
        mul!(B, bOp₁, buffer1)
        @. buffer1 = C + 2B
        mul!(A, bOp₁, buffer1)
        A ./= maximum(mul!(buffer2, bOp₂, A))
    end
end
foo2 (generic function with 1 method)
@benchmark foo2(copy(data), bOp₁, bOp₂)
BenchmarkTools.Trial: 
  memory estimate:  515.00 MiB
  allocs estimate:  284
  --------------
  minimum time:     2.597 s (0.00% GC)
  median time:      2.626 s (0.08% GC)
  mean time:        2.626 s (0.08% GC)
  maximum time:     2.656 s (0.16% GC)
  --------------
  samples:          2
  evals/sample:     1

This speedup is quite pleasing, but the tradeoff is that the code is now much less readable. To avoid the mess caused by manual optimization, the FunctionOperators library offers the @♻ macro, which does the same automatically using the following markers: 🔝, 🔃, and @🔃.

?@♻

Recycling macro: Reduces the number of allocations inside a for loop by preallocating arrays for the outputs of marked operations. Markers: @♻ (\:recycle:), 🔝 (\:top:), 🔃 (\:arrows_clockwise:), and @🔃

Macro @♻ should be placed right before a for loop, and then it executes the following substitutions:

  • Expressions marked by 🔝:

They are going to be calculated before the loop, the result is stored in a variable, and the expression is replaced by that variable. This can also be useful when a constant expression is used in the loop, but the main motivation for this substitution is to allow caching of composite FunctionOperators. E.g.:

@♻ for i=1:5
    result = 🔝((FuncOp₁ + 2I) * FuncOp₂) * data
end

will be transformed to

🔝_1 = (FuncOp₁ + 2I) * FuncOp₂
for i = 1:5
    result = 🔝_1 * data
end

so that the plan is calculated only once, and the buffers for the intermediate results of the composite operator are also allocated only once.

  • Expressions marked by 🔃:

They are going to be calculated before the loop (to allocate an array to store the result), but the expression is also evaluated in each loop iteration. The difference after the substitution is that the result of the expression is always saved to the preallocated array. E.g.:

@♻ for i=1:5
    result = FuncOp₁ * 🔃(A + B)
end

will be transformed to

🔃_1 = A + B
for i = 1:5
    result = FuncOp₁ * @.(🔃_1 = A + B)
end

This transformation first allocates an array named 🔃_1, and then in every iteration the expression is recalculated, saved to 🔃_1, and that value is used in the rest of the operation (i.e., FuncOp₁ * 🔃_1). Note that the @. macro is inserted before the inline assignment; this is needed because otherwise A + B would allocate a new array before being stored in 🔃_1. Warning! This can break your code: e.g., @.(🔃_1 = A * B) computes the elementwise product 🔃_1 .= A .* B instead of the matrix multiplication A * B! On the other hand, when the marked expression consists of only a multiplication, it is transformed into a call to mul!. E.g.:

@♻ for i=1:5
    result = FuncOp₁ * 🔃(A * B)
end

will be transformed to

🔃_1 = A * B
for i = 1:5
    result = FuncOp₁ * mul!(🔃_1, A, B)
end

  • Lastly, assignments marked by @🔃:

They will be transformed into a call of mul!. Of course, this works only if @🔃 is directly followed by an assignment that has a single multiplication on its right side. E.g.:

@♻ for i=1:5
    @🔃 result = FuncOp₁ * A
end

will be transformed to

result = FuncOp₁ * A
for i = 1:5
    mul!(result, FuncOp₁, A)
end

Final note: 🔝 can be arbitrarily nested, and it can be embedded in expressions marked by 🔃. 🔃 can also be nested, and it can be used in assignments marked by @🔃 (along with 🔝, of course).
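
For instance, the markers might be combined like this (a hypothetical sketch following the rules above; FuncOp₁, A, and B are placeholders):

@♻ for i = 1:5
    @🔃 result = 🔝(FuncOp₁ + 2I) * 🔃(A + B)
end

Here 🔝 caches the composite operator before the loop, 🔃 preallocates the array holding A + B, and @🔃 turns the outer assignment into a mul! call.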

In our example:

FunctionOperators_global_settings.macro_verbose = true # if true, @♻ prints the transformed loop
function foo3(A, bOp₁, bOp₂)
    @♻ for i in 1:10
        @🔃 C = 🔝((bOp₁ - 2.5*I) * bOp₁) * A
        @🔃 B = bOp₁ * 🔃(C - 3A)
        @🔃 A .= bOp₁ * 🔃(C + 2B)
        A ./= maximum(🔃(bOp₂ * A))
    end
end
begin
    🔝_1 = (bOp₁ - 2.5I) * bOp₁
    C = 🔝_1 * A
    🔃_2 = C - 3A
    🔃_3 = 3A
    B = bOp₁ * (🔃_3 .= C .- mul!(🔃_3, 3, A))
    🔃_5 = C + 2B
    🔃_6 = 2B
    🔃_7 = bOp₂ * A
    for i = 1:10
        mul!(C, 🔝_1, A)
        mul!(B, bOp₁, 🔃_3 .= C .- mul!(🔃_3, 3, A))
        mul!(A, bOp₁, 🔃_6 .= C .+ mul!(🔃_6, 2, B))
        A ./= maximum(mul!(🔃_7, bOp₂, A))
    end
end
foo3 (generic function with 1 method)
@benchmark foo3(copy(data), bOp₁, bOp₂)
BenchmarkTools.Trial: 
  memory estimate:  686.66 MiB
  allocs estimate:  419
  --------------
  minimum time:     2.962 s (0.00% GC)
  median time:      3.017 s (0.12% GC)
  mean time:        3.017 s (0.12% GC)
  maximum time:     3.072 s (0.24% GC)
  --------------
  samples:          2
  evals/sample:     1

It is slightly slower and requires a few more memory allocations because it can't detect whether a buffer can be reused. But when the loop body consists of many computationally heavy operations, the difference is mostly negligible.

However, using @♻ is still tedious for more complex algorithms. Fortunately, the same (or even better) optimization can be achieved with the @recycle macro!

?@recycle

Speed up iteratively executed code fragments with many matrix operations by transforming the code in a way that preserves arrays allocated for intermediate results and reuses them in subsequent iterations.

First variant:

@recycle <code to be optimized>

Second variant:

@recycle(arrays = [<list of array variables>], funops = [<list of funop variables>], numbers = [<list of number variables>], <code to be optimized>)

The first variant is the more convenient one: it tries to guess the types of the variables. As a tradeoff, it fails when the optimized code contains either a closure or a non-const global variable.

The second variant is the more flexible (and also more verbose) one: it requires the user to declare explicitly which variables are of type Array, FunOp, and Number, but in exchange it is free from these limitations. Note: All of the "keyword arguments" are optional, and their order is arbitrary.

An example of the first variant: this function

function foo()
    A = rand(100,100)
    B = rand(100,100)
    @recycle for i = 1:5
        A += A / 2 + B
        C = A * B + 5
    end
end

is turned into the following:

function foo()
    A = rand(100,100)
    B = rand(100,100)
    (C, 🔃₂) = fill(nothing, 2)
    (is_first_run₁,) = fill(true, 1)
    for i = 1:5
        A .+= A ./ 2 .+ B
        if is_first_run₁
            is_first_run₁ = false
            🔃₂ = A * B
            C = 🔃₂ .+ 5
        else
            mul!(🔃₂, A, B)
            C .= 🔃₂ .+ 5
        end
    end
end

Another example showing what the second variant can do (and the first can't):

bar = @recycle(arrays=[A,B], (A, B) -> begin
    A += A / 2 + B
    B = A * B .+ 5
end)
function baz()
    A = rand(100,100)
    B = rand(100,100)
    for i = 1:5
        bar(A,B)
    end
end

is turned into the following:

bar = begin
    (🔃₁,) = fill(nothing, 1)
    (is_first_run₁,) = fill(true, 1)
    (A, B)->begin
            A .+= A ./ 2 .+ B
            begin
                if is_first_run₁
                    is_first_run₁ = false
                    🔃₁ = A * B
                else
                    mul!(🔃₁, A, B)
                end
                B .= 🔃₁ .+ 5
            end
        end
end
function baz()
    A = rand(100,100)
    B = rand(100,100)
    for i = 1:5
        bar(A,B)
    end
end
Back to our example:

function foo4(A, bOp₁, bOp₂)
    @recycle for i in 1:10
        C = (bOp₁ - 2.5*I) * bOp₁ * A
        B = bOp₁ * (C - 3A)
        A .= bOp₁ * (C + 2B)
        A ./= maximum(bOp₂ * A)
    end
end
foo4 (generic function with 1 method)
@benchmark foo4(copy(data), bOp₁, bOp₂)
1 | begin
  2 |     🔝₁ = bOp₁ - 2.5I
  3 |     🔝₂ = 🔝₁ * bOp₁
  4 |     (🔃₁, 🔃₁, 🔃₃, 🔃₃, 🔃₅) = fill(nothing, 5)
  5 |     (is_first_run₁, is_first_run₂, is_first_run₃, is_first_run₄) = fill(true, 4)
  6 |     for i = 1:10
  7 |         if is_first_run₁
  8 |             is_first_run₁ = false
  9 |             C = 🔝₂ * A
 10 |         else
 11 |             mul!(C, 🔝₂, A)
 12 |         end
 13 |         if is_first_run₂
 14 |             is_first_run₂ = false
 15 |             🔃₁ = C .- 3 .* A
 16 |             B = bOp₁ * 🔃₁
 17 |         else
 18 |             🔃₁ .= C .- 3 .* A
 19 |             mul!(B, bOp₁, 🔃₁)
 20 |         end
 21 |         if is_first_run₃
 22 |             is_first_run₃ = false
 23 |             🔃₃ = C .+ 2 .* B
 24 |         else
 25 |             🔃₃ .= C .+ 2 .* B
 26 |         end
 27 |         mul!(A, bOp₁, 🔃₃)
 28 |         if is_first_run₄
 29 |             is_first_run₄ = false
 30 |             🔃₅ = bOp₂ * A
 31 |         else
 32 |             mul!(🔃₅, bOp₂, A)
 33 |         end
 34 |         A ./= maximum(🔃₅)
 35 |     end
 36 | end
BenchmarkTools.Trial: 
  memory estimate:  549.33 MiB
  allocs estimate:  367
  --------------
  minimum time:     2.350 s (0.00% GC)
  median time:      2.560 s (0.00% GC)
  mean time:        2.521 s (0.10% GC)
  maximum time:     2.652 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1

Notice that it is even better than foo3 (which was optimized with @♻)!

Another comment: the first (more convenient) variant of @recycle uses a generated function to infer the types of the variables. That's why it is unable to optimize closures and code containing non-const global variables, and this is also why it prints the optimized code when the optimized function is first executed. Of course, if you set FunctionOperators_global_settings.macro_verbose to false, then nothing is printed on the first execution.
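
For instance, a non-const global can be handled by declaring it explicitly with the second variant (a hypothetical sketch modeled on the bar example above):

scale = 2.5   # non-const global: the first variant of @recycle would fail on this
bar2 = @recycle(arrays=[A, B], numbers=[scale], (A, B) -> begin
    B = A * B .+ scale
end)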

A third comment: both variants of the @recycle macro significantly increase compilation time.

Furthermore, it is also possible to optimize small functions/closures with @recycle if they are called multiple times and the sizes of their input arrays are always the same. However, in this case we also need to declare explicitly which variables are Arrays, FunOps, or Numbers.

function getCost(Op)
    (A,scaler) -> norm(Op' * (A / scaler), 1) + norm(A, 2)
end
cost = getCost(bOp₁)
#50 (generic function with 1 method)
@benchmark cost(data, 2)
BenchmarkTools.Trial: 
  memory estimate:  68.67 MiB
  allocs estimate:  103
  --------------
  minimum time:     56.325 ms (0.00% GC)
  median time:      57.134 ms (0.00% GC)
  mean time:        58.370 ms (1.57% GC)
  maximum time:     80.134 ms (27.98% GC)
  --------------
  samples:          86
  evals/sample:     1
function getCost_recycle(Op)
    @recycle(arrays=[A], funops=[Op], numbers=[scaler],
        (A,scaler) -> norm(Op' * (A / scaler), 1) + norm(A, 2))
end
cost2 = getCost_recycle(bOp₁)
1 | begin
  2 |     🔝₁ = Op'
  3 |     (🔃₁, 🔃₂, 🔃₂) = fill(nothing, 3)
  4 |     (is_first_run₁,) = fill(true, 1)
  5 |     (A, scaler)->begin
  6 |             if is_first_run₁
  7 |                 is_first_run₁ = false
  8 |                 🔃₂ = A ./ scaler
  9 |                 🔃₁ = 🔝₁ * 🔃₂
 10 |             else
 11 |                 🔃₂ .= A ./ scaler
 12 |                 mul!(🔃₁, 🔝₁, 🔃₂)
 13 |             end
 14 |             norm(🔃₁, 1) + norm(A, 2)
 15 |         end
 16 | end
#53 (generic function with 1 method)
@benchmark cost2(data, 2)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  5
  --------------
  minimum time:     41.945 ms (0.00% GC)
  median time:      42.068 ms (0.00% GC)
  mean time:        42.076 ms (0.00% GC)
  maximum time:     43.024 ms (0.00% GC)
  --------------
  samples:          119
  evals/sample:     1

Further notes

Global settings

?FunctionOperators_global_settings
search: FunctionOperators_global_settings

Object that holds global settings for FunctionOperators library

Fields:

  • verbose::Bool If set to true, then allocation information and calculated plan function will be displayed upon creation (i.e., when a composite operator is first used). Default: false
  • macro_verbose::Bool If set to true, then recycling macros (@♻ and @recycle) will print the transformed code. Default: false
  • auto_reshape::Bool If set to true, then the input and output are reshaped according to the inDims and outDims values of the FunctionOperator before and after every multiplication. Default: false

Example for auto_reshape

op = FunctionOperator{Float64}(forw = x -> x[:], inDims = (3,2), outDims = (2,3))
size(op * ones(3,2))
(6,)
FunctionOperators_global_settings.auto_reshape = true
size(op * ones(6, 1)) # reshaped both before and after!
(2, 3)

Equality operator

Equality check based on name

HEADS UP! This works correctly only when operators with the same name also have the same functionality!
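
A minimal sketch of this pitfall (OpA and OpB are hypothetical operators defined only for this example):

# Same name, different mappings: the name-based check still reports equality
OpA = FunctionOperator{Float64}(name = "SameName", forw = x -> 2 .* x, inDims = (2,), outDims = (2,))
OpB = FunctionOperator{Float64}(name = "SameName", forw = x -> 3 .* x, inDims = (2,), outDims = (2,))
OpA == OpB  # true, even though the two mappings differ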

It performs basic arithmetic transformations on the expressions, so it recognizes even some less obvious equalities. The rules it uses:

  • Associativity: $op1 * op2 * op3 = (op1 * op2) * op3 = op1 * (op2 * op3)$, $op1 + op2 + op3 = (op1 + op2) + op3 = op1 + (op2 + op3)$, $op1 + (op2 - op3) = (op1 + op2) - op3$, $op1 - (op2 + op3) = (op1 - op2) - op3$
  • Commutativity: $op1 - op2 - op3 = op1 - op3 - op2$, $op1 + op2 = op2 + op1$
  • Distributivity: $(op1 + op2) * op3 = op1 * op3 + op2 * op3$ (Note that $op1 * (op2 + op3) ≠ op1 * op2 + op1 * op3$)

In our case, this implies:

# Note that Op₁ and bOp₁ are defined with the same name field!
(bOp₁ + 2.5*I) * bOp₁ == (Op₁ + 2.5*I) * Op₁ == 2.5*I * Op₁ + bOp₁ * Op₁
true

Superclass

Combinations of FunctionOperator objects are of type FunctionOperatorComposite. Both types are subtypes of FunOp.

bOp₁ isa FunctionOperator,
bOp₁ isa FunOp,
bOp₂ * bOp₁ isa FunctionOperator, # false because it is a FunctionOperatorComposite
bOp₂ * bOp₁ isa FunOp
(true, true, false, true)
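
Since both are subtypes of FunOp, we can write functions that dispatch on the common supertype and accept plain and composite operators alike (a minimal sketch; apply_twice is a hypothetical helper):

apply_twice(op::FunOp, x) = op * (op * x)
apply_twice(bOp₁, data)          # accepts a plain FunctionOperator...
apply_twice(bOp₁' * bOp₁, data)  # ...and a FunctionOperatorComposite alike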