Understand scalers Module#
Author: Zakariya Abugrin | Date: May 2025
Introduction#
In this tutorial, we will see how the scalers module can be used to scale the simulation results, which can be obtained as a DataFrame using model.get_df().
Import reservoirflow#
We start by importing reservoirflow as rf. The abbreviation rf refers to reservoirflow, through which all modules of this library can be accessed. rf is also used throughout the documentation. We recommend that users stick with this convention.
import numpy as np
import reservoirflow as rf

print(rf.__version__)
0.1.0b3
Simple Example#
Let's say we have a vector of values between 0 and 100 with a step size of 10. This might represent the time dimension of our simulation run. This vector can be created using numpy as follows:
t = np.arange(0, 101, 10)
t
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
Now, we would like to scale these values between -1 and 1. We can use the MinMax scaler as follows:
rf.scalers.MinMax((-1, 1)).fit_transform(t, axis=0)
array([-1. , -0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
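The result above is consistent with the usual min-max formula, x_scaled = lo + (x - x_min) * (hi - lo) / (x_max - x_min). The sketch below reproduces it with plain numpy. Note that minmax_scale is a hypothetical helper written only for illustration; it is not part of reservoirflow, whose internal implementation may differ.

```python
import numpy as np


def minmax_scale(values, output_range=(-1, 1)):
    """Hypothetical helper: linearly map values onto output_range."""
    lo, hi = output_range
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    # Shift to zero, stretch to the target width, then shift to the lower bound.
    return lo + (values - vmin) * (hi - lo) / (vmax - vmin)


t = np.arange(0, 101, 10)
t_scaled = minmax_scale(t, (-1, 1))
print(t_scaled)
```

The smallest value (0) maps to the lower bound and the largest value (100) maps to the upper bound, with everything in between spaced linearly.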
Important
For a vector of values, either vertical or horizontal, axis=0 is required and is the default behavior. For more information, check the documentation.
The same applies if we reshape this vector into a vertical (column) vector using t.reshape(-1, 1), this time scaling between 0 and 1:
rf.scalers.MinMax((0, 1)).fit_transform(t.reshape(-1, 1), axis=0)
array([[0. ],
[0.1],
[0.2],
[0.3],
[0.4],
[0.5],
[0.6],
[0.7],
[0.8],
[0.9],
[1. ]])
Let's say we now have a two-dimensional array with two features, time (t) and space (x), as follows:
t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1)
a
array([[ 0, 100],
[ 10, 200],
[ 20, 300],
[ 30, 400],
[ 40, 500],
[ 50, 600],
[ 60, 700],
[ 70, 800],
[ 80, 900],
[ 90, 1000],
[ 100, 1100]])
Now, we can scale each feature individually between 0 and 1 by setting axis=0 as follows:
rf.scalers.MinMax((0, 1)).fit_transform(a, axis=0)
array([[0. , 0. ],
[0.1, 0.1],
[0.2, 0.2],
[0.3, 0.3],
[0.4, 0.4],
[0.5, 0.5],
[0.6, 0.6],
[0.7, 0.7],
[0.8, 0.8],
[0.9, 0.9],
[1. , 1. ]])
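To illustrate what scaling "each feature individually" means, the sketch below computes the minimum and maximum per column with plain numpy and compares this with using a single global minimum and maximum. It is only an illustration of the idea under that assumption, not the reservoirflow implementation.

```python
import numpy as np

t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1)

# Per-column statistics: reduce along axis=0 (down the rows).
col_min = a.min(axis=0)  # one minimum per feature
col_max = a.max(axis=0)  # one maximum per feature

# Broadcasting applies each column's own min and max to that column.
a_per_feature = (a - col_min) / (col_max - col_min)

# For contrast: one global min and max shared by both features.
a_global = (a - a.min()) / (a.max() - a.min())

print(a_per_feature[-1])  # both features reach the upper bound
print(a_global[-1])       # the time column stays far below the upper bound
```

With per-feature scaling, both columns span the full target range even though their original magnitudes differ by an order of magnitude.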
To be able to transform (i.e. scale) and then inverse_transform (i.e. descale), a scaler object must be created and kept:
scaler = rf.scalers.MinMax((0, 1))
a_scaled = scaler.fit_transform(a, axis=0)
Now, we can use the same scaler to transform the scaled values back to the original scale:
a_descaled = scaler.inverse_transform(a_scaled)
a_descaled
array([[ 0., 100.],
[ 10., 200.],
[ 20., 300.],
[ 30., 400.],
[ 40., 500.],
[ 50., 600.],
[ 60., 700.],
[ 70., 800.],
[ 80., 900.],
[ 90., 1000.],
[ 100., 1100.]])
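The round trip works because the scaler stores the minimum and maximum it saw while fitting. The sketch below shows this idea with a small stand-in class. SimpleMinMax is hypothetical and written only for this illustration; it mimics the fit/transform/inverse_transform pattern but is not the rf.scalers.MinMax class.

```python
import numpy as np


class SimpleMinMax:
    """Hypothetical stand-in for a min-max scaler (not the reservoirflow class)."""

    def __init__(self, output_range=(0, 1)):
        self.lo, self.hi = output_range

    def fit(self, values, axis=0):
        # Remember the statistics so the mapping can be inverted later.
        values = np.asarray(values, dtype=float)
        self.vmin = values.min(axis=axis)
        self.vmax = values.max(axis=axis)
        return self

    def transform(self, values):
        span = self.vmax - self.vmin
        return self.lo + (np.asarray(values) - self.vmin) * (self.hi - self.lo) / span

    def inverse_transform(self, scaled):
        span = self.vmax - self.vmin
        return self.vmin + (np.asarray(scaled) - self.lo) * span / (self.hi - self.lo)


a = np.column_stack([np.arange(0, 101, 10), np.arange(100, 1101, 100)])
scaler = SimpleMinMax((0, 1)).fit(a, axis=0)
a_scaled = scaler.transform(a)
a_descaled = scaler.inverse_transform(a_scaled)
print(np.allclose(a_descaled, a))
```

A one-off call that discards the scaler cannot be inverted, because the fitted minimum and maximum are lost with it.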
Note
Similarly, simulation run values are scaled and descaled using transform and inverse_transform from the scalers module.
Now, let's see how these scalers can be used with a Model class in reservoirflow.
Build a model#
A reservoir simulation model requires two objects: Grid and Fluid. The function create_model() below starts by creating these objects, which are then used to initiate a Model object using the BlackOil class.
def create_model():
    # Grid:
    grid = rf.grids.RegularCartesian(
        nx=3,
        ny=1,
        nz=1,
        dx=300,
        dy=350,
        dz=20,
        phi=0.27,
        kx=1,
        ky=1,  # not needed because flow direction is only x
        kz=0.1,  # not needed because flow direction is only x
        comp=1 * 10**-6,
        dtype="double",
    )
    # Fluid:
    fluid = rf.fluids.SinglePhase(
        mu=0.5, B=1, rho=50, comp=1 * 10**-5, dtype="double"
    )
    # Model:
    model = rf.models.BlackOil(
        grid, fluid, pi=6000, dt=5, start_date="10.10.2018", dtype="double"
    )
    # Production well at the first cell:
    model.set_well(cell_id=1, q=-300, pwf=100, s=1.5, r=3.5)

    # Injection well at the last cell:
    model.set_well(cell_id=3, q=100, s=0, r=3.5)

    return model


model = create_model()
Compile the model#
Before you can run the model, you need to compile a solution for it. By compiling a solution, you decide which solution to use for your model. reservoirflow provides multiple solutions for the same model based on your configuration.
Hint
Compiling solutions is the most interesting idea introduced in reservoirflow. It allows the same model to be solved using different solutions so that they can be compared with each other and/or combined.
Currently, a numerical solution based on the Finite-Difference Method (FDM) is available. Two modes are available under this solution: vectorized and symbolized. In addition, you can select a solver, which can be direct, iterative, or neurical. Developing solvers based on neural networks is also a new idea introduced by reservoirflow. You can read more about the available solutions in the documentation.
Tip
Use vectorized mode for better performance, especially when you have a large model. Use symbolized mode only to see how the equations of the linear system are built, which might be very useful for small models. Use direct for lower computing errors as long as iterative does not offer any additional performance boost. For now, neurical solvers remain one of our research topics.
Below, we compile our model using a numerical solution based on FDM using vectorized mode and direct solver.
model.compile(stype="numerical", method="FDM")
[info] FDM was assigned as model.solution.
Simulation Run#
To perform the simulation run, the method model.run() can be used. The code below performs a simulation run for nsteps=10 (i.e. number of steps) using isolver="cgs":
model.run(nsteps=10, isolver="cgs")
[info] Simulation run started: 10 timesteps.
[info] Simulation run of 10 steps finished in 0.05 seconds.
[info] Material Balance Error: 1.2878587085651816e-14.
Show results as pandas data frame#
The results of the simulation run can be accessed as a pandas DataFrame using the get_df() method. You can select columns and scale values. For more information, check the documentation. Calling this method without scaling gives:
model.get_df(
    columns=["time", "cells", "wells"],
    scale=False,
    units=True,
)
Step | Time [days] | Q1 [stb/day] | Q3 [stb/day] | P1 [psia] | P2 [psia] | P3 [psia] | Qw1 [stb/day] | Qw3 [stb/day] | Pwf1 [psia] | Pwf3 [psia] |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0.000000 | 0.0 | 6000.000000 | 6000.000000 | 6000.000000 | 0.000000 | 0.0 | 6000.0 | 6000.000000 |
1 | 5 | -210.261660 | 100.0 | 5221.608107 | 5931.302823 | 6350.798192 | -210.261660 | 100.0 | 100.0 | 8257.049147 |
2 | 10 | -187.135552 | 100.0 | 4658.296354 | 5836.844343 | 6616.368821 | -187.135552 | 100.0 | 100.0 | 8522.619775 |
3 | 15 | -170.190252 | 100.0 | 4245.538359 | 5737.879854 | 6812.162975 | -170.190252 | 100.0 | 100.0 | 8718.413930 |
4 | 20 | -157.628510 | 100.0 | 3939.556180 | 5644.106976 | 6952.530464 | -157.628510 | 100.0 | 100.0 | 8858.781419 |
5 | 25 | -148.213227 | 100.0 | 3710.216276 | 5559.252778 | 7049.748448 | -148.213227 | 100.0 | 100.0 | 8955.999403 |
6 | 30 | -141.073776 | 100.0 | 3536.311663 | 5484.032614 | 7113.998988 | -141.073776 | 100.0 | 100.0 | 9020.249943 |
7 | 35 | -135.598989 | 100.0 | 3402.955378 | 5417.869131 | 7153.286692 | -135.598989 | 100.0 | 100.0 | 9059.537647 |
8 | 40 | -131.349563 | 100.0 | 3299.446752 | 5359.648972 | 7173.910200 | -131.349563 | 100.0 | 100.0 | 9080.161155 |
9 | 45 | -128.008139 | 100.0 | 3218.055478 | 5308.155398 | 7180.729618 | -128.008139 | 100.0 | 100.0 | 9086.980573 |
10 | 50 | -125.343714 | 100.0 | 3153.154729 | 5262.255139 | 7177.457846 | -125.343714 | 100.0 | 100.0 | 9083.708800 |
Tip
Using units=True is not the default behavior, but this setting is useful for keeping track of the units used in the model.
As can be seen above, values are as expected with the corresponding units. However, it is normally required to scale some of these values (e.g. between -1 and 1 or 0 and 1) for different purposes such as comparing the solution with the analytical solution or using the simulation output to train a neural network.
When scale=True is used, values are scaled based on the settings defined in model.scalers_dict, which is defined as follows:
model.scalers_dict
{'time': ['MinMax', (0, 1)],
'space': ['MinMax', (-1, 1)],
'pressure': ['MinMax', (-1, 1)],
'rate': [None, None]}
In this case, values are scaled as follows:
'time': scaled between 0 and 1 using MinMax scaler.
'space': scaled between -1 and 1 using MinMax scaler.
'pressure': scaled between -1 and 1 using MinMax scaler.
'rate': not scaled.
Therefore, using scale=True gives:
model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)
Step | Time [scaled] | Q1 [stb/day] | Q3 [stb/day] | P1 [scaled] | P2 [scaled] | P3 [scaled] | Qw1 [stb/day] | Qw3 [stb/day] | Pwf1 [scaled] | Pwf3 [scaled] |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.000000 | 0.0 | 0.413677 | 0.413677 | 0.413677 | 0.000000 | 0.0 | 0.413677 | 0.413677 |
1 | 0.1 | -210.261660 | 100.0 | 0.027146 | 0.379564 | 0.587875 | -210.261660 | 100.0 | -2.516126 | 1.534475 |
2 | 0.2 | -187.135552 | 100.0 | -0.252582 | 0.332658 | 0.719752 | -187.135552 | 100.0 | -2.516126 | 1.666351 |
3 | 0.3 | -170.190252 | 100.0 | -0.457548 | 0.283514 | 0.816978 | -170.190252 | 100.0 | -2.516126 | 1.763578 |
4 | 0.4 | -157.628510 | 100.0 | -0.609491 | 0.236949 | 0.886682 | -157.628510 | 100.0 | -2.516126 | 1.833281 |
5 | 0.5 | -148.213227 | 100.0 | -0.723376 | 0.194812 | 0.934958 | -148.213227 | 100.0 | -2.516126 | 1.881558 |
6 | 0.6 | -141.073776 | 100.0 | -0.809733 | 0.157460 | 0.966863 | -141.073776 | 100.0 | -2.516126 | 1.913463 |
7 | 0.7 | -135.598989 | 100.0 | -0.875955 | 0.124604 | 0.986372 | -135.598989 | 100.0 | -2.516126 | 1.932972 |
8 | 0.8 | -131.349563 | 100.0 | -0.927355 | 0.095694 | 0.996614 | -131.349563 | 100.0 | -2.516126 | 1.943214 |
9 | 0.9 | -128.008139 | 100.0 | -0.967772 | 0.070123 | 1.000000 | -128.008139 | 100.0 | -2.516126 | 1.946600 |
10 | 1.0 | -125.343714 | 100.0 | -1.000000 | 0.047330 | 0.998375 | -125.343714 | 100.0 | -2.516126 | 1.944975 |
Note
Since rates are not scaled, they are still shown in their original units. Pressure values, on the other hand, are scaled between -1 and 1 only for cells, while the bottom-hole flowing pressure (pwf) values are transformed with the same scaler and can therefore fall beyond these limits. The reason for this behavior is that pwf values are normally not used to fit the scaler, since they are not part of the PDE solution, whether analytical or based on machine learning.
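The out-of-range pwf values can be checked by hand. The sketch below assumes, as the note suggests, that the pressure scaler is fitted only on the cell pressures (P1 to P3), and then applies the same mapping to values copied from the unscaled table. scale_pressure is a hypothetical helper for this check, not a reservoirflow function.

```python
# Extremes of the cell pressures, copied from the unscaled table above.
p_min = 3153.154729  # lowest cell pressure (P1 at step 10)
p_max = 7180.729618  # highest cell pressure (P3 at step 9)


def scale_pressure(p, lo=-1.0, hi=1.0):
    """Hypothetical helper: min-max mapping fitted on cell pressures only."""
    return lo + (p - p_min) * (hi - lo) / (p_max - p_min)


pwf1_scaled = scale_pressure(100.0)           # producer bottom-hole pressure
p_initial_scaled = scale_pressure(6000.0)     # initial reservoir pressure
pwf3_scaled = scale_pressure(9083.708800)     # injector pwf at step 10

print(round(pwf1_scaled, 6), round(p_initial_scaled, 6), round(pwf3_scaled, 6))
```

Under this assumption the results agree with the Pwf1, P1 (step 0), and Pwf3 (step 10) entries of the scaled table to the printed precision: 100 psia lies well below the fitted minimum and about 9084 psia lies well above the fitted maximum, so both land outside [-1, 1].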
Custom Scalers#
You can define your own scaling using set_scalers(), which accepts a dictionary. For more information, check the documentation.
The example below defines a time scaling between -10 and 10, while keeping the rate unscaled:
model.set_scalers({"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]})
model.scalers_dict
{'time': ['MinMax', (-10, 10)],
'space': ['MinMax', (-1, 1)],
'pressure': ['MinMax', (-1, 1)],
'rate': [None, None]}
Note
Since no range was defined for MinMax in the case of rate, the scaling for this dimension was set to None. Note also that the other dimensions retained their default settings. Therefore, the same function can be used to update a single scaler.
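The update behavior described in this note can be mimicked with a small dictionary merge. The sketch below is only a guess at the logic based on the output shown above: merge_scaler_settings and KNOWN_SCALERS are hypothetical names, and the actual set_scalers() implementation may work differently.

```python
from copy import deepcopy

# Default settings as printed by model.scalers_dict in this tutorial.
DEFAULTS = {
    "time": ["MinMax", (0, 1)],
    "space": ["MinMax", (-1, 1)],
    "pressure": ["MinMax", (-1, 1)],
    "rate": [None, None],
}

# Hypothetical lookup so that scaler names are matched case-insensitively.
KNOWN_SCALERS = {"minmax": "MinMax"}


def merge_scaler_settings(current, updates):
    """Hypothetical helper: apply updates without touching other dimensions."""
    merged = deepcopy(current)
    for dimension, (name, value_range) in updates.items():
        if dimension not in merged:
            raise KeyError(f"Unknown dimension: {dimension}")
        if name is None or value_range is None:
            # A scaler without a range cannot be applied, so disable scaling.
            merged[dimension] = [None, None]
        else:
            merged[dimension] = [KNOWN_SCALERS[name.lower()], tuple(value_range)]
    return merged


result = merge_scaler_settings(
    DEFAULTS, {"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]}
)
print(result)
```

Only the dimensions named in the update change, which is why a single scaler can be adjusted without restating the others.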
model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)
Step | Time [scaled] | Q1 [stb/day] | Q3 [stb/day] | P1 [scaled] | P2 [scaled] | P3 [scaled] | Qw1 [stb/day] | Qw3 [stb/day] | Pwf1 [scaled] | Pwf3 [scaled] |
---|---|---|---|---|---|---|---|---|---|---|
0 | -10.0 | 0.000000 | 0.0 | 0.413677 | 0.413677 | 0.413677 | 0.000000 | 0.0 | 0.413677 | 0.413677 |
1 | -8.0 | -210.261660 | 100.0 | 0.027146 | 0.379564 | 0.587875 | -210.261660 | 100.0 | -2.516126 | 1.534475 |
2 | -6.0 | -187.135552 | 100.0 | -0.252582 | 0.332658 | 0.719752 | -187.135552 | 100.0 | -2.516126 | 1.666351 |
3 | -4.0 | -170.190252 | 100.0 | -0.457548 | 0.283514 | 0.816978 | -170.190252 | 100.0 | -2.516126 | 1.763578 |
4 | -2.0 | -157.628510 | 100.0 | -0.609491 | 0.236949 | 0.886682 | -157.628510 | 100.0 | -2.516126 | 1.833281 |
5 | 0.0 | -148.213227 | 100.0 | -0.723376 | 0.194812 | 0.934958 | -148.213227 | 100.0 | -2.516126 | 1.881558 |
6 | 2.0 | -141.073776 | 100.0 | -0.809733 | 0.157460 | 0.966863 | -141.073776 | 100.0 | -2.516126 | 1.913463 |
7 | 4.0 | -135.598989 | 100.0 | -0.875955 | 0.124604 | 0.986372 | -135.598989 | 100.0 | -2.516126 | 1.932972 |
8 | 6.0 | -131.349563 | 100.0 | -0.927355 | 0.095694 | 0.996614 | -131.349563 | 100.0 | -2.516126 | 1.943214 |
9 | 8.0 | -128.008139 | 100.0 | -0.967772 | 0.070123 | 1.000000 | -128.008139 | 100.0 | -2.516126 | 1.946600 |
10 | 10.0 | -125.343714 | 100.0 | -1.000000 | 0.047330 | 0.998375 | -125.343714 | 100.0 | -2.516126 | 1.944975 |
Tip
Each time you use model.get_df(), scalers are updated automatically to adapt to any new simulation run steps added to the model. To update these scalers manually, use the model.update_scalers() function.
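To see why scalers need updating after more steps are added, consider the time dimension. The sketch below uses a hypothetical fit_minmax helper (not part of reservoirflow) and assumes the same dt=5 days as in this tutorial. A scaler fitted on the first 10 steps no longer maps the latest time into the target range once 10 further steps are run.

```python
def fit_minmax(values, output_range=(0.0, 1.0)):
    """Hypothetical helper: return a scaling function fitted on `values`."""
    lo, hi = output_range
    vmin, vmax = min(values), max(values)

    def scale(v):
        return lo + (v - vmin) * (hi - lo) / (vmax - vmin)

    return scale


# Times after the first run of 10 steps: 0, 5, ..., 50 days.
times_10_steps = [5 * step for step in range(11)]
scale_old = fit_minmax(times_10_steps)

# Times after 10 more steps: 0, 5, ..., 100 days.
times_20_steps = [5 * step for step in range(21)]
scale_new = fit_minmax(times_20_steps)

stale_value = scale_old(100)   # old scaler applied to the new final time
fresh_value = scale_new(100)   # refitted scaler applied to the same time
midpoint = scale_new(50)       # the old final time now sits mid-range

print(stale_value, fresh_value, midpoint)
```

The stale scaler sends the new final time outside the (0, 1) range, while the refitted one restores it, at the cost of changing the scaled value of every earlier time step.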