Understand `scalers` Module#

Author: Zakariya Abugrin | Date: May 2025

Introduction#

In this tutorial, we will see how the scalers module can be used to scale simulation results, which can be obtained as a DataFrame using model.get_df().

Import `reservoirflow`#

We start by importing reservoirflow as rf. The abbreviation rf refers to reservoirflow, through which all modules of the library can be accessed. rf is also used throughout the documentation, and we recommend that users stick to this convention.

import numpy as np
import reservoirflow as rf

print(rf.__version__)

0.1.0b3

Simple Example#

Let’s say we have a vector of values between 0 and 100 with a step size of 10. This might represent the time dimension of our simulation run. This vector can be created using numpy as follows:

t = np.arange(0, 101, 10)
t

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

Now, suppose we would like to scale these values between -1 and 1. We can use the MinMax scaler as follows:

rf.scalers.MinMax((-1, 1)).fit_transform(t, axis=0)

array([-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
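The output above is consistent with the standard min-max formula, x_scaled = lo + (x - min) * (hi - lo) / (max - min). The following is a minimal NumPy sketch of that formula, not the library's implementation; the helper name minmax_scale is hypothetical.

```python
import numpy as np


def minmax_scale(values, output_range=(0.0, 1.0), axis=0):
    """Hypothetical helper: rescale values linearly into output_range."""
    values = np.asarray(values, dtype=float)
    lo, hi = output_range
    vmin = values.min(axis=axis, keepdims=True)
    vmax = values.max(axis=axis, keepdims=True)
    return lo + (values - vmin) * (hi - lo) / (vmax - vmin)


t = np.arange(0, 101, 10)
t_scaled = minmax_scale(t, (-1, 1))
print(t_scaled)
```

The smallest input maps to the lower bound of the range, the largest to the upper bound, and everything in between is interpolated linearly.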

Important

For a vector of values, either vertical or horizontal, axis=0 is required and is the default behavior. For more information, check the documentation.

The same applies if we reshape this vector into a vertical vector using t.reshape(-1, 1), this time scaling between 0 and 1:

rf.scalers.MinMax((0, 1)).fit_transform(t.reshape(-1, 1), axis=0)

array([[0. ],
       [0.1],
       [0.2],
       [0.3],
       [0.4],
       [0.5],
       [0.6],
       [0.7],
       [0.8],
       [0.9],
       [1. ]])

Let’s say we now have a two-dimensional array with two features, time (t) and space (x), as follows:

t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1)
a

array([[   0,  100],
       [  10,  200],
       [  20,  300],
       [  30,  400],
       [  40,  500],
       [  50,  600],
       [  60,  700],
       [  70,  800],
       [  80,  900],
       [  90, 1000],
       [ 100, 1100]])

Now, we can scale each feature individually between 0 and 1 by setting axis=0 as follows:

rf.scalers.MinMax((0, 1)).fit_transform(a, axis=0)

array([[0. , 0. ],
       [0.1, 0.1],
       [0.2, 0.2],
       [0.3, 0.3],
       [0.4, 0.4],
       [0.5, 0.5],
       [0.6, 0.6],
       [0.7, 0.7],
       [0.8, 0.8],
       [0.9, 0.9],
       [1. , 1. ]])
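To see why scaling each feature individually matters, the plain NumPy sketch below (an illustration that does not call reservoirflow) compares column-wise minima and maxima with a single minimum and maximum taken over the whole array. With the shared range, the time column is squeezed into a small part of the interval because x holds much larger values.

```python
import numpy as np

t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1).astype(float)

# Column-wise scaling: each feature uses its own min and max.
col_min = a.min(axis=0)
col_max = a.max(axis=0)
per_feature = (a - col_min) / (col_max - col_min)

# Global scaling: one min and max shared by all features.
global_scaled = (a - a.min()) / (a.max() - a.min())

print(per_feature[-1])    # last row, scaled per feature
print(global_scaled[-1])  # last row, scaled with the shared range
```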

To be able to transform (i.e. scale) and then inverse_transform (i.e. descale), a scaler object must be created and kept:

scaler = rf.scalers.MinMax((0, 1))
a_scaled = scaler.fit_transform(a, axis=0)

Now, we can use the same scaler to transform the values back to the original scale:

a_descaled = scaler.inverse_transform(a_scaled)
a_descaled

array([[   0.,  100.],
       [  10.,  200.],
       [  20.,  300.],
       [  30.,  400.],
       [  40.,  500.],
       [  50.,  600.],
       [  60.,  700.],
       [  70.,  800.],
       [  80.,  900.],
       [  90., 1000.],
       [ 100., 1100.]])
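The round trip works because the scaler keeps the minimum and maximum it saw during fitting. Below is a minimal sketch of this idea, assuming a hypothetical class SimpleMinMax that only mimics the fit_transform and inverse_transform method names used above; the actual reservoirflow implementation may differ.

```python
import numpy as np


class SimpleMinMax:
    """Hypothetical stand-in for illustration; not the reservoirflow class."""

    def __init__(self, output_range=(0.0, 1.0)):
        self.lo, self.hi = output_range
        self.vmin = None
        self.vmax = None

    def fit(self, values, axis=0):
        # Remember the data range so it can be reused later.
        values = np.asarray(values, dtype=float)
        self.vmin = values.min(axis=axis, keepdims=True)
        self.vmax = values.max(axis=axis, keepdims=True)
        return self

    def transform(self, values):
        values = np.asarray(values, dtype=float)
        scale = (self.hi - self.lo) / (self.vmax - self.vmin)
        return self.lo + (values - self.vmin) * scale

    def fit_transform(self, values, axis=0):
        return self.fit(values, axis=axis).transform(values)

    def inverse_transform(self, scaled):
        # Invert the linear map using the stored range.
        scaled = np.asarray(scaled, dtype=float)
        scale = (self.vmax - self.vmin) / (self.hi - self.lo)
        return self.vmin + (scaled - self.lo) * scale


a = np.column_stack([np.arange(0, 101, 10), np.arange(100, 1101, 100)])
scaler = SimpleMinMax((0, 1))
a_scaled = scaler.fit_transform(a, axis=0)
a_descaled = scaler.inverse_transform(a_scaled)
print(np.allclose(a_descaled, a))
```

A one-off call such as MinMax(...).fit_transform(...) discards this stored range, which is why the scaler has to be assigned to a variable when descaling is needed.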

Note

Similarly, simulation run values are scaled and descaled with the transform and inverse_transform methods of the scalers module.

Next, we see how these scalers can be used with a Model class in reservoirflow.

Build a model#

A reservoir simulation model requires two objects: Grid and Fluid. The function create_model() below starts by creating these objects, which are then used to initiate a Model object using the BlackOil class.

def create_model():
    # Grid:
    grid = rf.grids.RegularCartesian(
        nx=3,
        ny=1,
        nz=1,
        dx=300,
        dy=350,
        dz=20,
        phi=0.27,
        kx=1,
        ky=1,  # not needed because flow direction is only x
        kz=0.1,  # not needed because flow direction is only x
        comp=1 * 10**-6,
        dtype="double",
    )
    # Fluid:
    fluid = rf.fluids.SinglePhase(
        mu=0.5, B=1, rho=50, comp=1 * 10**-5, dtype="double"
    )
    # Model:
    model = rf.models.BlackOil(
        grid, fluid, pi=6000, dt=5, start_date="10.10.2018", dtype="double"
    )
    # Production well at the first cell:
    model.set_well(cell_id=1, q=-300, pwf=100, s=1.5, r=3.5)

    # Injection well at the last cell:
    model.set_well(cell_id=3, q=100, s=0, r=3.5)

    return model


model = create_model()

Compile the model#

Before you can run the model, you need to compile a solution for it. By compiling a solution, you actually decide the solution you want to use for your model. Interestingly, reservoirflow provides multiple solutions for the same model based on your configuration.

Hint

Compiling solutions is the most interesting idea introduced in reservoirflow. It allows solving the same model using different solutions so that they can be compared with each other and/or combined.

Currently, a numerical solution based on the Finite-Difference Method (FDM) is available. Two modes are available under this solution: vectorized and symbolized. In addition, you can select a solver, which can be direct, iterative, or neurical. Developing solvers based on neural networks is also a new idea introduced by reservoirflow. You can read more about the available solutions in the documentation.

Tip

Use vectorized mode for better performance, especially when you have a large model. Use symbolized mode only to see how the equations of the linear system are built, which can be very useful for small models. Use direct for lower computing errors as long as iterative does not offer any additional performance boost. For now, neurical solvers remain one of our research topics.
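To make the distinction between direct and iterative solvers concrete, the sketch below solves a small linear system with NumPy's direct solver and with a hand-written conjugate gradient loop. This is an illustration only: the matrix values are made up, plain conjugate gradient is used rather than any specific reservoirflow solver option, and nothing here reflects how reservoirflow builds or solves its systems.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break          # residual is small enough
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Small symmetric positive-definite tridiagonal system (illustrative values).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])

x_direct = np.linalg.solve(A, b)      # direct: factorization-based
x_iter = conjugate_gradient(A, b)     # iterative: refines a guess until tol
print(x_direct, x_iter)
```

A direct solver returns the answer up to floating-point round-off, while an iterative solver stops once the residual drops below a tolerance, which is the trade-off the tip above refers to.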

Below, we compile our model using a numerical solution based on FDM with the vectorized mode and a direct solver.

model.compile(stype="numerical", method="FDM")

[info] FDM was assigned as model.solution.

Simulation Run#

To perform the simulation run, use the model.run() method. The code below performs a simulation run for nsteps=10 (i.e. number of steps) using isolver="cgs":

model.run(nsteps=10, isolver="cgs")

[info] Simulation run started: 10 timesteps.
[info] Simulation run of 10 steps finished in 0.05 seconds.
[info] Material Balance Error: 1.2878587085651816e-14.

Show results as pandas data frame#

The results of the simulation run can be accessed as a pandas DataFrame using the get_df() method. You can select columns and scale values. For more information, check the documentation. Calling this method without scaling and with units shown gives:

model.get_df(
    columns=["time", "cells", "wells"],
    scale=False,
    units=True,
)

	Time [days]	Q1 [stb/day]	Q3 [stb/day]	P1 [psia]	P2 [psia]	P3 [psia]	Qw1 [stb/day]	Qw3 [stb/day]	Pwf1 [psia]	Pwf3 [psia]
Step
0	0	0.000000	0.0	6000.000000	6000.000000	6000.000000	0.000000	0.0	6000.0	6000.000000
1	5	-210.261660	100.0	5221.608107	5931.302823	6350.798192	-210.261660	100.0	100.0	8257.049147
2	10	-187.135552	100.0	4658.296354	5836.844343	6616.368821	-187.135552	100.0	100.0	8522.619775
3	15	-170.190252	100.0	4245.538359	5737.879854	6812.162975	-170.190252	100.0	100.0	8718.413930
4	20	-157.628510	100.0	3939.556180	5644.106976	6952.530464	-157.628510	100.0	100.0	8858.781419
5	25	-148.213227	100.0	3710.216276	5559.252778	7049.748448	-148.213227	100.0	100.0	8955.999403
6	30	-141.073776	100.0	3536.311663	5484.032614	7113.998988	-141.073776	100.0	100.0	9020.249943
7	35	-135.598989	100.0	3402.955378	5417.869131	7153.286692	-135.598989	100.0	100.0	9059.537647
8	40	-131.349563	100.0	3299.446752	5359.648972	7173.910200	-131.349563	100.0	100.0	9080.161155
9	45	-128.008139	100.0	3218.055478	5308.155398	7180.729618	-128.008139	100.0	100.0	9086.980573
10	50	-125.343714	100.0	3153.154729	5262.255139	7177.457846	-125.343714	100.0	100.0	9083.708800

Tip

Using units=True is not the default behavior, but this setting is useful for keeping track of the units used in the model.

As can be seen above, values are as expected with the corresponding units. However, it is normally required to scale some of these values (e.g. between -1 and 1 or 0 and 1) for different purposes such as comparing the solution with the analytical solution or using the simulation output to train a neural network.

When scale=True is used, values are scaled based on the settings in model.scalers_dict, which is defined as follows:

model.scalers_dict

{'time': ['MinMax', (0, 1)],
 'space': ['MinMax', (-1, 1)],
 'pressure': ['MinMax', (-1, 1)],
 'rate': [None, None]}

In this case, values are scaled as follows:

'time': scaled between 0 and 1 using MinMax scaler.
'space': scaled between -1 and 1 using MinMax scaler.
'pressure': scaled between -1 and 1 using MinMax scaler.
'rate': not scaled.
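The mapping from dimension to scaler can be read as a small dispatch table. The sketch below is a plain NumPy illustration of that reading, assuming a hypothetical helper apply_scaler; it is not how reservoirflow applies model.scalers_dict internally. The input values are copied from the unscaled table above.

```python
import numpy as np

# Copy of the configuration printed above.
scalers_dict = {
    "time": ["MinMax", (0, 1)],
    "space": ["MinMax", (-1, 1)],
    "pressure": ["MinMax", (-1, 1)],
    "rate": [None, None],
}


def apply_scaler(values, config):
    """Hypothetical helper: scale values according to a [name, range] entry."""
    name, output_range = config
    values = np.asarray(values, dtype=float)
    if name is None:
        return values  # no scaling requested for this dimension
    if name != "MinMax":
        raise ValueError(f"unsupported scaler in this sketch: {name}")
    lo, hi = output_range
    vmin, vmax = values.min(), values.max()
    return lo + (values - vmin) * (hi - lo) / (vmax - vmin)


time_days = np.arange(0, 51, 5)                    # Time [days] column
rate = np.array([0.0, -210.261660, -187.135552])   # first three Q1 values

time_scaled = apply_scaler(time_days, scalers_dict["time"])
rate_scaled = apply_scaler(rate, scalers_dict["rate"])
print(time_scaled)
print(rate_scaled)
```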

Therefore, using scale=True gives:

model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)

	Time [scaled]	Q1 [stb/day]	Q3 [stb/day]	P1 [scaled]	P2 [scaled]	P3 [scaled]	Qw1 [stb/day]	Qw3 [stb/day]	Pwf1 [scaled]	Pwf3 [scaled]
Step
0	0.0	0.000000	0.0	0.413677	0.413677	0.413677	0.000000	0.0	0.413677	0.413677
1	0.1	-210.261660	100.0	0.027146	0.379564	0.587875	-210.261660	100.0	-2.516126	1.534475
2	0.2	-187.135552	100.0	-0.252582	0.332658	0.719752	-187.135552	100.0	-2.516126	1.666351
3	0.3	-170.190252	100.0	-0.457548	0.283514	0.816978	-170.190252	100.0	-2.516126	1.763578
4	0.4	-157.628510	100.0	-0.609491	0.236949	0.886682	-157.628510	100.0	-2.516126	1.833281
5	0.5	-148.213227	100.0	-0.723376	0.194812	0.934958	-148.213227	100.0	-2.516126	1.881558
6	0.6	-141.073776	100.0	-0.809733	0.157460	0.966863	-141.073776	100.0	-2.516126	1.913463
7	0.7	-135.598989	100.0	-0.875955	0.124604	0.986372	-135.598989	100.0	-2.516126	1.932972
8	0.8	-131.349563	100.0	-0.927355	0.095694	0.996614	-131.349563	100.0	-2.516126	1.943214
9	0.9	-128.008139	100.0	-0.967772	0.070123	1.000000	-128.008139	100.0	-2.516126	1.946600
10	1.0	-125.343714	100.0	-1.000000	0.047330	0.998375	-125.343714	100.0	-2.516126	1.944975

Note

Since rates are not scaled, they are still shown in their original units. Pressure values, on the other hand, are scaled between -1 and 1 only for cells. Values of the bottom-hole flowing pressure (pwf) are transformed with the same scaler but can fall outside these limits. The reason for this behavior is that pwf is normally not used to fit the scaler, since these values are not part of the PDE solution, whether analytical or machine learning.
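The out-of-range pwf values can be checked by hand. The sketch below assumes the pressure scaler is fitted on the minimum and maximum over all cell pressures and all steps, an assumption that is consistent with the printed numbers; the helper scale_pressure is hypothetical and all inputs are copied from the unscaled table.

```python
# Values copied from the unscaled table above.
p_min = 3153.154729   # lowest cell pressure (P1 at step 10)
p_max = 7180.729618   # highest cell pressure (P3 at step 9)


def scale_pressure(p, lo=-1.0, hi=1.0):
    """Hypothetical helper: min-max scaling fitted on cell pressures only."""
    return lo + (p - p_min) * (hi - lo) / (p_max - p_min)


initial = scale_pressure(6000.0)        # initial pressure, inside the range
pwf_prod = scale_pressure(100.0)        # Pwf1, far below the fitted minimum
pwf_inj = scale_pressure(8257.049147)   # Pwf3 at step 1, above the maximum
print(initial, pwf_prod, pwf_inj)
```

The three results agree, up to rounding, with the 0.413677, -2.516126, and 1.534475 entries in the scaled table, which shows that pwf is transformed by the cell-pressure scaler without contributing to its range.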

Custom Scalers#

You can define your own scaling using set_scalers(), which accepts a dictionary. For more information, check the documentation.

The example below defines a time scaling between -10 and 10 and requests no scaling for rate:

model.set_scalers({"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]})
model.scalers_dict

{'time': ['MinMax', (-10, 10)],
 'space': ['MinMax', (-1, 1)],
 'pressure': ['MinMax', (-1, 1)],
 'rate': [None, None]}

Note

Since there was no range defined for MinMax in case of rate, the scaling for this dimension was set to None. Note also that other dimensions retained the default settings. Therefore, the same function can be used to update a single scaler.
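The update behavior described in this note can be sketched as a dictionary merge. The function merge_scalers and the alias table below are hypothetical and only reproduce the observed result (case-insensitive scaler names, incomplete entries reset to None, untouched dimensions retained); they are not the library's code.

```python
DEFAULTS = {
    "time": ["MinMax", (0, 1)],
    "space": ["MinMax", (-1, 1)],
    "pressure": ["MinMax", (-1, 1)],
    "rate": [None, None],
}
KNOWN_SCALERS = {"minmax": "MinMax"}  # lower-case alias -> canonical name


def merge_scalers(current, updates):
    """Hypothetical helper: return a copy of current with updates applied."""
    merged = {key: list(value) for key, value in current.items()}
    for key, (name, output_range) in updates.items():
        if key not in merged:
            raise KeyError(f"unknown dimension: {key}")
        if name is None or output_range is None:
            merged[key] = [None, None]  # incomplete entry disables scaling
        else:
            merged[key] = [KNOWN_SCALERS[name.lower()], tuple(output_range)]
    return merged


result = merge_scalers(
    DEFAULTS, {"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]}
)
print(result)
```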

model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)

	Time [scaled]	Q1 [stb/day]	Q3 [stb/day]	P1 [scaled]	P2 [scaled]	P3 [scaled]	Qw1 [stb/day]	Qw3 [stb/day]	Pwf1 [scaled]	Pwf3 [scaled]
Step
0	-10.0	0.000000	0.0	0.413677	0.413677	0.413677	0.000000	0.0	0.413677	0.413677
1	-8.0	-210.261660	100.0	0.027146	0.379564	0.587875	-210.261660	100.0	-2.516126	1.534475
2	-6.0	-187.135552	100.0	-0.252582	0.332658	0.719752	-187.135552	100.0	-2.516126	1.666351
3	-4.0	-170.190252	100.0	-0.457548	0.283514	0.816978	-170.190252	100.0	-2.516126	1.763578
4	-2.0	-157.628510	100.0	-0.609491	0.236949	0.886682	-157.628510	100.0	-2.516126	1.833281
5	0.0	-148.213227	100.0	-0.723376	0.194812	0.934958	-148.213227	100.0	-2.516126	1.881558
6	2.0	-141.073776	100.0	-0.809733	0.157460	0.966863	-141.073776	100.0	-2.516126	1.913463
7	4.0	-135.598989	100.0	-0.875955	0.124604	0.986372	-135.598989	100.0	-2.516126	1.932972
8	6.0	-131.349563	100.0	-0.927355	0.095694	0.996614	-131.349563	100.0	-2.516126	1.943214
9	8.0	-128.008139	100.0	-0.967772	0.070123	1.000000	-128.008139	100.0	-2.516126	1.946600
10	10.0	-125.343714	100.0	-1.000000	0.047330	0.998375	-125.343714	100.0	-2.516126	1.944975

Tip

Each time you use model.get_df(), scalers are updated automatically to adapt to any new simulation run steps added to the model. To update these scalers manually, use the model.update_scalers() function.
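The reason scalers need updating after additional steps can be shown with plain NumPy. The sketch below is an illustration under assumed values (a hypothetical second run that extends the time axis from 50 to 100 days) and does not call reservoirflow: a scaler fitted on the first 10 steps maps the new steps outside the target range until it is fitted again.

```python
import numpy as np


def fit_minmax(values):
    """Return the (min, max) pair a min-max scaler would store."""
    values = np.asarray(values, dtype=float)
    return values.min(), values.max()


def scale(values, bounds, lo=0.0, hi=1.0):
    """Apply min-max scaling with previously stored bounds."""
    vmin, vmax = bounds
    values = np.asarray(values, dtype=float)
    return lo + (values - vmin) * (hi - lo) / (vmax - vmin)


time_10_steps = np.arange(0, 51, 5)    # 10 steps with dt=5
time_20_steps = np.arange(0, 101, 5)   # hypothetical: 10 more steps added

stale_bounds = fit_minmax(time_10_steps)
stale = scale(time_20_steps, stale_bounds)               # old fit, new data
refit = scale(time_20_steps, fit_minmax(time_20_steps))  # updated fit
print(stale.max(), refit.max())
```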
