Understand scalers Module#


Author: Zakariya Abugrin | Date: May 2025

Introduction#

In this tutorial, we show how the scalers module can be used to scale simulation results, which can be obtained as a DataFrame using model.get_df().

Import reservoirflow#

We start by importing reservoirflow as rf. The abbreviation rf refers to reservoirflow, and all modules in the library can be accessed through it. rf is also used throughout the documentation, so we recommend sticking to this convention.

import numpy as np
import reservoirflow as rf

print(rf.__version__)
0.1.0b3

Simple Example#

Let’s say we have a vector of values between 0 and 100 with a step size of 10. This might represent the time dimension of our simulation run. This vector can be created using numpy as follows:

t = np.arange(0, 101, 10)
t
array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

Now, we would like to scale these values between -1 and 1. For this, we can use the MinMax scaler as follows:

rf.scalers.MinMax((-1, 1)).fit_transform(t, axis=0)
array([-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
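The arithmetic behind this result can be sketched in plain NumPy. The sketch assumes that MinMax applies the standard min-max formula, which is consistent with the output above. The helper name minmax_scale is hypothetical and is not part of reservoirflow.

```python
import numpy as np


def minmax_scale(values, output_range):
    """Linearly map values so that min -> low and max -> high (hypothetical helper)."""
    low, high = output_range
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    # Normalize to [0, 1] first, then stretch and shift to [low, high].
    unit = (values - v_min) / (v_max - v_min)
    return unit * (high - low) + low


t = np.arange(0, 101, 10)
t_scaled = minmax_scale(t, (-1, 1))
print(t_scaled)
```

The smallest value maps to the lower limit, the largest to the upper limit, and everything in between is placed proportionally.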

Important

For a vector of values, either vertical or horizontal, axis=0 is required and is the default behavior. For more information, check the documentation.

The same applies if we reshape this vector into a vertical vector using t.reshape(-1, 1), this time scaling between 0 and 1:

rf.scalers.MinMax((0, 1)).fit_transform(t.reshape(-1, 1), axis=0)
array([[0. ],
       [0.1],
       [0.2],
       [0.3],
       [0.4],
       [0.5],
       [0.6],
       [0.7],
       [0.8],
       [0.9],
       [1. ]])

Now let’s say we have a two-dimensional array with two features, time (t) and space (x), as follows:

t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1)
a
array([[   0,  100],
       [  10,  200],
       [  20,  300],
       [  30,  400],
       [  40,  500],
       [  50,  600],
       [  60,  700],
       [  70,  800],
       [  80,  900],
       [  90, 1000],
       [ 100, 1100]])

Now, we can scale each feature individually between 0 and 1 by setting axis=0 as follows:

rf.scalers.MinMax((0, 1)).fit_transform(a, axis=0)
array([[0. , 0. ],
       [0.1, 0.1],
       [0.2, 0.2],
       [0.3, 0.3],
       [0.4, 0.4],
       [0.5, 0.5],
       [0.6, 0.6],
       [0.7, 0.7],
       [0.8, 0.8],
       [0.9, 0.9],
       [1. , 1. ]])
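To illustrate what scaling each feature individually means, the following NumPy sketch computes one minimum and maximum per column and contrasts this with a single range shared by the whole array. It assumes the standard min-max formula and does not call reservoirflow.

```python
import numpy as np

t = np.arange(0, 101, 10).reshape(-1, 1)
x = np.arange(100, 1101, 100).reshape(-1, 1)
a = np.concatenate([t, x], axis=1).astype(float)

# One minimum and maximum per column (feature): shape (1, 2).
col_min = a.min(axis=0, keepdims=True)
col_max = a.max(axis=0, keepdims=True)
per_feature = (a - col_min) / (col_max - col_min)

# For contrast: a single minimum and maximum shared by the whole array.
shared = (a - a.min()) / (a.max() - a.min())

print(per_feature[-1])  # last row: both features reach the upper limit
print(shared[-1])       # last row: only x reaches the upper limit
```

With a shared range, the time column would be squeezed into a small part of the interval because x has much larger values. Scaling per feature avoids this.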

To be able to transform (i.e. scale) and then inverse_transform (i.e. descale), a scaler object must be created and kept:

scaler = rf.scalers.MinMax((0, 1))
a_scaled = scaler.fit_transform(a, axis=0)

Now, we can use the same scaler to transform the scaled values back to the original scale:

a_descaled = scaler.inverse_transform(a_scaled)
a_descaled
array([[   0.,  100.],
       [  10.,  200.],
       [  20.,  300.],
       [  30.,  400.],
       [  40.,  500.],
       [  50.,  600.],
       [  60.,  700.],
       [  70.,  800.],
       [  80.,  900.],
       [  90., 1000.],
       [ 100., 1100.]])
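The reason a scaler object has to be kept is that descaling needs the minimum and maximum seen during fitting. The sketch below is a hypothetical stand-in written in NumPy to show this idea. It is not the reservoirflow implementation, and the class name SketchMinMax is made up for illustration.

```python
import numpy as np


class SketchMinMax:
    """Hypothetical stand-in for a min-max scaler; not the reservoirflow class."""

    def __init__(self, output_range=(0, 1)):
        self.low, self.high = output_range

    def fit(self, values):
        values = np.asarray(values, dtype=float)
        # Remember the per-column range so it can be reused later.
        self.v_min = values.min(axis=0)
        self.v_max = values.max(axis=0)
        return self

    def transform(self, values):
        values = np.asarray(values, dtype=float)
        unit = (values - self.v_min) / (self.v_max - self.v_min)
        return unit * (self.high - self.low) + self.low

    def inverse_transform(self, scaled):
        scaled = np.asarray(scaled, dtype=float)
        unit = (scaled - self.low) / (self.high - self.low)
        return unit * (self.v_max - self.v_min) + self.v_min


a = np.column_stack([np.arange(0, 101, 10), np.arange(100, 1101, 100)])
scaler = SketchMinMax((0, 1)).fit(a)
a_scaled = scaler.transform(a)
a_descaled = scaler.inverse_transform(a_scaled)
print(np.allclose(a_descaled, a))
```

Because the fitted range is stored on the object, inverse_transform can undo the scaling without seeing the original data again.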

Note

Similarly, simulation run values are scaled and descaled using the transform and inverse_transform methods of the scalers module.

Next, we see how these scalers can be used with a Model class in reservoirflow.

Build a model#

A reservoir simulation model requires two objects: Grid and Fluid. The function create_model() below starts by creating these objects, which are then used to initialize a Model object using the BlackOil class.

def create_model():
    # Grid:
    grid = rf.grids.RegularCartesian(
        nx=3,
        ny=1,
        nz=1,
        dx=300,
        dy=350,
        dz=20,
        phi=0.27,
        kx=1,
        ky=1,  # not needed because flow direction is only x
        kz=0.1,  # not needed because flow direction is only x
        comp=1 * 10**-6,
        dtype="double",
    )
    # Fluid:
    fluid = rf.fluids.SinglePhase(
        mu=0.5, B=1, rho=50, comp=1 * 10**-5, dtype="double"
    )
    # Model:
    model = rf.models.BlackOil(
        grid, fluid, pi=6000, dt=5, start_date="10.10.2018", dtype="double"
    )
    # Production well at the first cell:
    model.set_well(cell_id=1, q=-300, pwf=100, s=1.5, r=3.5)

    # Injection well at the last cell:
    model.set_well(cell_id=3, q=100, s=0, r=3.5)

    return model


model = create_model()

Compile the model#

Before you can run the model, you need to compile a solution for it. By compiling a solution, you actually decide the solution you want to use for your model. Interestingly, reservoirflow provides multiple solutions for the same model based on your configuration.

Hint

Compiling solutions is the most interesting idea introduced in reservoirflow. It allows the same model to be solved using different solutions, so that they can be compared with each other and/or combined.

Currently, a numerical solution based on the Finite-Difference Method (FDM) is available. Two modes are available under this solution: vectorized and symbolized. In addition, you can select a solver, which can be direct, iterative, or neurical. Developing solvers based on neural networks is also a new idea introduced by reservoirflow. You can read more about the available solutions in the documentation.

Tip

Use vectorized mode for better performance, especially when you have a large model. Use symbolized mode only to see how the equations of the linear equation system are built, which can be very useful for small models. Use direct for lower computing errors as long as iterative does not offer any additional performance boost. For now, neurical solvers remain one of our research topics.
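The trade-off between direct and iterative solvers can be illustrated on a small linear system. The sketch below is generic NumPy and does not reflect how reservoirflow builds or solves its equations. The matrix values are made up, and Jacobi iteration is used only as a simple example of an iterative method.

```python
import numpy as np

# Small diagonally dominant tridiagonal system A @ p = b (illustrative values only).
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])

# Direct solver: solves the system in one step, accurate up to round-off.
p_direct = np.linalg.solve(A, b)

# Iterative solver (Jacobi): repeat cheap updates until the change is below a tolerance.
diag = np.diag(A)
off_diag = A - np.diagflat(diag)
p_iter = np.zeros_like(b)
for iteration in range(1, 201):
    p_next = (b - off_diag @ p_iter) / diag
    converged = np.max(np.abs(p_next - p_iter)) < 1e-8
    p_iter = p_next
    if converged:
        break

print(p_direct, np.abs(A @ p_direct - b).max())
print(p_iter, np.abs(A @ p_iter - b).max(), iteration)
```

The iterative result is only as accurate as its stopping tolerance, which is why a direct solver is a reasonable choice when the model is small enough that iteration brings no speed advantage.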

Below, we compile our model using a numerical solution based on FDM using vectorized mode and direct solver.

model.compile(stype="numerical", method="FDM")
[info] FDM was assigned as model.solution.

Simulation Run#

To perform the simulation run, use the model.run() method. The code below performs a simulation run with nsteps=10 (i.e. number of steps) and isolver="cgs":

model.run(nsteps=10, isolver="cgs")
[info] Simulation run started: 10 timesteps.
[info] Simulation run of 10 steps finished in 0.05 seconds.
[info] Material Balance Error: 1.2878587085651816e-14.

Show results as pandas data frame#

The results of the simulation run can be accessed as a pandas DataFrame using the get_df() method, which lets you select columns and scale values. For more information, check the documentation. Calling this method without scaling gives:

model.get_df(
    columns=["time", "cells", "wells"],
    scale=False,
    units=True,
)
Step  Time [days]  Q1 [stb/day]  Q3 [stb/day]    P1 [psia]    P2 [psia]    P3 [psia]  Qw1 [stb/day]  Qw3 [stb/day]  Pwf1 [psia]  Pwf3 [psia]
0               0      0.000000           0.0  6000.000000  6000.000000  6000.000000       0.000000            0.0       6000.0  6000.000000
1               5   -210.261660         100.0  5221.608107  5931.302823  6350.798192    -210.261660          100.0        100.0  8257.049147
2              10   -187.135552         100.0  4658.296354  5836.844343  6616.368821    -187.135552          100.0        100.0  8522.619775
3              15   -170.190252         100.0  4245.538359  5737.879854  6812.162975    -170.190252          100.0        100.0  8718.413930
4              20   -157.628510         100.0  3939.556180  5644.106976  6952.530464    -157.628510          100.0        100.0  8858.781419
5              25   -148.213227         100.0  3710.216276  5559.252778  7049.748448    -148.213227          100.0        100.0  8955.999403
6              30   -141.073776         100.0  3536.311663  5484.032614  7113.998988    -141.073776          100.0        100.0  9020.249943
7              35   -135.598989         100.0  3402.955378  5417.869131  7153.286692    -135.598989          100.0        100.0  9059.537647
8              40   -131.349563         100.0  3299.446752  5359.648972  7173.910200    -131.349563          100.0        100.0  9080.161155
9              45   -128.008139         100.0  3218.055478  5308.155398  7180.729618    -128.008139          100.0        100.0  9086.980573
10             50   -125.343714         100.0  3153.154729  5262.255139  7177.457846    -125.343714          100.0        100.0  9083.708800

Tip

Using units=True is not the default behavior, but this setting is useful for keeping track of the units used in the model.

As can be seen above, values are as expected with the corresponding units. However, it is normally required to scale some of these values (e.g. between -1 and 1 or 0 and 1) for different purposes such as comparing the solution with the analytical solution or using the simulation output to train a neural network.

When scale=True is used, values are scaled based on the settings in model.scalers_dict, which is defined as follows:

model.scalers_dict
{'time': ['MinMax', (0, 1)],
 'space': ['MinMax', (-1, 1)],
 'pressure': ['MinMax', (-1, 1)],
 'rate': [None, None]}

In this case, values are scaled as follows:

  • 'time': scaled between 0 and 1 using MinMax scaler.

  • 'space': scaled between -1 and 1 using MinMax scaler.

  • 'pressure': scaled between -1 and 1 using MinMax scaler.

  • 'rate': not scaled.

Therefore, using scale=True gives:

model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)
Step  Time [scaled]  Q1 [stb/day]  Q3 [stb/day]  P1 [scaled]  P2 [scaled]  P3 [scaled]  Qw1 [stb/day]  Qw3 [stb/day]  Pwf1 [scaled]  Pwf3 [scaled]
0               0.0      0.000000           0.0     0.413677     0.413677     0.413677       0.000000            0.0       0.413677       0.413677
1               0.1   -210.261660         100.0     0.027146     0.379564     0.587875    -210.261660          100.0      -2.516126       1.534475
2               0.2   -187.135552         100.0    -0.252582     0.332658     0.719752    -187.135552          100.0      -2.516126       1.666351
3               0.3   -170.190252         100.0    -0.457548     0.283514     0.816978    -170.190252          100.0      -2.516126       1.763578
4               0.4   -157.628510         100.0    -0.609491     0.236949     0.886682    -157.628510          100.0      -2.516126       1.833281
5               0.5   -148.213227         100.0    -0.723376     0.194812     0.934958    -148.213227          100.0      -2.516126       1.881558
6               0.6   -141.073776         100.0    -0.809733     0.157460     0.966863    -141.073776          100.0      -2.516126       1.913463
7               0.7   -135.598989         100.0    -0.875955     0.124604     0.986372    -135.598989          100.0      -2.516126       1.932972
8               0.8   -131.349563         100.0    -0.927355     0.095694     0.996614    -131.349563          100.0      -2.516126       1.943214
9               0.9   -128.008139         100.0    -0.967772     0.070123     1.000000    -128.008139          100.0      -2.516126       1.946600
10              1.0   -125.343714         100.0    -1.000000     0.047330     0.998375    -125.343714          100.0      -2.516126       1.944975

Note

Since rates are not scaled, they are still shown in their original units. Pressure values, on the other hand, are scaled between -1 and 1 only for cells. The bottom-hole flowing pressure (pwf) values are transformed by the same scaler but fall outside these limits. The reason is that pwf is normally not used to fit the scaler, since these values are not part of the PDE solution, whether analytical or machine learning.
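The out-of-range pwf values can be reproduced by hand. The sketch below assumes that the pressure scaler is fitted on the minimum and maximum of the cell pressures (P1 to P3) over all steps, which is consistent with the numbers in the tables above. The function scale_pressure is a hypothetical helper, not a reservoirflow call.

```python
import numpy as np

# Extremes of the cell pressures (P1..P3) over all steps, read from the unscaled table.
p_min, p_max = 3153.154729, 7180.729618


def scale_pressure(p):
    """Min-max scaling to (-1, 1) with a range fitted on cell pressures only."""
    p = np.asarray(p, dtype=float)
    return (p - p_min) / (p_max - p_min) * 2.0 - 1.0


cells_scaled = scale_pressure([p_min, 6000.0, p_max])  # cell pressures stay within [-1, 1]
pwf_scaled = scale_pressure([100.0, 8257.049147])      # pwf values fall outside [-1, 1]
print(cells_scaled)
print(pwf_scaled)
```

The pwf of 100 psia at the production well lies far below the lowest cell pressure, so it maps to about -2.52, matching the Pwf1 column.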

Custom Scalers#

You can define your own scaling using set_scalers(), which accepts a dictionary. For more information, check the documentation.

The example below sets the time scaling to between -10 and 10 and disables scaling for rate:

model.set_scalers({"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]})
model.scalers_dict
{'time': ['MinMax', (-10, 10)],
 'space': ['MinMax', (-1, 1)],
 'pressure': ['MinMax', (-1, 1)],
 'rate': [None, None]}

Note

Since no range was defined for MinMax in the case of rate, scaling for this dimension was set to None. Note also that the other dimensions retained their default settings. Therefore, the same function can be used to update a single scaler.
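The update behavior described in this note can be mimicked with a small dictionary merge. The sketch below is purely illustrative: merge_scalers, DEFAULTS, and KNOWN_SCALERS are hypothetical names, and the rules (case-insensitive scaler names, an incomplete entry disabling scaling) are assumptions inferred from the output shown above, not the reservoirflow source.

```python
# Hypothetical defaults mirroring the model.scalers_dict output shown earlier.
DEFAULTS = {
    "time": ["MinMax", (0, 1)],
    "space": ["MinMax", (-1, 1)],
    "pressure": ["MinMax", (-1, 1)],
    "rate": [None, None],
}
KNOWN_SCALERS = {"minmax": "MinMax"}  # case-insensitive lookup table (assumption)


def merge_scalers(current, updates):
    """Return a copy of current with updates applied (illustrative only)."""
    merged = {key: list(value) for key, value in current.items()}
    for key, (name, output_range) in updates.items():
        if name is None or output_range is None:
            # An incomplete definition disables scaling for this dimension.
            merged[key] = [None, None]
        else:
            merged[key] = [KNOWN_SCALERS[name.lower()], tuple(output_range)]
    return merged


result = merge_scalers(
    DEFAULTS, {"time": ["minmax", (-10, 10)], "rate": ["MinMax", None]}
)
print(result)
```

Only the keys present in the update are touched, so dimensions that are not mentioned keep their current settings.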

With the custom scalers set, using scale=True now gives:

model.get_df(
    columns=["time", "cells", "wells"],
    scale=True,
    units=True,
)
Step  Time [scaled]  Q1 [stb/day]  Q3 [stb/day]  P1 [scaled]  P2 [scaled]  P3 [scaled]  Qw1 [stb/day]  Qw3 [stb/day]  Pwf1 [scaled]  Pwf3 [scaled]
0             -10.0      0.000000           0.0     0.413677     0.413677     0.413677       0.000000            0.0       0.413677       0.413677
1              -8.0   -210.261660         100.0     0.027146     0.379564     0.587875    -210.261660          100.0      -2.516126       1.534475
2              -6.0   -187.135552         100.0    -0.252582     0.332658     0.719752    -187.135552          100.0      -2.516126       1.666351
3              -4.0   -170.190252         100.0    -0.457548     0.283514     0.816978    -170.190252          100.0      -2.516126       1.763578
4              -2.0   -157.628510         100.0    -0.609491     0.236949     0.886682    -157.628510          100.0      -2.516126       1.833281
5               0.0   -148.213227         100.0    -0.723376     0.194812     0.934958    -148.213227          100.0      -2.516126       1.881558
6               2.0   -141.073776         100.0    -0.809733     0.157460     0.966863    -141.073776          100.0      -2.516126       1.913463
7               4.0   -135.598989         100.0    -0.875955     0.124604     0.986372    -135.598989          100.0      -2.516126       1.932972
8               6.0   -131.349563         100.0    -0.927355     0.095694     0.996614    -131.349563          100.0      -2.516126       1.943214
9               8.0   -128.008139         100.0    -0.967772     0.070123     1.000000    -128.008139          100.0      -2.516126       1.946600
10             10.0   -125.343714         100.0    -1.000000     0.047330     0.998375    -125.343714          100.0      -2.516126       1.944975

Tip

Each time you use model.get_df(), scalers are updated automatically to adapt to any new simulation run steps added to the model. To update these scalers manually, use the model.update_scalers() function.
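Why the scalers need updating can be seen from a small NumPy sketch: a min-max range depends on the data it was fitted on, so extending the run changes how earlier values are scaled. The helper to_unit_range is hypothetical, and the 20-step run is an assumed extension of the example above.

```python
import numpy as np


def to_unit_range(values):
    """Min-max scaling to (0, 1) using the range of the given values."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


time_10_steps = np.arange(0, 51, 5)    # 0, 5, ..., 50 days
time_20_steps = np.arange(0, 101, 5)   # the run is extended to 100 days

# Day 50 is the maximum of the first run, but only the midpoint of the longer one.
day_50_short = to_unit_range(time_10_steps)[-1]
day_50_long = to_unit_range(time_20_steps)[10]
print(day_50_short, day_50_long)
```

Without refitting, values from the added steps would fall outside the intended limits, in the same way the pwf values do.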
