Modelling and Propagation of Legacy Petrophysical Data for Mining Exploration (1/3)

Exploratory Data Analysis and Data Integration

Barcelona 25/09/24 GEO3BCN Manuel David Soto, Juan Alcalde, Adrià Hernàndez-Pineda, Ramón Carbonel

import os
import dotenv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# PEL - Filtering
from scipy.signal import butter, filtfilt

# Custom Libraries (_aux)
from _aux import basic_stat
from _aux import geo

# sphinx_gallery_thumbnail_number = 19

Introduction

The dispersion and scarcity of petrophysical data are well-known challenges in the mining sector. These issues are primarily driven by economic factors, but also by geological (such as sedimentary cover, weathering, erosion, or the depth of targets), geotechnical (e.g., slope or borehole stability), and technical limitations that have been addressed in other industries.

To tackle the challenges associated with sparse and incomplete petrophysical data, we developed three Jupyter Notebooks that make use of open-source Python tools. These tools assist researchers in the exploration, visualization, and integration of data from diverse sources, ultimately enabling the complete modeling (including ML-based models) of missing properties. The tools were applied to both recently acquired and legacy data from two holes in Collinstown, Ireland.

This first notebook primarily focuses on the Exploratory Data Analysis (EDA) of a new petrophysical dataset resulting from measurements on two keyholes:

  • TC 1319 008

  • TC 3660 005

Three other datasets were integrated into the petrophysical dataset:

  • New stratigraphic logging of the same keyholes, providing information on formations, photos, and observations.

  • Legacy data including whole rock geochemistry, XRF, magnetic susceptibility, and GR log.

  • Vp data from the Passive Seismic Survey.

Variables

The new petrophysical data were collected using various devices and methodologies described in the “New petrophysical data collected at site 2” document, and saved in collinstown_petrophys.csv. This file contains 17 columns and 329 rows. The following variables (3 objects/text, 8 numeric) are primarily for reference or location:

Name

New Name

Explanation

Unit

HoleID

Hole

Hole identification

Easting

X

Easting coordinate

m

Northing

Y

Northing coordinate

m

RL

RL

Azimuth

Azi

Hole azimuth

degree

Dip

Dip

Hole deviation

degree

SampleNo

Sample

Sample number

From

From

Top of the sample

m

To

To

Base of the sample

m

Length

Len

Length

cm

Observations

Obs

Observations on the sample

The following numerical variables are potential candidates for modeling:

Name

New Name

Explanation

Unit

From

From

Top of the sample

m

Density

Den

Density

g/cm³

Vp

Vp

Compressional velocity

m/s

Vs

Vs

Shear velocity

m/s

Xm

Mag

Magnetic susceptibility

Mx

Ip

Chargeability

mv/V

R

Res

Resistivity

ohm·m

Formation Major

Form

Major formations along hole

Libraries

The following are the Python libraries used throughout this notebook:

  • PSL: Python Standard Libraries

  • _aux: User Defined Libraries

  • PEL: Python External Libraries

Data Loading

Ten ‘blank’ entries in the original file were deleted and saved in a new file, collinstown_petrophys_no_blanc.csv. In this way, cells with ‘blank’ are now set to NaN in their corresponding positions.

# Load data
dotenv.load_dotenv()
base_path = os.getenv("PATH_TO_COLLINSTOWN_PETRO")
blanc_xlsx = f'{base_path}/collinstown_petrophys_no_blanc.xlsx'
df = pd.read_excel(blanc_xlsx)
df.head()
Python-dotenv could not parse statement starting at line 16
HoleID Easting Northing RL Azimuth Dip SampleNo From(m) To(m) lenght(cm) Density(g/cm3) Vp(m/s) Vs(m/s) Xm(e-3) Mx(mv/V) R(ohm·m) Observations Formation
0 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18397 265.77 265.88 11.2 2.610 2179.0 NaN 0.108 NaN NaN Broken sample parallel to z-axis prevented to ... CALP
1 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18396 271.66 271.77 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 NaN CALP
2 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18395 277.05 277.16 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 NaN CALP
3 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18394 282.55 282.66 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


Columns in the file

df.columns
Index(['HoleID', 'Easting', 'Northing', 'RL', 'Azimuth', 'Dip', 'SampleNo',
       'From(m)', 'To(m)', 'lenght(cm)', 'Density(g/cm3)', 'Vp(m/s)',
       'Vs(m/s)', 'Xm(e-3)', 'Mx(mv/V)', 'R(ohm·m)', 'Observations',
       'Formation'],
      dtype='object')

Rename columns for better readability

df = df.rename(
    columns={
            'HoleID'        : 'Hole',
            'Easting'       : 'X',
            'Northing'      : 'Y',
            'Azimuth'       : 'Azi',
            'SampleNo'      : 'Sample',
            'From(m)'       : 'From',
            'To(m)'         : 'To',
            'lenght(cm)'    : 'Len',
            'Density(g/cm3)': 'Den',
            'Vp(m/s)'       : 'Vp',
            'Vs(m/s)'       : 'Vs',
            'Xm(e-3)'       : 'Mag',
            'Mx(mv/V)'      : 'Ip',
            'R(ohm·m)'     : 'Res',
            'Observations'  : 'Obs',
            'Formation'     : 'Form'
    }
)
df.head()
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
0 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18397 265.77 265.88 11.2 2.610 2179.0 NaN 0.108 NaN NaN Broken sample parallel to z-axis prevented to ... CALP
1 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18396 271.66 271.77 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 NaN CALP
2 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18395 277.05 277.16 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 NaN CALP
3 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18394 282.55 282.66 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


Columns in the file

df.columns
Index(['Hole', 'X', 'Y', 'RL', 'Azi', 'Dip', 'Sample', 'From', 'To', 'Len',
       'Den', 'Vp', 'Vs', 'Mag', 'Ip', 'Res', 'Obs', 'Form'],
      dtype='object')
df.describe()
X Y RL Azi Dip From To Len Den Vp Vs Mag Ip Res
count 329.000000 329.000000 329.000000 329.000000 329.0 329.000000 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 258529.740426 271636.412766 83.480851 33.574468 -85.0 543.279696 543.095030 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 5700.981826 228.846318 13.217114 3.504538 0.0 262.864370 263.430806 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 252715.000000 271403.000000 70.000000 30.000000 -85.0 9.600000 9.700000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 252715.000000 271403.000000 70.000000 30.000000 -85.0 355.580000 355.680000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 264102.200000 271860.100000 96.400000 37.000000 -85.0 547.800000 547.950000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 264102.200000 271860.100000 96.400000 37.000000 -85.0 750.000000 750.100000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 264102.200000 271860.100000 96.400000 37.000000 -85.0 1049.200000 1049.300000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


The keyholes are

df.Hole.unique()
array(['TC-1319-005', 'TC-3660-008'], dtype=object)

Data General Description

Among the variables in the dataset we have 4 texts (object) and 14 numeric variables (int64 and float64). The general behavior of the numerical variables can be seen in the following tables and graphs. Stand out the important anomaly in Mag, as well as the numerous missing values (NaN) in Vp, Vs, Ip, and Res:

Type of data in the dataset

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 329 entries, 0 to 328
Data columns (total 18 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Hole    329 non-null    object
 1   X       329 non-null    float64
 2   Y       329 non-null    float64
 3   RL      329 non-null    float64
 4   Azi     329 non-null    int64
 5   Dip     329 non-null    int64
 6   Sample  329 non-null    object
 7   From    329 non-null    float64
 8   To      329 non-null    float64
 9   Len     321 non-null    float64
 10  Den     327 non-null    float64
 11  Vp      310 non-null    float64
 12  Vs      289 non-null    float64
 13  Mag     326 non-null    float64
 14  Ip      303 non-null    float64
 15  Res     303 non-null    float64
 16  Obs     73 non-null     object
 17  Form    329 non-null    object
dtypes: float64(12), int64(2), object(4)
memory usage: 46.4+ KB

NaN in Len

print('NaNs in Len:', 329 - 321)
NaNs in Len: 8

Main statistical parameters of the variables

df.describe()
X Y RL Azi Dip From To Len Den Vp Vs Mag Ip Res
count 329.000000 329.000000 329.000000 329.000000 329.0 329.000000 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 258529.740426 271636.412766 83.480851 33.574468 -85.0 543.279696 543.095030 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 5700.981826 228.846318 13.217114 3.504538 0.0 262.864370 263.430806 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 252715.000000 271403.000000 70.000000 30.000000 -85.0 9.600000 9.700000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 252715.000000 271403.000000 70.000000 30.000000 -85.0 355.580000 355.680000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 264102.200000 271860.100000 96.400000 37.000000 -85.0 547.800000 547.950000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 264102.200000 271860.100000 96.400000 37.000000 -85.0 750.000000 750.100000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 264102.200000 271860.100000 96.400000 37.000000 -85.0 1049.200000 1049.300000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


All numerical variables in line plots, the x axis is just the position in the file.

df.plot(figsize=(17, 14), subplots=True)
plt.tight_layout()
# plt.savefig('Output/lines.png')
plt.show()
eda integration delivery 1

Null Values (NaN) in the Variables

As can be seen in the previous cells and plots, most of the NaN are in Vp, Vs, Ip, and Re. Obs, a text variable, has also a lot (77.8%) of NaNs.

% of NaN in all variables

print('% of NaN in all variables:')
df.isna().sum()*100/len(df)
% of NaN in all variables:

Hole       0.000000
X          0.000000
Y          0.000000
RL         0.000000
Azi        0.000000
Dip        0.000000
Sample     0.000000
From       0.000000
To         0.000000
Len        2.431611
Den        0.607903
Vp         5.775076
Vs        12.158055
Mag        0.911854
Ip         7.902736
Res        7.902736
Obs       77.811550
Form       0.000000
dtype: float64

Plot of the number of NaN in the numerical variables

df.select_dtypes(include=['number']).isna().sum().plot.bar(figsize=(7, 4), ylabel='Count', edgecolor='k', color='skyblue', title='All Variables NaN', zorder=2)
plt.grid(zorder=1)
plt.show()
All Variables NaN

Reference Variables

Not all numerical variables are suitable for modeling. In addition to the text variables (Hole, Sample, Obs, Form), X, Y, RL, Azi, Dip, From, To, and Len are reference or location variables. Among these, Len represents the length of the cores used for measurements, with an average length of 10.2 cm.

# Plot core length histogram
df.Len.plot.hist(figsize=(6, 5), subplots=True, bins=40, edgecolor='k', color='skyblue', zorder=2)
plt.grid(zorder=1)
plt.xlabel('Core Length (cm)')
plt.show()
eda integration delivery 1

Core length influence on density

plt.figure(figsize=(11, 8))

plt.subplot(221)
plt.scatter(df.Len[df.Hole == 'TC-1319-005'], df.Den[df.Hole == 'TC-1319-005'], edgecolor='k', color='skyblue', zorder=2)
plt.grid(zorder=1)
plt.xlabel('Core Length (cm)')
plt.ylabel('Density (g/cm³)')
plt.axis([8, 16, 2, 5.5])
plt.title('Hole TC-1319-005')

plt.subplot(222)
plt.scatter(df.Len[df.Hole == 'TC-3660-008'], df.Den[df.Hole == 'TC-3660-008'], edgecolor='k', color='skyblue', zorder=2)
plt.grid(zorder=1)
plt.xlabel('Core Length (cm)')
plt.ylabel('Density (g/cm³)')
plt.title('Hole TC-3660-008')
plt.axis([8, 16, 2, 5.5])

plt.subplot(223)
plt.scatter(df.Len[df.Hole == 'TC-1319-005'], df.From[df.Hole == 'TC-1319-005'], edgecolor='k', color='skyblue', zorder=2)
plt.grid(zorder=1)
plt.xlabel('Core Length (cm)')
plt.ylabel('Depth (m)')
plt.axis([8, 16, 1100, 0])
plt.title('Hole TC-1319-005')

plt.subplot(224)
plt.scatter(df.Len[df.Hole == 'TC-3660-008'], df.From[df.Hole == 'TC-3660-008'], edgecolor='k', color='skyblue', zorder=2)
plt.grid(zorder=1)
plt.xlabel('Core Length (cm)')
plt.ylabel('Depth (m)')
plt.title('Hole TC-3660-008')
plt.axis([8, 16, 1100, 0])

plt.tight_layout()
plt.show()
Hole TC-1319-005, Hole TC-3660-008, Hole TC-1319-005, Hole TC-3660-008

Modelable Variables or Features

In the following section, we will review, through the main statistical parameters and plots, each of the features or variables suitable for modeling. The variable From, the sample’s top depth, could have a double meaning, as a reference and feature. Len is not for modeling, it is important because allows the assessment of the quality of the density.

features = df[['Hole', 'From', 'Len', 'Den', 'Vp', 'Vs', 'Mag', 'Ip', 'Res', 'Form']]
features
Hole From Len Den Vp Vs Mag Ip Res Form
0 TC-1319-005 265.77 11.2 2.610 2179.0 NaN 0.108 NaN NaN CALP
1 TC-1319-005 271.66 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 CALP
2 TC-1319-005 277.05 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 CALP
3 TC-1319-005 282.55 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 CALP
4 TC-1319-005 287.20 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 CALP
... ... ... ... ... ... ... ... ... ... ...
324 TC-3660-008 814.55 9.6 2.763 5680.0 3179.0 0.158 70.94 481.1 LIS
325 TC-3660-008 819.92 NaN 2.649 NaN NaN 0.024 NaN NaN LIS
326 TC-3660-008 825.60 9.9 2.762 5593.0 1787.0 0.337 78.44 334.6 PAL
327 TC-3660-008 828.80 9.9 2.804 5858.0 1964.0 0.311 73.73 466.5 PAL
328 TC-3660-008 833.40 10.1 2.735 5805.0 3217.0 0.308 33.86 2300.0 PAL

329 rows × 10 columns



features.describe()
From Len Den Vp Vs Mag Ip Res
count 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 543.279696 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 262.864370 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 9.600000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 355.580000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 547.800000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 750.000000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 1049.200000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


Boxplot of each feature

features.plot.box(subplots=True, grid=False, figsize=(12, 7), layout=(3, 4), flierprops={"marker": "."})
plt.tight_layout()
# plt.savefig('Output/histos.png')
plt.show()
eda integration delivery 1
features.to_csv('Output/features.csv')

Density (Den)

The distribution of the density variable has a wide tail to the right (positive skew), with anomalous values beyond +3 standard deviations. There is no mention, in the observations of the petrophysical data or in the stratigraphic logging of the hole, about the reason for these anomalous values, however, as it was seen previouly, the influence of long length (> 11.2 cm) in these high densities is clear.

Density statistical parameters by _aux

basic_stat.parameters(features.Den, 'Den')
---------------------------------------

Main statistical parameters of Den

NaN: 2
NaN(%): 0.608

Lenght: 329
Min: 2.42
Max: 5.074
Range: 2.654

Mean: 2.762
Median: 2.706
Mode: ModeResult(mode=np.float64(2.703), count=np.int64(9))

Mean < median: right tail
Skew: 4.618
Fisher's kurtosis: 24.334
Pearson's kurtosis: 27.334
Variance: 0.086
Standart deviation: 0.294
Mean - 3std: 1.880
Mean + 3std: 3.643

Density plots by _aux

basic_stat.plots(features.Den, 'Den')
Den Statistical Plots

8 observations with anomalous values in the density

features[features.Den > 4].sort_values(by='From')
Hole From Len Den Vp Vs Mag Ip Res Form
1 TC-1319-005 271.66 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 CALP
3 TC-1319-005 282.55 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 CALP
6 TC-1319-005 301.50 11.6 4.251 5686.0 2755.0 0.132 14.77 6000.0 CALP
7 TC-1319-005 313.65 11.4 4.193 5588.0 2039.0 0.068 20.37 5400.0 CALP
11 TC-1319-005 334.65 11.3 4.079 5678.0 2198.0 0.036 34.51 2000.0 CALP
12 TC-1319-005 340.00 11.9 4.444 5561.0 2581.0 0.024 14.92 4800.0 CALP
13 TC-1319-005 344.77 11.2 4.123 5359.0 NaN 0.041 45.93 1000.0 CALP
24 TC-1319-005 400.10 13.9 5.074 5367.0 2422.0 0.066 NaN NaN CALP


Observations related to density

df[df.Den > 4].Obs
1                                             NaN
3                                             NaN
6                                             NaN
7                                             NaN
11                                            NaN
12                                            NaN
13                              Vs signal to low.
24    Sample too long to acquire IP measurements.
Name: Obs, dtype: object
den_anomalous_index = list(features[features.Den > 4].index)

print('Hole             Vp:        Observation:')
for n in den_anomalous_index:
    print(df.Hole[n], '    ', df.Den[n], '    ', df.Obs[n])
Hole             Vp:        Observation:
TC-1319-005      4.318      nan
TC-1319-005      4.342      nan
TC-1319-005      4.251      nan
TC-1319-005      4.193      nan
TC-1319-005      4.079      nan
TC-1319-005      4.444      nan
TC-1319-005      4.123      Vs signal to low.
TC-1319-005      5.074      Sample too long to acquire IP measurements.

Anomalies in Hole TC-1319-005

Apart from the fact that the anomalies are only in the TC-1319-005, there is almost no information related to the density anomalies. The stratigraphic logging mentioned abundant pyrite in hole TC-1319-005, around 1046 m, but this is far from the location of the anomalies in the petrophysical data (271.7 - 400.1 m).

Compressional Velocity (Vp)

The distribution of the compressional velocity has a wide tail to the left (negative skew), with anomalous values below -3 standard deviations. The observations on these anomalous samples point toward open fractures on the core as responsible for the low Vp values.

Below are two reference values of Vp, at 20ºC (source: https://www.engineeringtoolbox.com/sound-speed-water-d_598.html):

Medium

Vp (m/s)

Fresh Water

1480

Air

343

Vp statistical parameters

basic_stat.parameters(features.Vp, 'Vp')
---------------------------------------

Main statistical parameters of Vp

NaN: 19
NaN(%): 5.775

Lenght: 329
Min: 992.0
Max: 6478.0
Range: 5486.0

Mean: 5441.990
Median: 5575.0
Mode: ModeResult(mode=np.float64(5575.0), count=np.int64(8))

Mean < median: left tail
Skew: -2.427
Fisher's kurtosis: 9.138
Pearson's kurtosis: 12.138
Variance: 491202.042
Standart deviation: 700.858
Mean - 3std: 3339.416
Mean + 3std: 7544.565

Vp plots

basic_stat.plots(features.Vp, 'Vp')
Vp Statistical Plots

5 obsevations with anomalous values in the Vp

features[features.Vp < 3000]
Hole From Len Den Vp Vs Mag Ip Res Form
0 TC-1319-005 265.77 11.2 2.610 2179.0 NaN 0.108 NaN NaN CALP
77 TC-1319-005 645.30 11.0 2.725 992.0 NaN 0.308 98.11 293.3 LABL
79 TC-1319-005 653.95 12.0 2.936 2321.0 1929.0 0.342 88.45 443.4 LABL
159 TC-1319-005 1046.35 10.6 2.455 2325.0 NaN 0.158 44.44 1300.0 LIS
174 TC-3660-008 64.54 9.4 2.677 2802.0 0.0 0.038 35.19 2500.0 CALP


Observations related to Vp

vp_anomalous_index = list(features[features.Vp < 3000].index)

print('Hole             Vp:        Observation:')
for n in vp_anomalous_index:
    print(df.Hole[n], '    ', df.Vp[n], '    ', df.Obs[n])
Hole             Vp:        Observation:
TC-1319-005      2179.0      Broken sample parallel to z-axis prevented to take IP and Vs measurements.
TC-1319-005      992.0      Broken sample parallel to z-axis prevented to take Vp and Vs measurements.
TC-1319-005      2321.0      nan
TC-1319-005      2325.0      The sample had an unflat base. Vs signal to low.
TC-3660-008      2802.0      Sample had a broken disk. Full length 10,4cm

The mayority of Vp anomalies are concentrate in the hole TC-1319-005 and they appear to be related with the geometry and fractures in the core sample. The stratigraphic logging has not direct mentions of the low Vp.

Shear Velocity (Vs)

The distribution of the shear velocity has an irregular tail to the left (negative skew), with anomalous values below -3 standard deviations. As well as in the case of Vp, the observations on these anomalous values point toward open fractures on the core as responsible for the low Vs. Values of zero are not admissible in the case of solid samples, especially considering the densities in these anomalies (mean of 2.72). To improve subsequent models, these anomalous values of Vs should be replaced by NaN and then imputed (replaced by a logical value).

Below are two reference values of Vs (source: Reynolds, J (2011) An Introduction to Applied and Environmental Geophysics):

Medium

Vs (m/s)

Unconsolidated sands

65-105

Plastic clays

80-130

Vs statistical parameters

basic_stat.parameters(features.Vs, 'Vs')
---------------------------------------

Main statistical parameters of Vs

NaN: 40
NaN(%): 12.158

Lenght: 329
Min: 0.0
Max: 3740.0
Range: nan

Mean: 2772.938
Median: 3041.0
Mode: ModeResult(mode=np.float64(3333.0), count=np.int64(6))

Mean < median: left tail
Skew: -1.540
Fisher's kurtosis: 3.346
Pearson's kurtosis: 6.346
Variance: 426991.843
Standart deviation: 653.446
Mean - 3std: 812.599
Mean + 3std: 4733.276

Vp statistical parameterss plots

basic_stat.plots(features.Vs, 'Vs')
Vs Statistical Plots

5 obsevations with anomalous values of Vs

features[features.Vs < 1000].sort_values(by='From')
Hole From Len Den Vp Vs Mag Ip Res Form
174 TC-3660-008 64.54 9.4 2.677 2802.0 0.0 0.038 35.19 2500.0 CALP
198 TC-3660-008 188.03 9.4 2.679 5345.0 0.0 0.098 NaN NaN CALP
243 TC-3660-008 432.95 9.7 2.779 4641.0 178.0 NaN 69.69 515.7 FLT
248 TC-3660-008 452.30 10.4 2.686 5652.0 0.0 0.053 27.54 4000.0 FLT
312 TC-3660-008 774.21 9.6 2.787 6173.0 0.0 0.123 73.08 373.2 SP


Average density of the anomalous values of Vs

features[features.Vs < 1000].Den.mean()
np.float64(2.7216)

Observations related to Vs

vs_anomalous_index = list(features[features.Vs < 1000].index)

print('Hole             Vs:        Observation:')
for n in vs_anomalous_index:
    print(df.Hole[n], '    ', df.Vs[n], '    ', df.Obs[n])
Hole             Vs:        Observation:
TC-3660-008      0.0      Sample had a broken disk. Full length 10,4cm
TC-3660-008      0.0      Sample had unperpendicular bases respect from the z-axis that prevented to take IP measurements.
TC-3660-008      178.0      Chipped base
TC-3660-008      0.0      Chipped base
TC-3660-008      0.0      nan

Observations on Anomalies

Cllst_logging_details.xls has no mentions related to the zero Vs values. Other observations are:

Index

Hole

Reference Depth (m)

Explanation

175

TC-3660-008

64

198 & 311 | TC-3660-008 | 188 & 774

Diagenetic pyrite & pyrite in mudstones

210 & 248 | TC-3660-008 | 432 & 452

Fault zone

Once again, the majority of Vs anomalies are concentrated in the hole TC-1319-005 and they also appear to be related to fractures or the geometry of the core sample. The stratigraphic logging has no direct mentions of the low Vs.

Magnetic Susceptibility (Mag)

The magnetic susceptibility has an extremely anomalous value (93.89), which is 218 times higher than the average of the rest of the values (0.43). This outlier significantly skews the statistical parameters and visualizations, as it lies beyond 10 standard deviations above the mean. Removing this value reveals that the distribution has a tail to the right (positive skew), with some values exceeding 3 standard deviations.

Regarding the anomalous value, it exceeds the maximum value of 2 delivered by the KT-20 detector device, with a 10 kHz single-frequency circular sensor (https://terraplus.ca/wp-content/uploads/terraplus-Brochures-English/KT-20-Physical-Property-Measuring-System.pdf). Unfortunately, there are no specific observations that explain the reason for this extreme value.

Below are reference values of magnetic susceptibility for different rocks (source: https://www.eoas.ubc.ca/courses/eosc350/content/foundations/properties/magsuscept.htm#:~:text=In%20rocks%2C%20susceptibility%20is%20mainly,trace%20amounts%20in%20most%20sediments.):

Rock

Magnetic Susceptibility x 10^-3 (SI)

Limestones

0 - 3

Sandstones

0 - 20

Shales

0.01 - 15

Gneiss

0.1 - 25

Granite

0 - 50

Basalt

0.2 - 175

Magnetite

1200 - 19200

Mag statistical parameters

basic_stat.parameters(features.Mag, 'Mag')
---------------------------------------

Main statistical parameters of Mag

NaN: 3
NaN(%): 0.912

Lenght: 329
Min: 0.001
Max: 93.886
Range: 93.88499999999999

Mean: 0.431
Median: 0.1145
Mode: ModeResult(mode=np.float64(0.063), count=np.int64(7))

Mean < median: right tail
Skew: 17.962
Fisher's kurtosis: 320.760
Pearson's kurtosis: 323.760
Variance: 26.966
Standart deviation: 5.193
Mean - 3std: -15.148
Mean + 3std: 16.009

Mag plots

basic_stat.plots(features.Mag, 'Mag')
Mag Statistical Plots

The maximum is 200 times bigger than the mean value

print('Index of the maximun:', features.Mag.idxmax())
print('Maximun value:', features.Mag[305])
Index of the maximun: 305
Maximun value: 93.886
features.loc[305]
Hole    TC-3660-008
From          734.8
Len             9.5
Den           2.708
Vp           5026.0
Vs           2914.0
Mag          93.886
Ip            71.63
Res          1200.0
Form             SP
Name: 305, dtype: object

Mag < 10 statistical parameters

basic_stat.parameters(features.Mag[features.Mag < 10], 'Mag')
---------------------------------------

Main statistical parameters of Mag

NaN: 0
NaN(%): 0.000

Lenght: 325
Min: 0.001
Max: 0.545
Range: 0.544

Mean: 0.143
Median: 0.114
Mode: ModeResult(mode=np.float64(0.063), count=np.int64(7))

Mean < median: right tail
Skew: 1.050
Fisher's kurtosis: 0.638
Pearson's kurtosis: 3.638
Variance: 0.010
Standart deviation: 0.101
Mean - 3std: -0.159
Mean + 3std: 0.445

Mag < 10 plots

basic_stat.plots(features.Mag[features.Mag < 10], 'Mag')
Mag Statistical Plots

Mag anomalous value

df[df.Mag > 10]
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
305 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC19095 734.8 734.9 9.5 2.708 5026.0 2914.0 93.886 71.63 1200.0 NaN SP


The stratigraphic column mentions “pyrite in mudstones” in the interval corresponding to this sample but nothing related to tha Mag.

Induced Polarization (Ip)

Induced polarization or chargeability is the “capacity of a material to retain charges after a forcing current is removed.” It “depends upon many factors, including mineral type, grain size, the ratio of internal surface area to volume, the properties of electrolytes in pore space, and the physics of interaction between surfaces and fluids. Interpretation of chargeability models is further complicated by the fact that there is no standard set of units for this physical property.”(source: https://gpg.geosci.xyz/content/induced_polarization/induced_polarization_introduction.html). Ip is similar to conductivity (the inverse of resistivity) but not exactly the same, while the first is related with the capacitance of the material (retain electrical charges localized), the second is the ability of a material to allow the flow of electricity.

In our case, the measures of Ip have a tail to the right (positive skew), with only a value (171.73) above 3 standard deviations. There not observations associated with this value.

Ip statistical parameters

basic_stat.parameters(features.Ip, 'Ip')
---------------------------------------

Main statistical parameters of Ip

NaN: 26
NaN(%): 7.903

Lenght: 329
Min: 5.39
Max: 171.73
Range: nan

Mean: 49.739
Median: 47.04
Mode: ModeResult(mode=np.float64(7.27), count=np.int64(2))

Mean < median: right tail
Skew: 0.648
Fisher's kurtosis: 0.964
Pearson's kurtosis: 3.964
Variance: 632.281
Standart deviation: 25.145
Mean - 3std: -25.697
Mean + 3std: 125.174

Ip plots

basic_stat.plots(features.Ip, 'Ip')
Ip Statistical Plots

Ip anomalous value

df[df.Ip > 150]
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
256 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC19046 490.7 490.8 10.5 2.723 5412.0 3292.0 0.066 171.73 808.6 NaN S


No information about this anomaly in the petrophysical or in the stratigraphic data.

Resistivity (Res)

Although resistivity is a specific measure of pure materials, this property, like Ip, depends in the case of rocks on the constituent minerals and the fluids in their pore space. The measurements in our case show a typical pattern with tail to the right (positive skew) and with several values above 3 standard deviation. There are no observations associated with these high values.

Res statistical parameters

basic_stat.parameters(df.Res, 'Res')
---------------------------------------

Main statistical parameters of Res

NaN: 26
NaN(%): 7.903

Lenght: 329
Min: 156.9
Max: 23900.0
Range: nan

Mean: 3283.762
Median: 2100.0
Mode: ModeResult(mode=np.float64(1500.0), count=np.int64(12))

Mean < median: right tail
Skew: 2.729
Fisher's kurtosis: 8.783
Pearson's kurtosis: 11.783
Variance: 13800118.663
Standart deviation: 3714.851
Mean - 3std: -7860.791
Mean + 3std: 14428.316

Res plots

basic_stat.plots(df.Res, 'Res')
Res Statistical Plots
df[df.Res > 14000].sort_values(by='From')
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
161 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18951 9.60 9.70 9.9 2.690 6226.0 3426.0 0.545 12.19 18200.0 NaN CALP
162 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18952 12.86 12.95 9.5 2.702 6376.0 3467.0 0.435 11.26 23000.0 NaN CALP
173 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18963 59.80 59.90 9.2 2.705 5974.0 3321.0 0.020 7.27 20900.0 NaN CALP
176 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18966 75.57 75.67 10.4 2.708 6341.0 3478.0 0.051 9.42 23900.0 NaN CALP
179 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18969 88.90 89.00 8.8 2.696 6331.0 3712.0 0.116 7.50 16600.0 NaN CALP
190 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18980 146.20 146.30 9.5 2.680 6376.0 3156.0 0.108 8.08 15900.0 NaN CALP
199 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18989 193.40 193.50 9.9 2.707 6429.0 3333.0 0.154 5.39 14800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


Stratigraphic Data Observations ———————-_——– Only the stratigraphic data has observations of pyrite in hole TC-1319-005, but at shallower depths, around 146 & 193 m.

Features Correlation

To appreciate the relation between the features, the anomaly of the Mag was excluded. No other elimination or imputation (fill with numbers the NaNs) was performed, therefore the quality of the correlations may change after the imputation.

# Pair Plot of All Features Against Each Other
mat = features[features.Mag < 10].select_dtypes(include=['number'])
sns.pairplot(mat)
plt.show()
eda integration delivery 1

Correlation Matrix

correl_mat = mat.corr()
correl_mat
From Len Den Vp Vs Mag Ip Res
From 1.000000 -0.058138 -0.210686 -0.288455 -0.200839 0.264689 0.424983 -0.531734
Len -0.058138 1.000000 0.462965 -0.116166 -0.204523 -0.093206 -0.012119 -0.072091
Den -0.210686 0.462965 1.000000 0.046488 -0.153618 -0.102067 -0.157620 0.089023
Vp -0.288455 -0.116166 0.046488 1.000000 0.349661 -0.032834 -0.269868 0.393633
Vs -0.200839 -0.204523 -0.153618 0.349661 1.000000 0.003852 -0.088828 0.267800
Mag 0.264689 -0.093206 -0.102067 -0.032834 0.003852 1.000000 0.260463 -0.042086
Ip 0.424983 -0.012119 -0.157620 -0.269868 -0.088828 0.260463 1.000000 -0.656705
Res -0.531734 -0.072091 0.089023 0.393633 0.267800 -0.042086 -0.656705 1.000000


Correlation Heatmap

sns.heatmap(correl_mat, vmin=-1, vmax=1, center=0, linewidths=.1, cmap='seismic_r', annot=True)
plt.show()
eda integration delivery 1

Additional Data Insights

The main role of the additional datasets has been to help us understand the real nature of some anomalies observed in the petrophysical dataset, such as the density anomalies present in the upper half of the TC-1319-005 keyhole.

# Triple Plot by Major Formation
# Fix colors for the formations
import plotly.express as px
import plotly.io as pio

colors = px.colors.qualitative.Plotly

# Define color mapping for each formation
color_map = {form: colors[i % len(colors)] for i, form in enumerate(df['Form'].unique())}

# Set plot renderer and headless environment setup
pio.renderers.default = 'png'
pio.orca.config.use_xvfb = True

# 1x3 subplots
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('Vp vs Den', 'Vp vs Vs', 'Vp/Vs vs Den')
)

# Plotting by formations
rat = df['Vp'] / df['Vs']

for form in df['Form'].unique():
    df_form = df[df['Form'] == form]
    rat_form = rat[df['Form'] == form]

    trace1 = go.Scatter(
        x=df_form['Vp'], y=df_form['Den'], mode='markers', name=form,
        legendgroup=form, marker=dict(color=color_map[form]),
        hovertext=df_form['Hole'],
        hoverinfo='x+y+text'
    )

    trace2 = go.Scatter(
        x=df_form['Vp'], y=df_form['Vs'], mode='markers', name=form,
        legendgroup=form, showlegend=False, marker=dict(color=color_map[form]),
        hovertext=df_form['Hole'],
        hoverinfo='x+y+text'
    )

    trace3 = go.Scatter(
        x=rat_form, y=df_form['Den'], mode='markers', name=form,
        legendgroup=form, showlegend=False, marker=dict(color=color_map[form]),
        hovertext=df_form['Hole'],
        hoverinfo='x+y+text'
    )

    fig.add_trace(trace1, row=1, col=1)
    fig.add_trace(trace2, row=1, col=2)
    fig.add_trace(trace3, row=1, col=3)

# Adjusting title and figure size
fig.update_layout(
    height=800,  # Figure height
    width=1600,  # Figure width
    title_text="Plots by Major Formations",
    showlegend=True,
    legend=dict(x=1.02, y=1)  # Adjust legend position
)


Legacy Data Analysis

Legacy data played a crucial role in evaluating the density anomalies present in the keyhole TC-1319-005.

# Whole Rock Lithogeochemistry Data Load
whole = pd.read_csv(
    filepath_or_buffer=f'{base_path}/collinstown_wholerock_lithogeochemistry.csv',
    sep=';'
)

Renaming Columns and Filtering Data

whole = whole.rename(columns={'HOLEID'     : 'Hole', 'SAMPFROM': 'From', 'Zn_ppm_BEST': 'Zn',
                              'Pb_ppm_BEST': 'Pb', 'Ba_ppm_BEST': 'Ba', 'Fe_pct_BEST': 'Fe_pct',
                              'S_pct_BEST' : 'S_pct'})

# Convert % to ppm for Fe and S
whole['Fe'] = whole['Fe_pct'] * 10000
whole['S'] = whole['S_pct'] * 10000

# Drop unnecessary columns
whole = whole.drop(['Fe_pct', 'S_pct'], axis=1)

# Sort filtered data by "From"
whole5 = whole[whole.Hole == 'TC-1319-005'].sort_values('From')

Whole Rock Lithogeochemistry Plots

plt.figure(figsize=(15, 8))

# Plotting key elements
elem_list = ['Zn', 'Pb', 'Fe', 'Ba', 'S']

for element in elem_list:
    plt.subplot(1, 5, (elem_list.index(element) + 1))
    plt.plot(whole5[element], whole5.From)
    plt.axvline(x=np.mean(whole5[element]), ls='--', c='r', label='mean')
    plt.axhspan(272, 400, color='r', alpha=0.2, label='AIPD')
    plt.ylim(1100, -20)
    if elem_list.index(element) == 0:
        plt.ylabel('Depth (m)')
        plt.legend()
    plt.xlabel(f'{element} (ppm)')
    plt.grid()

plt.suptitle('Hole TC-1319-005 Legacy Whole Rock Lithogeochemistry\n AIPD: Anomalous Interval in the Petrophysical Data', y=1)
plt.tight_layout()
plt.show()
Hole TC-1319-005 Legacy Whole Rock Lithogeochemistry  AIPD: Anomalous Interval in the Petrophysical Data

Legacy XRF data load

xrf = pd.read_csv(f'{base_path}/collinstown_pXRF.csv')
xrf.head()
SAMPLEID HOLEID PROJECTCODE SAMPFROM SAMPTO SAMPLETYPE FormationMaj_dr FormationMin_dr LithologyMaj_dr LithologyMin_dr pXRF_Date pXRF_ElapsedTime1 pXRF_ElapsedTime2 pXRF_ElapsedTime3 pXRF_ElapsedTotalTim pXRF_InstrumentSN pXRF_LiveTime1 pXRF_LiveTime2 pXRF_LiveTime3 pXRF_LiveTotalTime pXRF_Sampler Reading X_Drillhole_dr Y_Drillhole_dr Z_Drillhole_dr Ag_pXRF_pct Al_pXRF_pct As_pXRF_pct Bi_pXRF_pct Ca_pXRF_pct Cd_pXRF_pct Co_pXRF_pct Cr_pXRF_pct Cu_pXRF_pct Fe_pXRF_pct Hg_pXRF_pct K_pXRF_pct LE_pXRF_pct Mg_pXRF_pct Mn_pXRF_pct Mo_pXRF_pct Nb_pXRF_pct Ni_pXRF_pct P_pXRF_pct Pb_pXRF_pct Rb_pXRF_pct S_pXRF_pct Sb_pXRF_pct Se_pXRF_pct Si_pXRF_pct Sn_pXRF_pct Sr_pXRF_pct Th_pXRF_pct Ti_pXRF_pct U_pXRF_pct V_pXRF_pct W_pXRF_pct Y_pXRF_pct Zn_pXRF_pct Zr_pXRF_pct PL_dr Sample_Dolomite1_Percent_dr SampleComments_Big Texture1_dr Texture2_dr Texture3_dr
0 CR-1_767mEOH_pXRF_0006 CR-1_767mEOH Collinstown 5.99 6.01 pXRF CALP BU P M 42849 39.01 29.41 NaN 68.42 512468 31.81 15.93 NaN 47.74 NaN NaN 263187.0 272286.0 106.0 0 0.28 0.0003 0.0 33.82 0.0 0 0.0000 0.0010 0.2254 0.0 0.0000 55.24 0.00 0.0105 0.0000 0.0000 0.0000 0.0 0.0007 0.0002 0.0383 0.0 0.0002 10.1076 0.0 0.0527 0.0000 0.2010 0.0 0.0169 0.000 0.0004 0.0029 0.0027 3660 NaN NaN aw bu bi
1 CR-1_767mEOH_pXRF_0009 CR-1_767mEOH Collinstown 8.99 9.01 pXRF CALP BU P M 42849 38.95 29.52 NaN 68.47 512468 31.50 17.93 NaN 49.43 NaN NaN 263187.0 272286.0 106.0 0 0.98 0.0005 0.0 23.83 0.0 0 0.0050 0.0016 1.5387 0.0 0.1115 52.61 1.66 0.0145 0.0000 0.0006 0.0028 0.0 0.0006 0.0019 4.3153 0.0 0.0002 14.7500 0.0 0.0410 0.0009 0.1183 0.0 0.0157 0.000 0.0019 0.0034 0.0025 3660 NaN NaN aw bu bi
2 CR-1_767mEOH_pXRF_0012 CR-1_767mEOH Collinstown 11.99 12.01 pXRF CALP BU P M 42849 39.08 29.27 NaN 68.35 512468 32.01 13.86 NaN 45.86 NaN NaN 263187.0 272286.0 106.0 0 0.00 0.0000 0.0 40.88 0.0 0 0.0051 0.0010 0.0458 0.0 0.0000 55.92 0.00 0.0093 0.0000 0.0000 0.0000 0.0 0.0006 0.0000 0.8947 0.0 0.0003 2.1103 0.0 0.0553 0.0011 0.0624 0.0 0.0117 0.001 0.0004 0.0009 0.0010 3660 NaN NaN aw cht NaN
3 CR-1_767mEOH_pXRF_0015 CR-1_767mEOH Collinstown 14.99 15.01 pXRF CALP BU P M 42849 39.09 29.37 NaN 68.46 512468 32.19 15.37 NaN 47.56 NaN NaN 263187.0 272286.0 106.0 0 0.00 0.0004 0.0 37.94 0.0 0 0.0000 0.0000 0.1165 0.0 0.0000 57.55 0.00 0.0116 0.0005 0.0000 0.0017 0.0 0.0012 0.0000 0.0209 0.0 0.0000 4.1532 0.0 0.0914 0.0023 0.1063 0.0 0.0000 0.000 0.0005 0.0016 0.0016 3660 NaN NaN aw bu bi
4 CR-1_767mEOH_pXRF_0018 CR-1_767mEOH Collinstown 17.99 18.01 pXRF CALP BU P NaN 42849 39.11 29.44 NaN 68.55 512468 32.40 16.54 NaN 48.95 NaN NaN 263187.0 272286.0 106.0 0 0.23 0.0000 0.0 35.63 0.0 0 0.0000 0.0009 0.1996 0.0 0.0000 55.31 0.00 0.0130 0.0000 0.0000 0.0000 0.0 0.0007 0.0002 0.0000 0.0 0.0002 8.4221 0.0 0.0864 0.0013 0.0886 0.0 0.0131 0.000 0.0003 0.0022 0.0019 3660 NaN NaN cht aw bu


xrf columns

xrf.columns
Index(['SAMPLEID', 'HOLEID', 'PROJECTCODE', 'SAMPFROM', 'SAMPTO', 'SAMPLETYPE',
       'FormationMaj_dr', 'FormationMin_dr', 'LithologyMaj_dr',
       'LithologyMin_dr', 'pXRF_Date', 'pXRF_ElapsedTime1',
       'pXRF_ElapsedTime2', 'pXRF_ElapsedTime3', 'pXRF_ElapsedTotalTim',
       'pXRF_InstrumentSN', 'pXRF_LiveTime1', 'pXRF_LiveTime2',
       'pXRF_LiveTime3', 'pXRF_LiveTotalTime', 'pXRF_Sampler', 'Reading',
       'X_Drillhole_dr', 'Y_Drillhole_dr', 'Z_Drillhole_dr', 'Ag_pXRF_pct',
       'Al_pXRF_pct', 'As_pXRF_pct', 'Bi_pXRF_pct', 'Ca_pXRF_pct',
       'Cd_pXRF_pct', 'Co_pXRF_pct', 'Cr_pXRF_pct', 'Cu_pXRF_pct',
       'Fe_pXRF_pct', 'Hg_pXRF_pct', 'K_pXRF_pct', 'LE_pXRF_pct',
       'Mg_pXRF_pct', 'Mn_pXRF_pct', 'Mo_pXRF_pct', 'Nb_pXRF_pct',
       'Ni_pXRF_pct', 'P_pXRF_pct', 'Pb_pXRF_pct', 'Rb_pXRF_pct', 'S_pXRF_pct',
       'Sb_pXRF_pct', 'Se_pXRF_pct', 'Si_pXRF_pct', 'Sn_pXRF_pct',
       'Sr_pXRF_pct', 'Th_pXRF_pct', 'Ti_pXRF_pct', 'U_pXRF_pct', 'V_pXRF_pct',
       'W_pXRF_pct', 'Y_pXRF_pct', 'Zn_pXRF_pct', 'Zr_pXRF_pct', 'PL_dr',
       'Sample_Dolomite1_Percent_dr', 'SampleComments_Big', 'Texture1_dr',
       'Texture2_dr', 'Texture3_dr'],
      dtype='object')

Main columns in xrf df

xrf = xrf[['HOLEID', 'SAMPFROM', 'Fe_pXRF_pct', 'Zn_pXRF_pct', 'Pb_pXRF_pct', 'S_pXRF_pct']]
xrf.head()
HOLEID SAMPFROM Fe_pXRF_pct Zn_pXRF_pct Pb_pXRF_pct S_pXRF_pct
0 CR-1_767mEOH 5.99 0.2254 0.0029 0.0007 0.0383
1 CR-1_767mEOH 8.99 1.5387 0.0034 0.0006 4.3153
2 CR-1_767mEOH 11.99 0.0458 0.0009 0.0006 0.8947
3 CR-1_767mEOH 14.99 0.1165 0.0016 0.0012 0.0209
4 CR-1_767mEOH 17.99 0.1996 0.0022 0.0007 0.0000


Conversion of % to ppm

columns_to_multiply = ['Fe_pXRF_pct', 'Zn_pXRF_pct', 'Pb_pXRF_pct', 'S_pXRF_pct']
xrf[columns_to_multiply] = xrf[columns_to_multiply] * 10000
xrf.head()
HOLEID SAMPFROM Fe_pXRF_pct Zn_pXRF_pct Pb_pXRF_pct S_pXRF_pct
0 CR-1_767mEOH 5.99 2254.0 29.0 7.0 383.0
1 CR-1_767mEOH 8.99 15387.0 34.0 6.0 43153.0
2 CR-1_767mEOH 11.99 458.0 9.0 6.0 8947.0
3 CR-1_767mEOH 14.99 1165.0 16.0 12.0 209.0
4 CR-1_767mEOH 17.99 1996.0 22.0 7.0 0.0


Rename the xrf columns

xrf = xrf.rename(columns={'HOLEID': 'Hole', 'SAMPFROM': 'From', 'Fe_pXRF_pct': 'Fe', 'Zn_pXRF_pct': 'Zn', 'Pb_pXRF_pct': 'Pb', 'S_pXRF_pct': 'S'})
xrf.head()
Hole From Fe Zn Pb S
0 CR-1_767mEOH 5.99 2254.0 29.0 7.0 383.0
1 CR-1_767mEOH 8.99 15387.0 34.0 6.0 43153.0
2 CR-1_767mEOH 11.99 458.0 9.0 6.0 8947.0
3 CR-1_767mEOH 14.99 1165.0 16.0 12.0 209.0
4 CR-1_767mEOH 17.99 1996.0 22.0 7.0 0.0


Sort by From

xrf5 = xrf[xrf.Hole == 'TC-1319-005'].sort_values('From')
xrf5.head()
Hole From Fe Zn Pb S
601 TC-1319-005 3.99 9369.0 9.0 0.0 2435.0
602 TC-1319-005 6.99 5587.0 25.0 0.0 1339.0
603 TC-1319-005 9.99 41980.0 51.0 0.0 16457.0
604 TC-1319-005 12.99 24620.0 104.0 22.0 13974.0
605 TC-1319-005 15.99 30758.0 67.0 26.0 10547.0


Upper portion of xrf5

xrf5[xrf5.From <= 400].describe()
From Fe Zn Pb S
count 133.000000 133.000000 133.000000 133.000000 133.000000
mean 201.990000 12152.646617 79.812030 2.037594 4699.947368
std 115.613581 9226.369163 105.763899 4.904244 6196.147283
min 3.990000 1159.000000 7.000000 0.000000 0.000000
25% 102.990000 4603.000000 22.000000 0.000000 1425.000000
50% 201.990000 9701.000000 43.000000 0.000000 2668.000000
75% 300.990000 16337.000000 80.000000 0.000000 6037.000000
max 399.990000 47995.000000 594.000000 27.000000 52145.000000


Elements in xrf5

elem_list2 = ['Zn', 'Pb', 'Fe', 'S']

XRF plots

plt.figure(figsize=(15, 8))

for element in elem_list2:
    plt.subplot(1, 4, (elem_list2.index(element) + 1))
    plt.plot(xrf5[element], xrf5.From)
    plt.axvline(x=np.mean(xrf5[element]), ls='--', c='r', label='mean')
    # plt.scatter(xrf5[element][200], xrf5.From[200], label='DPAL', c='k', zorder=3)
    plt.axhspan(272, 400, color='r', alpha=0.2, label='AIPD')
    plt.ylim(1100, -20)
    if elem_list.index(element) == 0:
        plt.ylabel('Depth (m)')
        plt.legend()
    plt.xlabel(f'{element} (ppm)')
    plt.grid()

plt.suptitle('Hole TC-1319-005 Legacy XRF\n AIPD: Anomalous Interval in the Petrophysical Data', y=1)
plt.tight_layout()
# plt.savefig('Output/xrf.png')
plt.show()
Hole TC-1319-005 Legacy XRF  AIPD: Anomalous Interval in the Petrophysical Data

Magnetic Susceptibility

This section compares the magnetic susceptibility in the legacy and the latest petrophysical data. In the case of hole TC-1319-005 we can see that the magnetic susceptibility in both measurements have very similar medians. The magnetic susceptibilities, new and legacy, of hole TC-3660-008 have the same medians, but it has an anomaly in the petrophysical data that is above the measurement limit of the KT-20 device (2000 x $10^3$ SI), therefore this anomalous value is going to be deleted. Load magnetic susceptibility data

mag = pd.read_excel(f'{base_path}/collinstown_MagSUS.xlsx')

Magnetic Susceptibility Analysis for Hole TC-1319-005

mag5 = mag[mag.HOLEID == 'TC-1319-005'].sort_values('SAMPFROM')
petro5 = df[df.Hole == 'TC-1319-005'].sort_values('From')

plt.figure(figsize=(10, 6))

plt.subplot(121)
plt.plot(mag5['MS_MagSus'], mag5['SAMPFROM'], label='Legacy')
plt.scatter(petro5['Mag'], petro5['From'], c='k', label='Petrophysic', s=8)
plt.ylim(1100, 0)
plt.legend(loc='lower right')
plt.xlabel('Mag Sus')
plt.ylabel('Depth (m)')
plt.grid()

plt.subplot(122)
plt.plot(np.log10(mag5['MS_MagSus']), mag5['SAMPFROM'], label='Legacy')
plt.axvline(x=np.median(np.log10(mag5['MS_MagSus'])), ls='--', label='Median Legacy')
plt.scatter(np.log10(petro5['Mag']), petro5['From'], c='k', label='Petrophysic', s=8)
plt.axvline(x=np.median(np.log10(petro5['Mag'])), ls='--', c='k', label='Median Petrophysic')

plt.ylim(1100, 0)
plt.legend(loc='lower right')
plt.xlabel('Log(Mag Sus)')
plt.ylabel('Depth (m)')
plt.grid()

plt.suptitle('TC-1319-005 Magnetic Susceptibility')
plt.tight_layout()
plt.show()
TC-1319-005 Magnetic Susceptibility
/home/leguark/.virtualenvs/gempy_dependencies/lib/python3.10/site-packages/pandas/core/arraylike.py:399: RuntimeWarning:

divide by zero encountered in log10

Magnetic Susceptibility Analysis for Hole TC-3660-008

mag8 = mag[mag.HOLEID == 'TC-3660-008'].sort_values('SAMPFROM')
petro8 = df[df.Hole == 'TC-3660-008'].sort_values('From')

plt.figure(figsize=(11, 7))

plt.subplot(121)
plt.plot(mag8['MS_MagSus'], mag8['SAMPFROM'], label='Legacy')
plt.scatter(petro8['Mag'], petro8['From'], c='k', label='Petrophysic', s=8)
plt.ylim(900, 0)
plt.xlabel('Mag Sus')
plt.ylabel('Depth (m)')
plt.legend(loc='lower right')
plt.grid()

plt.subplot(122)
plt.plot(np.log10(mag8['MS_MagSus']), mag8['SAMPFROM'], label='Legacy')
plt.axvline(x=np.median(np.log10(mag8['MS_MagSus'])), ls='--', label='Median Legacy')
plt.scatter(np.log10(petro8['Mag']), petro8['From'], c='k', label='Petrophysic', s=8)
plt.axvline(x=np.nanmedian(np.log10(petro8['Mag'])), ls='--', c='k', label='Median Petrophysic')

plt.ylim(900, 0)
plt.xlabel('Log(Mag Sus)')
plt.ylabel('Depth (m)')
plt.legend(loc='lower right')
plt.grid()

plt.suptitle('TC-3660-008 Magnetic Susceptibility')
plt.tight_layout()
plt.show()
TC-3660-008 Magnetic Susceptibility

Gamma Ray and Formation Analysis for Hole TC-3660-008

gr = pd.read_csv(f'{base_path}/collinstown_Gamma.csv')
gr8 = gr[gr.HOLEID == 'TC-3660-008'].sort_values('DEPTH')

# Depth of tops or markers for formations
tops = pd.DataFrame(petro8.From[petro8['Form'].ne(petro8['Form'].shift())])
tops['Top'] = petro8.Form[petro8['Form'].ne(petro8['Form'].shift())]
tops = tops.reset_index(drop=True)

# Colors of the formations
tops['color'] = pd.Series(['#CCDF9A', '#FF0000', '#CCDF9A', '#D9E9F0', '#FF0000', '#D9E9F0', '#FF0000', '#D9E9F0', '#FF0000', '#8CBAE2',
                           '#8A6E9F', '#ABA16C', '#EBDE98', '#806000', '#2A7C43', '#FF0000'])

# Add base of the deepest formation
new_row = pd.DataFrame([{'From': 833.4, 'Top': '', 'color': '#FF0000'}])
tops = pd.concat([tops, new_row], ignore_index=True)
tops = tops.rename(columns={'From': 'depth', 'Top': 'name'})

tops_list = geo.plot_tops(tops)

gr8 = gr8.sort_values('DEPTH')
# Plot of GR with Formations

plt.figure(figsize=(10, 8))

plt.subplot(121)
plt.plot(gr8.iloc[:,6], gr8.iloc[:,5], label='Data', c='lightblue')
# plt.plot(gr8.iloc[:,6].rolling(window=250).mean(), gr8.iloc[:,5], label='Mean 400')

# Butterworth filter
b, a = butter(N=2, Wn=0.02, btype='low')
filtered_data = filtfilt(b, a, gr8.iloc[:, 6])
plt.plot(filtered_data, gr8.iloc[:, 5], label='Butterworth Filtered')

plt.grid()
plt.xlabel('GR (API)')
plt.xlabel('Depth (m)')
plt.axis([0, 120, 850, 0])

plt.subplot(122)
for i in range(0, len(tops_list)):
    plt.axhspan(tops_list[i]['top'], tops_list[i]['base'], color=tops_list[i]['color'])
    plt.text(122, tops_list[i]['top'], tops_list[i]['name'], fontsize=9, va='center')
plt.axis([0, 120, 850, 0])
plt.xticks([])
plt.xlabel('Formations')

plt.suptitle('GR and Formations in Hole TC-3660-008', y=0.93)
plt.savefig('Output/gr_form.png')
plt.show()
GR and Formations in Hole TC-3660-008

Passive Vs

Parallel to the petrophysical measurements, a passive seismic survey was carried out between Collinstown and Kells. From this survey a Vs was obtained which can be compared with that recorded during the petrophysical measurements. Despite being measurements of a very different nature, the Vs of the key holes are in the range of values of the passive Vs. Refering to the trends, hole TC-1319-005, inside the survey, shows a contrary trend as the passive Vs. Paradoxically, Hole TC-3660-008, outside the survey, shows a trend that coincides with that of the passive Vs.

Load passive Vs data for hole TC-3660-008

pas = pd.read_csv(
    filepath_or_buffer=f'{base_path}/tc-3660-008_Vs.txt',
    sep=' ',
    skiprows=1,
    names=['depth_km', 'vs_km/s']
)

# Convert depth and velocity from km to m
pas[['depth_km', 'vs_km/s']] = pas[['depth_km', 'vs_km/s']] * 1000

# Rename columns for easier access
pas = pas.rename(columns={'depth_km': 'depth', 'vs_km/s': 'vs'})

Passive Vs Statistical Parameters

print(f"Basic statistics for Passive Vs:\n{pas['vs'].describe()}")
Basic statistics for Passive Vs:
count      20.000000
mean     3001.458200
std       219.007636
min      2595.905000
25%      2851.969250
50%      2937.861000
75%      3241.223000
max      3291.111000
Name: vs, dtype: float64

Passive Vs Plots

basic_stat.plots(pas.vs, 'Passive Vs')
Passive Vs Statistical Plots

Vs Comparison Plot

plt.figure(figsize=(5, 8))

# Vs at TC-1319-005
plt.plot(petro8.Vs[petro8.Vs >= 1000], petro8.From[petro8.Vs >= 1000], label='TC-1319-005', c='salmon')

# Apply Butterworth filter to TC-1319-005 Vs
b, a = butter(N=2, Wn=0.2, btype='low')
vs_butter = filtfilt(b, a, petro8.Vs[petro8.Vs >= 1000])
plt.plot(
    vs_butter,
    petro8.From[petro8.Vs >= 1000],
    label='Butterworth Filter - TC-1319-005',
    c='crimson'
)

# Vs at TC-3660-008
plt.plot(petro5.Vs[petro5.Vs >= 1000], petro5.From[petro5.Vs >= 1000], label='TC-3660-008', c='lightgreen')

# Apply Butterworth filter to TC-3660-008 Vs
vs_butter = filtfilt(b, a, petro5.Vs[petro5.Vs >= 1000])
plt.plot(
    vs_butter,
    petro5.From[petro5.Vs >= 1000],
    label='Butterworth Filter - TC-3660-008',
    c='green'
)

# Passive seismic Vs
plt.scatter(pas.vs, pas.depth, c='k', label='Passive Survey', s=8, zorder=2)

plt.axis([1000, 4000, 2000, 0])
plt.ylabel('Depth (m)')
plt.xlabel('Vs (m/s)')
plt.title('Vs Comparison')
plt.legend(loc='lower left')
plt.grid()
plt.savefig('Output/vs.png')
plt.show()
Vs Comparison

Observations on Anomalies

The following observations summarize the findings for each feature in the petrophysical dataset. They are based on the findings made during the exploratory data analysis (EDA) and the integration with additional data, which was crucial in assessing the real nature of the anomalies in the petrophysical features. These observations establish the steps to follow, which involve the elimination of some anomalies, while others are maintained.

Feature

Observations on anomalies

Den

The presence of abnormal densities requires checking all available data before discarding such values. Additional data, including stratigraphic columns, whole rock geochemistry, and XRF, indicate an absence of massive sulfides in the upper portion of TC-1319-005

Vp

Values below 3000 should be eliminated, as these are linked to fractures in the samples.

Vs

Values of 0 must be eliminated as they are not physically meaningful for solid samples. These values often coincide with anomalous Vp values, indicating fractures.

Mag

The anomaly of 93.9 should be deleted, as it exceeds the KT-20 measurement range.

Ip

No strong reasons to discard outliers.

Res

No strong reasons to discard outliers.

Next Steps for Data Cleaning

The next step is part of what is known as data mining, which involves cleaning the false anomalies and filling the gaps in the data. Specific steps will be:

  • Record the positions of the original NaNs.

  • Record the positions and values of the anomalies to be deleted.

  • Delete the anomalies.

  • Fill all NaNs, both original and those resulting from anomaly deletions, using different methods (linear, nonlinear models, imputations, ML models).

  • Compare the effectiveness of different gap-filling methods.

  • Use the best option to fill in gaps and deliver corrected petrophysical data for further investigation.

References

Total running time of the script: (0 minutes 9.015 seconds)

Gallery generated by Sphinx-Gallery