.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/05_Geo/eda_integration_delivery_1.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_05_Geo_eda_integration_delivery_1.py: Modelling and Propagation of Legacy Petrophysical Data for Mining Exploration (1/3) =================================================================================== **Exploratory Data Analysis and Data Integration** Barcelona 25/09/24 GEO3BCN Manuel David Soto, Juan Alcalde, Adrià Hernàndez-Pineda, Ramón Carbonel .. GENERATED FROM PYTHON SOURCE LINES 11-28 .. code-block:: Python import os import dotenv import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns import plotly.graph_objects as go from plotly.subplots import make_subplots # PEL - Filtering from scipy.signal import butter, filtfilt # Custom Libraries (_aux) from _aux import basic_stat from _aux import geo # sphinx_gallery_thumbnail_number = 19 .. GENERATED FROM PYTHON SOURCE LINES 31-47 Introduction ------------ The dispersion and scarcity of petrophysical data are well-known challenges in the mining sector. These issues are primarily driven by economic factors, but also by geological (such as sedimentary cover, weathering, erosion, or the depth of targets), geotechnical (e.g., slope or borehole stability), and technical limitations that have been addressed in other industries. To tackle the challenges associated with sparse and incomplete petrophysical data, we developed three Jupyter Notebooks that make use of open-source Python tools. These tools assist researchers in the exploration, visualization, and integration of data from diverse sources, ultimately enabling the complete modeling (including ML-based models) of missing properties. The tools were applied to both recently acquired and legacy data from two holes in Collinstown, Ireland. This first notebook primarily focuses on the Exploratory Data Analysis (EDA) of a new petrophysical dataset resulting from measurements on two keyholes: - TC 1319 008 - TC 3660 005 Three other datasets were integrated into the petrophysical dataset: - New stratigraphic logging of the same keyholes, providing information on formations, photos, and observations. - Legacy data including whole rock geochemistry, XRF, magnetic susceptibility, and GR log. - Vp data from the Passive Seismic Survey. .. GENERATED FROM PYTHON SOURCE LINES 49-100 Variables --------- The new petrophysical data were collected using various devices and methodologies described in the "New petrophysical data collected at site 2" document, and saved in `collinstown_petrophys.csv`. This file contains 17 columns and 329 rows. The following variables (3 objects/text, 8 numeric) are primarily for reference or location: +---------------+-----------+-----------------------------------+---------+ | Name | New Name | Explanation | Unit | +===============+===========+===================================+=========+ | HoleID | Hole | Hole identification | - | +---------------+-----------+-----------------------------------+---------+ | Easting | X | Easting coordinate | m | +---------------+-----------+-----------------------------------+---------+ | Northing | Y | Northing coordinate | m | +---------------+-----------+-----------------------------------+---------+ | RL | RL | - | - | +---------------+-----------+-----------------------------------+---------+ | Azimuth | Azi | Hole azimuth | degree | +---------------+-----------+-----------------------------------+---------+ | Dip | Dip | Hole deviation | degree | +---------------+-----------+-----------------------------------+---------+ | SampleNo | Sample | Sample number | - | +---------------+-----------+-----------------------------------+---------+ | From | From | Top of the sample | m | +---------------+-----------+-----------------------------------+---------+ | To | To | Base of the sample | m | +---------------+-----------+-----------------------------------+---------+ | Length | Len | Length | cm | +---------------+-----------+-----------------------------------+---------+ | Observations | Obs | Observations on the sample | - | +---------------+-----------+-----------------------------------+---------+ The following numerical variables are potential candidates for modeling: +----------------+-----------+-----------------------------------+---------+ | Name | New Name | Explanation | Unit | +================+===========+===================================+=========+ | From | From | Top of the sample | m | +----------------+-----------+-----------------------------------+---------+ | Density | Den | Density | g/cm³ | +----------------+-----------+-----------------------------------+---------+ | Vp | Vp | Compressional velocity | m/s | +----------------+-----------+-----------------------------------+---------+ | Vs | Vs | Shear velocity | m/s | +----------------+-----------+-----------------------------------+---------+ | Xm | Mag | Magnetic susceptibility | - | +----------------+-----------+-----------------------------------+---------+ | Mx | Ip | Chargeability | mv/V | +----------------+-----------+-----------------------------------+---------+ | R | Res | Resistivity | ohm·m | +----------------+-----------+-----------------------------------+---------+ | Formation Major| Form | Major formations along hole | - | +----------------+-----------+-----------------------------------+---------+ .. GENERATED FROM PYTHON SOURCE LINES 103-112 Libraries --------- The following are the Python libraries used throughout this notebook: - **PSL**: Python Standard Libraries - **_aux**: User Defined Libraries - **PEL**: Python External Libraries .. GENERATED FROM PYTHON SOURCE LINES 115-118 Data Loading ------------ Ten 'blank' entries in the original file were deleted and saved in a new file, `collinstown_petrophys_no_blanc.csv`. In this way, cells with 'blank' are now set to NaN in their corresponding positions. .. GENERATED FROM PYTHON SOURCE LINES 118-126 .. code-block:: Python # Load data dotenv.load_dotenv() base_path = os.getenv("PATH_TO_COLLINSTOWN_PETRO") blanc_xlsx = f'{base_path}/collinstown_petrophys_no_blanc.xlsx' df = pd.read_excel(blanc_xlsx) df.head() .. rst-class:: sphx-glr-script-out .. code-block:: none Python-dotenv could not parse statement starting at line 16 .. raw:: html
HoleID Easting Northing RL Azimuth Dip SampleNo From(m) To(m) lenght(cm) Density(g/cm3) Vp(m/s) Vs(m/s) Xm(e-3) Mx(mv/V) R(ohm·m) Observations Formation
0 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18397 265.77 265.88 11.2 2.610 2179.0 NaN 0.108 NaN NaN Broken sample parallel to z-axis prevented to ... CALP
1 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18396 271.66 271.77 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 NaN CALP
2 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18395 277.05 277.16 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 NaN CALP
3 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18394 282.55 282.66 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


.. GENERATED FROM PYTHON SOURCE LINES 127-128 Columns in the file .. GENERATED FROM PYTHON SOURCE LINES 128-130 .. code-block:: Python df.columns .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['HoleID', 'Easting', 'Northing', 'RL', 'Azimuth', 'Dip', 'SampleNo', 'From(m)', 'To(m)', 'lenght(cm)', 'Density(g/cm3)', 'Vp(m/s)', 'Vs(m/s)', 'Xm(e-3)', 'Mx(mv/V)', 'R(ohm·m)', 'Observations', 'Formation'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 131-132 Rename columns for better readability .. GENERATED FROM PYTHON SOURCE LINES 132-155 .. code-block:: Python df = df.rename( columns={ 'HoleID' : 'Hole', 'Easting' : 'X', 'Northing' : 'Y', 'Azimuth' : 'Azi', 'SampleNo' : 'Sample', 'From(m)' : 'From', 'To(m)' : 'To', 'lenght(cm)' : 'Len', 'Density(g/cm3)': 'Den', 'Vp(m/s)' : 'Vp', 'Vs(m/s)' : 'Vs', 'Xm(e-3)' : 'Mag', 'Mx(mv/V)' : 'Ip', 'R(ohm·m)' : 'Res', 'Observations' : 'Obs', 'Formation' : 'Form' } ) df.head() .. raw:: html
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
0 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18397 265.77 265.88 11.2 2.610 2179.0 NaN 0.108 NaN NaN Broken sample parallel to z-axis prevented to ... CALP
1 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18396 271.66 271.77 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 NaN CALP
2 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18395 277.05 277.16 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 NaN CALP
3 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18394 282.55 282.66 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


.. GENERATED FROM PYTHON SOURCE LINES 156-157 Columns in the file .. GENERATED FROM PYTHON SOURCE LINES 157-160 .. code-block:: Python df.columns .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['Hole', 'X', 'Y', 'RL', 'Azi', 'Dip', 'Sample', 'From', 'To', 'Len', 'Den', 'Vp', 'Vs', 'Mag', 'Ip', 'Res', 'Obs', 'Form'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 161-163 .. code-block:: Python df.describe() .. raw:: html
X Y RL Azi Dip From To Len Den Vp Vs Mag Ip Res
count 329.000000 329.000000 329.000000 329.000000 329.0 329.000000 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 258529.740426 271636.412766 83.480851 33.574468 -85.0 543.279696 543.095030 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 5700.981826 228.846318 13.217114 3.504538 0.0 262.864370 263.430806 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 252715.000000 271403.000000 70.000000 30.000000 -85.0 9.600000 9.700000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 252715.000000 271403.000000 70.000000 30.000000 -85.0 355.580000 355.680000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 264102.200000 271860.100000 96.400000 37.000000 -85.0 547.800000 547.950000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 264102.200000 271860.100000 96.400000 37.000000 -85.0 750.000000 750.100000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 264102.200000 271860.100000 96.400000 37.000000 -85.0 1049.200000 1049.300000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


.. GENERATED FROM PYTHON SOURCE LINES 164-165 The keyholes are .. GENERATED FROM PYTHON SOURCE LINES 165-168 .. code-block:: Python df.Hole.unique() .. rst-class:: sphx-glr-script-out .. code-block:: none array(['TC-1319-005', 'TC-3660-008'], dtype=object) .. GENERATED FROM PYTHON SOURCE LINES 169-172 Data General Description ------------------------ Among the variables in the dataset we have 4 texts (object) and 14 numeric variables (int64 and float64). The general behavior of the numerical variables can be seen in the following tables and graphs. Stand out the important anomaly in Mag, as well as the numerous missing values (NaN) in Vp, Vs, Ip, and Res: .. GENERATED FROM PYTHON SOURCE LINES 174-175 Type of data in the dataset .. GENERATED FROM PYTHON SOURCE LINES 175-178 .. code-block:: Python df.info() .. rst-class:: sphx-glr-script-out .. code-block:: none RangeIndex: 329 entries, 0 to 328 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Hole 329 non-null object 1 X 329 non-null float64 2 Y 329 non-null float64 3 RL 329 non-null float64 4 Azi 329 non-null int64 5 Dip 329 non-null int64 6 Sample 329 non-null object 7 From 329 non-null float64 8 To 329 non-null float64 9 Len 321 non-null float64 10 Den 327 non-null float64 11 Vp 310 non-null float64 12 Vs 289 non-null float64 13 Mag 326 non-null float64 14 Ip 303 non-null float64 15 Res 303 non-null float64 16 Obs 73 non-null object 17 Form 329 non-null object dtypes: float64(12), int64(2), object(4) memory usage: 46.4+ KB .. GENERATED FROM PYTHON SOURCE LINES 179-180 NaN in Len .. GENERATED FROM PYTHON SOURCE LINES 180-183 .. code-block:: Python print('NaNs in Len:', 329 - 321) .. rst-class:: sphx-glr-script-out .. code-block:: none NaNs in Len: 8 .. GENERATED FROM PYTHON SOURCE LINES 184-185 Main statistical parameters of the variables .. GENERATED FROM PYTHON SOURCE LINES 185-188 .. code-block:: Python df.describe() .. raw:: html
X Y RL Azi Dip From To Len Den Vp Vs Mag Ip Res
count 329.000000 329.000000 329.000000 329.000000 329.0 329.000000 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 258529.740426 271636.412766 83.480851 33.574468 -85.0 543.279696 543.095030 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 5700.981826 228.846318 13.217114 3.504538 0.0 262.864370 263.430806 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 252715.000000 271403.000000 70.000000 30.000000 -85.0 9.600000 9.700000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 252715.000000 271403.000000 70.000000 30.000000 -85.0 355.580000 355.680000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 264102.200000 271860.100000 96.400000 37.000000 -85.0 547.800000 547.950000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 264102.200000 271860.100000 96.400000 37.000000 -85.0 750.000000 750.100000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 264102.200000 271860.100000 96.400000 37.000000 -85.0 1049.200000 1049.300000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


.. GENERATED FROM PYTHON SOURCE LINES 189-190 All numerical variables in line plots, the x axis is just the position in the file. .. GENERATED FROM PYTHON SOURCE LINES 190-196 .. code-block:: Python df.plot(figsize=(17, 14), subplots=True) plt.tight_layout() # plt.savefig('Output/lines.png') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_001.png :alt: eda integration delivery 1 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 197-200 Null Values (NaN) in the Variables ---------------------------------- As can be seen in the previous cells and plots, most of the NaN are in Vp, Vs, Ip, and Re. Obs, a text variable, has also a lot (77.8%) of NaNs. .. GENERATED FROM PYTHON SOURCE LINES 202-203 % of NaN in all variables .. GENERATED FROM PYTHON SOURCE LINES 203-207 .. code-block:: Python print('% of NaN in all variables:') df.isna().sum()*100/len(df) .. rst-class:: sphx-glr-script-out .. code-block:: none % of NaN in all variables: Hole 0.000000 X 0.000000 Y 0.000000 RL 0.000000 Azi 0.000000 Dip 0.000000 Sample 0.000000 From 0.000000 To 0.000000 Len 2.431611 Den 0.607903 Vp 5.775076 Vs 12.158055 Mag 0.911854 Ip 7.902736 Res 7.902736 Obs 77.811550 Form 0.000000 dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 208-209 Plot of the number of NaN in the numerical variables .. GENERATED FROM PYTHON SOURCE LINES 209-215 .. code-block:: Python df.select_dtypes(include=['number']).isna().sum().plot.bar(figsize=(7, 4), ylabel='Count', edgecolor='k', color='skyblue', title='All Variables NaN', zorder=2) plt.grid(zorder=1) plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_002.png :alt: All Variables NaN :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 216-219 Reference Variables ------------------- Not all numerical variables are suitable for modeling. In addition to the text variables (Hole, Sample, Obs, Form), X, Y, RL, Azi, Dip, From, To, and Len are reference or location variables. Among these, `Len` represents the length of the cores used for measurements, with an average length of 10.2 cm. .. GENERATED FROM PYTHON SOURCE LINES 219-226 .. code-block:: Python # Plot core length histogram df.Len.plot.hist(figsize=(6, 5), subplots=True, bins=40, edgecolor='k', color='skyblue', zorder=2) plt.grid(zorder=1) plt.xlabel('Core Length (cm)') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_003.png :alt: eda integration delivery 1 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 227-228 Core length influence on density .. GENERATED FROM PYTHON SOURCE LINES 228-265 .. code-block:: Python plt.figure(figsize=(11, 8)) plt.subplot(221) plt.scatter(df.Len[df.Hole == 'TC-1319-005'], df.Den[df.Hole == 'TC-1319-005'], edgecolor='k', color='skyblue', zorder=2) plt.grid(zorder=1) plt.xlabel('Core Length (cm)') plt.ylabel('Density (g/cm³)') plt.axis([8, 16, 2, 5.5]) plt.title('Hole TC-1319-005') plt.subplot(222) plt.scatter(df.Len[df.Hole == 'TC-3660-008'], df.Den[df.Hole == 'TC-3660-008'], edgecolor='k', color='skyblue', zorder=2) plt.grid(zorder=1) plt.xlabel('Core Length (cm)') plt.ylabel('Density (g/cm³)') plt.title('Hole TC-3660-008') plt.axis([8, 16, 2, 5.5]) plt.subplot(223) plt.scatter(df.Len[df.Hole == 'TC-1319-005'], df.From[df.Hole == 'TC-1319-005'], edgecolor='k', color='skyblue', zorder=2) plt.grid(zorder=1) plt.xlabel('Core Length (cm)') plt.ylabel('Depth (m)') plt.axis([8, 16, 1100, 0]) plt.title('Hole TC-1319-005') plt.subplot(224) plt.scatter(df.Len[df.Hole == 'TC-3660-008'], df.From[df.Hole == 'TC-3660-008'], edgecolor='k', color='skyblue', zorder=2) plt.grid(zorder=1) plt.xlabel('Core Length (cm)') plt.ylabel('Depth (m)') plt.title('Hole TC-3660-008') plt.axis([8, 16, 1100, 0]) plt.tight_layout() plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_004.png :alt: Hole TC-1319-005, Hole TC-3660-008, Hole TC-1319-005, Hole TC-3660-008 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 266-270 Modelable Variables or Features ------------------------------- In the following section, we will review, through the main statistical parameters and plots, each of the features or variables suitable for modeling. The variable From, the sample's top depth, could have a double meaning, as a reference and feature. Len is not for modeling, it is important because allows the assessment of the quality of the density. .. GENERATED FROM PYTHON SOURCE LINES 272-274 .. code-block:: Python features = df[['Hole', 'From', 'Len', 'Den', 'Vp', 'Vs', 'Mag', 'Ip', 'Res', 'Form']] features .. raw:: html
Hole From Len Den Vp Vs Mag Ip Res Form
0 TC-1319-005 265.77 11.2 2.610 2179.0 NaN 0.108 NaN NaN CALP
1 TC-1319-005 271.66 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 CALP
2 TC-1319-005 277.05 10.3 3.399 5754.0 2085.0 0.237 17.52 10200.0 CALP
3 TC-1319-005 282.55 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 CALP
4 TC-1319-005 287.20 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 CALP
... ... ... ... ... ... ... ... ... ... ...
324 TC-3660-008 814.55 9.6 2.763 5680.0 3179.0 0.158 70.94 481.1 LIS
325 TC-3660-008 819.92 NaN 2.649 NaN NaN 0.024 NaN NaN LIS
326 TC-3660-008 825.60 9.9 2.762 5593.0 1787.0 0.337 78.44 334.6 PAL
327 TC-3660-008 828.80 9.9 2.804 5858.0 1964.0 0.311 73.73 466.5 PAL
328 TC-3660-008 833.40 10.1 2.735 5805.0 3217.0 0.308 33.86 2300.0 PAL

329 rows × 10 columns



.. GENERATED FROM PYTHON SOURCE LINES 275-276 .. code-block:: Python features.describe() .. raw:: html
From Len Den Vp Vs Mag Ip Res
count 329.000000 321.000000 327.000000 310.000000 289.000000 326.000000 303.000000 303.000000
mean 543.279696 10.223364 2.761664 5441.990323 2772.937716 0.430543 49.738548 3283.762376
std 262.864370 1.070127 0.293938 700.858075 653.446129 5.192919 25.145208 3714.851096
min 9.600000 7.500000 2.420000 992.000000 0.000000 0.001000 5.390000 156.900000
25% 355.580000 9.700000 2.678500 5196.000000 2294.000000 0.064000 31.555000 1100.000000
50% 547.800000 10.000000 2.706000 5575.000000 3041.000000 0.114500 47.040000 2100.000000
75% 750.000000 10.400000 2.731000 5857.000000 3248.000000 0.205250 67.190000 3850.000000
max 1049.200000 18.000000 5.074000 6478.000000 3740.000000 93.886000 171.730000 23900.000000


.. GENERATED FROM PYTHON SOURCE LINES 277-278 Boxplot of each feature .. GENERATED FROM PYTHON SOURCE LINES 278-283 .. code-block:: Python features.plot.box(subplots=True, grid=False, figsize=(12, 7), layout=(3, 4), flierprops={"marker": "."}) plt.tight_layout() # plt.savefig('Output/histos.png') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_005.png :alt: eda integration delivery 1 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 284-286 .. code-block:: Python features.to_csv('Output/features.csv') .. GENERATED FROM PYTHON SOURCE LINES 287-290 Density (Den) ^^^^^^^^^^^^^ The distribution of the density variable has a wide tail to the right (positive skew), with anomalous values beyond +3 standard deviations. There is no mention, in the observations of the petrophysical data or in the stratigraphic logging of the hole, about the reason for these anomalous values, however, as it was seen previouly, the influence of long length (> 11.2 cm) in these high densities is clear. .. GENERATED FROM PYTHON SOURCE LINES 292-293 Density statistical parameters by _aux .. GENERATED FROM PYTHON SOURCE LINES 293-295 .. code-block:: Python basic_stat.parameters(features.Den, 'Den') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Den NaN: 2 NaN(%): 0.608 Lenght: 329 Min: 2.42 Max: 5.074 Range: 2.654 Mean: 2.762 Median: 2.706 Mode: ModeResult(mode=np.float64(2.703), count=np.int64(9)) Mean < median: right tail Skew: 4.618 Fisher's kurtosis: 24.334 Pearson's kurtosis: 27.334 Variance: 0.086 Standart deviation: 0.294 Mean - 3std: 1.880 Mean + 3std: 3.643 .. GENERATED FROM PYTHON SOURCE LINES 296-297 Density plots by _aux .. GENERATED FROM PYTHON SOURCE LINES 297-299 .. code-block:: Python basic_stat.plots(features.Den, 'Den') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_006.png :alt: Den Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_006.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 300-301 8 observations with anomalous values in the density .. GENERATED FROM PYTHON SOURCE LINES 301-303 .. code-block:: Python features[features.Den > 4].sort_values(by='From') .. raw:: html
Hole From Len Den Vp Vs Mag Ip Res Form
1 TC-1319-005 271.66 11.8 4.318 5514.0 2222.0 0.108 26.06 4600.0 CALP
3 TC-1319-005 282.55 11.2 4.342 5773.0 2011.0 0.151 28.13 5800.0 CALP
6 TC-1319-005 301.50 11.6 4.251 5686.0 2755.0 0.132 14.77 6000.0 CALP
7 TC-1319-005 313.65 11.4 4.193 5588.0 2039.0 0.068 20.37 5400.0 CALP
11 TC-1319-005 334.65 11.3 4.079 5678.0 2198.0 0.036 34.51 2000.0 CALP
12 TC-1319-005 340.00 11.9 4.444 5561.0 2581.0 0.024 14.92 4800.0 CALP
13 TC-1319-005 344.77 11.2 4.123 5359.0 NaN 0.041 45.93 1000.0 CALP
24 TC-1319-005 400.10 13.9 5.074 5367.0 2422.0 0.066 NaN NaN CALP


.. GENERATED FROM PYTHON SOURCE LINES 304-305 Observations related to density .. GENERATED FROM PYTHON SOURCE LINES 305-307 .. code-block:: Python df[df.Den > 4].Obs .. rst-class:: sphx-glr-script-out .. code-block:: none 1 NaN 3 NaN 6 NaN 7 NaN 11 NaN 12 NaN 13 Vs signal to low. 24 Sample too long to acquire IP measurements. Name: Obs, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 308-313 .. code-block:: Python den_anomalous_index = list(features[features.Den > 4].index) print('Hole Vp: Observation:') for n in den_anomalous_index: print(df.Hole[n], ' ', df.Den[n], ' ', df.Obs[n]) .. rst-class:: sphx-glr-script-out .. code-block:: none Hole Vp: Observation: TC-1319-005 4.318 nan TC-1319-005 4.342 nan TC-1319-005 4.251 nan TC-1319-005 4.193 nan TC-1319-005 4.079 nan TC-1319-005 4.444 nan TC-1319-005 4.123 Vs signal to low. TC-1319-005 5.074 Sample too long to acquire IP measurements. .. GENERATED FROM PYTHON SOURCE LINES 314-319 Anomalies in Hole TC-1319-005 ----------------------------- Apart from the fact that the anomalies are only in the TC-1319-005, there is almost no information related to the density anomalies. The stratigraphic logging mentioned abundant pyrite in hole TC-1319-005, around 1046 m, but this is far from the location of the anomalies in the petrophysical data (271.7 - 400.1 m). .. GENERATED FROM PYTHON SOURCE LINES 321-335 Compressional Velocity (Vp) --------------------------- The distribution of the compressional velocity has a wide tail to the left (negative skew), with anomalous values below -3 standard deviations. The observations on these anomalous samples point toward open fractures on the core as responsible for the low Vp values. Below are two reference values of Vp, at 20ºC (source: https://www.engineeringtoolbox.com/sound-speed-water-d_598.html): +----------------+-------------+ | Medium | Vp (m/s) | +================+=============+ | Fresh Water | 1480 | +----------------+-------------+ | Air | 343 | +----------------+-------------+ .. GENERATED FROM PYTHON SOURCE LINES 337-338 Vp statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 338-339 .. code-block:: Python basic_stat.parameters(features.Vp, 'Vp') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Vp NaN: 19 NaN(%): 5.775 Lenght: 329 Min: 992.0 Max: 6478.0 Range: 5486.0 Mean: 5441.990 Median: 5575.0 Mode: ModeResult(mode=np.float64(5575.0), count=np.int64(8)) Mean < median: left tail Skew: -2.427 Fisher's kurtosis: 9.138 Pearson's kurtosis: 12.138 Variance: 491202.042 Standart deviation: 700.858 Mean - 3std: 3339.416 Mean + 3std: 7544.565 .. GENERATED FROM PYTHON SOURCE LINES 340-341 Vp plots .. GENERATED FROM PYTHON SOURCE LINES 341-343 .. code-block:: Python basic_stat.plots(features.Vp, 'Vp') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_007.png :alt: Vp Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_007.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 344-345 5 obsevations with anomalous values in the Vp .. GENERATED FROM PYTHON SOURCE LINES 345-347 .. code-block:: Python features[features.Vp < 3000] .. raw:: html
Hole From Len Den Vp Vs Mag Ip Res Form
0 TC-1319-005 265.77 11.2 2.610 2179.0 NaN 0.108 NaN NaN CALP
77 TC-1319-005 645.30 11.0 2.725 992.0 NaN 0.308 98.11 293.3 LABL
79 TC-1319-005 653.95 12.0 2.936 2321.0 1929.0 0.342 88.45 443.4 LABL
159 TC-1319-005 1046.35 10.6 2.455 2325.0 NaN 0.158 44.44 1300.0 LIS
174 TC-3660-008 64.54 9.4 2.677 2802.0 0.0 0.038 35.19 2500.0 CALP


.. GENERATED FROM PYTHON SOURCE LINES 348-349 Observations related to Vp .. GENERATED FROM PYTHON SOURCE LINES 349-355 .. code-block:: Python vp_anomalous_index = list(features[features.Vp < 3000].index) print('Hole Vp: Observation:') for n in vp_anomalous_index: print(df.Hole[n], ' ', df.Vp[n], ' ', df.Obs[n]) .. rst-class:: sphx-glr-script-out .. code-block:: none Hole Vp: Observation: TC-1319-005 2179.0 Broken sample parallel to z-axis prevented to take IP and Vs measurements. TC-1319-005 992.0 Broken sample parallel to z-axis prevented to take Vp and Vs measurements. TC-1319-005 2321.0 nan TC-1319-005 2325.0 The sample had an unflat base. Vs signal to low. TC-3660-008 2802.0 Sample had a broken disk. Full length 10,4cm .. GENERATED FROM PYTHON SOURCE LINES 356-357 The mayority of Vp anomalies are concentrate in the hole TC-1319-005 and they appear to be related with the geometry and fractures in the core sample. The stratigraphic logging has not direct mentions of the low Vp. .. GENERATED FROM PYTHON SOURCE LINES 359-375 Shear Velocity (Vs) ------------------- The distribution of the shear velocity has an irregular tail to the left (negative skew), with anomalous values below -3 standard deviations. As well as in the case of Vp, the observations on these anomalous values point toward open fractures on the core as responsible for the low Vs. Values of zero are not admissible in the case of solid samples, especially considering the densities in these anomalies (mean of 2.72). To improve subsequent models, these anomalous values of Vs should be replaced by NaN and then imputed (replaced by a logical value). Below are two reference values of Vs (source: Reynolds, J (2011) An Introduction to Applied and Environmental Geophysics): +------------------------+-------------+ | Medium | Vs (m/s) | +========================+=============+ | Unconsolidated sands | 65-105 | +------------------------+-------------+ | Plastic clays | 80-130 | +------------------------+-------------+ .. GENERATED FROM PYTHON SOURCE LINES 377-378 Vs statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 378-381 .. code-block:: Python basic_stat.parameters(features.Vs, 'Vs') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Vs NaN: 40 NaN(%): 12.158 Lenght: 329 Min: 0.0 Max: 3740.0 Range: nan Mean: 2772.938 Median: 3041.0 Mode: ModeResult(mode=np.float64(3333.0), count=np.int64(6)) Mean < median: left tail Skew: -1.540 Fisher's kurtosis: 3.346 Pearson's kurtosis: 6.346 Variance: 426991.843 Standart deviation: 653.446 Mean - 3std: 812.599 Mean + 3std: 4733.276 .. GENERATED FROM PYTHON SOURCE LINES 382-383 Vp statistical parameterss plots .. GENERATED FROM PYTHON SOURCE LINES 383-385 .. code-block:: Python basic_stat.plots(features.Vs, 'Vs') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_008.png :alt: Vs Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_008.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 386-387 5 obsevations with anomalous values of Vs .. GENERATED FROM PYTHON SOURCE LINES 387-389 .. code-block:: Python features[features.Vs < 1000].sort_values(by='From') .. raw:: html
Hole From Len Den Vp Vs Mag Ip Res Form
174 TC-3660-008 64.54 9.4 2.677 2802.0 0.0 0.038 35.19 2500.0 CALP
198 TC-3660-008 188.03 9.4 2.679 5345.0 0.0 0.098 NaN NaN CALP
243 TC-3660-008 432.95 9.7 2.779 4641.0 178.0 NaN 69.69 515.7 FLT
248 TC-3660-008 452.30 10.4 2.686 5652.0 0.0 0.053 27.54 4000.0 FLT
312 TC-3660-008 774.21 9.6 2.787 6173.0 0.0 0.123 73.08 373.2 SP


.. GENERATED FROM PYTHON SOURCE LINES 390-391 Average density of the anomalous values of Vs .. GENERATED FROM PYTHON SOURCE LINES 391-393 .. code-block:: Python features[features.Vs < 1000].Den.mean() .. rst-class:: sphx-glr-script-out .. code-block:: none np.float64(2.7216) .. GENERATED FROM PYTHON SOURCE LINES 394-395 Observations related to Vs .. GENERATED FROM PYTHON SOURCE LINES 395-402 .. code-block:: Python vs_anomalous_index = list(features[features.Vs < 1000].index) print('Hole Vs: Observation:') for n in vs_anomalous_index: print(df.Hole[n], ' ', df.Vs[n], ' ', df.Obs[n]) .. rst-class:: sphx-glr-script-out .. code-block:: none Hole Vs: Observation: TC-3660-008 0.0 Sample had a broken disk. Full length 10,4cm TC-3660-008 0.0 Sample had unperpendicular bases respect from the z-axis that prevented to take IP measurements. TC-3660-008 178.0 Chipped base TC-3660-008 0.0 Chipped base TC-3660-008 0.0 nan .. GENERATED FROM PYTHON SOURCE LINES 403-416 Observations on Anomalies ------------------------- *Cllst_logging_details.xls* has no mentions related to the zero Vs values. Other observations are: +-------+-------------+---------------------+--------------------------------------------+ | Index | Hole | Reference Depth (m) | Explanation | +=======+=============+=====================+============================================+ | 175 | TC-3660-008 | 64 | - | +-------+-------------+---------------------+--------------------------------------------+ | 198 & 311 | TC-3660-008 | 188 & 774 | Diagenetic pyrite & pyrite in mudstones | +-------+-------------+---------------------+--------------------------------------------+ | 210 & 248 | TC-3660-008 | 432 & 452 | Fault zone | +-------+-------------+---------------------+--------------------------------------------+ .. GENERATED FROM PYTHON SOURCE LINES 418-419 Once again, the majority of Vs anomalies are concentrated in the hole TC-1319-005 and they also appear to be related to fractures or the geometry of the core sample. The stratigraphic logging has no direct mentions of the low Vs. .. GENERATED FROM PYTHON SOURCE LINES 421-450 Magnetic Susceptibility (Mag) ----------------------------- The magnetic susceptibility has an extremely anomalous value (93.89), which is 218 times higher than the average of the rest of the values (0.43). This outlier significantly skews the statistical parameters and visualizations, as it lies beyond 10 standard deviations above the mean. Removing this value reveals that the distribution has a tail to the right (positive skew), with some values exceeding 3 standard deviations. Regarding the anomalous value, it exceeds the maximum value of 2 delivered by the KT-20 detector device, with a 10 kHz single-frequency circular sensor (https://terraplus.ca/wp-content/uploads/terraplus-Brochures-English/KT-20-Physical-Property-Measuring-System.pdf). Unfortunately, there are no specific observations that explain the reason for this extreme value. Below are reference values of magnetic susceptibility for different rocks (source: https://www.eoas.ubc.ca/courses/eosc350/content/foundations/properties/magsuscept.htm#:~:text=In%20rocks%2C%20susceptibility%20is%20mainly,trace%20amounts%20in%20most%20sediments.): +----------------+--------------------------------------+ | Rock | Magnetic Susceptibility x 10^-3 (SI) | +================+======================================+ | Limestones | 0 - 3 | +----------------+--------------------------------------+ | Sandstones | 0 - 20 | +----------------+--------------------------------------+ | Shales | 0.01 - 15 | +----------------+--------------------------------------+ | Gneiss | 0.1 - 25 | +----------------+--------------------------------------+ | Granite | 0 - 50 | +----------------+--------------------------------------+ | Basalt | 0.2 - 175 | +----------------+--------------------------------------+ | Magnetite | 1200 - 19200 | +----------------+--------------------------------------+ .. GENERATED FROM PYTHON SOURCE LINES 452-453 Mag statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 453-455 .. code-block:: Python basic_stat.parameters(features.Mag, 'Mag') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Mag NaN: 3 NaN(%): 0.912 Lenght: 329 Min: 0.001 Max: 93.886 Range: 93.88499999999999 Mean: 0.431 Median: 0.1145 Mode: ModeResult(mode=np.float64(0.063), count=np.int64(7)) Mean < median: right tail Skew: 17.962 Fisher's kurtosis: 320.760 Pearson's kurtosis: 323.760 Variance: 26.966 Standart deviation: 5.193 Mean - 3std: -15.148 Mean + 3std: 16.009 .. GENERATED FROM PYTHON SOURCE LINES 456-457 Mag plots .. GENERATED FROM PYTHON SOURCE LINES 457-459 .. code-block:: Python basic_stat.plots(features.Mag, 'Mag') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_009.png :alt: Mag Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_009.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 460-461 The maximum is 200 times bigger than the mean value .. GENERATED FROM PYTHON SOURCE LINES 461-464 .. code-block:: Python print('Index of the maximun:', features.Mag.idxmax()) print('Maximun value:', features.Mag[305]) .. rst-class:: sphx-glr-script-out .. code-block:: none Index of the maximun: 305 Maximun value: 93.886 .. GENERATED FROM PYTHON SOURCE LINES 465-466 .. code-block:: Python features.loc[305] .. rst-class:: sphx-glr-script-out .. code-block:: none Hole TC-3660-008 From 734.8 Len 9.5 Den 2.708 Vp 5026.0 Vs 2914.0 Mag 93.886 Ip 71.63 Res 1200.0 Form SP Name: 305, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 467-468 Mag < 10 statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 468-470 .. code-block:: Python basic_stat.parameters(features.Mag[features.Mag < 10], 'Mag') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Mag NaN: 0 NaN(%): 0.000 Lenght: 325 Min: 0.001 Max: 0.545 Range: 0.544 Mean: 0.143 Median: 0.114 Mode: ModeResult(mode=np.float64(0.063), count=np.int64(7)) Mean < median: right tail Skew: 1.050 Fisher's kurtosis: 0.638 Pearson's kurtosis: 3.638 Variance: 0.010 Standart deviation: 0.101 Mean - 3std: -0.159 Mean + 3std: 0.445 .. GENERATED FROM PYTHON SOURCE LINES 471-472 Mag < 10 plots .. GENERATED FROM PYTHON SOURCE LINES 472-474 .. code-block:: Python basic_stat.plots(features.Mag[features.Mag < 10], 'Mag') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_010.png :alt: Mag Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_010.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 475-476 Mag anomalous value .. GENERATED FROM PYTHON SOURCE LINES 476-478 .. code-block:: Python df[df.Mag > 10] .. raw:: html
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
305 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC19095 734.8 734.9 9.5 2.708 5026.0 2914.0 93.886 71.63 1200.0 NaN SP


.. GENERATED FROM PYTHON SOURCE LINES 479-480 The stratigraphic column mentions "pyrite in mudstones" in the interval corresponding to this sample but nothing related to tha Mag. .. GENERATED FROM PYTHON SOURCE LINES 482-487 Induced Polarization (Ip) ^^^^^^^^^^^^^^^^^^^^^^^^^ Induced polarization or chargeability is the "capacity of a material to retain charges after a forcing current is removed." It "depends upon many factors, including mineral type, grain size, the ratio of internal surface area to volume, the properties of electrolytes in pore space, and the physics of interaction between surfaces and fluids. Interpretation of chargeability models is further complicated by the fact that there is no standard set of units for this physical property."(source: https://gpg.geosci.xyz/content/induced_polarization/induced_polarization_introduction.html). Ip is similar to conductivity (the inverse of resistivity) but not exactly the same, while the first is related with the capacitance of the material (retain electrical charges localized), the second is the ability of a material to allow the flow of electricity. In our case, the measures of Ip have a tail to the right (positive skew), with only a value (171.73) above 3 standard deviations. There not observations associated with this value. .. GENERATED FROM PYTHON SOURCE LINES 489-490 Ip statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 490-492 .. code-block:: Python basic_stat.parameters(features.Ip, 'Ip') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Ip NaN: 26 NaN(%): 7.903 Lenght: 329 Min: 5.39 Max: 171.73 Range: nan Mean: 49.739 Median: 47.04 Mode: ModeResult(mode=np.float64(7.27), count=np.int64(2)) Mean < median: right tail Skew: 0.648 Fisher's kurtosis: 0.964 Pearson's kurtosis: 3.964 Variance: 632.281 Standart deviation: 25.145 Mean - 3std: -25.697 Mean + 3std: 125.174 .. GENERATED FROM PYTHON SOURCE LINES 493-494 Ip plots .. GENERATED FROM PYTHON SOURCE LINES 494-496 .. code-block:: Python basic_stat.plots(features.Ip, 'Ip') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_011.png :alt: Ip Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_011.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 497-498 Ip anomalous value .. GENERATED FROM PYTHON SOURCE LINES 498-501 .. code-block:: Python df[df.Ip > 150] .. raw:: html
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
256 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC19046 490.7 490.8 10.5 2.723 5412.0 3292.0 0.066 171.73 808.6 NaN S


.. GENERATED FROM PYTHON SOURCE LINES 502-503 No information about this anomaly in the petrophysical or in the stratigraphic data. .. GENERATED FROM PYTHON SOURCE LINES 505-508 Resistivity (Res) ^^^^^^^^^^^^^^^^^ Although resistivity is a specific measure of pure materials, this property, like Ip, depends in the case of rocks on the constituent minerals and the fluids in their pore space. The measurements in our case show a typical pattern with tail to the right (positive skew) and with several values above 3 standard deviation. There are no observations associated with these high values. .. GENERATED FROM PYTHON SOURCE LINES 510-511 Res statistical parameters .. GENERATED FROM PYTHON SOURCE LINES 511-513 .. code-block:: Python basic_stat.parameters(df.Res, 'Res') .. rst-class:: sphx-glr-script-out .. code-block:: none --------------------------------------- Main statistical parameters of Res NaN: 26 NaN(%): 7.903 Lenght: 329 Min: 156.9 Max: 23900.0 Range: nan Mean: 3283.762 Median: 2100.0 Mode: ModeResult(mode=np.float64(1500.0), count=np.int64(12)) Mean < median: right tail Skew: 2.729 Fisher's kurtosis: 8.783 Pearson's kurtosis: 11.783 Variance: 13800118.663 Standart deviation: 3714.851 Mean - 3std: -7860.791 Mean + 3std: 14428.316 .. GENERATED FROM PYTHON SOURCE LINES 514-515 Res plots .. GENERATED FROM PYTHON SOURCE LINES 515-517 .. code-block:: Python basic_stat.plots(df.Res, 'Res') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_012.png :alt: Res Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_012.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 518-520 .. code-block:: Python df[df.Res > 14000].sort_values(by='From') .. raw:: html
Hole X Y RL Azi Dip Sample From To Len Den Vp Vs Mag Ip Res Obs Form
161 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18951 9.60 9.70 9.9 2.690 6226.0 3426.0 0.545 12.19 18200.0 NaN CALP
162 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18952 12.86 12.95 9.5 2.702 6376.0 3467.0 0.435 11.26 23000.0 NaN CALP
173 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18963 59.80 59.90 9.2 2.705 5974.0 3321.0 0.020 7.27 20900.0 NaN CALP
176 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18966 75.57 75.67 10.4 2.708 6341.0 3478.0 0.051 9.42 23900.0 NaN CALP
179 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18969 88.90 89.00 8.8 2.696 6331.0 3712.0 0.116 7.50 16600.0 NaN CALP
190 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18980 146.20 146.30 9.5 2.680 6376.0 3156.0 0.108 8.08 15900.0 NaN CALP
199 TC-3660-008 264102.2 271860.1 96.4 37 -85 TC18989 193.40 193.50 9.9 2.707 6429.0 3333.0 0.154 5.39 14800.0 NaN CALP
4 TC-1319-005 252715.0 271403.0 70.0 30 -85 TC18393 287.20 287.31 10.8 3.293 6034.0 2122.0 0.176 6.88 15500.0 NaN CALP


.. GENERATED FROM PYTHON SOURCE LINES 521-524 Stratigraphic Data Observations ----------------------_-------- Only the stratigraphic data has observations of pyrite in hole TC-1319-005, but at shallower depths, around 146 & 193 m. .. GENERATED FROM PYTHON SOURCE LINES 526-529 Features Correlation -------------------- To appreciate the relation between the features, the anomaly of the Mag was excluded. No other elimination or imputation (fill with numbers the NaNs) was performed, therefore the quality of the correlations may change after the imputation. .. GENERATED FROM PYTHON SOURCE LINES 529-535 .. code-block:: Python # Pair Plot of All Features Against Each Other mat = features[features.Mag < 10].select_dtypes(include=['number']) sns.pairplot(mat) plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_013.png :alt: eda integration delivery 1 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_013.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 536-537 Correlation Matrix .. GENERATED FROM PYTHON SOURCE LINES 537-540 .. code-block:: Python correl_mat = mat.corr() correl_mat .. raw:: html
From Len Den Vp Vs Mag Ip Res
From 1.000000 -0.058138 -0.210686 -0.288455 -0.200839 0.264689 0.424983 -0.531734
Len -0.058138 1.000000 0.462965 -0.116166 -0.204523 -0.093206 -0.012119 -0.072091
Den -0.210686 0.462965 1.000000 0.046488 -0.153618 -0.102067 -0.157620 0.089023
Vp -0.288455 -0.116166 0.046488 1.000000 0.349661 -0.032834 -0.269868 0.393633
Vs -0.200839 -0.204523 -0.153618 0.349661 1.000000 0.003852 -0.088828 0.267800
Mag 0.264689 -0.093206 -0.102067 -0.032834 0.003852 1.000000 0.260463 -0.042086
Ip 0.424983 -0.012119 -0.157620 -0.269868 -0.088828 0.260463 1.000000 -0.656705
Res -0.531734 -0.072091 0.089023 0.393633 0.267800 -0.042086 -0.656705 1.000000


.. GENERATED FROM PYTHON SOURCE LINES 541-542 Correlation Heatmap .. GENERATED FROM PYTHON SOURCE LINES 542-545 .. code-block:: Python sns.heatmap(correl_mat, vmin=-1, vmax=1, center=0, linewidths=.1, cmap='seismic_r', annot=True) plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_014.png :alt: eda integration delivery 1 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_014.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 546-549 Additional Data Insights ------------------------ The main role of the additional datasets has been to help us understand the real nature of some anomalies observed in the petrophysical dataset, such as the density anomalies present in the upper half of the TC-1319-005 keyhole. .. GENERATED FROM PYTHON SOURCE LINES 549-611 .. code-block:: Python # Triple Plot by Major Formation # Fix colors for the formations import plotly.express as px import plotly.io as pio colors = px.colors.qualitative.Plotly # Define color mapping for each formation color_map = {form: colors[i % len(colors)] for i, form in enumerate(df['Form'].unique())} # Set plot renderer and headless environment setup pio.renderers.default = 'png' pio.orca.config.use_xvfb = True # 1x3 subplots fig = make_subplots( rows=1, cols=3, subplot_titles=('Vp vs Den', 'Vp vs Vs', 'Vp/Vs vs Den') ) # Plotting by formations rat = df['Vp'] / df['Vs'] for form in df['Form'].unique(): df_form = df[df['Form'] == form] rat_form = rat[df['Form'] == form] trace1 = go.Scatter( x=df_form['Vp'], y=df_form['Den'], mode='markers', name=form, legendgroup=form, marker=dict(color=color_map[form]), hovertext=df_form['Hole'], hoverinfo='x+y+text' ) trace2 = go.Scatter( x=df_form['Vp'], y=df_form['Vs'], mode='markers', name=form, legendgroup=form, showlegend=False, marker=dict(color=color_map[form]), hovertext=df_form['Hole'], hoverinfo='x+y+text' ) trace3 = go.Scatter( x=rat_form, y=df_form['Den'], mode='markers', name=form, legendgroup=form, showlegend=False, marker=dict(color=color_map[form]), hovertext=df_form['Hole'], hoverinfo='x+y+text' ) fig.add_trace(trace1, row=1, col=1) fig.add_trace(trace2, row=1, col=2) fig.add_trace(trace3, row=1, col=3) # Adjusting title and figure size fig.update_layout( height=800, # Figure height width=1600, # Figure width title_text="Plots by Major Formations", showlegend=True, legend=dict(x=1.02, y=1) # Adjust legend position ) .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 612-615 Legacy Data Analysis -------------------- Legacy data played a crucial role in evaluating the density anomalies present in the keyhole TC-1319-005. .. GENERATED FROM PYTHON SOURCE LINES 615-622 .. code-block:: Python # Whole Rock Lithogeochemistry Data Load whole = pd.read_csv( filepath_or_buffer=f'{base_path}/collinstown_wholerock_lithogeochemistry.csv', sep=';' ) .. GENERATED FROM PYTHON SOURCE LINES 623-624 Renaming Columns and Filtering Data .. GENERATED FROM PYTHON SOURCE LINES 624-638 .. code-block:: Python whole = whole.rename(columns={'HOLEID' : 'Hole', 'SAMPFROM': 'From', 'Zn_ppm_BEST': 'Zn', 'Pb_ppm_BEST': 'Pb', 'Ba_ppm_BEST': 'Ba', 'Fe_pct_BEST': 'Fe_pct', 'S_pct_BEST' : 'S_pct'}) # Convert % to ppm for Fe and S whole['Fe'] = whole['Fe_pct'] * 10000 whole['S'] = whole['S_pct'] * 10000 # Drop unnecessary columns whole = whole.drop(['Fe_pct', 'S_pct'], axis=1) # Sort filtered data by "From" whole5 = whole[whole.Hole == 'TC-1319-005'].sort_values('From') .. GENERATED FROM PYTHON SOURCE LINES 639-640 Whole Rock Lithogeochemistry Plots .. GENERATED FROM PYTHON SOURCE LINES 640-660 .. code-block:: Python plt.figure(figsize=(15, 8)) # Plotting key elements elem_list = ['Zn', 'Pb', 'Fe', 'Ba', 'S'] for element in elem_list: plt.subplot(1, 5, (elem_list.index(element) + 1)) plt.plot(whole5[element], whole5.From) plt.axvline(x=np.mean(whole5[element]), ls='--', c='r', label='mean') plt.axhspan(272, 400, color='r', alpha=0.2, label='AIPD') plt.ylim(1100, -20) if elem_list.index(element) == 0: plt.ylabel('Depth (m)') plt.legend() plt.xlabel(f'{element} (ppm)') plt.grid() plt.suptitle('Hole TC-1319-005 Legacy Whole Rock Lithogeochemistry\n AIPD: Anomalous Interval in the Petrophysical Data', y=1) plt.tight_layout() plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_015.png :alt: Hole TC-1319-005 Legacy Whole Rock Lithogeochemistry AIPD: Anomalous Interval in the Petrophysical Data :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_015.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 661-662 Legacy XRF data load .. GENERATED FROM PYTHON SOURCE LINES 662-665 .. code-block:: Python xrf = pd.read_csv(f'{base_path}/collinstown_pXRF.csv') xrf.head() .. raw:: html
SAMPLEID HOLEID PROJECTCODE SAMPFROM SAMPTO SAMPLETYPE FormationMaj_dr FormationMin_dr LithologyMaj_dr LithologyMin_dr pXRF_Date pXRF_ElapsedTime1 pXRF_ElapsedTime2 pXRF_ElapsedTime3 pXRF_ElapsedTotalTim pXRF_InstrumentSN pXRF_LiveTime1 pXRF_LiveTime2 pXRF_LiveTime3 pXRF_LiveTotalTime pXRF_Sampler Reading X_Drillhole_dr Y_Drillhole_dr Z_Drillhole_dr Ag_pXRF_pct Al_pXRF_pct As_pXRF_pct Bi_pXRF_pct Ca_pXRF_pct Cd_pXRF_pct Co_pXRF_pct Cr_pXRF_pct Cu_pXRF_pct Fe_pXRF_pct Hg_pXRF_pct K_pXRF_pct LE_pXRF_pct Mg_pXRF_pct Mn_pXRF_pct Mo_pXRF_pct Nb_pXRF_pct Ni_pXRF_pct P_pXRF_pct Pb_pXRF_pct Rb_pXRF_pct S_pXRF_pct Sb_pXRF_pct Se_pXRF_pct Si_pXRF_pct Sn_pXRF_pct Sr_pXRF_pct Th_pXRF_pct Ti_pXRF_pct U_pXRF_pct V_pXRF_pct W_pXRF_pct Y_pXRF_pct Zn_pXRF_pct Zr_pXRF_pct PL_dr Sample_Dolomite1_Percent_dr SampleComments_Big Texture1_dr Texture2_dr Texture3_dr
0 CR-1_767mEOH_pXRF_0006 CR-1_767mEOH Collinstown 5.99 6.01 pXRF CALP BU P M 42849 39.01 29.41 NaN 68.42 512468 31.81 15.93 NaN 47.74 NaN NaN 263187.0 272286.0 106.0 0 0.28 0.0003 0.0 33.82 0.0 0 0.0000 0.0010 0.2254 0.0 0.0000 55.24 0.00 0.0105 0.0000 0.0000 0.0000 0.0 0.0007 0.0002 0.0383 0.0 0.0002 10.1076 0.0 0.0527 0.0000 0.2010 0.0 0.0169 0.000 0.0004 0.0029 0.0027 3660 NaN NaN aw bu bi
1 CR-1_767mEOH_pXRF_0009 CR-1_767mEOH Collinstown 8.99 9.01 pXRF CALP BU P M 42849 38.95 29.52 NaN 68.47 512468 31.50 17.93 NaN 49.43 NaN NaN 263187.0 272286.0 106.0 0 0.98 0.0005 0.0 23.83 0.0 0 0.0050 0.0016 1.5387 0.0 0.1115 52.61 1.66 0.0145 0.0000 0.0006 0.0028 0.0 0.0006 0.0019 4.3153 0.0 0.0002 14.7500 0.0 0.0410 0.0009 0.1183 0.0 0.0157 0.000 0.0019 0.0034 0.0025 3660 NaN NaN aw bu bi
2 CR-1_767mEOH_pXRF_0012 CR-1_767mEOH Collinstown 11.99 12.01 pXRF CALP BU P M 42849 39.08 29.27 NaN 68.35 512468 32.01 13.86 NaN 45.86 NaN NaN 263187.0 272286.0 106.0 0 0.00 0.0000 0.0 40.88 0.0 0 0.0051 0.0010 0.0458 0.0 0.0000 55.92 0.00 0.0093 0.0000 0.0000 0.0000 0.0 0.0006 0.0000 0.8947 0.0 0.0003 2.1103 0.0 0.0553 0.0011 0.0624 0.0 0.0117 0.001 0.0004 0.0009 0.0010 3660 NaN NaN aw cht NaN
3 CR-1_767mEOH_pXRF_0015 CR-1_767mEOH Collinstown 14.99 15.01 pXRF CALP BU P M 42849 39.09 29.37 NaN 68.46 512468 32.19 15.37 NaN 47.56 NaN NaN 263187.0 272286.0 106.0 0 0.00 0.0004 0.0 37.94 0.0 0 0.0000 0.0000 0.1165 0.0 0.0000 57.55 0.00 0.0116 0.0005 0.0000 0.0017 0.0 0.0012 0.0000 0.0209 0.0 0.0000 4.1532 0.0 0.0914 0.0023 0.1063 0.0 0.0000 0.000 0.0005 0.0016 0.0016 3660 NaN NaN aw bu bi
4 CR-1_767mEOH_pXRF_0018 CR-1_767mEOH Collinstown 17.99 18.01 pXRF CALP BU P NaN 42849 39.11 29.44 NaN 68.55 512468 32.40 16.54 NaN 48.95 NaN NaN 263187.0 272286.0 106.0 0 0.23 0.0000 0.0 35.63 0.0 0 0.0000 0.0009 0.1996 0.0 0.0000 55.31 0.00 0.0130 0.0000 0.0000 0.0000 0.0 0.0007 0.0002 0.0000 0.0 0.0002 8.4221 0.0 0.0864 0.0013 0.0886 0.0 0.0131 0.000 0.0003 0.0022 0.0019 3660 NaN NaN cht aw bu


.. GENERATED FROM PYTHON SOURCE LINES 666-667 xrf columns .. GENERATED FROM PYTHON SOURCE LINES 667-669 .. code-block:: Python xrf.columns .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['SAMPLEID', 'HOLEID', 'PROJECTCODE', 'SAMPFROM', 'SAMPTO', 'SAMPLETYPE', 'FormationMaj_dr', 'FormationMin_dr', 'LithologyMaj_dr', 'LithologyMin_dr', 'pXRF_Date', 'pXRF_ElapsedTime1', 'pXRF_ElapsedTime2', 'pXRF_ElapsedTime3', 'pXRF_ElapsedTotalTim', 'pXRF_InstrumentSN', 'pXRF_LiveTime1', 'pXRF_LiveTime2', 'pXRF_LiveTime3', 'pXRF_LiveTotalTime', 'pXRF_Sampler', 'Reading', 'X_Drillhole_dr', 'Y_Drillhole_dr', 'Z_Drillhole_dr', 'Ag_pXRF_pct', 'Al_pXRF_pct', 'As_pXRF_pct', 'Bi_pXRF_pct', 'Ca_pXRF_pct', 'Cd_pXRF_pct', 'Co_pXRF_pct', 'Cr_pXRF_pct', 'Cu_pXRF_pct', 'Fe_pXRF_pct', 'Hg_pXRF_pct', 'K_pXRF_pct', 'LE_pXRF_pct', 'Mg_pXRF_pct', 'Mn_pXRF_pct', 'Mo_pXRF_pct', 'Nb_pXRF_pct', 'Ni_pXRF_pct', 'P_pXRF_pct', 'Pb_pXRF_pct', 'Rb_pXRF_pct', 'S_pXRF_pct', 'Sb_pXRF_pct', 'Se_pXRF_pct', 'Si_pXRF_pct', 'Sn_pXRF_pct', 'Sr_pXRF_pct', 'Th_pXRF_pct', 'Ti_pXRF_pct', 'U_pXRF_pct', 'V_pXRF_pct', 'W_pXRF_pct', 'Y_pXRF_pct', 'Zn_pXRF_pct', 'Zr_pXRF_pct', 'PL_dr', 'Sample_Dolomite1_Percent_dr', 'SampleComments_Big', 'Texture1_dr', 'Texture2_dr', 'Texture3_dr'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 670-671 Main columns in xrf df .. GENERATED FROM PYTHON SOURCE LINES 671-674 .. code-block:: Python xrf = xrf[['HOLEID', 'SAMPFROM', 'Fe_pXRF_pct', 'Zn_pXRF_pct', 'Pb_pXRF_pct', 'S_pXRF_pct']] xrf.head() .. raw:: html
HOLEID SAMPFROM Fe_pXRF_pct Zn_pXRF_pct Pb_pXRF_pct S_pXRF_pct
0 CR-1_767mEOH 5.99 0.2254 0.0029 0.0007 0.0383
1 CR-1_767mEOH 8.99 1.5387 0.0034 0.0006 4.3153
2 CR-1_767mEOH 11.99 0.0458 0.0009 0.0006 0.8947
3 CR-1_767mEOH 14.99 0.1165 0.0016 0.0012 0.0209
4 CR-1_767mEOH 17.99 0.1996 0.0022 0.0007 0.0000


.. GENERATED FROM PYTHON SOURCE LINES 675-676 Conversion of % to ppm .. GENERATED FROM PYTHON SOURCE LINES 676-680 .. code-block:: Python columns_to_multiply = ['Fe_pXRF_pct', 'Zn_pXRF_pct', 'Pb_pXRF_pct', 'S_pXRF_pct'] xrf[columns_to_multiply] = xrf[columns_to_multiply] * 10000 xrf.head() .. raw:: html
HOLEID SAMPFROM Fe_pXRF_pct Zn_pXRF_pct Pb_pXRF_pct S_pXRF_pct
0 CR-1_767mEOH 5.99 2254.0 29.0 7.0 383.0
1 CR-1_767mEOH 8.99 15387.0 34.0 6.0 43153.0
2 CR-1_767mEOH 11.99 458.0 9.0 6.0 8947.0
3 CR-1_767mEOH 14.99 1165.0 16.0 12.0 209.0
4 CR-1_767mEOH 17.99 1996.0 22.0 7.0 0.0


.. GENERATED FROM PYTHON SOURCE LINES 681-682 Rename the xrf columns .. GENERATED FROM PYTHON SOURCE LINES 682-685 .. code-block:: Python xrf = xrf.rename(columns={'HOLEID': 'Hole', 'SAMPFROM': 'From', 'Fe_pXRF_pct': 'Fe', 'Zn_pXRF_pct': 'Zn', 'Pb_pXRF_pct': 'Pb', 'S_pXRF_pct': 'S'}) xrf.head() .. raw:: html
Hole From Fe Zn Pb S
0 CR-1_767mEOH 5.99 2254.0 29.0 7.0 383.0
1 CR-1_767mEOH 8.99 15387.0 34.0 6.0 43153.0
2 CR-1_767mEOH 11.99 458.0 9.0 6.0 8947.0
3 CR-1_767mEOH 14.99 1165.0 16.0 12.0 209.0
4 CR-1_767mEOH 17.99 1996.0 22.0 7.0 0.0


.. GENERATED FROM PYTHON SOURCE LINES 686-687 Sort by From .. GENERATED FROM PYTHON SOURCE LINES 687-690 .. code-block:: Python xrf5 = xrf[xrf.Hole == 'TC-1319-005'].sort_values('From') xrf5.head() .. raw:: html
Hole From Fe Zn Pb S
601 TC-1319-005 3.99 9369.0 9.0 0.0 2435.0
602 TC-1319-005 6.99 5587.0 25.0 0.0 1339.0
603 TC-1319-005 9.99 41980.0 51.0 0.0 16457.0
604 TC-1319-005 12.99 24620.0 104.0 22.0 13974.0
605 TC-1319-005 15.99 30758.0 67.0 26.0 10547.0


.. GENERATED FROM PYTHON SOURCE LINES 691-692 Upper portion of xrf5 .. GENERATED FROM PYTHON SOURCE LINES 692-694 .. code-block:: Python xrf5[xrf5.From <= 400].describe() .. raw:: html
From Fe Zn Pb S
count 133.000000 133.000000 133.000000 133.000000 133.000000
mean 201.990000 12152.646617 79.812030 2.037594 4699.947368
std 115.613581 9226.369163 105.763899 4.904244 6196.147283
min 3.990000 1159.000000 7.000000 0.000000 0.000000
25% 102.990000 4603.000000 22.000000 0.000000 1425.000000
50% 201.990000 9701.000000 43.000000 0.000000 2668.000000
75% 300.990000 16337.000000 80.000000 0.000000 6037.000000
max 399.990000 47995.000000 594.000000 27.000000 52145.000000


.. GENERATED FROM PYTHON SOURCE LINES 695-696 Elements in xrf5 .. GENERATED FROM PYTHON SOURCE LINES 696-698 .. code-block:: Python elem_list2 = ['Zn', 'Pb', 'Fe', 'S'] .. GENERATED FROM PYTHON SOURCE LINES 699-700 XRF plots .. GENERATED FROM PYTHON SOURCE LINES 700-720 .. code-block:: Python plt.figure(figsize=(15, 8)) for element in elem_list2: plt.subplot(1, 4, (elem_list2.index(element) + 1)) plt.plot(xrf5[element], xrf5.From) plt.axvline(x=np.mean(xrf5[element]), ls='--', c='r', label='mean') # plt.scatter(xrf5[element][200], xrf5.From[200], label='DPAL', c='k', zorder=3) plt.axhspan(272, 400, color='r', alpha=0.2, label='AIPD') plt.ylim(1100, -20) if elem_list.index(element) == 0: plt.ylabel('Depth (m)') plt.legend() plt.xlabel(f'{element} (ppm)') plt.grid() plt.suptitle('Hole TC-1319-005 Legacy XRF\n AIPD: Anomalous Interval in the Petrophysical Data', y=1) plt.tight_layout() # plt.savefig('Output/xrf.png') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_016.png :alt: Hole TC-1319-005 Legacy XRF AIPD: Anomalous Interval in the Petrophysical Data :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_016.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 721-725 Magnetic Susceptibility ^^^^^^^^^^^^^^^^^^^^^^^ This section compares the magnetic susceptibility in the legacy and the latest petrophysical data. In the case of hole TC-1319-005 we can see that the magnetic susceptibility in both measurements have very similar medians. The magnetic susceptibilities, new and legacy, of hole TC-3660-008 have the same medians, but it has an anomaly in the petrophysical data that is above the measurement limit of the KT-20 device (2000 x $10^3$ SI), therefore this anomalous value is going to be deleted. Load magnetic susceptibility data .. GENERATED FROM PYTHON SOURCE LINES 725-727 .. code-block:: Python mag = pd.read_excel(f'{base_path}/collinstown_MagSUS.xlsx') .. GENERATED FROM PYTHON SOURCE LINES 728-730 Magnetic Susceptibility Analysis for Hole TC-1319-005 ------------------------------------------------------ .. GENERATED FROM PYTHON SOURCE LINES 730-760 .. code-block:: Python mag5 = mag[mag.HOLEID == 'TC-1319-005'].sort_values('SAMPFROM') petro5 = df[df.Hole == 'TC-1319-005'].sort_values('From') plt.figure(figsize=(10, 6)) plt.subplot(121) plt.plot(mag5['MS_MagSus'], mag5['SAMPFROM'], label='Legacy') plt.scatter(petro5['Mag'], petro5['From'], c='k', label='Petrophysic', s=8) plt.ylim(1100, 0) plt.legend(loc='lower right') plt.xlabel('Mag Sus') plt.ylabel('Depth (m)') plt.grid() plt.subplot(122) plt.plot(np.log10(mag5['MS_MagSus']), mag5['SAMPFROM'], label='Legacy') plt.axvline(x=np.median(np.log10(mag5['MS_MagSus'])), ls='--', label='Median Legacy') plt.scatter(np.log10(petro5['Mag']), petro5['From'], c='k', label='Petrophysic', s=8) plt.axvline(x=np.median(np.log10(petro5['Mag'])), ls='--', c='k', label='Median Petrophysic') plt.ylim(1100, 0) plt.legend(loc='lower right') plt.xlabel('Log(Mag Sus)') plt.ylabel('Depth (m)') plt.grid() plt.suptitle('TC-1319-005 Magnetic Susceptibility') plt.tight_layout() plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_017.png :alt: TC-1319-005 Magnetic Susceptibility :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_017.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none /home/leguark/.virtualenvs/gempy_dependencies/lib/python3.10/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: divide by zero encountered in log10 .. GENERATED FROM PYTHON SOURCE LINES 761-763 Magnetic Susceptibility Analysis for Hole TC-3660-008 ------------------------------------------------------ .. GENERATED FROM PYTHON SOURCE LINES 763-793 .. code-block:: Python mag8 = mag[mag.HOLEID == 'TC-3660-008'].sort_values('SAMPFROM') petro8 = df[df.Hole == 'TC-3660-008'].sort_values('From') plt.figure(figsize=(11, 7)) plt.subplot(121) plt.plot(mag8['MS_MagSus'], mag8['SAMPFROM'], label='Legacy') plt.scatter(petro8['Mag'], petro8['From'], c='k', label='Petrophysic', s=8) plt.ylim(900, 0) plt.xlabel('Mag Sus') plt.ylabel('Depth (m)') plt.legend(loc='lower right') plt.grid() plt.subplot(122) plt.plot(np.log10(mag8['MS_MagSus']), mag8['SAMPFROM'], label='Legacy') plt.axvline(x=np.median(np.log10(mag8['MS_MagSus'])), ls='--', label='Median Legacy') plt.scatter(np.log10(petro8['Mag']), petro8['From'], c='k', label='Petrophysic', s=8) plt.axvline(x=np.nanmedian(np.log10(petro8['Mag'])), ls='--', c='k', label='Median Petrophysic') plt.ylim(900, 0) plt.xlabel('Log(Mag Sus)') plt.ylabel('Depth (m)') plt.legend(loc='lower right') plt.grid() plt.suptitle('TC-3660-008 Magnetic Susceptibility') plt.tight_layout() plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_018.png :alt: TC-3660-008 Magnetic Susceptibility :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_018.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 794-796 Gamma Ray and Formation Analysis for Hole TC-3660-008 ------------------------------------------------------ .. GENERATED FROM PYTHON SOURCE LINES 796-846 .. code-block:: Python gr = pd.read_csv(f'{base_path}/collinstown_Gamma.csv') gr8 = gr[gr.HOLEID == 'TC-3660-008'].sort_values('DEPTH') # Depth of tops or markers for formations tops = pd.DataFrame(petro8.From[petro8['Form'].ne(petro8['Form'].shift())]) tops['Top'] = petro8.Form[petro8['Form'].ne(petro8['Form'].shift())] tops = tops.reset_index(drop=True) # Colors of the formations tops['color'] = pd.Series(['#CCDF9A', '#FF0000', '#CCDF9A', '#D9E9F0', '#FF0000', '#D9E9F0', '#FF0000', '#D9E9F0', '#FF0000', '#8CBAE2', '#8A6E9F', '#ABA16C', '#EBDE98', '#806000', '#2A7C43', '#FF0000']) # Add base of the deepest formation new_row = pd.DataFrame([{'From': 833.4, 'Top': '', 'color': '#FF0000'}]) tops = pd.concat([tops, new_row], ignore_index=True) tops = tops.rename(columns={'From': 'depth', 'Top': 'name'}) tops_list = geo.plot_tops(tops) gr8 = gr8.sort_values('DEPTH') # Plot of GR with Formations plt.figure(figsize=(10, 8)) plt.subplot(121) plt.plot(gr8.iloc[:,6], gr8.iloc[:,5], label='Data', c='lightblue') # plt.plot(gr8.iloc[:,6].rolling(window=250).mean(), gr8.iloc[:,5], label='Mean 400') # Butterworth filter b, a = butter(N=2, Wn=0.02, btype='low') filtered_data = filtfilt(b, a, gr8.iloc[:, 6]) plt.plot(filtered_data, gr8.iloc[:, 5], label='Butterworth Filtered') plt.grid() plt.xlabel('GR (API)') plt.xlabel('Depth (m)') plt.axis([0, 120, 850, 0]) plt.subplot(122) for i in range(0, len(tops_list)): plt.axhspan(tops_list[i]['top'], tops_list[i]['base'], color=tops_list[i]['color']) plt.text(122, tops_list[i]['top'], tops_list[i]['name'], fontsize=9, va='center') plt.axis([0, 120, 850, 0]) plt.xticks([]) plt.xlabel('Formations') plt.suptitle('GR and Formations in Hole TC-3660-008', y=0.93) plt.savefig('Output/gr_form.png') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_019.png :alt: GR and Formations in Hole TC-3660-008 :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_019.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 847-850 Passive Vs ---------- Parallel to the petrophysical measurements, a passive seismic survey was carried out between Collinstown and Kells. From this survey a Vs was obtained which can be compared with that recorded during the petrophysical measurements. Despite being measurements of a very different nature, the Vs of the key holes are in the range of values of the passive Vs. Refering to the trends, hole TC-1319-005, inside the survey, shows a contrary trend as the passive Vs. Paradoxically, Hole TC-3660-008, outside the survey, shows a trend that coincides with that of the passive Vs. .. GENERATED FROM PYTHON SOURCE LINES 852-853 Load passive Vs data for hole TC-3660-008 .. GENERATED FROM PYTHON SOURCE LINES 853-866 .. code-block:: Python pas = pd.read_csv( filepath_or_buffer=f'{base_path}/tc-3660-008_Vs.txt', sep=' ', skiprows=1, names=['depth_km', 'vs_km/s'] ) # Convert depth and velocity from km to m pas[['depth_km', 'vs_km/s']] = pas[['depth_km', 'vs_km/s']] * 1000 # Rename columns for easier access pas = pas.rename(columns={'depth_km': 'depth', 'vs_km/s': 'vs'}) .. GENERATED FROM PYTHON SOURCE LINES 867-869 Passive Vs Statistical Parameters --------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 869-871 .. code-block:: Python print(f"Basic statistics for Passive Vs:\n{pas['vs'].describe()}") .. rst-class:: sphx-glr-script-out .. code-block:: none Basic statistics for Passive Vs: count 20.000000 mean 3001.458200 std 219.007636 min 2595.905000 25% 2851.969250 50% 2937.861000 75% 3241.223000 max 3291.111000 Name: vs, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 872-874 Passive Vs Plots ---------------- .. GENERATED FROM PYTHON SOURCE LINES 874-877 .. code-block:: Python basic_stat.plots(pas.vs, 'Passive Vs') .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_020.png :alt: Passive Vs Statistical Plots :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_020.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 878-880 Vs Comparison Plot ------------------ .. GENERATED FROM PYTHON SOURCE LINES 880-919 .. code-block:: Python plt.figure(figsize=(5, 8)) # Vs at TC-1319-005 plt.plot(petro8.Vs[petro8.Vs >= 1000], petro8.From[petro8.Vs >= 1000], label='TC-1319-005', c='salmon') # Apply Butterworth filter to TC-1319-005 Vs b, a = butter(N=2, Wn=0.2, btype='low') vs_butter = filtfilt(b, a, petro8.Vs[petro8.Vs >= 1000]) plt.plot( vs_butter, petro8.From[petro8.Vs >= 1000], label='Butterworth Filter - TC-1319-005', c='crimson' ) # Vs at TC-3660-008 plt.plot(petro5.Vs[petro5.Vs >= 1000], petro5.From[petro5.Vs >= 1000], label='TC-3660-008', c='lightgreen') # Apply Butterworth filter to TC-3660-008 Vs vs_butter = filtfilt(b, a, petro5.Vs[petro5.Vs >= 1000]) plt.plot( vs_butter, petro5.From[petro5.Vs >= 1000], label='Butterworth Filter - TC-3660-008', c='green' ) # Passive seismic Vs plt.scatter(pas.vs, pas.depth, c='k', label='Passive Survey', s=8, zorder=2) plt.axis([1000, 4000, 2000, 0]) plt.ylabel('Depth (m)') plt.xlabel('Vs (m/s)') plt.title('Vs Comparison') plt.legend(loc='lower left') plt.grid() plt.savefig('Output/vs.png') plt.show() .. image-sg:: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_021.png :alt: Vs Comparison :srcset: /examples/05_Geo/images/sphx_glr_eda_integration_delivery_1_021.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 920-944 Observations on Anomalies ------------------------- The following observations summarize the findings for each feature in the petrophysical dataset. They are based on the findings made during the exploratory data analysis (EDA) and the integration with additional data, which was crucial in assessing the real nature of the anomalies in the petrophysical features. These observations establish the steps to follow, which involve the elimination of some anomalies, while others are maintained. +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Feature | Observations on anomalies | +=========+=====================================================================================================================================+ | Den | The presence of abnormal densities requires checking all available data before discarding such values. Additional data, including | | | stratigraphic columns, whole rock geochemistry, and XRF, indicate an absence of massive sulfides in the upper portion of TC-1319-005| +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Vp | Values below 3000 should be eliminated, as these are linked to fractures in the samples. | +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Vs | Values of 0 must be eliminated as they are not physically meaningful for solid samples. These values often coincide with anomalous | | | Vp values, indicating fractures. | +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Mag | The anomaly of 93.9 should be deleted, as it exceeds the KT-20 measurement range. | +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Ip | No strong reasons to discard outliers. | +---------+-------------------------------------------------------------------------------------------------------------------------------------+ | Res | No strong reasons to discard outliers. | +---------+-------------------------------------------------------------------------------------------------------------------------------------+ .. GENERATED FROM PYTHON SOURCE LINES 946-956 Next Steps for Data Cleaning ---------------------------- The next step is part of what is known as data mining, which involves cleaning the false anomalies and filling the gaps in the data. Specific steps will be: * Record the positions of the original NaNs. * Record the positions and values of the anomalies to be deleted. * Delete the anomalies. * Fill all NaNs, both original and those resulting from anomaly deletions, using different methods (linear, nonlinear models, imputations, ML models). * Compare the effectiveness of different gap-filling methods. * Use the best option to fill in gaps and deliver corrected petrophysical data for further investigation. .. GENERATED FROM PYTHON SOURCE LINES 958-964 References ---------- - https://www.engineeringtoolbox.com/sound-speed-water-d_598.html - https://www.eoas.ubc.ca/courses/eosc350/content/foundations/properties/magsuscept.htm#:~:text=In%20rocks%2C%20susceptibility%20is%20mainly,trace%20amounts%20in%20most%20sediments. - Reynolds, J. 2011. An Introduction to Applied and Environmental Geophysics. John Wiley & Sons. 2nd Edition. 696 p. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 9.015 seconds) .. _sphx_glr_download_examples_05_Geo_eda_integration_delivery_1.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: eda_integration_delivery_1.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: eda_integration_delivery_1.py ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_