6 changes: 6 additions & 0 deletions examples/data/borehole_canonical/assays.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
id,top,base,Cu,Au
well_01,0,10,0.5,0.1
well_01,10,20,1.2,0.3
well_01,20,30,0.8,0.2
well_02,0,15,0.2,0.05
well_02,15,40,1.5,0.4
3 changes: 3 additions & 0 deletions examples/data/borehole_canonical/collars.csv
@@ -0,0 +1,3 @@
id,x,y,z
well_01,100,200,50
well_02,150,250,60
5 changes: 5 additions & 0 deletions examples/data/borehole_canonical/lithology.csv
@@ -0,0 +1,5 @@
id,top,base,component lith
well_01,0,15,Sandstone
well_01,15,30,Shale
well_02,0,10,Gravel
well_02,10,30,Granite
9 changes: 9 additions & 0 deletions examples/data/borehole_canonical/survey.csv
@@ -0,0 +1,9 @@
id,md,inc,azi
well_01,0,180,0
well_01,10,180,0
well_01,20,170,10
well_01,30,160,20
well_02,0,180,0
well_02,10,180,0
well_02,20,175,0
well_02,30,170,0
329 changes: 329 additions & 0 deletions subsurface/core/geological_formats/boreholes/GUIDE.md

Large diffs are not rendered by default.

169 changes: 169 additions & 0 deletions subsurface/core/geological_formats/boreholes/WELLS_DOC.md
@@ -0,0 +1,169 @@
# 1. Overview of Borehole Data

When working with borehole (or well) data, there are typically three major components:

1. **Collars**
- **What it represents**: The collar is essentially the “top” of the borehole, including the surface coordinates where it starts.
    - **Typical data**: The borehole/well ID and the x, y, z coordinates of the collar.
2. **Survey**
- **What it represents**: The survey data describes the trajectory (deviation) of the borehole from the collar downward.
- **Typical data**: Measured depth (MD), azimuth, and dip data along the well trajectory.
3. **Attributes**
- **What it represents**: Interval-based or point-based properties measured along the borehole. This can include geochemistry, geophysical logs, lithologies, or any other properties that vary with depth.
- **Typical data**: The well ID (to match with collars), a *from* depth, a *to* depth, and the actual attribute values (e.g., lithology type, geochem assays, etc.).

### Why Three Separate Files?

The CSVs are commonly split so that each dataset (collars, survey, attributes) captures a different aspect of the borehole. Splitting the data makes it easier to manage and update individually. LiquidEarth merges these datasets internally by matching the well IDs (collar IDs) and by referencing the same depth ranges or measured depths in the survey and attributes.
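As an illustration of that merge, here is a minimal pandas sketch joining attribute intervals to collars on the shared well id (the actual internal logic in LiquidEarth may differ):

```python
import pandas as pd

# Minimal stand-ins for the three CSVs (same shapes as the example files).
collars = pd.DataFrame({"id": ["well_01"], "x": [100], "y": [200], "z": [50]})
survey = pd.DataFrame({"id": ["well_01"] * 2, "md": [0, 10], "inc": [180, 180], "azi": [0, 0]})
assays = pd.DataFrame({"id": ["well_01"], "top": [0], "base": [10], "Cu": [0.5]})

# Each dataset joins on the shared well id (collar ID).
merged = assays.merge(collars, on="id", how="left")
print(merged.columns.tolist())  # ['id', 'top', 'base', 'Cu', 'x', 'y', 'z']
```

Because all three files carry the same `id` column, any of them can be updated independently and re-merged without touching the others.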

### Special Note on Lithologies

If the attributes being read are lithologies, LiquidEarth may treat them differently—for instance, by using different internal logic or classification schemes to display them in 3D, or by requiring specific naming conventions for intervals. The import process is similar, but the resulting data might be displayed or analyzed in a different manner (e.g., color-coded strata intervals in a 3D model).

---

# 2. Arguments

This guide describes all the key arguments and parameters you’ll encounter when setting up borehole imports (Collars, Surveys, and Attributes) in LiquidEarth. It is split into two sections:

1. **High-Level Import Settings** (what you see directly in the import dialog).
2. **Detailed Reader Settings** (advanced options controlling how each CSV file is read via pandas).

## 2.1. High-Level Import Settings

These fields determine what data you’re loading and how LiquidEarth should handle it at a conceptual level.

1. **Content Name**
- **What it is**: A short label for your imported dataset (e.g., “Spring2025 Geochem”).
- **Why it matters**: Helps you identify the dataset in LiquidEarth once it’s loaded.
2. **Collar File**
- **What it is**: A selection among the files you have uploaded to your project that contains the collar (starting) positions of your wells/boreholes.
- **Typical columns**:
- Well/borehole ID.
- X, Y, Z coordinates (the collar location).
3. **Survey File**
- **What it is**: A selection among the uploaded files that contains the well trajectory data.
- **Typical columns**:
- Measured Depth (MD).
- Azimuth.
- Dip.
4. **Attrs File**
- **What it is**: A selection among the uploaded files that contains attributes (interval-based or point-based data), such as geochemistry or lithology.
- **Typical columns**:
- Well/borehole ID.
- From depth.
- To depth.
- Attribute values (e.g., lithology type, assay values, etc.).
5. **Number Nodes per Well**
- **What it is**: The number of sampling points per well used in the 3D trajectory reconstruction.
    - **Why it matters**: More nodes can yield smoother trajectories but require more computation.
- **Example**: Setting this to **10** (as in the example) means each well will be split into 10 segments for 3D visualization.
6. **Enable Interval Nodes**
- **What it is**: A flag indicating whether nodes should also be created at each depth interval in the attribute file.
- **Why it matters**: Useful if you want explicit 3D nodes where attribute changes occur (e.g., lithology boundaries).
7. **Is Lith Attr**
- **What it is**: A checkbox (or toggle) to indicate that the attribute data is lithological.
- **Why it matters**: If set, LiquidEarth may apply special rules for interval merging, default lithology handling, or color-coding in 3D.

---

## 2.2. Detailed Reader Settings

For each file type (Collar, Survey, Attributes), you have advanced options—**Reader Settings**—that control how the file is parsed. Under the hood, LiquidEarth uses something similar to Python’s pandas.read_csv() function. These settings let you handle various CSV formats, encodings, column mappings, etc.
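As a rough sketch, the default reader settings described below correspond to a pandas call like the following (the internal reader may differ in details):

```python
import io
import pandas as pd

csv_text = "id;x;y;z\nwell_01;100;200;50\n"

# Defaults mirrored: header=0 (first row is the header), sep=None
# (auto-detect the delimiter, which requires the python engine).
# When reading from a file path you would also pass encoding="ISO-8859-1".
df = pd.read_csv(io.StringIO(csv_text), header=0, sep=None, engine="python")
print(list(df.columns))  # ['id', 'x', 'y', 'z']
```

Here the semicolon delimiter is sniffed automatically; if auto-detection fails on your file, set the Separator setting explicitly.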

1. **Uploaded File**
- **What it is**: Links to the file selected above (Collar, Survey, or Attrs).
- **Why it matters**: Ensures the reader knows which physical file to parse.
2. **Encoding** (Default: `ISO-8859-1`)
- **What it is**: The text encoding used to read the file (e.g., `UTF-8`, `ISO-8859-1`).
- **Why it matters**: If your file contains special characters (like accents), matching the correct encoding is crucial for readable data.
3. **Index Columns** (Default: `null`)
- **What it is**: Tells the reader which column (by name or index) should become the DataFrame’s index.
- **Why it matters**: Setting an index can make lookups more direct. If not needed, you can leave it at the default (`null`).
4. **Header Row** (Default: `0`)
- **What it is**: Specifies which row in the CSV file is used for column names. For example, `0` means the first row is the header.
    - **Why it matters**: If your file doesn’t have a header (i.e., only data rows), set this to `None`; if the header sits on a later row, give that row’s index instead (e.g., `1` for the second row).
5. **Separator** (Default: `null`)
- **What it is**: The delimiter that separates columns in your CSV (e.g., `,`, `;`, or `\t`).
- **Why it matters**: If left `null`, LiquidEarth might auto-detect. If that fails, specify the correct delimiter.
- **Tip**: If your file is delimited by semicolons, you might manually set it to `";"`.
6. **Columns to Use** (List of columns, expressed as a semicolon-separated string)
- **What it is**: Tells the reader which columns to include (by name or position).
- **Why it matters**: Helps skip unnecessary columns in large files.
- **Example**: `"0; 2; 5"` means read only columns 0, 2, and 5.
    - **Tip**: `null` means “use all columns”. If you only want columns 0 and 2, provide `"0;2"`.
7. **Columns Names** (List of new column names, also semicolon-separated)
- **What it is**: Overrides or sets the column names if the file lacks a proper header.
- **Why it matters**: Ensures consistent naming, especially if your CSV has no header row or has inconsistent column names.
- **Example**: `"id; x; y; z"` to explicitly rename columns 0, 1, 2, 3.
8. **Index Map** (Dictionary expressed as `original_name:new_name;...`)
- **What it is**: A mapping of certain column names to serve as an index or partially rename them for indexing.
- **Why it matters**: If you want a specific column to become your DataFrame’s index or unify naming across multiple files.
- **Example**: `"HOLE_ID:id"` means the `HOLE_ID` column is remapped to `id`, potentially used as an index or for unique identification.
9. **Columns Map** (Dictionary expressed as `original_name:new_name;...`)
- **What it is**: A mapping to rename columns (apart from indexing). Critical for aligning your CSV columns with LiquidEarth’s internal naming (e.g., “id”, “x”, “y”, “z”, “md”, “dip”, “azi”, “top”, “base”).
- **Why it matters**: Ensures standardized column naming across different CSV formats.
- **Example**: `"X:x;Y:y;Z:z"` means rename column `X` to `x`, `Y` to `y`, `Z` to `z`.
- **Examples** from the three readers:
- **Collar Reader**:

```json
{
"HOLE_ID": "id",
"X": "x",
"Y": "y",
"Z": "z"
}
```

This means:

- Rename `HOLE_ID` in the CSV to `id`.
- Rename `X` to `x`, `Y` to `y`, and `Z` to `z`.
- **Survey Reader**:

```json
{
"Distance": "md",
"Dip": "dip",
"Azimuth": "azi"
}
```

This means:

- Rename `Distance` to `md` (measured depth).
- Rename `Dip` to `dip`.
- Rename `Azimuth` to `azi`.
- **Attrs Reader**:

```json
{
"HoleId": "id",
"from": "top",
"to": "base"
}
```

This means:

- Rename `HoleId` to `id`.
- Rename `from` to `top` and `to` to `base` for interval depths.
10. **Additional Reader Arguments** (Dictionary in `key:value;...` format)
- **What it is**: A catch-all for extra keyword arguments you might pass to `pandas.read_csv()` (e.g., `parse_dates`, `dtype`).
- **Why it matters**: Gives you advanced control over parsing (handling missing data, data types, date parsing, etc.).
- **Example**: `{}` means none were provided. You might specify something like `{"na_values": "NA"}` if your data uses `"NA"` to represent missing values, or `{"dtype": "float"}` to force numeric columns.
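To make the semicolon conventions above concrete, here is a minimal sketch of how such strings could be parsed into Python values (hypothetical helper names, not LiquidEarth’s actual code):

```python
def parse_list(value):
    """Parse an 'a; b; c' string into a list, e.g. for Columns Names."""
    if value is None:
        return None
    return [item.strip() for item in value.split(";")]

def parse_map(value):
    """Parse an 'old:new;old2:new2' string into a dict, e.g. for Columns Map."""
    if value is None:
        return {}
    pairs = (item.split(":", 1) for item in value.split(";") if item.strip())
    return {old.strip(): new.strip() for old, new in pairs}

print(parse_list("id; x; y; z"))   # ['id', 'x', 'y', 'z']
print(parse_map("X:x;Y:y;Z:z"))    # {'X': 'x', 'Y': 'y', 'Z': 'z'}
```

Note that positional "Columns to Use" entries like `"0; 2; 5"` come back as strings here; a real parser would additionally convert purely numeric items to integers.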

---

# 3. Tips for Setting Up Your Import

1. **Start with Defaults**
- For most straightforward CSV files (comma-delimited, first row as header, standard text encoding), leaving the advanced settings at default often works.
2. **Be Consistent**
- Aim for consistent naming (e.g., always rename the “Hole ID” column to simply “id” in all files). This makes combining data simpler and avoids confusion.
3. **Check Encoding**
- If you see gibberish characters or question marks, you might need to change from `ISO-8859-1` to `UTF-8`, or vice versa.
4. **Map the Key Columns**
- Make sure `Index Map` and `Columns Map` highlight how your “id”, “x”, “y”, “z”, “md”, “from”, and “to” columns are named. This is critical for merging collars, surveys, and attributes correctly.
5. **Save Your Configuration**
    - Save your import configuration so you can reload and reuse it in later sessions.
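For the encoding tip above, a quick, library-agnostic way to check which encoding fits is to try decoding the raw bytes (a generic sketch, not a LiquidEarth feature):

```python
def detect_encoding(raw: bytes, candidates=("utf-8", "ISO-8859-1")) -> str:
    """Return the first candidate encoding that decodes the bytes cleanly."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"

# A lithology label with a German umlaut, encoded as Latin-1 bytes.
latin1_bytes = "Tonschiefer, grünlich".encode("ISO-8859-1")
print(detect_encoding(latin1_bytes))  # ISO-8859-1
```

Note the caveat that many byte sequences decode "cleanly" under ISO-8859-1 even when they are really UTF-8, so trying UTF-8 first is the safer order.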
46 changes: 46 additions & 0 deletions tests/test_io/test_lines/test_canonical_assays.py
@@ -0,0 +1,46 @@
import os

from subsurface.api.reader.read_wells import read_wells
from subsurface.core.reader_helpers.readers_data import GenericReaderFilesHelper

def test_canonical_assays():
# Define paths
base_path = "examples/data/borehole_canonical/"
collars_path = os.path.join(base_path, "collars.csv")
survey_path = os.path.join(base_path, "survey.csv")
assays_path = os.path.join(base_path, "assays.csv")

# Readers
collars_reader = GenericReaderFilesHelper(file_or_buffer=collars_path)
survey_reader = GenericReaderFilesHelper(file_or_buffer=survey_path)
assays_reader = GenericReaderFilesHelper(file_or_buffer=assays_path)

# Read wells with assays (not lith)
borehole_set = read_wells(
collars_reader=collars_reader,
surveys_reader=survey_reader,
attrs_reader=assays_reader,
is_lith_attr=False,
add_attrs_as_nodes=True,
number_nodes=5,
duplicate_attr_depths=True
)

print("BoreholeSet created successfully from canonical files with assays")
print(f"Boreholes: {borehole_set.survey.ids}")

# Check if assays are present in combined_trajectory
data = borehole_set.combined_trajectory.data.data
print(f"Available attributes: {data.keys()}")

    # Verify Cu and Au made it into vertex_attr. For non-lith data, read_wells
    # calls read_attributes, which returns the full DataFrame; the result is then
    # merged via survey.update_survey_with_lith, which (despite its name) handles
    # both lithology and generic attributes.

assert "well_id" in data.vertex_attr.values
# Cu and Au should be there as well
assert "Cu" in data.vertex_attr.values
assert "Au" in data.vertex_attr.values

if __name__ == "__main__":
test_canonical_assays()
41 changes: 41 additions & 0 deletions tests/test_io/test_lines/test_canonical_reading.py
@@ -0,0 +1,41 @@
import os

from subsurface.api.reader.read_wells import read_wells
from subsurface.core.reader_helpers.readers_data import GenericReaderFilesHelper

def test_read_canonical():
# Paths to canonical files
collar_path = "examples/data/borehole_canonical/collars.csv"
survey_path = "examples/data/borehole_canonical/survey.csv"
lith_path = "examples/data/borehole_canonical/lithology.csv"

    # Create readers with only the file path, relying on the
    # GenericReaderFilesHelper defaults to check that they match the canonical files
collars_reader = GenericReaderFilesHelper(file_or_buffer=collar_path)
surveys_reader = GenericReaderFilesHelper(file_or_buffer=survey_path)
attrs_reader = GenericReaderFilesHelper(file_or_buffer=lith_path)

# Read wells
borehole_set = read_wells(
collars_reader=collars_reader,
surveys_reader=surveys_reader,
attrs_reader=attrs_reader,
is_lith_attr=True
)

print("BoreholeSet successfully created from canonical files!")
print(f"Boreholes: {borehole_set.collars.ids}")

# Verify some data
assert len(borehole_set.collars.ids) == 2
assert "well_01" in borehole_set.collars.ids
assert "well_02" in borehole_set.collars.ids

# Check if survey was correctly read (inc should be present)
# The read_wells function calls Survey.from_df which corrects angles and sets up trajectories

print("Verification complete.")

if __name__ == "__main__":
test_read_canonical()