Ingestion

Source Format

The reader is assumed to be familiar with the SEG-Y format specification, and the terms “trace”, “trace header”, “text header”, and “binary header” are to be interpreted as defined in that specification.

Seismic data is generally loaded into CDF by ingesting a SEG-Y file stored in a Cloud Storage bucket. The SEG-Y file to be ingested must contain post-stack seismic data where each trace has a Common Depth Point (CDP) location.

Trace data from the SEG-Y file may be loaded into a CDF-managed Bigtable instance for improved random access performance (often referred to as Tier 1 storage), or left in the cloud storage bucket to reduce costs.

Regardless of the storage tier, the CDF Seismic ingestion process attempts to interpret the file and trace headers to identify the spatial location of each trace. The CDP-X and CDP-Y trace header fields are used to identify the spatial location of a trace. For 3d files, the inline and crossline number fields are used to identify the location of the trace on the processing grid. For 2d files, the CDP number is used to identify the location of each trace.

It is worth noting that CDF only accepts one trace for each point on the compute grid for 3d files, i.e. the (inline, crossline) pair must be unique for each trace in a file.

For 2d files, the CDP number must be unique for each trace, but other trace headers, such as the energy source point number, may contain duplicates.
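
These uniqueness rules can be checked up front. The following is a minimal client-side sketch (not part of CDF), assuming the inline, crossline, and CDP values have already been parsed from each trace header into dictionaries:

# Hypothetical pre-check for the uniqueness rules described above.
# `headers` is assumed to be an iterable of dicts with already-parsed
# "inline", "crossline" and "cdp" trace header values.
def find_duplicates(headers, is_3d):
    seen, duplicates = set(), set()
    for header in headers:
        key = (header["inline"], header["crossline"]) if is_3d else header["cdp"]
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates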

Prerequisites

To ingest a file into CDF, there are certain prerequisites and expectations about the data, as listed below.

  • The file must be stored in a Google Cloud Storage bucket that CDF has been given access to read from.
  • The file’s text header must be in either ASCII or EBCDIC format.
  • The file’s binary header must be correct for the sample format, the sample count, and the trace count.
  • Each trace in the file must have at least one sample.
  • All traces in the file must have the same number of samples.
  • Any relevant trace header values must either appear in the expected position in the trace header according to the SEG-Y Revision 1 Specification, or be explicitly overridden when registering the file.
  • All CDP-X values (when scaled) must be between -500,000 and 2,000,000 (inclusive).
  • All CDP-Y values (when scaled) must be between -500,000 and 10,000,000 (inclusive). A minimal bounds-check sketch follows this list.
  • CDP-X and CDP-Y trace header fields must be valid UTM grid coordinates and must match the CRS provided when registering the file.
  • CDP-X and CDP-Y trace header fields must be scaled by a constant factor (by default this uses the source group scalar field in the trace header).
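
As referenced above, a minimal bounds-check sketch for the coordinate ranges, assuming the CDP-X/Y values have already been read and scaled (see the Scalars section below):

# Bounds from the prerequisites above, in scaled coordinate units.
CDP_X_RANGE = (-500_000, 2_000_000)
CDP_Y_RANGE = (-500_000, 10_000_000)

def coordinates_in_range(cdp_x, cdp_y):
    return (CDP_X_RANGE[0] <= cdp_x <= CDP_X_RANGE[1]
            and CDP_Y_RANGE[0] <= cdp_y <= CDP_Y_RANGE[1])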

The following prerequisites and expectations are specific to 3d files:

  • Each trace must have a unique (Inline, Crossline) pair.
  • Traces in the file should ideally be sorted by either Inline or Crossline. Unsorted files are accepted up to a complexity limit during indexing, but may lead to slower retrieval times.
  • All Inline and Crossline trace header values must be between 1 and 1,000,000 (inclusive).
  • If only a single subline is detected in the file (i.e. all the inline values are the same, or all the crossline values are the same), no transformation matrix will be built. Additionally, the coverage generated for the file will be the result of concatenating all CDP-X/Y positions into a simplified line. This single-subline support was developed before the addition of 2d file support. To generate a transformation matrix and coverage for a single-subline file, consider re-ingesting the file as 2d. A small detection sketch follows this list.
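
As referenced in the list above, a minimal sketch of the single-subline check, assuming the inline and crossline values for all traces are already available as sequences:

# A file is treated as a single subline when all inline values are equal
# or all crossline values are equal across its traces.
def is_single_subline(inlines, crosslines):
    return len(set(inlines)) == 1 or len(set(crosslines)) == 1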

The following prerequisites and expectations are specific to 2d files:

  • Each trace must have a unique CDP trace header value in order to generate coverage.
  • There must be at least 2 traces with unique CDP trace header values in the file in order to generate a coverage for the file.
  • Trace headers other than CDP that are targeted for indexing as part of tier 2 storage are allowed to contain duplicate values between traces. However, when retrieving traces by a duplicate trace header value, there is no built-in support for distinguishing between them.

Configuration

Offsets

When ingesting a file, the seismic ingestion process parses values such as the Inline, Crossline, CDP_X and CDP_Y from each trace’s header to build condensed representations of the file. By default, the ingestion process will look for these values at the offsets defined in the SEG-Y Revision 1 Specification. However, many files define these values at a different offset in the trace headers. Trace header offsets that do not match the layout of the trace headers in the file are one of the main causes of failed ingestion jobs. Below is a table of trace header fields that can currently be overridden when registering a file to be ingested, along with the default offset used if no custom offset is specified. A small sketch of reading these fields at their byte offsets follows the table.

Trace Header Offsets
Trace Header          Default Offset   Expected Format
Energy source point   16               4-byte signed integer
CDP                   20               4-byte signed integer
CDP_X                 180              4-byte signed integer
CDP_Y                 184              4-byte signed integer
Inline                188              4-byte signed integer
Crossline             192              4-byte signed integer
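
As referenced above, the default offsets in the table can be applied to a raw 240-byte trace header roughly as follows. This is an illustrative sketch, not the ingestion worker's implementation; it assumes big-endian byte order, which is the usual SEG-Y convention:

import struct

# Default byte offsets from the table above (zero-based, within the
# 240-byte trace header), each read as a big-endian 4-byte signed integer.
DEFAULT_OFFSETS = {
    "energy_source_point": 16,
    "cdp": 20,
    "cdp_x": 180,
    "cdp_y": 184,
    "inline": 188,
    "crossline": 192,
}

def read_trace_header_fields(trace_header: bytes, offsets=DEFAULT_OFFSETS):
    """Read the listed fields from a raw 240-byte trace header."""
    return {
        name: struct.unpack_from(">i", trace_header, offset)[0]
        for name, offset in offsets.items()
    }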

Scalars

For CDP_X and CDP_Y trace header values, after reading the value at the defined offset in the trace header, a scalar is applied to determine the final coordinate value. By default, the scalar used is the source group scalar (defined at offset 70 in the trace header). If the source group scalar value is 0, the CDP_X/Y value is not modified. If the source group scalar is not 0, the CDP_X/Y value is divided by the absolute value of the source group scalar (which is always positive).

Dividing by the absolute value of the source group scalar when the scalar is greater than 0 diverges from the SEG-Y standard, which treats a positive scalar as a multiplier. To achieve the standard multiply behaviour, a source group scalar override can be defined when registering or editing the file. This override value is always multiplied with the original CDP_X/Y value, regardless of whether the override is positive, negative, or 0.

An incorrect source group scalar is a common cause of failed ingestion jobs, typically reported as a CDP_X/Y value that is out of bounds.
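
The default scaling and the override behaviour described above amount to the following, shown here as a plain-Python sketch rather than the ingestion worker's own code:

def scale_coordinate(raw_value, source_group_scalar, override=None):
    # An override registered for the file is always applied as a
    # multiplier, regardless of its sign (or being 0).
    if override is not None:
        return raw_value * override
    # Default behaviour: a source group scalar of 0 leaves the value
    # unchanged; otherwise divide by the absolute value of the scalar.
    if source_group_scalar == 0:
        return raw_value
    return raw_value / abs(source_group_scalar)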

Ingestion Errors

When the ingestion worker fails to process a file, an error message will be given in the job’s status. Below is a list of typical error messages that the ingestion worker may return for your file, along with a brief explanation of why the ingestion failed.

Ingestion Errors

Error: Ingestion failed. Please contact support and provide these details: [file_id=<File Id>] [job_id=<Job Id>]
Explanation: The ingestion worker was unable to detect a typical problem when attempting to ingest the file, or an internal error occurred.

Error: Failed to retrieve file <File Name> with path <File Path>. Please check that the file was registered correctly.
Explanation: The ingestion worker failed to retrieve the file from the Google Cloud Storage bucket defined during registration. This could be caused by either the file being registered with an incorrect file name, or the underlying file in the Google Cloud Storage bucket being moved or deleted.

Error: Failed to retrieve file <File Name> with path <File Path> due to an unknown error. Please contact support.
Explanation: The ingestion worker failed to retrieve the file from the Google Cloud Storage bucket defined during registration, but not due to an issue such as an incorrect file name.

Error: Failed to retrieve binary header. Please contact support.
Explanation: The ingestion worker found the file in the Google Cloud Storage bucket defined during registration, but was unable to parse the binary header from the specified file.

Error: Failed to retrieve text header. Please contact support.
Explanation: The ingestion worker found the file in the Google Cloud Storage bucket defined during registration, but was unable to parse the text header from the specified file.

Error: Failed to delete existing traces. Please contact support.
Explanation: The ingestion worker checks whether the file being ingested was previously ingested to tier one storage. If it was, the ingestion worker first deletes any existing traces in tier one storage before re-ingesting. This error indicates that the deletion step failed.

Error: An error occurred while trying to parse the binary header. Please contact support.
Explanation: The ingestion worker found the file but failed to ingest it due to an underlying issue with the binary header.

Error: Failed to stream traces from file. Please contact support.
Explanation: The ingestion worker found the file but failed to retrieve any traces from it. This could be due to an underlying issue with the binary header.

Error: Failed to compress trace before persisting. Please contact support.
Explanation: The ingestion worker encountered a trace which was valid, but could not be encoded to the correct format for tier one storage. This should be escalated to support.

Error: Failed to persist traces. Please contact support.
Explanation: The ingestion worker encountered valid traces and successfully encoded them, but failed to persist them to the underlying tier one storage.

Error: Failed to process trace due to it having no samples. Please check that the file is registered correctly.
Explanation: The ingestion worker did not find any samples in the file while processing traces. This could be due to the binary header in the file having incorrect sample count information.

Error: Failed to process trace <Trace> due to its cdp_x value being outside the acceptable range. Easting values must be between -500,000 and 2,000,000. Please check that the configured cdp_x offset for this file is correct.
Explanation: The ingestion worker assumes that all cdp_x coordinates are UTM coordinates. When processing 3d files, the ingestion worker expects all cdp_x coordinates to have a value between -500,000 and 2,000,000. We accept negative cdp_x coordinates down to -500 kilometers to allow for files that encompass multiple UTM zones or the equator. One reason the cdp_x value may be rejected is that the offset defined for the cdp_x coordinates is incorrect and the ingestion worker is interpreting a different trace header field as the cdp_x. Another possibility is that the ingestion worker is applying an incorrect scalar to the cdp_x coordinates found in the file. See the configuration section above for more details.

Error: Failed to process trace <Trace> due to its cdp_y value being outside the acceptable range. Northing values must be between -500,000 and 10,000,000. Please check that the configured cdp_y offset for this file is correct.
Explanation: The ingestion worker assumes that all cdp_y coordinates are UTM coordinates. When processing 3d files, the ingestion worker expects all cdp_y coordinates to have a value between -500,000 and 10,000,000. We accept negative cdp_y coordinates down to -500 kilometers to allow for files that encompass multiple UTM zones or the equator. One reason the cdp_y value may be rejected is that the offset defined for the cdp_y coordinates is incorrect and the ingestion worker is interpreting a different trace header field as the cdp_y. Another possibility is that the ingestion worker is applying an incorrect scalar to the cdp_y coordinates found in the file. See the configuration section above for more details.

The following errors are specific to 3d files.

3d Ingestion Errors

Error: Failed to process trace <Trace> due to its inline/crossline value being outside the acceptable range. Inline/Crossline values must be between 1 and 1,000,000. Please check that the configured inline/crossline offset for this file is correct.
Explanation: When processing 3d files, the ingestion worker expects all traces’ inline/crossline values to be between 1 and 1,000,000. If a value is outside of these bounds, it is possible that the offset defined for inlines/crosslines in this file is incorrect and that the ingestion worker is interpreting a different trace header field as the inline/crossline.

Error: Failed to process trace <Trace> due to its inline/crossline values conflicting with a previous trace. Duplicate inline/crossline values may have been found if the file has incorrect inline or crossline header offsets. Please check that the offsets for the file are correct.
Explanation: When processing 3d files, the ingestion worker expects all traces to have unique inline and crossline trace header values. If duplicate inline/crossline values are found, the file cannot be ingested successfully. A common reason for this error is incorrect offsets, resulting in the ingestion worker interpreting the inline/crossline values as 0 (read from undefined fields in the trace headers).

The following errors are specific to 2d files.

2d Ingestion Errors

Error: Failed to process trace <Trace> due to its cdp value conflicting with a previous trace. Duplicate cdp values may have been found if the file has an incorrect cdp header offset. Please check that the offsets for the file are correct.
Explanation: Each trace in the file must have a unique CDP value in order to build the correct geometry. Typically, cdp values simply increment in value for every trace, so an error about duplicate CDP values may imply that the CDP offset for the file is incorrect.

Error: Not enough unique cdp values were found when processing the file to be able to calculate the 2d geometry. Expected at least 2 points, but found only <N>. Please check that the offsets for the file are correct.
Explanation: We require at least 2 traces with unique CDP values to be in the file in order to calculate the geometry. If the file is expected to have at least 2 points, please check that the binary header for the file is correct.

Troubleshooting

Incorrect Geometry Coordinates

A processed file may have an incorrect geometry if the CRS registered with the file is not correct. For example, if the file’s geometry is off by 6 degrees of longitude, this would imply that the UTM zone is off by 1 (each UTM zone spans 6 degrees of longitude).

Another possibility is that the offsets for CDP_X/Y are flipped in the trace headers. Ensure that the offsets of the CDP_X/Y trace headers in the file are correct.
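
One way to sanity-check the registered CRS, assuming the pyproj package is available and the CRS is a WGS 84 / UTM zone (the EPSG codes and coordinates below are only examples), is to convert a representative scaled CDP_X/Y pair with both the registered zone and a neighbouring zone and compare the result against where the survey should be:

from pyproj import Transformer

# Example scaled CDP_X/Y pair from the file (hypothetical values).
easting, northing = 500_000.0, 6_000_000.0

# WGS 84 / UTM zone 31N vs the neighbouring zone 32N (example EPSG codes).
for epsg in ("EPSG:32631", "EPSG:32632"):
    to_lonlat = Transformer.from_crs(epsg, "EPSG:4326", always_xy=True)
    lon, lat = to_lonlat.transform(easting, northing)
    print(epsg, round(lon, 3), round(lat, 3))
# The two longitudes differ by 6 degrees, which is why a geometry that is
# off by 6 degrees of longitude usually points to an off-by-one UTM zone.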

Incorrect Geometry Scale

A processed file’s geometry may have the correct shape but incorrect scale if the source group scalar used on the file’s CDP_X/Y coordinates is incorrect.

Duplicate Inline/Crossline/CDP values

If the file is rejected for having duplicate Inline/Crossline/CDP values, check that the file’s trace header offsets are correct.

CDP X/Y coordinates out of range

If the file is rejected for having CDP_X/Y coordinates outside of the accepted range, check that the file’s trace header offsets are correct, and that the file’s source group scalar is correct.

No Traces/Samples found

If the ingestion worker failed to find any traces in the file, or any samples in the found traces, check that the binary header for the file has the correct information, such as the trace format and sample count (these are expected to be found in the binary header at the offsets defined in the SEG-Y Revision 1 Specification).
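
Those two binary header fields can be inspected directly with a short sketch like the one below. It assumes a standard SEG-Y Revision 1 layout: a 3200-byte text header followed by a 400-byte binary header, with the samples-per-trace and data sample format code fields stored as big-endian 2-byte integers at binary header offsets 20 and 24:

import struct

def inspect_binary_header(path):
    with open(path, "rb") as f:
        f.seek(3200)               # skip the 3200-byte text header
        binary_header = f.read(400)
    samples_per_trace = struct.unpack_from(">h", binary_header, 20)[0]
    format_code = struct.unpack_from(">h", binary_header, 24)[0]
    # Format code 1 is IBM float, 5 is IEEE float, per the SEG-Y Rev 1 spec.
    return samples_per_trace, format_code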

Incorrect Sample Data

If the samples retrieved from CDF after ingestion are incorrect, check that the trace format in the binary header of the original file is correct. When processing the file, the ingestion worker will parse all samples according to this trace format.

Ingestion logs

After starting the ingestion of a file, the status and logs of the ingestion job can be viewed via either the job id, the file id or the seismic-store id.

If a file is being re-ingested, the status returned will always be from the newest ingestion job.

Retrieving the status and logs of an ingestion job via the seismic-store id is only possible for re-ingestion jobs.

# Get ingestion job status via job id
client.job.status(job_id="e5707588-adf2-4d2a-8c2f-f782ca2f0880")

# Get ingestion job status via file id
client.job.status(file_uuid="example.sgy")

Example output

Logs are returned and printed in ascending order:

>>> job = list(client.job.status(job_id="a6c628d1-002b-4f8d-87d7-cf897c8f7157"))[0]
>>> print(job)
JobStatus<job_id: a6c628d1-002b-4f8d-87d7-cf897c8f7157,
  file_uuid: 9d1b09f6-5f0f-4ae8-8b19-664e55471d53,
  status: SUCCESS, target_storage_tier_name: tier2_cloudstorage,
  started_at: 2022-04-08 09:44:17,
  updated_at: 2022-04-08 09:46:20,
  logs: 16 entries>
>>> for log in job.logs:
...   print(log)
...
"2022-04-08 09:44:26: Starting ingestion for file_id "9d1b09f6-5f0f-4ae8-8b19-664e55471d53""
"2022-04-08 09:44:26: Retrieving seismic store"
"2022-04-08 09:44:26: Retrieving binary header"
"2022-04-08 09:44:26: Retrieving text header"
"2022-04-08 09:44:27: Persisting file headers"
"2022-04-08 09:44:27: Persisting text header"
"2022-04-08 09:44:27: Persisting binary header"
"2022-04-08 09:44:27: Processing 3d traces"
"2022-04-08 09:46:20: File scanned. Found grid with bounds Some(MajorMinorBounds { major_min: 10187, major_max: 13829, major_step: 1, minor_min: 10198, minor_max: 16741, minor_step: 1 }), total 14716366 traces, 77172623304 bytes, in 112 seconds"
"2022-04-08 09:46:20: Computing and persisting 3d grid"
"2022-04-08 09:46:20: Computing and persisting coverage"
"2022-04-08 09:46:20: Computing 3D coverage. If the file is supposed to be 2D, please check that the file is registered correctly, such as correct trace header offsets."
"2022-04-08 09:46:20: Computing and persisting volume definitions"
"2022-04-08 09:46:20: Computing and persisting trace index offsets"
"2022-04-08 09:46:20: Publishing seismic store with id 6926840200551363"
"2022-04-08 09:46:20: Ingestion complete"