Parsing the Past: Using Modern Technology to Visualize Seismic Data

By: Daniel Ashcraft December 28, 2022

Geoscientists and geophysicists store the data they gather from seismic surveys in a file format called "SEG-Y." SEG-Y is a niche format that most engineers never have to touch. As with most older file formats, parsing it comes with problems that engineers must address, such as:

1. Inconsistencies across SEG-Y versions.
2. Older versions have non-standard parsing requirements (e.g., legacy IBM floating-point values).
3. An insular community with proprietary data they are reluctant to share.
4. Existing solutions tend to be either inflexible or expensive (sometimes both).

So what do you do when a client asks you to gather, parse, analyze, and visualize something based on a lesser-known or older spec?

Investigate

The first thing we did was find the SEG-Y format documentation for the latest version; in other words, we read the instruction manual. Afterward, with some general research and googling, we found revision information and file format specifications that told us where to look for various bits and bytes to verify we were on the right track. Then we looked at the open-source solutions and assessed each parser's stability, flexibility, and efficiency, so that we would not accidentally duplicate work the community had already done.
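To make "where to look for various bits and bytes" concrete, here is a minimal sketch of the kind of sanity check this enables. It assumes the SEG-Y rev 1 layout (a 3200-byte textual header followed by a 400-byte binary header) and big-endian byte order; the function and field names are hypothetical and are not taken from our parser.

```javascript
// Sketch only: offsets follow the SEG-Y rev 1 binary header layout.
// Assumes big-endian byte order (later revisions also permit little-endian).
const TEXT_HEADER_BYTES = 3200; // textual header that precedes the binary header

// readBinaryHeader is a hypothetical helper name used for illustration.
function readBinaryHeader(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  // Convert an offset inside the binary header to an offset in the file.
  const at = (offsetInBinaryHeader) => TEXT_HEADER_BYTES + offsetInBinaryHeader;
  return {
    // File bytes 3217-3218: sample interval in microseconds.
    sampleIntervalMicros: view.getUint16(at(16), false),
    // File bytes 3221-3222: number of samples per data trace.
    samplesPerTrace: view.getUint16(at(20), false),
    // File bytes 3225-3226: data sample format code
    // (1 = 4-byte IBM floating point, 5 = 4-byte IEEE floating point).
    formatCode: view.getUint16(at(24), false),
  };
}
```

If the format code or sample count comes back as something implausible, that is usually a sign the file uses a different revision or byte order than assumed. Reading the fields as unsigned is itself an assumption made here for simplicity.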

It's important to note that parsers are not created equal; when someone builds an open-source parser, there's usually a personal or economic need behind it, and that need influences how the parser is optimized. We're explicitly optimizing for visualizations and high-performance data streaming, which means waiting 20 seconds for a 4 MB file to parse is out of the question. The fear of "rolling your own" is always there, but given our specific needs, we had to go the custom route in this case.

Technical Spikes

Now for the fun part: time to dig in and see what you get. Luckily, I know a lot of really smart people who can help out. One of the perks of running a small dev shop and being part of a growing tech community is the network of engineers I have available to help answer some of the more niche questions. I reached out to a referral, @mochetts, and asked if he'd be interested in working on this project with the rest of my product team. Luckily, he's flexible and used to fast-paced prototyping.

We were able to spike several small solutions and test them in various environments. Mochetts was also able to locate and identify many minor issues. Some older versions of the SEG-Y format use IBM HFP (IBM 360) floating-point numbers, a fascinating problem that the maintainers of Segy-IO helped us identify. So we had to dig deep into our bag of tricks; using some bitwise operators in JavaScript and the help of a local legend, Yury, we were able to create a small library for parsing IBM HFP (IBM 360) floating-point numbers into something recognizable.
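For readers who have not met IBM HFP before, here is a minimal sketch of the conversion. It is not our library, just an illustration of the general technique: a 32-bit IBM single-precision value has 1 sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction with no implicit leading bit. The function names are hypothetical, and big-endian input is assumed.

```javascript
// Sketch only: converts one 32-bit IBM hexadecimal float to a JS number.
// ibmToNumber and readIbmSamples are hypothetical names for illustration.
function ibmToNumber(bits) {
  const sign = (bits >>> 31) === 1 ? -1 : 1; // top bit is the sign
  const exponent = (bits >>> 24) & 0x7f;     // 7-bit exponent, base 16, bias 64
  const fraction = bits & 0x00ffffff;        // 24-bit fraction, no hidden bit
  if (fraction === 0) return 0;              // any zero fraction means zero
  // value = sign * (fraction / 2^24) * 16^(exponent - 64)
  return sign * (fraction / 0x1000000) * Math.pow(16, exponent - 64);
}

// Reads `count` consecutive IBM floats starting at `byteOffset`.
// Assumes the samples are stored big-endian.
function readIbmSamples(arrayBuffer, byteOffset, count) {
  const view = new DataView(arrayBuffer);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = ibmToNumber(view.getUint32(byteOffset + i * 4, false));
  }
  return out;
}

console.log(ibmToNumber(0x42640000)); // 100
console.log(ibmToNumber(0xc276a000)); // -118.625
```

Storing the results in a Float32Array is a design choice for handing data to a renderer; it can lose a little precision or range compared with the original IBM values, so a Float64Array may be a better fit when exactness matters.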

Putting Together the Team (starting product sprints)

After running the technical spikes and answering most of the fundamental questions, it was time to assemble a team. First, we solidified ourselves, the core product team; then we looked for the specialists, the type of engineers that FAANG companies wish they could hire. The kind of engineers your senior engineer friends consider their seniors and admire. In this case, I sought out a team I'd worked with before, the Simiancraft team.

@the-simian and @vantreeseba are two of the best engineers I've ever worked with. They are also heavily into game development, computer science, and building cool stuff. We tapped into that to form an interesting yet weird data visualization duo. True to that reputation, they immediately understood where we were and where we were planning to go. Older (lesser-known) file format: check. Weird floating-point numbers: check. Visualizations that need to be merged, processed, downsampled, upsampled, scaled, zoomed, and transformed in new and exciting ways for incredibly smart geophysicists and such: double-check. The team quickly got started and created a short visualization spike that rendered 1.2M data points in roughly 80ms with color shaders/hue shifts! Holy smokes, folks, that's fast.

Next Steps

In terms of product development, we're still in our infancy. We're finding new edge cases every day, putting together the product side of things to make clients happy, and interviewing potential users and customers. For now, it's smooth sailing; it's just a matter of focusing on the correct problems to solve for the right reasons.

My crew and I have a relatively simple ethos that maps across all our projects and relationships.

We build cool stuff.

We make good money.

We have a great time doing it.

software engineering, data visualization, product design