The Challenge: An Undocumented Legacy API
A telecommunications provider needed to integrate their modern service platform with an existing legacy CRM system. The problem was immediately clear: the CRM exposed a SOAP/WSDL API with minimal documentation, no official SDK, and no developer community to speak of. The vendor's documentation consisted of sparse, outdated reference pages that covered fewer than half the available operations -- and the ones that were documented frequently described behaviors that no longer matched reality.
This is a common scenario in telecommunications and other industries that adopted enterprise CRM platforms in the early 2000s. The systems still work. They still hold critical customer data -- account records, service subscriptions, billing histories, trouble tickets. But the integration surfaces were designed for a different era, and the institutional knowledge required to use them has often walked out the door.
The provider had already engaged two other development teams, both of which abandoned the effort after failing to make meaningful progress against the API. Our engagement began with a simple question: can we build a systematic methodology for reverse-engineering this API that produces reliable, maintainable integrations?
Our Methodology: Systematic API Discovery
Rather than attempting to brute-force the integration through trial and error, we developed a structured four-phase methodology that we have since applied to multiple legacy API engagements. The core insight is that undocumented APIs are not unknowable -- they are simply uncharted. The WSDL definitions, the SOAP fault codes, and the response payloads themselves contain all the information needed to build a complete integration. You just need a disciplined process for extracting it.
Phase 1: WSDL-First Service Enumeration
Every SOAP API begins with its WSDL (Web Services Description Language) definition. Even when human-readable documentation is absent or wrong, the WSDL is the contract -- the machine-readable statement of what the API claims to offer. Our first step was to fetch the WSDL from every available endpoint and parse each one programmatically.
This initial analysis revealed the scope of the integration surface:
- 12+ distinct SOAP services, each with its own WSDL definition and operation set
- Hundreds of operations spanning account management, service provisioning, billing, trouble ticketing, and equipment tracking
- Complex type hierarchies with deep inheritance chains and polymorphic elements
- 49+ "context" types -- polymorphic request/response wrappers whose shape changed depending on the operation being invoked
We cataloged every service, operation, and type into a structured inventory. This became our roadmap -- a complete enumeration of what the API could do, even before we understood how any individual operation actually behaved.
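As a rough illustration of this cataloging step, the sketch below pulls operation and complex-type names out of a WSDL document and stores them in an inventory entry. The `ServiceInventoryEntry` shape, the function names, and the sample WSDL are assumptions made for this example, not the project's actual tooling. A regular expression keeps the sketch dependency-free; a production cataloger would more likely use a real XML parser.

```typescript
// Hypothetical inventory shape; field names are illustrative, not from the project.
interface ServiceInventoryEntry {
  service: string;
  operations: string[];
  complexTypes: string[];
}

// Collects distinct name="..." values from elements with the given local name,
// ignoring any namespace prefix (wsdl:operation, operation, xs:complexType, ...).
// The Set removes duplicates, since operations appear in both portType and binding.
function collectNames(wsdl: string, localName: string): string[] {
  const pattern = new RegExp(
    `<(?:[\\w.-]+:)?${localName}\\b[^>]*?\\bname="([^"]+)"`,
    "g",
  );
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(wsdl)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names).sort();
}

function buildInventoryEntry(service: string, wsdl: string): ServiceInventoryEntry {
  return {
    service,
    operations: collectNames(wsdl, "operation"),
    complexTypes: collectNames(wsdl, "complexType"),
  };
}

// Made-up WSDL fragment standing in for a fetched definition.
const sampleWsdl = `
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <wsdl:types>
    <xs:schema>
      <xs:complexType name="AccountContext"/>
      <xs:complexType name="AccountRecord"/>
    </xs:schema>
  </wsdl:types>
  <wsdl:portType name="AccountPort">
    <wsdl:operation name="GetAccount"/>
    <wsdl:operation name="UpdateAccount"/>
  </wsdl:portType>
  <wsdl:binding name="AccountBinding" type="AccountPort">
    <wsdl:operation name="GetAccount"/>
    <wsdl:operation name="UpdateAccount"/>
  </wsdl:binding>
</wsdl:definitions>`;

const accountInventory = buildInventoryEntry("AccountService", sampleWsdl);
console.log(accountInventory);
```

Running the same function over every fetched WSDL yields one entry per service, which is enough to count operations and spot types shared across services.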
Phase 2: Test-Driven Endpoint Discovery
With the service catalog in hand, we moved to the most critical phase: figuring out how each operation actually works in practice. WSDL definitions tell you the shape of requests and responses, but they do not tell you which fields are truly required (despite what the schema says), which field combinations are valid, what values are acceptable, or how the system behaves when you push against its boundaries.
Our approach was test-driven discovery -- treating integration tests as probes rather than as verification:
- Write a test that calls the operation with the minimum viable request based on WSDL types
- Observe the response -- does it succeed? If it fails, what does the SOAP fault tell us?
- Adjust and re-run -- add required fields, change values, test edge cases
- Document the actual contract -- the real required fields, valid value ranges, and behavioral quirks
- Lock the test as a regression guard once the operation's behavior is fully understood
This approach turned the legacy API into a testable system. Every piece of integration knowledge was captured as a passing test, which meant that if the API's behavior ever changed -- during vendor updates, for instance -- we would know as soon as the suite ran against the updated system.
The most valuable documentation for an undocumented API is a comprehensive test suite. Tests do not go stale the way documentation does. They either pass or they fail, and when they fail, they point to the specific operation and expectation that changed.
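The probe, observe, adjust loop from this phase can be sketched roughly as follows. Everything here is an assumption for illustration: the `Invoke` signature, the fault wording "Missing required field: <name>", and the `fakeInvoke` stand-in that plays the role of the CRM. Real faults are rarely this cooperative, so in practice much of the adjustment is done by hand and only the final contract is locked into a test.

```typescript
// Outcome of one probe call. The fault text format is an assumption.
type ProbeOutcome = { ok: true } | { ok: false; faultString: string };

type Invoke = (request: Record<string, string>) => Promise<ProbeOutcome>;

interface DiscoveredContract {
  requiredFields: string[];
  unresolvedFault?: string;
}

// Hypothetical fault wording: "Missing required field: <name>".
const MISSING_FIELD = /Missing required field: (\w+)/;

// Starts from an empty request and adds one field per fault until the call
// succeeds, the fault cannot be acted on, or the attempt limit is reached.
async function discoverRequiredFields(
  invoke: Invoke,
  candidateValues: Record<string, string>,
  maxAttempts = 10,
): Promise<DiscoveredContract> {
  const request: Record<string, string> = {};
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const outcome = await invoke({ ...request });
    if (outcome.ok) {
      return { requiredFields: Object.keys(request) };
    }
    const missing = MISSING_FIELD.exec(outcome.faultString)?.[1];
    if (missing === undefined || !(missing in candidateValues) || missing in request) {
      // A fault we cannot resolve automatically: record it for manual investigation.
      return { requiredFields: Object.keys(request), unresolvedFault: outcome.faultString };
    }
    request[missing] = candidateValues[missing];
  }
  return { requiredFields: Object.keys(request), unresolvedFault: "attempt limit reached" };
}

// Stand-in for the real SOAP call, so the sketch runs on its own.
const fakeInvoke: Invoke = async (request) => {
  if (!("accountId" in request)) return { ok: false, faultString: "Missing required field: accountId" };
  if (!("region" in request)) return { ok: false, faultString: "Missing required field: region" };
  return { ok: true };
};

const discovery = discoverRequiredFields(fakeInvoke, { accountId: "A-1", region: "EU", note: "unused" });
discovery.then((contract) => console.log(contract));
```

The returned contract is the artifact worth freezing: once it is asserted in a test, a vendor update that adds or removes a required field shows up as a failing assertion.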
Phase 3: Schema Validation with Zod
One of the most significant challenges was the API's polymorphic response types. The same operation could return fundamentally different response shapes depending on the input context, the account state, or even server-side configuration flags that were invisible to the caller. A response field documented as a single object might arrive as an array. A field expected to contain a string might come back as a nested structure.
To handle this, we built Zod runtime validation schemas for every response type:
- 49+ polymorphic context types modeled as Zod discriminated unions
- Strict parsing that rejects unexpected shapes rather than silently accepting them
- Detailed error messages that pinpoint exactly which field violated expectations, making debugging straightforward
- Versioned schemas that could be updated incrementally as we discovered new response variants
This was not just a defensive measure -- it was a discovery tool. When a Zod schema rejected a response, the validation error told us exactly what we did not yet understand about the API. Each validation failure became a learning opportunity that expanded our schema and deepened our integration coverage.
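A minimal sketch of this validation style, assuming the `zod` package is installed. The variant names (`ACCOUNT`, `SERVICE`), the discriminator key `contextType`, and all field names are hypothetical; the real schemas covered far more variants. The sketch shows three ideas from this phase: a discriminated union, strict objects that reject unexpected keys, and normalization of a field that may arrive as one object or as an array.

```typescript
import { z } from "zod";

// Hypothetical variants and field names, for illustration only.
const AccountContext = z
  .object({
    contextType: z.literal("ACCOUNT"),
    accountId: z.string(),
  })
  .strict(); // unknown keys are rejected instead of silently stripped

const Feature = z.object({ code: z.string() });

const ServiceContext = z
  .object({
    contextType: z.literal("SERVICE"),
    serviceId: z.string(),
    // Arrives as a single object or as an array; normalize to an array.
    features: z
      .union([Feature, z.array(Feature)])
      .transform((value) => (Array.isArray(value) ? value : [value])),
  })
  .strict();

const Context = z.discriminatedUnion("contextType", [AccountContext, ServiceContext]);
type Context = z.infer<typeof Context>;

type ParseResult = { ok: true; value: Context } | { ok: false; problems: string[] };

// Returns either the parsed value or a list of "path: message" strings
// describing what did not match the known shapes.
function parseContext(raw: unknown): ParseResult {
  const result = Context.safeParse(raw);
  if (result.success) return { ok: true, value: result.data };
  return {
    ok: false,
    problems: result.error.issues.map(
      (issue) => `${issue.path.join(".") || "(root)"}: ${issue.message}`,
    ),
  };
}

console.log(parseContext({ contextType: "SERVICE", serviceId: "S-1", features: { code: "VOIP" } }));
console.log(parseContext({ contextType: "ACCOUNT", accountId: "A-1", surprise: true }));
```

The second call fails because of the extra `surprise` key. That failure is the discovery signal described above: it marks a response shape the schema does not yet model.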
Phase 4: TypeScript Abstraction Layer
Raw SOAP calls are verbose, error-prone, and hostile to modern development workflows. Our final phase was building a typed TypeScript service layer that abstracted the SOAP complexity entirely. The modern application consuming the CRM data never needs to think about XML namespaces, SOAP envelopes, or WSDL bindings.
Each of the 12+ SOAP services received a corresponding TypeScript client with:
- Fully typed method signatures -- every operation has TypeScript interfaces for its inputs and outputs
- Zod runtime validation on every response, catching API drift before it propagates
- Automatic session management -- authentication, token refresh, and session recovery handled transparently
- Retry logic with exponential backoff for the unreliable endpoints (and there were several)
- SOAP fault-to-HTTP error mapping -- translating opaque SOAP fault codes into meaningful HTTP status codes and error messages that modern API consumers can interpret
The result was an integration layer that felt like calling a modern REST API, while under the hood it was managing the full complexity of SOAP communication with a legacy system that predates many current web standards.
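To make the retry behavior concrete, here is a minimal sketch of exponential backoff around an arbitrary async call. The base delay, cap, attempt count, and the `isRetryable` predicate are illustrative assumptions, not the values used in the project. The `wait` parameter is injectable so the logic can be exercised without real delays.

```typescript
// Delay before retry number `retryIndex` (0-based): baseMs * 2^retryIndex, capped at capMs.
function backoffDelayMs(retryIndex: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** retryIndex);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying failures that `isRetryable` classifies as transient.
// Gives up and rethrows after `maxAttempts` total attempts or on a non-retryable error.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 4,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(error)) throw error;
      await wait(backoffDelayMs(attempt));
    }
  }
}

// Usage: an operation that fails twice with a transient error, then succeeds.
let calls = 0;
const recordedWaits: number[] = [];
const retried = withRetry(
  async () => {
    calls++;
    if (calls < 3) throw new Error("transient");
    return "ok";
  },
  () => true,
  4,
  async (ms) => {
    recordedWaits.push(ms); // record instead of sleeping
  },
);
retried.then((value) => console.log(value, recordedWaits));
```

A real client would usually add random jitter to each delay and restrict `isRetryable` to faults known to be transient, so that validation errors are not retried.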
Technical Deep Dive: Handling Polymorphic Contexts
The most technically demanding aspect of this project was the CRM's polymorphic context system. Nearly every operation accepted a "context" parameter that controlled the scope and behavior of the request. The context type was nominally a single type in the WSDL, but in practice it functioned as a discriminated union with 49+ variants.
The challenge was threefold:
- Discovery: Which context types existed was not fully documented. We discovered several through SOAP faults that referenced types we had not yet encountered.
- Validation: Each context type required different fields, and sending the wrong fields for a given context type produced unpredictable results -- sometimes silent data corruption rather than clean errors.
- Maintenance: The vendor occasionally added new context types in updates without documenting them, which could break integrations that assumed a fixed set of variants.
Our Zod-based approach solved all three. The discriminated union schemas served as living documentation of every known context type. The strict parsing caught new or changed types immediately. And the modular schema structure made it straightforward to add new variants without disrupting existing integrations.
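On the TypeScript side, the same discriminated-union idea can guard request construction at compile time. The sketch below is an assumption about how such a guard might look, with three made-up variants. Each variant only ever emits its own fields, which addresses the wrong-fields problem, and the `assertNever` default makes the compiler flag any variant added to the union but not handled in the switch.

```typescript
// Hypothetical subset of context variants; real names and fields would come
// from the discovered catalog.
type KnownContext =
  | { contextType: "ACCOUNT"; accountId: string }
  | { contextType: "SERVICE"; serviceId: string }
  | { contextType: "TICKET"; ticketId: string; includeHistory: boolean };

// Reached only if a variant slips past the switch below. Because the parameter
// type is `never`, an unhandled variant becomes a compile-time error.
function assertNever(value: never): never {
  throw new Error(`Unhandled context variant: ${JSON.stringify(value)}`);
}

// Builds exactly the fields that belong to the given variant, so fields from
// one context type cannot leak into a request for another.
function toRequestFields(context: KnownContext): Record<string, string> {
  switch (context.contextType) {
    case "ACCOUNT":
      return { contextType: "ACCOUNT", accountId: context.accountId };
    case "SERVICE":
      return { contextType: "SERVICE", serviceId: context.serviceId };
    case "TICKET":
      return {
        contextType: "TICKET",
        ticketId: context.ticketId,
        includeHistory: String(context.includeHistory),
      };
    default:
      return assertNever(context);
  }
}

const ticketFields = toRequestFields({ contextType: "TICKET", ticketId: "T-9", includeHistory: true });
console.log(ticketFields);
```

Adding a fourth member to `KnownContext` without a matching `case` stops the file from type-checking, which turns an undocumented vendor addition into a visible to-do rather than a silent gap.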
Error Handling: Making Legacy Faults Meaningful
Legacy SOAP APIs have a particular talent for unhelpful error messages. The CRM system returned SOAP faults with codes like SERVERR_002 or INVALID_CONTEXT -- codes that appeared nowhere in the vendor documentation and whose meanings could only be determined empirically.
We built a comprehensive fault mapping layer that:
- Catalogs every observed fault code with its actual meaning (determined through test-driven discovery)
- Maps SOAP faults to appropriate HTTP status codes (authentication failures become 401s, missing resources become 404s, validation errors become 422s)
- Enriches error responses with human-readable descriptions and suggested remediation steps
- Flags unknown fault codes for investigation rather than swallowing them silently
This error handling layer alone saved the client's development team significant debugging time. Instead of staring at raw XML fault responses, they received structured JSON errors with clear explanations and actionable guidance.
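The shape of such a mapping layer might look like the sketch below. The catalog entries are placeholders: `INVALID_CONTEXT` appears in the CRM's faults, but the meaning and status assigned to it here are assumptions, and `SESSION_EXPIRED` and `RECORD_NOT_FOUND` are invented codes used only to show the 401 and 404 cases. Returning 502 for unknown codes is likewise a design choice for this example.

```typescript
interface CatalogEntry {
  httpStatus: number;
  description: string;
  remediation: string;
}

interface MappedFault extends CatalogEntry {
  code: string;
  known: boolean;
}

// Illustrative catalog. Meanings and statuses are placeholders, not the
// vendor's actual semantics; two of the codes are invented for the example.
const FAULT_CATALOG = new Map<string, CatalogEntry>([
  ["SESSION_EXPIRED", { // hypothetical code
    httpStatus: 401,
    description: "The CRM session is no longer valid.",
    remediation: "Re-authenticate and retry the request.",
  }],
  ["RECORD_NOT_FOUND", { // hypothetical code
    httpStatus: 404,
    description: "The requested record does not exist.",
    remediation: "Verify the identifier before retrying.",
  }],
  ["INVALID_CONTEXT", { // real code, assumed meaning
    httpStatus: 422,
    description: "The context fields do not match the context type.",
    remediation: "Check the fields required for this context type.",
  }],
]);

// Maps a SOAP fault code to a structured error. Unknown codes are reported
// through `onUnknown` rather than swallowed, and surface as 502.
function mapSoapFault(
  code: string,
  onUnknown: (code: string) => void = (c) => console.warn(`Unmapped SOAP fault: ${c}`),
): MappedFault {
  const entry = FAULT_CATALOG.get(code);
  if (entry === undefined) {
    onUnknown(code);
    return {
      code,
      known: false,
      httpStatus: 502,
      description: "The CRM returned a fault that is not yet cataloged.",
      remediation: "Capture the raw fault and add it to the catalog.",
    };
  }
  return { ...entry, code, known: true };
}

console.log(JSON.stringify(mapSoapFault("INVALID_CONTEXT"), null, 2));
```

The `onUnknown` hook is where an alert or ticket would be raised, so a new fault code starts an investigation instead of disappearing into a generic 500.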
Results
The systematic approach produced measurable outcomes across several dimensions:
- Complete API surface mapped: 12+ SOAP services with full operation coverage, documented through tests rather than stale documentation
- Integration time reduced by an estimated 60-70% for new features: developers work against the typed TypeScript layer rather than raw SOAP, eliminating entire categories of integration bugs
- Runtime validation catches API drift immediately: Zod schemas have already caught three vendor-side changes that would have caused silent data issues in production
- Zero undocumented-behavior production incidents since deployment: every API behavior we have observed is captured in the test suite and schema definitions
- Reusable methodology: the four-phase approach (enumerate, probe, validate, abstract) has since been applied to two additional legacy integration projects
A Repeatable Methodology for Legacy Integration
The most valuable outcome of this engagement was not the integration itself -- it was the methodology. The four-phase approach we developed is applicable to any legacy API integration, regardless of protocol (SOAP, XML-RPC, proprietary) or industry:
- Enumerate: Use whatever machine-readable definitions exist (WSDL, XSD, OpenAPI/Swagger) to build a complete service catalog
- Probe: Use test-driven discovery to map actual behavior, documenting findings as executable tests
- Validate: Build runtime schemas (Zod, io-ts, or similar) to enforce contracts and catch drift
- Abstract: Create a typed service layer that insulates consumers from protocol complexity
This approach turns what is typically a frustrating, open-ended investigation into a structured engineering process with clear milestones and measurable progress. Each phase produces tangible artifacts -- catalogs, tests, schemas, clients -- that serve as both documentation and regression protection.
If your organization is facing a similar challenge with undocumented or poorly documented legacy systems, we have a proven approach for turning those systems into reliable, maintainable integration surfaces. Learn more about our custom development services or read our guide on legacy system modernization strategies.
Ready to Modernize Your Legacy Integrations?
Legacy systems do not have to be black boxes. With the right methodology, any undocumented API can be systematically mapped, validated, and wrapped in a modern, developer-friendly interface. Whether you are dealing with SOAP, XML-RPC, proprietary protocols, or any other legacy integration surface, we can help you build reliable, maintainable integration layers that unlock the data trapped in your existing systems.
Contact us to discuss your legacy integration challenges. We will assess your existing systems and outline a practical path to modern, reliable integrations -- without requiring a full platform replacement.