In the late 1800s, the U.S. railroad system didn’t just revolutionize transportation—it catalyzed entirely new industries. Refrigerated railcars enabled a national meatpacking economy. Standard time zones emerged. Towns, banks, and industrial supply chains took shape around the infrastructure. In short, once the railroads laid the tracks, new economic possibilities exploded.
We are witnessing something similar in healthcare today. The convergence of powerful forces has opened a new frontier for how data can be used to accelerate innovation in medicine and care delivery. These forces include the mass digitization of health records through the EHR Incentive Program, policy pushes like the 21st Century Cures Act mandating interoperability, the explosion in computing power and cloud infrastructure, breakthroughs in AI and LLMs, and record levels of venture investment in digital health infrastructure.
Together, these forces are enabling a new kind of infrastructure: health data platforms that aggregate, normalize, curate, and make deidentified health records accessible for use across the healthcare ecosystem.
The promise of this infrastructure is hard to overstate. Much of modern medicine has come about through rigorous, time- and labor-intensive clinical trials. Data infrastructure that lets researchers understand how procedures, devices, and therapeutics are actually used in the real world (often in ways that vary from clinical trial protocols) has the potential to drive a paradigm shift.
Much like the railroads of old, these platforms are enabling entirely new markets to emerge around them—including the rapidly expanding space of real world data (RWD), which promises to change how innovation is done in life sciences, medical devices, health services, and beyond.
II. The Market for Deidentified Health Data: How It Evolved, What It Enables
The market for deidentified health data has evolved quickly, shaped by both policy and technological change. Initially, real-world data primarily meant insurance claims data: standardized, billable events that offered a structured and wide but shallow view of patient care.
Claims data has long been valuable to payers and life sciences companies for market access strategies, cost forecasting, and utilization management. However, its limitations have grown obvious. Claims data lacks clinical depth, has long lag times, often omits medication details, and suffers from fragmentation across payers and high churn. Most significantly, it excludes uninsured patients and those paying out of pocket, which introduces systemic bias.
More recently, novel data sources have entered the market. Electronic Health Records (EHR) data is rich in clinical nuance, enabling use cases like outcomes analysis, protocol adherence, and real-time clinical trial identification. Personal health records (PHRs), wearables and fitness trackers, and patient-reported outcomes collected through digital platforms have expanded the landscape further, offering insight into lifestyle, adherence, and real-world effectiveness.
As the scope and variety of available data have expanded, so has the use case landscape. Pharmaceutical companies use RWD to complement clinical trials. Medtech firms analyze post-market surveillance data. Payers examine longitudinal cost patterns. Researchers use RWD to explore health disparities or intervention efficacy. The result is a demand-rich environment, but one where the infrastructure to access, manage, and use data is still in its formative stages.
III. The Emerging Health Data Platform Ecosystem
Some of the infrastructure to support RWD has existed for decades. Well-established incumbents such as IQVIA and Symphony Health, for example, have assembled large datasets through longstanding partnerships with claims clearinghouses and payers. Their scale is formidable, but their focus on claims-based data leaves substantial room for improvement.
That opportunity for improvement comes against the backdrop of health systems spending more than a decade investing capital and labor-intensive change management effort to adopt EHRs and, more recently, to optimize how they use those systems. In the meantime, health systems have struggled through a series of crises: Covid-19, clinician burnout, rising labor and supply costs, and challenging patient volume trends.
As health systems grappled with operational challenges, others acted on new opportunities created by digital health records and connectivity. Some EHR vendors built data businesses. Tempus AI built one as well, combining its own genetic testing data with records from referring providers.
As data from more digital sources became available, the challenge shifted to aggregating and harmonizing that data to make it usable. Now, there is an emerging set of startups that are focused on enabling providers, and in some cases individuals, to take control of the data that they generate and are stewards of.
Mitesh Rao, a physician and Adjunct Professor of Emergency Medicine at Stanford School of Medicine, started OMNY Health to address the frustration he had experienced as a researcher. “I would consistently see that our [provider] data had powerful opportunities for both research, advancing healthcare, but getting that data out at scale, being able to build those partnerships was a constant struggle,” he explained last year.
More recently, a wide array of new entrants have taken aim at building next-generation platforms to “get the data out at scale”. These include:
- Venture-backed firms such as OMNY Health, Briya, Komodo Health, and Evidation Health, each bringing novel approaches to sourcing and structuring data.
- Truveta, a health system consortium-backed data company with over 30 health system members and backing from Providence and Microsoft.
- Mayo Clinic Platform, a health system-led initiative that is building data infrastructure not for data resale, but to enable a marketplace for artificial intelligence in healthcare.
- EHR vendors such as Veradigm and, notably, Epic, which has been building its Cosmos data infrastructure and recently announced new capabilities.
These companies differ not just in their origins but in how they create value. Some focus on data liquidity, others on analytics, still others on technology enablement.
IV. Models of Differentiation: Infrastructure, Networks, and Use Cases
The breadth of players in this space means that their business models and strategic bets diverge significantly. Some, like OMNY, Briya Health, and Truveta, are creating two-sided marketplaces connecting clinical data sources (like hospitals) to data users (pharma, medtech, payers). Their core value proposition lies in surfacing rich new data sets that have historically been locked within siloed EHRs. By creating technology rails for this exchange, they provide infrastructure and tools for data sources, while allowing data users to discover, access, and derive value from real world clinical data.
Then there are platforms like Komodo Health and PurpleLab, which aggregate both claims and clinical data, often from third-party sources (including those above) rather than directly from health systems. These companies are betting that full-stack solutions, which combine analytics tools, visualizations, and machine learning capabilities with professional services, will appeal to data users. The idea is that as data access becomes commoditized, differentiation will come from how well a company can help customers make sense of that data.
Evidation Health is taking another approach entirely, building a direct-to-consumer network. “Evidation is built on a different foundation — creating direct, longitudinal relationships with individuals who explicitly permission their data for use in research,” explained CEO Leslie Oley Wilberforce by email. The company provides technology tools and services to individuals, allowing them to aggregate their wearables, fitness, and health data. Evidation makes money in part by helping life sciences firms design and conduct real world studies with members of its population who opt in, and it shares a portion of that revenue with consumers. “We believe individuals should receive clear value in return, whether through compensation, health insights, or the ability to contribute to research that matters to them,” noted Oley Wilberforce.
Mayo Clinic Platform represents yet another approach. Rather than monetizing deidentified data, the platform provides a secure environment where third-party developers can build, test, and train AI models based on data from Mayo’s global network of data partners. The value here is in safe, privacy-preserving data access for algorithm development, not resale.
V. Strategic Considerations in a Network-Driven Market
One of the defining features of this market is that value creation depends on network effects. The more data sources a platform connects, the more valuable it is to data users. Conversely, the more high-value data users a platform attracts, the more appealing it becomes to hospitals and providers as a revenue or research channel. Sustaining both sides of the network is the strategic challenge.
Several key decisions influence how these dynamics play out:
- Data Composition: Companies must decide which types of data to specialize in. Claims data offers scale and a longitudinal perspective; clinical data offers depth; imaging, labs, and unstructured notes offer robust specifics but come with processing challenges.
- Incentive Models: Platforms must offer data sources a reason to join. OMNY and Briya offer direct revenue shares to participating hospitals, generating high-margin income for financially strained providers. Truveta offers something else entirely: “A key differentiator is Truveta’s capital structure. Its healthcare system members… in exchange for an equity stake and percentage of profits, each makes a financial investment.” Mayo Clinic Platform doesn’t appear to offer direct compensation but provides preferred access to AI tools trained on contributed data.
- Governance Models: Trust is critical. Some platforms take custody of data and manage access centrally. Others, like Briya, allow hospitals to retain control, with CEO David Lazerson explaining by email that health systems “can keep their data securely on-site, reducing compliance risk, which makes them more willing to collaborate and share information.” This distributed approach appeals to data sources, but may reduce confidence among data users who are looking for reliable data access.
- Data Security and Privacy: Given the sensitivity of health data, platforms must maintain rigorous standards and transparency in how data is stored, deidentified, and accessed. Mayo Clinic Platform uses a “Data Behind Glass” model, a proprietary system with various technical controls. “Mayo Clinic neither owns nor wants to own the data from our partners,” writes Mayo Clinic Platform President John Halamka along with coauthor Paul Cerrato. Differentiation here is as much about trust as it is about technology.
VI. Business Model Headwinds
Despite growth potential, these companies face meaningful business challenges:
- Lumpy Revenue: Demand for RWD tends to be driven on a case-by-case basis, depending on the shifting priorities within life sciences companies. This means data platform revenue is largely tied to project-specific deals, which can be hard to forecast. This creates volatility and makes long-term investment planning difficult.
- Pricing Complexity: As discussed earlier in this article series, there is no standardized pricing model for deidentified data. This makes deal negotiations complex and expectations with data sources hard to manage.
- Strategic Uncertainty: The data that clients want today may not be what they want tomorrow. Clinical data may be preferred for its detail, but closed claims data may be more complete for understanding patient journeys. Platform investments must balance present utility with future flexibility.
- Cross-Side Dependencies: Two-sided platforms must constantly balance the interests of both sides. Empowering data sources with more control can limit the availability of data to users. Prioritizing user access may erode provider trust. Navigating these tradeoffs is not a one-time decision but an ongoing balancing act.
The biggest challenge, however, may come not from the natural difficulties of building a platform business, but from an established incumbent and the weight it carries in the market.
VII. An Epic Data Challenge
In 2019, Epic formally introduced Cosmos as its enterprise data collaboration initiative, threading together deidentified, longitudinal patient data contributed by participating health systems. Epic’s intent was to offer a “commons” of clinical information across its installed base, with query tools, analytics, and insight services layered on top. Over time, Cosmos has scaled aggressively: it now claims coverage of hundreds of millions of patients drawn from “hundreds of participating health care systems” nationwide. Earlier academic reviews described it as a “rapidly growing EHR vendor-facilitated data collaboration.”
Epic has positioned Cosmos as a multipurpose backbone spanning three key domains. First, it supports research and real-world evidence: institutions can run deidentified cohort queries, epidemiologic studies, comparative effectiveness analyses, and multicenter observational work. Second, Cosmos supports point-of-care insight tools: features like “Best Care Choices” or “Look-Alikes” allow clinicians to see what interventions or outcomes similar patients experienced in the aggregate Cosmos pool. Third, it now undergirds Epic’s AI and predictive modeling ambitions (e.g. pretraining with Comet, composing patient trajectory estimations via Comet, embedding agents into workflows).
The competitive implications for real-world data (RWD) platforms are profound. Epic holds a structural advantage: it is the incumbent EHR supplier for more than 40% of hospital systems. That means Epic already “owns the pipes” – it can natively collect, normalize, and operationalize clinical data at scale, and push models or agents directly into the workflows of its customers. Compared to independent RWD players like Briya, OMNY Health, or the Mayo Clinic Platform, which must negotiate data access, ingest heterogeneous sources, and integrate with non-Epic systems, Epic’s advantage is not just technical but institutional. Also important: Epic is privately held, meaning it is not beholden to quarterly earnings calls, short-term investor pressure, or public disclosure. That gives it the privilege of patience: it can invest heavily in long-cycle R&D and internal strengthening of its data assets, even if return is slow or uncertain.
VIII. Realizing the Promise of Real World Data
The promise of real world data is profound. It could enable more inclusive clinical research, faster evidence generation, better regulatory submissions, and more personalized medicine. But that promise will only be fulfilled if the platforms building this ecosystem can execute on several fronts simultaneously:
- They must build and maintain trust across a fragmented landscape of data providers and users.
- They must develop sustainable business models that attract long-term capital and deliver real value to stakeholders.
- They must navigate a shifting regulatory and ethical landscape with transparency and integrity.
- And they must do all this while aligning their offerings with the evolving expectations of regulators like the FDA, which is gradually but meaningfully embracing RWD as a complement to traditional evidence generation.
The stakes are high. But so is the potential. Much like the railroads of the 19th century, today’s health data platforms are laying the tracks for a new kind of economy – one where insight, evidence, and innovation move faster and more freely than ever before.