Raman spectroscopy has been having a bit of a moment in upstream process development. It’s one of the most promising tools we have for getting closer to real-time biological understanding inside a bioreactor. But it’s also one of the most frustrating, and one of the hardest to get right.
Over the past year, in conversations with dozens of upstream PD groups across mammalian, microbial, fed-batch, and perfusion systems, the same pattern kept coming up: teams wanted Raman-level insights, but were hesitant to adopt Raman because of the cost, overhead, and specialized modeling burden it requires.
tl;dr - Raman is amazing, but you can get a lot of the same insights from neural-net-based soft sensors, with software only.
What is Raman?
Raman spectroscopy is like shining a flashlight on a material and listening to how its molecules vibrate. The Raman device uses an optical fiber to shine a laser at your broth and captures the scattered light with a tiny digital camera, creating a spectral fingerprint of everything in your solution.
The raw readings are a spectrum of light intensities at different ‘Raman shifts’: increases or decreases in photon energy caused by photons interacting with the vibrational or rotational motion of the different molecules in the material.
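For reference, the ‘shift’ is just the difference in energy between the laser photons going in and the scattered photons coming out, usually reported as a wavenumber (with wavelengths in cm):

```latex
\Delta\tilde{\nu}\,[\mathrm{cm^{-1}}] = \frac{1}{\lambda_{\text{laser}}} - \frac{1}{\lambda_{\text{scattered}}}
```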
But this ‘shift fingerprint’ is not what you actually want from it. What you really want is to quantify some component of interest - a carbohydrate, an amino acid, the produced mAb, or general biomass. Each has a different effect on the fingerprint, and different quantities of it change the fingerprint in some (as yet unknown) way.
The next step is to take these spectral fingerprints and correlate changes in them with changes in your molecule of interest. That’s where modeling comes in.
To build a model you:
- Run batches where you take Raman spectra continuously.
- Measure your molecule of interest offline (the “labels”).
- Use PLS, PCA-based regression, or other chemometric methods to correlate the fingerprint with the molecule’s concentration (a rough sketch of this step follows the list).
- Validate the model.
- Pray it behaves when you change anything.
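As a rough illustration of the correlate-and-validate steps, here is a minimal sketch using scikit-learn’s PLS regression. The file names, the glucose label, and the number of components are placeholders, and a real chemometric pipeline adds preprocessing (baseline correction, normalization) that is omitted here.

```python
# Minimal PLS calibration sketch (illustrative only, not a production pipeline).
# Assumes spectra.npy holds an (n_samples, n_shifts) matrix of Raman spectra
# and glucose.npy holds the matching offline glucose measurements ("labels").
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X = np.load("spectra.npy")   # spectral fingerprints
y = np.load("glucose.npy")   # offline reference values

# Hold out part of the data so the model can be validated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Correlate the fingerprint with concentration.
pls = PLSRegression(n_components=10)
pls.fit(X_train, y_train)

# Validate on the held-out spectra.
print("held-out R^2:", r2_score(y_test, pls.predict(X_test).ravel()))
```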
When it works, it’s beautiful. You can imagine a world where you always know:
- when glucose dips below threshold,
- when ammonia spikes,
- when lactate builds up,
- how your biomass is trending hour by hour.
It’s a level of visibility that compresses development timelines and improves control strategies dramatically.
So why aren’t we all using Raman?
As powerful as Raman is, it is also rather finicky.
Failure can be triggered by any of the following incredibly small environmental or operational shifts:
- A bit of ambient light leaking in.
- A slightly different media lot.
- Using a different scale of bioreactor.
- Swapping cell lines.
- Changing feed composition.
Models that work beautifully on System A fall apart when moved to System B. Models that work at 2 L need to be retrained at 50 L. Transferability exists, but as the exception, not the rule.
As if that weren’t enough, in addition to an already pricey bioreactor you also need:
- additional probes that practically double the price of your new bioreactor
- trained Raman specialists
- specialized calibration runs
- someone comfortable doing chemometric modeling
- someone who can maintain and monitor drift in the models
Soft sensors as an alternative route to Raman-level insights
After enough conversations with teams about Raman, you start hearing the same tension over and over again. People want the insight: the metabolite trajectories, the ability to see metabolic shifts in real time, and the comfort of knowing they’re not flying blind between daily samples. But they don’t want the operational reality of actually running Raman: the model brittleness, the re-training cycles, the cell-line dependency, and the constant calibration work. And frankly, most teams don’t have the bandwidth or internal expertise to babysit a chemometric model every time a media lot changes.
So the question that kept nagging me was whether we really needed Raman to get the kind of visibility people were chasing. Not the spectroscopy itself — the insight. If the objective is to understand metabolism continuously, do we actually need a laser and a spectral fingerprint? Or is the reactor already giving us enough information in the signals we look at every day?
This is what led us down the soft sensor path.
The idea behind soft sensors is simple: if you show a model enough historical examples of how the process signals you already collect correlate with offline measurements, it can learn to estimate metabolites in real time without additional hardware.
A bioreactor is constantly reacting to biology, and those reactions propagate through the control system in ways that are surprisingly consistent. When cells consume glucose faster, oxygen uptake changes. When lactate starts to build, the pH controller behaves differently. When biomass increases, the gas-transfer dynamics shift. These signals are messy and noisy and often counterintuitive, but they are not random. They carry the imprint of what the cells are doing metabolically.
If you can learn the relationship between these patterns and the corresponding offline measurements — glucose, lactate, ammonia, cell density, metabolic rates — then you can estimate the metabolic state in real time without additional hardware.
Soft sensors rely on temporal models that treat bioprocesses the way they actually behave — as continuous, evolving systems. You take the continuous online signals you already have — DO, pH, gas flows, agitation, temperature, feed additions, controller outputs — and you train a temporal model to learn the mapping between those signals and your offline labels.
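Here is a minimal sketch of what such a temporal model can look like, in PyTorch. The window shapes, the signal count, and the synthetic data are stand-ins, not our actual architecture or data.

```python
# Minimal temporal soft-sensor sketch (illustrative, with synthetic stand-in data).
import torch
import torch.nn as nn

# Stand-in data: 200 windows, each 60 timesteps of 7 online signals
# (DO, pH, gas flows, agitation, temperature, feed rate, controller output).
windows = torch.randn(200, 60, 7)
labels = torch.randn(200)            # offline label at the end of each window

class SoftSensor(nn.Module):
    """GRU over the online-signal window, regressing the offline label."""
    def __init__(self, n_signals: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_signals, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, timesteps, n_signals)
        _, h = self.gru(x)           # h: (1, batch, hidden), last hidden state
        return self.head(h[-1]).squeeze(-1)

model = SoftSensor(n_signals=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(windows), labels)   # map online signals -> offline label
    loss.backward()
    optimizer.step()
```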
What often gets overlooked is that soft sensors rely on the quality of your historical, context-rich data.
If time stamps drift, if sampling events don’t align, if feeding changes aren’t captured, or if each reactor exports data differently, the model never generalizes. Most soft-sensor failures are data failures, not modeling failures.
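A small sketch of that alignment work, assuming pandas DataFrames with hypothetical file and column names:

```python
# Aligning offline assays to continuous online signals (illustrative).
# File and column names ("batch_id", "timestamp", ...) are hypothetical.
import pandas as pd

online = pd.read_csv("online_signals.csv", parse_dates=["timestamp"])   # DO, pH, gas flows, ...
offline = pd.read_csv("offline_assays.csv", parse_dates=["timestamp"])  # daily samples

# Put the online signals on a uniform 1-minute grid, per batch.
online = (
    online.set_index("timestamp")
          .groupby("batch_id")
          .resample("1min")
          .mean(numeric_only=True)
          .reset_index()
)

# Attach each offline sample to the nearest online reading within 10 minutes.
aligned = pd.merge_asof(
    offline.sort_values("timestamp"),
    online.sort_values("timestamp"),
    on="timestamp",
    by="batch_id",
    tolerance=pd.Timedelta("10min"),
    direction="nearest",
)
```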
How BioRaptor enables soft sensors
BioRaptor solves the foundational problem: capturing accurate, context-rich data.
Because BioRaptor already ingests, harmonizes, and contextualizes all online and offline bioprocess data, it provides the infrastructure soft sensors require. The platform continuously captures DO, pH, gas flows, agitation, temperature, feeding behaviors, controller setpoints, and the entire operational context of each run. It aligns offline assays automatically, normalizes sampling gaps, handles metadata inconsistencies, and reconstructs the timeline of each batch with the kind of precision no one has time to do manually. By the time you start training a model, the heavy work — the part that determines whether the model will ever generalize — is already done. This is the real story behind how we developed our soft-sensor capabilities.
How accurate are the numbers from soft sensors?
I want to be concrete about performance because I think a lot of technology gets oversold in this space. We've now built soft sensors for a handful of different customers across mammalian and microbial systems, fed-batch and perfusion modes, and the results are pretty consistently in the R-squared range of 0.75 to 0.91 depending on what you're trying to estimate.
For something like viable cell density, where you would assume the online signals have no chance of reflecting biomass meaningfully, the model often lands around an R² of 0.85–0.91 on unseen batches. That still surprises people when they first see it, because it feels like magic until you remember that metabolism is not subtle. Cells shift oxygen use, acidification, and gas-transfer behavior in ways the reactor already measures. We simply never had a good way to read it.
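The “unseen batches” part matters: validating on random rows instead of whole held-out batches flatters the numbers, because neighboring timepoints from the same batch are nearly identical. A minimal sketch of batch-wise validation with synthetic stand-in data (and a simple regressor standing in for the temporal model):

```python
# Leave-one-batch-out validation (illustrative, synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))                    # per-timepoint features
y = 2.0 * X[:, 0] + rng.normal(0, 0.3, size=600)  # stand-in label (e.g. VCD)
batches = np.repeat(np.arange(6), 100)            # which batch each row came from

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=batches):
    model = GradientBoostingRegressor().fit(X[train_idx], y[train_idx])
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))

print("per-held-out-batch R^2:", np.round(scores, 2))
```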
One of the more interesting findings across projects is how well the same core architecture adapts to very different biological systems. Perfusion behaves differently from fed-batch. Microbial behaves differently from mammalian. Some systems have sharp metabolic shifts; others drift slowly. And yet the network architecture doesn’t need to change much. What changes is the data it sees. The model learns the vocabulary of each system’s behavior. As long as the bioreactor’s control system is reacting to biology, the signals contain structure.

Eliminating the sampling-artifact problem
Offline sampling introduces measurement variability that clouds your biological signal: sample degradation, differences in operator technique, analyzer calibration drift. For stable metabolites like glucose this adds 5-8% CV, but for labile metabolites like glutamine or ammonia you're looking at 15-20% CV just from measurement artifacts.
Soft sensors eliminate sampling artifacts and provide estimates every few minutes versus once daily, making it easier to distinguish real trends from noise. One customer optimizing ammonia control had offline measurements with ~18% CV - too variable to tell what actually helped. Continuous estimates cleared up the signal enough for meaningful optimization.
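A toy simulation makes the point: with the same ~18% CV per measurement, five daily samples and a few hundred frequent estimates pin down a slow trend very differently. The numbers below are made up purely for illustration.

```python
# Toy illustration: daily samples vs. frequent estimates at the same ~18% CV.
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(0, 120, 0.25)            # 5 days on a 15-minute grid
true_ammonia = 1.0 + 0.02 * hours          # slowly rising trend (arbitrary units)

daily = true_ammonia[::96] * (1 + rng.normal(0, 0.18, size=5))        # once a day
frequent = true_ammonia * (1 + rng.normal(0, 0.18, size=hours.size))  # every 15 min

# A straight-line fit through each series shows how much better the trend
# is resolved when there are many estimates, even equally noisy ones.
slope_daily = np.polyfit(hours[::96], daily, 1)[0]
slope_frequent = np.polyfit(hours, frequent, 1)[0]
print(f"true slope: 0.020, daily fit: {slope_daily:.3f}, frequent fit: {slope_frequent:.3f}")
```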
There's also a psychological shift. Daily measurements create recency bias. Continuous trends make it obvious whether something is real change or noise. Your mental model shifts from "what did today's sample say" to "what does the trajectory look like."
How this stacks up against Raman
The comparison to Raman is honestly pretty nuanced and I don't think it's an either-or situation. Raman is doing something fundamentally different - it's directly measuring molecular vibrations spectroscopically, which means if you build the right models you can quantify a much wider range of components, not necessarily related to metabolism.
What makes soft sensors compelling is that they remove an entire category of operational burden. There's zero additional capital expenditure beyond what you're already spending on standard instrumentation. No one on your team needs to become a Raman expert. You don't need to multiplex hardware across vessels. And while the brittleness problem still exists, a soft sensor tends to be far more stable than a spectral model trained under idealized conditions.
Where soft sensors make sense versus where they don't
I think soft sensors make sense precisely where Raman doesn’t.
Early process development seems like a natural fit. You're trying to understand metabolic patterns and optimize feeding but can't yet justify a $500k PAT infrastructure investment and training runs. The barrier to trying soft sensors is basically just your data infrastructure.
Scale-up is another scenario that makes sense. Moving from bench to pilot to manufacturing means installing and recalibrating hardware at each scale, dealing with different mixing and mass transfer regimes. Having monitoring that's based on the data every bioreactor already generates potentially transfers more smoothly, though you'd obviously still validate at each scale.
For processes where sampling is risky, expensive, or simply not feasible - whether that's because of contamination concerns, difficult-to-access cultures, or just high-value small batches where every sample matters - being able to maintain metabolic visibility without routine sampling has real operational value.
And I'll be blunt, for teams that tried Raman and decided it wasn't worth the overhead for their particular application, soft sensors offer a way to recover some of that value proposition without the same operational load.
Where soft sensors probably don't help much is for substances that don't participate in or influence any metabolic activity. The information just might not be there in your pH and dissolved-oxygen traces in a distinguishable way.
The future of Raman without Raman
My suspicion is that, for a lot of teams, the future actually involves using both technologies at different stages, perhaps even with the two complementing each other.
You might run Raman on your key development bioreactors where you're doing intensive optimization and building detailed models, then deploy soft sensors across your manufacturing fleet, where putting Raman on every vessel doesn't pencil out financially but you still want real-time visibility. You can even use Raman data to train our soft sensors and gain the same insights in vessels where Raman isn't installed.
The real competition isn't between these approaches. It's between having real-time metabolic information versus flying blind between daily offline samples. Any technology that moves you toward continuous process understanding has value, and different solutions make sense for different applications and development stages.
For teams that have been hesitant about Raman because of cost or complexity, soft sensors offer a practical entry point using infrastructure you already have. For teams already using Raman, soft sensors can extend those capabilities more broadly. Either way, the goal is making better decisions faster based on what's actually happening in your cultures right now, not what happened yesterday when you pulled that sample.
So if you've been on the fence trying to decide whether Raman warrants the time and money, talk to us: there's now a clear alternative.