Great AI needs less ego

why your model is still just a guess without domain knowledge

my Master’s in data science 9 years ago was when I first saw how machine learning could be applied across industries. back then, the theory was ahead of what industry was ready for, as it still is now. i was one of the oldest in my cohort. 5 years in audit. no coding background. no computer science training.

i was fresh off the boat, a migrant in a completely foreign land: my industry knowledge stuffed into my suitcase, bleary-eyed in front of new (programming) languages, new tools, a new way of thinking.

what saved me in that newfound land, besides my over-optimism, was just that - my suitcase full of trinkets. bits and pieces of industry context from working with different clients across sectors. and that turned out to matter more than I expected.

very quickly, it became clear:

no matter how good your algorithm is, or how well you engineer it - the model is only as good as the domain knowledge that shapes it.

that gap between model and domain shows up even in the most advanced systems.

i recently watched The Thinking Game, a documentary on DeepMind and AlphaFold. the problem they were tackling was protein folding, predicting a protein’s 3D structure purely from its amino acid sequence. something scientists had been trying to solve for decades.

in 2018, AlphaFold beat the second-place team in the Critical Assessment of protein Structure Prediction (CASP) by nearly 50%. on paper, that sounds like a breakthrough. but in reality, they still had a long way to go.

the predictions weren’t consistent. quality varied. and more importantly, they weren’t reliable enough for biologists to actually use. it was still closer to a very good guess than something you could build real science on.

so despite the huge margin, it was a humbling moment. the model was impressive, but it didn’t yet understand the problem.

beating benchmarks is not the same as solving the problem.

in 2019, they made a shift. they started incorporating domain knowledge more seriously. up until that point, the team was mostly engineers. strong ones, but nobody deeply trained in biology. they were trying to solve a biological problem without fully speaking the language.

researchers like Dr. Kathryn Tunyasuvunakool, a structural biologist - and notably one of the very few women in the room - became critical to the process. not to code the model. but to shape how the problem was defined, what signals actually mattered, and how biological constraints should be embedded into the system.

they ended up rewriting large parts of the data pipeline because the issue wasn’t just the model. it was what the model was being fed, and how it was being asked to learn.
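
to make “embedding biological constraints” concrete, here is a toy sketch of the general pattern - my illustration, not DeepMind’s code (AlphaFold’s actual losses are far more involved). take a hard fact from the domain, like the roughly 3.8 Å spacing between consecutive backbone atoms in a protein, and write it straight into the training objective:

```python
import torch

# a hard fact from biology: consecutive C-alpha atoms along a protein
# backbone sit roughly 3.8 angstroms apart. the model shouldn't have to
# rediscover this from data.
CA_CA_DIST = 3.8

def backbone_spacing_penalty(coords: torch.Tensor) -> torch.Tensor:
    """Soft penalty on predicted C-alpha coordinates (num_residues, 3)
    whose neighbour-to-neighbour distances drift from ~3.8 angstroms."""
    deltas = coords[1:] - coords[:-1]          # vectors between neighbours
    dists = deltas.norm(dim=-1)                # (num_residues - 1,)
    return ((dists - CA_CA_DIST) ** 2).mean()

def total_loss(pred_coords, true_coords, lam=0.1):
    # plain coordinate error plus the domain-knowledge term
    data_loss = (pred_coords - true_coords).pow(2).sum(-1).mean()
    return data_loss + lam * backbone_spacing_penalty(pred_coords)
```

the penalty doesn’t make the optimiser smarter. it narrows the search to structures a biologist would recognise as physically possible.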

the model didn’t suddenly become smarter. the team understood the problem better.

finally, at CASP in 2020, accuracy improved dramatically, and the biological community deemed it a viable solution. scientists can now use the tool in practice to predict how a protein folds into its 3D structure - to understand disease mechanisms, test hypotheses faster, and design experiments (and drugs) that bind to specific parts of a protein with much higher confidence.

i felt that part was almost glossed over. and that’s not unique to this case. it happens everywhere.

we celebrate the model.
we celebrate the engineering.
but the domain knowledge that makes the system work is often invisible.

i see this firsthand in my work as a data scientist at an MNC. the products that actually take off - the ones that become usable in real operations - are the ones built closely with domain experts. not just weekly progress check-ins or surface-level input, but deep involvement in how the problem is framed, how decisions are made, and how the system should behave. everything else eventually falls apart.

and now, in my own startup, domain knowledge takes centre stage. i work closely with my co-founder, someone who has been on the ground and understands the workflows, the constraints, the real pain points.

we don’t start with the model. we start with:

  1. what does solving this problem actually look like?
  2. what decisions are being made?
  3. what information matters?

only then do we translate that into data structures, system design, and algorithms. because in the end, building AI products isn’t just about intelligence. it’s about translation.

translating messy, real-world knowledge into something a system can understand and act on. you can have the best model in the world. but if you don’t understand the problem, you’re just building a very sophisticated guess.
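
to make that concrete, here is a minimal sketch of what “starting from the decision” can look like in code. the names and fields are hypothetical, not our actual product; the point is that this spec exists, agreed with the domain expert, before any model does:

```python
from dataclasses import dataclass, field

# hypothetical example - names and fields are illustrative
@dataclass
class DecisionSpec:
    """What we agree with the domain expert before any modelling starts."""
    decision: str                  # what decision is actually being made?
    made_by: str                   # who makes it today, and how often?
    inputs: list[str] = field(default_factory=list)       # information that matters
    constraints: list[str] = field(default_factory=list)  # hard rules the system must respect
    success: str = ""              # what "solved" looks like, in the expert's words

spec = DecisionSpec(
    decision="approve or escalate a supplier invoice",
    made_by="ops lead, ~200 times a day",
    inputs=["invoice amount", "supplier history", "contract terms"],
    constraints=["never auto-approve above the ops lead's sign-off limit"],
    success="the ops lead only reviews the genuinely ambiguous cases",
)
```

the data structures and the model fall out of the spec, not the other way around.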

Stay KLAR

KLAR - from the German for “clear.”
On content, AI, and the systems behind real results. Without hype, hacks, or shortcuts.
