Your Firm Has a Knowledge Problem. RAG Is the Infrastructure Fix.

I built a working version of it using Keppel Data Centres to see how far it could go.

Apr 23, 2026

Six years of ownership. A concentrated position. A PM who wants a divestment memo tomorrow. You’re a new analyst who has just realized that your predecessor on this company quit three months ago, leaving behind 24 folders of unlabelled PDF files.

You read what you can find, call whoever you know and hope you haven’t missed anything important. You ask ChatGPT for help but triple-check every stat – last time, it invented a number that nearly cost you your job. Deep into your fifth coffee and third set of eyedrops, you’re sixty percent sure you have everything you need. Sixty percent. For a potential multi-million-dollar deal.

For Singapore asset managers (AMs), this is part and parcel of the research cycle.

The latest AI models are incredibly intelligent. Ask about covenant structures in private credit, or REIT capital recycling dynamics in Southeast Asia, and you’ll get a very impressive answer. But intelligence on its own is overrated, especially when you’re trying to get real work done.

Frontier models have two real problems. They make things up, presenting fabricated numbers as gospel truth. And they know nothing about your private data. That’s why most PMs tend to be sceptical about what AI can do for them.

Enter RAG.

Retrieval-Augmented Generation (RAG) combines the best of both worlds – state-of-the-art reasoning and your own internal documents: board packs, deal memos, IC minutes, broker reports and so on. It makes intelligence actually useful. The end product doesn’t just know finance in general, but your portfolio specifically.

To test it out, I built a RAG chatbot grounded in Keppel DC REIT’s official financial and operational reports. I was testing for accuracy and, more importantly, whether it would tell me when it didn’t have the data I was asking for.

Here’s what I found.

Click here to see the Dashboard:

Click here to see the Documentation:

What makes RAG useful

Before I get to the build, two properties of this architecture are worth understanding.

First, it significantly reduces hallucination risk. LLMs can make things up, particularly when they don’t know something. For a consumer chatbot that’s annoying. For an IC memo or a compliance decision it’s a serious liability. RAG constrains the model to answer from documents you’ve explicitly provided. Every claim can be traced back to a specific source, page, and paragraph. That traceability matters enormously when someone asks you to justify a call.
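To make the traceability point concrete, here is a minimal sketch of the retrieve-then-cite step. It is not the code behind the dashboard: the bag-of-words similarity is a stand-in for a real embedding model, and every name in it (Chunk, retrieve, build_prompt, the file names and the sample text) is a hypothetical placeholder rather than real data.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str     # file name of the originating document (placeholder)
    page: int
    paragraph: int
    text: str


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[tuple[float, Chunk]]:
    """Return the k highest-scoring chunks together with their scores."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, Chunk]]) -> str:
    """Assemble a prompt that restricts the model to tagged excerpts."""
    excerpts = "\n".join(
        f"[{c.source}, p.{c.page}, para {c.paragraph}] {c.text}" for _, c in hits
    )
    return (
        "Answer using only the excerpts below and cite the bracketed tag "
        "for every claim. If the excerpts do not contain the answer, say so.\n\n"
        f"{excerpts}\n\nQuestion: {query}"
    )


# Placeholder corpus: invented file names and text, not real figures.
CORPUS = [
    Chunk("annual_report_a.pdf", 12, 3, "Debt maturity profile and refinancing plans are discussed here."),
    Chunk("annual_report_a.pdf", 40, 1, "Portfolio occupancy by market is tabulated here."),
    Chunk("sgx_filing_b.pdf", 2, 1, "Announcement of a distribution to unitholders."),
]

question = "What is the debt maturity profile?"
hits = retrieve(question, CORPUS, k=1)
prompt = build_prompt(question, hits)
```

Because each excerpt enters the prompt with its source, page and paragraph attached, any citation the model returns can be checked against the original file. The prompt wording is only one way to phrase the constraint.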

Second, your documents stay under your control. When you use a public LLM interface and paste in a board pack or an internal research note, you’re sending that content somewhere. Depending on the provider and the settings, it may be used to improve the model. For a consumer query that’s fine. For non-public information about a private portfolio company, it isn’t. RAG keeps the knowledge base on your own infrastructure, and only the passages retrieved for a given question are passed to the model. Provided the model is self-hosted or accessed under terms that rule out retention and training, it reasons over your documents without storing them or training on them. You get the intelligence layer without the data exposure. Clean separation.

You can also layer access controls. A junior analyst queries public filings. A PM queries everything including restricted deal memos. A compliance officer has a view tuned to covenant documents and regulatory records. One knowledge base, segmented access.
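One way the segmented access above could work is to filter the knowledge base by role before retrieval runs, so restricted text never reaches the prompt of an under-cleared user. The sketch below assumes a three-tier sensitivity scheme and a hard-coded role policy; both are illustrative, and a real firm would pull them from its own entitlement system.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers, ordered from least to most restricted.
TIERS = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical policy: role -> (highest tier allowed, optional tag focus).
# A focus of None means no tag filter is applied.
ROLE_POLICY = {
    "junior_analyst": ("public", None),
    "compliance": ("restricted", {"covenant", "regulatory"}),
    "pm": ("restricted", None),
}


@dataclass(frozen=True)
class Document:
    name: str
    tier: str
    tags: frozenset[str] = frozenset()


def visible_documents(role: str, docs: list[Document]) -> list[Document]:
    """Return the documents a role may retrieve from, preserving order."""
    tier, focus = ROLE_POLICY[role]
    limit = TIERS[tier]
    return [
        d for d in docs
        if TIERS[d.tier] <= limit and (focus is None or d.tags & focus)
    ]


# Placeholder documents, not real files.
DOCS = [
    Document("sgx_filing.pdf", "public", frozenset({"regulatory"})),
    Document("deal_memo.pdf", "restricted", frozenset({"deal"})),
    Document("loan_agreement.pdf", "restricted", frozenset({"covenant"})),
]

junior_view = [d.name for d in visible_documents("junior_analyst", DOCS)]
compliance_view = [d.name for d in visible_documents("compliance", DOCS)]
pm_view = [d.name for d in visible_documents("pm", DOCS)]
```

Filtering ahead of retrieval, rather than redacting the answer afterwards, keeps the control in ordinary code that can be audited instead of relying on the model to withhold something it has already seen.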

What I actually built: Keppel Data Centres

To test how useful this architecture could be in practice, I built a RAG system over Keppel DC REIT, a well-covered SGX name with enough public documentation to stress-test the retrieval pipeline.

The knowledge base has three document types: financial reports from Keppel DC REIT, SGX filings and historical price data. The mix of structured financial tables and time-series price data is a reasonable proxy for what an AM would actually deal with across their listed holdings.

Some examples of what the system handles well:

“Based on the last three annual reports, is KDC’s debt maturity profile getting longer or shorter – and what does management say about refinancing risk?” This tests cross-document synthesis and whether the system can connect a quantitative trend to qualitative commentary.

“Which markets in KDC’s portfolio have seen the highest occupancy growth, and how does that compare to where they’ve been deploying capex?” This requires joining operational data across multiple periods.

“What is the consensus view on Keppel DC’s market performance?” The system declined. There are no broker reports in the knowledge base, so rather than guess, it said so directly and pointed to what it did have. That’s exactly the behaviour you want from a system handling investment decisions. A RAG setup that knows the boundary of its own knowledge is more useful than one that fills the gap with something plausible-sounding.

Retrieval works across heterogeneous document types. The system handles the transition between a financial table, a paragraph of management’s narrative, and a price series without losing coherence. That’s the foundational capability before extending to something more complex.
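The declined consensus query above is worth a sketch of its own. One common way to get that behaviour is a retrieval threshold: if nothing in the knowledge base scores well enough against the question, the system says so and lists what it does cover. This is an assumed mechanism, not a description of how the Keppel build does it. The term-overlap score, the 0.2 cut-off and the sample index are all placeholders.

```python
import re

MIN_SCORE = 0.2  # assumed cut-off; would need tuning against real queries


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> float:
    """Fraction of query terms found in the text (stand-in for embedding similarity)."""
    q = terms(query)
    return len(q & terms(text)) / len(q) if q else 0.0


def answer_or_decline(query: str, index: dict[str, list[str]]) -> dict:
    """index maps a document type to its text chunks."""
    best_score, best_type, best_chunk = 0.0, None, None
    for doc_type, chunks in index.items():
        for chunk in chunks:
            score = overlap(query, chunk)
            if score > best_score:
                best_score, best_type, best_chunk = score, doc_type, chunk
    if best_score < MIN_SCORE:
        available = ", ".join(sorted(index))
        return {
            "status": "declined",
            "message": f"No supporting passage found. The knowledge base covers: {available}.",
        }
    # In a full pipeline the chunk would now be passed to the model as context.
    return {"status": "grounded", "doc_type": best_type, "context": best_chunk}


# Placeholder index: invented text, no broker reports included.
INDEX = {
    "financial reports": ["Gearing and debt maturity commentary."],
    "price data": ["Daily closing price series."],
}

declined = answer_or_decline("broker consensus target", INDEX)
grounded = answer_or_decline("debt maturity commentary", INDEX)
```

In practice a threshold like this is usually paired with a prompt instruction to refuse when the excerpts don’t contain the answer, since lexical or vector similarity alone can still surface passages that look relevant but aren’t.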

Where it gets really interesting: public and private, unified

Consider a Singapore AM holding two significant positions. One is Keppel Data Centres. The other is a Series B fintech in Indonesia.

Keppel has everything: Bloomberg page, SGX filings, MAS disclosures, four years of REIT financials, a dozen broker notes. The challenge is pulling the relevant signal from a large and fragmented document set quickly and reliably.

The Indonesian fintech has none of that. No Bloomberg page. No public filings. What the firm does have is a hodgepodge of data: quarterly board packs, emailed analyst notes, scanned PDFs of loan agreements, a site visit memo the PM wrote in 2023. The list goes on.

RAG is what makes this internal data intelligent, transforming a static document archive into something you can actually interrogate, cross-reference, and derive structured insight from.

The same RAG pipeline that indexes Keppel’s public filings can ingest the fintech’s board packs. Same query surface. Same interface. From the analyst’s perspective, the two holdings look exactly the same. Ask a question and get a sourced answer, regardless of whether the underlying company is publicly listed or not.

The practical result: a cross-portfolio query that no Bloomberg subscription answers. “Which of our holdings – public and private – have revenue concentration in the Indonesian consumer segment?” currently requires an analyst to manually trawl multiple disconnected environments. With a unified RAG setup, it’s one query.
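As a rough illustration of why this becomes one query, the sketch below puts public and private material in a single index where each chunk carries its holding and listing status as metadata. The holdings, file names and text are invented placeholders, and the substring match stands in for real semantic retrieval.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    holding: str
    listing: str   # "public" or "private"
    source: str
    text: str


def holdings_mentioning(terms: set[str], index: list[Chunk]) -> dict[str, list[str]]:
    """Map each holding to the sources whose text contains every term."""
    matches: dict[str, list[str]] = defaultdict(list)
    for chunk in index:
        lowered = chunk.text.lower()
        if all(term.lower() in lowered for term in terms):
            matches[chunk.holding].append(f"{chunk.source} ({chunk.listing})")
    return dict(matches)


# Placeholder index mixing listed and unlisted holdings.
INDEX = [
    Chunk("Holding A", "public", "annual_report.pdf",
          "Placeholder: revenue is concentrated in the Indonesian consumer segment."),
    Chunk("Holding B", "private", "board_pack_q1.pdf",
          "Placeholder: most revenue comes from Indonesian consumer lending."),
    Chunk("Holding C", "private", "site_visit_memo.pdf",
          "Placeholder: customers are mainly European enterprises."),
]

exposure = holdings_mentioning({"Indonesian", "consumer"}, INDEX)
```

The point of the design is that listing status is just another metadata field. The query code never branches on whether a document came from an SGX filing or a scanned board pack.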

Others are already doing versions of this

The architecture isn’t speculative. Larger institutions are already running versions of it, and the gap is widening.

Morgan Stanley deployed an internal GPT-powered assistant over more than 100,000 research documents, surfacing relevant content through natural-language questions rather than keyword search. The result isn’t just speed. It’s institutional memory that doesn’t walk out the door when an analyst leaves.

BlackRock has been more public about where Aladdin is heading: AI-assisted research synthesis layered on top of existing portfolio analytics. The direction of travel is the same: frontier reasoning grounded in proprietary data.

Closer to home, DBS has built internal AI tools over their own document sets. A bank operating across multiple Southeast Asian markets, with all the regulatory complexity that entails, found it worth the investment. That’s a useful data point for any Singapore-based firm thinking about the compliance dimension.

What these firms have in common is a decision to treat their internal document library as infrastructure rather than an archive. The AI layer on top is almost secondary.

What this looks like day-to-day

The Keppel DC and Indonesian fintech example is simple. Two companies, two document environments, one unified query. But the underlying architecture touches nearly every core process an AM runs, because every core process has the same problem at its heart: someone needs to find the right information, from the right document, at the right time.

Earnings deep-dives. IC memo drafting. Cross-portfolio exposure checks. Covenant monitoring on private credit positions. Regulatory watch across MAS and SGX circulars. The specific workflow doesn’t matter much.

What they all have in common is a retrieval problem that currently gets solved by an analyst spending time they don’t have, searching through documents that aren’t organised for searching.

RAG doesn’t change the judgement call at the end. It compresses everything that happens before it.

RAG will be table stakes in three years. The question is who builds it first.

Think about what Excel looked like twenty years ago. The firms that built rigorous, well-structured models early didn’t just work faster. They developed a fluency with their own data that compounded over time. The model got better as more history went in. The analyst who built it understood the business more deeply because structuring the data forced precision.

By the time Excel proficiency became an assumed baseline, those firms had a head start measured in years, not months.

RAG is at that same inflection point now.

Within three years, having a RAG system over your firm’s internal knowledge base will be as standard as having a Bloomberg Terminal. Not because it’s a nice-to-have, but because the competitive pressure from firms that have built it will make the gap impossible to ignore.

And that gap will continue to grow. A better LLM released in 2027 immediately makes your 2024 document library more powerful. The firms that treated their internal documents as a strategic asset – indexed, structured, access-controlled – will sit on infrastructure that compounds in value as the models on top of it improve. The investment in the data layer pays forward.

For mid-sized Singapore AMs specifically, the window is narrow but real. The institutional players – BlackRock, Morgan Stanley – have already moved. The question is whether local firms act before that window closes.

What comes next

I built the Keppel DC system to understand the architecture before extending it to private company document sets – the board packs and deal documentation that represent the knowledge layer most AMs aren’t yet querying systematically. That’s the next phase.

If you’re working through similar questions – what document types to prioritise, how to handle messy private company reporting, how to build access controls that actually hold – I’d like to hear from you. The infrastructure is more accessible than most people assume. The harder problem is workflow design, and that’s best solved with people who understand how an investment team actually makes decisions.
