Fragmented, inconsistent, and poorly accessible exchange rate information is a widespread problem in high-remittance economies. This paper presents the design and implementation of a fully automated forex intelligence system built to aggregate real-time transfer rates across more than 20 heterogeneous financial service providers — banks, exchange houses, and money transfer operators — from a single GCC-region country. The system employs a layered, multi-strategy scraping engine capable of handling static HTML, JavaScript-rendered SPAs, REST APIs, PDF rate sheets, and AJAX-secured WordPress plugins. A lightweight NLP and pattern-matching layer resolves the structural diversity of rate data. Rates are published to a cloud spreadsheet store on a 3-hour automated schedule using GitHub Actions and consumed by both a web analytics dashboard and a cross-platform React Native mobile application.
1. Introduction & Problem Statement
Remittance flows constitute a significant portion of GDP in many developing economies. In high-remittance corridors — particularly those connecting GCC countries to South and South-East Asian labour markets — millions of workers regularly convert a local currency (LC) into destination currencies such as INR, PHP, PKR, BDT, LKR, or EGP. The rates offered by individual exchange providers vary materially, and even small differences translate into meaningful sums over time.
Despite the volume and frequency of these transactions, the information landscape remains fragmented: each FSP publishes rates independently on its own website, often with inconsistent formatting, varying update frequencies, and — critically — without distinguishing between cash exchange rates and transfer/remittance rates. The two are structurally different products, and the confusion between them has direct financial consequences for end users.
1.1 The Cash vs Transfer Rate Problem
Cash exchange rates and transfer (also called wire, TT, or remittance) rates are priced differently by FSPs. Cash transactions carry higher operational costs — handling, vault management, currency risk — and are therefore quoted at a wider spread. Transfer rates, by contrast, reflect the lower cost of an electronic funds movement and consistently offer the customer more foreign currency per unit of local currency.
For a representative destination currency, observed cash rates in this study were approximately 15–20% lower than the corresponding transfer rate from the same provider — a difference of significant monetary value on typical remittance amounts.
Any aggregation system that conflates the two rate types produces misleading comparisons, potentially directing customers to sub-optimal providers based on inflated cash-rate figures.
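To make the stakes concrete, a quick back-of-envelope sketch with hypothetical figures: a transfer rate of 317.25 INR per LC and a cash rate 15% below it (the low end of the observed band). The numbers are illustrative, not drawn from the study data:

```python
transfer_rate = 317.25            # INR per 1 LC (hypothetical transfer rate)
cash_rate = transfer_rate * 0.85  # cash quoted 15% lower

amount_lc = 500                   # a typical monthly remittance, in LC
via_transfer = amount_lc * transfer_rate          # 158625.0 INR
via_cash = amount_lc * cash_rate                  # ~134831.25 INR
lost_to_cash = round(via_transfer - via_cash, 2)  # 23793.75 INR forgone
```

A customer quoted the cash rate for what is actually an electronic transfer gives up nearly 24,000 INR on a single 500 LC remittance under these assumptions.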
1.2 Research Objectives
- Aggregate real-time transfer rates from all major FSPs in a target economy
- Handle the full heterogeneity of source formats without manual per-source maintenance
- Correctly identify and extract transfer rates, not cash rates
- Deliver the aggregated data to end users via both a web dashboard and a mobile application
- Automate the entire pipeline with zero human intervention after deployment
2. System Architecture
The system is designed as a linear, cloud-native pipeline with five logical layers. Each layer has a single responsibility and communicates with the next through a well-defined interface.
| Layer | Responsibility |
|---|---|
| 1. Scraping Engine | Fetches and extracts raw rate data from FSP websites and APIs |
| 2. Normalisation | Converts all rates to a canonical LC-per-foreign-unit format; assigns rate type |
| 3. Data Store | Google Sheets; two tabs: Latest (overwritten) and History (appended) |
| 4. Scheduler | GitHub Actions workflow on a 3-hour cron schedule; injects secrets at runtime |
| 5. Consumers | Web analytics dashboard (HTML/JS) and React Native mobile app — both read from the same Sheets CSV export |
Table 1. Pipeline layer overview
This architecture deliberately avoids a traditional database. Google Sheets functions as an always-on, zero-maintenance data store with a built-in HTTP API (the gviz/tq CSV export endpoint), accessible without authentication from both the web dashboard and the mobile app.
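How a consumer reads that endpoint can be sketched in a few lines. The URL-builder and parser below are illustrative helpers, not the system's actual code, and the HTTP fetch itself is left out so the sketch stays dependency-free:

```python
import csv
import io
from urllib.parse import urlencode

def gviz_csv_url(spreadsheet_id: str, sheet: str) -> str:
    """Build the unauthenticated CSV export URL for one tab of a public sheet."""
    base = f'https://docs.google.com/spreadsheets/d/{spreadsheet_id}/gviz/tq'
    return base + '?' + urlencode({'tqx': 'out:csv', 'sheet': sheet})

def parse_rates_csv(text: str) -> list[dict]:
    """Parse the exported CSV (header row + data rows) into rate records."""
    return list(csv.DictReader(io.StringIO(text)))
```

Any HTTP client pointed at `gviz_csv_url(SHEET_ID, 'Latest')` receives plain CSV, which is why both the browser dashboard and the mobile app can consume the store without credentials.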
2.1 Technology Choices
- Python 3.11 — scraping and normalisation logic
- requests + BeautifulSoup + lxml — HTTP and HTML parsing
- Selenium (headless Chrome) — JavaScript-rendered page fallback
- Google Sheets API (service account) — data persistence
- GitHub Actions — scheduling and orchestration
- HTML + Vanilla JS + Bootstrap + Chart.js — web dashboard
- React Native (Expo) — mobile application
3. Data Collection: Multi-Strategy Scraping Engine
The scraping engine is the most technically complex component of the system. Across 20+ FSPs, no two sources use the same data format, delivery mechanism, or rate presentation convention. The engine solves this with a layered extraction strategy and a library of specialised parsers.
3.1 Source Taxonomy
| Source Type | Extraction Method | Rate Type | Notes |
|---|---|---|---|
| Bank portal (SharePoint) | HTML table → custom parser | Buy / Sell | Currency code lookup from dropdown |
| Bank portal (generic) | Multi-strategy _extract() | Buy / Sell | Fallback chain: JSON → table → regex |
| Exchange API (REST) | Direct JSON endpoint | Transfer only | Inverted from foreign-per-LC format |
| Exchange API (WordPress AJAX) | Nonce-authenticated POST | Transfer only | trtype=BT param distinguishes transfer vs cash |
| Exchange website (Vue SPA) | POST to backend API | Transfer only | Multiple payload shapes tried |
| Exchange website (static HTML) | Table + 'LC =' pattern match | Transfer only | Regex: '1 LC = X CCY' |
| PDF rate sheet | PyMuPDF / pdfplumber | Buy / Sell | Regex pattern on extracted text |
| JS-rendered page | Selenium headless browser | Mixed | Fallback for non-SSR sources |
Table 2. Source taxonomy and extraction methods
3.2 The Layered Extraction Engine
For sources without a known API, a generic multi-strategy extraction engine is applied. Strategies are tried in order of reliability, falling back to the next on failure:
def _extract(html, source):
    # Strategy 1: Embedded JSON (Next.js __NEXT_DATA__, window.__STATE__, etc.)
    results = _try_nextjs(html, source)
    # Strategy 2: JSON-LD structured data
    if not results: results = _try_json_ld(html, source)
    # Strategy 3: Arbitrary <script> tag JSON blobs
    if not results: results = _try_script_json(html, source)
    # Strategy 4: HTML tables (tabular rate sheets)
    if not results: results = _try_table(html, source)
    # Strategy 5: CSS card/div layouts
    if not results: results = _try_cards(html, source)
    # Strategy 6: Full-page regex (last resort)
    if not results: results = _try_regex_fulltext(html, source)
    return results
3.3 API Reverse Engineering
Several exchange house websites use single-page application frameworks that load rate data via AJAX calls to internal backend APIs. Identifying these endpoints requires inspection of browser network traffic and JavaScript source files. Two representative cases illustrate the approach.
3.3.1 WordPress AJAX (Nonce-Protected Endpoint)
One FSP uses a WordPress-based currency converter plugin that exposes an admin-ajax.php endpoint requiring a nonce value generated fresh per page load — a common anti-automation measure. The scraper extracts the nonce from the rendered homepage HTML before making the API call:
# Step 1: Fetch homepage to extract the per-page nonce
session = requests.Session()
resp = session.get(HOMEPAGE)
nonce = re.search(
    r'PluginConfig.*?"ajax_nonce"\s*:\s*"([^"]+)"', resp.text
).group(1)

# Step 2: POST to the AJAX endpoint with trtype=BT (Bank Transfer)
r = session.post(AJAX_URL, data={
    'action': 'convert_action',
    'currfrom': LC_CODE,
    'currto': dest_code,
    'amt': 1,
    'security': nonce,
    'trtype': 'BT',  # BT = Bank Transfer (not CP = Cash Pickup)
})
foreign_per_lc = r.json()['amount']
The trtype=BT parameter is critical: using CP (Cash Pickup) returns cash rates, which are significantly different. This distinction was discovered by inspecting the plugin's JavaScript source — an example of how rate type resolution requires source-level investigation, not just data parsing.
3.3.2 Vue SPA REST API
Another FSP uses a Vue.js SPA backed by a REST API. The exact shape of the POST payload varies by deployment version. The scraper tries multiple payload forms in sequence and uses the first successful response:
PAYLOADS = [
    {},
    {'fromCur': 'LC'},
    {'fromCur': 'LC', 'toCur': 'ALL'},
    {'sendCurrency': 'LC', 'recvCurrency': 'ALL'},
    {'baseCurrency': 'LC'},
]
for payload in PAYLOADS:
    resp = requests.post(API_URL, json=payload, headers=HEADERS)
    if resp.ok and resp.json():
        return parse_response(resp.json())
3.4 Selenium Fallback
JavaScript-rendered sources that cannot be accessed via direct HTTP requests are handled by a Selenium-based headless Chrome scraper. After page load, the scraper applies the same extraction strategies to the rendered DOM, with additional table and card parsing logic suited to dynamic content.
3.5 Geofencing & Geographic Access Restrictions
A significant operational challenge arises from the geographic deployment gap: the scraping pipeline runs on GitHub Actions infrastructure hosted in US-region data centres (Microsoft Azure), while the target FSP websites are designed to serve users within a specific GCC country. Many of these sites implement geofencing and return empty content, redirects, or outright blocks to requests from foreign IP ranges.
Three geofencing mechanisms were encountered across the source set:
- IP-based blocking — The server checks the client IP against a GCC/regional whitelist. Requests from US Azure IP blocks receive a 403 or redirect to a 'not available in your region' page.
- Content gating — The page loads but rate tables are replaced with placeholder text or empty containers for non-local IPs — making the block harder to detect programmatically.
- Header fingerprinting — Some sites cross-reference the IP with browser headers (User-Agent, Accept-Language, Referer). A non-regional IP combined with a generic Python User-Agent triggers bot-detection middleware.
The solution is a residential proxy with a local in-country IP address, injected at runtime via a GitHub Actions secret:
# Proxy injected from GitHub Actions secret at runtime
PROXY = os.environ.get('SCRAPER_PROXY')
PROXIES = {'http': PROXY, 'https': PROXY} if PROXY else {}

# Browser-mimicking headers sent on every request
DEFAULT_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
    'Accept-Language': 'en-GB,en;q=0.9,ar;q=0.8',
    'Accept': 'text/html,application/xhtml+xml,...',
    'Referer': 'https://www.google.com/',
}

def _get(url, **kwargs):
    return requests.get(
        url,
        proxies=PROXIES,
        headers={**DEFAULT_HEADERS, **kwargs.pop('headers', {})},
        timeout=15,
        **kwargs,
    )
The proxy approach also resolves a secondary challenge: some FSPs serve different rate data to different regions (e.g., rates quoted in AED on the UAE version of a site vs LC on the local version). Routing through a local IP ensures the scraper sees the same rates a local customer would see.
4. NLP & Pattern Matching for Rate Extraction
Raw web content does not contain labelled, structured rate data. Extracting accurate rates requires a lightweight NLP layer capable of identifying currency codes, interpreting table column semantics, and normalising numerical values across a wide variety of surface forms.
4.1 Currency Code Detection
ISO 4217 currency codes are three-letter alphabetical sequences. However, the same pattern appears in many non-currency contexts on financial pages: abbreviations, bank names, country codes, UI labels. A curated exclusion list filters false positives:
# Exclusion list — non-currency tokens matching [A-Z]{3}
_NOT_CURRENCY = {
    'THE', 'FOR', 'AND', 'YOU', 'ARE', 'OUR', 'ALL',
    'FSP', 'MTF', 'FIN', 'EXC',  # FSP/brand abbreviations
}

def _code_from_cells(cells):
    for cell in cells:
        if re.fullmatch(r'[A-Z]{3}', cell) and cell not in _NOT_CURRENCY:
            return cell
    # Fallback: extract a code embedded in longer text
    for cell in cells:
        m = re.search(r'\b([A-Z]{3})\b', cell)
        if m and m.group(1) not in _NOT_CURRENCY:
            return m.group(1)
    return None
4.2 Table Column Semantics
HTML rate tables vary in column ordering and labelling. The most common pattern across the surveyed FSPs is a two-column numeric table (Buy | Sell from the exchange's perspective), but the order is not guaranteed. The parser maps positional indices to semantic roles:
for row in table.find_all('tr'):
    cells = [c.get_text(' ', strip=True) for c in row.find_all(['td', 'th'])]
    code = _code_from_cells(cells)
    nums = [_safe_float(c) for c in cells if _safe_float(c) is not None]
    # First numeric value = buy_rate (exchange buys foreign from customer)
    # Second numeric value = sell_rate (exchange sells foreign to customer)
    buy = nums[0] if len(nums) > 0 else None
    sell = nums[1] if len(nums) > 1 else None
4.3 The '1 LC = X CCY' Pattern
Many exchange house websites present rates as natural-language expressions of the form '1 Local Currency = 317.25 INR'. This is a single-rate reference format that requires regex extraction and unit inversion:
for m in re.finditer(
        r'1\s+LC\s*=\s*([\d,]+\.?\d*)\s+([A-Z]{3})\b', page_text):
    foreign_per_lc = float(m.group(1).replace(',', ''))
    code = m.group(2)
    # Invert to canonical LC-per-foreign format
    lc_per_foreign = round(1.0 / foreign_per_lc, 6)
    results.append(_make(source, code, None, None, mid=lc_per_foreign))
4.4 Unit Normalisation (Rate Inversion)
Sources are inconsistent in their unit convention. Some express rates as foreign-currency-per-LC (e.g., 317.25 INR per 1 LC); others express them as LC-per-foreign-unit (e.g., 0.003151 LC per 1 INR). The system detects and corrects this automatically:
def _maybe_invert_rates(rates, source):
    # If the majority of rate values are > 1, they are foreign-per-LC
    high_count = sum(1 for r in rates
                     for k in ['buy_rate', 'sell_rate', 'mid_rate']
                     if r.get(k) and r[k] > 1)
    total = sum(1 for r in rates
                for k in ['buy_rate', 'sell_rate', 'mid_rate']
                if r.get(k))
    if total > 0 and high_count / total > 0.5:
        for rate in rates:
            for k in ['buy_rate', 'sell_rate', 'mid_rate']:
                if rate.get(k) and rate[k] > 0:
                    rate[k] = round(1.0 / rate[k], 6)
    return rates
4.5 PDF Parsing
Several banking institutions publish their rate sheets as PDF documents, typically updated daily. A two-stage PDF pipeline extracts text using PyMuPDF (primary) or pdfplumber (fallback), then applies the same regex-based extraction:
def _parse_pdf(content: bytes, source: str):
    try:
        import fitz  # PyMuPDF (primary)
        doc = fitz.open(stream=content, filetype='pdf')
        text = '\n'.join(page.get_text() for page in doc)
    except ImportError:
        import pdfplumber  # fallback
        with pdfplumber.open(io.BytesIO(content)) as pdf:
            text = '\n'.join(p.extract_text() or '' for p in pdf.pages)
    # Pattern: CODE buy_val sell_val (within 100 chars)
    for m in re.finditer(
            r'\b([A-Z]{3})\b.{0,100}?(\d+\.\d{3,6}).{0,50}?(\d+\.\d{3,6})',
            text):
        ...
5. The Transfer Rate vs Cash Rate Distinction
Correctly identifying the rate type is the most domain-critical challenge in this system. Scraping the wrong rate type produces data that is systematically misleading.
5.1 Rate Type Classification
Through manual verification of each FSP source against its live website, sources were classified into three groups:
- Transfer-explicit sources — The website clearly labels rates as 'Remittance Rate', 'Wire Transfer Rate', or 'TT Rate'. The scraper targets these specific elements or API parameters.
- Buy/sell-only sources — The website publishes only a buy/sell spread (typically banks). The sell rate (exchange sells foreign to the customer) is used as the transfer rate proxy, since this is the rate a customer pays when sending money abroad.
- Removed sources — Sources that publish only cash exchange rates with no transfer rate available are excluded from the aggregation.
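One lightweight way to encode the classification is a per-source registry consulted when the transfer rate is resolved. The registry shape and source names below are illustrative, not the system's actual code:

```python
# Hypothetical per-source rate-type registry (source names illustrative)
RATE_TYPE = {
    'bank_fsp_1': 'buy_sell',      # bank: sell rate used as transfer proxy
    'exchange_fsp_5': 'transfer',  # transfer-explicit source
    'cash_only_fsp': 'cash_only',  # publishes cash rates only — excluded
}

def transfer_rate(record: dict, source: str):
    """Resolve the transfer rate for a scraped record, or None to exclude it."""
    kind = RATE_TYPE.get(source)
    if kind == 'transfer':
        return record.get('mid_rate')
    if kind == 'buy_sell':
        # Exchange sells foreign to the customer = rate paid when sending abroad
        return record.get('sell_rate')
    return None  # cash-only or unclassified sources are dropped
```

Making the classification explicit in data, rather than implicit in each scraper, keeps the exclusion rule auditable in one place.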
5.2 The _make() Function & mid_rate Convention
All scrapers produce rate records through a single canonical factory function. The mid_rate field is designated as the transfer rate — the single value used for all downstream comparison and ranking logic:
def _make(source, code, buy, sell, mid=None, unit=1) -> dict:
    if mid is None:
        # sell_rate = exchange sells foreign to customer = transfer rate proxy
        mid = sell if sell is not None else buy
    return {
        'source': source,
        'currency': code,
        'buy_rate': buy,    # LC/foreign — may be None for transfer-only sources
        'sell_rate': sell,  # LC/foreign — may be None for transfer-only sources
        'mid_rate': mid,    # LC/foreign — ALWAYS populated = transfer rate
        'unit': unit,
        'scraped_at': datetime.utcnow().isoformat(),
    }
The mid_rate field provides a single, comparable transfer rate across all sources regardless of whether the source is transfer-explicit or buy/sell only. All ranking, best-rate logic, and UI display operate exclusively on mid_rate.
5.3 The Customer-Perspective API Exception
One major regional money transfer operator publishes rates from the customer's perspective rather than the exchange's perspective: their API labels the outbound transfer rate as 'buyrate' (the customer buys foreign currency). This is the inverse of the bank-table convention used by all other sources. The scraper handles this with a per-source override:
# FSP-A: API uses customer-perspective naming (inverted vs standard)
# API 'buyrate'  = customer buys foreign (FSP sells) = transfer rate
# API 'sellrate' = customer sells foreign (FSP buys) = not transfer
buy = round(1.0 / buy_raw, 6)    # from API 'buyrate': customer buys foreign = transfer rate
sell = round(1.0 / sell_raw, 6)  # from API 'sellrate': customer sells foreign
# Explicitly pass mid=buy so _make() uses the correct column
results.append(_make(SOURCE, code, buy, sell, mid=buy))
6. Automated Pipeline & Data Infrastructure
6.1 GitHub Actions Scheduler
The entire scraping and publishing pipeline runs as a GitHub Actions workflow triggered on a 3-hour cron schedule — a zero-infrastructure, zero-cost execution environment with full secret management and logging:
# .github/workflows/scrape.yml
on:
  schedule:
    - cron: '0 */3 * * *'   # every 3 hours = 8 runs per day
  push:
    branches: [main]
  workflow_dispatch:        # manual trigger from the GitHub UI

jobs:
  scrape-and-update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install -r requirements.txt
      - name: Inject Google credentials from secret
        run: echo "$GCP_KEY" > service-account-key.json
        env:
          GCP_KEY: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
      - run: python github_action_scrape.py
        env:
          GOOGLE_SHEETS_SPREADSHEET_ID: ${{ secrets.GOOGLE_SHEETS_SPREADSHEET_ID }}
          SCRAPER_PROXY: ${{ secrets.SCRAPER_PROXY }}
6.2 Google Sheets as Data Store
Google Sheets serves as the sole persistence layer. Two tabs are maintained: a Latest tab (overwritten on every run with the most recent rate for each source/currency pair) and a History tab (each run's complete result set is appended, enabling rate trend analysis over time).
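The two-tab update semantics can be sketched as a pure function over row lists. This is an in-memory stand-in for illustration; the actual Sheets API write calls are omitted:

```python
def update_tabs(latest: list[dict], history: list[dict], new_rows: list[dict]):
    """Latest: keep one row per (source, currency), newest wins.
    History: append every row from this run unconditionally."""
    key = lambda r: (r['source'], r['currency'])
    merged = {key(r): r for r in latest}
    for r in new_rows:
        merged[key(r)] = r  # overwrite the pair's previous rate
    return list(merged.values()), history + new_rows
```

Keeping the overwrite-vs-append decision in one place means consumers of the Latest tab never see stale duplicates, while the History tab retains the full time series.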
Rates with null buy/sell values required a specific fix: `dict.get(key, default)` returns the stored `None`, not the default, when the key exists with a `None` value. The Sheets API rejects `None` values, so empty strings must be substituted:
# Incorrect: returns None if key exists but value is None
row = [r.get('buy_rate', ''), r.get('sell_rate', ''), ...]
# Correct: converts None to empty string
row = [r.get('buy_rate') or '', r.get('sell_rate') or '', ...]
7. Web Analytics Dashboard
A single-page web dashboard provides a real-time view of the aggregated rate data. It reads from the Google Sheets CSV export endpoint directly in the browser — no backend server required.
7.1 Best Rate Logic
The dashboard computes the best transfer rate per currency in pure JavaScript. Since rates are stored as LC-per-foreign-unit, the exchange with the lowest stored value offers the most foreign currency per LC unit:
function computeBestRates(rows, currency) {
  const filtered = rows.filter(r =>
    r.currency === currency && r.mid_rate != null &&
    parseFloat(r.mid_rate) > 0
  );
  // Lowest LC/foreign = most foreign per LC = best transfer rate
  let best = null;
  for (const r of filtered) {
    if (!best || parseFloat(r.mid_rate) < parseFloat(best.mid_rate))
      best = r;
  }
  return { best_rate: best?.mid_rate, best_source: best?.source };
}
7.2 Analytics Table
The analytics section aggregates the history tab to compute, per exchange per currency: latest transfer rate, average transfer rate, data point count, and last seen timestamp. Sorting is performed client-side on any column. The exchange with the best current rate receives a star badge.
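That aggregation can be sketched as follows, in Python here for brevity (the production dashboard performs the equivalent client-side in JavaScript; names are illustrative):

```python
from collections import defaultdict

def aggregate_history(rows: list[dict]) -> dict:
    """Per (source, currency): latest rate, average rate, count, last seen."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r['source'], r['currency'])].append(r)
    out = {}
    for pair, rs in groups.items():
        rs.sort(key=lambda r: r['scraped_at'])  # chronological order
        mids = [r['mid_rate'] for r in rs if r.get('mid_rate')]
        out[pair] = {
            'latest': mids[-1] if mids else None,
            'average': sum(mids) / len(mids) if mids else None,
            'count': len(mids),
            'last_seen': rs[-1]['scraped_at'],
        }
    return out
```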
8. Mobile Application Delivery
A React Native mobile application built with Expo delivers the same rate intelligence to smartphones. The app reads from the identical Google Sheets CSV endpoint used by the web dashboard, requiring no separate mobile backend.
8.1 Data Flow
const CSV_URL =
  'https://docs.google.com/spreadsheets/d/{ID}/gviz/tq?tqx=out:csv&sheet=Latest';

// Strip surrounding quotes and whitespace from a CSV field
const clean = (s) => (s ?? '').replace(/^"|"$/g, '').trim();

const fetchRates = async () => {
  const resp = await fetch(CSV_URL);
  const text = await resp.text();
  const lines = text.split('\n');
  return lines.slice(1).map(line => {
    // Split on commas outside double-quoted fields
    const cols = line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);
    return {
      source: clean(cols[2]),
      currency: clean(cols[3]),
      buy: parseFloat(clean(cols[4])) || null,
      sell: parseFloat(clean(cols[5])) || null,
      mid: parseFloat(clean(cols[6])) || null, // transfer rate
      lastUpdated: clean(cols[1]),
    };
  });
};
8.2 Transfer Rate as Primary Display
The app displays the transfer rate (mid) as the primary metric on each exchange card. The displayed value is inverted from the stored LC-per-foreign representation to the user-facing foreign-per-LC format:
// Sort: lowest stored mid = highest displayed = best transfer rate
const getSortedRates = () => {
  const filtered = data.filter(d =>
    d.currency === selectedCurrency && d.mid !== null
  );
  filtered.sort((a, b) => (a.mid || Infinity) - (b.mid || Infinity));
  return filtered; // index 0 = best exchange
};

// Display: invert to foreign-per-LC for human readability
const displayRate = (mid) => (1 / mid).toFixed(4);
// e.g. stored 0.003150 → displayed 317.4603 INR per 1 LC
9. Evaluation & Results
This section reports the empirical performance of the deployed system, measured against a live snapshot taken on 28 March 2026. All figures are drawn directly from the production Google Sheets data store.
9.1 Source Coverage
Of the 22 FSP sources initially catalogued, 14 are currently returning data on every scheduled run — a live coverage rate of 63.6%. The remaining 8 sources are unreachable due to JavaScript rendering with no accessible API, aggressive bot-detection that defeats both the proxy and header-spoofing layer, or rate pages restructured since initial implementation.
| FSP | Status | Currencies | Extraction Method |
|---|---|---|---|
| Bank FSP-1 | ● Active | 20 | HTML table (generic) |
| Bank FSP-2 | ● Active | 25 | HTML table (generic) |
| Bank FSP-3 | ● Active | 30 | HTML table (generic) |
| Bank FSP-4 | ● Active | 17 | JSON-LD / script tag |
| Exchange FSP-5 (MTF) | ● Active | 106 | REST API (direct) |
| Exchange FSP-6 (MTF) | ● Active | 107 | REST API (direct) |
| Exchange FSP-7 (MTF) | ● Active | 14 | REST API (inverted) |
| Exchange FSP-8 | ● Active | 65 | WordPress AJAX (nonce) |
| Exchange FSP-9 | ● Active | 17 | HTML table (generic) |
| Exchange FSP-10 | ● Active | 8 | HTML table (generic) |
| Exchange FSP-11 | ● Active | 6 | HTML table (generic) |
| Exchange FSP-12 | ● Active | 5 | HTML table (generic) |
| Exchange FSP-13 | ● Active | 21 | Vue SPA REST API |
| Exchange FSP-14 | ● Active | 11 | JSON-LD / script tag |
| 8 additional FSPs | ○ Inactive | — | JS-rendered / bot-blocked |
Table 3. Source coverage as of 28 March 2026
9.2 Rate Accuracy
All 14 active sources were manually verified by cross-referencing the scraped transfer rates against the rates displayed on each FSP's live website at the time of the scrape. In every case the scraped mid_rate matched the displayed transfer rate exactly to the precision published by the FSP. For sources that present rates as a foreign-per-LC figure (e.g. '244.38 INR per 1 LC'), the system's unit inversion produced a stored value matching the reciprocal to 6 decimal places.
9.3 Cross-FSP Rate Spread
The most commercially significant finding is the magnitude of the transfer rate spread across FSPs for the same currency pair observed on the same day:
| Corridor | Best Rate (foreign / 1 LC) | Worst Rate (foreign / 1 LC) | Spread |
|---|---|---|---|
| LC → INR | 244.41 (Exchange FSP-6) | 226.19 (Bank FSP-3) | 7.45% |
| LC → USD | 2.5940 (Exchange FSP-7) | 2.5575 (Bank FSP-1) | 1.41% |
Table 4. Cross-FSP transfer rate spread · live snapshot, 28 March 2026
A 7.45% intra-market spread for the same currency pair on the same day, across FSPs operating in the same regulatory jurisdiction, demonstrates that significant value is locked behind information asymmetry. On a transaction of 100 LC, a customer using the best-rate FSP receives INR 24,441 while a customer using the worst-rate provider receives INR 22,619, a difference of INR 1,822. For a worker remitting 500 LC per month, systematic use of the best-rate FSP would yield approximately INR 9,100 in additional value monthly. This is precisely the gap the system is designed to close.
9.4 Pipeline Performance
Key operational figures from the deployed pipeline, as surfaced on the dashboard: 14 of 22 sources active per run (63.6% coverage), 135 currencies captured per run, and a total pipeline runtime that fits within the GitHub Actions free tier.
10. Commercial Viability & Use Cases
The architectural pattern described in this paper is not specific to the implementation geography or currency. The components — heterogeneous web scraping, NLP rate extraction, cloud data store, scheduled pipeline, and consumer applications — are fully generalisable.
10.1 Direct Consumer Applications
- Remittance comparison app — Users in any high-remittance economy can compare live transfer rates across all local FSPs before transacting. Monetised through affiliate referral fees from FSPs.
- Corporate treasury tool — Finance teams managing multi-currency payables can monitor rate movements and trigger alerts when rates cross target thresholds. Monetised through SaaS subscription.
- Travelling consumer app — Real-time comparison of cash exchange rates at airports, hotels, and local bureaux de change.
10.2 B2B Data Licensing
- Financial data vendors — Aggregated, normalised, timestamped rate data from non-mainstream geographies is scarce. The pipeline produces a clean historical dataset with 8+ snapshots per day per FSP, suitable for licensing to Bloomberg, Refinitiv, or regional equivalents.
- Fintech platforms — Remittance aggregators and payment orchestration platforms require real-time rate benchmarks to route transactions to the cheapest available provider. This system's output is directly usable as a routing signal.
- Compliance and audit — Regulators and financial auditors require evidence of fair pricing. Historical rate datasets with source provenance and timestamps support compliance reporting.
10.3 Geographic & Vertical Expansion
- Any high-remittance corridor — South-East Asia (SGD, MYR), West Africa (CFA corridors), Eastern Europe (EUR corridors).
- Cryptocurrency exchanges — The same scraping and normalisation patterns apply to crypto-to-fiat rate aggregation across centralised and decentralised exchanges.
- Commodity pricing — Small-economy commodity markets often exhibit the same fragmented, non-aggregated information structure.
- Insurance premium comparison — Insurance products in emerging markets are similarly distributed across heterogeneous, hard-to-scrape websites with no aggregator.
10.4 White-Label Platform
The entire stack — scraper library, pipeline, dashboard, and mobile app — can be repackaged as a white-label product for financial institutions, government agencies, or consumer brands seeking to launch a rate comparison tool for their market. The modular scraper architecture means adding a new geography requires only adding source-specific scraper functions; the normalisation, data store, and consumer layers require no changes.
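The modularity claim can be made concrete with a registry pattern: each source-specific scraper registers itself under a stable name, and the pipeline iterates the registry. The decorator and names below are an illustrative sketch, not the system's actual code:

```python
SCRAPERS: dict = {}

def scraper(name: str):
    """Register a source-specific scraper function under a stable source name."""
    def wrap(fn):
        SCRAPERS[name] = fn
        return fn
    return wrap

@scraper('example_fsp')  # hypothetical source
def scrape_example_fsp() -> list[dict]:
    return [{'source': 'example_fsp', 'currency': 'INR', 'mid_rate': 0.00409}]

def run_all() -> list[dict]:
    """Run every registered scraper; one failing source must not abort the run."""
    results = []
    for name, fn in SCRAPERS.items():
        try:
            results.extend(fn())
        except Exception:
            continue  # log and move on in production
    return results
```

Under this pattern, adding a geography is exactly one new module of `@scraper`-decorated functions; normalisation, storage, and consumers remain untouched.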
11. Lessons Learned & Future Work
11.1 Technical Lessons
- Rate type confusion is the dominant data quality risk. Cash rates and transfer rates are structurally different products. Any aggregation system that does not explicitly resolve this distinction will produce misleading data. The resolution requires per-source verification, not algorithmic detection.
- No extraction strategy works universally. The layered fallback chain is essential. Approximately 40% of sources were only reachable via one specific strategy; a single-strategy scraper would have failed on nearly half the source set.
- API discovery yields more stable scrapers. HTML structure changes frequently; API endpoints are much more stable. Investing time to discover and reverse-engineer backend APIs — even undocumented ones — pays dividends in scraper longevity.
- Unit conventions are inconsistently documented. The foreign-per-LC vs LC-per-foreign inversion problem affected over a third of sources. Automated inversion detection based on value magnitude is effective but should be validated manually for each new source.
- Google Sheets as a data store has real limitations. No transaction support, 10MB cell limit, no indexing. For production scale (>50 sources, sub-hour update frequency), a lightweight database (SQLite, Supabase, or Turso) would be preferable.
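For reference, a minimal SQLite schema such a migration might start from, sketched with Python's built-in `sqlite3` module (illustrative only, not part of the deployed system):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS rates (
    source     TEXT NOT NULL,
    currency   TEXT NOT NULL,   -- ISO 4217 code
    buy_rate   REAL,
    sell_rate  REAL,
    mid_rate   REAL,            -- transfer rate, LC per foreign unit
    scraped_at TEXT NOT NULL    -- ISO 8601 timestamp
);
CREATE INDEX IF NOT EXISTS idx_pair_time ON rates (source, currency, scraped_at);
"""

conn = sqlite3.connect(':memory:')
conn.executescript(DDL)
conn.execute(
    "INSERT INTO rates (source, currency, mid_rate, scraped_at) VALUES (?, ?, ?, ?)",
    ('example_fsp', 'INR', 0.00409, '2026-03-28T00:00:00'),
)
# 'Latest' becomes a query rather than a separate overwritten tab
latest = conn.execute(
    "SELECT mid_rate FROM rates WHERE source=? AND currency=? "
    "ORDER BY scraped_at DESC LIMIT 1",
    ('example_fsp', 'INR'),
).fetchone()
```

With an index on (source, currency, scraped_at), the Latest/History split collapses into one table plus one query, and transactional writes come for free.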
11.2 Future Work
- Rate change alerting — Push notifications to mobile app users when a rate crosses a user-defined threshold.
- Historical trend charts — Time-series visualisations in both the web dashboard and mobile app, consuming the History tab.
- Rate confidence scoring — Assign a freshness and reliability score to each rate based on scraper success rate, last successful scrape time, and deviation from peer rates.
- ML for column semantics — Replace the positional heuristic for buy/sell column detection with a trained classifier using column header text as features.
- Automatic new source detection — Monitor the central bank's licensed FSP registry for new entrants and auto-generate scraper stubs for review.
12. Conclusion
This paper presented the end-to-end design and implementation of a real-time forex intelligence system, covering multi-strategy web scraping, NLP-driven rate extraction, automated pipeline orchestration, and dual-channel delivery via web and mobile application.
The central technical contributions are: (1) a layered extraction engine capable of handling the full heterogeneity of FSP websites without per-source maintenance beyond initial configuration; (2) a principled distinction between cash and transfer rates with an explicit mid_rate convention that provides a single comparable transfer rate across all source types; and (3) a zero-infrastructure pipeline built entirely on GitHub Actions and Google Sheets, demonstrating that production-grade financial data aggregation does not require expensive server infrastructure.
The system described in this paper covers a single economy with 20+ sources and has run fully unattended since deployment at zero marginal cost. Scaling to 10 economies with 200+ sources is an engineering exercise, not a research problem: the architecture already supports it. The commercial opportunity is substantial, because the same architectural pattern is directly transferable to any geography with fragmented FSP rate data, which describes the majority of high-remittance economies worldwide.