Fragmented, inconsistent, and poorly accessible exchange rate information is a widespread problem in high-remittance economies. This paper presents the design and implementation of a fully automated forex intelligence system built to aggregate real-time transfer rates across more than 20 heterogeneous financial service providers — banks, exchange houses, and money transfer operators — from a single GCC-region country. The system employs a layered, multi-strategy scraping engine capable of handling static HTML, JavaScript-rendered SPAs, REST APIs, PDF rate sheets, and AJAX-secured WordPress plugins. A lightweight NLP and pattern-matching layer resolves the structural diversity of rate data. Rates are published to a cloud spreadsheet store on a 3-hour automated schedule using GitHub Actions and consumed by both a web analytics dashboard and a cross-platform React Native mobile application.
1. Introduction & Problem Statement
Remittance flows constitute a significant portion of GDP in many developing economies. In high-remittance corridors — particularly those connecting GCC countries to South and South-East Asian labour markets — millions of workers regularly convert a local currency (LC) into destination currencies such as INR, PHP, PKR, BDT, LKR, or EGP. The rates offered by individual exchange providers vary materially, and even small differences translate into meaningful sums over time.
Despite the volume and frequency of these transactions, the information landscape remains fragmented: each FSP publishes rates independently on its own website, often with inconsistent formatting, varying update frequencies, and — critically — without distinguishing between cash exchange rates and transfer/remittance rates. The two are structurally different products, and the confusion between them has direct financial consequences for end users.
1.1 The Cash vs Transfer Rate Problem
Cash exchange rates and transfer (also called wire, TT, or remittance) rates are priced differently by FSPs. Cash transactions carry higher operational costs — handling, vault management, currency risk — and are therefore quoted at a wider spread. Transfer rates, by contrast, reflect the lower cost of an electronic funds movement and consistently offer the customer more foreign currency per unit of local currency.
For a representative destination currency, observed cash rates in this study were approximately 15–20% lower than the corresponding transfer rate from the same provider — a difference of significant monetary value on typical remittance amounts.
Any aggregation system that conflates the two rate types produces misleading comparisons, potentially directing customers to sub-optimal providers based on inflated cash-rate figures.
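To make the stakes concrete, a quick back-of-envelope sketch with hypothetical figures: a transfer rate of 317.25 INR per LC and a cash rate 15% below it (the low end of the observed band). The numbers are illustrative, not drawn from the study data:

```python
transfer_rate = 317.25            # INR per 1 LC (hypothetical transfer rate)
cash_rate = transfer_rate * 0.85  # cash quoted 15% lower

amount_lc = 500                   # a typical monthly remittance, in LC
via_transfer = amount_lc * transfer_rate          # 158625.0 INR
via_cash = amount_lc * cash_rate                  # ~134831.25 INR
lost_to_cash = round(via_transfer - via_cash, 2)  # 23793.75 INR forgone
```

A customer quoted the cash rate for what is actually an electronic transfer gives up nearly 24,000 INR on a single 500 LC remittance under these assumptions.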
1.2 Research Objectives
- Aggregate real-time transfer rates from all major FSPs in a target economy
- Handle the full heterogeneity of source formats without manual per-source maintenance
- Correctly identify and extract transfer rates, not cash rates
- Deliver the aggregated data to end users via both a web dashboard and a mobile application
- Automate the entire pipeline with zero human intervention after deployment
2. System Architecture
The system is designed as a linear, cloud-native pipeline with five logical layers. Each layer has a single responsibility and communicates with the next through a well-defined interface.
| Layer | Responsibility |
|---|---|
| 1. Scraping Engine | Fetches and extracts raw rate data from FSP websites and APIs |
| 2. Normalisation | Converts all rates to a canonical LC-per-foreign-unit format; assigns rate type |
| 3. Data Store | Google Sheets; two tabs: Latest (overwritten) and History (appended) |
| 4. Scheduler | GitHub Actions workflow on a 3-hour cron schedule; injects secrets at runtime |
| 5. Consumers | Web analytics dashboard (HTML/JS) and React Native mobile app — both read from the same Sheets CSV export |
Table 1. Pipeline layer overview
This architecture deliberately avoids a traditional database. Google Sheets functions as an always-on, zero-maintenance data store with a built-in HTTP API (the gviz/tq CSV export endpoint), accessible without authentication from both the web dashboard and the mobile app.
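How a consumer reads that endpoint can be sketched in a few lines. The URL-builder and parser below are illustrative helpers, not the system's actual code, and the HTTP fetch itself is left out so the sketch stays dependency-free:

```python
import csv
import io
from urllib.parse import urlencode

def gviz_csv_url(spreadsheet_id: str, sheet: str) -> str:
    """Build the unauthenticated CSV export URL for one tab of a public sheet."""
    base = f'https://docs.google.com/spreadsheets/d/{spreadsheet_id}/gviz/tq'
    return base + '?' + urlencode({'tqx': 'out:csv', 'sheet': sheet})

def parse_rates_csv(text: str) -> list[dict]:
    """Parse the exported CSV (header row + data rows) into rate records."""
    return list(csv.DictReader(io.StringIO(text)))
```

Any HTTP client pointed at `gviz_csv_url(SHEET_ID, 'Latest')` receives plain CSV, which is why both the browser dashboard and the mobile app can consume the store without credentials.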
2.1 Technology Choices
- Python 3.11 — scraping and normalisation logic
- requests + BeautifulSoup + lxml — HTTP and HTML parsing
- Selenium (headless Chrome) — JavaScript-rendered page fallback
- Google Sheets API (service account) — data persistence
- GitHub Actions — scheduling and orchestration
- HTML + Vanilla JS + Bootstrap + Chart.js — web dashboard
- React Native (Expo) — mobile application
3. Data Collection: Multi-Strategy Scraping Engine
The scraping engine is the most technically complex component of the system. Across 20+ FSPs, no two sources use the same data format, delivery mechanism, or rate presentation convention. The engine solves this with a layered extraction strategy and a library of specialised parsers.
3.1 Source Taxonomy
| Source Type | Extraction Method | Rate Type | Notes |
|---|---|---|---|
| Bank portal (SharePoint) | HTML table → custom parser | Buy / Sell | Currency code lookup from dropdown |
| Bank portal (generic) | Multi-strategy _extract() | Buy / Sell | Fallback chain: JSON → table → regex |
| Exchange API (REST) | Direct JSON endpoint | Transfer only | Inverted from foreign-per-LC format |
| Exchange API (WordPress AJAX) | Nonce-authenticated POST | Transfer only | trtype=BT param distinguishes transfer vs cash |
| Exchange website (Vue SPA) | POST to backend API | Transfer only | Multiple payload shapes tried |
| Exchange website (static HTML) | Table + 'LC =' pattern match | Transfer only | Regex: '1 LC = X CCY' |
| PDF rate sheet | PyMuPDF / pdfplumber | Buy / Sell | Regex pattern on extracted text |
| JS-rendered page | Selenium headless browser | Mixed | Fallback for non-SSR sources |
Table 2. Source taxonomy and extraction methods
3.2 The Layered Extraction Engine
For sources without a known API, a generic multi-strategy extraction engine is applied. Strategies are tried in order of reliability, falling back to the next on failure:
def _extract(html, source):
    # Strategy 1: Embedded JSON (Next.js __NEXT_DATA__, window.__STATE__, etc.)
    results = _try_nextjs(html, source)
    # Strategy 2: JSON-LD structured data
    if not results: results = _try_json_ld(html, source)
    # Strategy 3: Arbitrary <script> tag JSON blobs
    if not results: results = _try_script_json(html, source)
    # Strategy 4: HTML tables (tabular rate sheets)
    if not results: results = _try_table(html, source)
    # Strategy 5: CSS card/div layouts
    if not results: results = _try_cards(html, source)
    # Strategy 6: Full-page regex (last resort)
    if not results: results = _try_regex_fulltext(html, source)
    return results
3.3 API Reverse Engineering
Several exchange house websites use single-page application frameworks that load rate data via AJAX calls to internal backend APIs. Identifying these endpoints requires inspection of browser network traffic and JavaScript source files. Two representative cases illustrate the approach.
3.3.1 WordPress AJAX (Nonce-Protected Endpoint)
One FSP uses a WordPress-based currency converter plugin that exposes an admin-ajax.php endpoint requiring a nonce value generated fresh per page load — a common anti-automation measure. The scraper extracts the nonce from the rendered homepage HTML before making the API call:
# Step 1: Fetch homepage to extract the per-page nonce
session = requests.Session()
resp = session.get(HOMEPAGE)
nonce = re.search(
    r'PluginConfig.*?"ajax_nonce"\s*:\s*"([^"]+)"', resp.text
).group(1)

# Step 2: POST to the AJAX endpoint with trtype=BT (Bank Transfer)
r = session.post(AJAX_URL, data={
    'action': 'convert_action',
    'currfrom': LC_CODE,
    'currto': dest_code,
    'amt': 1,
    'security': nonce,
    'trtype': 'BT',  # BT = Bank Transfer (not CP = Cash Pickup)
})
foreign_per_lc = r.json()['amount']
The trtype=BT parameter is critical: using CP (Cash Pickup) returns cash rates, which are significantly different. This distinction was discovered by inspecting the plugin's JavaScript source — an example of how rate type resolution requires source-level investigation, not just data parsing.
3.3.2 Vue SPA REST API
Another FSP uses a Vue.js SPA backed by a REST API. The exact shape of the POST payload varies by deployment version. The scraper tries multiple payload forms in sequence and uses the first successful response:
PAYLOADS = [
    {},
    {'fromCur': 'LC'},
    {'fromCur': 'LC', 'toCur': 'ALL'},
    {'sendCurrency': 'LC', 'recvCurrency': 'ALL'},
    {'baseCurrency': 'LC'},
]
for payload in PAYLOADS:
    resp = requests.post(API_URL, json=payload, headers=HEADERS)
    if resp.ok and resp.json():
        return parse_response(resp.json())
3.4 Selenium Fallback
JavaScript-rendered sources that cannot be accessed via direct HTTP requests are handled by a Selenium-based headless Chrome scraper. After page load, the scraper applies the same extraction strategies to the rendered DOM, with additional table and card parsing logic suited to dynamic content.
3.5 Geofencing & Geographic Access Restrictions
A significant operational challenge arises from the geographic deployment gap: the scraping pipeline runs on GitHub Actions infrastructure hosted in US-region data centres (Microsoft Azure), while the target FSP websites are designed to serve users within a specific GCC country. Many of these sites implement geofencing and return empty content, redirects, or outright blocks to requests from foreign IP ranges.
Three geofencing mechanisms were encountered across the source set:
- IP-based blocking — The server checks the client IP against a GCC/regional whitelist. Requests from US Azure IP blocks receive a 403 or redirect to a 'not available in your region' page.
- Content gating — The page loads but rate tables are replaced with placeholder text or empty containers for non-local IPs — making the block harder to detect programmatically.
- Header fingerprinting — Some sites cross-reference the IP with browser headers (User-Agent, Accept-Language, Referer). A non-regional IP combined with a generic Python User-Agent triggers bot-detection middleware.
The solution is a residential proxy with a local in-country IP address, injected at runtime via a GitHub Actions secret:
# Proxy injected from GitHub Actions secret at runtime
PROXY = os.environ.get('SCRAPER_PROXY')
PROXIES = {'http': PROXY, 'https': PROXY} if PROXY else {}

# Browser-mimicking headers sent on every request
DEFAULT_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
    'Accept-Language': 'en-GB,en;q=0.9,ar;q=0.8',
    'Accept': 'text/html,application/xhtml+xml,...',
    'Referer': 'https://www.google.com/',
}

def _get(url, **kwargs):
    return requests.get(
        url,
        proxies=PROXIES,
        headers={**DEFAULT_HEADERS, **kwargs.pop('headers', {})},
        timeout=15,
        **kwargs,
    )
The proxy approach also resolves a secondary challenge: some FSPs serve different rate data to different regions (e.g., rates quoted in AED on the UAE version of a site vs LC on the local version). Routing through a local IP ensures the scraper sees the same rates a local customer would see.
4. NLP & Pattern Matching for Rate Extraction
Raw web content does not contain labelled, structured rate data. Extracting accurate rates requires a lightweight NLP layer capable of identifying currency codes, interpreting table column semantics, and normalising numerical values across a wide variety of surface forms.
4.1 Currency Code Detection
ISO 4217 currency codes are three-letter alphabetical sequences. However, the same pattern appears in many non-currency contexts on financial pages: abbreviations, bank names, country codes, UI labels. A curated exclusion list filters false positives:
# Exclusion list — non-currency tokens matching [A-Z]{3}
_NOT_CURRENCY = {
    'THE', 'FOR', 'AND', 'YOU', 'ARE', 'OUR', 'ALL',
    'FSP', 'MTF', 'FIN', 'EXC',  # FSP/brand abbreviations
}

def _code_from_cells(cells):
    for cell in cells:
        if re.fullmatch(r'[A-Z]{3}', cell) and cell not in _NOT_CURRENCY:
            return cell
    # Fallback: extract a code embedded in longer text
    for cell in cells:
        m = re.search(r'\b([A-Z]{3})\b', cell)
        if m and m.group(1) not in _NOT_CURRENCY:
            return m.group(1)
    return None
4.2 Table Column Semantics
HTML rate tables vary in column ordering and labelling. The most common pattern across the surveyed FSPs is a two-column numeric table (Buy | Sell from the exchange's perspective), but the order is not guaranteed. The parser maps positional indices to semantic roles:
for row in table.find_all('tr'):
    cells = [c.get_text(' ', strip=True) for c in row.find_all(['td', 'th'])]
    code = _code_from_cells(cells)
    nums = [_safe_float(c) for c in cells if _safe_float(c) is not None]
    # First numeric value = buy_rate (exchange buys foreign from customer)
    # Second numeric value = sell_rate (exchange sells foreign to customer)
    buy = nums[0] if len(nums) > 0 else None
    sell = nums[1] if len(nums) > 1 else None
4.3 The '1 LC = X CCY' Pattern
Many exchange house websites present rates as natural-language expressions of the form '1 Local Currency = 317.25 INR'. This is a single-rate reference format that requires regex extraction and unit inversion:
for m in re.finditer(
        r'1\s+LC\s*=\s*([\d,]+\.?\d*)\s+([A-Z]{3})\b', page_text):
    foreign_per_lc = float(m.group(1).replace(',', ''))
    code = m.group(2)
    # Invert to canonical LC-per-foreign format
    lc_per_foreign = round(1.0 / foreign_per_lc, 6)
    results.append(_make(source, code, None, None, mid=lc_per_foreign))
4.4 Unit Normalisation (Rate Inversion)
Sources are inconsistent in their unit convention. Some express rates as foreign-currency-per-LC (e.g., 317.25 INR per 1 LC); others express them as LC-per-foreign-unit (e.g., 0.003151 LC per 1 INR). The system detects and corrects this automatically:
def _maybe_invert_rates(rates, source):
    # If the majority of rate values are > 1, they are foreign-per-LC
    high_count = sum(1 for r in rates
                     for k in ['buy_rate', 'sell_rate', 'mid_rate']
                     if r.get(k) and r[k] > 1)
    total = sum(1 for r in rates
                for k in ['buy_rate', 'sell_rate', 'mid_rate']
                if r.get(k))
    if total > 0 and high_count / total > 0.5:
        for rate in rates:
            for k in ['buy_rate', 'sell_rate', 'mid_rate']:
                if rate.get(k) and rate[k] > 0:
                    rate[k] = round(1.0 / rate[k], 6)
    return rates
4.5 PDF Parsing
Several banking institutions publish their rate sheets as PDF documents, typically updated daily. A two-stage PDF pipeline extracts text using PyMuPDF (primary) or pdfplumber (fallback), then applies the same regex-based extraction:
def _parse_pdf(content: bytes, source: str):
    try:
        import fitz  # PyMuPDF (primary)
        doc = fitz.open(stream=content, filetype='pdf')
        text = '\n'.join(page.get_text() for page in doc)
    except ImportError:
        import pdfplumber  # fallback
        with pdfplumber.open(io.BytesIO(content)) as pdf:
            text = '\n'.join(p.extract_text() or '' for p in pdf.pages)
    # Pattern: CODE buy_val sell_val (within 100 chars)
    for m in re.finditer(
            r'\b([A-Z]{3})\b.{0,100}?(\d+\.\d{3,6}).{0,50}?(\d+\.\d{3,6})',
            text):
        ...
5. The Transfer Rate vs Cash Rate Distinction
Correctly identifying the rate type is the most domain-critical challenge in this system. Scraping the wrong rate type produces data that is systematically misleading.
5.1 Rate Type Classification
Through manual verification of each FSP source against its live website, sources were classified into three groups:
- Transfer-explicit sources — The website clearly labels rates as 'Remittance Rate', 'Wire Transfer Rate', or 'TT Rate'. The scraper targets these specific elements or API parameters.
- Buy/sell-only sources — The website publishes only a buy/sell spread (typically banks). The sell rate (exchange sells foreign to the customer) is used as the transfer rate proxy, since this is the rate a customer pays when sending money abroad.
- Removed sources — Sources that publish only cash exchange rates with no transfer rate available are excluded from the aggregation.
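One lightweight way to encode the classification is a per-source registry consulted when the transfer rate is resolved. The registry shape and source names below are illustrative, not the system's actual code:

```python
# Hypothetical per-source rate-type registry (source names illustrative)
RATE_TYPE = {
    'bank_fsp_1': 'buy_sell',      # bank: sell rate used as transfer proxy
    'exchange_fsp_5': 'transfer',  # transfer-explicit source
    'cash_only_fsp': 'cash_only',  # publishes cash rates only — excluded
}

def transfer_rate(record: dict, source: str):
    """Resolve the transfer rate for a scraped record, or None to exclude it."""
    kind = RATE_TYPE.get(source)
    if kind == 'transfer':
        return record.get('mid_rate')
    if kind == 'buy_sell':
        # Exchange sells foreign to the customer = rate paid when sending abroad
        return record.get('sell_rate')
    return None  # cash-only or unclassified sources are dropped
```

Making the classification explicit in data, rather than implicit in each scraper, keeps the exclusion rule auditable in one place.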
5.2 The _make() Function & mid_rate Convention
All scrapers produce rate records through a single canonical factory function. The mid_rate field is designated as the transfer rate — the single value used for all downstream comparison and ranking logic:
def _make(source, code, buy, sell, mid=None, unit=1) -> dict:
    if mid is None:
        # sell_rate = exchange sells foreign to customer = transfer rate proxy
        mid = sell if sell is not None else buy
    return {
        'source': source,
        'currency': code,
        'buy_rate': buy,    # LC/foreign — may be None for transfer-only sources
        'sell_rate': sell,  # LC/foreign — may be None for transfer-only sources
        'mid_rate': mid,    # LC/foreign — ALWAYS populated = transfer rate
        'unit': unit,
        'scraped_at': datetime.utcnow().isoformat(),
    }
The mid_rate field provides a single, comparable transfer rate across all sources regardless of whether the source is transfer-explicit or buy/sell only. All ranking, best-rate logic, and UI display operate exclusively on mid_rate.
5.3 The Customer-Perspective API Exception
One major regional money transfer operator publishes rates from the customer's perspective rather than the exchange's perspective: their API labels the outbound transfer rate as 'buyrate' (the customer buys foreign currency). This is the inverse of the bank-table convention used by all other sources. The scraper handles this with a per-source override:
# FSP-A: API uses customer-perspective naming (inverted vs standard)
# API 'buyrate'  = customer buys foreign (FSP sells) = transfer rate
# API 'sellrate' = customer sells foreign (FSP buys) = not transfer
buy = round(1.0 / buy_raw, 6)    # from API 'buyrate': customer buys foreign = transfer rate
sell = round(1.0 / sell_raw, 6)  # from API 'sellrate': customer sells foreign
# Explicitly pass mid=buy so _make() uses the correct column
results.append(_make(SOURCE, code, buy, sell, mid=buy))
6. Automated Pipeline & Data Infrastructure
6.1 GitHub Actions Scheduler
The entire scraping and publishing pipeline runs as a GitHub Actions workflow triggered on a 3-hour cron schedule — a zero-infrastructure, zero-cost execution environment with full secret management and logging:
# .github/workflows/scrape.yml
on:
  schedule:
    - cron: '0 */3 * * *'   # every 3 hours = 8 runs per day
  push:
    branches: [main]
  workflow_dispatch:        # manual trigger from the GitHub UI

jobs:
  scrape-and-update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install -r requirements.txt
      - name: Inject Google credentials from secret
        run: echo "$GCP_KEY" > service-account-key.json
        env:
          GCP_KEY: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
      - run: python github_action_scrape.py
        env:
          GOOGLE_SHEETS_SPREADSHEET_ID: ${{ secrets.GOOGLE_SHEETS_SPREADSHEET_ID }}
          SCRAPER_PROXY: ${{ secrets.SCRAPER_PROXY }}
6.2 Google Sheets as Data Store
Google Sheets serves as the sole persistence layer. Two tabs are maintained: a Latest tab (overwritten on every run with the most recent rate for each source/currency pair) and a History tab (each run's complete result set is appended, enabling rate trend analysis over time).
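The two-tab update semantics can be sketched as a pure function over row lists. This is an in-memory stand-in for illustration; the actual Sheets API write calls are omitted:

```python
def update_tabs(latest: list[dict], history: list[dict], new_rows: list[dict]):
    """Latest: keep one row per (source, currency), newest wins.
    History: append every row from this run unconditionally."""
    key = lambda r: (r['source'], r['currency'])
    merged = {key(r): r for r in latest}
    for r in new_rows:
        merged[key(r)] = r  # overwrite the pair's previous rate
    return list(merged.values()), history + new_rows
```

Keeping the overwrite-vs-append decision in one place means consumers of the Latest tab never see stale duplicates, while the History tab retains the full time series.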
Rates with null buy/sell values required a specific fix: `dict.get(key, default)` returns the stored `None`, not the default, when the key exists with a `None` value. The Sheets API rejects `None` values, so empty strings must be substituted:
# Incorrect: returns None if key exists but value is None
row = [r.get('buy_rate', ''), r.get('sell_rate', ''), ...]
# Correct: converts None to empty string
row = [r.get('buy_rate') or '', r.get('sell_rate') or '', ...]
7. Web Analytics Dashboard
A single-page web dashboard provides a real-time view of the aggregated rate data. It reads from the Google Sheets CSV export endpoint directly in the browser — no backend server required.
7.1 Best Rate Logic
The dashboard computes the best transfer rate per currency in pure JavaScript. Since rates are stored as LC-per-foreign-unit, the exchange with the lowest stored value offers the most foreign currency per LC unit:
function computeBestRates(rows, currency) {
  const filtered = rows.filter(r =>
    r.currency === currency && r.mid_rate != null &&
    parseFloat(r.mid_rate) > 0
  );
  // Lowest LC/foreign = most foreign per LC = best transfer rate
  let best = null;
  for (const r of filtered) {
    if (!best || parseFloat(r.mid_rate) < parseFloat(best.mid_rate))
      best = r;
  }
  return { best_rate: best?.mid_rate, best_source: best?.source };
}
7.2 Analytics Table
The analytics section aggregates the history tab to compute, per exchange per currency: latest transfer rate, average transfer rate, data point count, and last seen timestamp. Sorting is performed client-side on any column. The exchange with the best current rate receives a star badge.
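That aggregation can be sketched as follows, in Python here for brevity (the production dashboard performs the equivalent client-side in JavaScript; names are illustrative):

```python
from collections import defaultdict

def aggregate_history(rows: list[dict]) -> dict:
    """Per (source, currency): latest rate, average rate, count, last seen."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r['source'], r['currency'])].append(r)
    out = {}
    for pair, rs in groups.items():
        rs.sort(key=lambda r: r['scraped_at'])  # chronological order
        mids = [r['mid_rate'] for r in rs if r.get('mid_rate')]
        out[pair] = {
            'latest': mids[-1] if mids else None,
            'average': sum(mids) / len(mids) if mids else None,
            'count': len(mids),
            'last_seen': rs[-1]['scraped_at'],
        }
    return out
```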
8. Mobile Application Delivery
A React Native mobile application built with Expo delivers the same rate intelligence to smartphones. The app reads from the identical Google Sheets CSV endpoint used by the web dashboard, requiring no separate mobile backend.
8.1 Data Flow
const CSV_URL =
  'https://docs.google.com/spreadsheets/d/{ID}/gviz/tq?tqx=out:csv&sheet=Latest';

// Strip surrounding quotes and whitespace from a CSV field
const clean = (s) => (s ?? '').replace(/^"|"$/g, '').trim();

const fetchRates = async () => {
  const resp = await fetch(CSV_URL);
  const text = await resp.text();
  const lines = text.split('\n');
  return lines.slice(1).map(line => {
    // Split on commas outside double-quoted fields
    const cols = line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);
    return {
      source: clean(cols[2]),
      currency: clean(cols[3]),
      buy: parseFloat(clean(cols[4])) || null,
      sell: parseFloat(clean(cols[5])) || null,
      mid: parseFloat(clean(cols[6])) || null, // transfer rate
      lastUpdated: clean(cols[1]),
    };
  });
};
8.2 Transfer Rate as Primary Display
The app displays the transfer rate (mid) as the primary metric on each exchange card. The displayed value is inverted from the stored LC-per-foreign representation to the user-facing foreign-per-LC format:
// Sort: lowest stored mid = highest displayed = best transfer rate
const getSortedRates = () => {
  const filtered = data.filter(d =>
    d.currency === selectedCurrency && d.mid !== null
  );
  filtered.sort((a, b) => (a.mid || Infinity) - (b.mid || Infinity));
  return filtered; // index 0 = best exchange
};

// Display: invert to foreign-per-LC for human readability
const displayRate = (mid) => (1 / mid).toFixed(4);
// e.g. stored 0.003150 → displayed 317.4603 INR per 1 LC
9. Evaluation & Results
This section reports the empirical performance of the deployed system, measured against a live snapshot taken on 28 March 2026. All figures are drawn directly from the production Google Sheets data store.
9.1 Source Coverage
Of the 22 FSP sources initially catalogued, 14 are currently returning data on every scheduled run — a live coverage rate of 63.6%. The remaining 8 sources are unreachable due to JavaScript rendering with no accessible API, aggressive bot-detection that defeats both the proxy and header-spoofing layer, or rate pages restructured since initial implementation.
| FSP | Status | Currencies | Extraction Method |
|---|---|---|---|
| Bank FSP-1 | ● Active | 20 | HTML table (generic) |
| Bank FSP-2 | ● Active | 25 | HTML table (generic) |
| Bank FSP-3 | ● Active | 30 | HTML table (generic) |
| Bank FSP-4 | ● Active | 17 | JSON-LD / script tag |
| Exchange FSP-5 (MTF) | ● Active | 106 | REST API (direct) |
| Exchange FSP-6 (MTF) | ● Active | 107 | REST API (direct) |
| Exchange FSP-7 (MTF) | ● Active | 14 | REST API (inverted) |
| Exchange FSP-8 | ● Active | 65 | WordPress AJAX (nonce) |
| Exchange FSP-9 | ● Active | 17 | HTML table (generic) |
| Exchange FSP-10 | ● Active | 8 | HTML table (generic) |
| Exchange FSP-11 | ● Active | 6 | HTML table (generic) |
| Exchange FSP-12 | ● Active | 5 | HTML table (generic) |
| Exchange FSP-13 | ● Active | 21 | Vue SPA REST API |
| Exchange FSP-14 | ● Active | 11 | JSON-LD / script tag |
| 8 additional FSPs | ○ Inactive | — | JS-rendered / bot-blocked |
Table 3. Source coverage as of 28 March 2026
9.2 Rate Accuracy
All 14 active sources were manually verified by cross-referencing the scraped transfer rates against the rates displayed on each FSP's live website at the time of the scrape. In every case the scraped mid_rate matched the displayed transfer rate exactly to the precision published by the FSP. For sources that present rates as a foreign-per-LC figure (e.g. '244.38 INR per 1 LC'), the system's unit inversion produced a stored value matching the reciprocal to 6 decimal places.
9.3 Cross-FSP Rate Spread
The most commercially significant finding is the magnitude of the transfer rate spread across FSPs for the same currency pair observed on the same day:
| Corridor | Best Rate (foreign / 1 LC) | Worst Rate (foreign / 1 LC) | Spread |
|---|---|---|---|
| LC → INR | 244.41 (Exchange FSP-6) | 226.19 (Bank FSP-3) | 7.45% |
| LC → USD | 2.5940 (Exchange FSP-7) | 2.5575 (Bank FSP-1) | 1.41% |
Table 4. Cross-FSP transfer rate spread · live snapshot, 28 March 2026
A 7.45% intra-market spread for the same currency pair on the same day, across FSPs operating in the same regulatory jurisdiction, demonstrates that significant value is locked behind information asymmetry. On a transaction of 100 LC, a customer using the best-rate FSP receives INR 24,441 while a customer using the worst-rate provider receives INR 22,619, a difference of INR 1,822. For a worker remitting 500 LC per month, systematic use of the best-rate FSP would yield approximately INR 9,100 in additional value monthly. This is precisely the gap the system is designed to close.
9.4 Pipeline Performance
Key operational figures from the deployed pipeline, as surfaced on the dashboard: 14 of 22 sources active per run (63.6% coverage), 135 currencies captured per run, and a total pipeline runtime that fits within the GitHub Actions free tier.
10. Commercial Viability & Use Cases
The architectural pattern described in this paper is not specific to the implementation geography or currency. The components — heterogeneous web scraping, NLP rate extraction, cloud data store, scheduled pipeline, and consumer applications — are fully generalisable.
10.1 Direct Consumer Applications
- Remittance comparison app — Users in any high-remittance economy can compare live transfer rates across all local FSPs before transacting. Monetised through affiliate referral fees from FSPs.
- Corporate treasury tool — Finance teams managing multi-currency payables can monitor rate movements and trigger alerts when rates cross target thresholds. Monetised through SaaS subscription.
- Travelling consumer app — Real-time comparison of cash exchange rates at airports, hotels, and local bureaux de change.
10.2 B2B Data Licensing
- Financial data vendors — Aggregated, normalised, timestamped rate data from non-mainstream geographies is scarce. The pipeline produces a clean historical dataset with 8+ snapshots per day per FSP, suitable for licensing to Bloomberg, Refinitiv, or regional equivalents.
- Fintech platforms — Remittance aggregators and payment orchestration platforms require real-time rate benchmarks to route transactions to the cheapest available provider. This system's output is directly usable as a routing signal.
- Compliance and audit — Regulators and financial auditors require evidence of fair pricing. Historical rate datasets with source provenance and timestamps support compliance reporting.
10.3 Geographic & Vertical Expansion
- Any high-remittance corridor — South-East Asia (SGD, MYR), West Africa (CFA corridors), Eastern Europe (EUR corridors).
- Cryptocurrency exchanges — The same scraping and normalisation patterns apply to crypto-to-fiat rate aggregation across centralised and decentralised exchanges.
- Commodity pricing — Small-economy commodity markets often exhibit the same fragmented, non-aggregated information structure.
- Insurance premium comparison — Insurance products in emerging markets are similarly distributed across heterogeneous, hard-to-scrape websites with no aggregator.
10.4 White-Label Platform
The entire stack — scraper library, pipeline, dashboard, and mobile app — can be repackaged as a white-label product for financial institutions, government agencies, or consumer brands seeking to launch a rate comparison tool for their market. The modular scraper architecture means adding a new geography requires only adding source-specific scraper functions; the normalisation, data store, and consumer layers require no changes.
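The modularity claim can be made concrete with a registry pattern: each source-specific scraper registers itself under a stable name, and the pipeline iterates the registry. The decorator and names below are an illustrative sketch, not the system's actual code:

```python
SCRAPERS: dict = {}

def scraper(name: str):
    """Register a source-specific scraper function under a stable source name."""
    def wrap(fn):
        SCRAPERS[name] = fn
        return fn
    return wrap

@scraper('example_fsp')  # hypothetical source
def scrape_example_fsp() -> list[dict]:
    return [{'source': 'example_fsp', 'currency': 'INR', 'mid_rate': 0.00409}]

def run_all() -> list[dict]:
    """Run every registered scraper; one failing source must not abort the run."""
    results = []
    for name, fn in SCRAPERS.items():
        try:
            results.extend(fn())
        except Exception:
            continue  # log and move on in production
    return results
```

Under this pattern, adding a geography is exactly one new module of `@scraper`-decorated functions; normalisation, storage, and consumers remain untouched.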
11. Lessons Learned & Future Work
11.1 Technical Lessons
- Rate type confusion is the dominant data quality risk. Cash rates and transfer rates are structurally different products. Any aggregation system that does not explicitly resolve this distinction will produce misleading data. The resolution requires per-source verification, not algorithmic detection.
- No extraction strategy works universally. The layered fallback chain is essential. Approximately 40% of sources were only reachable via one specific strategy; a single-strategy scraper would have failed on nearly half the source set.
- API discovery yields more stable scrapers. HTML structure changes frequently; API endpoints are much more stable. Investing time to discover and reverse-engineer backend APIs — even undocumented ones — pays dividends in scraper longevity.
- Unit conventions are inconsistently documented. The foreign-per-LC vs LC-per-foreign inversion problem affected over a third of sources. Automated inversion detection based on value magnitude is effective but should be validated manually for each new source.
- Google Sheets as a data store has real limitations. No transaction support, 10MB cell limit, no indexing. For production scale (>50 sources, sub-hour update frequency), a lightweight database (SQLite, Supabase, or Turso) would be preferable.
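For reference, a minimal SQLite schema such a migration might start from, sketched with Python's built-in `sqlite3` module (illustrative only, not part of the deployed system):

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS rates (
    source     TEXT NOT NULL,
    currency   TEXT NOT NULL,   -- ISO 4217 code
    buy_rate   REAL,
    sell_rate  REAL,
    mid_rate   REAL,            -- transfer rate, LC per foreign unit
    scraped_at TEXT NOT NULL    -- ISO 8601 timestamp
);
CREATE INDEX IF NOT EXISTS idx_pair_time ON rates (source, currency, scraped_at);
"""

conn = sqlite3.connect(':memory:')
conn.executescript(DDL)
conn.execute(
    "INSERT INTO rates (source, currency, mid_rate, scraped_at) VALUES (?, ?, ?, ?)",
    ('example_fsp', 'INR', 0.00409, '2026-03-28T00:00:00'),
)
# 'Latest' becomes a query rather than a separate overwritten tab
latest = conn.execute(
    "SELECT mid_rate FROM rates WHERE source=? AND currency=? "
    "ORDER BY scraped_at DESC LIMIT 1",
    ('example_fsp', 'INR'),
).fetchone()
```

With an index on (source, currency, scraped_at), the Latest/History split collapses into one table plus one query, and transactional writes come for free.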
11.2 Future Work
- Rate change alerting — Push notifications to mobile app users when a rate crosses a user-defined threshold.
- Historical trend charts — Time-series visualisations in both the web dashboard and mobile app, consuming the History tab.
- Rate confidence scoring — Assign a freshness and reliability score to each rate based on scraper success rate, last successful scrape time, and deviation from peer rates.
- ML for column semantics — Replace the positional heuristic for buy/sell column detection with a trained classifier using column header text as features.
- Automatic new source detection — Monitor the central bank's licensed FSP registry for new entrants and auto-generate scraper stubs for review.
12. Conclusion
This paper presented the end-to-end design and implementation of a real-time forex intelligence system, covering multi-strategy web scraping, NLP-driven rate extraction, automated pipeline orchestration, and dual-channel delivery via web and mobile application.
The central technical contributions are: (1) a layered extraction engine capable of handling the full heterogeneity of FSP websites without per-source maintenance beyond initial configuration; (2) a principled distinction between cash and transfer rates with an explicit mid_rate convention that provides a single comparable transfer rate across all source types; and (3) a zero-infrastructure pipeline built entirely on GitHub Actions and Google Sheets, demonstrating that production-grade financial data aggregation does not require expensive server infrastructure.
The system described in this paper covers a single economy with 20+ sources and has run fully unattended since deployment at zero marginal cost. Scaling to 10 economies with 200+ sources is an engineering exercise, not a research problem: the architecture already supports it. The commercial opportunity is substantial, because the same architectural pattern is directly transferable to any geography with fragmented FSP rate data, which describes the majority of high-remittance economies worldwide.