Why We Chose To Move Our Analytics Platform To The Cloud

Our analytics platform uses only corporate data stores and serves only internal users and use cases. We still chose to move it to the cloud, and our reasoning is as follows…

The usual perception amongst most enterprises is that the cloud is risky: when you have assets outside the enterprise perimeter, they are prone to attack. But given the mobility trends of our user population, both the desire to access every workload from a mobile device that can be anywhere in the world and the BYOD trend, it is increasingly clear that perimeter-based protection may not be the best solution, given the large number of exceptions that have to be built into the firewalls to allow for these disparate use cases.

Here are some of the reasons why we chose otherwise:

  • Cost: Running services in the cloud is less expensive. Shared resources, especially when you can match dissimilar loads, are far more efficient to utilize and therefore cost less. A cloud provider serves many clients whose loads look very different, so it makes sense for them to share compute and memory, reducing the total hardware needed to cover peak utilization. During my PBM/Pharmacy days, I remember sizing our hardware with a 100% buffer to handle the peak load between Thanksgiving and Christmas. That was wasteful; what we needed was capacity on demand. We did some capacity on demand with the IBM z-series, but that workload wasn't running and scaling on commodity hardware, so it was still more expensive than a typical cloud offering.
  • Flexibility: Easy transition to/from services instead of building and supporting legacy, monolithic apps.
  • Mobility: Easier to support connected mobile devices and transition to zero trust security.
  • Agility: New features/libraries are available faster than in-house, where libraries have to be reviewed, approved, compiled and then distributed before appdev can use them.
  • Variety of compute loads: GPU/TPU available on demand. Last year, when we were running some deep learning models, our processing came to a halt after we introduced PCA, given the large feature set we needed to form principal components from. We needed GPUs to speed up the number crunching, but we only had GPUs in lab/PoC environments, not in production, so we ended up splitting the job into smaller chunks to complete training. If we had been on the cloud, this would have been a very different story (see the sketch after this list).
  • Security: This seems like an oxymoron, but as I will explain later, a cloud deployment is more secure than deploying an app/service within the enterprise perimeter.
  • Risk/Availability: Hybrid cloud workloads offer both risk mitigation and high availability/resilience.
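To make the GPU point concrete, here is a minimal sketch, assuming a PyTorch environment and an on-demand GPU; the matrix size and component count are illustrative, not our actual workload.

```python
import torch

# Illustrative feature matrix; the real workload was far larger.
X = torch.randn(100_000, 2_000)

# Use an on-demand GPU when one is available; on CPU this same step is
# what ground our training to a halt.
device = "cuda" if torch.cuda.is_available() else "cpu"
X = X.to(device)

# Randomized low-rank PCA: center the data and keep the top 50 components.
U, S, V = torch.pca_lowrank(X, q=50, center=True)

# Coordinates of each row in the reduced 50-dimensional space.
projected = U * S
```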

Let’s consider the security aspect:

The cost of maintaining and patching a perimeter-based enterprise network is high, and the approach is risky: one wrong firewall rule or access exception can jeopardize the entire enterprise. Consider the following:

A typical APT (Advanced Persistent Threat) has the following attack phases:

  • Reconnaissance & Weaponization
  • Delivery
  • Initial Intrusion
  • Command and Control
  • Lateral Movement
  • Data Exfiltration

An enterprise usually accepts all incoming external email, and most organizations do not implement email authentication controls such as DMARC (Domain-based Message Authentication, Reporting & Conformance). You would also have IT systems that take incoming file transfers, so some kind of FTP gateway is provided for. Third-party vendors/suppliers come in through a partner access gateway. Then you have to account for users within the company who are remote and need to access firm resources to do their job, which means a remote access gateway. You also want employees and consultants inside the enterprise perimeter to be able to reach most websites and content outside, implying a web gateway. And there are usually other gateways that were opened up in the past, that no one documented, and that people are now hesitant to remove from the firewall because they do not know who is using them. Given all these holes built into the enterprise perimeter, imagine how easy it is to allow web egress of confidential data through even a simple misconfiguration of any of these gateways. The people managing the perimeter have to be right every time, while the threat actors just need one slip-up.
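As an aside on DMARC: checking whether a sending domain even publishes a policy is cheap. A minimal sketch using the dnspython package (the domain is just an example):

```python
import dns.resolver   # pip install dnspython

def dmarc_policy(domain: str) -> str | None:
    """Return the published DMARC record for a domain, or None if absent."""
    try:
        answers = dns.resolver.resolve(f"_dmarc.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return None
    for rdata in answers:
        txt = b"".join(rdata.strings).decode()
        if txt.lower().startswith("v=dmarc1"):
            return txt   # e.g. "v=DMARC1; p=reject; rua=mailto:..."
    return None

print(dmarc_policy("example.com"))
```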

Putting all of this together, the following sequence could result in a perimeter breach:

  1. Email accepted from anyone without regard to controls such as DMARC.
  2. Someone internal to the perimeter clicks on the phish and downloads malware on their own machine.
  3. Easy lateral traversal across the enterprise LAN, since most nodes within the perimeter trust each other.
  4. Web egress of sensitive data allowed through a web gateway.

On the other hand, consider a micro-segmented cloud workload, which provides multi-layer protection through defense in depth (DiD). Since the workload is very specific, the rules on the physical firewall, as well as in the virtual firewall or IPS (Intrusion Prevention System), are very simple and only need to account for the access allowed for this workload. Other changes in the enterprise do not result in these rules being changed. So distributing workloads through micro-segmentation and isolation actually reduces the risk of a misconfiguration or a data breach.
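To illustrate how small these rule sets get, here is a hedged sketch using boto3 to provision a security group for a single workload that accepts HTTPS only from one application subnet; the group name, VPC ID, and CIDR are placeholders, and other cloud providers have equivalent constructs.

```python
import boto3

ec2 = boto3.client("ec2")

# One security group per micro-segmented workload (IDs/CIDRs are placeholders).
sg = ec2.create_security_group(
    GroupName="analytics-api-prod",
    Description="Analytics API: HTTPS from the app subnet only",
    VpcId="vpc-0123456789abcdef0",
)

# The entire ingress policy for this workload is one workload-specific rule;
# changes elsewhere in the enterprise never touch it.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.20.0.0/24", "Description": "app subnet"}],
    }],
)
```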

Another key aspect to consider is the distribution of nodes. Botnets have an interesting C&C (Command and Control) design: if the C&C node is knocked out, another node can assume the C&C function and the botnet continues its attack. Cloud workloads can implement the same structure, so if one of the C&C nodes falls victim to a DoS (Denial of Service) attack, another can assume its role. Unlike an enterprise perimeter, a cloud deployment does not have a single attack surface with an associated single point of failure, and a successful DoS attack does not mean that all assets/capabilities are neutralized; other nodes take over and continue to function seamlessly, providing high availability.
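Here is a rough sketch of that takeover behavior, with hypothetical node names and a simulated health probe; a real deployment would lean on the provider's load balancers or a consensus service rather than a hand-rolled loop like this.

```python
import random
import time

# Hypothetical coordinator candidates, in priority order; any one can take over.
NODES = ["c2-us-east", "c2-us-west", "c2-eu-central"]

def is_healthy(node: str) -> bool:
    """Stand-in health probe; a real check would call the node's health endpoint."""
    return random.random() > 0.2   # simulate an occasional outage

def elect_coordinator() -> str | None:
    """First healthy node wins; if the current coordinator is knocked out,
    the next healthy node assumes the role on the following pass."""
    for node in NODES:
        if is_healthy(node):
            return node
    return None

if __name__ == "__main__":
    for _ in range(5):   # a few election rounds, just for illustration
        print("active coordinator:", elect_coordinator())
        time.sleep(1)
```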

It’s time to embrace a micro-segmented, cloud-native architecture where a tested and certified set of risk controls is applied to every cloud workload. The uniformity of such controls also allows risks to be managed far more easily than with the perimeter protection schemes that most enterprises put around themselves.

Another important trend that enterprises should look to implement is zero trust security, where each layer/component takes on explicit responsibilities for verifying every request. A very good writeup from Ed Amoroso on how zero trust security works is here.
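The gist of zero trust is that every request is evaluated on identity, device posture, and entitlement, never on network location. A minimal, deny-by-default sketch follows; the attribute names and roles are illustrative, not any particular vendor's model.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    user_authenticated: bool   # e.g. an MFA-backed identity assertion
    device_compliant: bool     # e.g. managed, patched, disk-encrypted
    resource: str
    action: str

# Per-resource entitlements; "inside the network" appears nowhere here.
ENTITLEMENTS = {
    ("analytics-dashboard", "read"): {"analyst", "data-engineer"},
    ("analytics-admin", "write"): {"platform-admin"},
}

def authorize(req: Request, roles: set[str]) -> bool:
    """Deny by default; allow only when identity, device, and entitlement all pass."""
    if not (req.user_authenticated and req.device_compliant):
        return False
    allowed_roles = ENTITLEMENTS.get((req.resource, req.action), set())
    return bool(roles & allowed_roles)

req = Request("asha", True, True, "analytics-dashboard", "read")
print(authorize(req, {"analyst"}))   # True
```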

Future of On-boarding Systems at Banks and Financial Institutions

I spent a number of years building onboarding systems for a large investment bank. When I came into the role, the immediate need was to implement FATCA across a spectrum of onboarding systems, all on different tech stacks, each with a multitude of features and functions but none able to support all of our products. This fragmentation came about because of budget pressures, needs from various product areas, separate IT teams, and the lack of discipline to sunset old, end-of-life systems.

We were successful in creating a state-of-the-art onboarding platform and a scalable, flexible regulatory program framework that could handle changing regulations from regimes across the globe, and in bringing all of our client-facing product groups onto the platform. Yet the question that kept nagging me was: is this adding to our sustainable competitive advantage?

The answer I arrived at was no. This is an example of industry-wide trapped value. Every firm on the street onboards clients and creates accounts. While clients aren't thrilled about the experience and the documentation needed, most comply, since they could not trade otherwise. But reasonably assuming every client has relationships with multiple brokers and/or banks, why should they be subject to multiple disjointed onboarding experiences, with every firm asking for KYC (and other regulatory program) documentation? There is real value to be created from centralized execution and administration.

In my opinion, this screams for an industry-wide shared service, where clients get FATCA/KYC/MiFID certified just once and use those certifications across multiple brokers or banks. Ideally this is a digital asset: a compliance certificate for any regulatory program, under any regime, from a shared service provider, that is available to query and is immutable. The individual players can then use this digital asset as an underlying precondition in their trading/transaction/lending/advisory businesses. It then becomes straightforward to move regulatory compliance costs to the shared service provider, and with multiple clients using the service, this would lead to an overall cost reduction in the banking/trading/advisory sectors. The client would also be happier, with a single point for regulatory program checks and compliance.

An implementation using smart contracts on an immutable blockchain that is available to all parties, reflects real-time compliance status, and allows brokers and banks to rely on it to facilitate their business would be much more efficient, and it seems like the natural evolution for this function.
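To make the shape of that digital asset concrete, here is a sketch of a queryable, append-only compliance registry, written in plain Python rather than an actual smart contract language; the field names and programs are illustrative.

```python
from dataclasses import dataclass, asdict
from datetime import date
import hashlib
import json

@dataclass(frozen=True)
class ComplianceCertificate:
    client_id: str
    program: str        # e.g. "FATCA", "KYC", "MiFID II"
    regime: str         # e.g. "US", "EU"
    status: str         # e.g. "CERTIFIED"
    certified_on: date
    expires_on: date

class ComplianceRegistry:
    """Append-only registry; on a blockchain, appends would be smart-contract calls."""

    def __init__(self) -> None:
        self._entries: list[ComplianceCertificate] = []

    def record(self, cert: ComplianceCertificate) -> str:
        self._entries.append(cert)
        payload = json.dumps(asdict(cert), default=str, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()  # tamper-evident reference

    def is_certified(self, client_id: str, program: str, on: date) -> bool:
        """Any broker or bank queries this before trading instead of re-running KYC."""
        return any(
            c.client_id == client_id and c.program == program
            and c.status == "CERTIFIED" and c.certified_on <= on <= c.expires_on
            for c in self._entries
        )

registry = ComplianceRegistry()
registry.record(ComplianceCertificate("C-77", "KYC", "US", "CERTIFIED",
                                      date(2023, 1, 1), date(2024, 1, 1)))
print(registry.is_certified("C-77", "KYC", date(2023, 6, 1)))   # True
```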

I welcome any feedback or contrarian views…

Why On-Boarding Applications Require A Consistent Framework

My bank has been trying to solve onboarding for the last 25 years via a variety of onboarding systems. Given the vagaries of budget cycles, people's preferences, and technology choices, we ended up with 10 to 15 systems that did onboarding for specific products, regions, and types of clients: Commodities, FX, Derivatives, Options, Swaps, Forwards, Prime Brokerage, OTC Clearing, and so on. Increased regulation, especially FATCA (which I was hired to implement), meant wasteful and fractured capital expenditure in retrofitting each of these 10+ systems to be compliant.

To address this, I made the case for going to a single onboarding platform, where we could maximize feature reuse, optimize investment, and be nimble with the capabilities we were rolling out. I refocused the team on moving onboarding to this single platform, called “The Pipe”. This included negotiating with stakeholders to agree on the bare minimum functionality that would let them move to the Pipe.

I ensured that all new feature development happened only on the go-forward strategic platform, and designed an observer pattern to create FATCA cases (and later every other regulatory case) only on the Pipe platform, regardless of where the account or client was onboarded. This allowed functionality on the legacy systems to be frozen and made it easy for our business to move over to the strategic platform.
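Here is a minimal sketch of that observer pattern, with hypothetical class and event names: onboarding systems publish account events, and a single Pipe-side observer is the only place regulatory cases get created.

```python
from typing import Callable

AccountEvent = dict   # e.g. {"account_id": ..., "client_id": ..., "source_system": ...}

class OnboardingEventBus:
    """Legacy systems and the Pipe both publish here; observers react to every event."""

    def __init__(self) -> None:
        self._observers: list[Callable[[AccountEvent], None]] = []

    def subscribe(self, observer: Callable[[AccountEvent], None]) -> None:
        self._observers.append(observer)

    def publish(self, event: AccountEvent) -> None:
        for observer in self._observers:
            observer(event)

def create_regulatory_cases(event: AccountEvent) -> None:
    """Runs only on the Pipe platform, regardless of where the account was onboarded."""
    print(f"FATCA case opened on Pipe for account {event['account_id']} "
          f"(source: {event['source_system']})")

bus = OnboardingEventBus()
bus.subscribe(create_regulatory_cases)

# An account onboarded on a legacy system still gets its reg case on the Pipe.
bus.publish({"account_id": "A-1001", "client_id": "C-77", "source_system": "legacy-fx"})
```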

We streamlined delivery into a regular four-week development cycle followed by a test and deployment cycle, and achieved 99+% of all new client accounts being onboarded on the Pipe platform. We created a common regulatory platform so that all reg cases are created on the Pipe regardless of where the account or client record was created or updated. We could roll out a new regulatory program in a single release cycle, which otherwise would have taken a project running for a year or more. This helped us rationalize investment and also gave my business assurance around regulatory compliance.

Happy to share details around the challenges we faced and the strategies we employed to overcome them.

As always, I welcome any comments, or a chance to compare notes on a similar situation you may have come across.

Difference in motivating your team – healthcare vs. finance

You may need different techniques for motivating your team in the healthcare and finance domains. Here is some experience from the field.

In the healthcare domain, when I was running projects in the field, we were able to connect project outcomes to a patient's health. For example, on a project where we were automating information collection (using a tablet coupled with a workflow tool, FileNet) for nurse visits for patient infusions, we could correlate the accuracy of the data collection, and the subsequent real-time interventions we predicted, with a reduction in ADEs (Adverse Drug Events). It was very motivating for the team to link project outcomes to better patient health and the ability to save a life.

Another project, “Therapeutic Resource Centers” (TRC), involved organizing our pharmacy and call center staff into cells, each specializing in a disease condition such as cardiovascular, diabetes, neuro-psych, or specialty. We were able to use a patient's prescription history and medical claims to stratify them to a TRC. Once we achieved this, any inbound contact from the patient, whether a prescription or a phone call, could be routed to the appropriate TRC cell. This resulted in much more meaningful conversations with the patient about their health, including adherence and gaps in therapy, which had a direct impact on outcomes. We were then able to use actuarial studies to prove the value of this model and introduced innovative products in the market that drove mail-order dispensing at a much higher rate within the drug benefit plans we sold to employers and health plans, while promising an overall reduction in healthcare spend and an improvement in patient health.
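Here is a rough sketch of the stratification and routing idea, with made-up drug classes and rules; the production model used prescription history plus medical claims and was considerably richer.

```python
# Map drug classes seen in a patient's prescription history to a TRC cell
# (illustrative rules only; the real model also used medical claims).
TRC_RULES = {
    "statin": "cardiovascular",
    "beta_blocker": "cardiovascular",
    "insulin": "diabetes",
    "metformin": "diabetes",
    "ssri": "neuro-psych",
}

def stratify(rx_history: list[str]) -> str:
    """Pick the TRC whose conditions dominate the patient's prescription history."""
    counts: dict[str, int] = {}
    for drug_class in rx_history:
        trc = TRC_RULES.get(drug_class)
        if trc:
            counts[trc] = counts.get(trc, 0) + 1
    return max(counts, key=counts.get) if counts else "general"

def route_inbound_contact(patient_rx: list[str]) -> str:
    """Every inbound call or prescription lands with the specialized TRC cell."""
    return f"route to {stratify(patient_rx)} TRC"

print(route_inbound_contact(["insulin", "metformin", "statin"]))   # diabetes TRC
```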

In the finance domain it is a lot harder to connect the outcomes of your projects to a fulfilling objective like patient health. Instead, I have used a couple of different techniques to motivate my teams:

The first is close engagement with the business and the regulators, so the team understands the impact of their work. For example, when we ran the risk-ranking project, what motivated the team was working very closely with the business to visualize our regulatory rules and to easily explain why we arrived at a given result. It also helped that the team had come up with a way of visualizing the rules that the business had not experienced before. We were also able to provide a complete audit history of how a risk ranking had changed over time, by using a bi-temporal model to store our evaluation results, reason codes, and the changes in client characteristics that led us to a different risk metric for a client. The business was a lot more comfortable with our annual sign-offs once the rules were visualized this way and audit records were available whenever we needed them for a regulatory audit.
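Here is a simplified sketch of the bi-temporal idea, with hypothetical field names: each risk-ranking version carries both its business-effective period and the system time at which it was recorded, so you can always answer what we believed about a client as of any past date.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class RiskRankingVersion:
    client_id: str
    risk_rank: str              # e.g. "LOW", "MEDIUM", "HIGH"
    reason_codes: list[str]     # why the evaluation produced this rank
    valid_from: date            # business-effective period
    valid_to: date | None       # None = still effective
    recorded_at: datetime       # system/transaction time; never updated in place

def rank_as_of(history: list[RiskRankingVersion], client_id: str,
               as_of: date, known_at: datetime) -> RiskRankingVersion | None:
    """Bi-temporal query: which rank was effective on `as_of`,
    according to what the system knew at `known_at`?"""
    candidates = [
        v for v in history
        if v.client_id == client_id
        and v.recorded_at <= known_at
        and v.valid_from <= as_of <= (v.valid_to or date.max)
    ]
    # The latest recorded version wins; nothing is overwritten, so audit is free.
    return max(candidates, key=lambda v: v.recorded_at, default=None)
```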

A second technique is tying the flexibility of our systems to the rapid changes in the regulatory landscape. Once we had a good design in place, one that extracted our business rules out of the bowels of a program, those rules became far more accessible to change outside of the usual code release window. We could change rules in a separate change cycle, as opposed to our regular monthly or quarterly code release cycle, and update things like the list of high-risk jurisdiction countries at a much faster pace (as needed and approved by the business). Being able to tie our design to a reduction in the capital needed for our projects, and avoiding rapid code changes into production for every rule change, definitely helped team morale, as the team was not constantly supporting emergency changes driven by a rapidly changing regulatory landscape.
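Here is a sketch of the rule-externalization idea: the high-risk jurisdiction list lives in data that the business change process can update on its own cycle, while the code only evaluates it; the rule shape and values are illustrative.

```python
import json

# Illustrative rules content; in practice this ships as data from an approved
# rules store, on its own change/approval cycle, separate from code releases.
RULES_JSON = '{"high_risk_jurisdictions": ["IR", "KP", "SY"]}'

def jurisdiction_risk(country_code: str, rules: dict) -> str:
    """Code stays stable; adding a country to the list needs no code release."""
    return "HIGH" if country_code in rules["high_risk_jurisdictions"] else "STANDARD"

rules = json.loads(RULES_JSON)   # in production, load from the approved rules store
print(jurisdiction_risk("KP", rules))   # HIGH
print(jurisdiction_risk("GB", rules))   # STANDARD
```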