A Primer on Data Management

A number of folks have asked me about the principles I use to manage data. I always start from first principles:

Types of data –

  1. Reference data – major business entities and their properties. For example, an investment bank may consider client, securities and product information to be reference data; a pharmacy may consider drug, product, provider and patient to be reference data.
  2. Master data – core business reference data of broad interest to a number of stakeholders. An organization may want to master this data to identify entities that are the same but are referenced differently by different systems or stakeholders; typically you would use an MDM tool to achieve this.
  3. Transaction data – events, interactions and transactions. This captures the granular transactions that different entities carry out, whether for trade or for service. For example, an investment bank may have deal transactions, or trade data from its sales and trading desks; a pharmacy may similarly have Rx fulfillment records.
  4. Analytic data – inference or analysis results derived from transaction data combined with reference data, through means such as correlation or regression.
  5. Metadata – data about data, such as its definition, form and purpose
  6. Rules data – Information governing system or human behavior of a process
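
To make the master data idea concrete, here is a minimal Python sketch of matching records that refer to the same entity under different names. The sample records, the system names and the `normalize` rules are all hypothetical; a real MDM tool would use far richer matching than this crude key.

```python
import re
from collections import defaultdict

# Hypothetical client records as two different systems might hold them.
RECORDS = [
    {"system": "trading", "id": "T-001", "name": "Acme Corp."},
    {"system": "billing", "id": "B-778", "name": "ACME Corporation"},
    {"system": "billing", "id": "B-912", "name": "Globex Inc"},
]

# Illustrative legal suffixes to ignore; real rules are domain specific.
SUFFIXES = {"corp", "corporation", "inc", "ltd"}


def normalize(name: str) -> str:
    """Build a crude match key: lowercase, drop punctuation and legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def master(records):
    """Group source records that share a match key into one master entity."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append((rec["system"], rec["id"]))
    return dict(groups)


if __name__ == "__main__":
    for key, refs in master(RECORDS).items():
        print(key, "->", refs)
```

Here the trading and billing systems each hold their own reference to "Acme", and the mastered view links both identifiers to a single entity.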

Types of data storage paradigms –

  1. Flat files – usually used for unstructured data like log files
  2. DBMS – database management systems
    1. Hierarchical – a scheme that stores data as a tree of records, with each record having one parent and possibly multiple children, e.g. IBM IMS (Information Management System). Any data access begins from the root record. My first experience with this was while programming at CSX and British Telecom.
    2. Network – a modification of the hierarchical scheme above that allows multiple parents as well as multiple children, thus forming a generalized graph structure; it was invented by Charles Bachman. It allows better modeling of real-life relationships between natural entities. Examples of such implementations are IDS (Integrated Data Store) and CA IDMS (Integrated Database Management System). I again saw a few implementations at CSX.
    3. Relational or RDBMS – based on the relational model invented by Edgar F. Codd at IBM. The general structure of a DB server consists of a storage model (data on disk organized in tables of rows and columns, plus indexes, logs and control files), a memory model (similar to storage, but holding only the most frequently accessed portion of the data cached in memory, along with meta code, plans and SQL statements for accessing that data) and a process model (consisting of reader, writer, logging and checkpoint processes). Most modern relational databases such as DB2, Oracle, Sybase, MySQL and SQL Server follow a variation of the above. This is by far the most prevalent DBMS model.
    4. Object or ODBMS – information is stored in the form of objects, as used in OOP. These databases are not table oriented. Examples are the GemStone products (now available as the GemFire object cache, which is notable for complex event processing, distributed caching, data virtualization and stream event processing) and Realm, which is available as an open source ODBMS.
    5. Object-relational or ORDBMS – aims to bridge the gap between relational databases and the object-oriented modeling techniques used in OOP by allowing complex data, type inheritance and object behavior; examples include Illustra and PostgreSQL. That said, most modern RDBMSs such as DB2, Oracle DB and SQL Server now claim ORDBMS support through the structured types of SQL:1999.
    6. NoSQL – databases that allow storage and retrieval of data modeled outside of tabular relations. Reasons for using them include scaling via clusters, simplicity of design and finer control over availability. Many compromise consistency in favor of availability, speed and partition tolerance. A partial list of such databases from Wikipedia is as follows –
      1. Column: Accumulo, Cassandra, Druid, HBase, Vertica
      2. Document: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
      3. Key-value: Aerospike, Apache Ignite, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB, SDBM/Flat File dbm
      4. Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4j, OrientDB, Virtuoso
      5. Multi-model: Apache Ignite, ArangoDB, Couchbase, FoundationDB, InfinityDB, MarkLogic, OrientDB
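
As a rough illustration of the difference between the relational and the document/key-value paradigms above, the sketch below stores the same hypothetical client and its trades both ways. It uses Python's built-in `sqlite3` module for the relational form and a plain dictionary of JSON strings as a stand-in for a key-value or document store. The table names, keys and sample values are assumptions made up for the example.

```python
import json
import sqlite3


def relational_trades(client_id: int):
    """Relational form: two normalized tables joined at query time."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE trade (id INTEGER PRIMARY KEY,
                            client_id INTEGER REFERENCES client(id),
                            symbol TEXT, qty INTEGER);
        INSERT INTO client VALUES (1, 'Acme');
        INSERT INTO trade VALUES (10, 1, 'IBM', 100), (11, 1, 'ORCL', 50);
    """)
    rows = con.execute(
        "SELECT c.name, t.symbol, t.qty FROM client c "
        "JOIN trade t ON t.client_id = c.id WHERE c.id = ? ORDER BY t.id",
        (client_id,),
    ).fetchall()
    con.close()
    return rows


# Document form: one self-contained record per key, so no join is needed.
# A dict of JSON strings stands in for a real key-value/document store.
STORE = {
    "client:1": json.dumps({
        "name": "Acme",
        "trades": [{"symbol": "IBM", "qty": 100}, {"symbol": "ORCL", "qty": 50}],
    })
}


def document_trades(client_id: int):
    """Fetch the whole document by key and unpack its embedded trades."""
    doc = json.loads(STORE[f"client:{client_id}"])
    return [(doc["name"], t["symbol"], t["qty"]) for t in doc["trades"]]


if __name__ == "__main__":
    print(relational_trades(1))
    print(document_trades(1))
```

Both functions return the same rows; the trade-off is that the relational form keeps each fact in one place, while the document form makes a single-key read cheap at the cost of duplicating data if trades must be viewed from other angles.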

Common Data Processing Paradigms and Questions about why we perform certain operations –

  • Transaction management – OLTP
    • Primarily read access – when a system is responsible for reading reference data but not maintaining it. A number of techniques can be used, but the primary approaches are read-only services that provide data in real time and read-only stored procedures for batch or file access. People often prefer to replicate a read-only copy of the master database to reduce contention on the master.
    • Update access – usually driven off a single master database to maintain consistency, with a single set of services or JDBC/ODBC drivers providing create/update/delete access; certain use cases may warrant a multi-master setup.
    • Replication solutions – unidirectional (master-slave), bi-directional, multi-directional (multi-master) and on-premises-to-cloud replication solutions are available.
  • Analytics – OLAP
    • Star schema – a single large central fact table and one table for each dimension.
    • Snowflake – a variant of the star schema in which there is still a single large central fact table and one or more dimension tables, but the dimension tables are normalized into additional tables.
    • Fact constellation or galaxy – a collection of star schemas in which multiple fact tables share dimension tables.
  • When to create data warehouses vs. data marts vs. data lakes?
    • Data warehouse – stores data that has been modeled and structured. It holds multiple subject areas with very detailed information and works to integrate all these data sources. It is available to business professionals for running analytics to support their business optimizations. It has a fixed configuration, is less agile and is expensive for large data volumes, and has a mature security configuration. It may not necessarily use a dimensional model itself, but it feeds other dimensional models.
    • Data marts – contain a single subject area, often with rolled-up or summary information, with the primary purpose of integrating information from a given subject area or set of source systems.
    • Data lakes – on the other hand, contain structured, semi-structured, unstructured and raw data, and are designed for low-cost storage; their main consumers are data scientists who want to figure out new and innovative ways to use this data.
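
The star schema described above can be sketched with Python's built-in `sqlite3` module. The pharmacy-flavoured table and column names (`fact_rx_fill`, `dim_drug`, `dim_date`) and the sample rows are hypothetical, chosen only to show one central fact table joined to its dimension tables.

```python
import sqlite3


def build_star(con: sqlite3.Connection) -> None:
    """Create one fact table and two dimension tables with sample rows."""
    con.executescript("""
        CREATE TABLE dim_drug (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE fact_rx_fill (
            drug_id  INTEGER REFERENCES dim_drug(drug_id),
            date_id  INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER
        );
        INSERT INTO dim_drug VALUES (1, 'DrugA'), (2, 'DrugB');
        INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
        INSERT INTO fact_rx_fill VALUES
            (1, 1, 30), (1, 2, 60), (2, 1, 90), (1, 1, 10);
    """)


def quantity_by_drug_and_month(con: sqlite3.Connection):
    """Typical OLAP query: aggregate the facts, sliced by two dimensions."""
    return con.execute("""
        SELECT d.drug_name, t.month, SUM(f.quantity)
        FROM fact_rx_fill f
        JOIN dim_drug d ON d.drug_id = f.drug_id
        JOIN dim_date t ON t.date_id = f.date_id
        GROUP BY d.drug_name, t.month
        ORDER BY d.drug_name, t.month
    """).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    build_star(con)
    for row in quantity_by_drug_and_month(con):
        print(row)
```

In a snowflake variant, `dim_drug` would itself point to a further normalized table (say, a manufacturer table); in a fact constellation, a second fact table would reuse `dim_drug` and `dim_date`.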

Defining what your needs are for each of the above dimensions will usually allow you to choose the right product or implementation pattern for getting the most out of your system.

How To Measure Delivery Effectiveness For An IT Team

Common measures that should drive an application or application development team’s metrics collection and measurement:

  • Cadence – how frequent and regular is the release cycle
    • the number of releases per year
    • the probability of keeping to the periodic release schedule
  • Delivery throughput – how much content (functionality) is released in every release
    • measures such as Jira ticket counts weighted by complexity or size
  • Quality – number of defects per cycle
    • defect counts from a defect management system such as ALM or Quality Center
    • change requests raised post dev commencement
  • Stability – Crashes/ breakage/incidents around the application
    • Crashes
    • Functionality not working
    • Application unavailable
    • Each of the above could be measured via tickets from an incident management system like ServiceNow
  • Scalability – how easily does the application expand and contract based on usage
    • measure application usage variability across time periods – for example, we planned for Rx fulfillment usage at mail order pharmacies to double during the Thanksgiving and Christmas holidays compared with normal weeks
    • application scaling around peak usage + a comfortable variance allowance
    • shrinkage back to adjust to non-peak usage, to manage TCO effectively using capacity-on-demand techniques
  • Usability – how well and easily can your users access or work the application
  • Business Continuity
    • ability to recover in the event of a disaster
    • time to restore service in a continuity scenario
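
As a minimal sketch of how the cadence and throughput measures above might be computed, the Python below works from a hypothetical list of releases. The `Release` fields, the size weights and the sample data are assumptions for illustration; in practice they would come from your release calendar and from a tracker such as Jira.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical complexity weights per ticket size.
SIZE_WEIGHTS = {"S": 1, "M": 3, "L": 5}


@dataclass
class Release:
    planned: date       # date the release was scheduled for
    actual: date        # date it actually shipped
    ticket_sizes: list  # sizes of the tickets delivered, e.g. ["S", "M"]


def releases_per_year(releases, year: int) -> int:
    """Cadence: how many releases actually shipped in the given year."""
    return sum(1 for r in releases if r.actual.year == year)


def on_time_rate(releases) -> float:
    """Cadence: fraction of releases shipped on or before the planned date."""
    return sum(r.actual <= r.planned for r in releases) / len(releases)


def weighted_throughput(release: Release) -> int:
    """Throughput: ticket count weighted by size."""
    return sum(SIZE_WEIGHTS[s] for s in release.ticket_sizes)


RELEASES = [
    Release(date(2024, 1, 15), date(2024, 1, 15), ["S", "M"]),
    Release(date(2024, 2, 15), date(2024, 2, 20), ["L", "M", "S"]),
    Release(date(2024, 3, 15), date(2024, 3, 14), ["M", "M"]),
    Release(date(2024, 4, 15), date(2024, 4, 15), ["L"]),
]

if __name__ == "__main__":
    print("releases in 2024:", releases_per_year(RELEASES, 2024))
    print("on-time rate:", on_time_rate(RELEASES))
    print("throughput:", [weighted_throughput(r) for r in RELEASES])
```

The on-time rate is only a historical frequency, used here as a simple proxy for the probability of keeping a periodic release.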

In my opinion, some key pre-requisites that drive good metrics are –

  • Good design and architecture
  • Code reviews and design conformance
  • Scalability isn’t an afterthought
  • Usability is designed before the software is written
  • Automated regression and functional testing

I have implemented versions of delivery effectiveness for my teams at both Morgan Stanley and Medco and, contrary to most practitioners’ beliefs, it’s not that hard to do. Please reach out if you want a deeper how-to discussion.

Part IIa: In Defense Of Government

This weekend I was listening to Fareed Zakaria on GPS @CNN (a program that I love) and he mentioned another aspect of government that I hadn’t covered in my post “In Defense of Government”.

His point was that we look towards our government to avoid catastrophes – and he mentioned this in respect to the climate crisis.

What really got me thinking was that this is a crucial and very underappreciated function. Essentially, if the government is successful in averting a crisis, it is a non-event (who remembers the collective action to fix the hole in the ozone layer by globally phasing out CFC-based coolants in refrigeration?), since history does not remember the mundane – and hence the success of the institution is always underappreciated.

Whereas if the government fails spectacularly (for example, the World Wars), that’s one for the history books – recounted and studied for generations as a tale of misery and suffering.

So how do we address this inherent bias in human nature? Probably by also recording our spectacular (although underappreciated) successes in avoiding crises, and by making a careful, meticulous analysis of the failures of strategy and policy behind disasters – not just telling a tale of human suffering.

Part V: Future Of Work

Future Disruptions and New Maladies – what the future may hold…..

  • An aging population without a plan to take care of our old and infirm.
  • AI and ML: Machines are getting smarter and more intelligent. AI seems to be able to do a lot of the cognitive jobs currently held by humans.
  • Automation: Think self driving cars, trucks and other vehicles; IVRs that perform the jobs of CSRs, chat bots etc.
  • Are humans a single planet species? Can we defy gravity at scale (mass transportation as opposed to a few infrequent transgressions into space) – Will this open up new frontiers to explore and eventually inhabit new worlds?
  • Will we discover other intelligent species/civilizations? Again new frontiers for engagement outside our current boundaries of work – think Starship Enterprise and its crew – “Space: the final frontier. These are the voyages of the Starship Enterprise. Its continuing mission: to explore strange new worlds; to seek out new life and new civilizations; to boldly go where no one has gone before”.

All of these can be looked at as either opportunities or threats – it is what we make of it. Change is never comfortable, but how we embrace and adapt to it may mean our survival or, ultimately, our demise.

Our response to these changes may take the following shape –

Protectionist Policies – Have they ever worked?

When we domesticated animals, did humans that performed manual labor resist the entry of animals? No; instead they evolved to managing these animals and accomplishing more work than they could do individually.

Did humans resist mechanization – yes, but it wasn’t a disruption that they could resist for long because industrialization came with a number of benefits: it raised living conditions, increased life expectancy and also provided a preponderance of additional time that could be productively used in the pursuit of other interests and goals.

So why do we believe that the same response will be successful for the next round of redefinition of work?

Innovation: Can we reimagine work that humans do?

Why do we assume assembly line work is actually adding value?

Humans were first domesticated by agriculture and forced to leave their hunter-gatherer life style. For work that needed more strength than a single human, we used animal power for tasks such as ploughing or extracting oil.

Then came automation – where we invented gears, pulleys and mechanical arms or wheels driven by motors and generators to perform the work on our behalf.

Next came CNC machines which, in addition to the mechanical aspect of doing the job, could also be programmed to do the work without human intervention.

The final frontier is AI – where an automaton (or a software program, since not all realms of work are physical) can sense the environment and determine the best action in order to achieve a set of finitely defined goals.

I think we are ok as long as “we” get to define the goals but there are larger fears in society that when these automatons/programs take over and set their own goals – why would any human be needed?

Is that the right question? I do not know … does anyone?

Can AI and Intelligent robots really be an ally instead of a threat?

True if we do not hang on to our traditional definition for work and instead look for different ways to create value.

Will be a possibility if an alien species attacks us and we collaborate with our own robots to ward off this threat.

May be necessary if humans are not suited to be a space faring race and we need robots to help us in this hostile environment.

We will need them when we are older and frail and younger humans do not have the inclination or interest to look after us.

If we want to succeed, we will need to change the game …and have a plan for our workforce to transition to this new definition of work.

 🙂

 

AI And Humans (In My Game Too)

AI are robots that play games, do work etc. In my game there are 2 modes, “Missions” and “Skirmish” (I play Star Wars Battlefront). In each game mode there are AI that try to blast you; they are programmed to focus on you, the player. Each game has a difficulty that you can select, and the AI are programmed to adapt to the difficulty you selected. They will also adapt to which side you choose, such as the Rebellion or the Empire: if you are on the Rebellion, then the AI go to the Empire.

Human players are actual people (I hope you know that) that you can play against in a mode called “Multiplayer”. In that mode you play against other humans (of course, right?). The people that you play in “Multiplayer” can pick up power-ups and they can do “Emotes” in game. Each human that you play can customize star cards (star cards are advantages they can use in game, such as a jetpack). Humans can do all of these things that an AI cannot.

The differences between humans and AI are:

  • AI cannot use star cards (I already said that, right?)
  • Humans (regular players) cannot do the movements the AI does.
  • AI cannot pick up power-ups (yeah, I know I said that already.)
  • Humans (regular players) can use emotes (I KNOW I said that) but AI cannot.

How are they similar –

  • AI and humans both control players (of course, right?)
  • AI and humans (regular players) can focus on you. The reason I’m writing this is because whenever you’re in a server anywhere, anytime (in the game), a player always focuses on YOU, yeah, YOU.
  • AI and regular players (humans) can get into ships (I don’t know how the AI gets into ships without power-ups, but they still do.)

Thank you for reading this “article”! What I think is that AI and humans ARE THE SAME (at least in my game; I don’t know about real life)…

 

Part III: Consequences Of Rampant Capitalism…

We have seen rapid growth in various world economies over the last century, and it has been especially pronounced since World War II. Of late, however, this growth has not been shared equally across the population. This results in income inequality, and a subset of the population loses all hope of upward mobility.

This segment of the population is completely discounted and does not value the democratic choice that it exercises in elections; consequently, its members make their decisions based on either unachievable promises or even protectionist and racist policies.

Once the value of the vote goes down, what you see is the election of incompetent or callous leaders into government, which further endangers the existence of democracy and capitalism.

Now don’t get me wrong, capitalism is the only economic system we have seen work – all others like socialism, communism, dictatorship, fascism etc. have failed the test of time. The question really is how to temper capitalism so that it does not become its own biggest enemy and threaten its own (and democracy’s) survival.

Our next post is a plan to tackle this….

Part IV: A Plan To Address The Maladies We Have Seen…

So what is the crux of the problem:

  • Capitalism promotes migration of jobs and work to the cheapest possible location;
  • Probably ethically right since it rewards the group that is the most desperate, but is unfair to the group that has a net outward migration of jobs
  • Results in massive job losses and desperation in one location while net influx of jobs and prosperity in another
  • Retraining and a repurpose of labor at source location is usually not attempted or isn’t very successful
  • The group losing the jobs is politically powerless to resist or prevent it

Let’s do a simple SWOT analysis of all the players involved:

Company:

Strength: Capital; Agile Business and Production Processes

Weakness: Bad Political Reputation, and hence opponents try to fight this practice using protectionist/nationalist policies

Opportunity: Find the optimal production cost to maximize profits and shareholder returns

Threat: Competition achieves a lower per unit cost of production, resulting in loss of market share

Investors & Shareholders:

Strength: Capital

Weakness: Chase the best returns and are sometimes focused on the very short term rather than the long term, i.e. they apply a steep discount rate

Opportunity:  Deploy capital that can be most productively used; Employ private equity/venture capital investment constructs to lock in capital to generate superior returns in the medium term instead of meagre short term gains.

Threat: Other investors generating better returns

Workers at the outbound location:

Strength: Political say at the local level

Weakness: No Capital to invest

Opportunity: Human capital that is free

Threat: A lower cost location comes up and takes on the production and jobs from them; Not trained in anything other than what they are currently doing

Workers at the inbound location:

Strength: Political say at the local level

Weakness: No Capital to invest

Opportunity: Human capital that is free

Threat: Another lower cost location comes up and takes over the production and jobs from them; May not have any training

Given the incentives, a capitalistic society gravitates towards specializing and optimizing labor, capital and production, usually to the detriment of its own workers but in favor of its investor class.

What I propose:

Change the game to utilize the strengths of each player: investors provide capital to competitive projects; localities use their resources to set up the right competitive projects using local resources and labor; executors with a proven track record implement the projects in a transparent and efficient manner; operators run the venture profitably; and an infrastructure exchange removes friction from investment, project execution and operation, and makes it easy to move investments in and out of projects that compete to be the better investment.

Infrastructure Exchange Market

What?

An open marketplace for micro/project-level sponsorship.

Starts with a contractual agreement between sponsors/implementors/operators on initial project parameters

Proposal put up for bid to the investors – micro credit – or ordinary shares available to folks with a well defined investment profile

Types of projects accepted:

– wind farms

– solar farms

– energy storage pods

– efficiency projects – reduce energy consumption run rates

– later could move to any infrastructure project – build roads / bridges etc.

Why?

– a way to fund infrastructure from goal-minded folks that believe in sponsoring a project’s goals while making sure of decent returns on investment

Goals can be

– renewable energy

– create jobs

– reduce pollution

– build infrastructure

– competition drives performance

– easy in and out through trading in the infrastructure bank exchange

How?

– sponsors propose projects on exchange

– investors buy in interest; when project fully invested in – kicks off

– implementors run with it and complete execution

– hand over to operators that run and produce steady annuities

– project shares trade on the exchange at prices that reflect how well the projects are doing

– all activities have a standard set of metrics that are measured and published for full transparency

– create a path for projects to fail and be wound down

– create market on which mature projects performing as annuities can be traded and folks can move in and out of the projects

– should you create it as a bond structure – like infrastructure bonds backed by local/state government?
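
The flow above can be sketched as a small state machine in Python. This is only an illustration of the proposal, not the prototype itself: the `Project` class, its state names and the funding figures are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PROPOSED = "proposed"      # listed by a sponsor, open for funding
    EXECUTING = "executing"    # fully invested, implementors at work
    OPERATING = "operating"    # handed over to operators
    WOUND_DOWN = "wound down"  # failed or retired


@dataclass
class Project:
    name: str
    target: float                               # funding needed to kick off
    stakes: dict = field(default_factory=dict)  # investor -> amount held
    state: State = State.PROPOSED

    @property
    def raised(self) -> float:
        return sum(self.stakes.values())

    def invest(self, investor: str, amount: float) -> float:
        """Accept funds up to the target; return the amount actually accepted."""
        if self.state is not State.PROPOSED:
            raise ValueError("project is no longer open for initial funding")
        accepted = min(amount, self.target - self.raised)
        if accepted > 0:
            self.stakes[investor] = self.stakes.get(investor, 0.0) + accepted
        if self.raised >= self.target:
            self.state = State.EXECUTING  # fully invested: project kicks off
        return accepted

    def hand_over(self) -> None:
        """Implementors finish and hand the project to operators."""
        if self.state is not State.EXECUTING:
            raise ValueError("only an executing project can be handed over")
        self.state = State.OPERATING

    def transfer(self, seller: str, buyer: str, amount: float) -> None:
        """Move part of a stake between investors (a trade on the exchange)."""
        if self.stakes.get(seller, 0.0) < amount:
            raise ValueError("seller does not hold that much")
        self.stakes[seller] -= amount
        self.stakes[buyer] = self.stakes.get(buyer, 0.0) + amount

    def wind_down(self) -> None:
        """Explicit failure path for projects that do not work out."""
        self.state = State.WOUND_DOWN


if __name__ == "__main__":
    farm = Project("wind farm", target=100.0)
    farm.invest("alice", 60.0)
    farm.invest("bob", 50.0)  # only 40.0 is accepted
    farm.hand_over()
    farm.transfer("alice", "carol", 10.0)
    print(farm.state, farm.stakes)
```

Pricing, published metrics and the bond-structure question are deliberately left out; the sketch covers only the funding threshold, the handover and the ability to move in and out of a project.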

When?

– create exchange with functionality for each participant

– get investors to fund operations

– sign up sponsors / implementors / operators

Where?

– where there is the greatest need

– local governments struggling with creating jobs but have local resources that can be leveraged

– Examples of local resources

– Land

– Wind power

– Solar power

– Hydro power

Who?

Initial investors – those that fund the exchange

Implementors – those that set up the projects initially

Operators – those that run and maintain the projects – create local jobs;

Investors – who fund the projects

Sponsors – local governments/states that can provide resources (e.g. land lease/rent etc.)

 

An exchange like the one described above will align the incentives between the workers and the investors to create a win-win situation for both….

We are currently working on a prototype for the platform….watch this space…

 

 

Part II: In Defense Of Government

We find “The Government” to be a common punching bag for most folks – politicians and common citizens alike.

Our politicians rail against a corrupt and ineffective government, pointing out to us the ills of big government and why the regulations imposed by the government are stifling our industry.

There are folks that talk about the deconstruction of an institution that has evolved over centuries of human development. It seems everyone has an example of some egregious behavior that they use to justify painting the whole institution as bad and in need of pruning. And the travesty of the situation is that there is no one standing on the other side defending this vital institution and its usefulness.

So let me take on the defense for the institution called the “Government”, which allows us to cooperate in very large numbers, prescribes and maintains a rule of law and is able to undertake projects at a scale that is impossible/unsustainable for an individual/family or a small group.

Governments came about when humans started gathering into communities. There were three primary reasons to create a government –

  • Establishing a common benchmark of behavior in society and enforcing conformance
  • Achieving scale where an individual/family or small group could not
  • Making outsize bets in pushing expertise in any domain – agriculture, industry or technology (e.g. core science) – where short or medium term benefits may not justify a rational private investment

Establishing the Rule of Law:

Humans felt the need to create rules for common behavior that all members within a community would adhere to. Anyone not adhering to these rules was punished or incarcerated. The institution developed as a system of checks and balances for maintaining civil behavior.

Bringing the Benefits of Scale:

If you look historically, humans formed collectives and villages/towns/cities etc. to take advantage of the power of the collective. There were some projects/endeavors that were beyond the scope of an individual or a small group like a family to accomplish. Hence we humans invented the construct of “The Government” to allow us to cooperate in larger numbers. For example, maintaining a military for offense or defense, or building roads that are more than point-to-point connections and are useful for the entire community.

Before we assign all the blame to the government and we dismantle it – we also should be willing to give up all the gains achieved by this institution.

Are we ready to give up on our military, our highways, the internet, the GPS system, antibiotics and the miracle drugs we have on the market? All of these are innovations that started as government projects and were then handed over to private enterprise.

 

I do not discount the criticism leveled by some that there are government agents that take advantage of their power, or even some individuals who are free riders on the rest of society. Yes, you can find bad apples in any group, but let’s not use these examples to discredit the institution and forgo the benefits of having a functioning and effective government.