The Importance of Machine Learning Theory and How Math and Physics Majors Can Thrive in Machine Learning
A recent post on machine learning, talent, and education from Arthur Tisi, CEO of Meaning Bot, originally published on LinkedIn.
Back in January 2006, Business Week published an article declaring "There has never been a better time to be a mathematician." Although that article is almost 15 years old, it still makes a valid case for what math and physics students need to understand about building a career in Artificial Intelligence, Machine Learning and NLP. We are in the age of the mathematicians, and opportunities abound if you apply the right approach to the pedagogy of AI.
As the CEO of an AI company, I am constantly on the hunt for hard-driving, fun, caring and smart team members. We have the benefit of a growing marketplace, and we are focused on being the best at math, at delivery, and at having a clear vision for which platforms we choose to pursue.
Since all AI companies are looking for talent (of course we are as well), it is important that we try to get the best talent from various available sources and academic backgrounds. Undoubtedly, we are looking more and more for smart and capable people with math and physics expertise whom we can assist as they “surrender” to solving machine learning and, more broadly, AI problems. As part of this, I have given ample thought to the challenges that those without deep ML or CS experience face in becoming leading engineers and thinkers in the AI space.
Fear not, the future is EXTREMELY bright!!
In general, undergraduate math courses look something like this:
- Linear algebra (100/200 level course - maybe some 300)
- Discrete math
- Differential equations (ODEs and numerical)
- Theoretical statistics (100/200 level)
- Numerical analysis 100 (numerical linear algebra) and 200 (quadrature)
- Abstract algebra
- Number theory
- Real analysis
- Complex analysis
- Intermediate analysis (point set topology)
- Intro to C++, C#, or perhaps Python
- Physics classes
For a math Ph.D. student, expertise in analysis, algebra, and topology, along with pure math classes where homework problems consist almost exclusively of proofs done with pen and paper, is of enormous value.
While the "data science" demarcation problem is challenging, it seems evident that the math curriculum can lack preparation in many essential areas of data science. Chief among these are programming skill, knowledge of experimental statistics, and experience with math modeling. Few would argue that programming ability is not a key skill of data science. A data scientist need not have a degree in computer science, but being able to manipulate text files at the command line, understanding vectorized operations, and thinking algorithmically are the hacking skills that make for a successful data hacker. Many math majors, having briefly seen C#, R or Python in freshman year and occasionally used Matlab to solve ODEs for homework assignments, would be unaware that manipulating a file from the command line was even possible, much less be able to write a simple sed script; many of their grad school classmates are little different.
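For readers who have never seen this kind of text manipulation, here is a minimal sketch of what a one-line sed substitution does, written in Python rather than sed. The function name `substitute` and the sample log lines are made up for illustration, and Python's regular expression syntax differs in places from sed's, so treat this as an analogue rather than a translation.

```python
import re


def substitute(lines, pattern, replacement):
    """Apply a regex substitution to every line, roughly like sed 's/pattern/replacement/g'."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]


# Hypothetical log lines: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY.
log_lines = [
    "2020-01-03 ERROR disk full",
    "2020-01-04 INFO ok",
]
cleaned = substitute(log_lines, r"^(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1")

for line in cleaned:
    print(line)
```

The same idea scales from a list of strings to a file read line by line, which is essentially what command-line tools such as sed do.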
Many data science positions require even more than the ability to solve problems with code. Many positions require an understanding of software engineering skills and tools such as writing reusable code, using version control, software testing, and logging. Though some gain a fair bit of programming skill in college, these skills, now essential in daily ML work, may initially remain foreign.
Sometimes math training lacks statistics courses, and supplementing it with a few SAS, R or SPSS sessions does not quite satisfy. Mathematical statistics is valuable in picking up machine learning, although sometimes experimental statistics is missing altogether. Many data science teams are interested in questions of causal inference and the design and analysis of experiments; some would make these essential skills for a data scientist. Moreover, machine learning, also a cornerstone of data science, is not a subject most math majors could define until after they have finished their math coursework.
Yet even if statistics had played a more prominent role in coursework, those who have studied statistics know there is often a gulf between understanding textbook statistics and being able to effectively apply statistical models and methods to real world problems. This is only one aspect of a bigger issue: mathematical (including statistical) modeling is an extraordinarily challenging problem, but instruction on effectively modeling real world problems is absent from many math programs.
Although math and physics programs arm graduates with a wide variety of mathematical models, it is rarely clear exactly which model can or should be applied in a given situation.
Many people, even technical people, are uncertain as to what academic math is beyond undergraduate calculus. Mathematicians mostly work in the logical manipulation of abstractly defined structures. These structures rarely bear any necessary relationship to physical entities or data sets outside the abstractly defined domain of discourse. Though some might argue this is speaking only of "pure" mathematics, this is often true of what is formally known as "applied mathematics". John D. Cook has made similar observations about the limitations of pure and applied math (as proper disciplines) in dubbing himself a "very applied mathematician". Very applied mathematics is "an interest in the grubby work required to see the math actually used and a willingness to carry it out. This involves not just math but also computing, consulting, managing, marketing, etc."
The Importance of Learning Theory
The use of Learning Theory in ML is critical to success in the space. Why? Because understanding the proper application of an algorithm, which algorithm or algorithms to use, or more broadly when to use or combine Supervised, Unsupervised or Reinforcement Learning, is critical. Dr. Andrew Ng, the esteemed AI expert, often tells of meeting with data scientists only to find that a mathematical approach based on a specific algorithm, which they had been working on for over six months, should have been “obviously wrong” to them from the start. Further, the proper size of the training data, when to add more, and when no further data is required are all questions that Learning Theory addresses. It is, as Dr. Ng put it, the difference between a carpenter at school and a “Master Carpenter”.
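To make the point about training set size concrete, here is a minimal sketch of one textbook result. For a finite set of candidate hypotheses, Hoeffding's inequality combined with a union bound gives a number of examples that is sufficient for every hypothesis's training error to land within epsilon of its true error, with probability at least 1 - delta. The function name and the example numbers are illustrative assumptions, not something taken from Dr. Ng, and this crude bound is usually far from tight for real models.

```python
import math


def sufficient_sample_size(num_hypotheses, epsilon, delta):
    """Examples sufficient so that, with probability >= 1 - delta, every one of
    num_hypotheses hypotheses has |training error - true error| <= epsilon.

    Derived from Hoeffding's inequality plus a union bound:
        2 * num_hypotheses * exp(-2 * m * epsilon**2) <= delta
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 * num_hypotheses / delta) / (2 * epsilon ** 2))


# Hypothetical setting: 1,000 candidate models, 5% error tolerance, 95% confidence.
m_coarse = sufficient_sample_size(1000, epsilon=0.05, delta=0.05)
# Tightening the tolerance to 2.5% costs roughly four times as much data.
m_fine = sufficient_sample_size(1000, epsilon=0.025, delta=0.05)
print(m_coarse, m_fine)
```

Even this crude form says something useful: the data requirement grows only logarithmically with the number of hypotheses but quadratically as the error tolerance shrinks.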
Considering learning theory when designing an algorithm has a few important effects in practice:
- It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. A good example might be Isomap, where the algorithm was informed by the analysis, yielding substantial improvements in sample complexity over earlier algorithmic ideas.
- Additionally, an algorithm with learning theory considered in its design can be more automatic. Consider Rifkin’s claim that the one-against-all reduction, when tuned well, can often perform as well as other approaches. The “when tuned well” caveat is, however, substantial, because learning algorithms may be applied by nonexperts or by other algorithms which are computationally constrained. A reasonable and worthwhile hope for other methods of addressing multiclass problems is that they are more automatic and computationally faster. The subtle issue here is: How do you measure “more automatic”?
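For readers unfamiliar with the one-against-all reduction mentioned above, here is a minimal sketch of the idea: train one binary scorer per class (that class versus everything else) and predict the class whose scorer is most confident. The binary learner below is a deliberately simple centroid-based linear scorer chosen so the example stays short; it is a stand-in, not the well-tuned classifiers Rifkin's claim refers to, and all function names and data points are hypothetical.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def train_binary(positives, negatives):
    """Toy linear scorer: w points from the negative centroid to the positive
    centroid, and the decision boundary passes through their midpoint."""
    mu_pos = mean_point(positives)
    mu_neg = mean_point(negatives)
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b


def score(model, x):
    """Signed score; larger means more confidently in the positive class."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_all(X, y):
    """The reduction itself: one binary problem per class label."""
    models = {}
    for label in sorted(set(y)):
        positives = [x for x, t in zip(X, y) if t == label]
        negatives = [x for x, t in zip(X, y) if t != label]
        models[label] = train_binary(positives, negatives)
    return models


def predict_one_vs_all(models, x):
    """Predict the label whose binary scorer gives the highest score."""
    return max(models, key=lambda label: score(models[label], x))


# Hypothetical 2-D data: three well-separated clusters.
X = [(0, 4), (1, 5), (-1, 5),
     (-5, -4), (-4, -5), (-6, -5),
     (5, -4), (4, -5), (6, -5)]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

models = train_one_vs_all(X, y)
print(predict_one_vs_all(models, (0, 6)))
```

Note that the scores of the separate binary scorers are not calibrated against each other, which is one place where the “when tuned well” caveat starts to bite on harder data.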
Yet, for those who are non-believers in Learning Theory, consider that perhaps learning theory is most useful in its crudest forms. A good example comes in the architecting problem: how do you go about solving a learning problem? I mean this in the broadest sense imaginable:
- Is it a learning problem or not? Many problems are most easily solved via other means such as engineering, because that’s easier, because there is a severe data gathering problem, or because there is so much data that memorization works fine. Learning theory such as statistical bounds and online learning with experts helps substantially here because it provides guidelines about what is possible to learn and what not.
- What type of learning problem is it? Is it a problem where exploration is required or not? Is it a structured learning problem? A multitask learning problem? A cost sensitive learning problem? Are you interested in the median or the mean? Is active learning usable or not? Online or not? Answering these questions correctly can easily make the difference between a successful application and an unsuccessful one. Answering these questions is partly definition checking, and since the answer is often “all of the above”, figuring out which aspect of the problem to address first or next is helpful.
- What is the right learning algorithm to use? Here the relative capacity of a learning algorithm and its computational efficiency are most important. If you have few features and many examples, a nonlinear algorithm with more representational capacity is a good idea. If you have many features and little data, linear representations or even exponentiated gradient style algorithms are important. If you have very large amounts of data, the most scalable algorithms (so far) use a linear representation. If you have little data and few features, a Bayesian approach may be your only option. Learning theory can help in all of the above by quantifying “many”, “little”, “most”, and “few”. How do you deal with the overfitting problem? One thing I realized recently is that the overfitting problem can be a concern even with very large natural datasets, because some examples are naturally more important than others.
Given this description of how traditional mathematics schooling may leave candidates unprepared for a career in data science, one might ask how those who graduate with a math degree can assume roles directly engaged in data science. The reasons and suggestions outlined below offer a possible career path.
The Good News for Math and Physics Majors
First, the academic study of mathematics provides much of the theoretical underpinnings of data science.
Mathematics underlies the study of machine learning, statistics, optimization, data structures, analysis of algorithms, computer architecture, and other important aspects of data science. Knowledge of mathematics (potentially) allows the learner to more quickly grasp each of these fields. For example, learning how principal component analysis works (a math model that can be applied and interpreted by someone without formal mathematical training) will be significantly easier for someone with earlier exposure to linear algebra. On a meta-level, training in mathematics forces students to think carefully and solve hard problems; these skills are valuable in many fields, including data science.
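As an illustration of how directly principal component analysis rests on linear algebra, here is a minimal sketch that finds the first principal component as the dominant eigenvector of the sample covariance matrix, using power iteration. It is written in plain Python to keep the steps visible; in practice one would reach for a numerical linear algebra library. The function names and the toy data are assumptions made for this example.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def first_principal_component(rows, iterations=100):
    """Unit vector along the direction of greatest variance, found by repeatedly
    multiplying a starting vector by the covariance matrix (power iteration)."""
    cov = covariance_matrix(rows)
    dim = len(cov)
    v = [1.0] * dim  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data, so no direction to find
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x, so all variance is along (1, 2).
data = [(0, 0), (1, 2), (2, 4), (3, 6)]
pc1 = first_principal_component(data)
print(pc1)
```

Someone who already knows what an eigenvector is can read this as "the covariance matrix stretches space most along its dominant eigenvector", which is the whole idea behind the first principal component.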
Along the same lines, a number of math courses can later play very important roles in developing a data science toolkit. For example, future work in Bayesian inference can be made possible by prior knowledge of linear algebra, numerical analysis, stochastic processes, measure theory, and mathematical statistics.
If at all possible, minoring in computer science as an undergraduate can and will prove to be very valuable, because computer science provides a solid foundation for building programming skills in the future. Although academic exposure to computer science typically does not provide software engineering skills, one does receive a solid grasp of basic data structures, analysis of algorithms, complexity theory, and a handful of programming languages.
Of all the possible strategies and opportunities for a math or physics student to thrive in AI and Machine Learning, the most important are the power of your curiosity and the support of a team or company that gives you some time (6-12 months) to bone up.
A curiosity about computers and problem solving plays a key role in career success. Most of those with a math and physics background exhibit this zeal. Being eager to learn something, anything, new about computer programming drives programming skill development (it doesn’t matter for what purpose, e.g. building a platform in Hadoop or working with SQL).
Building some experience in Matlab is another sure way to gain exposure, as something as complex as the code for ICA (Independent Component Analysis) can be handled in only a very few lines.
I have met many very successful ML professionals who have developed their skills by self-learning, studying hard and bringing their innate scientific skills to bear on ML algorithms. One person I know who has a strong background in Math and Physics is a team leader at Goldman Sachs, having locked himself away for close to six months only to come out a darn good applied data scientist.
A huge opportunity can arise from having employers who teach and give new team members the opportunity to learn on their own. Given the challenges of finding talented ML expertise, companies are developing a unique pedagogical approach to minting new ML talent by teaching and allowing self-directed learning.
Also, one can’t overvalue participation in the data science community on social media platforms, where you can have the ear of some of data science's brightest minds. Here you can build a peer network that can help you find solutions to problems and perhaps even your next job!
So, what is the wind up?
For those hiring data scientists, we recognize that mathematics as taught might not be the same mathematics we need from our team. Plenty of people with PhDs in mathematics would be unable to define linear regression or Bloom filters. However, at the same time, there is an absolute recognition that math majors are taught to think well and solve hard problems; these skills are never undervalued. Math majors are also experienced in reading and learning math! They may be able to read academic papers and understand difficult (even if new) mathematics more quickly than a computer scientist or social scientist. Given enough practice and training, they will most likely be excellent programmers.
Any pedagogical approach to advancing one’s ML knowledge MUST be based on the adoption of Machine Learning Theory principles. Without the application of learning theory, companies will not engage. We would prefer to have our team members adopt a “first-time right” approach to the application of algorithms, statistical models and data to solve complex problems as we serve our clients.
For those studying math, recognize that the field you love, in its formal sense, may not actually be keeping you away from enjoyable and lucrative careers but may be establishing the baseline for a deep understanding of opportunities that you may have not yet considered.
Do please consider taking computer science classes (e.g. data structures, algorithms, software engineering, machine learning) and statistics classes (e.g. experimental design, data analysis, data mining). For both students and graduates, recognize that your math knowledge becomes very marketable when combined with skills such as programming and machine learning; there is a wealth of good books, MOOCs, and blog posts that can help you learn these things. Moreover, the barrier to entry for getting started with production quality tools has never been lower. Don't let your coursework be the extent of your education. There is so much more to learn. Thanks to T. Hooper, who inspired this piece.
Feel free to reach out if you are interested in joining our team or have any follow-up thoughts/ideas.