Tuesday, 23 September 2014

Mathematics vs. Engineering

Over the past two years, I have devoted a lot of time to understanding the mathematics underlying computer science (with a main focus on pattern recognition). This post discusses some of my thoughts and experience on the relationship between mathematics and its application to real-world (engineering) problems.

A mathematical system typically starts by defining a set of fundamental concepts: some objects (the nouns) and the operations on them. More complicated concepts are then derived from these basic ones by logical deduction. For instance, almost all textbooks on mathematical analysis begin by defining the natural numbers, from which the integers are derived. One can then construct a definition of addition, and define subtraction and multiplication on top of addition. A set of more complicated concepts, such as division, rational numbers, complex numbers and so on, can then be defined using the more fundamental ones.
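As a rough illustration of this layering, here is a minimal Python sketch in the style of the Peano construction. The encoding of numbers as nested tuples and the names ZERO, succ, add, mul, from_int and to_int are my own choices for illustration, not taken from any textbook. Addition is defined using only zero and the successor, and multiplication using only addition.

```python
# Natural numbers as nested tuples: ZERO is (), and succ(n) wraps n one level deeper.
ZERO = ()

def succ(n):
    """Return the successor of n."""
    return (n,)

def add(a, b):
    """Addition from the successor: a + 0 = a, a + succ(b) = succ(a + b)."""
    if b == ZERO:
        return a
    return succ(add(a, b[0]))

def mul(a, b):
    """Multiplication from addition: a * 0 = 0, a * succ(b) = a * b + a."""
    if b == ZERO:
        return ZERO
    return add(mul(a, b[0]), a)

def from_int(k):
    """Helper for readability only: build the encoded number for a Python int k >= 0."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    """Helper for readability only: count how many times succ was applied."""
    count = 0
    while n != ZERO:
        n = n[0]
        count += 1
    return count
```

For example, to_int(add(from_int(2), from_int(3))) evaluates to 5 and to_int(mul(from_int(3), from_int(4))) evaluates to 12, even though mul never mentions anything except add.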

What are the benefits of such an approach? In my opinion, designing a theoretical system this way allows us to derive the properties of the more complicated concepts from the properties of the simple ones. For instance, one can define a rational number as the quotient of two integers, written p/q, where q is not equal to 0. Once we have such a definition, the properties of rational numbers can be derived from the properties of integers. In fact, it is possible to construct a system that starts from the rational numbers instead: we first define rational numbers, and then define an integer in terms of operations on rational numbers. However, it is much harder to define a rational number without the concept of an integer, and a complicated first definition makes all the following definitions complicated. Therefore, even though a mathematical system built in the opposite order can also be logically correct, it is likely to be of little use.
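To make the p/q definition concrete, here is a small Python sketch of rational numbers built only from integer operations. The class name Rational and its layout are my own illustration (Python already ships a complete version as fractions.Fraction). Every operation on a rational reduces to additions and multiplications of integers, so the properties of the new concept rest on the properties of the old one.

```python
from math import gcd

class Rational:
    """A rational number p/q stored as two integers in lowest terms, with q > 0."""

    def __init__(self, p, q):
        if q == 0:
            raise ZeroDivisionError("q must not be equal to 0")
        if q < 0:                 # keep the sign in the numerator
            p, q = -p, -q
        g = gcd(p, q)             # reduce to lowest terms
        self.p, self.q = p // g, q // g

    def __add__(self, other):
        # p1/q1 + p2/q2 = (p1*q2 + p2*q1) / (q1*q2): integer arithmetic only
        return Rational(self.p * other.q + other.p * self.q, self.q * other.q)

    def __mul__(self, other):
        # (p1/q1) * (p2/q2) = (p1*p2) / (q1*q2): integer arithmetic only
        return Rational(self.p * other.p, self.q * other.q)

    def __eq__(self, other):
        # Both sides are in lowest terms, so equality is component-wise.
        return self.p == other.p and self.q == other.q

    def __repr__(self):
        return f"{self.p}/{self.q}"
```

With this sketch, Rational(1, 2) + Rational(1, 3) equals Rational(5, 6), and the fact that addition of rationals is commutative follows directly from the commutativity of integer addition and multiplication.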

Mathematics is an abstraction that approximates the real world, but it can never describe the real world exactly. Albert Einstein made a famous remark on this point: "As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality". For example, in machine learning we typically use Markov chain Monte Carlo sampling to perform inference in graphical models. Suppose we want to design an algorithm that generates random samples from the uniform distribution on [0,1]. Is that possible? No. There are several obstacles to completing this task. One is that we cannot represent every real value between 0 and 1, because a computer uses a finite number of bits to represent a real value. That is, the set of numbers we can store on a computer is finite, while the set of real numbers in [0,1] is infinite. This means that, no matter how we design the algorithm, an ideal random number generator is impossible. The best we can do is to propose an algorithm whose output approximates the target distribution.
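The finiteness argument can be checked directly. The Python sketch below assumes that floats are IEEE 754 64-bit doubles (true on common platforms) and that the interpreter is CPython, whose random.random() is documented to produce floats with 53 bits of precision. The function names are my own. The first function counts how many distinct values a double can take in [0,1]; the second checks that a sample lies on the grid of multiples of 2**-53.

```python
import random
import struct

def count_doubles_in_unit_interval():
    """Count the distinct 64-bit doubles x with 0.0 <= x <= 1.0.

    Non-negative doubles sort in the same order as their 64-bit patterns,
    and 0.0 has pattern 0, so the pattern of 1.0 plus one is the count.
    """
    bits_of_one = struct.unpack("<Q", struct.pack("<d", 1.0))[0]
    return bits_of_one + 1

def on_53_bit_grid(x):
    """Return True if x is an exact integer multiple of 2**-53."""
    # Multiplying a double by a power of two is exact (barring overflow).
    return (x * 2**53).is_integer()

if __name__ == "__main__":
    print(count_doubles_in_unit_interval())   # large, but finite
    print(all(on_53_bit_grid(random.random()) for _ in range(1000)))
```

So the generator can return at most 2**53 different values, and even the full set of representable doubles in [0,1] is a finite number of points out of uncountably many reals.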

Even when a more complicated mathematical model is available that fits the real world more closely (though never 100%), applying a simpler model is often still highly valuable. It is well known that electrical circuit theory is a mathematical approximation of Maxwell’s equations of electromagnetism. An ideal circuit cannot exist in the real world. However, if one starts analysing an electronic circuit from Maxwell’s equations, one is unlikely to solve the problem within a limited time frame. Another example is the hidden Markov model used in speech recognition. It is well known that speech signals do not have only first-order dependencies. However, if one tries to model the higher-order dependencies, the model becomes too complicated to analyse and use.
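One way to see how quickly a higher-order model grows is to count parameters. The sketch below is my own back-of-the-envelope illustration, not a description of any particular speech recogniser: it counts only the free transition probabilities of a Markov chain over k states whose next state depends on the previous n states, and ignores the emission model entirely. Each of the k**n possible histories needs a distribution over k next states, which has k - 1 free values.

```python
def free_transition_parameters(num_states, order):
    """Free parameters in the transition table of an order-n Markov chain.

    There are num_states**order possible histories, and each history needs a
    probability distribution over num_states outcomes (num_states - 1 free values).
    """
    return num_states**order * (num_states - 1)

if __name__ == "__main__":
    # 10 states is an arbitrary, hypothetical size chosen for readability.
    for order in (1, 2, 3):
        print(order, free_transition_parameters(10, order))
    # prints 90, 900 and 9000 for orders 1, 2 and 3
```

The table grows by a factor of k with every extra order, so the amount of data needed to estimate it grows in the same way, which is one reason the first-order assumption is kept even though it is known to be wrong.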

Therefore, to give any problem an in-depth mathematical foundation, we first have to make some assumptions. Without any assumptions, a theory cannot be set up. In machine learning there is a result known as the "no free lunch theorem"; correspondingly, in feature representation there is another known as the "ugly duckling theorem". These theorems share the same idea: every mathematical model contains some assumptions, and the performance of a model depends on how well its assumptions approximate the real world. Please note that they can only approximate the real world; they will never be the real world.

The hardest part of research in pattern recognition is in fact proposing a hypothesis that fits the real world. In my personal experience, one can develop a theory in this way:

  1. make an assumption that fits a subset of real-world problems (a strong assumption);
  2. set up a theory under this assumption;
  3. relax the assumption to a weaker one;
  4. set up a theory that covers a more general situation.

In this way, one can construct a theory step by step and develop it over the long term. However, the process of developing such a theory is likely to be long and painful.

Hometown

Yesterday, I picked up my concert flute, which I hadn't used for a long time, to play a Japanese melody called "The Original Scener...