Mishka --- Bio Sketch

Mishka -- Bio Sketch (AI-related aspects) -- Nov 2012

Abstract. This biographical sketch is written near the end of November 2012. Its main purpose is to serve as a relatively short bio (7-8 printed pages) describing my background and the evolution of my thinking to the extent it might be of interest to the AI community.

At the end of this sketch I am trying to make a case that the current state of software technology is sufficiently advanced to create general AI, and I conclude with a section related to the problem of AI friendliness.

My AI-related interests. Automation of programming, optimization over spaces of programs, machine learning over spaces of programs ("symbolic regression"), algebra and analysis over spaces of programs, neuroscience, spiking neural nets, mathematics of heterogeneous spaces, philosophy of general AI ("AGI"), including research related to the "hard problem of consciousness", problems related to uploading, problems related to technological singularity and friendly AI.

What is not in this biographical sketch. The history of my work in the software industry is out of scope of this bio (see the Resume page on this web site for that). I've also kept personal and political aspects to the minimum.

School

I grew up in Puschino, the Biological Center of the Soviet Academy of Sciences, a small "academic town" 60 miles south of Moscow founded in the early 1960s. The town had a number of research institutes, housing and infrastructure for their staff, a major river (Oka), plenty of forests, an impressive Green Zone (city park), and not much else. The conversations were centered on scientific topics to a large extent.

I graduated from the Mathematical class of Moscow High School number 7. By the time we graduated we knew the theory of real numbers as equivalence classes of fundamental sequences of rationals, polynomials with coefficients in finite fields, and a good chunk of the first chapter of the Kolmogorov-Fomin textbook among other things. We were also able to program in machine code, Algol-60, and Fortran.

College

In those unfortunate times (early 1980s) Moscow State University did its best not to admit students with any traces of Jewish blood. I went to the Applied Math program at Moscow Institute of Railroad Engineers, one of the two top math programs in Moscow among those which did not discriminate (the other one was at the Oil and Gas Institute).

Around the same time a group of brave Moscow scientists created an unofficial Independent University (a "moral predecessor" of the modern Moscow Independent University). All the teaching was pro bono, and students attended the University in the evening, after classes in their "official" colleges. For me the most useful of all classes was the theory of metric spaces taught under the heading of functional analysis by Alexander Shen. Another great class was linear algebra - but we were lucky to have an equally great linear algebra course in my college as well (the Oil and Gas Institute was less lucky in this sense, so teaching linear algebra at the Independent University was very important).

I attended the Independent University only during the last year of its existence, before it was closed by a KGB crackdown. Later I was unofficially attending various courses and seminars at the Moscow State University after the classes in my college.

While the Applied Math program at my college was among the top ones, its mathematical offerings were uneven. Some math classes were very good, but others were awful, and there was a distinct lack of advanced courses compared to the Moscow State University. At the same time, there was too much computer programming and computer-related nonsense, pretty uneven again in terms of quality of the classes, although some were taught by great professors.

The schedule was mandatory; there was no choice of courses at all. On the first day of a semester, one would find a fixed schedule of courses for one's group (25-30 people who studied together for the entire 5 years) posted on the wall. Even the best computer classes were quite obsolete compared to the world-wide state of the art. We did not hear anything at all about Lisp, C, or Unix (that was 1981-1986). The main working computers were half-broken bootleg copies of IBM 360 mainframes (those bootleg copies were proudly called the "Unified System of Electronic Computers", and they spent more time being rebooted again and again and otherwise repaired than working). This sad state of affairs was the result of the decision to surrender the race in computer technology made by the Soviet leadership in the early 1970s. This decision was aggravated by excessive respect for IBM technology and by the mistaken assumption that copying IBM would be enough to keep the gap with the West reasonably small.

The Dream

This set-up together with my inclinations towards more math and less computers led naturally to the dream to achieve maximal possible automation of the process of creating software and to try to do that by mathematical means. The reasoning went as follows: there existed a variety of math methods for effectively solving optimization problems over spaces of functions (e.g. variational problems). This was made possible by the rich mathematical structure existing on the spaces of mathematical functions.

So the central idea was to try to build as much of the nice, familiar standard math as possible over spaces of programs in the hope that some of the methods for solving optimization problems over spaces of functions would then be transferable to solving optimization problems over spaces of programs (these days the convenient modern terminology is to call this type of problem "symbolic regression").

I started to look for mathematical structures over spaces of programs and to ask various people where such structures might be found. Alexander Shen pointed me to the recent paper by Dana Scott, "Domains for denotational semantics", and that was when I found the main topic of the studies I have been conducting, with some breaks here and there, since 1984.

There was an informal working group, or a seminar, consisting of two people, me and Alexander Saevsky, and we studied this set of issues together for a few years (till my emigration to the United States in 1989). During that period I was studying domains as spaces of theories, as defined in the "Domains for denotational semantics" paper, and was obtaining results about those spaces, answering questions like what is a subdomain (subtype) of a given domain, how to build the domain of all subdomains (the type of all subtypes) of a given domain, etc.

Graduate School

However, what I really wanted was not to focus on those domains as spaces of logical theories, but to build a reasonably good chunk of analysis on them.

In 1992 I entered the PhD program in Computer Science at Brandeis in the hope of achieving progress along these lines. It was extremely useful to attend classes and to help teach some of them. It also felt great to be a part of the vibrant academic and student community functioning in a modern liberal environment.

From the viewpoint of advancing my goals things did not look so great at first. We were taught a Lisp (Scheme), but there were not many hints on how its self-modification facilities could be used to organize meaningful program evolution. We were taught various AI methods including neural nets, but those neural nets had horrible stability-plasticity properties (they completely forgot old things when learning new things, so they needed to be reminded of all the old things together with the new things), and they did not look very helpful for advancing my goals.

In 1994 or 1995 I saw an ad on our seminar board announcing a talk at Wesleyan University in Connecticut: Bob Flagg of the University of Southern Maine was going to describe generalized metric structures over Scott domains. So I drove there, met Bob, and listened to his talk. The generalized metric structures were quasi-metrics, and they originally came from a 1987 paper by Ralph Kopperman, who used them to provide generalized metrizations for all topologies regardless of their separation properties.

In 1996 Josh Scott (no relation to Dana Scott :) ) and I were trying to figure out how to actually compute the values of those quasi-metrics, in order to compute distances between programs as distances between their meanings in denotational semantics. It turned out that this could not be done, and the deep reason was that the quasi-metrics in question were monotonic with respect to one variable, but anti-monotonic with respect to the other variable, and hence not Scott continuous.

We were very disappointed by this at first, but then we discovered that the culprit is the axiom q(x,x)=0, and that there are no Scott continuous distance functions expressing the Scott topology and obeying q(x,x)=0. However, if one takes a distance function with non-zero p(x,x), one can even retain the symmetry, p(x,y)=p(y,x), and still have a Scott continuous (and computable) p which provides a generalized metrization of the Scott topology in question.

These generalized distances should be interpreted as upper bounds of "underlying ideal true distances, which don't have to be made explicit". One can also add lower bounds obtaining computable interval-valued metrics. We called them relaxed metrics.

Bob Flagg advised us to compare relaxed metrics to partial metrics recently introduced by Steve Matthews. It turned out that the upper bound part of our distances usually satisfied all the axioms of partial metrics.
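As a minimal sketch of the idea, one can take closed real intervals as approximate numbers. The function names and the sample data below are hypothetical, and the specific formulas are a standard textbook example of a partial metric on intervals, not necessarily the construction used in the papers mentioned here. The upper bound is the length of the smallest interval covering both arguments (so the self-distance p(x,x) is the length of x, which is non-zero for a non-degenerate interval), and the lower bound is the gap between the intervals.

```python
from itertools import product

# An interval is a pair (lo, hi) with lo <= hi; it approximates an unknown real number.

def upper(x, y):
    """Upper bound: length of the smallest interval covering both x and y."""
    return max(x[1], y[1]) - min(x[0], y[0])

def lower(x, y):
    """Lower bound: the gap between x and y (0 if they overlap)."""
    return max(0.0, x[0] - y[1], y[0] - x[1])

def relaxed(x, y):
    """Interval-valued distance (lower bound, upper bound)."""
    return (lower(x, y), upper(x, y))

def is_partial_metric(points, p):
    """Check the partial metric axioms on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if p(x, x) > p(x, y):                        # small self-distances
            return False
        if p(x, y) != p(y, x):                       # symmetry
            return False
        if p(x, z) > p(x, y) + p(y, z) - p(y, y):    # modified triangle inequality
            return False
        if x != y and p(x, x) == p(x, y) == p(y, y):  # separation
            return False
    return True

# Hypothetical sample data; (3.0, 3.0) and (1.0, 1.0) are exact numbers.
SAMPLES = [(0.0, 1.0), (0.5, 2.0), (3.0, 3.0), (0.0, 4.0), (1.0, 1.0)]

if __name__ == "__main__":
    print(is_partial_metric(SAMPLES, upper))
    print(relaxed((0.0, 1.0), (3.0, 3.0)))
```

On exact numbers (degenerate intervals) the two bounds coincide with the ordinary distance, which matches the reading of these distances as bounds on an underlying ideal distance.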

In 1997 and 1998 I did a series of joint papers with Svetlana Shorina from Moscow. These papers covered measure theory on Scott domains together with the discovery of co-continuous continuous valuations on the domain of Scott open sets and a construction to build those valuations. They also covered a natural way to build partial and relaxed metrics from such valuations, and they studied some finer points related to lower bounds, negative information, and relations of those to the theory of tolerances.

My PhD thesis was based on the results described above (see "My papers in computer science" section of this Web site). However, there was not enough math to advance my original goals. First of all, linear algebra on those spaces was fairly defective, e.g. the addition only formed a commutative monoid with a weak quasi-inverse instead of a group (this is just a property of interval numbers) and at that time we did not see a way around this. Another problem was how to go from constructions in domain theory back to constructions over programs in an effective way (that is, not via recursive enumeration which is nice theoretically, but horrible from the run-time complexity viewpoint).
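The remark about interval numbers can be made concrete with a small sketch. The helper names below are hypothetical, and reading "weak quasi-inverse" as "x + (-x) is an interval around zero rather than zero itself" is an assumption made for illustration.

```python
# An interval number is a pair (lo, hi) with lo <= hi.

ZERO = (0.0, 0.0)

def add(x, y):
    """Interval addition: add the endpoints."""
    return (x[0] + y[0], x[1] + y[1])

def neg(x):
    """Interval negation: negate and swap the endpoints."""
    return (-x[1], -x[0])

if __name__ == "__main__":
    x = (1.0, 3.0)
    print(add(x, ZERO))    # ZERO is a neutral element
    print(add(x, neg(x)))  # an interval around 0, not ZERO itself
```

Addition is commutative and associative with neutral element ZERO, but x + (-x) equals ZERO only when x is a single exact number, so there are no true inverses and no group structure.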

I did not see a way to make further progress here, so for a few years I switched to other topics, including neuroscience and spiking neural nets, in the hope that they would have capabilities lacking in classical neural nets: nice stability-plasticity properties, holographic memory, explanation of subjective phenomenology. However, a few years later Steve Matthews and Ralph Kopperman "drafted" me back to the studies of partial metrics and related topics (see the section "My mathematical activity since 2005" below).

Raves, philosophy, technological singularity, theory of consciousness, phenomenological models, neuroscience, spiking neural nets

In the late 1990s Sasha Chislenko introduced me to futurist philosophy and also to raves. This has profoundly affected my life and worldview, and I am extremely thankful to him for this. Among other things he introduced me to the notion of technological singularity.

The events of that period form too big a topic to cover fully in a short essay. So I am just going to list some of the main ideas and research points I was involved with during that time and after. One should also note that the rave community is very active politically, and I was quite involved in its political discourse and activity for a while. I'll omit this aspect from the present essay as well.

I always liked phenomenological models, subjective introspection, and the mode of thinking which is borderline between science and philosophy, and I liked mixing all of those together. Some of those explorations are posted in either the Essays section or in the "Science of Consciousness"/"My Writings on the Science of Consciousness" section of this web site, others remained as unpublished sketches.

The first of my studies of this kind was done in Russia and was a simple phenomenological model of subjective time accelerating as a person ages (the well-known phenomenon of "days becoming shorter"). That particular model links the "internal time" (subjective time) and physical time via a simple differential equation, and obtains the internal age of a person as the square root of his/her physical age (this holds to the extent to which we believe the assumptions of the model). Something as fundamental and simple as this is bound to be rediscovered multiple times, and I have seen at least one essay with the same result after I wrote mine.
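The equation itself is not reproduced in this sketch. One simple equation that yields the square-root law is shown below; it is an illustrative assumption, not necessarily the equation of the original essay. Here t is physical age, \tau(t) is internal age, and k is a hypothetical positive constant. The assumption is that subjective time flows at a rate inversely proportional to the internal age already accumulated.

```latex
% Assumed model (illustrative only): the rate of subjective time is
% inversely proportional to the accumulated internal age.
\begin{align*}
  \frac{d\tau}{dt} &= \frac{k}{\tau}, \qquad \tau(0) = 0, \\
  \tau \, d\tau &= k \, dt, \\
  \frac{\tau^2}{2} &= k t, \\
  \tau(t) &= \sqrt{2 k t}.
\end{align*}
% With units chosen so that 2k = 1, this gives \tau(t) = \sqrt{t}.
```

Under this assumption the subjective length of a day, d\tau/dt, decreases in proportion to 1/\sqrt{t}, which matches the phenomenon of "days becoming shorter".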

I also liked to use group-selection style of evolutionary argument. Another essay from my Russian period argues that having a certain percentage of homosexual individuals increases the fitness of the human population because the division of labor (and later professional specialization) in the human population was quite often stratified along the gender lines. Thus the presence of homosexual members in the population catalyzes "informational procreation", i.e. the transfer of professionally and culturally important information to the next generation.

A number of my explorations were based on the results of subjective introspection. One (so far unpublished) sketch was trying to model the realm of various "alternative realities" in such a fashion that the "consensus reality" would have an equal status rather than preferred status. The model represented the universe of all realities as a "fibration" over a base space (a space of "the inner states of the person"), with fibers corresponding to realities. I am sure this or something quite similar was reinvented multiple times as well. If one conjectures that the points of a base space have wave nature, this opens the possibility of resonances between inner states of different people, and this might explain various phenomena of synchronization of the states of consciousness which are subjectively observed and reported.

Another obvious topic of interest in connection with all this is various attempts to understand the "Hard Problem of Consciousness", and also issues related to uploading and preservation or non-preservation of the subjective realm by various methods of uploading. In particular, I prefer to treat various conflicting approaches to the "Hard Problem" as "philosophical coordinate systems" rather than the matter of "absolute truth". I'd like to think that one should be able to switch between various coordinate systems of this kind at one's convenience.

Joining the multitude of texts analyzing the "Penrose argument against computational intelligence" I also tried to see whether his argument makes sense. It seems that the main gap in the Penrose argument is that he is viewing computers (and mathematicians) as closed systems, while in reality both computers and mathematicians can use the whole world around them as an "oracle". At the same time his suggestion that there might be a connection between fundamental physics and consciousness was thought-provoking, although my take on it was different from the take of Penrose. My position was that it was not true that we needed quantum gravity for a decent theory of consciousness (although we might eventually need it at some later stage when the theory of consciousness becomes more advanced). I thought it more likely that we might need a sufficiently general theory of consciousness to obtain a physically meaningful theory of quantum gravity.

I also tried to look at the possible post-singularity dynamics (while understanding how questionable any attempt of this kind has to be). Another essay, "Singularity is More Radical Then We Think", suggests that we are most likely to end up in one of the two major attractor areas, which I called the "power scenario" and "ethical scenario". Roughly speaking, they correspond to almost certain annihilation and to the "maximally benign scenario".

At that time the focus of my scientific studies switched from the mathematics of Scott domains to neuroscience for a few years. I hoped to obtain a better understanding of the functioning of human consciousness and of our introspective observations. I was also searching for clues that would enable us to make general AI. In the process I became reasonably proficient in neuroscience by the mid-to-late 2000s. I was especially focusing on spiking neural nets for a number of reasons. First, the synchronization of spikes is thought to be one of the most important correlates of consciousness. Second, this synchronization is responsible for brain waves and might enable holographic memory. Third, we should expect that neural nets based on spikes would have much better stability-plasticity properties than the more traditional non-spiking neural nets, and that explicit spikes are the key to achieving performance comparable to biological neural nets in this sense.

My mathematical activity since 2005

In 2005 Steve Matthews and Ralph Kopperman "drafted" me back to the studies of partial metrics and related topics. In 2006 we discovered the existence of "equivalence with a dual flavor" between partial metrics and fuzzy equalities. There were many arguments about whether it is a true duality or merely an equivalence up to the choice of dual notation. In any case, this is a very fruitful correspondence which allows one to transfer methods and results between the two fields and to study situations where metric and logical considerations interact. This topic is currently under active development (Nov 2012). The "My papers in computer science" section of this Web site has a subsection dedicated to this subject.

One interesting aspect is the relationship to categories and sheaves. My preference during the earlier period of my mathematical activity was to stay with the pre-categorical formalism in the spirit of Kolmogorov and Bourbaki. I felt that categories were not very fruitful in such areas of applied math as variational problems and other optimization problems, numerical methods, differential equations, etc., and hence using them would be counterproductive to my goals, which were to transfer the methods of applied math to the spaces of programs.

However, the theory of fuzzy equalities has deep connections to categories and sheaves, and this pushed me towards doing more math within the categorical framework. One interesting thing which has become clear from this development is that we now know the mechanism for categorification of partial metrics. There is a well-known correspondence between quasi-metrics and enriched categories introduced by William Lawvere in 1973. It relies on the q(x,x)=0 axiom and hence is not applicable to partial metrics. It turned out that there is an appropriate formalism of "typed enrichment" for that (and for handling heterogeneous spaces in general) which is gaining popularity recently in the community studying fuzzy equalities (the technical name for it is "enrichment in quantaloids"). This formalism is developed for fuzzy equalities, and hence is applicable to partial metrics as well. The formalism of "typed enrichment" might eventually have applicability to a wide range of "heterogeneous spaces".
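For reference, the obstacle described above can be written out. The block below states the standard reading of a quasi-metric space as an enriched category and, for comparison, the partial metric axioms of Matthews (the separation axiom is omitted for brevity). The closing comment about self-distances playing the role of "types" is a hedged summary of the typed-enrichment idea, not a precise statement of the quantaloid formalism.

```latex
% A quasi-metric space (X, q) read as a category enriched in the
% quantale ([0, \infty], \geq, +, 0):
\begin{align*}
  0 &\geq q(x,x)
    && \text{(identity, i.e. } q(x,x) = 0 \text{)} \\
  q(x,y) + q(y,z) &\geq q(x,z)
    && \text{(composition, i.e. the triangle inequality)}
\end{align*}
% Partial metric axioms, where p(x,x) may be non-zero:
\begin{align*}
  p(x,x) &\leq p(x,y) \\
  p(x,y) &= p(y,x) \\
  p(x,z) + p(y,y) &\leq p(x,y) + p(y,z)
\end{align*}
% The identity axiom 0 \geq p(x,x) fails whenever p(x,x) > 0, so the
% correspondence above does not apply directly. Roughly speaking, in the
% typed setting the self-distance p(x,x) is treated as the "type" of x,
% and the correction term p(y,y) appears in the composition law.
```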

Rapid changes in software technology

During the last decade software engineering technology underwent drastic changes. In particular, the ability to rapidly prototype and to assemble multilingual systems from ready components written in different programming languages increased greatly, leading to much higher productivity of small teams of engineers. The rise of Python and the Python community is especially notable in this respect.

A variety of open source machine learning tools is now available, making it much easier to experiment with various machine learning methods in applied situations. The rise of R, the R community, and its suite of machine learning tools should be especially noted here. Machine learning is no longer an esoteric discipline accessible only to a relatively small group of highly qualified professionals.

We see spectacular advances in a wide range of narrow AI technologies, and important advances in the technologies for manipulating formal texts (especially, in the community working with automated theorem provers).

Based on the combination of technological advances of the last decade, it seems that from the technical viewpoint we might be on the brink of developing general AI ("AGI" -- artificial general intelligence) which would be capable of doing its own research and software engineering, and hence we might be on the brink of an AI-induced technological singularity. My estimate is that there is a good chance of crossing this threshold in the very near future (I would not be surprised if this happens today, next week, next month, next year, or a couple of years from now).

Parallel approaches to AI friendliness

There is a wide range of approaches to the issue of AI friendliness and not much consensus among researchers in this field. This is quite appropriate given how unpredictable the situation in the field of software is, and how little, if anything, we can understand about the post-singularity dynamics.

At one pole there are people who hope for and work towards a formal framework for developing friendly AI and formal guarantees of friendliness. This seems to be a highly attractive approach, despite rather widespread doubts regarding its feasibility, but there is a strong chance that it will not be competitive enough compared to the other, more direct routes to general AI.

Motivated by rather slow progress in formal methods for AI friendliness, alternative approaches look at various possible flavors of AI and compare their chances of being friendly. There is a wide diversity of opinion here as well.

Then there is another pole - the opposite one to the pole of formal guarantees - which is somewhat underrepresented. It is an approach which might seem strange at first glance. Can a text (whether in a natural language or in some formal language), or another chunk of information, which is not a part of the "winning AGI system" itself, but merely a small part of its input (e.g. downloaded by the AGI in question from the Web) make a difference where friendliness is concerned?

I want to add a few words about this underrepresented pole, with the hope to encourage people to think more about it. First of all, this approach relies on an assumption that the AGI in question will have reasonable capabilities to understand, interpret, and use human-created texts or other artifacts. If that is not the case, our chances for friendliness will probably be quite low. However, the ability to meaningfully handle human-created texts and other artifacts seems to provide a significant initial competitive advantage for AGIs. This inspires hope that such an ability will be present.

We note that from an "ordinary human point of view" this approach makes more sense than enforcement of formal constraints. It is natural to try to talk to your child (who in this case is supposed to be much smarter than you), and this appears somehow "more decent" than manipulating the situation in order to constrain the behavior of your (exceedingly smart) child.

We do see an occasional "letter to a future general AI" about ethical issues, and any such letter is an attempt at this approach.

This is also not a bad approach from a computer science viewpoint: data and programs are the same thing, hence any input to a program should be viewed as a chunk of code in the language defined by the program in question. But I have not seen any attempts to analyze the situation for different possible architectures of AI, or to take into account the vast sea of inputs of various kinds (including the body of all texts on friendly AI!), or anything like that. It is such a "metalevel analysis" of this approach that seems to be completely missing.

(Added in December 2012) Thinking about talks at AGI-Impacts 2012 conference

The current research and discussion on Friendly AI is very abstract. In particular, we are not really using software tools to help us understand the issues involved in Friendly AI, and we are not developing any detailed plans to do so. And this is the area where we really need computers to help us think.

We also need to think more about how we might arrive at some consensus regarding ethics not so much among us, but between us and the developing AGI, and we need to focus more on how we are going to develop such ethics jointly with AGI, so that it is as much its creation as it is ours.

The idea of trying to control or manipulate an entity which is much smarter than a human does not seem ethical, feasible, or wise. What we might try to aim for is a respectful interaction.

Mishka --- November 2012
