On 25 April 2018, the European Commission increased its investment in AI research to €1.5 billion for the period 2018-2020 under the Horizon 2020 research and innovation program. This investment is expected to trigger an additional €2.5 billion of funding from existing public-private partnerships, for example on big data and robotics. It will support the development of AI in key sectors, from transport to health; it will connect and strengthen AI research centers across Europe and encourage testing and experimentation. The Commission will also support the development of an "AI-on-demand platform" that will give all users in the EU access to relevant AI resources.
Additionally, the European Fund for Strategic Investments will be mobilized to give companies and start-ups extra support to invest in AI, with the aim of mobilizing more than €500 million in total investments by 2020 across a range of key sectors.
With the dawn of artificial intelligence, many jobs will be created, but others will disappear and most will be transformed. This is why the Commission is encouraging Member States to modernize their education and training systems and to support labor-market transitions, building on the European Pillar of Social Rights.
The annus mirabilis of deep learning was 2012, when Google was able to coax millions of users into crowdsourcing labeled images. Google also had tens of thousands of servers that were not very busy at night. Most of all, however, it had an incredible PR department that was able to create a meme.
- Software-defined storage (SDS) on commodity hardware made it very inexpensive to store large amounts of data. When the cloud is used for storage, there are no capital expenditures at all; storage becomes an operating expense.
- Ordinary citizens became willing to contribute vast amounts of data in exchange for free search, email, and social networking services. They were also willing to label their data for free, creating substantial ground-truth corpora that can be used as training sets.
- High-frequency trading created a market for general-purpose GPU (GPGPU) hardware, resulting in much lower prices. In addition, new workstation architectures made it possible to break the impasse caused by the end of Moore's law.
- Machine-learning packages on CRAN made it easy to experiment in R, while Torch and Weka made it easy to write applications capable of processing very large datasets (a minimal sketch of how little code such an experiment now takes follows this list).
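To illustrate how low the barrier to experimentation became, here is a minimal sketch of a complete ML experiment. The list above names R's CRAN packages, Torch, and Weka; since this document contains no code of its own, the sketch uses Python with scikit-learn as an assumed stand-in, not a tool the text itself mentions.

```python
# A complete, minimal ML experiment: load a labeled dataset, train a
# classifier, and measure accuracy on held-out data.
# scikit-learn is an assumed stand-in for the CRAN/Torch/Weka packages
# named in the list above.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # a small labeled "ground truth" corpus
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # train on the labeled examples
print(accuracy_score(y_test, model.predict(X_test)))
```

That a full train-and-evaluate loop fits in a dozen lines is precisely what the packaging of ML algorithms into reusable libraries made possible.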
Many companies are setting up analytics departments and are trying to hire specialists in this field. However, there is great confusion about what the new careers are and how they differ. Often, even the companies posting the job openings do not understand the differences.
Recently, at Sunnyvale City Hall, two representatives from LinkedIn and one representative each from UCSC Silicon Valley Extension and California Science and Technology University participated in a panel organized by NOVA that helped dispel the confusion.
Essentially, there are three professions: data analyst, data engineer, and data scientist.
- Data analysts tend to be more entry-level and do not necessarily need programming or domain knowledge: they visualize, organize, and summarize data, often using SQL. Essentially, they deal with data "as is" (see the SQL sketch after this list).
- Data engineers do what is called data preparation, data wrangling, or data munging. They pull data from multiple, distributed (and often unstructured) data sources and get it ready for data scientists to interpret (see the Spark sketch after this list). They need a computer science background and should be skilled with programming, Hadoop, MapReduce, MySQL, and Spark.
- Data scientists turn the munged data into actionable insights, after making sure the analysis is rigorous and repeatable. They usually have a Ph.D. The ability to communicate is vital! They must have a core understanding of the business, be able to show why the data matters and how it can advance business goals, and communicate this to business partners. They need to convince decision makers, usually at the executive level.
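To make the data analyst's "as is" work concrete, here is a minimal sketch using SQL through Python's built-in sqlite3 module. The orders table and its region and amount columns are hypothetical examples, not data mentioned in the text.

```python
# Minimal sketch of a data analyst's task: summarizing data "as is" with SQL.
# The orders table and its contents are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("West", 120.0), ("West", 80.0), ("East", 200.0)])

# A typical summary: order count, total, and average revenue per region.
for region, n, total, avg in con.execute("""
        SELECT region, COUNT(*), SUM(amount), AVG(amount)
        FROM orders
        GROUP BY region
        ORDER BY SUM(amount) DESC"""):
    print(region, n, total, avg)
```

Nothing here transforms the underlying data; the analyst queries and summarizes it exactly as stored.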
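The data engineer's role is also easiest to see in code. Since Spark is among the tools named above, here is a minimal PySpark sketch of data preparation; the file paths, column names, and cleaning rules are hypothetical, and a real pipeline would be considerably more involved.

```python
# Minimal sketch of data preparation ("munging") with PySpark.
# Paths, schemas, and cleaning rules are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("munging-sketch").getOrCreate()

# Pull data from multiple, differently structured sources...
clicks = spark.read.json("hdfs:///logs/clicks/*.json")   # semi-structured logs
users = spark.read.option("header", True).csv("hdfs:///dw/users.csv")

# ...then clean and join it into a shape a data scientist can use.
prepared = (clicks
            .dropna(subset=["user_id"])              # drop malformed rows
            .withColumn("ts", F.to_timestamp("ts"))  # normalize types
            .join(users, on="user_id", how="left")   # enrich with user data
            .select("user_id", "ts", "url", "country"))

prepared.write.mode("overwrite").parquet("hdfs:///prepared/clicks")
```

The output is a cleaned, typed, columnar dataset: exactly the "munged data" that the data scientist then turns into insights.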