Wednesday, January 11, 2023

Better article PDF

When you publish an article, you want it to be discoverable by other researchers. This requires that it be indexed. Indexing systems need metadata, which they usually extract from the PDF files in digital libraries or other document repositories.

Ideally, a manuscript submission system collects the necessary metadata from the corresponding author when the final manuscript is submitted. Since author names are not unique, a good system will require each author to log into the system using their ORCID credentials, which also ensures that all coauthors know they are listed as authors.

Some metadata is not known to the submitter, for example, the DOI, volume and issue number, page, and publication date. Such metadata is added by the managing editor. In some journals, the managing editor creates all metadata, but it is safer when the publication system generates the metadata algorithmically from the data provided by the corresponding author.

Some publishers do not collect verified ORCID iDs from the authors. In that case, you should put each author's identifier in the author declaration. When you create your manuscript using LaTeX, you can use the package orcidlink. In the preamble add

\usepackage{orcidlink}

and in the author declaration add your ORCID

\author{John Doe\,\orcidlink{nnnn-nnnn-nnnn-nnnn}}

Some publishers just publish in their digital libraries the final PDF they receive. If you just submit the default format file you generated, it will not have any metadata and your article will not be discoverable. It is safer to submit the article in the PDF/A archival format with your metadata included as XMP, so it can be extracted by the web crawlers of the indexing organizations.

You could accomplish this using the full version of Acrobat, but as I wrote above, manual operations are not recommended and you should let the LaTeX typesetter do it. Fortunately, River Valley Technologies has contributed a package called pdfx to automate this step. You already have this package with the standard LaTeX installation.

After importing this package, you also import hyperref. Then you write an xmpdata file declaring the metadata. You can find all the information in the exhaustive help file. The import order is important; for example, in the preamble you could declare

\usepackage[a-1b]{pdfx}

\usepackage{hyperref}
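As an illustration, a minimal xmpdata file might look like the sketch below. It assumes a main file called paper.tex, so that pdfx looks for paper.xmpdata in the same directory. All values are hypothetical placeholders, and the help file lists the full set of supported fields.

```latex
% paper.xmpdata -- hypothetical metadata for paper.tex
% Multiple authors and keywords are separated with \sep.
\Title{A Hypothetical Article Title}
\Author{John Doe\sep Jane Roe}
\Language{en-US}
\Keywords{metadata\sep PDF/A\sep XMP}
\Subject{One-sentence summary of the article}
\Publisher{Hypothetical Learned Society}
% Publication details, usually known only to the managing editor:
\Journaltitle{Hypothetical Journal}
\Volume{1}
\Issue{1}
\Doi{10.0000/placeholder}
```

Fields you do not know can simply be omitted.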

If you are an editor and maintain a template for your journal, you can also embed the xmpdata file at the top of the preamble. The help file explains how to do that.
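One possible way to do this, sketched here under the assumption of a LaTeX release from 2019-10-01 or later (needed for the overwrite option), is the filecontents* environment, which writes the metadata file each time the document is compiled. The title, author, and keywords are hypothetical placeholders.

```latex
% Write \jobname.xmpdata on every run, before pdfx is loaded.
% Assumption: LaTeX 2019-10-01 or later for the [overwrite] option.
\begin{filecontents*}[overwrite]{\jobname.xmpdata}
\Title{A Hypothetical Article Title}
\Author{John Doe}
\Keywords{metadata\sep PDF/A\sep XMP}
\end{filecontents*}

\documentclass{article}
\usepackage[a-1b]{pdfx}  % reads \jobname.xmpdata
\usepackage{hyperref}

\title{A Hypothetical Article Title}
\author{John Doe}

\begin{document}
\maketitle
Body text.
\end{document}
```

Keeping the metadata in the template means the authors edit a single file and cannot forget to submit the separate xmpdata file.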

You can check the PDF format and the metadata with the free Acrobat Reader. I have generated the following two documents using the method described above:

article

slides



Monday, June 21, 2021

Industrial Research

Previous related posts: Career Networking, Scholarly Publications.

With the industrial revolution, large companies introduced central research laboratories to accelerate the invention of new products. After the Sputnik crisis, in the USA these laboratories became very prestigious, under the influence of people like Joseph Licklider. The labs flourished, receiving government contracts paid under the cost-plus model, where the price paid was the cost of producing the technology plus a margin for the company's profit.

After receiving their Ph.D., young researchers would join learned societies in their field and attend their annual meetings to stay current in the field and to network. They would join a prestigious central lab with the plan to work there until retirement. With the pressure for excellence, there were always personal frictions, but they were offset by the companionship and respect researchers had for each other.

In research, the hierarchies are relatively flat. The organizations were very flexible, and researchers moved around in the lab and formed robust networks. They would attend the annual meetings of their societies and present their progress: their network was not only dense but also vast. By subscribing to the same journals, there was a shared knowledge of the state of the art.

At the end of the Cold War, the central labs rapidly disappeared. The government no longer had the need to outbrain the Russians, and companies put more emphasis on quarterly results than on long-term success, so executives had less understanding for research. The universities adapted and started programs to teach students to become entrepreneurs. All this contributed to the disappearance of the central labs in a very short time.

At first, one would think that industrial research has completely disappeared. However, this is not the case, and today there are more researchers than during the Cold War. But the research infrastructure has radically changed. For example, professors no longer spend occasional time in industrial central labs as visiting scientists, but have more secure part-time positions in large companies, with job titles like Fellow.

Typically, a larger technology company has a VP of research with numerous researchers. The latter no longer sit in a central location but are dispersed throughout the engineering divisions. On one side, this allows them to glean important hard problems with which the engineers are grappling and get inspired for new technologies. On the other side, when engineers get stuck with a problem for which there is no clear solution on Stack Overflow, they can informally ask the local researcher for a lead.

The researchers are not embedded in the development organizations. They report to a remote manager, and they have long-term goals instead of the daily Jira tasks. They do not have short-term deadlines, but at the end of the day they cannot turn off their brains until the next morning. Today's researchers are much lonelier than the researchers in the central labs of yore.

Today's researchers are less dependent on learned societies and tend to network using LinkedIn. The annual meetings have disappeared and have been replaced by topical meetings. Instead of a steady shower of scientific articles in journals, researchers today do searches on Google Scholar for the knowledge they need at the moment. One of the corollaries is that today papers should no longer have memorable titles, but the titles have to have the important terms at the beginning, so they show up at the top in searches.

This requires learned societies to adapt. Researchers are isolated and change employers more often. Flexible regular meetups are maybe more important than rigid conferences. While in the past societies could solicit sponsorships from central labs, today the researchers are decentralized and there is no longer a budget item for sponsorships. Despite this, anecdotally there is more money for essential expenses. While in the past page charges were a barrier, today they are no longer important and researchers publish in journals with a high impact factor, regardless of cost.

There is another important factor in the lives of researchers. Since they are now dispersed, it is more difficult for them to advance in their careers. The Anglo-Saxon countries always had the concept of mentorship instead of the more formal master-apprentice system of other western countries. Other cultures are now copying the mentorship system to help researchers succeed in life. This is a new role to which learned societies have to pay attention.

As I mentioned, researchers do searches for related art on the web instead of staying current by subscribing to journals. Search engines are based on n-grams and do not know the history of science. Therefore, it is easy to get the related art in the introduction wrong, and especially the references are often wrong. For example, the CIELAB color model operator was not introduced at the INTERACT-2010 conference, but much earlier and no later than 1976.

Thus, editors and reviewers have a much harder job verifying the introduction and the references of a manuscript. This and the current career path of researchers (see post on Career Networking) prompted, for example, the SPIE about 15 years ago to limit the terms of the editors in their journals. Learned societies must give high consideration to mentorship for their journals and conferences. Conference chairs and editors must have a good number of young researchers who work closely with the old hands to learn the ropes. Fortunately this is easy because, as noted earlier, today's researchers are lonely and long for mentors.



Sunday, June 20, 2021

Scholarly Publications

The evolution of approaches to employment was discussed here.

Fifty years ago, a professor in the mathematics department at the Swiss Federal Institute of Technology (ETH) was expected to publish a substantial paper every two years, or at least one every four years. The paper was top quality and presented a true advance in the field. Once a year, the professor would also present at a conference. When the professor had come up with a lecture presenting a new field or a novel approach to an existing field, their Ph.D. students would sit in the front row of the auditorium and take extensive notes, which would be the basis for a new book.

Students were not expected to write papers. They would write reports and give seminar presentations to get the required credits.

The experimental physics department was quite different: the American publish-or-perish way of life had taken over. By twenty years ago, the mathematicians were also living by the publish-or-perish paradigm. However, something else changed: the students would submit their reports to scholarly journals.

The flood of submissions required the editorial boards to change their criteria for reviewing manuscripts. Moreover, with the introduction thirty years ago of the World Wide Web by the European Organization for Nuclear Research (CERN) in Geneva, a huge number of publications became easily available, making the editor's job of separating the wheat from the chaff more urgent; otherwise, researchers just waste their time reading useless articles.

Certain measures were easy to implement, like eliminating unintelligible manuscripts, plagiarisms, and nothing-wrong-papers (papers well written but not advancing the field). More editorial work was required for papers where the authors did some valid research but did not understand it well themselves. In principle, the authors' supervisor would be responsible, but for the past twenty years supervisors have been increasingly remiss in this duty. The other editorial task is to identify salami papers and reject them; the term refers to the result of a project being sliced up and submitted as a series of papers. While this is OK for conference papers, it is not for journal papers.

A form of plagiarism was also to submit a paper to multiple journals using variations of the authors' names. This was solved by requiring a persistent digital identifier (ORCID iD) and using a Digital Object Identifier (DOI) for every reference.

While fifty years ago papers were well written, today they tend to be sloppy. When an editor accepts a manuscript for publication, the authors tend to ignore the spelling and grammar errors pointed out by the reviewers, even when good authoring tools are available. This sloppiness increases the publication cost because a copy editor has to rework the manuscript. When a sweatshop is used, producing a 12-page article typically costs about $1,500, while using professional copy editors doubles or triples the cost. Societies usually slightly increase the publishing fee to allow for discounts for members and to subsidize financially challenged authors.

Fifty years ago, a scholarly journal covered its production costs by charging a small page charge and with subscriptions by institutional libraries. With the flood of articles in the past twenty years, the number of journals has increased, and with the higher production costs, libraries have difficulty affording journals. In the long term, scholarly journals are only viable with the open access approach, where the authors pay the full publication costs.

For a large institution, the publication costs are significant, especially when the copy editing costs go up with the increasing sloppiness. This leads to a hybrid solution where institutions pay a fixed yearly price and get a certain number of submissions and downloads.

Usually institutions are not too sensitive to the publication costs, as long as the journal has a good impact factor. A journal builds its impact factor not with the work of the copy editors, but with the work of the editors. For scholarly journals, these are usually researchers who volunteer their work. The question is how does one find a good team of editors-in-chief and associate editors? For this we have to look at how research evolved in the last decades.

Before that, let me point out that editors have to ensure that relevant references to articles in their own journal are not omitted and that the journal is indexed. Last but not least, the most important words of an article should be at the beginning of its title; otherwise, citing authors will miss it in a Google Scholar search.



Sunday, June 2, 2019

Efficiency

One of the main avenues to increase the quality of life and the welfare of society is to become more efficient. Concerning one of our most labor-saving devices, the computer, in the last decade, we have not done so well. While previously it had liberated us from typewriters and slide rules, recently it has been distracting us with social media and CPU power has barely improved, although computers have become pocketable and are now ubiquitous.

However, it is still worthwhile to periodically update a workstation, for example by using a more powerful graphics card with a better GPGPU. Another worthwhile upgrade is to replace hard disk drives (HDD) with solid state drives (SSD). These have become very economical and reliable. From a hardware point of view, the easiest upgrade is to replace the internal 3.5" SATA disks with 2.5" SATA SSDs: it just takes a couple of minutes.

Open the workstation, slide out the drive tray (left), remove the disk, screw in a 2.5" to 3.5" SATA adapter (right), screw the SSD (middle) into the adapter, then slide the tray back in the workstation and close the lid.

replacing a hard disk drive with a solid state drive

The software side is a little more complicated and takes hours (but you do not have to watch it). Modern operating systems partition the system disk with the GUID Partition Table (GPT) scheme, which includes a special partition for the firmware, the EFI system partition. On some operating systems this partition contains the drivers, including the file system code. On my older workstation, the firmware is in a PROM, but the EFI system partition is still required because it is used as a staging area for firmware updates. See this article for the macOS.

The safest update procedure for the software is to format the new drive and then do a fresh install of the operating system. This will start with a firmware update (press the power button until the power light flashes and you hear a long beep). In my case, the firmware was updated from version MP51.0087.B00 to version MP51.0089.B00. If you do not update your boot ROM, from time to time you will get DiskManagement error -69546 from macOS.

With the new firmware, reboot the workstation and install the operating system. In the end, do a full system migration from your backup disk. Your system is now much faster, especially when you boot it up. While I was at it, I also replaced my backup disk with a very inexpensive but high-quality consumer-grade SSD shown below.

an inexpensive consumer grade SSD can be used for backup

No other changes were required and the workstation has now become much more efficient.

One aspect that has become very inefficient in recent years is buying parts. We used to be able to go to the neighborhood store and find everything, but nowadays, due to online stores, these stores are in bad shape and it has become difficult to find items. In the case of the SSDs, it was not an issue because instead of picking them up in the store in 20 minutes I got them in a couple of days from the manufacturer.

However, for the 2.5" to 3.5" SATA adapter, I was less lucky. The electronics store had over a dozen different adapters, but there was no way to find out which one was for my workstation. I figured that if I ordered online I would get it in a couple of days from a warehouse in the Central Valley or in Utah, but I was in for a surprise. It came from overseas and, unfortunately, for the shipping company Palo Alto is near New York, so the adapter went on a long random walk across the continent. I feel really bad about the huge carbon footprint of shipping this small $16 part.

date | time | activity
Wednesday, April 24, 2019 | 11:42 AM | Received electronic information
Friday, April 26, 2019 | 10:57 PM | Shipment information sent To FedEx
Saturday, April 27, 2019 | 10:49 AM | [China-Shanghai Operations Center] waiting for transshipment
Saturday, April 27, 2019 | 11:47 AM | [China-Shanghai Transit Center] left scanning
Sunday, April 28, 2019 | 4:58 AM | [China-Shanghai Transit Center] left scanning - loaded car
Sunday, April 28, 2019 | 7:06 AM | [China-Shanghai Pudong International Airport] Arrival at the airport - exi
Sunday, April 28, 2019 | 8:06 AM | [China-Shanghai Pudong International Airport] Customs Release - Export
Sunday, April 28, 2019 | 12:17 PM | [China-Shanghai Pudong International Airport] parcels from developing countries
Monday, April 29, 2019 | 10:03 AM | [United States - Kennedy Airport] arriving at the airport - import
Thursday, May 2, 2019 | 11:41 AM | [United States - Kennedy Airport] Customs Release - Import
Friday, May 3, 2019 | 7:47 PM | [FEDEX SMARTPOST BREINIGSVILLE, PA] Arrived at FedEx location
Saturday, May 4, 2019 | 5:50 AM | [FEDEX SMARTPOST BREINIGSVILLE, PA] Departed FedEx location
Saturday, May 4, 2019 | 10:34 PM | [JEWETT, IL] In transit
Sunday, May 5, 2019 | 10:41 AM | [QUAPAW, OK] In transit
Sunday, May 5, 2019 | 9:44 PM | [SAN JON, NM] In transit
Monday, May 6, 2019 | 8:54 AM | [TOPOCK, AZ] In transit
Tuesday, May 7, 2019 | 2:04 AM | [WALNUT, CA] In transit
Tuesday, May 7, 2019 | 2:11 PM | [BAKERSFIELD, CA] In transit
Wednesday, May 8, 2019 | 9:53 PM | [FEDEX SMARTPOST SACRAMENTO, CA] Arrived at FedEx location
Thursday, May 9, 2019 | 12:27 AM | [FEDEX SMARTPOST SACRAMENTO, CA] Departed FedEx location
Thursday, May 9, 2019 | 2:42 AM | Shipment information sent To US Postal Service
Saturday, May 11, 2019 | | Delivered

Monday, October 22, 2018

Career Networking

In the US, the multigenerational workforce is divided into five age groups, which have quite different approaches to employment.

The traditionalists (or silent generation, born 1925–1945) have these stereotypical characteristics: striving for financial security; "waste not, want not"; nobility of sacrifice for the common good; focus on quality and simplicity; loyal to employers and expect loyalty in return; believe promotions, raises and recognition should come from job tenure; work ethic focused on timeliness and productivity; conformity and following authority.

The baby boomers (born 1946–1964) have these stereotypical characteristics: the importance of hard work (instilled by parents); loyalty to an employer would lead to reward and seniority; willingness to take on additional responsibilities; conscientious and dependable; service-oriented; ambitious; dutiful.

Generation X (born 1965–1981) has these stereotypical characteristics: the importance of education; shaping one's own career path; work-life balance and autonomy; innovation and entrepreneurialism; comfortable with challenging conventional wisdom; outcome-oriented; collaborative decision making.

The millennials (or gen Y, born 1982–1997) have these stereotypical characteristics: need intellectual challenge; entrepreneurial; value continuous learning opportunities; achievement / results-oriented; innovative and open to new ideas; collaborative decision makers; like praise and recognition; value teamwork and equality; value independence / autonomy; seek meaningful work; value work-life balance and flexibility; value fun at work; technology-driven.

The centennials (or iGen or gen Z, born 1998 and later) are just entering the workforce and the stereotypes have not yet been formed.

The traditionalists rely on local organizations like the Rotary or the golf club for networking, but also on professional societies and conference attendance. The baby boomers participate actively in international societies and conferences, building a global network. Generation X still participates in conferences but is less active in professional societies and the organization of conferences. The millennials are on social media and use search engines to find information and attend local meet-ups for networking.

While in the past people managed contacts using a Rolodex, membership directories, etc., today colleagues are constantly on the move and everybody has to maintain their personal contact information on a professional network like Xing or LinkedIn, through which they connect to their professional contacts.

Professional network sites make money by selling your information to business intelligence and salespeople as well as recruiters looking for employees. The service is free for you, but you have to maintain your own information.

The sites are continuously improved, so you have to keep monitoring your profile for changes in the way your information is organized to be more valuable to paying customers. For example, the skills section is sorted by the number of endorsements you receive for each skill, which is not what you want. Edit this section by clicking on the pencil on the top right, then unpin the top three skills, reorder the skills by dragging the horizontal lines on the right, and pin your top three skills.

When LinkedIn bought SlideShare, your presentations appeared in the media section. However, the original site was abandoned and your media is in cold storage. To get acceptable access times, you have to upload your PDFs again directly into LinkedIn. Furthermore, videos are no longer supported, so you have to upload them to YouTube and then make them available in your LinkedIn profile as linked media.

If you apply for a job on a professional social network by clicking on the "apply" button, the probability that you will get that job is very low. Instead, you have to click on your best connection working there because jobs go mostly through internal referral.

Of course, you have to have a contact working there. The quality of the contact is important because this person has to be your advocate. You can easily increase your network by turning on Bluetooth in your mobile LinkedIn app and inviting all people in your vicinity, but they will not be your advocates. Your network has to be dense.

LinkedIn connection map

The best way to create a dense network is to organize conferences because people will remember your skills and leadership qualities well. The second best is presenting at conferences and the easiest is to present at meet-ups. Even easier is to write a blog, but you should post at least once a week and advertise each post on LinkedIn and Twitter.

Thursday, July 12, 2018

Zuckerberg did not get it

Formal written question to our Newell Road neighbor, Facebook CEO Mark Zuckerberg on Edgewood Drive:

Describe how your business philosophy distinguishes the harm to individuals from the harm to society.

The officially recorded answer for posterity:

We recognize that we have made mistakes, and we are committed to learning from this experience to secure our platform further and make our community safer for everyone going forward. As our CEO Mark Zuckerberg has said, when you are building something unprecedented like Facebook, there are going to be mistakes. What people should hold us accountable for is learning from the mistakes and continually doing better—and, at the end of the day, making sure that we’re building things that people like and that make their lives better.

Particularly in the past few months, we’ve realized that we need to take a broader view of our responsibility to our community. Part of that effort is continuing our ongoing efforts to identify ways that we can improve our privacy practices. We’ve heard loud and clear that privacy settings and other important tools are too hard to find and that we must do more to keep people informed. So, we’re taking additional steps to put people more in control of their privacy. For instance, we redesigned our entire settings menu on mobile devices from top to bottom to make things easier to find. We also created a new Privacy Shortcuts in a menu where users can control their data in just a few taps, with clearer explanations of how our controls work. The experience is now clearer, more visual, and easy-to-find. Furthermore, we also updated our terms of service that include our commitments to everyone using Facebook. We explain the services we offer in language that’s easier to read. We’ve also updated our Data Policy to better spell out what data we collect and how we use it in Facebook, Instagram, Messenger, and other products.

Obviously, he did not get it. A net worth of $77.6 billion does not make you smart. Rejoice, there is hope for you.

neighbors

Happy birthday, 洋.

Wednesday, July 4, 2018

New Swiss State Secretary for Education, Research and Innovation

Today, the Swiss Federal Council appointed Martina Hirayama as the new State Secretary for Education, Research and Innovation at the request of the Federal Department of Economic Affairs, Education and Research EAER.

Martina Hirayama

Martina Hirayama has been president of the Institute Council of METAS, the Federal Institute of Metrology, since 2012. She has also been vice president of the board of Innosuisse, Switzerland’s Innovation Promotion Agency (up to the end of 2017 the Commission for Technology and Innovation) since 2011 and a member of the Swiss National Science Foundation’s Foundation Council since 2016. Since 2011 Ms Hirayama has been dean of the ZHAW School of Engineering and is a member of the ZHAW’s Executive Board. Since 2014 she has also been Head of International Affairs.

Martina Hirayama studied chemistry at the University of Fribourg, at the ETH Zurich and at Imperial College London, obtaining a doctorate in technical sciences from the ETH. She later took a postgraduate degree in economics at the same institution. Following her doctorate she was group leader at the Institute of Polymers at the ETH Zurich, from 1995. During this time, Ms Hirayama co-founded a start-up in new coating technologies, and was CEO of the company until 2008. In 2003 she began lecturing in industrial chemistry at Zurich University of Applied Sciences Winterthur ZHW, where she developed and headed the field of polymer materials and obtained her professorship. From 2007 to 2010 she developed the Institute of Materials and Process Engineering. Ms Hirayama is a citizen of both Switzerland and Germany.

With such wide-ranging experience in research, teaching, entrepreneurship, management and administration, Ms Hirayama is very well equipped to head the State Secretariat for Education, Research and Innovation SERI. She has impressive expertise at the interface between science and business. The Federal Council has chosen a person with huge initiative and creativity, with a broad network in the field of education, research and innovation as well as politics, public administration and the private sector.

Ms Hirayama perfectly meets the exacting requirements of this position of State Secretary for Education, Research and Innovation. The important task of equipping Switzerland’s excellent ERI system for the digital future falls to the state secretariat she will now head. The Confederation, cantons, professional organisations and other players must work together to continue to strengthen both vocational and professional education and training and academic education, and to maintain Switzerland’s position as a world leader in research and innovation.

Saturday, June 30, 2018

Creative Professions

The Silicon Valley has seen radical changes in how people work. By people, I mean mostly the creative professionals who conceive the products that made the valley famous. Today, these professionals are not as creative as in the past. We are transitioning to the gig economy, where professionals do not have a fixed job but use the internet to find small assignments. The pay is very low, there are no benefits, and the money is all made by the service website owners: all work is purely transaction oriented and in the case of software, when an app breaks, it is simply abandoned.

There are conventional jobs with an employment contract and benefits, but the setting is more that of factory workers doing piecework controlled by the company's GitHub site. The companies do not invest in their workers, who do not learn new technologies and hop to a new employer every couple of years.

The gig economy is different from consulting. Consultants earn approximately twice the salary of a regular employee and make a considerable investment to deepen their expertise.

In the past, an employee was a resource groomed by companies. The new trend is the reason the most creative products now come from outside the Silicon Valley. The new centers for innovation include (from west to east) London, Lausanne, Zurich, Berlin, Beijing, Shanghai, Taipei, Seoul, Tokyo, …

Before the transition to piecework, a paradigm popular in the Silicon Valley was that of the field dependence of cognitive styles, going back to Herman Witkin in 1962. This paradigm was used to give employees work in which they could excel, form powerful synergistic teams, and also to design user interfaces.

People with a field-dependent cognitive style are driven by an inner motor (god). They think in a global context and tend to think in parallel, making associations. Field-dependent employees often work well in teams, as they tend to be better at interpersonal relationships. When designing user interfaces, approaches that connect different parts of a topic are useful for field-dependent learners. For example, users can discuss what they know about a topic, predict content, or look at and read related material.

People with a field-independent cognitive style are driven by an outer motor, for example, the product's user. They are analytical, detail oriented, and tend to think sequentially, drawing inferences. Field-independent workers tend to rely less on managers or colleagues for support. In user interfaces, approaches such as extensive reading and writing, which users can carry out alone, are useful.

Research labs looked for employees who have both a field-dependent and a field-independent cognitive style. Such people can envision new theories and can also reduce them to practice by implementing them. Such an activity is called speculative design.

This paradigm can be extended to the pieceworker of today, who is driven by greed (self). It is also useful to extend the idea to other activities, as shown in this diagram:

creative professions and speculative design

Friday, April 27, 2018

Data Analysis Careers

On 25 April 2018, the European Commission increased its investment in AI research to €1.5 billion for the period 2018-2020 under the Horizon 2020 research and innovation program. This investment is expected to trigger an additional €2.5 billion of funding from existing public-private partnerships, for example on big data and robotics. It will support the development of AI in key sectors, from transport to health; it will connect and strengthen AI research centers across Europe, and encourage testing and experimentation. The Commission will also support the development of an "AI-on-demand platform" that will provide access to relevant AI resources in the EU for all users.

Additionally, the European Fund for Strategic Investments will be mobilized to provide companies and start-ups with additional support to invest in AI. With the European Fund for Strategic Investments, the aim is to mobilize more than €500 million in total investments by 2020 across a range of key sectors.

With the dawn of artificial intelligence, many jobs will be created, but others will disappear and most will be transformed. This is why the Commission is encouraging Member States to modernize their education and training systems and support labour market transitions, building on the European Pillar of Social Rights.

The annus mirabilis of deep learning was 2012, when Google was able to coax millions of users into crowdsourcing labeled images. Google also had tens of thousands of servers that were not very busy at night. Most of all, however, Google had an incredible PR department that was able to create a meme.

Four developments made this possible:

  1. Software defined storage (SDS) on commodity hardware made it very inexpensive to store large amounts of data. When the cloud is used for storage, there are no capital expenditures.
  2. Ordinary citizens became willing to contribute vast amounts of data in barter for free search, email, and SNS services. They were also willing to label their data for free, creating substantial ground truth corpora that can be used as training sets.
  3. High-frequency trading created a market for GPGPU hardware, resulting in much lower prices. Also, new workstation architectures made it possible to break the impasse caused by the end of Moore's law.
  4. ML packages on CRAN made it easy to experiment with R. Torch and Weka made it easy to write applications capable of processing very large datasets.

Many companies are setting up analytics departments and are trying to hire specialists in this field. However, there is great confusion about what the new careers are and how they differ. Often, even the companies posting the job openings do not understand the differences.

Recently, at the Sunnyvale City Hall, two representatives from LinkedIn and one representative each from UCSC Silicon Valley Extension and California Science and Technology University participated in a panel organized by NOVA to dispel this confusion.

Essentially there are three professions: data analyst, data engineer, and data scientist:

  • Data analysts tend to be more entry level and do not necessarily need programming or domain knowledge: they visualize data, organize information, and summarize data, often using SQL. Essentially, they deal with data "as is."
  • Data engineers do what is called data preparation, data wrangling, or data munging. They pull data from multiple, distributed (and often unstructured) data sources and get it ready for data scientists to interpret. They need a computer science background and should be skilled with programming, Hadoop, MapReduce, MySQL, and Spark.
  • Data scientists turn the munged data into actionable insights, after they have made sure the data is analytically rigorous and repeatable. They usually have a Ph.D. The ability to communicate is vital! They must have a core understanding of the business, be able to show why the data matters and how it can advance business goals, and communicate this to business partners. They need to convince decision makers, usually at the executive level.
data analysis careers

Monday, March 26, 2018

Stanford Workshop on Medical VR and AR

On 5 April 2018, there will be a public workshop on medical head-mounted displays at Stanford. The workshop is designed to support collaborations between the engineers who are developing VR and AR technologies and the surgeons and clinicians who are using these technologies to treat their patients.

The workshop features talks by researchers who are developing VR and AR technologies to advance healthcare and panel discussions with Stanford physicians who are using VR and AR applications for surgical planning and navigation and for alleviating pain and anxiety in their patients.

There will be an interactive demo session featuring research projects, clinical applications, and startup ventures.

Seating is limited, so if you wish to attend, we recommend that you register now at the website https://scien.stanford.edu/index.php/medicalvrar.

Stanford Workshop on Medical VR and AR