10 Programming Languages And Tools Data Scientists Use Now

Looking to get started with data science but not sure where to begin? Take a look at our list of 10 programming languages and tools that are hot among data scientists and data analysts right now.

6/29/2016

An article from June 2016 found at informationweek.com and commented on here.

The "languages" the author mentions are:

  1. R - yes, a true programming language
  2. Python - yes, a true programming language
  3. Scala - yes, a true programming language
  4. SQL - a TOOL if anything; NOT a programming language but a query language
  5. Excel - a TOOL; NOT a programming language but a spreadsheet application that does have means for programming (e.g. scripts and expressions)
  6. SAS - a TOOL; NOT a programming language per se, but it does have a scripting language
  7. Java - yes, a true programming language
  8. MatLab - a TOOL; NOT a programming language per se, but it does have a scripting language
  9. SPSS - a TOOL; NOT a programming language per se, but it does have a scripting language
  10. Julia - yes, a true programming language

Looking to break into the field of data science or to gain the skills to be able to transition to this field in the future? Interested in becoming a data analyst and perhaps eventually moving into a data scientist role?

Then you're probably doing some research about what data science tools and programming languages you should learn first to maximize your chances of landing your dream job. Should you focus on mastering R? Or would it be better to make Python a priority? If you already know one or both of these languages, which ones should you focus on next? Are there up-and-coming tools?

Whether you are preparing to apply for your first job at a new company or looking to transfer to a position in your current company, we've collected this list of the top skills, programming languages, and tools that current data scientists and analysts are using now and the ones that employers say they want in job listings for data scientists. Take a look at this list as a starting point for helping you decide what skills you should learn next.

Pursuing a career as a data analyst or data scientist is a great idea. Careers website Glassdoor ranked Data Scientist as the Best Job in America for 2016, paying a median base salary of $116,840, with plenty of job openings available. Data scientist also topped Glassdoor's list of best jobs for work-life balance.

Professionals with the right skills are in high demand today, since so many businesses find themselves drowning in data. These organizations are in industries from financial services to oil and gas, to retail, healthcare, and alternative energy.

They are collecting enormous amounts of data and need help managing it, analyzing it, and using it to predict problems and solve them. So there are opportunities across a wide range of industries, solving some of the world's biggest and most interesting challenges.

Ready to get started? Take a look at our list. If we've missed something, please feel free to add it into the comments section below.

Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor. She's passionate about the practical use of business intelligence, ...

 

1. R

There are two top tools that data scientists and analysts use, and one of them is R. Created in 1995 by Ross Ihaka and Robert Gentleman, this open source language is meant for data analysis and data visualization. R has an active user community, and there are many R packages designed for applying the language to particular analytics problems. This language has attracted the attention of enterprises as well. Perhaps the best illustration of this is Microsoft's acquisition of Revolution Analytics in January 2015. Revolution Analytics offered a commercially supported enterprise platform around the R language.

(Image: The R Project for Statistical Computing)

R is definitely a language to know for analysts, statisticians, and data scientists.

2. Python

The other top tool data scientists and data analysts use is Python. The earliest version of the language has been around since 1990. It was created by Guido van Rossum. If you are looking at job listings for data scientists and analysts, one of the top skills requirements is either Python or R knowledge, and often both. Python is considered more of a general-purpose language. At the DataCamp site, Python is called a good language for beginners to programming, while R is characterized as having a steep learning curve.

(Image: Python.org)

This general-purpose programming language has become popular with scientists thanks to libraries suitable for science, analytics, and presentation (graphs and such).

3. Scala

Scala combines functional and object-oriented programming. It works with both Java and JavaScript. It is the hot language to learn right now, since organizations aspire to work with real-time data. That's because it is an implementation language of many of the technologies that enable streaming data, such as Apache Spark and Apache Kafka. The O'Reilly 2015 Data Science Salary Survey noted that Scala use increased by 10% in 2015.

(Image: Henrik Jonsson/iStockphoto)

4. SQL

While not used for big data, SQL (Structured Query Language) remains a hugely popular tool among data analysts. The O'Reilly survey noted that 68% of survey respondents said they use SQL. SQL is a standard language for relational database management systems, and RDBMS is still where so much traditional enterprise data resides. So SQL remains an essential tool for enterprise organizations.

(Image: patpitchaya/iStockphoto)

SQL is good to know for getting data out of databases and writing results back, but it is a QUERY language, NOT a programming language.
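To make that division of labor concrete, here is a minimal sketch of SQL being driven from a general-purpose language. It assumes Python's built-in sqlite3 module and an in-memory database; the table names (sales, region_totals) and the function name summarize_sales are made up for illustration and are not taken from the article.

```python
import sqlite3


def summarize_sales(rows):
    """Aggregate (region, amount) rows with SQL and store the result.

    The tables 'sales' and 'region_totals' are hypothetical names
    used only for this illustration.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", rows
        )

        # Getting data out: the SQL states WHAT we want (a total per
        # region), not HOW to loop over the rows.
        totals = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        ).fetchall()

        # Writing results back: the host language passes the query
        # result into a second table.
        conn.execute(
            "CREATE TABLE region_totals (region TEXT PRIMARY KEY, total REAL)"
        )
        conn.executemany(
            "INSERT INTO region_totals (region, total) VALUES (?, ?)", totals
        )
        conn.commit()

        return conn.execute(
            "SELECT region, total FROM region_totals ORDER BY region"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("north", 120.0), ("south", 80.0), ("north", 30.5)]
    print(summarize_sales(sample))
```

In this sketch the SQL statements do the querying and storing, while the control flow, error handling, and data hand-off all live in the host language.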

5. Excel

And let's face it, even in the age of high-level tools, Excel is still a very popular tool used by many. The O'Reilly survey said that 59% of data scientists and analysts use Excel, and that number has barely changed year-over-year. Excel is a tried-and-true workhorse of data analysis. Its ubiquity and ease of use for non-programmers and analysts make it a tool of choice for most organizations, even as they use other tools, too.

(Image: 100pk/iStockphoto)

Fully agree, Excel is a great application and TOOL. And the more of its advanced features you command, the better.

6. SAS

SAS is one of the Leaders in the Gartner Magic Quadrant for Advanced Analytics, and a Visionary in the Magic Quadrant for Business Intelligence and Analytics Platforms. Originally developed at North Carolina State University, the software suite was spun into its own company in 1976. It remains a popular tool among data analysts. An analysis performed by machine learning startup Crowdflower that looked at data from thousands of job openings for data scientists posted to LinkedIn found that between 15% and 20% called for job candidates who had experience with SAS.

(Image: SAS)

SAS denotes a software suite (a whole bunch of applications, add-ons, and libraries/modules) and is, if anything, a bunch of TOOLS. Most of them, if not all, cost fairly serious money.

Programming, scripting, and task automation can be done via the SAS language.

7. Java

That same Crowdflower analysis showed an even higher demand in those data scientist job listings for Java -- between 35% and 40% of data scientist job listings on LinkedIn said they wanted candidates with experience in this language. The O'Reilly survey showed use of Java declining from 32% in 2014 to 23% in 2015, but that still represented almost a quarter of data scientists responding to the survey as using the language.

(Image: serg3d/iStockphoto)

Java is among the most used programming languages in the world (Java and C/C++ usually compete for the number 1 spot). We are unsure, however, whether we would prioritize this language for scientists' use.

8. MatLab

This proprietary programming language, developed by MathWorks and originally released in 1984, was big in academic and math circles, and continues to be used, particularly in academia, because of its suitability for data acquisition and mathematical modeling. The O'Reilly survey shows its use in decline, however, and the Crowdflower analysis also shows only about 10% to 15% of job listings calling for the skill.

(Image: Wcam via Wikimedia Commons)

MATLAB foremost denotes a piece of software, an application (like Excel), and NOT a programming language (as the original author states). Yes, it does have a programming language, but that language is proprietary and, most importantly, it can basically only execute inside the MATLAB application itself, just like the scripting languages in SAS, SPSS, Stata, and other applications.

* From Wikipedia: "MATLAB has a number of competitors. Commercial competitors include Mathematica, TK Solver, Maple, and IDL. There are also free open source alternatives to MATLAB, in particular GNU Octave, Scilab, FreeMat, Julia, and Sage which are intended to be mostly compatible with the MATLAB language."

9. SPSS

This tool is about even with MatLab on the Crowdflower analysis, and it's the tool that puts IBM into the Leaders square of the Gartner Magic Quadrant for Advanced Analytics. Many universities teach this tool as part of their analytics degree programs. IBM acquired SPSS in 2009 and currently offers a range of products under the name.

(Image: serg3d/iStockphoto)

Like SAS and MATLAB, SPSS is foremost a piece of software, an application (like Excel), or rather a suite of applications and modules: a TOOL and NOT a programming language.

10. Julia

Julia is not as well-known. It's a newer tool that is considered not as mature as the others on the list. A year ago, the creators of Julia launched a startup to provide training, commercial support, and consulting for those who want to use the language. It is considered a free alternative to proprietary tools for data science, and more modern than languages like Python and R, according to VentureBeat.

(Image: Julia Computing LLC)