How Can R Users Learn Python for Data Science?

Author
Manish Saraswat
January 12, 2017

Introduction

The best way to learn a new skill is by doing it!

This article is meant to help R users expand their skills and learn Python for data science from scratch. After all, R and Python are two of the most important programming languages for a data scientist to know.

Python is a powerful, multi-purpose programming language that has grown phenomenally in the last few years. It is used for web development, game development, and now data analysis and machine learning, which are relatively new areas for Python.

For a beginner in data science, learning Python for data analysis can be really painful. Why?

Try Googling "learn python" and you'll get tons of tutorials meant only for learning Python for web development. How can you find your way, then?

In this tutorial, we'll explore the basics of Python for data manipulation tasks. Alongside, we'll look at how you do the same thing in R. This side-by-side comparison will help you relate the tasks you already do in R to their Python equivalents. At the end, we'll take up a data set and practice our newly acquired Python skills.

Note: This article is best suited for people who have a basic knowledge of the R language.

Table of Contents

  1. Why learn Python (even if you already know R)
  2. Understanding Data Types and Structures in Python vs. R
  3. Writing Code in Python vs. R
  4. Practicing Python on a Data Set

Why learn Python (even if you already know R)

No doubt, R is tremendously great at what it does. In fact, it was originally designed for doing statistical computing and manipulations. Its incredible community support allows a beginner to learn R quickly.

But Python is catching up fast. Established companies and startups have embraced Python at a much larger scale than R.

[Figure: R machine learning vs. Python machine learning job postings]

According to indeed.com (January 2016 to November 2016), the number of job postings seeking "machine learning python" grew much faster (by approximately 123%) than postings for "machine learning in R". Do you know why? It is because:

  1. Python supports the entire spectrum of machine learning in a much better way.
  2. Python not only supports model building but also supports model deployment.
  3. Support for powerful deep learning libraries such as Keras, ConvNet, Theano, and TensorFlow is stronger in Python than in R.
  4. Unlike in R, you don't need to juggle several packages to locate a function in Python. Python has relatively few libraries, each providing most of the functions a data scientist would need.

Understanding Data Types and Structures in Python vs. R

These programming languages understand the complexity of a data set based on its variables and data types. Yes! Let's say you have a data set with one million rows and 50 columns. How would these programming languages understand the data?

Basically, both R and Python have pre-defined data types. The dependent and independent variables get classified among these data types. And, based on the data type, the interpreter allots memory for use. Python supports the following data types:

  1. Numbers – Stores numeric values. Python 2 has four numeric types: integer, long, float, and complex. Python 3 merges long into integer.
    • Integer – Whole numbers such as 10, 13, 91, 102. Comparable to R's integer type.
    • Long – Integers of unlimited size, a separate type only in Python 2 (Python 3's int already has unlimited size). Base R integers are 32-bit; the bit64 package adds 64-bit integers.
    • Float – Decimal values like 1.23, 9.89. Equivalent to R's numeric type.
    • Complex – Numbers like 2 + 3i, 5i, written as 2 + 3j and 5j in Python. Rarely used in data analysis.
  2. Boolean – Stores two values (True and False). Equivalent to R's logical type. Note the difference in case: R uses TRUE/FALSE; Python uses True/False.
  3. Strings – Stores text like "elephant", "lotus". Same as R's character type.
  4. Lists – Like R's list, stores multiple data types in one structure.
  5. Tuples – Ordered sequences that cannot be modified after creation. R has no direct equivalent.
  6. Dictionary – Key-value pair structure. Think of keys as column names, values as data entries. The closest R analogue is a named list.
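
To make the list above concrete, here is a minimal sketch that asks Python 3 for the type of one sample value per category. The variable names (samples, type_names, point) are made up for illustration.

```python
# Inspect the built-in type of a few sample values (Python 3).
samples = [10, 1.23, 2 + 3j, True, "elephant",
           ["monday", 24], ("a", "b"), {"Name": "Sam"}]
type_names = [type(value).__name__ for value in samples]
print(type_names)
# ['int', 'float', 'complex', 'bool', 'str', 'list', 'tuple', 'dict']

# Tuples are immutable: assigning to an element raises TypeError.
point = (1, 2)
try:
    point[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False
print(tuple_is_mutable)  # False
```

In R, the rough counterpart of type() is typeof() or class().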

Since R is a statistical computing language, the functions for reading and manipulating data are built in. Python, on the other hand, gets its data analysis, manipulation, and visualization functions from external libraries. The most important ones are:

  1. NumPy – Used for numerical computing. Offers math functions and array support. Its arrays are similar to R's vector, matrix, or array.
  2. SciPy – Scientific computing in Python.
  3. Matplotlib – For data visualization. R users typically use ggplot2.
  4. pandas – Main tool for data manipulation. R uses dplyr, data.table.
  5. scikit-learn – Core library for machine learning algorithms in Python.

In a way, Python for a data scientist is largely about mastering the libraries listed above, although many more advanced libraries are also in use. For practical purposes, remember the following data structures:

  1. Array – Similar to R's atomic vector or array: it can be multidimensional and holds a single data type, so mixed values are coerced to a common type.
  2. List – Equivalent to R's list.
  3. Data Frame – Two-dimensional structure composed of equal-length columns. R uses data.frame; Python uses DataFrame from pandas.
  4. Matrix – Two-dimensional structure holding data of a single type. In R: matrix(); in Python: a two-dimensional NumPy array, built for example with numpy.column_stack().
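
The coercion behaviour of arrays is easiest to see side by side with a list. The sketch below uses made-up values and variable names; it shows what NumPy does when element types differ, much like c(1, "a") in R.

```python
import numpy as np

mixed_numbers = np.array([1, 2.5])   # int and float: every element becomes a float
mixed_text = np.array([1, "a"])      # number and string: every element becomes a string
plain_list = [1, "a"]                # a list keeps each element's own type

print(mixed_numbers.dtype)           # float64
print(mixed_text.dtype.kind)         # U (unicode string)
print([type(x).__name__ for x in plain_list])  # ['int', 'str']
```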

Until here, I hope you've understood the basics of data types and data structures in R and Python. Now, let's start working with them!

Writing Code in Python vs. R

Let's put the knowledge from the previous section into practice. But before that, you should install Python through the Anaconda distribution, which includes Jupyter Notebook. You can also use other Python IDEs. I hope you already have RStudio installed.

1. Creating Lists

In R:

my_list <- list('monday','specter',24,TRUE)
typeof(my_list)
[1] "list"

In Python:

my_list = ['monday','specter',24,True]
type(my_list)
list

Using pandas Series:

import pandas as pd
pd_list = pd.Series(my_list)
pd_list
0     monday
1    specter
2         24
3       True
dtype: object

Python uses zero-based indexing; R uses one-based indexing.
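
Here is a short sketch of what zero-based indexing means in practice, reusing the list from above. The R equivalents in the comments are for comparison only; the variable names first, last, and first_two are illustrative.

```python
my_list = ['monday', 'specter', 24, True]

first = my_list[0]        # first element; R: my_list[[1]]
last = my_list[-1]        # negative index counts from the end (in R, -1 drops the first element)
first_two = my_list[0:2]  # the slice end is exclusive; R: my_list[1:2]

print(first, last, first_two)
```

Watch out for the negative index in particular, since it means something quite different in the two languages.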

2. Matrix

In R:

my_mat <- matrix(1:10, nrow = 5)
my_mat
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

# Select first row
my_mat[1,]

# Select second column
my_mat[,2]

In Python (using NumPy):

import numpy as np
a = np.array(range(10,15))
b = np.array(range(20,25))
c = np.array(range(30,35))
my_mat = np.column_stack([a, b, c])

# Select first row
my_mat[0,]

# Select second column
my_mat[:,1]
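
The NumPy example above builds a different matrix from the R one. If you want to reproduce R's matrix(1:10, nrow = 5) exactly, one possible sketch is the following. The name r_style is made up; order='F' asks NumPy to fill column by column, as R does.

```python
import numpy as np

# Numbers 1..10 laid out in 5 rows and 2 columns, filled column-wise like R.
r_style = np.arange(1, 11).reshape((5, 2), order='F')

first_row = r_style[0, :]    # R: my_mat[1, ]
second_col = r_style[:, 1]   # R: my_mat[, 2]

print(r_style)
print(first_row)    # [1 6]
print(second_col)   # [ 6  7  8  9 10]
```

Without order='F', NumPy fills row by row, so the first row would be 1 and 2 instead of 1 and 6.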

3. Data Frames

In R:

data_set <- data.frame(Name = c("Sam","Paul","Tracy","Peter"),
                       Hair_Colour = c("Brown","White","Black","Black"),
                       Score = c(45,89,34,39))

In Python:

data_set = pd.DataFrame({'Name': ["Sam","Paul","Tracy","Peter"],
                         'Hair_Colour': ["Brown","White","Black","Black"],
                         'Score': [45,89,34,39]})

Selecting columns:

In R:

data_set$Name
data_set[["Name"]]
data_set[1]

data_set[c('Name','Hair_Colour')]
data_set[,c('Name','Hair_Colour')]

In Python:

data_set['Name']
data_set.Name
data_set[['Name','Hair_Colour']]
data_set.loc[:,['Name','Hair_Colour']]
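
One detail worth checking for yourself is what each selection returns. In this sketch (same data as above; the names as_series, as_frame, and high_scores are illustrative), single brackets give a one-dimensional Series, while double brackets give a DataFrame. The last line adds a row filter, which the selections above don't cover.

```python
import pandas as pd

data_set = pd.DataFrame({'Name': ["Sam", "Paul", "Tracy", "Peter"],
                         'Hair_Colour': ["Brown", "White", "Black", "Black"],
                         'Score': [45, 89, 34, 39]})

as_series = data_set['Name']     # Series, like data_set$Name or data_set[["Name"]] in R
as_frame = data_set[['Name']]    # one-column DataFrame, like data_set["Name"] in R

# Keep rows where Score is above 40; R: data_set[data_set$Score > 40, ]
high_scores = data_set[data_set['Score'] > 40]

print(type(as_series).__name__, type(as_frame).__name__)
print(high_scores)
```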

Practicing Python on a Data Set

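# Note: load_boston was removed in scikit-learn 1.2,
# so this import needs scikit-learn < 1.2.
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston

# Load the Boston housing data (a dictionary-like object)
boston = load_boston()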

boston.keys()
['data', 'feature_names', 'DESCR', 'target']

print(boston['feature_names'])
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']

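# Print the data set description
print(boston['DESCR'])

# Build a DataFrame from the raw array
bos_data = pd.DataFrame(boston['data'])
bos_data.head()

# Use the feature names as column names
bos_data.columns = boston['feature_names']
bos_data.head()

# Summary statistics, like summary() in R
bos_data.describe()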

# First 10 rows
bos_data.iloc[:10]

# First 5 columns
bos_data.loc[:, 'CRIM':'NOX']
bos_data.iloc[:, :5]

# Filter rows
bos_data.query("CRIM > 0.05 & CHAS == 0")

# Sample
bos_data.sample(n=10)

# Sort
bos_data.sort_values(['CRIM']).head()
bos_data.sort_values(['CRIM'], ascending=False).head()

# Rename column (returns a new DataFrame; bos_data itself is unchanged)
bos_data.rename(columns={'CRIM': 'CRIM_NEW'})

# Column means
bos_data[['ZN','RM']].mean()

# Transform numeric to categorical
bos_data['ZN_Cat'] = pd.cut(bos_data['ZN'], bins=5, labels=['a','b','c','d','e'])

# Grouped sum
bos_data.groupby('ZN_Cat')['AGE'].sum()

# Pivot table
bos_data['NEW_AGE'] = pd.cut(bos_data['AGE'], bins=3, labels=['Young','Old','Very_Old'])
bos_data.pivot_table(values='DIS', index='ZN_Cat', columns='NEW_AGE', aggfunc='mean')
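
If load_boston isn't available in your scikit-learn version, you can still try the same operations on a small hand-made table. The sketch below is only an illustration: the DataFrame toy and all of its values are made up and are not rows from the Boston data; only the column names are borrowed.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    'CRIM': [0.02, 0.10, 0.30, 0.07, 0.50, 0.01],
    'CHAS': [0, 0, 1, 0, 0, 1],
    'AGE':  [10.0, 55.0, 80.0, 35.0, 95.0, 20.0],
})

# Filter rows on two conditions
filtered = toy.query("CRIM > 0.05 & CHAS == 0")

# Two rows with the highest CRIM
top = toy.sort_values('CRIM', ascending=False).head(2)

# Cut AGE into three equal-width bins, then sum CRIM per bin
toy['AGE_Cat'] = pd.cut(toy['AGE'], bins=3, labels=['Young', 'Old', 'Very_Old'])
crim_by_age = toy.groupby('AGE_Cat', observed=False)['CRIM'].sum()

print(filtered)
print(top)
print(crim_by_age)
```

Because the table is tiny, you can check each result by eye before running the same calls on a real data set.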

Summary

While coding in Python, I realized that there is not much difference in the amount of code you write in either language, although some functions are shorter in R than in Python. However, R has really awesome packages that handle big data quite conveniently. Do let me know if you wish to learn about them!

Overall, learning both languages will give you enough confidence to handle any type of data set. In fact, the best part about learning Python is the comprehensive documentation available for the NumPy, pandas, and scikit-learn libraries, which is sufficient to help you overcome all initial obstacles.

In this article, we just touched on the basics of Python. There's a long way to go. Next week, we'll learn about data manipulation in Python in detail. After that, we'll look into data visualization and the powerful machine learning library in Python.

Do share your experience, suggestions, and questions below while practicing this tutorial!
