Synkoc Data Science Internship · Week 1 · Lesson 1 of 11
Python for Data Science
Welcome to Lesson 1 of the Synkoc Data Science Internship. Python for Data Science. I am your instructor, and I want to start with a fact that changes how seriously you take this lesson. Python is ran...
Variables & Types
Control Flow
Functions
Data Structures
Synkoc Instructor
Data Science Internship · Bangalore
⏱ ~65:00
Here is your roadmap for this lesson
We cover five pillars. First, variables and data types: the way you store every piece of data in a program. Second, control flow: how Python makes decisions and repeats actions. Third, functions: how you write reusable, professional code. Fourth, data structures: lists, dictionaries, tuples, and sets, which are the foundation for the Pandas DataFrames you will use from Lesson 4 onwards. And fifth, file handling, because data science work usually starts with reading data from files. These five skills show up throughout professional data science code. Master them
Chapter 1 of 5
Variables & Types
Chapter 1: Variables and Data Types. A variable is a named container in your computer's memory that stores a value. You create a variable by writing the name on the left, then an e
Python has four fundamental data types
The string type, written as str, stores text: column names, labels, category values, file paths. The integer type, written as int, stores whole numbers: counts of rows, years, the number of trees in a random forest. The float type, written as float, stores decimal numbers: accuracy scores like 0.943, prices, measurement values, probabilities. And the boolean type, written as bool, stores True or False: whether a customer churned, whether a transaction is fraud, whether a model converged. These four types appear throughout the data science code you will write.
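To make the four types concrete, here is a minimal sketch. The variable names and values are hypothetical, chosen only to mirror the examples mentioned above.

```python
# Hypothetical example values, one per fundamental type.
column_name = "customer_age"   # str: text such as a column name
n_rows = 1500                  # int: a whole-number count
accuracy = 0.943               # float: a decimal score
churned = True                 # bool: True or False

# type() reports the type Python assigned to each value.
for value in (column_name, n_rows, accuracy, churned):
    print(type(value).__name__, value)
```

Notice that you never declare the type yourself. Python infers it from the value on the right-hand side of the equals sign.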
Type conversion is essential in data science because real-world data is messy
When you read a CSV file, numbers often come in as strings: the text '42' instead of the integer 42. If you try to add the string '42' to a number, Python raises a TypeError. You convert with int(), float(), str(), and bool(). For example, int('42') gives you the integer 42, and float('3.14') gives you the float 3.14. Be careful with bool(): any non-empty string converts to True, so bool('False') is True. More importantly, Pandas converts entire columns at once: the astype method, as in astype(int), performs an explicit conversion, and the pd.to_numeric function converts strings to numbers. Understanding types prevents one of the most common classes of beginner bugs.
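The built-in conversions described above can be sketched as follows. The variable names are hypothetical, and the example uses only plain Python rather than Pandas.

```python
raw_age = "42"            # a number that arrived as text, as it might from a CSV
age = int(raw_age)        # the integer 42
ratio = float("3.14")     # the float 3.14
label = str(7)            # the string "7"

# Mixing text and numbers without converting raises a TypeError.
try:
    raw_age + 1
except TypeError as err:
    print("TypeError:", err)

# After conversion, arithmetic works as expected.
print(age + 1)            # 43

# bool() treats every non-empty string as True, including the text "False".
print(bool("False"))      # True
print(bool(""))           # False
```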
Chapter 2 of 5
Control Flow
Chapter 2: Control Flow. Control flow determines which lines of Python execute, in what order, and how many times. Without control flow, Python simply runs every line from top to b
The if statement checks a condition and runs code only when that condition is Tr
You write if, then your condition, then a colon, then indent the code you want to run. You can chain elif for multiple conditions and else for the default case. In data science, if statements are used everywhere: if accuracy is above 0.90, deploy the model. If missing values exceed 30 percent, drop the column. If the p-value is below 0.05, reject the null hypothesis. Every business decision in data science is ultimately an if statement.
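Here is a small sketch of an if, elif, else chain built around the deployment example. The 0.90 threshold comes from the example above; the 0.80 threshold, the accuracy value, and the decision labels are assumptions added for illustration.

```python
accuracy = 0.93  # hypothetical model score

if accuracy > 0.90:
    decision = "deploy"     # runs only when the first condition is True
elif accuracy > 0.80:       # assumed second threshold, for illustration only
    decision = "retrain"
else:
    decision = "reject"     # default case when no condition matched

print(decision)  # deploy
```

Python checks the conditions from top to bottom and runs only the first branch whose condition is True.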
The for loop iterates over a sequence and runs your code for each item
You write for, then a variable name that will hold each item, then in, then the sequence, then a colon, then indent your code. Python's range function generates sequences of numbers. The enumerate function pairs each item in a sequence with its index. List comprehensions give you the power of a for loop in a single readable line: output = [expression for variable in sequence if condition]. In data science, list comprehensions are used constantly to apply transformations to columns, filter rows, and build feature vectors.
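The loop tools described above can be sketched like this. The list of scores is hypothetical data invented for the example.

```python
scores = [0.91, 0.78, 0.85, 0.96]  # hypothetical accuracy scores

# range(3) generates 0, 1, 2.
for i in range(3):
    print(i)

# enumerate pairs each item with its index.
for fold, score in enumerate(scores):
    print(f"fold {fold}: {score}")

# A list comprehension: keep only the scores above 0.90.
high_scores = [s for s in scores if s > 0.90]
print(high_scores)  # [0.91, 0.96]

# The same filter written as an ordinary for loop.
high_scores_loop = []
for s in scores:
    if s > 0.90:
        high_scores_loop.append(s)
```

Both versions produce the same list. The comprehension is simply the shorter way to write it.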
Chapter 3 of 5
Functions
Chapter 3: Functions. A function is a named, reusable block of code. You define it once with the def keyword, give it a name, list any parameters it accepts in parentheses, add a c
Default parameters make functions flexible
When you define a parameter with an equals sign and a value, callers can omit that argument and the default is used. For example, def train_model(data, test_size=0.2). Calling train_model with just data automatically uses a test_size of 0.2. Lambda functions are one-line anonymous functions: lambda x: x * 2. They are used constantly with map, filter, sorted, and Pandas apply. Docstrings, the triple-quoted strings placed immediately after the function definition line, document what the function does, its parameters, and its return value. Senior engineers judge code quality la
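As an illustration of default parameters, docstrings, and lambdas, here is a minimal sketch. Only the signature train_model(data, test_size=0.2) comes from the lesson; the function body is a hypothetical stand-in that just splits a list, not a real training routine.

```python
def train_model(data, test_size=0.2):
    """Split data into a training part and a test part.

    Parameters:
        data: a list of records.
        test_size: fraction of records held out for testing (default 0.2).

    Returns:
        A tuple (train, test) of two lists.
    """
    # Hypothetical body: a simple split by position, for illustration only.
    n_test = int(len(data) * test_size)
    split = len(data) - n_test
    return data[:split], data[split:]


data = list(range(10))
train, test = train_model(data)                     # uses the default test_size of 0.2
train_half, test_half = train_model(data, test_size=0.5)  # overrides the default
print(len(train), len(test))            # 8 2
print(len(train_half), len(test_half))  # 5 5

# Lambda functions with map and sorted.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
by_length = sorted(["income", "age", "region_code"], key=lambda name: len(name))
print(doubled)    # [2, 4, 6]
print(by_length)  # ['age', 'income', 'region_code']
```

Calling help(train_model) prints the docstring, which is one reason documenting your functions pays off.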
Chapter 4 of 5
Data Structures
Chapter 4: Data Structures. A single variable holds one value. A data structure holds many values with a defined organisation. Python's four core data structures are: the list for
Lists are ordered, mutable sequences created with square brackets
They support indexing with zero-based integers: my_list[0] gives the first element and my_list[-1] gives the last. Slicing extracts subsequences: my_list[:5] gives the first 5 elements. Common methods: append adds one item to the end, extend adds multiple items, pop removes and returns the last item, and sort reorders the list in place. The built-in sorted function returns a new sorted copy. In data science, lists store feature names, accuracy scores across folds, and batch results.
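The list operations above can be sketched as follows. The feature names and scores are hypothetical values invented for the example.

```python
features = ["age", "income", "tenure"]   # hypothetical feature names

first = features[0]      # "age": indexing starts at zero
last = features[-1]      # "tenure": negative indexes count from the end

features.append("region")             # add one item to the end
features.extend(["plan", "usage"])    # add several items at once
first_two = features[:2]              # slicing: ["age", "income"]
removed = features.pop()              # removes and returns "usage"

scores = [0.82, 0.91, 0.78]           # hypothetical scores across folds
ordered = sorted(scores)              # new list; scores is unchanged here
scores.sort(reverse=True)             # reorders scores itself, highest first

print(features)  # ['age', 'income', 'tenure', 'region', 'plan']
print(ordered)   # [0.78, 0.82, 0.91]
print(scores)    # [0.91, 0.82, 0.78]
```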
Dictionaries map keys to values using curly braces
You create a dictionary as name = {key1: value1, key2: value2}, with key-value pairs separated by commas inside curly braces. Access a value with d[key], or use d.get(key) for safe access: it returns None instead of raising a KeyError when the key is missing. Iterate with .items() to get both key and value. In data science, dictionaries store model configurations, hyperparameter grids, evaluation metric results, and category encoding maps. A Pandas DataFrame behaves much like a dictionary of columns, so understanding dictionaries makes Pandas much more intuitive.
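Here is a short sketch of the dictionary operations just described. The configuration keys and values are hypothetical and do not belong to any particular library.

```python
# Hypothetical model configuration stored as key-value pairs.
config = {"model": "random_forest", "n_estimators": 100, "max_depth": 5}

n_trees = config["n_estimators"]          # direct access: 100
rate = config.get("learning_rate")        # missing key: returns None, no error
rate_or_default = config.get("learning_rate", 0.1)  # missing key: returns 0.1

# Iterate over keys and values together.
for key, value in config.items():
    print(f"{key} = {value}")

# A dictionary of columns: each key is a column name, each value a list of cells.
columns = {"age": [34, 45, 29], "spend": [120.0, 80.5, 300.0]}
print(columns["age"])  # [34, 45, 29]
```

The last dictionary, with column names as keys and lists as values, is the shape to keep in mind when you meet DataFrames later.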
Sets store unique values with no duplicates and no defined order
Sets are created with set() or with curly braces around the values; empty braces {} create a dictionary, so use set() for an empty set. Sets are extremely useful for finding unique values, checking membership in O(1) time on average, and set operations: union, intersection, difference. In data science, sets are used to find unique categories, check whether a list of features contains duplicates, and compute vocabulary size in NLP. Tuples are immutable sequences: like lists, but they cannot be modified after creation. They are used for function return values, dictionary keys, and coordinate pairs. Immutability is a safety feature that prevents accidental modification.
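A minimal sketch of sets and tuples in action. All names and values here are hypothetical examples.

```python
categories = ["red", "blue", "red", "green", "blue"]  # hypothetical labels

unique = set(categories)                        # duplicates disappear
has_duplicates = len(unique) != len(categories) # True: the list had repeats
print("green" in unique)                        # fast membership check: True

a = {"age", "income"}
b = {"income", "tenure"}
union = a | b          # every feature in either set
common = a & b         # features in both sets
only_a = a - b         # features in a but not in b

# Tuples cannot be changed after creation.
point = (12.97, 77.59)            # a coordinate pair
modified = True
try:
    point[0] = 0.0
except TypeError:
    modified = False              # assignment was refused

# Because tuples are immutable, they can serve as dictionary keys.
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(0, 0)])  # origin
```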
Let us bring everything together with a complete data science mini-pipeline usin
We have a list of dictionaries representing customer records. We define a clean_records function that filters out records with missing ages or negative spending. We define a compute_stats function that takes a list of numbers and returns mean, min, max, and standard deviation using only Python built-ins. We define a segment_customers function that categorises each customer as Premium, Standard, or Basic using if-elif-else. We loop over the results and print a formatted report using f-strings. Every concept from this lesson — variables, types, control flow, functions, data structures — works to
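The pipeline described above might look something like the sketch below. The three function names come from the lesson, but everything else is an assumption made for illustration: the sample records, the spending thresholds of 1000 and 400, the choice of population standard deviation, and the report layout.

```python
# Hypothetical customer records: a list of dictionaries.
records = [
    {"name": "Asha", "age": 34, "spend": 1200.0},
    {"name": "Ben", "age": None, "spend": 300.0},   # missing age
    {"name": "Chen", "age": 45, "spend": -50.0},    # negative spending
    {"name": "Dia", "age": 29, "spend": 450.0},
    {"name": "Eli", "age": 52, "spend": 90.0},
]


def clean_records(records):
    """Keep only records with a known age and non-negative spending."""
    return [r for r in records if r["age"] is not None and r["spend"] >= 0]


def compute_stats(values):
    """Return mean, min, max and standard deviation of a non-empty list.

    Assumption: population standard deviation (divide by n), built-ins only.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "min": min(values), "max": max(values),
            "std": variance ** 0.5}


def segment_customers(records, premium=1000, standard=400):
    """Label each customer; the thresholds are assumed for illustration."""
    segments = []
    for r in records:
        if r["spend"] >= premium:
            label = "Premium"
        elif r["spend"] >= standard:
            label = "Standard"
        else:
            label = "Basic"
        segments.append((r["name"], label))   # a (name, segment) tuple
    return segments


cleaned = clean_records(records)
stats = compute_stats([r["spend"] for r in cleaned])
segments = segment_customers(cleaned)

print(f"Customers kept: {len(cleaned)} of {len(records)}")
print(f"Mean spend: {stats['mean']:.2f}  (min {stats['min']:.2f}, max {stats['max']:.2f})")
for name, label in segments:
    print(f"{name:<6} {label}")
```

With these sample records, two of the five customers are filtered out, and the remaining three land in one segment each.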
Lesson 1 complete
You now have the Python foundation to learn everything else in this Data Science Internship. Variables store data. Data types define what operations are valid. If statements make decisions. For loops process collections. Functions package reusable logic. Lists, dictionaries, tuples, and sets organise complex data. These are the exact skills you will use every day as a data scientist. The practical lab for this lesson has 6 hands-on exercises that build each of these skills step by step. Complete every single exercise before moving to Lesson 2. In the next lesson we cover Statistics and Probabi