Synkoc AI/ML Internship · Week 1 · Lesson 1 of 11
Python for
AI & Machine Learning
Master the complete Python foundation — Variables, Loops, Functions & Data Structures. Every concept connects directly to real ML code.
📦 Variables
🔁 Loops
⚙️ Functions
🗂️ Data Structures
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏱ ~60 minutes
🟢 Beginner Friendly
By the end of this lesson, you will be able to...
📦
Create & use Variables
Store any data type. Understand why every ML model's weights, accuracy and labels are stored in variables.
🔁
Write Loops that process data
Iterate over entire datasets automatically. Understand how ML training loops over every example in every epoch.
⚙️
Build reusable Functions
Write clean, reusable logic with def & return. Every sklearn algorithm is a function you call with data.
🗂️
Organise with Data Structures
Use Lists, Dicts, Tuples, Sets. Direct predecessors to NumPy arrays and Pandas DataFrames in Week 2.
Chapter 1 of 4
01
Variables
The most fundamental concept in all of programming. Every ML model's weights, accuracy, and labels live in variables.
What is a Variable?
A variable is a named container that stores a value in memory. Give it a name, assign a value — Python handles the rest.
📦
name = value
The equals sign means assignment — store the right-side value under the left-side name. Any time Python sees that name, it retrieves the stored value from memory. You can reassign any time.
student_name = "Priya" # String — text in quotes
exam_score = 94.5 # Float — decimal number
batch_size = 32 # Integer — whole number
is_trained = False # Boolean — True or False
⚡ML Connection: learning_rate = 0.001 · epochs = 100 · accuracy = 0.956 · model_name = "RandomForest" — every ML project config lives in variables exactly like these.
The 4 Core Data Types
Python auto-detects type from the value you assign. These 4 types cover 95% of everything you store in an ML project:
🔢
Integer (int)
Whole numbers. For epoch counts, batch sizes, neuron counts, tree counts.
epochs = 100
batch_size = 32
n_trees = 200
ML: epochs, layers, trees
📏
Float (float)
Decimals. For accuracy %, learning rates, model weights, probabilities.
accuracy = 0.956
lr = 0.001
dropout = 0.2
ML: accuracy, loss, weights
📝
String (str)
Text in quotes. For class labels, file paths, feature names, model names.
label = "spam"
file = "data.csv"
model = "SVM"
ML: labels, paths, names
✅
Boolean (bool)
True or False only. For flags, conditions, binary classification outputs.
verbose = True
is_trained = False
use_gpu = True
ML: flags, binary outputs
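You can watch the auto-detection happen with the built-in type() function. A quick sketch using values like the ones in the cards above:

```python
# type() reveals the class Python inferred from each assignment
epochs = 100        # no type declaration needed
accuracy = 0.956
label = "spam"
verbose = True

print(type(epochs).__name__)    # int
print(type(accuracy).__name__)  # float
print(type(label).__name__)     # str
print(type(verbose).__name__)   # bool
```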
variables_demo.py● LIVE
# ── ML Project Config ────────────────────────
project_name = "Synkoc ML Internship"   # String
learning_rate = 0.001                   # Float
epochs = 100                            # Integer
verbose = True                          # Boolean

print(f"Project: {project_name}")
print(f"LR: {learning_rate} | Epochs: {epochs}")
epochs = 200                            # Reassignment — update any time
print(f"Updated epochs: {epochs}")      # 200
4 types in one config block: String, Float, Integer, Boolean. f-strings embed values inline. Note the reassignment of epochs near the end; updating a variable mid-run is the same mechanism learning rate decay uses during ML training.
Variables in Real ML Projects
Every professional ML project starts with a config block. Every setting has a descriptive variable name — change one variable to update the entire project.
ml_project_config.py
# ── Project Config ──────────────────────────────
project_name = "Synkoc Student Pass/Fail Predictor"
dataset_path = "data/students.csv"
target_col = "passed"        # What we are predicting

# ── Model Hyperparameters ───────────────────────
learning_rate = 0.001        # How fast the model learns
epochs = 100                 # Training rounds
test_size = 0.2              # 20% held back for testing
random_state = 42            # Seed for reproducibility
verbose = True               # Print training progress
💡
Professional Tip
Always use descriptive names like learning_rate not just lr. Change one variable → entire project updates. Every ML team at every company follows this pattern.
Chapter 2 of 4
02
Loops
Repeat actions over data without writing the same code thousands of times. The engine behind every ML training process ever built.
What is a Loop?
A loop says: "For every item in this collection — do this action." Write your logic once. Python repeats it automatically for every item.
🔁
for item in collection:
Three parts: the for keyword, a variable name that holds the current item, and the collection. Colon ends the line. Indented code below runs once per item — automatically, for every item, start to end.
scores = [78, 92, 65, 88, 71]
for score in scores:
    print(f"Processing: {score}")
# Visits: 78 → 92 → 65 → 88 → 71
⚡ML Connection: Training on 10,000 records × 50 epochs = 500,000 loop iterations. The for loop handles every single one automatically — you write the logic once.
loops_demo.py● LIVE
scores = [78, 92, 65, 88, 71]

for score in scores:
    print(f"Score: {score}")

passes = 0
for s in scores:
    if s >= 70:
        passes += 1
print(f"Pass rate: {passes/len(scores)*100:.0f}%")  # 80%
for score in scores: visits 78, 92, 65, 88, 71 automatically. The counter pattern in the second loop is exactly how accuracy_score() works inside sklearn.
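To make that sklearn connection concrete, here is a minimal sketch of the computation accuracy_score() performs: the same counter pattern, comparing predictions against true labels (the data here is made up for illustration, and this is not sklearn's actual source code):

```python
y_true = ["pass", "fail", "pass", "pass"]  # actual labels
y_pred = ["pass", "fail", "fail", "pass"]  # model's guesses

correct = 0
for i in range(len(y_true)):
    if y_true[i] == y_pred[i]:   # count every matching prediction
        correct += 1
print(f"Accuracy: {correct/len(y_true):.2f}")  # 0.75
```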
The for Loop — Every Part Explained
Six parts. Every part matters. Missing any one causes an error immediately.
💡 Real Life Analogy — The Delivery Driver
A driver has 100 addresses. Picks up package 1, drives to address 1, delivers, returns. Address 2 — same. Address 3 — same. Identical action for every address in order until the list is empty. That is a for loop. The list is your collection. Each address is one item. The delivery action is your loop body. Python is the infinitely patient driver — never skips, never gets tired.
for_loop_anatomy.py
1for student in class_list:
2 print(student) # runs once per student
3# Part 1: "for" → keyword that starts the loop
4# Part 2: "student" → YOUR variable (holds current item)
5# Part 3: "in" → connects variable to collection
6# Part 4: "class_list" → the collection to loop through
7# Part 5: ":" → colon — NEVER forget this!
8# Part 6: 4 spaces → indentation = inside the loop
⚠️
Most Common Beginner Mistakes
Forgetting the : gives SyntaxError. Forgetting the 4-space indent gives IndentationError. Python uses indentation as actual syntax — not just style.
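A correct loop next to the two classic omissions; the broken variants are shown as comments so the file still runs (a sketch, with illustrative data):

```python
scores = [78, 92, 65]

for score in scores:   # colon present — OK
    print(score)       # 4-space indent — OK

# for score in scores      <- missing ":"    -> SyntaxError
# for score in scores:
# print(score)              <- missing indent -> IndentationError
```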
range() — Loops with a Counter
When you need to repeat something a fixed number of times — not over a list — use range(). This is how every neural network epoch loop is written.
🔢
range(n) — count 0 to n-1
Gives 0, 1, 2 ... n-1. Standard form for epoch training loops in ML.
for epoch in range(5):
    print(f"Epoch {epoch+1}/5")
# Epoch 1/5 ... Epoch 5/5
📈
range(start, stop)
Numbers from start up to (not including) stop. Use when tracking index positions.
scores = [85, 72, 91, 68]
for i in range(len(scores)):
    print(f"Student {i+1}: {scores[i]}")
⚡ML Connection: for epoch in range(100): is the standard training loop. 100 epochs = 100 complete passes through your training data. The epoch variable tracks progress for printing, learning rate decay, and saving checkpoints.
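The callout above mentions using the epoch counter for learning rate decay. A minimal sketch of the idea (the 0.9 decay factor and starting rate are illustrative choices, not fixed ML constants):

```python
base_lr = 0.01
for epoch in range(5):
    lr = base_lr * (0.9 ** epoch)   # shrink the learning rate each epoch
    print(f"Epoch {epoch+1}: lr={lr:.5f}")
```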
The while Loop — Repeat Until Done
Repeats as long as its condition is True. Stops the moment it becomes False. Use when you don't know how many iterations are needed in advance.
🔄
while condition: → run. False → stop.
Checks the condition before every iteration. Critical: your loop body must eventually make the condition False — otherwise the loop never ends and your program hangs (an infinite loop).
accuracy = 0.50
while accuracy < 0.90:
    accuracy += 0.08
    print(f"Training... acc={accuracy:.2f}")
print("Target reached!")
⚡ML Connection: Early stopping uses this pattern — keep training while validation loss is still improving. Stop when it plateaus. You don't know if this takes 10 or 50 epochs.
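A sketch of that early-stopping idea: loop while the validation loss keeps improving, stop the moment it plateaus (the loss history below is invented for illustration):

```python
val_losses = [0.90, 0.70, 0.55, 0.48, 0.47, 0.47, 0.48]  # fake per-epoch history
best = float("inf")
epoch = 0
while epoch < len(val_losses) and val_losses[epoch] < best:
    best = val_losses[epoch]   # still improving — keep training
    epoch += 1
print(f"Stopped after {epoch} epochs, best loss {best}")
# Stops at epoch 5 — the loss plateaus at 0.47
```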
Nested Loops & Loop + if — The ML Training Pattern
A loop inside a loop is nested. This is the exact structure running inside every neural network training call ever built.
🔁
Loop + if/else — Filter data
Process only items meeting a condition. This is how you filter a dataset — keep only fraud rows, only adult records.
data = [85, 42, 91, 55, 78]
passed, failed = [], []
for score in data:
    if score >= 60:
        passed.append(score)
    else:
        failed.append(score)
⚙️
Nested Loop — Deep Learning Training
Outer loop: epochs. Inner loop: batches. This IS what model.fit() runs inside Keras every time you train a neural network.
for epoch in range(epochs):
    print(f"=== Epoch {epoch+1} ===")
    for batch in batches:
        loss = train_step(batch)
        print(f"  loss: {loss:.4f}")
💡
This Is the Core of Every Neural Network
Every Keras model.fit() runs this nested loop internally. Outer: for epoch in range(100). Inner: for batch in dataloader. Inside: forward pass → loss → backprop → update weights. You will write this in Week 4 Deep Learning.
loops_real_ml.py● LIVE
dataset = [{"name":"Priya","score":85},{"name":"Raj","score":52},{"name":"Kavya","score":91}]

# Loop 1: compute average
total = 0
for s in dataset:
    total += s["score"]
print(f"Avg: {total/len(dataset):.1f}")  # 76.0

# Loop 2: filter passed students
passed = [s for s in dataset if s["score"] >= 60]
print(f"Passed: {len(passed)}/{len(dataset)}")  # 2/3

# Loop 3: epoch training simulation
for epoch in range(3):
    loss = 1.0 - (epoch * 0.3)
    print(f"Epoch {epoch+1}/3 loss={loss:.1f}")
3 patterns every ML engineer uses daily. Loop to compute mean = what df.mean() does. List comprehension filter = what df[df.score>=60] does. Epoch loop = what model.fit() runs inside.
The Loop Analogy
Synkoc Instructor Analogy
"Imagine a Synkoc instructor marking 30 exam papers. She picks up paper 1, grades it, puts it down. Paper 2 — same process. Paper 3 — same. She repeats the identical action for every paper until done. That is a for loop. The pile of papers is your list. Each paper is one item. The grading action is your loop body. Python is the instructor — infinitely patient, never skipping, completing every iteration without mistakes."
🤖
In Real ML Training
10,000 records × 100 epochs = 1,000,000 iterations. The for loop — the exact same one you are learning right now — handles all of it. TensorFlow and PyTorch training loops are built on this exact concept.
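You can verify the multiplication yourself with a tiny nested loop that counts its own iterations (numbers scaled down from the 10,000 × 100 example):

```python
records = list(range(10))   # 10 fake records
epochs = 5

iterations = 0
for epoch in range(epochs):
    for record in records:
        iterations += 1     # one "training step" per record per epoch

print(iterations)           # 50 — epochs × records, exactly like ML training
```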
Chapter 3 of 4
03
Functions
Write code once, use it a thousand times. Every sklearn algorithm — LinearRegression, KMeans, RandomForest — is a function you call with your data.
What is a Function?
A function is a named, reusable block of code. Define once with def. Call from anywhere with any data. return sends the result back.
⚙️
def function_name(parameters):
Four parts: def starts it, a descriptive name, parameters (placeholder names for inputs), and return. Call it by writing the name with actual values — called arguments.
def calculate_accuracy(correct, total):
    return (correct / total) * 100

result = calculate_accuracy(87, 100)
print(f"Accuracy: {result}%")  # → 87.0%
⚡ML Connection: model.fit(X, y) · model.predict(X_test) · accuracy_score(y_true, y_pred) — you already call functions every time you use sklearn. Now you write your own.
functions_demo.py● LIVE
def compute_average(scores):
    """Returns the mean of a list"""
    return sum(scores) / len(scores)

avg = compute_average([85, 92, 78, 96])
print(f"Average: {avg:.2f}")  # 87.75

def grade(score, threshold=60):
    return "PASS" if score >= threshold else "FAIL"

print(grade(75))      # PASS
print(grade(75, 80))  # FAIL
def defines the function. return sends result back. Default parameter threshold=60 works exactly like RandomForestClassifier(n_estimators=100) in sklearn.
Function Anatomy — Every Part Explained
Six components. Each has a specific job. Understand each one and you can read any function in any ML library.
💡 Real Life Analogy — The ATM
ATM built once. Name on front: WITHDRAW CASH. You insert card and type amount — inputs. Machine runs internal logic. Gives cash and receipt — return values. You never need to know how it works. Same inputs, same outputs, every time. A Python function works identically. Define once, call from anywhere with any inputs, get back the correct output every single time.
function_anatomy.py
#   ↓ def keyword    ↓ function name   ↓ parameters
def calculate_bmi(weight_kg, height_m):
    """Calculate Body Mass Index for health ML models"""
    bmi = weight_kg / (height_m ** 2)
    return round(bmi, 2)  # ← return sends result back

result = calculate_bmi(70, 1.75)  # call with arguments
print(result)  # → 22.86
💡
Parameter vs Argument
Parameter = placeholder in the definition (weight_kg). Argument = actual value when calling (70). Python substitutes 70 wherever weight_kg appears in the body.
Default Parameters & Multiple Returns
Two features used in every sklearn function. Default params make functions flexible. Multiple returns let one function give back several values at once.
🎯
Default Parameters
Give a parameter a default value. Caller overrides or leaves it. Exactly how sklearn works — most parameters have sensible defaults you rarely need to change.
def train_model(data, epochs=100, lr=0.001):
    pass

train_model(data)             # uses defaults
train_model(data, epochs=50)  # override one
🔘
Multiple Return Values
Return several values separated by commas. Caller unpacks into separate variables. Exactly how train_test_split works in sklearn.
def split_data(data, ratio=0.8):
    n = int(len(data) * ratio)
    return data[:n], data[n:]

# Unpack both return values
train, test = split_data(dataset)
⚡ML Connection: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) — default parameter and 4 return values unpacked in one line. This is the exact pattern above.
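That 4-value pattern can be sketched by hand with both features and labels. The same unpacking shape as train_test_split, though split_xy is a made-up name here, not a sklearn function (and unlike the real thing, this toy version does no shuffling):

```python
def split_xy(X, y, test_size=0.2):
    """Return X_train, X_test, y_train, y_test — 4 values at once."""
    n = int(len(X) * (1 - test_size))
    return X[:n], X[n:], y[:n], y[n:]

X = [[1], [2], [3], [4], [5]]
y = [0, 0, 1, 1, 1]
X_train, X_test, y_train, y_test = split_xy(X, y)
print(len(X_train), len(X_test))  # 4 1
```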
Variable Scope — Inside vs Outside
Scope means where a variable can be seen. Write pure functions for reliable, predictable ML code.
📌
Local = inside function only. Global = visible everywhere.
Local variables disappear after return. Two functions can both use a variable named result without any conflict. Best practice: write pure functions — only use parameters as inputs. Same inputs always give same outputs. Every sklearn algorithm is pure.
score = 95  # GLOBAL — visible everywhere

def check_pass(s):
    threshold = 60  # LOCAL — only inside here
    return s >= threshold

passed = check_pass(score)  # True
# print(threshold)  ← NameError! threshold is local only
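To see why pure functions matter, here is a small sketch contrasting the two styles (illustrative function names). The impure version reads hidden global state, so the same call can give different answers after that state changes:

```python
threshold = 60                       # global state

def check_impure(score):
    return score >= threshold        # reads a global — result can change

def check_pure(score, threshold=60):
    return score >= threshold        # everything it needs is a parameter

print(check_pure(75))    # True — same inputs, same output, always
threshold = 80
print(check_impure(75))  # False — same argument, different answer!
```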
functions_ml_pipeline.py● LIVE
def load_data(filepath):
    """Load and return dataset as list of dicts"""
    return [{"name":"Priya","score":85},{"name":"Raj","score":52},{"name":"Kavya","score":91}]

def normalise(values):
    """Scale to 0-1 range — same idea as MinMaxScaler"""
    mn, mx = min(values), max(values)
    return [(v-mn)/(mx-mn) for v in values]

def predict(score, threshold=0.5):
    return "PASS" if score >= threshold else "FAIL"

data = load_data("students.csv")
scores = [s["score"] for s in data]
normed = normalise(scores)
for i, s in enumerate(data):
    print(f"{s['name']}: {normed[i]:.2f} → {predict(normed[i])}")
Real 3-function ML pipeline: load_data → normalise (= MinMaxScaler, which scales values to the 0-1 range) → predict (= classifier). In Week 3 you replace these with sklearn equivalents. The pipeline structure stays identical.
Chapter 4 of 4
04
Data Structures
Organise collections of data — the containers that hold your entire dataset before feeding it into any ML model.
The 4 Core Data Structures
When a single variable is not enough, use a structure. These 4 are the direct foundation of NumPy arrays and Pandas DataFrames in Week 2:
📋
List [ ]
Ordered, changeable, allows duplicates. Access by index from 0. The most-used structure in all of data science.
scores = [85, 92, 78, 96]
labels = ["pass","fail","pass"]
🗂️
Dictionary { key: value }
Key-value pairs — access by name. One dictionary = one complete row of your ML dataset with feature names mapped to values.
student = {"name":"Rahul",
"score":91, "passed":True}
🔒
Tuple ( )
Like a list but immutable — cannot be changed. Use for fixed shapes and configs that must never be accidentally modified.
img_shape = (224, 224, 3)
split_ratio = (0.8, 0.2)
⚡
Set { }
Unordered, unique values only — duplicates auto-removed. Pass 10,000 labels in, get only unique class names back.
classes = {"cat","dog","bird"}
unique = set(all_labels)
data_structures_demo.py● LIVE
scores = [85, 92, 78]                    # List
student = {"name":"Priya", "score":94}   # Dict
shape = (28, 28, 1)                      # Tuple
labels = {"spam", "ham"}                 # Set

print(scores[0])        # 85 (index from 0)
print(student["name"])  # Priya (by key)
print(shape[0])         # 28 (immutable)
print(len(labels))      # 2 (unique only)

# List of dicts = dataset = what Pandas DataFrame IS
dataset = [{"name":"Priya","score":94}, {"name":"Raj","score":72}]
List index from 0 · Dict by name · Tuple immutable · Set unique only. List of dicts at bottom = exactly what a Pandas DataFrame is internally.
Lists In Depth — Most Used Structure in Data Science
Every feature column is a list. Every prediction sequence is a list. Every training batch is a list. Master this completely.
💡 Real Life Analogy — The Marks Register
Teacher's marks register: scores in order, row 1 = student 1, row 2 = student 2. Find by row number. Add, remove, update — changeable. A Python list is this register. Each item has a numbered index starting at 0. Order preserved exactly as inserted.
lists_complete.py
scores = [85, 92, 78, 96, 71]
print(scores[0])    # 85 (first — index 0)
print(scores[-1])   # 71 (last)
print(scores[1:3])  # [92, 78] (slice)
scores.append(88)   # add to end
scores.sort()       # sort ascending
print(len(scores))  # 6 (count)
print(sum(scores))  # total
print(max(scores))  # highest
print(min(scores))  # lowest
⚡Pandas connection: df["score"].tolist() converts a column to a list. len(df) works like len(list), and df["score"].max() like max(list). Every list skill here transfers directly to Pandas columns.
Dictionaries In Depth — One Row = One Dictionary
One dict = one complete record. A list of dicts = a dataset. This is exactly what a Pandas DataFrame is internally.
💡 Real Life Analogy — The Aadhaar Card
Aadhaar card = dictionary. Named fields: name, date of birth, address, number — each with a value. You find info by field name, not by position. "Give me the name" not "give me item 3". Dictionary access: by key, not index. Every row of a dataset is this card.
dictionaries_complete.py
student = {"name":"Priya", "age":21, "score":94.5, "passed":True}
print(student["name"])   # Priya
print(student["score"])  # 94.5
student["grade"] = "A"   # add new key
student["score"] = 96.0  # update existing

# List of dicts = dataset = what Pandas DataFrame IS
dataset = [
    {"name":"Priya", "score":94.5},
    {"name":"Raj",   "score":72.0},
]
⚡Pandas connection: df.iloc[0] returns first row as a dict-like object. df.to_dict("records") returns exactly a list of dicts. Understanding dicts means understanding Pandas internally.
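To make "what a DataFrame is internally" concrete, here is a sketch converting a list of row-dicts (records) into a dict of column-lists, the column-oriented layout a DataFrame actually stores. to_columns is a made-up helper for illustration, not a Pandas function:

```python
dataset = [{"name": "Priya", "score": 94.5},
           {"name": "Raj",   "score": 72.0}]

def to_columns(records):
    """Flip a list of row-dicts into a dict of column-lists."""
    return {key: [row[key] for row in records] for key in records[0]}

columns = to_columns(dataset)
print(columns["name"])   # ['Priya', 'Raj']
print(columns["score"])  # [94.5, 72.0]
```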
Tuples & Sets — When to Use Each
Tuples for values that must never change. Sets for instantly finding unique values. Both have specific ML jobs.
🔒
Tuple ( ) — Lock it in place
Round brackets. Immutable after creation. Use for values fixed by design — image dimensions, model input shapes that must never change.
# Image dimensions — never change
img_shape = (224, 224, 3)
# Keras model input shape
input_shape = (784,) # MNIST
# Try to change → TypeError!
# img_shape[0] = 128 ← BLOCKED
📑
Set { } — Unique values only
Curly brackets, values only. Removes ALL duplicates automatically. Find unique class labels instantly.
labels = ["spam","ham","spam",
"ham","spam","ham"]
classes = set(labels)
print(classes) # {"spam","ham"}
n = len(classes) # 2
print("spam" in classes) # True
💡
Decision Rule — Which Structure to Use
Ordered changeable collection → List. Access by name → Dict. Fixed immutable values → Tuple. Unique values only → Set. Pandas DataFrame = optimised list of dicts. NumPy array = optimised list of numbers.
all_4_structures_ml.py● LIVE
# All 4 structures in one ML pipeline
config = {"model":"RF", "trees":100}    # DICT
dataset = [{"age":22,"label":"no"},
           {"age":35,"label":"yes"},
           {"age":28,"label":"yes"}]    # LIST OF DICTS

ages = [r["age"] for r in dataset]      # LIST
labels = [r["label"] for r in dataset]
classes = set(labels)                   # SET
shape = (len(ages), 1)                  # TUPLE
print(f"Dataset: {shape} | Classes: {classes}")
All 4 in one pipeline. Dict for config. List of dicts = dataset (the same shape pd.read_csv() produces). List comprehension = df["age"].tolist(). Set for unique classes. Tuple for shape. You now understand the data shapes behind everyday Pandas and sklearn operations.
All 4 Pillars Together — Complete Program
student_analyzer.pyComplete Program
# DATA STRUCTURE — list of dicts, one per student
students = [{"name":"Priya", "scores":[85,92,78,96]},
            {"name":"Rahul", "scores":[70,65,80,75]},
            {"name":"Anjali","scores":[95,98,92,97]}]

# FUNCTION — takes any list, returns average
def compute_average(scores):
    return sum(scores) / len(scores)

# LOOP — process every student automatically
for s in students:
    name = s["name"]                    # VARIABLE
    avg = compute_average(s["scores"])  # FUNCTION CALL
    print(f"{name}: Average = {avg:.1f}")
This program uses all 4 pillars: a list of dicts holds the data · a function computes averages · a loop processes every student · variables store name and avg. This is the exact pattern used in real ML data pipelines.
Lesson Summary
You have completed the Python foundation. Here is what you can now do in every ML project:
📦
Variables
Store any data type with a meaningful name. Configure ML projects professionally. Know int, float, str, bool and when to use each.
🔁
Loops
Iterate over any list with a for loop. Combine with if/else. Understand that ML training is a massive nested loop over data and epochs.
⚙️
Functions
Define reusable logic with def and return. Use parameters and defaults. Understand that every sklearn call is a function like the ones you now write.
🗂️
Data Structures
Use Lists, Dicts, Tuples, Sets. These are the direct foundation of NumPy arrays and Pandas DataFrames you will use in Week 2.
🚀
Python Complete!
Foundation mastered. Open the Practical Lab to write real code across 5 tasks. Complete the lab, then take the Quiz. Then — Statistics for Data Science.
✅ Video — Done
✏️ Practical Lab — Next
❓ Quiz — After Lab