Computer Science: Data Science Fundamentals
Data analysis, pandas, visualization, and the data science pipeline.
Flashcards in This Deck
What is a pandas DataFrame?
A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
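As a minimal sketch of the properties named in this answer, the snippet below builds a small DataFrame. The column names, row labels, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the columns hold different dtypes (heterogeneous).
df = pd.DataFrame(
    {"name": ["Ada", "Grace"], "age": [36, 45]},
    index=["r1", "r2"],  # labeled row axis
)

# Size-mutable: a column can be added after construction.
df["senior"] = df["age"] > 40

print(df.shape)          # (rows, columns)
print(list(df.columns))  # labeled column axis
print(df.loc["r2", "age"])  # look up a cell by row and column label
```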
Which pandas function is used to load data from a comma-separated values file into a DataFrame?
pd.read_csv()
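A short sketch of the call in use. To keep it self-contained, it reads from an in-memory string rather than a file on disk; the CSV content is invented for illustration, and a file path could be passed in the same position.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "city,temp_c\nOslo,4.5\nCairo,29.0\n"

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # column types are inferred from the text
```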
What is the primary data structure in NumPy used for numerical computing?
The ndarray (n-dimensional array).
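A minimal sketch of an ndarray and a few of its basic attributes, using made-up values:

```python
import numpy as np

# A 2-dimensional ndarray with 2 rows and 3 columns.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)   # number of dimensions
print(a.shape)  # size along each dimension
print(a.dtype)  # one element type is shared by the whole array

# Arithmetic is applied element-wise, without an explicit Python loop.
doubled = a * 2

# Reductions can run along a chosen axis (axis=0 collapses the rows).
col_sums = a.sum(axis=0)
```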
In a standard box plot, what does the box itself represent?
The Interquartile Range (IQR), which spans from the first quartile (Q1) to the third quartile (Q3), containing the middle 50% of the data.
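The box edges can be computed directly, as in the sketch below. It uses invented data and NumPy's default (linear interpolation) percentile method; plotting libraries may compute quartiles slightly differently.

```python
import numpy as np

# Hypothetical sample of nine observations.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Q1 and Q3 are the lower and upper edges of the box; the median is the line inside it.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # height (or width) of the box

print(q1, median, q3, iqr)  # 3.0 5.0 7.0 4.0
```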
What is the primary goal of the 'Data Cleaning' stage in the data science pipeline?
To identify and correct errors, handle missing values, and remove inconsistencies to ensure the data is accurate and high-quality for analysis.
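One possible cleaning pass in pandas is sketched below. The data and the specific choices (dropping rows with no city, filling missing temperatures with the median) are assumptions for illustration; the right choices depend on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with inconsistent text, a duplicate, and missing values.
raw = pd.DataFrame({
    "city": [" Oslo", "oslo", "Cairo", "Lima", None],
    "temp_c": [4.5, 4.5, 29.0, np.nan, 12.0],
})

clean = (
    raw.dropna(subset=["city"])  # drop rows that have no city at all
       .assign(city=lambda d: d["city"].str.strip().str.title())  # fix spacing and case
       .drop_duplicates()        # " Oslo" and "oslo" are now identical rows
       .assign(temp_c=lambda d: d["temp_c"].fillna(d["temp_c"].median()))  # assumed fill rule
       .reset_index(drop=True)
)

print(clean)
```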
Which Python library is built on top of Matplotlib and provides a high-level interface for drawing attractive statistical graphics?
Seaborn
Explain the concept of 'Broadcasting' in NumPy.
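Broadcasting is the set of rules that lets NumPy perform element-wise arithmetic on arrays of different but compatible shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The size-1 dimension is conceptually 'stretched' to match the other, without the data actually being copied.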
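A small sketch of broadcasting with made-up arrays, including one case where the shapes are not compatible:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
col = np.array([[100], [200]])             # shape (2, 1)

# (2, 3) + (3,): the row is reused for every row of the matrix.
plus_row = matrix + row  # [[11, 22, 33], [14, 25, 36]]

# (2, 3) + (2, 1): the column is reused for every column of the matrix.
plus_col = matrix + col  # [[101, 102, 103], [204, 205, 206]]

# (2, 3) + (2,): trailing dimensions are 3 and 2, so NumPy raises an error.
incompatible = False
try:
    matrix + np.array([1, 2])
except ValueError:
    incompatible = True
```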
Describe the three steps of the 'Split-Apply-Combine' strategy used in pandas groupby operations.
1. Split: Data is divided into groups based on a key. 2. Apply: A function (like mean or sum) is applied to each group independently. 3. Combine: The per-group results are combined into a new data structure.
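The three steps can be sketched as follows, first with a single groupby call and then spelled out by hand. The sales data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [10, 20, 30, 40],
})

# All three steps in one expression: split by region, apply sum, combine into a Series.
totals = sales.groupby("region")["amount"].sum()

# The same steps written out explicitly.
pieces = {}
for key, group in sales.groupby("region"):  # split
    pieces[key] = group["amount"].sum()     # apply
manual = pd.Series(pieces)                  # combine

print(totals)
```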
What is the difference between an 'inner join' and an 'outer join' when using pd.merge()?
An inner join returns only rows with matching keys in both DataFrames, while an outer join returns all rows from both, filling missing matches with NaN.
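A minimal sketch of the two join types, using two invented DataFrames that share only some keys:

```python
import pandas as pd

# Hypothetical tables: keys "b" and "c" appear in both, "a" and "d" in only one.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Inner join: only keys present in both tables survive.
inner = pd.merge(left, right, on="key", how="inner")

# Outer join: every key from either table, with NaN where the other side has no match.
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```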
What is the formula for Min-Max Normalization to scale a feature x to the range [0, 1]?
x_scaled = (x - x_min) / (x_max - x_min)
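The formula translates directly to NumPy, as sketched below. The function name `min_max_scale` is hypothetical, and returning zeros for a constant feature (where x_max equals x_min and the formula would divide by zero) is an assumption rather than part of the formula.

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D feature to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumed convention: a constant feature maps to all zeros.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

scaled = min_max_scale([10, 20, 25, 50])
print(scaled)  # [0.    0.25  0.375 1.   ]
```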
Frequently Asked Questions
How many flashcards are in this Computer Science: Data Science Fundamentals deck?
This deck contains 20 flashcards with a mix of difficulty levels: 6 easy, 10 medium, and 4 hard cards.
Is this flashcard deck free to use?
Yes! You can study these flashcards for free with our spaced repetition system. Create a free account to track your progress and save your study history.
Can I export these flashcards to Anki?
Pro users can export any deck to Anki (.apkg format) with one click. Free users can export to CSV. Start studying for free and upgrade when you need Anki export.
What is spaced repetition?
Spaced repetition is a study technique that shows you cards at increasing intervals based on how well you know them. Cards you struggle with appear more often, while mastered cards are shown less frequently. It is widely regarded as one of the most effective ways to memorize information.