SnapCards

Computer Science: Data Science Fundamentals

20 cards | 6 easy, 10 medium, 4 hard
computer science, data science, analytics

Data analysis, pandas, visualization, and the data science pipeline.

Flashcards in This Deck

1
easy

What is a pandas DataFrame?

A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
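As a minimal sketch of that definition, the snippet below builds a small DataFrame from made-up sample data (the names, columns, and row labels are illustrative assumptions) and touches each property in turn: mixed column types, labeled axes, and mutable size.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different dtype (heterogeneous).
df = pd.DataFrame(
    {"name": ["Ada", "Linus"], "age": [36, 54], "score": [9.5, 8.0]},
    index=["r1", "r2"],  # row labels
)

# Size-mutable: a new column can be added in place.
df["passed"] = df["score"] >= 9.0

print(df.shape)              # (number of rows, number of columns)
print(df.loc["r1", "age"])   # label-based lookup on both axes
```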

2
easy

Which pandas function is used to load data from a comma-separated values file into a DataFrame?

pd.read_csv()
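A short sketch of a typical call. To keep it self-contained, it reads from an in-memory string via io.StringIO instead of a file on disk; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "city,population\nOslo,700000\nBergen,285000\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```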

3
easy

What is the primary data structure in NumPy used for numerical computing?

The ndarray (n-dimensional array).
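A small illustration, using arbitrary example values, of the attributes that describe an ndarray and of element-wise arithmetic on it:

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (example values only).
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)   # number of dimensions
print(a.shape)  # size along each dimension
print(a.dtype)  # a single element type shared by the whole array

# Arithmetic applies element-wise, without an explicit Python loop.
doubled = a * 2
```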

4
easy

In a standard box plot, what does the box itself represent?

The Interquartile Range (IQR), which spans from the first quartile (Q1) to the third quartile (Q3), containing the middle 50% of the data.
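The box edges can be computed directly. This sketch uses NumPy's default (linear interpolation) percentile method on made-up data; plotting libraries and statistics packages may use slightly different quartile conventions.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # example values only

# Q1, median, and Q3 are the bottom edge, inner line, and top edge of the box.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fraction of points that fall inside the box (inclusive of its edges).
inside = np.mean((data >= q1) & (data <= q3))
print(q1, median, q3, iqr, inside)
```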

5
easy

What is the primary goal of the 'Data Cleaning' stage in the data science pipeline?

To identify and correct errors, handle missing values, and remove inconsistencies to ensure the data is accurate and high-quality for analysis.
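One possible cleaning pass in pandas, sketched on invented data. The specific choices here (trimming whitespace, title-casing names, dropping exact duplicates, filling missing ages with the median) are illustrative assumptions, not a universal recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with inconsistent text, a duplicate row, and a gap.
raw = pd.DataFrame({
    "name": [" alice", "Bob", "Bob", "carol "],
    "age": [30, 25, 25, np.nan],
})

clean = (
    raw.assign(name=raw["name"].str.strip().str.title())  # fix inconsistencies
       .drop_duplicates()                                  # remove repeated rows
       .reset_index(drop=True)
)

# Handle the missing value by imputing the median of the known ages.
clean["age"] = clean["age"].fillna(clean["age"].median())
print(clean)
```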

6
easy

Which Python library is built on top of Matplotlib and provides a high-level interface for drawing attractive statistical graphics?

Seaborn

7
medium

Explain the concept of 'Broadcasting' in NumPy.

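Broadcasting is a mechanism that lets NumPy perform element-wise arithmetic on arrays of different but compatible shapes by virtually 'stretching' the smaller array to match the larger one, without copying its data. Shapes are compatible when, compared from the trailing dimension backward, each pair of dimensions is equal or one of them is 1.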
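A brief sketch with example values: a (2, 3) matrix is combined first with a length-3 row and then with a (2, 1) column, and NumPy stretches the smaller operand each time.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

# The row is reused for every row of the matrix.
plus_row = matrix + row

# The column is reused for every column of the matrix.
plus_col = matrix + col

print(plus_row)
print(plus_col)
```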

8
medium

Describe the three steps of the 'Split-Apply-Combine' strategy used in pandas groupby operations.

1. Split: Data is divided into groups based on a key. 2. Apply: A function (like mean or sum) is calculated for each group. 3. Combine: Results are merged into a new data structure.
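The three steps can be sketched on a tiny, made-up sales table. The first form lets groupby do everything at once; the second spells the steps out by hand to show what happens underneath.

```python
import pandas as pd

# Hypothetical data for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [10, 20, 30, 40],
})

# Idiomatic form: split by region, apply sum, combine into a Series.
totals = sales.groupby("region")["amount"].sum()

# The same steps written out explicitly.
manual = pd.Series({
    key: group["amount"].sum()            # apply
    for key, group in sales.groupby("region")  # split
})                                        # combine

print(totals)
```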

9
medium

What is the difference between an 'inner join' and an 'outer join' when using pd.merge()?

An inner join returns only rows whose keys appear in both DataFrames, while an outer join returns all rows from both, filling the columns that have no match on the other side with NaN.
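A small comparison on two invented tables that share only some keys:

```python
import pandas as pd

# Hypothetical tables: keys "b" and "c" appear in both.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: only the keys present on both sides survive.
inner = pd.merge(left, right, on="key", how="inner")

# Outer join: every key survives; unmatched cells become NaN.
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```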

10
medium

What is the formula for Min-Max Normalization to scale a feature x to the range [0, 1]?

x_scaled = (x - x_min) / (x_max - x_min)
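A minimal sketch of the formula in NumPy. The function name min_max_scale is hypothetical, and returning zeros for a constant feature (where x_max equals x_min and the formula would divide by zero) is an assumption made here, not part of the formula.

```python
import numpy as np

def min_max_scale(x):
    """Scale values to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Assumption: map a constant feature to all zeros.
        return np.zeros_like(x)
    return (x - x.min()) / span

scaled = min_max_scale([10, 20, 40])
print(scaled)
```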

The remaining 10 cards in this deck are not shown here.

Frequently Asked Questions

How many flashcards are in this Computer Science: Data Science Fundamentals deck?

This deck contains 20 flashcards with a mix of difficulty levels: 6 easy, 10 medium, and 4 hard cards.

Is this flashcard deck free to use?

Yes! You can study these flashcards for free with our spaced repetition system. Create a free account to track your progress and save your study history.

Can I export these flashcards to Anki?

Pro users can export any deck to Anki (.apkg format) with one click. Free users can export to CSV. Start studying for free and upgrade when you need Anki export.

What is spaced repetition?

Spaced repetition is a study technique that shows you cards at increasing intervals based on how well you know them. Cards you struggle with appear more often, while mastered cards are shown less frequently. It is widely regarded as one of the most effective ways to memorize information.
