北美数据科学SQL面经(NULL)

What are missing values in SQL?

Missing values, also known as NULL values, represent the absence of data in a column or attribute of a table. They occur when no value is provided or when the value is unknown or not applicable.

How are missing values handled in SQL?

In SQL, missing values are typically handled using functions and expressions specifically designed for dealing with NULL values. Some commonly used functions include IS NULL, IS NOT NULL, COALESCE, and NULLIF.

How can you identify missing values in SQL?

You can use the IS NULL and IS NOT NULL operators to identify missing values in SQL. For example, to select rows where a particular column has a missing value, you can use the following query:

How can you handle missing values in SQL?

There are several ways to handle missing values in SQL:

  1. Filtering: You can exclude rows with missing values using the WHERE clause.

  2. Filling with default values: You can use the COALESCE function to replace missing values with a default value.

  3. Conditional replacement: You can use the CASE WHEN statement to conditionally replace missing values based on certain criteria.

  4. Ignoring missing values: Some SQL functions and aggregate functions automatically ignore missing values when performing calculations.

  5. Data cleaning: If appropriate, you can clean or impute missing values using domain knowledge or statistical techniques.

What is the difference between NULL and an empty string in SQL?

In SQL, NULL represents the absence of a value or unknown value, whereas an empty string ('') is a valid value that indicates the presence of an empty or zero-length string.

How can you handle missing values when performing calculations in SQL?

When performing calculations in SQL, missing values (NULL) can propagate and lead to unexpected results. To handle this, you can use functions such as COALESCE or NULLIF to replace or ignore missing values during calculations.

Can you use aggregate functions with missing values in SQL?

Yes, SQL aggregate functions such as SUM, AVG, COUNT, and others can handle missing values by automatically excluding them from calculations. They treat missing values as if they don't exist when performing aggregations.

How do you count the number of missing values in a column in SQL?

To count the number of missing values in a column, you can use the COUNT function with the IS NULL condition. Here's an example:

How do you replace missing values with a specific value in SQL?

You can use the COALESCE function to replace missing values with a specific value in SQL. COALESCE returns the first non-null value from a list of arguments. Here's an example:

How do you handle missing values in a join operation in SQL?

When joining tables in SQL, missing values in the joined columns may result in unmatched rows. To handle this, you can use LEFT JOIN or RIGHT JOIN to include all rows from one table and match them with the corresponding rows from the other table. This ensures that missing values are preserved in the result set.

How do you handle missing values in SQL when sorting data?

In SQL, missing values are typically sorted as the highest or lowest values depending on the sort order (ascending or descending) specified in the ORDER BY clause. To handle missing values differently, you can use the NULLS FIRST or NULLS LAST clause in the ORDER BY clause to explicitly specify their position in the sorted result.

How do you handle missing values in SQL when grouping data?

When grouping data in SQL, missing values are treated as a separate group. If you want to exclude missing values from the groups, you can use the HAVING clause to filter out those groups. For example:

What are some common functions used to handle missing values in SQL?

Some common functions used to handle missing values in SQL are:

How can you handle missing values when creating indexes in SQL?

When creating indexes in SQL, missing values are typically handled automatically. Indexes are created based on the available values in the columns being indexed, and missing values are treated as separate entries in the index. However, it's important to consider how missing values may affect query performance and indexing strategies based on your specific use case.