Uber Business Analyst Interview Experience: SQL, Analytical & Guesstimate Questions
In this blog post, we dive deep into the Uber Business Analyst interview experience, especially focusing on SQL-related questions, analytical thinking, and guesstimate scenarios. If you are preparing for an Uber Business Analyst role or similar data-driven positions, understanding the questions asked and their context will give you a competitive edge. We’ll explain each question in detail with examples to help you prepare effectively.
1. Write an SQL query to extract the third transaction of every user, displaying user ID, spend, and transaction date.
         Explanation:
        To retrieve the third transaction of each user, you need to use a combination of `ROW_NUMBER()` to assign a sequence number to each transaction for every user. Then, filter those transactions where the row number equals 3.
      
        ```sql
        
          
            WITH RankedTransactions AS (
               SELECT 
                   user_id,
                   transaction_id,
                   spend,
                   transaction_date,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS transaction_rank
               FROM transactions
            )
            SELECT user_id, spend, transaction_date
            FROM RankedTransactions
            WHERE transaction_rank = 3;
          
        
        
        ```
      
        Example:
        
        Assume that we have a user with 5 transactions. This query will return the third transaction in terms of the earliest date for that user.
      
2. Calculate the average ratings for each driver across different cities using data from rides and ratings tables.
         Explanation:
        You need to join the rides and ratings tables, group the data by `driver_id` and `city`, and then calculate the average rating.
      
        ```sql
        
          
            SELECT 
               r.driver_id,
               r.city,
               AVG(rt.rating) AS average_rating
            FROM rides r
            JOIN ratings rt ON r.ride_id = rt.ride_id
            GROUP BY r.driver_id, r.city;
          
        
        
        ```
      
         Example:
        Consider a driver in Delhi who has been rated 4.5, 5.0, and 4.8 by different passengers in different rides. This query will return the average rating for that driver across the city.
      
3. Create an SQL query to identify customers registered with Gmail addresses from the 'users' database.
         Explanation:
        To filter out customers who have Gmail addresses, you can use the `LIKE` operator with the condition that the email address contains `@gmail.com`.
      
        ```sql
        
          
            SELECT user_id, email
            FROM users
            WHERE email LIKE '%@gmail.com';
          
        
        
        ```
      
         Example:
        This query will return all users who have a Gmail address, such as `john.doe@gmail.com`, from the `users` table.
      
4. What does database denormalization mean?
         Explanation:
        Denormalization is the process of combining data from multiple tables into a single table to reduce the complexity of queries and improve performance. It involves introducing redundancy by incorporating some level of duplication of
        data, which can lead to faster read access at the cost of increased storage and more complex write operations.
      
         Example:
        In a normalized database, you might have separate `users`, `orders`, and `products` tables. Denormalization would mean merging these into a single table with repeated user and product data, which speeds up queries for order history
        at the expense of redundancy.
      
5. Analyze the click-through conversion rates using data from ad_clicks and cab_bookings tables.
         Explanation:
        The click-through conversion rate is calculated by dividing the number of bookings that occurred after clicking an ad by the total number of ad clicks. You need to join the `ad_clicks` and `cab_bookings` tables on the user ID or
        other relevant identifiers.
      
        ```sql
        
          
            SELECT 
               ac.ad_id,
               COUNT(DISTINCT cb.booking_id) AS total_bookings,
               COUNT(DISTINCT ac.click_id) AS total_clicks,
               (COUNT(DISTINCT cb.booking_id) / COUNT(DISTINCT ac.click_id)) AS conversion_rate
            FROM ad_clicks ac
            LEFT JOIN cab_bookings cb ON ac.user_id = cb.user_id
            WHERE ac.click_time <= cb.booking_time
            GROUP BY ac.ad_id;
          
        
        
        ```
      
         Example:
        This query will show the conversion rate for each ad, showing how many people who clicked on an ad eventually made a cab booking.
      
6. Define a self-join and provide a scenario for its application.
         Explanation:
        A self-join is a join operation where a table is joined with itself. It’s useful when you want to compare rows within the same table.
      
         Example Scenario:
        Imagine an employee table where each employee has a `manager_id` that references the `employee_id` of their manager. A self-join can be used to get a list of employees and their managers.
      
        ```sql
        
          
            SELECT 
               e1.employee_name AS employee, 
               e2.employee_name AS manager
            FROM employees e1
            JOIN employees e2 ON e1.manager_id = e2.employee_id;
          
        
        
        ```
      
         Example:
        This will show the names of employees alongside their respective managers.
      
Scenario-Based Question
1. What is the probability that at least two of three recommended routes for drivers are the fastest, given a 70% success rate?
         Explanation:
        You can use the complement rule in probability. If the success rate for each route is 70%, then the probability of a route not being the fastest is 30%. For three routes, the probability that at least two are the fastest can be
        calculated using the binomial distribution formula.
      
         Formula:  
        P(at least two fastest) = 1 - P(0 fastest) - P(1 fastest)
      
This requires calculating the individual probabilities and then subtracting from 1.
Guesstimate Questions
1. Estimate the number of Uber drivers in Delhi.
         Explanation:
        For guesstimate questions, you need to use logical assumptions and make rough estimations. Here’s how you might approach this question:
      
- Assume Delhi has a population of around 20 million people.
- Assume that 5% of the population uses Uber regularly.
- Assume that on average, each Uber driver serves 50 passengers per day.
- Divide the total number of Uber users by the number of passengers served by each driver.
Using these assumptions, you can make a reasonable estimate of the number of drivers.
2. How many Uber cabs leave Bengaluru Airport in a day?
         Explanation:
 To estimate this, you would need to consider the following:
- Estimate the number of flights arriving at Bengaluru Airport daily.
- Assume a percentage of passengers use Uber to reach the city.
- Assume that each Uber car makes multiple trips a day.
In an Uber Business Analyst interview, you’ll be asked to demonstrate your proficiency in SQL, your ability to analyze data, and your problem-solving skills through analytical and guesstimate questions. Understanding how to approach each question logically, with examples, is key to success in these interviews. We hope this breakdown helps you feel more confident as you prepare for your Uber interview.
