Concepts for Understanding AI

Weekly Discussion: Is Data Ever Neutral?

Please submit all graded work via Canvas.
Participation requirements and grading details are provided in Canvas.

AI systems depend on data, but data is never collected in a vacuum.
What counts as data, how it is recorded, and whose behavior becomes visible are all shaped by social, institutional, and technical choices.

This discussion asks you to move beyond abstract claims about “biased data” and closely examine one concrete dataset, focusing on how its construction can shape AI system behavior and users’ decision-making.

Your goal is to analyze how data is treated as evidence and the responsibilities of information professionals when working with such data.

Step 1: Choose a concrete dataset

Select one publicly available dataset that is used, or could reasonably be used, in an AI system.

The dataset may involve tabular records, text, images, or logs, as long as it is openly accessible and commonly used (or plausibly usable) in AI or data-driven systems.

Briefly describe:

  • What the dataset contains
  • Who collected it and for what original purpose
  • How it might be used in an AI system or data-driven decision process

How to view the dataset (no technical skills required)

Many public datasets are distributed as CSV (comma-separated values) files, and you do not need to write code to explore one.

You can simply:

  • Download the CSV file
  • Open it using Excel, Google Sheets, or another spreadsheet program
  • Browse it as you would a normal spreadsheet:
      • Look at column names
      • Scroll through rows
      • Notice what kinds of values are recorded, repeated, or missing

You are not expected to perform calculations or modeling.
Your goal is to observe what the data contains and how it is structured.
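If you happen to be comfortable with a little code, the same inspection can also be done programmatically. The sketch below is entirely optional and uses only Python's built-in `csv` module; the inline sample data is a made-up placeholder standing in for a downloaded file (in practice you would use `open("your_dataset.csv", newline="")` instead).

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
# Replace with: open("your_dataset.csv", newline="")
sample = io.StringIO(
    "name,age,city\n"
    "Alice,34,Springfield\n"
    "Bob,,Riverton\n"
)

reader = csv.DictReader(sample)
rows = list(reader)

# Look at column names
print("Columns:", reader.fieldnames)

# Scroll through rows (here, just count them)
print("Row count:", len(rows))

# Notice which values are missing (recorded as empty strings)
for i, row in enumerate(rows, start=1):
    missing = [col for col, val in row.items() if val == ""]
    if missing:
        print(f"Row {i} is missing values for: {missing}")
```

This mirrors the spreadsheet checklist above: column names, row count, and missing values are exactly the things worth noticing before asking how the data was constructed.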

Step 2: Data collection and visibility

Analyze how the dataset was constructed, focusing on what becomes visible and what does not.

You may consider:

  • Who or what is included in the dataset, and what is absent
  • Whether data collection was voluntary, automatic, institutional, or incidental
  • What kinds of behaviors, attributes, or outcomes are easier or harder to record
  • How historical, technical, or organizational constraints shaped the dataset

Focus on how the data came to exist, rather than evaluating it as good or bad.

Step 3: Representativeness and interpretation

Reflect on how this dataset should be interpreted in context.

You may address questions such as:

  • What population, behavior, or situation is the dataset actually representative of?
  • Can a dataset be internally consistent but still incomplete or misleading?
  • Are there cases where a dataset is intentionally limited, yet still useful for a specific purpose?
  • How might claims of “neutral,” “objective,” or “raw” data obscure these limitations?

Use your chosen dataset to ground your discussion.

Step 4: Professional responsibility and response

From the perspective of information professionals (e.g., librarians, archivists, data specialists, ICT professionals):

  • What responsibilities do professionals have when using, curating, or supporting AI systems built on this data?
  • In your view, what is one concrete step an information professional could take to help users better understand or interpret this dataset when it is used in an AI system?
  • What is one aspect of the dataset or system that information professionals would likely not be able to change, even if they recognized its limitations?

Focus on accountability and communication, rather than technical solutions.

Expected outcome

Your post should present a short, coherent analysis of one dataset by tracing how it was collected, interpreted, and positioned as evidence within an AI system. A strong post typically:

  • Clearly describes a specific dataset and its context
  • Explains how data construction shapes interpretation and use
  • Reflects on the professional implications of treating data as neutral or authoritative

Reading