Structured vs Unstructured vs Semi-Structured Data: Definitions and Examples

Data are all around us. Our smartphones, social media interactions, financial transactions, business interactions, and Fitbits all leave behind pieces of data. But have you ever thought about how these data are stored and accessed? This is where it helps to understand the difference between structured, unstructured, and semi-structured data.

Understanding Structured vs Unstructured vs Semi-Structured Data

Data is often captured and stored so that it can be studied to make better products and services for us. Other times it can help us to make better decisions about our activity or consumption.

It is important to understand what can be done with the data we obtain, and which storage requirements and tools apply. We discuss why below.

What is Structured Data?

If you’ve ever organized data in an Excel spreadsheet, then you are familiar with the concept of structured data. Data are presented in rows and columns.

An Example of Structured Data

In the table below, we present data for six people (each identified by an ‘ID’), listed with their age and their favorite fruit. Each data element is easy to find. Suppose we ask: what is person 1004’s favorite fruit? We look at the row where ‘ID’ = 1004, read across to the ‘Fruit’ column, and report that person 1004 likes oranges best.

ID      AGE   FRUIT
1001    19    apple
1002    21    orange
1003    12    grape
1004    73    orange
1005    24    lime
1006    55    grape
This table is an example of structured data, where each individual’s age and favorite fruit are organized by ID.

Structured Data Defined

Structured data is data that is organized and formatted so that it is easily searched and understood.

Common tools used to store structured data are spreadsheets and relational databases. In the simple example above, we applied the same logic that is used to query much larger sets of data. Database technologies and variations of the Structured Query Language (SQL) have evolved to make lookups like this very efficient, even when a table has millions of rows.
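
To make the lookup concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table name (people), the column names, and the helper favorite_fruit are assumptions chosen for illustration; the rows are the six people from the table above.

```python
import sqlite3

# Hypothetical table and column names; the rows come from the example table.
ROWS = [
    (1001, 19, "apple"),
    (1002, 21, "orange"),
    (1003, 12, "grape"),
    (1004, 73, "orange"),
    (1005, 24, "lime"),
    (1006, 55, "grape"),
]


def favorite_fruit(person_id):
    """Return the favorite fruit for one ID, or None if the ID is absent."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER, fruit TEXT)"
        )
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", ROWS)
        # The same question we asked by eye: find the row, read one column.
        row = conn.execute(
            "SELECT fruit FROM people WHERE id = ?", (person_id,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


print(favorite_fruit(1004))  # orange
```

The SELECT statement does what we did by hand: it finds the row where the ID matches and reads off the fruit column. Because every row has the same columns, the same query works whether the table has six rows or six million.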

What is Unstructured Data?

Unstructured data is a bit different, but I’ll bet that you have encountered it. If you’ve ever created a post on a social media site like Facebook, Instagram, Twitter, Pinterest or YouTube, then you have created unstructured data.

An Example of Unstructured Data

Below is a tweet from the Farnam Street Twitter account. In it, you see lengthy text, a link to a website, and a few rendered images. This tweet is an example of unstructured data.

Unstructured Data Defined

Unstructured data are data that are not organized in a pre-defined manner. Examples include text, photos, music, video, social media posts, and the like.

As you might imagine, unstructured data can take up more storage space than structured data. Tools used to store unstructured data are typically NoSQL databases and data lakes.
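
One practical consequence is that unstructured content has no column to look up, so finding a specific piece of information means searching the content itself. The rough sketch below stores the text of the Farnam Street tweet as one string and pulls out any links with a regular expression. The helper name find_links and the pattern are assumptions for illustration, and the pattern is deliberately simple rather than a complete URL matcher.

```python
import re

# The text of the tweet, stored as a single free-form string.
TWEET_TEXT = (
    "There are two main mindsets we can navigate life with: growth and fixed. "
    "Having a growth mindset is essential for success. In this post, we explore "
    "how to develop the right mindset for improving your intelligence. "
    "https://t.co/e51YVkrZgo"
)


def find_links(text):
    """Return every http(s) URL found in free text (simple, imperfect pattern)."""
    return re.findall(r"https?://\S+", text)


print(find_links(TWEET_TEXT))  # ['https://t.co/e51YVkrZgo']
```

Nothing in the string says "this part is a link". We had to guess at a pattern, which is the kind of extra work unstructured data tends to require.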

What is Semi-Structured Data?

Now that we’ve covered structured and unstructured data, you might already have a suspicion as to what semi-structured data is. We’ll discuss it here anyway.

An Example of Semi-Structured Data

Take a look at the HTML text below. This is the code that was embedded to give us the Farnam Street tweet above.

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">There are two main mindsets we can navigate life with: growth and fixed. Having a growth mindset is essential for success. In this post, we explore how to develop the right mindset for improving your intelligence.<a href="https://t.co/e51YVkrZgo">https://t.co/e51YVkrZgo</a></p>&mdash; Farnam Street (@farnamstreet) <a href="https://twitter.com/farnamstreet/status/1357656980039606275?ref_src=twsrc%5Etfw">February 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

If you know HTML, you will recognize the items enclosed in angle brackets ‘< >’, called tags. Tags tell the browser how the content is organized. If you study the HTML above, you can pick out the text of the tweet, the link to the blog post, the Farnam Street Twitter handle, the date, and so on.

In addition to HTML, formats such as XML (another markup language) and JSON (a data-interchange format) are examples of semi-structured data.
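
Because the tags give the content some organization, a program can pick pieces out by tag name. Here is a minimal sketch using Python's built-in html.parser module. The class name TweetEmbedParser and its attributes are assumptions for illustration, and the HTML is a shortened copy of the embed code above (the tweet text is trimmed and the script tag is omitted).

```python
from html.parser import HTMLParser

# A shortened copy of the embed code (text trimmed, <script> tag omitted).
EMBED_HTML = (
    '<blockquote class="twitter-tweet"><p lang="en" dir="ltr">'
    "Having a growth mindset is essential for success. "
    '<a href="https://t.co/e51YVkrZgo">https://t.co/e51YVkrZgo</a></p>'
    "&mdash; Farnam Street (@farnamstreet) "
    '<a href="https://twitter.com/farnamstreet/status/'
    '1357656980039606275?ref_src=twsrc%5Etfw">'
    "February 5, 2021</a></blockquote>"
)


class TweetEmbedParser(HTMLParser):
    """Collect every link target and the text found inside the <p> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.body_parts = []
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Keep only text that sits inside the paragraph (the tweet body).
        if self._in_paragraph:
            self.body_parts.append(data)


parser = TweetEmbedParser()
parser.feed(EMBED_HTML)
parser.close()

print(parser.links)
print("".join(parser.body_parts))
```

The tags tell us where the links and the body are, which is the structured part. The body text itself is still free-form, which is the unstructured part.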

Semi-Structured Data Defined

Semi-structured data has components of both structured and unstructured data. From a structured perspective, there is some organization (like the HTML tags in the example above), but there are unstructured elements as well (like the body text of the tweet).

Semi-structured data can be stored in different formats. Individual elements may even be mapped to a relational database.
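
As a rough illustration of that mapping, the sketch below takes a JSON record and flattens a few of its fields into a tuple that could be inserted as one row of a relational table. The record and its field names are hypothetical (this is not Twitter's actual API format), as is the helper to_row. The values are taken from the tweet used earlier.

```python
import json

# Hypothetical JSON record; the field names are illustrative only.
RECORD = """
{
  "id": "1357656980039606275",
  "author": {"name": "Farnam Street", "handle": "@farnamstreet"},
  "date": "2021-02-05",
  "text": "Having a growth mindset is essential for success.",
  "links": ["https://t.co/e51YVkrZgo"]
}
"""


def to_row(raw):
    """Flatten one JSON record into an (id, handle, date, text) tuple."""
    doc = json.loads(raw)
    return (doc["id"], doc["author"]["handle"], doc["date"], doc["text"])


row = to_row(RECORD)
print(row)
```

Fields with a single value, like the date, map neatly to columns. A field that holds a list, like links, does not fit in one column and would usually go into a separate table keyed by the tweet ID.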

Conclusion

Now we know what structured, unstructured, and semi-structured data are, with a concrete example of each type.

You may want to check out the types of structured data in our next post. Happy learning!
