Structured vs Unstructured vs Semi-Structured Data: Definitions and Examples
Data are all around us. Our smartphones, social media interactions, financial transactions, business interactions, and Fitbits all leave behind pieces of data. But have you ever thought about how this data is stored and accessed? This is where it can be handy to understand the difference between structured, unstructured, and semi-structured data types.
Understanding Structured vs Unstructured vs Semi-Structured Data
Data is often captured and stored so that it can be studied to make better products and services for us. Other times it can help us to make better decisions about our activity or consumption.
It is important to understand what can be done with the data that we obtain, and which storage requirements and tools apply. We will discuss why below.
What is Structured Data?
If you’ve ever organized data in an Excel spreadsheet, then you are familiar with the concept of structured data. Data are presented in rows and columns.
An Example of Structured Data
In the table below, we present data for 6 people (each identified by ‘ID’), along with their age and their favorite fruit. Each data element is easy to locate. Suppose we ask: What is person 1004’s favorite fruit? We would find the row where ‘ID’ = 1004, look across to the ‘Fruit’ column for that row, and report that person 1004 likes oranges the best.
ID | AGE | FRUIT |
1001 | 19 | apple |
1002 | 21 | orange |
1003 | 12 | grape |
1004 | 73 | orange |
1005 | 24 | lime |
1006 | 55 | grape |
Structured Data Defined
Structured data is data that is organized and formatted so that it is easily searched and understood.
Common tools used to store structured data are spreadsheets and relational databases. In the simple example that we did above, we exercised the same logic that would be applied to query even larger sets of data. Database technologies and variations of the structured query language (SQL) have evolved to make lookups like this very efficient even when there are millions of rows in a table.
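The lookup we did by hand can be sketched in code. This is a minimal illustration, not a recommended setup: it assumes Python's built-in sqlite3 module and an in-memory database, and the table name `people` and the function name `favorite_fruit` are made up for this example.

```python
import sqlite3

# Rows copied from the table above: (ID, AGE, FRUIT)
PEOPLE = [
    (1001, 19, "apple"),
    (1002, 21, "orange"),
    (1003, 12, "grape"),
    (1004, 73, "orange"),
    (1005, 24, "lime"),
    (1006, 55, "grape"),
]


def favorite_fruit(person_id):
    """Return the favorite fruit for one ID, or None if the ID is not in the table."""
    conn = sqlite3.connect(":memory:")  # throwaway database that lives only in memory
    try:
        conn.execute(
            "CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER, fruit TEXT)"
        )
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", PEOPLE)
        # The same question we asked by hand: find the row, read the FRUIT column.
        row = conn.execute(
            "SELECT fruit FROM people WHERE id = ?", (person_id,)
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


print(favorite_fruit(1004))  # orange
```

The query names the column we want and the row we want it from, which is exactly the "find the row, look across to the column" logic from the example.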
What is Unstructured Data?
Unstructured data is a bit different, but I’ll bet that you have encountered it. If you’ve ever created a post on a social media site like Facebook, Instagram, Twitter, Pinterest or YouTube, then you have created unstructured data.
An Example of Unstructured Data
Below is a tweet from the Farnam Street Twitter account. In it, you see lengthy text, a link to a website, and a few images that are rendered. This tweet is an example of unstructured data.
There are two main mindsets we can navigate life with: growth and fixed. Having a growth mindset is essential for success. In this post, we explore how to develop the right mindset for improving your intelligence. https://t.co/e51YVkrZgo
— Farnam Street (@farnamstreet) February 5, 2021
Unstructured Data Defined
Unstructured data are data that are not organized in a pre-defined manner. This can include text, photos, music, video, social media posts, and the like.
As you might imagine, unstructured data can take up more space to store than structured data. Tools used to store unstructured data are typically NoSQL databases and data lakes.
What is Semi-Structured Data?
Now that we’ve covered structured and unstructured data, you might already have a suspicion as to what semi-structured data is. We’ll discuss it here anyway.
An Example of Semi-Structured Data
Take a look at the HTML text below. This is the code that was embedded to give us the Farnam Street tweet above.
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">There are two main mindsets we can navigate life with: growth and fixed. Having a growth mindset is essential for success. In this post, we explore how to develop the right mindset for improving your intelligence.<a href="https://t.co/e51YVkrZgo">https://t.co/e51YVkrZgo</a></p>— Farnam Street (@farnamstreet) <a href="https://twitter.com/farnamstreet/status/1357656980039606275?ref_src=twsrc%5Etfw">February 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
If you know HTML, you will recognize the items between angle brackets ‘< >’, which are called tags. The tags label the parts of the content and give it some organization. If you study the HTML above, you can pick out the text of the tweet, the link to the blog post, the Farnam Street Twitter handle, the date, etc.
In addition to HTML, other formats such as XML (another markup language) and JSON (a data-interchange format) are examples of semi-structured data.
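To see how the tags make individual pieces easy to pull out, here is a rough sketch using Python's built-in html.parser module. The `EMBED` string is a shortened copy of the embed code above (written with straight quotes), and the names `TweetParser` and `parse_tweet` are made up for this illustration.

```python
from html.parser import HTMLParser

# Shortened copy of the embed code shown above.
EMBED = (
    '<blockquote class="twitter-tweet"><p lang="en" dir="ltr">'
    "Having a growth mindset is essential for success."
    '<a href="https://t.co/e51YVkrZgo">https://t.co/e51YVkrZgo</a></p>'
    "Farnam Street (@farnamstreet) "
    '<a href="https://twitter.com/farnamstreet/status/1357656980039606275'
    '?ref_src=twsrc%5Etfw">February 5, 2021</a></blockquote>'
)


class TweetParser(HTMLParser):
    """Collect every link target and the body text inside the <p> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._in_p = False
        self._in_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "a":
            self._in_a = True
            self.links.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "a":
            self._in_a = False

    def handle_data(self, data):
        # Keep the paragraph text, but skip the text that is just the link itself.
        if self._in_p and not self._in_a:
            self.text_parts.append(data)


def parse_tweet(html):
    """Return the links and the body text found in an embed snippet."""
    parser = TweetParser()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "text": "".join(parser.text_parts).strip()}


result = parse_tweet(EMBED)
print(result["links"][0])  # https://t.co/e51YVkrZgo
```

The tags tell the parser where the links are. The body text itself is still free-form, which is the "semi" in semi-structured.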
Semi-Structured Data Defined
Semi-structured data has components of both structured and unstructured data. From a structured perspective, there is some organization (like the HTML tags in the example above), but there are unstructured elements as well (like the body text of the tweet).
Semi-structured data can be stored in different formats. Individual elements may even be mapped to a relational database.
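As a small sketch of that last point, the snippet below takes a JSON record and maps its fields to columns in a relational table. The record is hypothetical: it is built by hand from the tweet details shown earlier, and the field names, the `tweets` table, and the `load_into_table` function are assumptions made for this example. It uses Python's built-in json and sqlite3 modules.

```python
import json
import sqlite3

# Hypothetical JSON record built by hand from the tweet details shown earlier.
RECORD = json.dumps(
    {
        "handle": "@farnamstreet",
        "date": "2021-02-05",
        "link": "https://t.co/e51YVkrZgo",
        "text": "Having a growth mindset is essential for success.",
    }
)


def load_into_table(raw_json):
    """Map one JSON record to a table row, then read two of its columns back."""
    record = json.loads(raw_json)
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE tweets (handle TEXT, date TEXT, link TEXT, text TEXT)"
        )
        # Each named JSON field becomes one column value.
        conn.execute(
            "INSERT INTO tweets VALUES (:handle, :date, :link, :text)", record
        )
        rows = conn.execute("SELECT handle, date FROM tweets").fetchall()
    finally:
        conn.close()
    return rows


print(load_into_table(RECORD))  # [('@farnamstreet', '2021-02-05')]
```

The handle, date, and link fit neatly into columns. The tweet text is stored too, but it remains a free-form string inside its column.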
Conclusion
Now we know what structured, unstructured, and semi-structured data are, and we have seen a concrete example of each.
You may want to check out the types of structured data in our next post. Happy learning!