Problem in Duration Formatting in Excel
Recently I wanted to perform some analysis using the YouTube channel data. So I browsed all the channels from where I wanted to get the data, copied and pasted data into my excel sheet. After performing all the clean operations I got a duration column that has information like this.
Now, I noted the format and found it is not HH:MM:SS format. The second thing I noted the column named duration should not be time. We know time cannot more than or equal 24 hours, but the duration can be 100 hours as well. And I know none of these YouTube videos are more than 5 hours. So why this corrupted data? Then I realized excel does not have anything called duration format. It has a DateTime format. You can put value in some excel cell and based on your format it will extract that information and show it in the cell. For example, if cell A1 = 01-Feb-2021 03:20:25 and you format it to MM:SS you will see only 20:55 in this cell. If you format it to DD:MM then you will see 01-02. It works as below.
My challenge was how to bring this duration into correct format. I searched 2 hours on google but could find any solution for this problem. Finally I came with my following steps to clean this mess.
How did I format time as a duration in Excel?
|Column A: Serial Number|
|Column B: Video Name|
|Column C: Original Duration Column|
|Column D: Copy Col C to Col D and replace : with |||Without this you won’t be able to perform text operations mentioned in the next step|
|Column E: =IF( LEN(D2)>5, IF(RIGHT(D2,2)=”00″,”00|”&MID(D2,1,5), D2),”00|”&D2)||To make all data in same format HH:MM:SS. In previous column some place first 2 digits are hour and other place they are hour. Excel get confused with this. Apart from this excel consider this as date-time and start date is 01/01/1900|
|Column F: Copy values from col E to col F. Change format-> custom format-> HH|MM|SS||So that there is no formula and now your got pure duration in HH:MM:SS format. Now you can tell excel that | is timer format like :. If you directly skip this operation and perform Column F then you will not get duration properly.|
|Column G: Copy values from col F to col G. Change format-> custom format-> HH:MM:SS||Now you can tell excel that time format is not |but format is :|
|Column H: Replace 00:00:00 with blank||Otherwise you will have unnecessary information when there is no duration.|
All the steps mentioned above were implemented in a excel sheet and data is cleaned as below. You can download the excel file from github.
Data cleaning is a very important step for any data science, machine learning, or data analysis project. Date, time, duration is very complicated to handle if you do not understand the intricacy of this. If you encounter a similar kind of issue, I hope this article will be able to help you in cleaning your data and save your precious time.