Excel Duration Formatting

Problem in Duration Formatting in Excel

Recently I wanted to perform some analysis using the YouTube channel data. So I browsed all the channels from where I wanted to get the data, copied and pasted data into my excel sheet. After performing all the clean operations I got a duration column that has information like this.

#Video NameDuration
2Video 213:42
3Video 308:44
4Video 416:51
6Video 610:01
7Video 711:42
8Video 805:18
9Video 908:31
10Video 1018:11
11Video 1112:30
12Video 1213:39
13Video 1315:32
14Video 1425:12:00
15Video 1509:03
16Video 1601:19:24
17Video 1718:16
18Video 1812:56
19Video 1914:36
20Video 2006:12
21Video 2115:44
22Video 2213:12
23Video 2316:50
24Video 2411:31
25Video 2521:59
26Video 2624:17:00

Now, I noted the format and found it is not HH:MM:SS format. The second thing I noted the column named duration should not be time. We know time cannot more than or equal 24 hours, but the duration can be 100 hours as well. And I know none of these YouTube videos are more than 5 hours. So why this corrupted data? Then I realized excel does not have anything called duration format. It has a DateTime format. You can put value in some excel cell and based on your format it will extract that information and show it in the cell. For example, if cell A1 = 01-Feb-2021 03:20:25 and you format it to MM:SS you will see only 20:55 in this cell. If you format it to DD:MM then you will see 01-02. It works as below.

DateFormatted ValueFormatting
01-02-21 03:20:2520:25MM:SS
01-02-21 03:20:2503:20HH:MM
01-02-21 03:20:2501:02DD:MM
01-02-21 03:20:2502:21MM:YY
01-02-21 03:20:2501:02DD:MM
01-02-21 03:20:2501-FebDD:MMM

My challenge was how to bring this duration into correct format. I searched 2 hours on google but could find any solution for this problem. Finally I came with my following steps to clean this mess.

How did I format time as a duration in Excel?

Column A: Serial Number 
Column B: Video Name 
Column C: Original Duration Column 
Column D: Copy Col C to Col D and replace : with |Without this you won’t be able to perform text operations mentioned in the next step
Column E: =IF(  LEN(D2)>5, IF(RIGHT(D2,2)=”00″,”00|”&MID(D2,1,5), D2),”00|”&D2)To make all data in same format HH:MM:SS. In previous column some place first 2 digits are hour and other place they are hour. Excel get confused with this. Apart from this excel consider this as date-time and start date is 01/01/1900
Column F: Copy values from col E to col F. Change format-> custom format-> HH|MM|SSSo that there is no formula and now your got pure duration in HH:MM:SS format. Now you can tell excel that | is timer format like :. If you directly skip this operation and perform Column F then you will not get duration properly.
Column G: Copy values from col F to col G. Change format-> custom format-> HH:MM:SSNow you can tell excel that time format is not |but format is  :
Column H: Replace 00:00:00 with blankOtherwise you will have unnecessary information when there is no duration.

All the steps mentioned above were implemented in a excel sheet and data is cleaned as below. You can download the excel file from github.

#Video NameDurationFinal
2Video 213:4200:13:42
3Video 308:4400:08:44
4Video 416:5100:16:51
6Video 610:0100:10:01
7Video 711:4200:11:42
8Video 805:1800:05:18
9Video 908:3100:08:31
10Video 1018:1100:18:11
11Video 1112:3000:12:30
12Video 1213:3900:13:39
13Video 1315:3200:15:32
14Video 1425:12:0000:25:12
15Video 1509:0300:09:03
16Video 1623:0500:23:05
17Video 1718:1600:18:16
18Video 1801:37:5301:37:53


Data cleaning is a very important step for any data science, machine learning, or data analysis project. Date, time, duration is very complicated to handle if you do not understand the intricacy of this. If you encounter a similar kind of issue, I hope this article will be able to help you in cleaning your data and save your precious time.

Please follow and like us:
Tweet 20

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *