How to convert this timestamp format to another format in Perl?

I am trying to design a Perl (or similar) approach that converts my timestamp format (ddMMyyyy-HHmm+0300) into the timestamp format (yyyy-MM-dd'T'HH:mm:00) used by the WEKA data analysis system. 
I initially build the WEKA data file with the paste command and then remove the first column with AWK. 
Nothing should make the problem harder than it actually is, except possibly the quotes in the first field. 
I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (Deathgrip's suggestion).

1. Hard problem in Section 1 
2. Easier approach without quotes in the data in Section 2
3. POSIX::strftime approach; see also the similar thread "Perl strptime format differs from strftime" 

Example of the input

    23072017-2200+0300

Expected output

    2017-07-23'T'22:00:00

Full example of CSV lines; the header has no quotes but does contain underscores, which may make the problem harder

     Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
     "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
     "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010


Expected output


     Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
     "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
     "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

1. Attempt: a script that you can call as script.pl filename 
---

I think the Text::CSV parser is more than I need, because my data set is simpler than its typical use case. 
So I think a simple regex approach should be possible.

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964 
    
    ## Data prepared like this for the script
    # paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
    # cp $HOME/Data/weka.data.csv $HOME/Workspace/
    #
    # Maybe, this all could be integrated into Perl script

    use strict;
    use warnings;
    
    use Text::CSV;
    
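    # allow_whitespace drops the blank that follows each comma in the data
    my $csv = Text::CSV->new( { binary => 1, eol => "\n", allow_whitespace => 1 } );
    
    while ( my $row = $csv->getline( \*ARGV ) ) {
        for (@$row) {
            s/\n/ /g;
            # convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00 (\x27 is a single quote)
            s/^(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}$/$3-$2-$1\x27T\x27$4:$5:00/;
        }
        $csv->print( \*STDOUT, $row );
    }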

2. Perl Regex approach
---

Sketch of the substitution, in which capture groups carry dd, MM, yyyy, HH and mm over to the result

    # s/ddMMyyyy-HHmm+0300/yyyy-MM-dd'T'HH:mm:00/
    perl -pe 's/([0-3][0-9])([0-1][0-9])(20[0-9]{2})-([0-2][0-9])([0-5][0-9])\+0300/$3-$2-$1\x27T\x27$4:$5:00/g'

where

- dd by ([0-3][0-9]) / $1
- similarly for MM by ([0-1][0-9]) / $2
- yyyy similarly by (20[0-9]{2}) / $3
- the hyphen (-) literally
- HH (24-hour time) by ([0-2][0-9]) / $4
- mm by ([0-5][0-9]) / $5 
- +0300 by \+0300 / simply removed

It would be great to have the regex in some more readable format. 
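
One way to get a more readable form is Perl's /x modifier, which allows whitespace and comments inside the pattern, combined with named captures. The following is only a sketch under that assumption: convert_line is a hypothetical helper name, and the pattern accepts any four-digit offset rather than only +0300.

```perl
use strict;
use warnings;

# Pattern for ddMMyyyy-HHmm+ZZZZ; /x lets each part sit on its own line.
my $timestamp = qr{
    \b
    (?<dd>   [0-3][0-9] )   # day of month
    (?<MM>   [01][0-9]  )   # month
    (?<yyyy> 20[0-9]{2} )   # four-digit year
    -                       # literal separator
    (?<HH>   [0-2][0-9] )   # hour, 24-hour clock
    (?<mm>   [0-5][0-9] )   # minute
    \+ [0-9]{4}             # UTC offset, dropped from the output
    \b
}x;

# Hypothetical helper: rewrite every timestamp found in one line of text.
sub convert_line {
    my ($line) = @_;
    $line =~ s/$timestamp/$+{yyyy}-$+{MM}-$+{dd}\x27T\x27$+{HH}:$+{mm}:00/g;
    return $line;
}

print convert_line('"Masi", 23072010-2200+0300, 24072010-0600+0300, 70'), "\n";
```

The named groups are read back through the %+ hash, so the replacement no longer depends on counting parentheses.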

Testing Sundeep's proposal in comment
---

Code 

    #!/bin/bash
    # https://stackoverflow.com/a/33995620/54964 
        
    s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'
    
    echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'

Output is as expected for one line

    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40

Applying it to the complete line, just by replacing the content of the variable s, also gives the expected output
 
    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

TODO: complete the approach so that it handles multiple lines and can skip the header
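
A possible way to cover the multi-line case is sketched below. It assumes that the first line of the input is the header and should pass through untouched; convert_line and convert_stream are hypothetical helper names, and the sample data is a shortened version of the lines above.

```perl
use strict;
use warnings;

# Rewrite every ddMMyyyy-HHmm+ZZZZ timestamp in one line (\x27 is a single quote).
sub convert_line {
    my ($line) = @_;
    $line =~ s/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g;
    return $line;
}

# Copy $in to $out; line 1 is assumed to be the header and is left as it is.
sub convert_stream {
    my ( $in, $out ) = @_;
    while ( my $line = <$in> ) {
        print {$out} ( $. == 1 ? $line : convert_line($line) );
    }
    return;
}

# Shortened sample, read from an in-memory filehandle for the demonstration.
my $sample = <<'END';
Ni, Aika, Aika_l, Un
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70
END

my $result = '';
open my $in,  '<', \$sample or die "cannot read sample: $!";
open my $out, '>', \$result or die "cannot open output buffer: $!";
convert_stream( $in, $out );
close $out;

print $result;
```

For real files the same function could be given ordinary filehandles opened on the data file and on the output file.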

Testing Deathgrip's motivated proposal
---

Code

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964 
        
    use strict;
    use warnings;
    # https://stackoverflow.com/a/20007784/54964 
    # http://perldoc.perl.org/POSIX.html 
    use Time::Piece;
    use POSIX;
    
    # TODO breaks because the single quotes around T end the single-quoted string
    #my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'
    
    my $str = '23072017-2200+0300';
    my $f = '%d%m%Y-%H%M+0300';
    #my $t = POSIX::strftime($str, $f); # fails!
    my $t = strftime($str, $f); # fails: strftime formats a broken-down time, it does not parse a string
    
    print "$t\n";


Output

    Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.
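
The usage message above suggests that POSIX::strftime only formats a time that is already broken down into fields; it does not parse a string. A minimal sketch of the parse-then-format idea with the core module Time::Piece follows. It assumes the offset is always the literal text +0300 and can be discarded; to_weka is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse ddMMyyyy-HHmm+0300, then re-emit it in the WEKA format.
# Assumption: +0300 is matched as literal text, so the clock time is kept as written.
sub to_weka {
    my ($stamp) = @_;
    my $t = Time::Piece->strptime( $stamp, '%d%m%Y-%H%M+0300' );
    return $t->strftime(q{%Y-%m-%d'T'%H:%M:00});
}

print to_weka('23072017-2200+0300'), "\n";    # 2017-07-23'T'22:00:00
```

Here strptime does the parsing and strftime the formatting, which is the split that the failing call above tried to do in one step.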
    

OS: Debian 9    
                        
Asked by Léo Léopold Hertz 준영 (7138 rep)
Jul 24, 2017, 03:28 PM
Last activity: Aug 9, 2017, 04:02 PM