
How to convert this timestamp format to another format in Perl?

1 vote
2 answers
2660 views
I am trying to design a Perl/... approach that converts my timestamp format (ddMMyyyy-HHmm+0300) into the timestamp/time/... format (yyyy-MM-dd'T'HH:mm:00) used by the WEKA data analysis system. I currently build the WEKA data file with the paste command and remove the first column with AWK. Nothing should make the problem harder than it actually is, except possibly the quotes in the first field. I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (Deathgrip).

1. Hard problem in Section 1
2. Easier approach without quotes in the data in Section 2
3. POSIX::strftime approach and the similar thread "Perl strptime format differs from strftime"

Example of the input

    23072017-2200+0300

Expected output

    2017-07-23'T'22:00:00

Full example of CSV lines without quotes but with underscores, which can make it harder

    Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
    "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
    "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

Expected output

    Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

1. Attempt script which you can call by script.pl filename
---

I think the Text::CSV parser is too complicated here because my data set is simpler than its use case, so a simple regex approach should be possible.

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964

    ## Data prepared like this for the script
    # paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
    # cp $HOME/Data/weka.data.csv $HOME/Workspace/
    #
    # Maybe, this all could be integrated into Perl script

    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
    while ( my $row = $csv->getline( \*ARGV ) ) {
        s/\n/ /g for @$row;
        $csv->print( \*STDOUT, $row );
        # TODO regex
        #convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00
    }

2. Perl Regex approach
---

Pseudocode; the approach cannot work as written because there are no variable replacements, such as carrying dd over to the result

    # TODO s/ddMMyyyy-HHmm+0300/$3-$2-$1'T'$4:$5:00/;
    perl -pe s/([0-3][0-9])(([0-1][0-9]))(20[0-9]{2})([0-2][0-9])([0-5][0-9])+0300/$3-$2-$1'T'$4:$5:00/;

where

- dd by ([0-3][0-9]) / $3
- MM similarly by ([0-1][0-9]) / $2
- yyyy similarly by (20[0-9]{2}) / $1
- the hyphen (-) literally
- HH (24-hour time) by ([0-2][0-9]) / $4
- mm by ([0-5][0-9]) / $5
- +0300 is simply removed

It would be great to have the regex in some more readable format.
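On the wish for a more readable regex: one possible sketch uses Perl's `/x` modifier (which allows whitespace and comments inside the pattern) together with named captures (Perl 5.10 or later). The helper name `convert_timestamps` and the capture names are hypothetical, chosen only for this illustration. The sketch assumes the UTC offset should simply be discarded, as described in the question, and it does not validate ranges such as month 13.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: rewrite every ddMMyyyy-HHmm+ZZZZ timestamp in a line
# as yyyy-MM-dd'T'HH:mm:00. The UTC offset is dropped, not applied.
sub convert_timestamps {
    my ($line) = @_;
    $line =~ s{
        \b
        (?<day>   \d\d )      # dd
        (?<month> \d\d )      # MM
        (?<year>  \d\d\d\d )  # yyyy
        -                     # literal hyphen between date and time
        (?<hour>  \d\d )      # HH, 24-hour clock
        (?<min>   \d\d )      # mm
        \+ \d\d\d\d           # UTC offset such as +0300, discarded
        \b
    }{$+{year}-$+{month}-$+{day}'T'$+{hour}:$+{min}:00}gx;
    return $line;
}

my $sample = '"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40';
print convert_timestamps($sample), "\n";
# prints: "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40
```

Lines without a matching timestamp, such as the CSV header, come back unchanged, so the same function could be applied to every line of a file, for example with `print convert_timestamps($_) while <>;`.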
Testing Sundeep's proposal in comment
---

Code

    #!/bin/bash
    # https://stackoverflow.com/a/33995620/54964
    s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'
    echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'

Output is as expected for one line

    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40

Applying it to the complete line, just by replacing the content of the variable s, also gives the expected output

    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

TODO: complete the approach for multiple lines, with the capability to skip the header

Testing Deathgrip's motivated proposal
---

Code

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964
    use strict;
    use warnings;

    # https://stackoverflow.com/a/20007784/54964
    # http://perldoc.perl.org/POSIX.html
    use Time::Piece;
    use POSIX;

    # TODO breaks because of false brackets
    #my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'

    my $str = '23072017-2200+0300';
    my $f = '%d%m%dY-%H%M+0300';
    #my $t = POSIX::strftime($str, $f); # fails!
    my $t = strftime($str, $f); # fails!
    print "$t\n";

Output

    Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.

OS: Debian 9
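The usage message above suggests why this attempt fails: `POSIX::strftime` takes a format followed by broken-down time fields (sec, min, hour, ...) and formats them; it does not parse a string. Parsing is the job of `strptime`, which the core `Time::Piece` module provides. The format string in the attempt also contains `%dY` where `%Y` was probably meant. A minimal sketch follows, with a hypothetical helper name `to_weka_timestamp`. It assumes the offset should be stripped before parsing so that the wall-clock time is kept as written, which matches the expected output in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse one ddMMyyyy-HHmm+ZZZZ timestamp with strptime
# and reformat it with strftime. The offset is cut off before parsing
# (an assumption of this sketch), so 22:00 stays 22:00.
sub to_weka_timestamp {
    my ($str) = @_;
    my ($local) = $str =~ /\A(\d{8}-\d{4})[+-]\d{4}\z/
        or die "unexpected timestamp: $str\n";
    my $t = Time::Piece->strptime($local, '%d%m%Y-%H%M');
    return $t->strftime("%Y-%m-%d'T'%H:%M:00");
}

print to_weka_timestamp('23072017-2200+0300'), "\n";
# prints: 2017-07-23'T'22:00:00
```

Compared with a plain substitution, this route rejects impossible values such as month 13, at the cost of one parse per timestamp.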
Asked by Léo Léopold Hertz 준영 (7138 rep)
Jul 24, 2017, 03:28 PM
Last activity: Aug 9, 2017, 04:02 PM