How to convert this timestamp format to another format in Perl?
I am trying to design a Perl/... approach that converts my timestamp format (ddMMyyyy-HHmm+0300) into the timestamp format (yyyy-MM-dd'T'HH:mm:00) used by the WEKA data analysis system.
I initially build the WEKA data file with the paste command and then remove the first column with AWK.
There should be no limitations that make the problem harder than it actually is, except possibly the quotes in the first variable.
I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (Deathgrip)
1. Hard problem in Section 1
2. Easier approach without quotes in the data in Section 2
3. POSIX::strftime approach, and the similar thread "Perl strptime format differs from strftime"
Example of the input
23072017-2200+0300
Expected output
2017-07-23'T'22:00:00
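Because every field of the input sits at a fixed offset, one way to sketch this single-value conversion is with Perl's built-in unpack. The sub name ts_to_weka is hypothetical, and the sketch assumes the +0300 offset is simply dropped, as in the expected output above, rather than applied.

```perl
use strict;
use warnings;

# Hypothetical helper: convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00.
# Assumption: the UTC offset is discarded, not applied to the time.
sub ts_to_weka {
    my ($ts) = @_;
    # A2 A2 A4 = day, month, year; x skips the '-'; A2 A2 = hour, minute.
    my ($dd, $mon, $yyyy, $hh, $min) = unpack 'A2 A2 A4 x A2 A2', $ts;
    # sprintf avoids interpolating a variable directly before a single quote,
    # which older Perls can read as a package separator.
    return sprintf "%s-%s-%s'T'%s:%s:00", $yyyy, $mon, $dd, $hh, $min;
}

print ts_to_weka('23072017-2200+0300'), "\n";   # 2017-07-23'T'22:00:00
```

This does no validation at all, so it only suits input that is already known to follow the fixed layout.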
Full example of CSV lines without quotes but with underscores, so it can be harder
Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
Expected output
Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
1. Attempt: a script which you can call as script.pl filename
---
I think using the Text::CSV parser is too complicated because my data set is simpler than its use case.
So I think a simple regex approach should be possible
#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964
## Data prepared like this for the script
# paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
# cp $HOME/Data/weka.data.csv $HOME/Workspace/
#
# Maybe, this all could be integrated into Perl script
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline( \*ARGV ) ) {
    s/\n/ /g for @$row;
    $csv->print( \*STDOUT, $row );
    # TODO regex
    # convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00
}
2. Perl Regex approach
---
Pseudocode for the approach; I was not sure how to carry captured parts such as dd to the result
# TODO s/ddMMyyyy-HHmm+0300/$3-$2-$1'T'$4:$5:00/;
perl -pe 's/([0-3][0-9])([0-1][0-9])(20[0-9]{2})-([0-2][0-9])([0-5][0-9])\+0300/$3-$2-$1\x27T\x27$4:$5:00/'
where
- dd by ([0-3][0-9]) / $1
- similarly MM by ([0-1][0-9]) / $2
- yyyy similarly by (20[0-9]{2}) / $3
- - literally
- HH (24-hour time) by ([0-2][0-9]) / $4
- mm by ([0-5][0-9]) / $5
- +0300 / simply removed
It would be great to have the regex in some more readable format.
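One possible way to make the regex more readable is the /x flag combined with named captures (Perl 5.10 or later). The following is only a sketch: the names $TS_RE and convert_line are hypothetical, and it assumes any four-digit offset may follow the time and is dropped.

```perl
use strict;
use warnings;

# The timestamp pattern spread out with the /x flag and named captures.
my $TS_RE = qr{
    \b
    (?<dd>   [0-3][0-9] )   # day of month
    (?<mon>  [01][0-9]  )   # month
    (?<yyyy> [0-9]{4}   )   # year
    -                       # literal separator
    (?<hh>   [0-2][0-9] )   # hour, 24-hour clock
    (?<min>  [0-5][0-9] )   # minute
    [+-][0-9]{4}            # UTC offset, matched and then dropped
    \b
}x;

# Hypothetical helper: rewrite every timestamp found in one line.
sub convert_line {
    my ($line) = @_;
    $line =~ s/$TS_RE/$+{yyyy}-$+{mon}-$+{dd}'T'$+{hh}:$+{min}:00/g;
    return $line;
}

print convert_line(qq{"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40\n});
```

With named captures the replacement no longer depends on counting parentheses, so reordering the parts of the pattern does not silently change the output.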
Testing Sundeep's proposal in comment
---
Code
#!/bin/bash
# https://stackoverflow.com/a/33995620/54964
s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'
echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'
Output is as expected for one line
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40
Applying it to the complete line, by just replacing the content of the variable s, also gives the expected output
"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
TODO: complete the approach so that it handles multiple lines and can skip the header
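A minimal sketch of how the multi-line case with a skipped header might look, reusing the substitution tested above. The sub names convert_timestamps and convert_stream are hypothetical, and the demo data is a shortened version of the example lines, read from an in-memory file so the block runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every timestamp in one line (regex from the test above).
sub convert_timestamps {
    my ($line) = @_;
    $line =~ s/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g;
    return $line;
}

# Copy $in to $out; the first line (the header) is passed through untouched.
sub convert_stream {
    my ($in, $out) = @_;
    my $line_no = 0;
    while (my $line = <$in>) {
        print {$out} ++$line_no == 1 ? $line : convert_timestamps($line);
    }
    return;
}

# Demo on an in-memory file; sample data shortened from the example lines.
my $data = <<'CSV';
Ni, Aika, Aika_l, Un
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70
CSV
open my $in, '<', \$data or die "cannot open in-memory file: $!";
convert_stream($in, \*STDOUT);
close $in;
```

To run it over a real file, the demo part could be replaced by convert_stream(\*ARGV, \*STDOUT), so that script.pl weka.data.csv reads the file named on the command line.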
Testing Deathgrip's motivated proposal
---
Code
#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964
use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use Time::Piece;
use POSIX;
# TODO breaks because of the nested single quotes
#my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'
my $str = '23072017-2200+0300';
my $f = '%d%m%Y-%H%M+0300';
#my $t = POSIX::strftime($str, $f); # fails!
my $t = strftime($str, $f); # fails!
print "$t\n";
Output
Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.
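The usage message above arises because POSIX::strftime only formats already broken-down time fields (sec, min, hour, ...); it does not parse a string. A parse-then-format route with the core Time::Piece module could be sketched as follows. The sub name strptime_to_weka is hypothetical, and the offset is stripped before parsing on the assumption that WEKA should see the local wall-clock time unchanged.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse ddMMyyyy-HHmm, then format as yyyy-MM-dd'T'HH:mm:00.
sub strptime_to_weka {
    my ($ts) = @_;
    # Assumption: drop the trailing offset (e.g. +0300) instead of applying it.
    (my $local = $ts) =~ s/[+-]\d{4}\z//;
    my $t = Time::Piece->strptime($local, '%d%m%Y-%H%M');
    return $t->strftime("%Y-%m-%d'T'%H:%M:00");
}

print strptime_to_weka('23072017-2200+0300'), "\n";
```

Unlike the plain regex, strptime rejects values that are not valid dates or times, which may or may not be wanted for this data.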
OS: Debian 9
Asked by Léo Léopold Hertz 준영
(7138 rep)
Jul 24, 2017, 03:28 PM
Last activity: Aug 9, 2017, 04:02 PM