Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

-1 votes

1 answers

165 views

How to force unlock a file on weka?

When I try to `flock` a file on a [weka file system](https://docs.weka.io/overview/about), it fails because another client already holds the lock, and I can't find out which process or host holds it. How can I force that client to release the lock on the file?
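
This does not force a release, but as a diagnostic sketch, the following checks from a shell whether an exclusive lock on a file can be taken right now, using util-linux `flock(1)` with `--nonblock` and `--conflict-exit-code`. `try_lock` is a hypothetical helper name and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# try_lock FILE: report whether an exclusive flock(2) lock on FILE can be taken right now.
# --nonblock makes flock(1) give up immediately instead of waiting, and
# --conflict-exit-code 75 lets a lock conflict be told apart from other errors.
try_lock() {
    flock --nonblock --exclusive --conflict-exit-code 75 "$1" true
    case $? in
        0)  echo "free: $1" ;;
        75) echo "held: $1"; return 1 ;;
        *)  echo "error: could not test $1" >&2; return 2 ;;
    esac
}

# Usage (placeholder path): try_lock /path/to/file/on/weka
```

Locks taken by local processes can be listed with `lslocks --output PID,COMMAND,PATH`, but `lslocks` only reports what the local kernel knows about, so a lock held by another Weka client will presumably not show up there; running it on each client that mounts the file system is one way to look for the holder.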


konchy (119 rep)

Aug 30, 2023, 08:52 AM • Last activity: Sep 1, 2023, 01:13 PM

0 votes

1 answers

2208 views

How to standardise this nonstandard CSV file for Weka?

debian csv weka


I ran the data through the csvlint.io service because Weka gives the error `22 Problem encountered on line 2` when I try to import the CSV file like this:

    java -jar weka.jar > Explorer > 
        Preprocess > Open file > [select file format CSV] 
        > [Choose CSV file]

A similar error message appears in the thread *Not recognised as an csv file in Weka*. I previously solved that by opening the data in LibreOffice, letting it autofix the file and saving it as CSV, but I would like a command-line solution.
csvlint.io gives the following warning, although I generated *Data* on Debian 9:

> Structural problem: Non-standard Line Breaks on row 1
> 
> Your CSV appears to use LF line-breaks. While this will be fine in
> most cases, RFC 4180 specifies that CSV files should use CR-LF (a
> carriage-return and line-feed pair, e.g. \r\n). This may be labelled
> as "Windows line endings" on some systems.

*Data*

    Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
    "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
    "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

To remove the horizontal whitespace, you can run `tr -d "[:blank:]"` on the data, but that should not be necessary.
I don't think line endings are the issue here, because converting the file with `dos2unix` or `unix2dos` does not fix the error.
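
A sketch of the whitespace cleanup mentioned above, done in awk instead of `tr` so that only blanks next to delimiters and at line ends are removed. `normalize_csv` is a hypothetical name, and it assumes no quoted field contains a comma whose surrounding blanks matter. It also writes the CR-LF line endings that RFC 4180 asks for, although, as said above, line endings alone do not fix the Weka error.

```shell
#!/bin/sh
# normalize_csv: read CSV on stdin, write a tidied copy on stdout.
# Assumption: no quoted field contains a comma with meaningful blanks around it,
# which holds for the sample data above but not for CSV in general.
normalize_csv() {
    awk '{
        sub(/\r$/, "")                          # drop an existing CR so it is not doubled
        gsub(/[ \t]*,[ \t]*/, ",")              # remove blanks around each delimiter
        sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")  # trim both line ends
        printf "%s\r\n", $0                     # RFC 4180 CR-LF line ending
    }'
}

# Usage: normalize_csv < data.csv > data.fixed.csv
```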


OS: Debian 9   



                                

Léo Léopold Hertz 준영 (7138 rep)

Jul 25, 2017, 05:25 PM • Last activity: Sep 15, 2017, 02:40 PM

1 votes

2 answers

2660 views

How to convert this timestamp format to another format in Perl?

debian perl posix weka machine-learning


I am trying to design a Perl/... approach that converts my timestamp format (`ddMMyyyy-HHmm+0300`) into the timestamp/time/... format (`yyyy-MM-dd'T'HH:mm:00`) used by the WEKA data analysis system.
I first build the WEKA data file with the `paste` command and remove the first column with AWK.
There should not be any complications that make the problem harder than it actually is, except possibly the quotes around the first field.
I think approach (3), i.e. using the `POSIX::strftime` function directly (suggested by Deathgrip), may be the most feasible.

1. Hard problem in Section 1
2. Easier approach without quotes in the data in Section 2
3. `POSIX::strftime` approach; see also the similar thread *Perl strptime format differs from strftime*

Example of the input

    23072017-2200+0300

Expected output

    2017-07-23'T'22:00:00
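
For the single value above, the capture-and-reorder idea can be sketched with `sed -E` alone (a swapped-in tool, not Perl). `to_weka_ts` is a hypothetical helper name, and the pattern assumes exactly the `ddMMyyyy-HHmm+hhmm` layout. One hedged caveat: Weka's date patterns follow Java's `SimpleDateFormat`, where `'T'` denotes a literal `T`, so the value Weka parses for `yyyy-MM-dd'T'HH:mm:ss` is probably `2017-07-23T22:00:00` without quotes; removing the two `'` characters from the replacement would produce that form.

```shell
#!/bin/sh
# to_weka_ts: rewrite ddMMyyyy-HHmm+hhmm as yyyy-MM-dd'T'HH:mm:00.
# Groups: \1 = dd, \2 = MM, \3 = yyyy, \4 = HH, \5 = mm; the UTC offset is matched and dropped.
to_weka_ts() {
    printf '%s\n' "$1" |
        sed -E "s/^([0-9]{2})([0-9]{2})([0-9]{4})-([0-9]{2})([0-9]{2})[+-][0-9]{4}\$/\3-\2-\1'T'\4:\5:00/"
}

# Usage: to_weka_ts 23072017-2200+0300
```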

Full example of the CSV lines (no quotes apart from the first field, but underscores in the header, which may make it harder):

     Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
     "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
     "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010


Expected output


     Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
     "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
     "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

1. Attempted script, which you can call as script.pl filename
---

I think using the Text::CSV parser is overkill, because my data set is simpler than its intended use case,
so a simple regex approach should be possible.

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964 
    
    ## Data prepared like this for the script
    # paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
    # cp $HOME/Data/weka.data.csv $HOME/Workspace/
    #
    # Maybe, this all could be integrated into Perl script

    use strict;
    use warnings;
    
    use Text::CSV;
    
    my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
    
    while ( my $row = $csv->getline( \*ARGV ) ) {
        s/\n/ /g for @$row;
        $csv->print( \*STDOUT, $row );
 
        # TODO regex
        #convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00    
    }
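
The TODO in the loop above converts timestamp fields one at a time. Because Text::CSV is not a core Perl module, here is a sketch of the same per-field idea in plain awk instead (a swapped-in tool, not the script above). `convert_fields` is a hypothetical name, and it assumes fields are separated by exactly `, ` and that no quoted field contains that separator, which holds for the sample data.

```shell
#!/bin/sh
# convert_fields: convert every ddMMyyyy-HHmm+hhmm field to yyyy-MM-dd'T'HH:mm:00,
# leaving line 1 of each input (the header) untouched. Reads files or stdin.
convert_fields() {
    awk '
    BEGIN {
        FS = OFS = ", "                         # assumed field separator
        q = "\047"                              # a single quote character
        D = "[0-9]"                             # one digit; avoids {n} intervals, which some older awks lack
        ts = "^" D D D D D D D D "-" D D D D "[+-]" D D D D "$"
    }
    FNR > 1 {                                   # skip the header line of each file
        for (i = 1; i <= NF; i++) {
            if ($i ~ ts) {                      # field looks like ddMMyyyy-HHmm+hhmm
                dd = substr($i, 1, 2); mo = substr($i, 3, 2); yyyy = substr($i, 5, 4)
                hh = substr($i, 10, 2); mi = substr($i, 12, 2)
                $i = yyyy "-" mo "-" dd q "T" q hh ":" mi ":00"
            }
        }
    }
    { print }' "$@"
}

# Usage: convert_fields weka.data.csv > weka.data.converted.csv
```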

2. Perl Regex approach
---

Regex approach, where each captured part (such as dd) is carried over to the result through a capture variable:

    # Goal: s/ddMMyyyy-HHmm+0300/$3-$2-$1'T'$4:$5:00/;
    perl -pe 's/([0-3][0-9])([0-1][0-9])(20[0-9]{2})-([0-2][0-9])([0-5][0-9])\+0300/$3-$2-$1\x27T\x27$4:$5:00/g'

where

- dd by ([0-3][0-9]) / $1
- similarly for MM by ([0-1][0-9]) / $2
- yyyy similarly like (20[0-9]{2}) / $3
- `-` literally
- HH 24H time by ([0-2][0-9]) / $4
- mm by ([0-5][0-9]) / $5
- +0300 (escaped as `\+0300`) / simply removed

It would be great to have the regex in a more readable form.

Testing Sundeep's proposal from the comments
---

Code 

    #!/bin/bash
    # https://stackoverflow.com/a/33995620/54964 
        
    s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'
    
    echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'

The output is as expected for this line:

    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40

Applying it to a complete data line, by just replacing the content of the variable s, also gives the expected output:

    "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

TODO: extend this to the whole multi-line file, with the ability to skip the header line.

Testing Deathgrip's motivated proposal
---

Code

    #!/usr/bin/env perl
    # https://stackoverflow.com/a/33995620/54964 
        
    use strict;
    use warnings;
    # https://stackoverflow.com/a/20007784/54964 
    # http://perldoc.perl.org/POSIX.html 
    use Time::Piece;
    use POSIX;
    
    # TODO: breaks because the single quotes around T terminate the single-quoted string early
    #my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'
    
    my $str = '23072017-2200+0300';
    my $f = '%d%m%Y-%H%M+0300';
    #my $t = POSIX::strftime($str, $f); # fails!
    my $t = strftime($str, $f); # fails!
    
    print "$t\n";


Output

    Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.
    

OS: Debian 9    
                                

Léo Léopold Hertz 준영 (7138 rep)

Jul 24, 2017, 03:28 PM • Last activity: Aug 9, 2017, 04:02 PM

Showing page 1 of 3 total questions