Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1 vote, 3 answers, 226 views
How could I (painlessly) split or reverse "Last, First" within a record in Miller?
I have a tab-delimited file where one of the columns is in the format "LastName, FirstName". What I want to do is split that record out into two separate columns, `last` and `first`, use `cut` or some other verb(s) on _that_, and output the result to JSON.
I should add that I'm not married to JSON, and I know how to use other tools like [jq](https://github.com/stedolan/jq), but it would be nice to get it in that format in one step.
The syntax for the `nest` verb looks like it requires memorizing a lot of frankly non-memorable options, so I figured that there would be a simple DSL operation to do this job. Maybe that's not the case?
Here's what I've tried. (Let's just forget about the extra space that's attached to `Firstname` right now, OK? I would use `strip` or `ssub` or something to get rid of that later.)
echo -e "last_first\nLastName, Firstname" \
| mlr --t2j put '$o=splitnv($last_first,",")'
# result:
# { "last_first": "LastName, Firstname", "o": "(error)" }
# expected something like:
# { "last_first": "LastName, Firstname", "o": { 1: "LastName", 2: "Firstname" } }
#
# or:
# { "last_first": "LastName, Firstname", "o": [ "LastName", "Firstname" ] }
Why `(error)`? Is it not reasonable that assigning to `$o` as above would assign a new column `o` to the result of `splitnv`?
Here's something else I tried that didn't work like I would've expected either:
echo -e "last_first\nLastName, Firstname" \
| mlr -T nest --explode --values --across-fields --nested-fs , -f last_first
# result (no delimiter here, just one field, confirmed w/ 'cat -A')
# last_first
# LastName, Firstname
# expected:
# last_first_1last_first_2
# LastName, Firstname
**Edit**: The problem with the command above is that I should've used `--tsv`, **not** `-T`, which is a synonym for `--nidx --fs tab` (numerically-indexed columns). The catch is that Miller doesn't produce an error message when it's obviously wrong to ask for named columns in that case, which might be a misfeature; see [issue #233](https://github.com/johnkerl/miller/issues/233).
Any insight would be appreciated.
Kevin E
(540 rep)
Mar 7, 2019, 10:52 AM
• Last activity: Mar 10, 2025, 10:05 PM
3 votes, 5 answers, 508 views
How to create a new column and add a random identifier to it with miller
I want to add a column with a randomly created "case number" to my `csv` file. The first 2 characters of the case number must be capital letters from A-Z, followed by 5 random digits.
input:
COMPANY,NAME,STREET,ZIP,CITY,IBAN
Test Ltd,John,Big Ben 343,4343,London,UK2348020384
Test Ltd,Kate,Big Ben 343,4343,London,UK4389223892
Test Ltd,Jake,Big Ben 343,4343,London,UK3892898999
output
COMPANY,NAME,STREET,ZIP,CITY,IBAN,CASENUMBER
Test Ltd,John,Big Ben 343,4343,London,UK2348020384,IN84903
Test Ltd,Kate,Big Ben 343,4343,London,UK4389223892,TY93842
Test Ltd,Jake,Big Ben 343,4343,London,UK3892898999,OL34307
How can I do this with Miller? I have the following command ready:
mlr -I --csv put '${CASENUMBER}=xxx' then \
reorder -f COMPANY,NAME,STREET,ZIP,CITY,IBAN,CASENUMBER input/input.csv
What exactly should I add to the above command?
pwrsheller
(357 rep)
Jan 29, 2024, 09:07 AM
• Last activity: Jan 30, 2024, 01:18 PM
0 votes, 3 answers, 342 views
Convert lower-case to uppercase with the output to a new column using miller
I want to copy column `NAME` to column `NAME-LOWERCASE`. `NAME-LOWERCASE` should only contain lowercase letters. Uppercase letters should be left untouched in all columns except `NAME-LOWERCASE`.
input
NAME,test
PTC,N
Agri,Y
E-example,N
ForYou,N
Willy Nes,Y
output
NAME,NAME-LOWERCASE,test
PTC,ptc,N
Agri,agri,Y
E-example,E-example,N
ForYou,foryou,N
Willy Nes,willy nes,Y
I know how to create a new column from another column and reorder:
mlr -I --csv \
put '$FIRSTNAME = sub($FULLNAME," .*","")' then \
reorder -f FULLNAME,LASTNAME,EMAIL,DOMAIN,COMPANY input.csv
And I know how to convert uppercase to lowercase:
mlr --csv -N case -l
How can I combine both commands? Or is there another Miller command to reach my goal?
pwrsheller
(357 rep)
Sep 19, 2023, 07:04 AM
• Last activity: Oct 11, 2023, 01:38 PM
0 votes, 1 answer, 203 views
Forcing miller to read data as string in conversion to JSON
In the following MWE
echo x="1e2" | mlr --ojson cat
my intention is for miller to generate a one-element JSON array containing the object
{"x": "1e2"}
The object actually returned (within the array) is instead
{"x": 1e2}
where the value is taken as a number, I guess as a consequence of how it is parsed. How can I tell Miller to generate the JSON object with a string for its value rather than a number? (The rationale underlying the quotation marks around '1e2' in the MWE is precisely to highlight this intention.)
Marcos
(103 rep)
Jul 31, 2023, 02:35 AM
• Last activity: Aug 25, 2023, 07:27 PM
7 votes, 2 answers, 515 views
Deduplicate CSV rows based on a specific column, with a CSV parser
I searched for this task, and found the following older questions:
- https://unix.stackexchange.com/questions/681059/removing-duplicates-from-a-csv-based-on-specified-columns
- https://unix.stackexchange.com/questions/444476/identify-unique-records-on-csv-based-on-specific-columns
But I can't use `awk` because my data is a complex CSV file with multiple nested double quotes.
Let's say I want to deduplicate the following (simplified case):
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
In the output I need it as follows:
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref2,aaa,bbb
No `awk` solutions, please; only ones that use a CSV parser.
I tried the following:
mlr --csv uniq -a -g Ref file.csv
But it produces an error.
Mévatlavé Kraspek
(541 rep)
May 29, 2023, 01:34 AM
• Last activity: May 29, 2023, 06:50 PM
0 votes, 1 answer, 412 views
Adding an empty column to a CSV file with Miller
I have a CSV file that looks like this:
0
1
2
3
I'd like to use Miller to append an empty column `x` to every row so that the output file looks like this:
0,x
1,
2,
3,
How do I do that?
Mateusz Piotrowski
(4983 rep)
Jan 26, 2023, 04:53 PM
0 votes, 1 answer, 261 views
Finding whether a string is a substring of another with Miller/mlr's DSL
How do I find whether a column of a CSV contains another using mlr's DSL?
In other words I have a CSV
a,b
test and,test and more
and want to find out whether `'test and'` (a) is included in `'test and more'` (b).
E Lisse
(1 rep)
Jan 14, 2023, 06:18 PM
• Last activity: Jan 15, 2023, 08:58 AM
-2 votes, 1 answer, 830 views
Separating CSV (one column) into many columns on delimiter (comma)
I have a CSV with ~50 comma-separated values in one column that I want to separate into separate columns. The header is line 1. This should be really simple, and I've tried a lot surrounding `awk` and `mlr` but haven't been able to adapt anything I've seen in order to separate a single column into many columns using a comma as a delimiter.
My process:
1. I used `mlr` to combine hundreds of CSVs into one CSV:
mlr --icsv cat *.csv > filename.txt
mlr --ocsv unsparsify filename.txt > filename.csv
2. Now I have a CSV with one column; in that column are ~50 comma-separated values that I want to explode into many columns.
user536893
(5 rep)
Aug 9, 2022, 02:44 AM
• Last activity: Aug 9, 2022, 06:30 AM
2 votes, 3 answers, 191 views
Extracting domains from a CSV URL column using Miller
Having CSV content similar to this:
Family,URL,IP,FirstSeen
Pony,http://officeman.tk/images/admin.php,207.180.230.128,01-06-2019
Pony,http://learn.cloudience.com/ojekwaeng/yugo/admin.php,192.145.234.108,01-06-2019
Pony,http://vman23.com/ba24/admin.php,95.213.204.53,01-06-2019
I'm aware that the `URL` column can be selected using:
mlr --mmap --csv --skip-comments --headerless-csv-output cut -f 'URL'
How could domains be extracted using Miller without piping to other commands?
**Desired Output:**
officeman.tk
learn.cloudience.com
vman23.com
T145
(223 rep)
Jul 6, 2021, 01:30 AM
• Last activity: Jul 6, 2021, 03:40 PM
6 votes, 1 answer, 922 views
How can I call an external command from within Miller (mlr)’s DSL?
Suppose I have the following CSV:
$ cat test.csv
id,domain
1,foo.com
2,bar.com
Using `mlr put`, I can easily map any function over a field in the CSV, as long as I can define it in the Miller DSL. So, for example, `mlr --csv put '$id = $id + 1'` will increment the `id` by 1 for each record.
But what if I can’t define the function in Miller’s DSL, perhaps because it is not pure? Suppose I wanted to map each domain in the CSV to an IP address. I’d like to do something like `mlr --csv put '$ip = shell("nslookup $domain")'`. Is there an easy way to do this?
Currently I am extracting the input field into a separate file, rewriting it in a separate shell script, and adding the result back in with `mlr join`. However, this is pretty messy, because my CSV is full of quoted commas and newlines, which I need to carefully handle myself rather than relying on Miller.
sjy
(956 rep)
Jan 29, 2019, 08:12 AM
• Last activity: Nov 28, 2020, 12:50 PM
5 votes, 3 answers, 729 views
Output a header label in data field in miller
Given *file.csv*:
a,b,c
1,2,3
How can [`mlr`](https://github.com/johnkerl/miller) be made to output:
a,b,c
1,2,c
Using the label name of `$c` *without* knowing in advance that `$c` contains the letter "*c*"?
---
Note: correct answer must use `mlr` only.
agc
(7353 rep)
Mar 14, 2018, 03:31 PM
• Last activity: Jul 1, 2020, 07:30 AM