Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0 votes · 5 answers · 1060 views
command-line tool to sum the values in a column of a CSV file
I am looking for a command-line tool to calculate the sum of the values in a specified column of a CSV file. (**Update**: The CSV file might have quoted fields, so a simple solution that just splits on the delimiter (',') does not work.)
Given the following sample CSV file:
```
description A,description B,data 1, data 2
fruit,"banana,apple",3,17
veggie,cauliflower,7,18
animal,"fish,meat",9,22
```
I want to compute the sum over a column, for example `data 1`, which should give **19**.
I have tried to use csvkit for this but didn't get very far. Are there other command-line tools specialised in this CSV operation?
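A minimal sketch of the parsing step, using Python's standard `csv` module rather than a dedicated command-line tool (an assumption; the question asks for a CLI). Because a real CSV parser handles the quoting, `"banana,apple"` stays one field and the `data 1` column lines up. Column names must match the header exactly, including the leading space in ` data 2`.

```python
import csv
import io


def sum_column(csv_text: str, column: str) -> float:
    """Sum the numeric values of one named column, honouring quoted fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0.0
    for row in reader:
        # The parser has already kept "banana,apple" together as a single field,
        # so row[column] is the right cell even on quoted lines.
        total += float(row[column])
    return total


SAMPLE = """description A,description B,data 1, data 2
fruit,"banana,apple",3,17
veggie,cauliflower,7,18
animal,"fish,meat",9,22
"""

print(sum_column(SAMPLE, "data 1"))  # 19.0
```

For a file on disk, one could pass `open("file.csv", newline="").read()` instead of the sample string.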
halloleo (649 rep) · Jul 2, 2024, 01:41 AM · Last activity: Apr 23, 2025, 03:04 PM
0 votes · 2 answers · 1296 views
CSV fields max length error and setting quoting=csv.QUOTE_NONE
After running `csvcut` on a comma-delimited .csv file:
```
[root@server files]# csvcut -c title,mpn,overview,techspecs2,image_carousel_elargesrc syn_multi-image.csv > syn_scraped_cut.csv
```
I get the error:
> CSV contains fields longer than maximum length of 131072 characters.
> Try raising the maximum with the field_size_limit parameter, or try
> setting quoting=csv.QUOTE_NONE.
Though large, I can tell you for sure that my longest field is only 65535 characters long, which is under the maximum allowed length by a pretty safe margin.
I have no idea what setting `quoting=csv.QUOTE_NONE` refers to. I have only been using simple csvkit commands, and that is all I know.
Reading similar threads and answers such as [here](https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072) and [here](https://stackoverflow.com/a/18408911/9095603), I am unable to extract any kind of solution in the context of csvkit specifically. I'm not adept at programming in general and am limited to using csvkit, its commands and options.
How do I fix this error?
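The message refers to the field size limit of Python's standard `csv` module, which csvkit relies on for parsing. The sketch below shows, in plain Python, what `field_size_limit` does: the default limit is 131072 characters, and raising it lets the same data parse. One plausible reason a 65535-character field still trips the limit is an unbalanced double quote, which makes the parser treat everything up to the next quote (possibly many lines later) as one field; `quoting=csv.QUOTE_NONE`, which disables quote handling, is aimed at that case. Recent csvkit releases appear to list a `--maxfieldsize` (`-z`) common option; checking `csvcut --help` for your version is the safe way to confirm it.

```python
import csv
import io


def read_rows(csv_text: str, max_field_size: int) -> list[list[str]]:
    """Parse CSV text after raising the module-wide field size limit."""
    # csv.field_size_limit(n) sets a new limit (and returns the old one);
    # the setting is global to the csv module for this process.
    csv.field_size_limit(max_field_size)
    return list(csv.reader(io.StringIO(csv_text)))


big_field = "x" * 200_000  # longer than the default 131072-character limit
text = f'id,overview\n1,"{big_field}"\n'

try:
    list(csv.reader(io.StringIO(text)))  # still using the default limit
except csv.Error as exc:
    print("default limit:", exc)

rows = read_rows(text, 1_000_000)
print(len(rows[1][1]))  # 200000
```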
ptrcao (5995 rep) · Jul 25, 2019, 11:14 PM · Last activity: Dec 17, 2023, 10:32 PM
7 votes · 2 answers · 515 views
Deduplicate CSV rows based on a specific column, with a CSV parser
I searched for this task, and found the following older questions:
- https://unix.stackexchange.com/questions/681059/removing-duplicates-from-a-csv-based-on-specified-columns
- https://unix.stackexchange.com/questions/444476/identify-unique-records-on-csv-based-on-specific-columns
But I can't use `awk` because my data is a complex CSV file with multiple nested double quotes.
Let's say I want to deduplicate the following (simplified case):
```
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
```
In the output I need it as follows:
```
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref2,aaa,bbb
```
No `awk` solutions, please; any CSV parser is fine.
I tried the following:
```
mlr --csv uniq -a -g Ref file.csv
```
But it produces an error.
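A minimal Python sketch of "keep the first row for each key" with the standard `csv` module, which copes with quoted fields containing commas. It assumes, as the expected output above suggests, that the first occurrence of each `Ref` value should win. In Miller, the same idea is commonly written as `mlr --csv head -n 1 -g Ref file.csv` (first record per `Ref` group).

```python
import csv
import io


def dedupe_by_column(csv_text: str, key: str) -> str:
    """Return the CSV with only the first row kept for each value of `key`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # Accessing reader.fieldnames reads the header row.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        if row[key] not in seen:  # first occurrence wins
            seen.add(row[key])
            writer.writerow(row)  # the writer re-quotes "foo, bar, base"
    return out.getvalue()


SAMPLE = '''Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
'''

print(dedupe_by_column(SAMPLE, "Ref"), end="")
```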
Mévatlavé Kraspek (541 rep) · May 29, 2023, 01:34 AM · Last activity: May 29, 2023, 06:50 PM
6 votes · 3 answers · 513 views
Truncate a CSV column using CsvKit
How can I truncate the length of a column using CSVKit?
The definition looks like this:
* Column 1: no length restriction
* Column 2: at most 2 characters (as implied by the example below)
The solution should properly handle escaped (quoted) fields and embedded newlines.
For example:
```
First Header,Second Header
foo,
foo,b
foo,bar
foo,"bar"
foo,"""bar"
foo,"
bar"
```
should become
```
First Header,Second Header
foo,
foo,b
foo,ba
foo,ba
foo,"""b"
foo,"
b"
```
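A hedged Python sketch using the standard `csv` module rather than csvkit: the reader decodes each field (so `"""bar"` becomes the four-character value `"bar` and the quoted newline stays part of the field), the second column is cut to 2 characters as in the example, and the writer re-applies whatever quoting the shortened value needs. The column index and width are taken from the example and are parameters.

```python
import csv
import io


def truncate_column(csv_text: str, index: int, width: int) -> str:
    """Cut the field at position `index` to at most `width` characters.

    Quoting is decoded on read and re-applied (minimally) on write, so doubled
    quotes and embedded newlines are treated as data, not markup.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if len(row) > index:
            row[index] = row[index][:width]
        writer.writerow(row)
    return out.getvalue()


SAMPLE = 'First Header,Second Header\nfoo,\nfoo,b\nfoo,bar\nfoo,"bar"\nfoo,"""bar"\nfoo,"\nbar"\n'
print(truncate_column(SAMPLE, 1, 2), end="")
```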
patstuart (163 rep) · Jul 14, 2022, 06:10 PM · Last activity: Jul 15, 2022, 07:44 AM
3 votes · 2 answers · 4044 views
how to install csvkit in bash
Kusalananda nicely recommends using `csvformat` from [csvkit](https://csvkit.readthedocs.io/en/latest/) to turn the output of `jq`'s `@csv` into CSV without double quotes (`"`) when [answering how to parse JSON with jq](https://unix.stackexchange.com/a/506790/530603).
This answer does not seem to involve the use of Python. But the csvkit [installation tutorial](https://csvkit.readthedocs.io/en/latest/tutorial/1_getting_started.html#installing-csvkit) and its [installation troubleshooting](https://csvkit.readthedocs.io/en/latest/tricks.html#troubleshooting) do seem to rely on, perhaps require, Python. As a newbie, I find this confusing:
Is it possible to install csvkit in Git Bash without using Python (read: opening Spyder or Anaconda, let's say)? How?
**Edit.** MINGW64 (Git Bash) displays `bash: pip: command not found`. Same for `conda`.
How do you recommend moving on from there?
Python is installed, with pip.exe in `...\Anaconda\Scripts`. There are several suggested solutions on other sites, e.g. adding the directory of pip.exe to PATH in various ways ([here](https://www.stackoverflow.com/questions/6318156/adding-python-to-path-on-windows) and [here](https://www.stackoverflow.com/question/32597209/python-not-working-in-the-command-line-of-git-bash)).
Johan (439 rep) · Jun 20, 2022, 05:40 PM · Last activity: Jun 22, 2022, 12:17 PM
1 vote · 1 answer · 152 views
How can I separate these two columns in this csv file in Linux/Bash?
I am looking to separate these two columns, each into its own text file. This data is from a CSV file on Kaggle that contains Titanic passenger data. The first column is the number of passengers, and the second column is the age of those passengers, i.e. 10 one-year-olds, 12 two-year-olds, etc. I want to separate these and put them into a simple graph in the command line. I have used csvkit so far to manipulate the data set. Thanks! I am new to Linux and this is my first time reaching out to the community!
```
10 1
12 2
7 3
10 4
5 5
6 6
4 7
6 8
10 9
4 10
```
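A small Python sketch, assuming the two columns are separated by whitespace exactly as shown above; the output file names `counts.txt` and `ages.txt` are hypothetical. It also prints a crude text bar chart (one `#` per passenger for each age) as one possible take on "a simple graph in the command line".

```python
def split_columns(text: str) -> tuple[list[str], list[str]]:
    """Split whitespace-separated 'count age' lines into two lists."""
    counts, ages = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        counts.append(fields[0])
        ages.append(fields[1])
    return counts, ages


SAMPLE = "10 1\n12 2\n7 3\n10 4\n5 5\n6 6\n4 7\n6 8\n10 9\n4 10\n"
counts, ages = split_columns(SAMPLE)

# Hypothetical output files, one value per line.
for name, values in (("counts.txt", counts), ("ages.txt", ages)):
    with open(name, "w") as fh:
        fh.write("\n".join(values) + "\n")

# Crude horizontal bar chart: age, then one '#' per passenger.
for count, age in zip(counts, ages):
    print(f"{age:>3} {'#' * int(count)}")
```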
Tyler Young (11 rep) · May 8, 2021, 08:50 PM · Last activity: May 8, 2021, 09:09 PM
0 votes · 2 answers · 2262 views
Concatenating columns of the same csv file to create a new column with a new heading
What I have is a CSV file to this effect:
```
+------------+--------------+
| Category I | Sub-Category |
+------------+--------------+
| 1144       | 128          |
| 1144       | 128          |
| 1000       | 100          |
| 1001       | 100          |
| 1002       | 100          |
| 1002       | 100          |
| 1011       | 102          |
| 1011       | 102          |
| 1011       | 102          |
| 1011       | 102          |
| 1011       | 102          |
| 1011       | 102          |
| 1013       | 103          |
| 1013       | 103          |
| 1013       | 103          |
| 1013       | 103          |
| 1013       | 103          |
| 1013       | 103          |
| 1013       | 103          |
+------------+--------------+
```
I wish to concatenate the first and second columns above to form a third, new column with a new arbitrary heading, to this effect:
```
+-------------+--------------+-----------------------+
| Category ID | Sub-Category | Arbitrary New Heading |
+-------------+--------------+-----------------------+
| 1144        | 128          | 1144128               |
| 1144        | 128          | 1144128               |
| 1000        | 100          | 1000100               |
| 1001        | 100          | 1001100               |
| 1002        | 100          | 1002100               |
| 1002        | 100          | 1002100               |
| 1011        | 102          | 1011102               |
| 1011        | 102          | 1011102               |
| 1011        | 102          | 1011102               |
| 1011        | 102          | 1011102               |
| 1011        | 102          | 1011102               |
| 1011        | 102          | 1011102               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
| 1013        | 103          | 1013103               |
+-------------+--------------+-----------------------+
```
My usual go-to utility, csvkit, does not have the means to achieve this, as far as I know (see https://github.com/wireservice/csvkit/issues/930).
What is a simple solution not requiring advanced programming knowledge, which can achieve this?
I'm vaguely aware of awk and sed as potential solutions, but I don't want to limit the enquiry to those just in case there is a better (i.e. simpler) solution.
The solution must be efficient for very large files, i.e. files containing 120,000+ lines.
Edit: I have included the sample data for the convenience of those wanting to take a crack at it; download here: https://www.dropbox.com/s/achtyxg7qi1629k/category-subcat-test.csv?dl=0
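A streaming Python sketch with the standard `csv` module, assuming the two columns to join are the first and second ones. It reads and writes one row at a time, so memory use does not grow with the file size, which is what matters for 120,000+ lines. The new header name is passed in, since the question says it is arbitrary.

```python
import csv
import io
from typing import TextIO


def add_concat_column(src: TextIO, dst: TextIO, new_header: str) -> None:
    """Copy src to dst, appending column 0 + column 1 as a new last column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [new_header])
    for row in reader:  # one row in memory at a time
        writer.writerow(row + [row[0] + row[1]])


SAMPLE = "Category ID,Sub-Category\n1144,128\n1000,100\n1013,103\n"
out = io.StringIO()
add_concat_column(io.StringIO(SAMPLE), out, "Arbitrary New Heading")
print(out.getvalue(), end="")
```

For real files, both should be opened with `newline=""`, e.g. `with open("category-subcat-test.csv", newline="") as src, open("out.csv", "w", newline="") as dst:` followed by the call; `out.csv` is a hypothetical output name.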
ptrcao (5995 rep) · Dec 21, 2019, 09:56 AM · Last activity: Dec 25, 2019, 06:24 AM
0 votes · 1 answer · 142 views
Syntactical error with csvsql query?
I have a CSV file `attributes.csv` from which I want to retrieve all records into a new file `attributes_withoutPIDate.csv`, excluding records for which the `Name` column has "PI Date" as the value.
Running `csvsql` in this manner:
```
csvsql -d ',' -I --query 'select * where Name "PI Date" from attributes' attributes.csv > attributes_withoutPIDate.csv
```
yields an error:
```
(sqlite3.OperationalError) near "from": syntax error
[SQL: select * where Name "PI Date" from attributes]
(Background on this error at: http://sqlalche.me/e/e3q8)
```
I suspect a syntactical error. Can someone advise how to fix it?
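The query has its clauses in the wrong order and lacks a comparison operator: SQL expects `SELECT ... FROM ... WHERE ...`, and string literals take single quotes (in SQLite, a double-quoted token is first looked up as a column name). Under those assumptions the csvsql call would become `csvsql -d ',' -I --query "select * from attributes where Name != 'PI Date'" attributes.csv > attributes_withoutPIDate.csv`. The sketch below checks the corrected query against an in-memory SQLite table; the `Value` column and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the one csvsql builds from attributes.csv.
conn.execute("CREATE TABLE attributes (Name TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?)",
    [("PI Date", "2019-12-01"), ("Colour", "red"), ("Size", "XL")],
)

# SELECT ... FROM ... WHERE ..., with an explicit operator and a
# single-quoted string literal.
query = "SELECT * FROM attributes WHERE Name != 'PI Date'"
rows = conn.execute(query).fetchall()
print(rows)
```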
ptrcao (5995 rep) · Dec 22, 2019, 04:35 AM · Last activity: Dec 23, 2019, 03:45 AM
0 votes · 1 answer · 1109 views
How to write a csvcut script to cut column by header with multiple files?
Since `csvcut` (from [`csvkit`](http://csvkit.readthedocs.io/)) does not take more than a single file at a time, I need to write a script to process multiple files using it.
The first parameter should be the delimiter, the second parameter should be the header of the column to extract, and remaining arguments are the filenames.
If the file names are missing, the script should read from standard input.
It should be something like this:
```
csvcut ';' Measure calories.csv
```
I'm not really familiar with `csvkit`. Can anyone help?
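A hedged Python sketch of such a script, using the standard `csv` module instead of wrapping `csvcut` itself: the first argument is the delimiter, the second is the column header, any remaining arguments are file names, and standard input is read when there are none. Each file is assumed to have its own header row; the function names are illustrative.

```python
import csv
import io
import sys
from typing import Iterable, Iterator, TextIO


def open_streams(paths: Iterable[str]) -> Iterator[TextIO]:
    """Open each file lazily; each is closed when the caller moves on."""
    for path in paths:
        with open(path, newline="") as handle:
            yield handle


def cut_column(delimiter: str, header: str, streams: Iterable[TextIO]) -> Iterator[str]:
    """Yield the values of the column named `header` from every stream."""
    for stream in streams:
        reader = csv.reader(stream, delimiter=delimiter)
        columns = next(reader, None)
        if columns is None:
            continue  # empty input: nothing to cut
        index = columns.index(header)  # raises ValueError if the header is absent
        for row in reader:
            if index < len(row):  # skip blank or short rows
                yield row[index]


def main(argv: list[str], stdin: TextIO = sys.stdin) -> int:
    if len(argv) < 3:
        print(f"usage: {argv[0]} DELIMITER HEADER [FILE...]", file=sys.stderr)
        return 2
    delimiter, header, paths = argv[1], argv[2], argv[3:]
    streams = open_streams(paths) if paths else [stdin]  # no files: read stdin
    for value in cut_column(delimiter, header, streams):
        print(value)
    return 0


demo = [io.StringIO("Food;Measure\napple;100\n"), io.StringIO("Food;Measure\npear;80\n")]
print(list(cut_column(";", "Measure", demo)))  # ['100', '80']
```

To run it as a script, one would append `if __name__ == "__main__": raise SystemExit(main(sys.argv))` and call it as `python3 cutcol.py ';' Measure calories.csv other.csv`, where `cutcol.py` is a hypothetical file name.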
amV (75 rep) · Aug 12, 2019, 07:46 AM · Last activity: Aug 12, 2019, 09:18 AM
Showing page 1 of 9 total questions