Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

2 votes

1 answers

2243 views

Can I record sound until silence OR a maximum length of recording?

Looking at "detecting sound until some silence occurs", I arrived at the command `rec recording.flac rate 32k silence -l 1 0.1 3% 1 3.0 3%`.

I realize my specific use is somewhat different: I do want to record until some silence is detected, but I also want an upper limit, say 10-15 seconds, on how long the recording will go on before moving on. I can just prepend a `timeout 15s` command, which would give me a maximum speech time of 15 seconds minus the leading silence, which will vary. Is there some way to tell sox that I only need the first x seconds of a recording, which would give me a maximum speech time of 15 seconds regardless of leading silence?

Niklas Raatikainen (21 rep)

Nov 13, 2015, 11:10 AM • Last activity: Jul 27, 2025, 10:04 PM

9 votes

4 answers

13021 views

How to create the spectrogram of many audio files efficiently with Sox?

audio sox

I have a bunch of audio files and I would like to create the spectrogram for each individual file using SoX. Usually, for a single file, I do this:

    sox audiofile.flac -n spectrogram

However, I don't know how to extend this method to more than one file. Ideally, I would like each output `.png` file to have a filename matching its audio file: for example, `audiofile1.png` for `audiofile1.flac`, `audiofile2.png` for `audiofile2.flac`, and so on.

Does anybody know how to do this?
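
As a starting point, a shell loop over the files could look like the sketch below. It assumes the inputs are `.flac` files in the current directory and relies on the `-o` option of SoX's `spectrogram` effect to name the output image; `png_name` is a hypothetical helper introduced only for this illustration.

```shell
# Hypothetical helper: swap the file extension for .png.
png_name() {
    printf '%s.png\n' "${1%.*}"
}

# Render one spectrogram per FLAC file in the current directory.
for f in ./*.flac; do
    [ -e "$f" ] || continue    # the glob matched nothing
    sox "$f" -n spectrogram -o "$(png_name "$f")"
done
```

With this naming, `audiofile1.flac` would produce `audiofile1.png` next to it. Quoting `"$f"` keeps file names with spaces intact.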

Carl Rojas (1139 rep)

May 3, 2015, 09:19 PM • Last activity: May 1, 2025, 10:20 PM

0 votes

0 answers

60 views

sox split by silence incorrectly detects lengths of segments

audio sox

I'm relying on this guidance to split a WAV file by silent segments, preserving all silence. My problem is that sox incorrectly detects the length of segments by approximately an order of magnitude. For example, using the following command, I get segments that only have a second or two of silence between them, rather than the ten seconds I've specified:

    sox input.wav output.wav silence -l  0   1 10.0 0.1%: newfile : restart

In order to only have splits occur where there are longer silent segments, I have been trying higher values just by trial and error -- something like 90.0 seconds appears to approximate an actual 5-7 seconds of silence.

I assume this must be something about lack of timing information in the input file, which was created by capturing sound from an ALSA loopback device and this command:

    sox -t alsa hw:1,1 -c 2 input.wav

Is there something different I can do with my workflow, either at the capture stage or the split stage, so that sox correctly identifies the length of silent segments?

Adam J. Kessel (81 rep)

Mar 13, 2025, 07:22 PM

1 votes

0 answers

66 views

How to make Microsoft Teams use a virtual microphone

audio pipewire sox microsoft-teams

My goal is to alter my voice in a particular way and redirect it to MS Teams.

I managed to pipe my voice from my headphones' microphone to a virtual microphone named `wh40k` like this: `parec -d $PHYSICAL_MIC | sox $FILTERS | pacat -d $VIRTUAL_MIC`.

In detail, this is my code:



    VIRTUAL_MIC="wh40k"
    # Hard-coded headphones microphone name which I got from pactl list
    PHYSICAL_MIC="alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Headphones__sink"
    
    # Check for required software: quit if any not found
    for CMD in parec sox pacat; do
        if ! command -v "$CMD" &>/dev/null; then
            echo "Error: $CMD not found.."
            exit 1
        fi
    done
    
    cleanup() {
        echo "Removing virtual microphone $VIRTUAL_MIC..."
        pactl unload-module "$MODULE_ID"
        exit 0
    }
    
    if ! pactl list sinks short | grep -q "$VIRTUAL_MIC"; then
        echo "Creating virtual microphone '$VIRTUAL_MIC'..."
        MODULE_ID=$(pactl load-module module-null-sink \
            sink_name=$VIRTUAL_MIC \
            sink_properties=device.description="$VIRTUAL_MIC")
        if [[ -z "$MODULE_ID" ]]; then
            echo "Error: could not create virtual microphone '$VIRTUAL_MIC'."
            exit 1
        fi
    else
        echo "Virtual microphone '$VIRTUAL_MIC' already exists."
    fi
    
    trap cleanup INT TERM
    
    echo -e "\nConfigured sinks:"
    pactl list sinks short
    
    echo -e "\nConfigured sources (monitor):"
    pactl list sources short
    
    echo -e "\nConfiguring '$VIRTUAL_MIC' audio pipeline..."

    # $PITCH and other filters below are vars like "pitch -750"
    parec -d "$PHYSICAL_MIC" --raw --format=s16le --rate=44100 --channels=1 | \
        sox -t raw -r 44100 -e signed -b 16 -c 1 - -t raw - \
            $PITCH $OVERDRIVE $GAIN $REVERB $EQUALIZER $BASS $TREBLE $CHORUS $ECHO | \
                pacat --raw --device="$VIRTUAL_MIC".monitor

I can confirm that `parec | sox` works because I tested it with `parec | sox | pacat`, and the sox filters were correctly applied.

However, I can't use the `wh40k` virtual microphone in MS Teams: when I select MS Teams > Settings > Devices > Microphone: Monitor of wh40k, I can't hear my voice in the Teams test call.

Any clue on how I could troubleshoot this?

My versions:
 - MS Teams 1.5.00.23861 (64-bit)
 - Linux Mint 21, kernel 6.8.0-40-generic
 - pipewire 1.2.7

elmazzun (169 rep)

Jan 2, 2025, 09:59 PM • Last activity: Jan 5, 2025, 01:21 PM

2 votes

1 answers

871 views

sox gives inconsistent results mixing stereo to mono

audio sox

Whichever of the following ways of mixing stereo to mono I try *repeatedly*, the resulting files are always different (md5 hashes do not match):

    sox stereo.wav -c 1 mono.wav
    sox stereo.wav mono.wav remix 1,2
    sox stereo.wav mono.wav remix 1-2

The differences appear to be in the binary meat of the audio, not in the headers. If I use, say, ffmpeg, the resulting file is always the same:

    ffmpeg -hide_banner -i stereo.wav -ac 1 mono.wav

Does sox use some sort of randomness in the mixing-down algorithm? Why?

Greendrake (459 rep)

Jan 19, 2022, 09:33 AM • Last activity: Sep 3, 2024, 12:56 PM

8 votes

3 answers

19984 views

sox returns an error when I try to handle mp3 files

mp3 sox

Hello, so here is the deal. I used:

    $ yum install sox 

To install it in CentOS 6. After that I did a quick test:

    $ sox test.mp3 test.amr

and this is what it returns:

    $ sox formats: no handler for file extension `mp3'

I need this done with `sox`, not `lame`, because I will need to use it for mixing and other functions that are not available with `lame`.

cppit (181 rep)

Nov 2, 2013, 07:56 AM • Last activity: Feb 5, 2024, 06:52 PM

0 votes

1 answers

95 views

Playing a sound every 15 minutes at a specific volume

audio volume sox

I have monitor speakers that go into a sleep state when they don't receive a certain level of sound input.
The speakers often turn off when I am watching/listening to something quietly.

To remedy this, I have a `crontab` entry set to play a 10Hz wave file every 15 minutes using SoX `play`:

    */15 * * * * XDG_RUNTIME_DIR=/run/user/1000 /usr/bin/play -v 1.25 /home/USER/rmas/10Hz.wav

This works much of the time, but I have to be sure to have my system volume up around 40+% and my application volume super low.

This is due to the -v (--volume) option being **relative** to system volume.

I need a way to do this in an absolute fashion,
where the sound is played at 100% (or a user-specified variable) volume without affecting the sound level of anything else. 
And I do not want any windows popping up
(action should occur completely in the background).

Am I able to do this with SoX, or is there another route I could take?

* Pop!_OS 22.04
* KRK Rokit 5 Monitors (Gen 2)
* Scarlett 2i2 USB Audio Interface routing sound to speakers

keenwa (1 rep)

Dec 12, 2023, 11:07 AM • Last activity: Dec 14, 2023, 10:17 AM

0 votes

0 answers

1108 views

Low-latency realtime sound filtering with Pulseaudio and Sox

linux audio pulseaudio sox

I'm using Linux for audio experimentation, so PulseAudio and ALSA. I'm having trouble achieving a consistent low latency.

I had an idea that I could cover up unwanted environmental noises (such as sirens or backup alarms) by using my computer to create noise in the same part of the frequency spectrum.

A very simplistic way of doing this is to multiply input samples by some power-law noise such as "pink noise" (1/f) or "brown noise" (1/f^2) and play the result out of a speaker. I think this corresponds to a convolution in the frequency domain, so it should have the effect of making frequency spikes wider and less annoying.

I'm not a big fan of PulseAudio, but it is the standard application-level audio framework in Linux, and it seems to be the easiest tool which is able to do variable-rate resampling. Resampling is used to correct clock skew when working with multiple devices (in this case microphone and speakers). I got some advice for reducing latency [for PulseAudio here](https://juho.tykkala.fi/Pulseaudio-and-latency) and [for Unix pipes here](https://unix.stackexchange.com/a/730565/118662).

I have a Sox command which implements the filter effect I want, but I can't figure out how to get PulseAudio's input and output to have a predictable latency. The following simplified (Zsh) pipe command just sends samples directly from the microphone to speakers, but sometimes when I run it the subjective latency is almost negligible, and sometimes the latency is near 500ms (for example if I snap my fingers in front of the microphone, I might hear it immediately on some runs, and on other runs it'll echo twice a second). These differences occur when I just restart the pipe; I don't have to restart the PulseAudio server.

    PFMT=(--rate 48000 --format s16le --channels 1)
    pacat -r --latency-msec=1 $PFMT | pacat --latency-msec=1 $PFMT

I tried putting stdbuf -o64 -i64 before each pacat, in case the problem was caused by the Unix pipe buffer, but this doesn't seem to change the behavior.

I can always kill the pipe and restart it, and keep repeating until I get the pipe started up with low latency, but it would be nice to have a solution that works every time. I can't figure out from the PulseAudio logs what the difference is between the high-latency runs and the low-latency runs.

From a low-latency run (the first line is a virtual "monitor" source):

    $ (pactl list sources; pactl list sinks) | grep Latency
    Latency: 0 usec, configured 1999818 usec
    Latency: 4193 usec, configured 66000 usec
    Latency: 2861 usec, configured 15012 usec

From a high-latency run:

    $ (pactl list sources; pactl list sinks) | grep Latency
    Latency: 0 usec, configured 1999818 usec
    Latency: 505 usec, configured 66000 usec
    Latency: 3305 usec, configured 15012 usec

Here are some relevant lines in the PulseAudio config, which I copied from Internet advice. I'm not sure that any of them are having an effect.

    # .config/pulse/daemon.conf
    ;; https://forums.linuxmint.com/viewtopic.php?t=44862 
    default-fragments = 2
    default-fragment-size-msec = 5

    high-priority = yes
    rlimit-nice = 31
    nice-level = -11
    realtime-scheduling = yes
    rlimit-rtprio = 9
    realtime-priority = 9

I'm running a version of PulseAudio which is a few years old, so please let me know if I'm running into some known bug that has already been fixed.

Here is the full noise-multiplying command (Zsh again) that I want to run, it suffers from the same unpredictable latency problem as the simple pipe above. It is not really relevant to the latency problem I'm currently encountering, except that this is why I'm not just using PulseAudio's module-loopback to route samples from the source to the sink.

    SFMT=(-e signed -r 48000 -b 16 -c 1 -t raw)
    PFMT=(--rate 48000 --format s16le --channels 1)
    STDB=(stdbuf -o64 -i64)
    sox -n $SFMT - synth brownnoise vol 0.01 | sox --buffer 64 -T $SFMT - $SFMT ($STDB pacat --latency-msec=1 $PFMT) vol 100

Thanks.

----
Update, 5 December 2023:

To answer some questions in the comments about my audio setup and about ALSA. The input is a USB microphone "JMTek, LLC. USB PnP Audio Device", output is my laptop's built-in 3.5mm audio jack, via an AMD audio controller.

If I use ALSA (aplay, arecord) then it seems to achieve low latency more consistently, but I get strange messages like "underrun!!! (at least 806.752 ms long)" even when I use -B to shorten the buffer size to 50 microseconds. Also, unlike with Pulse, the apparent latency with ALSA sometimes will gradually increase over several minutes (from say 10ms to 100ms). Like Pulse, ALSA also has the problem of random latency changes from one invocation to another - sometimes I get 400ms - but as I said it seems to be more often that the latency is small with ALSA. Here is the shell code I use to experiment with ALSA. Note that I'm reading and writing directly from/to the devices, without using the 'plug' PCM to change rates or channel counts.

    #!/bin/zsh
    SFMT=(-e signed -r 48000 -b 16 -c 1 -t raw)
    AFMT=(-r 48000 -f S16_LE -c 1)
    AOPT=(-B 50 $AFMT)
    STDB=(stdbuf -o64 -i64)
    sox -n $SFMT - synth brownnoise vol 0.01 | sox --buffer 64 -T $SFMT - $SFMT ($STDB aplay $AOPT -c 2 -Dhw:1) vol 300

Example output from the above ALSA experiment:

    Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono
    Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
    underrun!!! (at least 18304.245 ms long)
    underrun!!! (at least 870598.858 ms long)
    underrun!!! (at least 241414.917 ms long)
    underrun!!! (at least 1.451 ms long)
    underrun!!! (at least 12.687 ms long)
    overrun!!! (at least 4.934 ms long)
    underrun!!! (at least 10.253 ms long)
    underrun!!! (at least 11.326 ms long)
    overrun!!! (at least 0.549 ms long)
    ...

                                

Metamorphic (1219 rep)

Dec 3, 2023, 10:42 PM • Last activity: Dec 6, 2023, 06:28 AM

1 votes

2 answers

1579 views

Detecting sound / silence on a sox pipe?

pipe sox

I am trying to keep a sox pipe input from a sound card open and execute a player command only when there is sound in the pipe (without killing the pipe or using a file). This could easily be achieved with `sox silence 1 0.1 5% -1 0.1 5%` for files, but when I use it for a pipe output it doesn't work. This is the sox record command I'm using:

    /bin/sox -V2 -q \
        -r 48000 -b 16 -c 2 -t alsa hw:CARD=sndrpihifiberry,DEV=0 \
        -t wav -r 44100 -b 16 -c 2 - \
        silence 1 0.1 0.1% -1 2 0.5% \
        > "$streamFile" &

I would like to attach and detach a player to the pipe only when there's a sound in the pipe. Something like:

    while [ true ]; do

        until [ WAIT FOR SOUND ]; do
            TEST FOR SOUND IN THE PIPE
        done

        echo "Sound Detected starting @ $(date)" >> $log
        /usr/bin/player > $log

    done

Any ideas?

nadigo (11 rep)

Jan 6, 2022, 11:46 PM • Last activity: Nov 28, 2023, 04:36 PM

0 votes

1 answers

164 views

Capture output from SOX "-n stat"

pipe stdout sox

I am trying to capture/pipe the output from the following:

    arecord -f S16_LE -qd 5 file && sox file -n stat

Output:

    Samples read:              8000
    Length (seconds):      1.000000
    Scaled by:         2147483647.0
    Maximum amplitude:     0.992188
    Minimum amplitude:    -0.992188
    Midline amplitude:     0.000000
    Mean    norm:          0.093221
    Mean    amplitude:    -0.015338
    RMS     amplitude:     0.232947
    Maximum delta:         0.617188
    Minimum delta:         0.000000
    Mean    delta:         0.001067
    RMS     delta:         0.009643
    Rough   frequency:           52
    Volume adjustment:        1.008


I need to capture the data to convert it to JSON. The issue is that SoX seems to defy every method I would normally use to capture or pipe stdout. Any recommendations?
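
One thing worth checking here: SoX writes the `stat` report to stderr rather than stdout, which would explain why an ordinary pipe captures nothing. The sketch below assumes a flat JSON object with the report labels as keys is acceptable and that only numeric values are wanted; `stat_to_json` is a hypothetical helper, not part of SoX.

```shell
# Hypothetical helper: turn "Label: number" lines on stdin into a flat JSON object.
stat_to_json() {
    awk -F': *' '
        NF == 2 {
            key = $1; val = $2
            gsub(/ +/, " ", key)          # collapse runs of spaces in the label
            sub(/[ \t\r]+$/, "", val)     # drop trailing whitespace
            if (val ~ /^-?[0-9]+(\.[0-9]+)?$/)
                printf "%s\"%s\": %s", (n++ ? ", " : "{"), key, val
        }
        END { print (n ? "}" : "{}") }
    '
}
```

It would be used as `sox file -n stat 2>&1 | stat_to_json`, where `2>&1` sends the stderr report into the pipe. Lines that do not look like `Label: number` are skipped.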
                                

Mark Miller (3 rep)

Nov 17, 2023, 05:08 PM • Last activity: Nov 17, 2023, 05:29 PM

2 votes

1 answers

1013 views

Why does `read` fail saying "read error: 0: Resource temporarily unavailable"?

bash read sox

**script**

    #!/bin/bash --

    # record from microphone
    rec --channels 1 /tmp/rec.sox trim 0.9 band 4k noiseprof /tmp/noiseprof &&

    # convert to mp3
    sox /tmp/rec.sox --compression 0.01 /tmp/rec.mp3 trim 0 -0.1 &&

    # play recording to test for noise
    play /tmp/rec.mp3 &&

    printf "\nRemove noise? "
    read reply

    # If there's noise, remove it
    if [[ $reply == "y" ]]; then
      sox /tmp/rec.sox --compression 0.01 /tmp/rec.mp3 trim 0 -0.1 noisered /tmp/noiseprof 0.1
      play /tmp/rec.mp3
    fi

**Errors with**: `read error: 0: Resource temporarily unavailable`

**But** the script works if I use the `-e` flag on `read` to enable readline.

Pound Hash (327 rep)

Nov 16, 2023, 10:34 PM • Last activity: Nov 17, 2023, 01:21 AM

5 votes

2 answers

4324 views

Piping Sox and FFMPEG together

pipe ffmpeg sox

I'm processing a variety of audio files in a bunch of different formats and I'd like to unify their format and configuration using FFMPEG and SoX.

There are two steps to my process:

 1. Convert the file, whatever it may originally be, to a PCM 16-bit little-endian WAV file:   
    ffmpeg -i input.wav -c:a pcm_s16le output.wav
 2. Process the file in Sox to make it conform to the sample rate and channel count that we need:    
    sox input.wav output.flac channels 2 rate 44.1k

I'd ideally like to pipe these two commands together so as to avoid creating an unnecessary file. 

I'm having a lot of trouble actually getting the format to work properly, though. 

SoX complains that it needs to explicitly know the format of the incoming audio, which is something that I don't know at execution time. I know the format of the PCM audio, but I'm not sure of the channel count or the sample rate of the incoming audio.

Is there a way to pipe these two commands together, or better, to only have to use one tool for the job? 

The reason I've used two tools rather than just trying to do it with one:

### FFMPEG ###
 * Not sure if there's a way to safely convert a mono audio stream to a stereo audio stream by duplicating the channels. (SoX does this natively.)
 * Not sure how to change sample rate. (SoX does this natively.)
 * Not sure how to output to FLAC using the best compression rate. 
 

### SoX ###
 * Not able to do audio format detection as well as FFMPEG does. If I have a file without an extension, SoX asks me to manually specify the format, which doesn't work at all for my application.
                                

Naftuli Kay (41346 rep)

Nov 5, 2013, 12:03 AM • Last activity: Nov 8, 2023, 10:20 AM

0 votes

1 answers

136 views

How to bulk edit wav files?

audio sox wav

I need to cut off the first, say, 3 seconds from a batch of wav files.
Is there a way to do it from the command line or using a Linux-native program?

Thanks.

black-clover (383 rep)

Aug 21, 2023, 03:59 AM • Last activity: Sep 19, 2023, 10:38 PM

-1 votes

2 answers

358 views

How to fade out batch wav files?

sox

I need to apply a fade-out of 2 seconds to a batch of wav files (with names including spaces, like `C 0 120-127.wav`).

The file lengths vary, but I need a fade-out of 2 seconds from the end of each file, regardless of the file length.

I know this can be done with sox, but the tutorials I checked include other actions (fade in or convert) or are for single files where you specify input and output, and this is confusing (I'm a sox newbie).

I need the files names to remain the same, although piping the output to another folder is also acceptable.

Thanks.
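
A possible batch approach is sketched below. It assumes a SoX version whose `fade` effect accepts `-0` as the stop position (meaning the end of the audio), writes the results to a separate directory so that the file names stay the same, and uses a made-up directory name `faded`; `faded_path` is a hypothetical helper for illustration only.

```shell
# Hypothetical helper: keep the input file name, but place it in directory $2.
faded_path() {
    printf '%s/%s\n' "$2" "$(basename -- "$1")"
}

outdir=faded    # assumed output directory name
for f in ./*.wav; do
    [ -e "$f" ] || continue    # the glob matched nothing
    mkdir -p "$outdir"
    # fade-in length 0, stop position -0 (end of audio), fade-out length 2 s
    sox "$f" "$(faded_path "$f" "$outdir")" fade 0 -0 2
done
```

Because every expansion is quoted, names such as `C 0 120-127.wav` should pass through unchanged.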

black-clover (383 rep)

Sep 19, 2023, 05:38 PM • Last activity: Sep 19, 2023, 10:32 PM

1 votes

1 answers

72 views

How to extend sustain in batch wav files?

audio ffmpeg sox

I have a batch of sound samples which are too short (2.15 sec), and I want to extend the sustain to a total of about 10 seconds, meaning stretch the last 0.50 second of the file to 10 seconds.
I can do this on each single file in Audacity with Paulstretch, but I was wondering if there's a way to do so in batch from the command line.

Here is the original sample:

original 

and here is the result I'd want:

stretched

black-clover (383 rep)

Sep 18, 2023, 06:09 PM • Last activity: Sep 18, 2023, 10:42 PM

1 votes

1 answers

70 views

Divide frequency of live audiostream from an ultrasound microphone

pulseaudio sox

I want to record ultrasound with a microphone, divide the frequency and listen to the transposed sound. I start with a normal microphone for tests.

I tried sox with rec and play so far. It works, but the latency is 2 seconds.

    rec -r 48000  -c 1 -t wav - pitch -1000  | play -t wav -

How can I improve this line and reduce the latency?

My audio setup is

     # API: ALSA v: k6.1.27-gentoo status: kernel-api
     # Server-1: PulseAudio v: 16.1 status: active

Jonas Stein (4298 rep)

Aug 7, 2023, 01:19 AM • Last activity: Aug 7, 2023, 05:48 PM

3 votes

0 answers

1297 views

How to record audio device with more than 2 channels using SoX?

linux pulseaudio ffmpeg sox pipewire

I have multiple USB audio interfaces that each have 10, 18 or even 32 input channels. Mainly used to record every instrument of a band into a separate track. I record in raw wav format (s32le @48kHz) which means I need crazy amounts of storage space if I record all channels. For that reason I need to only record the channels that I actually want to record. I found this to be possible using SoX by specifying the amount of channels I need with the -c flag and using the remix "effect" to select the channels to be recorded. And this little proof of concept shows me that it does work:

    $ export SOURCE_NAME="alsa_input.usb-Behringer_FLOW_8_03-FF-02-11-55-44-00.Direct__hw_F8__source"

    # Record only 1 channel(s) (-c 1) - The channel(s) to record: 2
    $ sox -t pulseaudio "${SOURCE_NAME}" -r 48000 -c 1 -b 16 -e signed-integer output.w64 remix 2

Scaling it up, however, doesn't work:

    # Record only 4 channel(s) (-c 4) - The channel(s) to record: 1 2 6 8
    $ sox -t pulseaudio "${SOURCE_NAME}" -r 48000 -c 4 -b 32 -e signed-integer output.w64 remix 1 2 6 8

For some reason SoX only recognizes the first two channels:

    Input File     : 'alsa_input.usb-Behringer_FLOW_8_03-FF-02-11-55-44-00.Direct__hw_F8__source' (pulseaudio)
    Channels       : 2
    Sample Rate    : 48000
    Precision      : 16-bit
    Sample Encoding: 16-bit Signed Integer PCM

    sox FAIL remix: too few input channels

FFmpeg also fails when recording more than 2 channels:

    $ ffmpeg -f pulse -i "${SOURCE_NAME}" -c:a pcm_s32le -ar 48000 -ac 10 -channel_layout 0x3ff output.w64

FFmpeg throws this error:

    Guessed Channel Layout for Input Stream #0.0 : stereo
    Input #0, pulse, from 'alsa_input.usb-Behringer_FLOW_8_03-FF-02-11-55-44-00.Direct__hw_F8__source':
      Duration: N/A, start: 1689504465.730127, bitrate: 1536 kb/s
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
    Multiple -ac options specified for stream 0, only the last option '-ac 10' will be used.
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s32le (native))
    Press [q] to stop, [?] for help
    [pcm_s32le @ 0x55bc4d7d6040] Channel layout '10 channels (FL+FR+FC+LFE+BL+BR+FLC+FRC+BC+SL)' with 10 channels does not match number of specified channels 2
    Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
    Conversion failed!

Double checking with ffprobe:

    $ ffprobe -f pulse -i "${SOURCE_NAME}"
    Input #0, pulse, from 'alsa_input.usb-Behringer_FLOW_8_03-FF-02-11-55-44-00.Direct__hw_F8__source':
      Duration: N/A, start: 1689504633.181940, bitrate: 1536 kb/s
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s

So my next thought was that PulseAudio itself has a bug. But we can easily check for that using the `pactl` utility:

    $ pactl list sources
    Source #1414
        ...
        Name: alsa_input.usb-Behringer_FLOW_8_03-FF-02-11-55-44-00.Direct__hw_F8__source
        ...
        Sample Specification: s32le 10ch 48000Hz
        Channel Map: aux0,aux1,aux2,aux3,aux4,aux5,aux6,aux7,aux8,aux9
        ...
        Volume: aux0: 48287 /  74% / -7.96 dB,   aux1: 48287 /  74% / -7.96 dB,   aux2: 48287 /  74% / -7.96 dB,   aux3: 48287 /  74% / -7.96 dB,   aux4: 48287 /  74% / -7.96 dB,   aux5: 48287 /  74% / -7.96 dB,   aux6: 48287 /  74% / -7.96 dB,   aux7: 48287 /  74% / -7.96 dB,   aux8: 48287 /  74% / -7.96 dB,   aux9: 48287 /  74% / -7.96 dB
                balance 0.00
        ...
        Properties:
            ...
            audio.channels = "10"
            ...

This makes it quite obvious that PulseAudio is aware of all 10 input channels of that USB audio interface. So I tried using PulseAudio's tool parecord:

    $ parecord --device=${SOURCE_NAME} --format=s32le --rate=48000 --channels 10 --file-format=w64 output.w64
    Warning: failed to write channel map to file.

Although it produced this warning (whatever it means), it did actually record all 10 channels successfully. I was even able to select specific channels like this:

    parecord --device=${SOURCE_NAME} --format=s32le --rate=48000 --channels 4 --channel-map=aux0,aux1,aux5,aux7 --file-format=w64 output.w64

So why is this not working with SoX or FFmpeg? I also tried telling SoX to use ALSA instead, but that doesn't work at all:

    $ sox -t alsa "plughw:CARD=F8,DEV=0" -r 48000 -c 4 -b 32 -e signed-integer output.w64 remix 1 2 6 8
    sox FAIL formats: can't open input  `plughw:CARD=F8,DEV=0': snd_pcm_open error: Device or resource busy

I guess access via ALSA just doesn't work when you have PipeWire and PulseAudio running on top of it. I checked whether I can record via ALSA's `arecord` utility, but I get the same "device busy" error:

    $ arecord -D plughw:CARD=F8,DEV=0 -r 48000 -c 10 -f S32_LE -t wav output.wav
    arecord: main:867: audio open error: Device or resource busy

Recording directly with PipeWire's `pw-record` utility worked just fine, though:

    $ pw-record --target ${SOURCE_NAME} --format s32 --rate 48000 --channels 10

And I was also able to select the channels I want to record:

    $ pw-record --target ${SOURCE_NAME} --format s32 --rate 48000 --channels 4 --channel-map=aux0,aux1,aux5,aux7 output.w64

I looked into whether SoX supports PipeWire directly, but unfortunately that doesn't appear to be the case. Since PulseAudio does actually see all channels, I don't understand why SoX and FFmpeg are failing here. Any ideas?

Forivin (1193 rep)

Jul 16, 2023, 11:54 AM • Last activity: Jul 16, 2023, 02:44 PM

0 votes

1 answers

933 views

SoX - Mix original signal with effected signal

command-line audio sox

Is there an option in SoX effects processing to mix the wet and dry signals instead of only outputting the wet?

For example, say my effects chain is overdrive into pitch shift:

    sox in.wav out.wav overdrive 0.5 gain -0.5 pitch 700

Except I don't want the final file to be _just_ the shifted signal. I want a mix of the distorted, shifted signal and the distorted, unshifted signal.

Does SoX support this somehow?

vomitHatSteve (3 rep)

Oct 9, 2017, 10:13 PM • Last activity: May 2, 2023, 05:57 PM

1 votes

0 answers

294 views

Find the peak frequency in Khz of audio file using sox or python for mp3 flac and wav files

python3 sox

I want to batch-analyze audio files for peak frequencies using Python or SoX.

(image 1: file.mp3 with constant bit rate, https://i.sstatic.net/91a1u.png)

(image 2: file.mp3 with variable bit rate)

The spectrograms above were created using SoX.
I want to extract the peak frequency, for example 20 kHz in image 1.

Arjun (11 rep)

Jan 13, 2023, 05:17 PM

-1 votes

1 answers

288 views

sox: How To Trim Audio From a .mp4

audio sox

Using sox, how do I remove 0:20-0:25 of a .mp4 file? So far I have:

    sox filename filename2 trim 20-25

mister mcdoogle (505 rep)

Jan 3, 2023, 05:23 AM • Last activity: Jan 3, 2023, 09:56 AM
