Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

1 answers

1666 views

If I have a json string how do I calculate the number of bytes needed when stored?

java json storage character-encoding byte

I have a formatted JSON string displayed on a web page. What I am trying to understand is how many bytes this JSON string requires.  
If I copy it and pipe it to wc -c I get 1000, which is the number of characters, but I don't think this means that the JSON string is 1000 bytes, as I have seen suggested while googling around.  
The reason I am confused is the following:  
In Java, for instance, a String is composed of chars and each char is 2 bytes to support UTF-8. JSON also supports UTF-8, so I am not sure whether I should consider the JSON string to require 2000 bytes, or is there a way to figure this out?
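One way to check this, sketched under the assumption that the copied text is UTF-8: `wc -c` counts bytes rather than characters, so its result is already the storage size in whatever encoding was piped in. Java's 2-byte `char` is a UTF-16 code unit and only describes the in-memory representation. The helper names `byte_len` and `byte_len_utf16` are made up for this illustration.

```shell
# Bytes of a string exactly as given (printf adds no trailing newline).
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Bytes the same text would need as UTF-16 without a byte-order mark.
# Assumes the input is valid UTF-8 and that iconv is installed.
byte_len_utf16() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE | wc -c | tr -d ' '
}

json=$(printf '{"name":"M\303\274ller"}')   # \303\274 is "ü" in UTF-8
byte_len "$json"                            # 17 characters, 18 bytes in UTF-8
```

Only non-ASCII characters make the UTF-8 byte count larger than the character count; a pure ASCII JSON string of 1000 characters occupies 1000 bytes as UTF-8.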
                                

Jim (1479 rep)

Feb 22, 2024, 06:27 PM • Last activity: Mar 1, 2024, 09:17 PM

3 votes

1 answers

1058 views

What is "length" of a string in Bourne shell compatibles' `${#string}`?

bash zsh unicode bourne-shell byte

Arising from [this](https://unix.stackexchange.com/questions/685602/count-bytes-of-filename/685603?noredirect=1#comment1295723_685603) discussion: When I have (zsh 5.8, bash 5.1.0)

var="ASCII"
echo "${var} has the length ${#var}, and is $(printf "%s" "$var"| wc -c) bytes long"

the answer is simple: these are 5 characters, occupying five bytes. Now, var=Müller yields

Müller has the length 6, and is 7 bytes long

Which suggests the ${#} operator counts codepoints, not bytes. This is a bit unclear [in POSIX](https://pubs.opengroup.org/onlinepubs/9699919799.2016edition/utilities/V3_chap02.html#tag_18_06_02) , where they say it counts "characters". This would be clearer if characters in POSIX C weren't octets, normally. Anyways: Nice! Kind of good, seeing that LANG==en_US.utf8. Now,

var='🧜🏿‍♀️'
echo "${var} has the length ${#var}, and is $(printf "%s" "$var"| wc -c) bytes long"

🧜🏿‍♀️ has the length 5, and is 17 bytes long

Soooo, we decompose "Mermaid of dark skin color" into the Unicode codepoints

1. Merperson
2. Dark skin tone
3. Zero-Width Joiner
4. Female
5. Print the previous character as emoji

Fine, so we're really counting Unicode codepoints!

var="e\xcc\x81"
echo "${var} has the length ${#var}, and is $(printf "%s" "$var"| wc -c) bytes long"

é has the length 9, and is 9 bytes long

(of course, my console font decided that the ´ combines with the following space, not the preceding e. The latter would be correct. But let's leave my rage about that for somewhen else.) Um, a slight "wat" is in order here.

> printf "e\xcc\x81"|wc -c
3
> printf "%s" "${var}" |wc -c
9
> echo -n ${var} |wc -c
3
> echo "${var} has the length ${#var}, and is $(printf "%s" "$var"| wc -c) bytes long"
é has the length 9, and is 9 bytes long
> printf "%s" "${var}" |xxd
00000000: 655c 7863 635c 7838 31                   e\xcc\x81

Here's where I give up. echo $var, echo ${var} and echo "${var}" all "correctly" emit three bytes. However, echo ${#var} tells me it's 9 characters. Where is this documented/standardized, and what are the rules for all this?
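A possible explanation, offered as a sketch rather than a definitive answer: inside double quotes the shell keeps the backslash sequences as literal text, so the variable really holds nine bytes. The three-byte result only appears when something expands the escapes on output, such as `printf` in its format argument or zsh's builtin `echo`. The snippet uses the octal form `\314\201` of the same two bytes, because POSIX `printf` only guarantees octal escapes.

```shell
var="e\314\201"                      # nine literal characters: e \ 3 1 4 \ 2 0 1

# As data: the backslash sequences are passed through untouched.
printf '%s' "$var" | wc -c           # 9

# As a format string: printf expands the octal escapes into two bytes.
printf "$var" | wc -c                # 3

# Store the real bytes in the variable instead of the escape text.
var=$(printf 'e\314\201')
printf '%s' "$var" | wc -c           # 3
```

With the real bytes stored, `${#var}` and the byte count describe the same string, and any remaining difference comes from the locale's idea of a character.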

Marcus Müller (47107 rep)

Jan 9, 2022, 12:36 PM • Last activity: Jan 30, 2023, 04:42 PM

0 votes

1 answers

2448 views

Convert variable from little endian to big endian

linux bash shell hex byte

Working in Bash, I have a hex variable that I must convert from little endian to big endian.
I am new to the entire concept and only learned about it about 20 minutes ago, so please bear with me.

My script determines a hex variable that undergoes a few changes: decimal, signed 2's complement, and division by 8.

Before everything though it must go through little endian to big endian conversion (I may be confusing the two but my example below should clarify)

EXAMPLE:
1. Hex Value: 0080
   After Conversion: 8000

2. Hex Value: 9800
   After Conversion: 0098

3. Hex Value: 1234
   After Conversion: 3412

I believe that this is a 16-bit hex variable, as it is always 4 digits.
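For a value that is always exactly four hex digits, the conversion amounts to swapping the two digit pairs. A minimal sketch, with `swap16` as a made-up helper name:

```shell
# Swap the two bytes of a 16-bit value written as exactly four hex digits.
swap16() {
    printf '%s\n' "$1" | sed 's/^\(..\)\(..\)$/\2\1/'
}

swap16 0080   # 8000
swap16 9800   # 0098
swap16 1234   # 3412
```

Input that is not exactly four characters long passes through unchanged, so a length check may be worth adding before the value is used further.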
                                

Nir (1 rep)

Nov 15, 2022, 07:52 PM • Last activity: Nov 15, 2022, 08:25 PM

0 votes

1 answers

666 views

print byte from number in awk

awk byte

                                  I can print a byte from a string literal like:
awk 'BEGIN {print "\001"}' | cat -v

But I need to print a byte of the result of a bitwise OR.
So how can I print a byte from a number?

Gawk is ok.
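A sketch of one approach: awk's `printf "%c"` prints the character with the given numeric code when its argument is a number. The helper name `byte_from_number` is hypothetical, and the bitwise OR is done in shell arithmetic here so the example runs with any awk; in gawk the number could equally come from its `or()` built-in.

```shell
# Emit the single byte whose value is the given number (0-255).
# LC_ALL=C keeps awk from encoding values above 127 as multibyte characters.
byte_from_number() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%c", n + 0 }'
}

byte_from_number "$(( 0x40 | 0x01 ))" | od -An -tx1   # 41, i.e. "A"
```

The `n + 0` forces a numeric value; with a string argument, `%c` would print the first character of the string instead.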
                                

sedwho (5 rep)

Feb 4, 2022, 11:39 PM • Last activity: Feb 5, 2022, 12:48 AM

0 votes

2 answers

1187 views

Count bytes of filename

size byte

How can I find out how many bytes the name of a file takes up? Just the file name, not the full path.
I've tried this:

    echo 'filename.extension' | wc -c

is this right?
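A short comparison, as a sketch: `echo` appends a newline, which `wc -c` counts as an extra byte, whereas `printf '%s'` emits only the name. The helper `basename_bytes` is a made-up name for illustration.

```shell
name='filename.extension'

echo "$name" | wc -c          # 19: echo appends a newline
printf '%s' "$name" | wc -c   # 18: just the bytes of the name

# For an existing path, strip the directory part first.
basename_bytes() {
    printf '%s' "${1##*/}" | wc -c | tr -d ' '
}
```

For example, `basename_bytes /some/dir/filename.extension` would report 18; a path ending in a slash yields 0 with this simple expansion.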

Smeterlink (295 rep)

Jan 8, 2022, 11:23 PM • Last activity: Jan 9, 2022, 01:56 PM

0 votes

1 answers

101 views

Execute Program until specific amount of bytes has been returned on stdout, then terminate

shell stdout byte

Imagine I have the following program/script ./generate-infinite-byte-stream:

#!/bin/bash
echo -n 'hello'
sleep infinity

The infinite sleep command represents a network connection that may or may not deliver more data in the indefinite future that I am not interested in. I would like to have a program, let's call it take 5 that runs ./generate-infinite-byte-stream until it has output 5 bytes on stdout and then terminates it:

take 5 ./generate-infinite-byte-stream
# gives 'hello' and returns with exit code 0

Is there such a program or do I need to roll my own with popen()? The program take should also redirect stdin to the executed program. Note: head -c 5 does not do the right thing, because it does not terminate:

./generate-infinite-byte-stream | head -c 5
# this returns 'hello', but never terminates

Aside: The name Take is inspired by the https://reference.wolfram.com/language/ref/Take.html command which returns the first n elements of a list.

masterxilo (137 rep)

Jun 4, 2021, 09:48 AM • Last activity: Jun 4, 2021, 10:13 AM

11 votes

2 answers

2798 views

How do I find the first non-zero byte on a block device, with an optional offset?

bash dd block-device byte

I'm trying to find the first non-zero byte (starting from an optional offset) on a block device using dd and print its offset, but I am stuck. I didn't mention dd in the title as I figured there might be a more appropriate tool than dd to do this, but I figured dd should be a good start. If you know of a more appropriate tool and/or more efficient way to reach my goal, that's fine too. In the meantime I'll show you how far I've come with dd in bash, so far.

#!/bin/bash

# infile is just a temporary test file for now, which will be replaced with /dev/sdb, for instance
infile=test.txt
offset=0

while true; do
  byte=$(dd status='none' bs=1 count=1 if="$infile" skip=$offset)
  ret=$?

  # the following doesn't appear to work
  # ret is always 0, even when the end of file/device is reached
  # how do I correctly determine if dd has reached the end of file/device?
  if [ $ret -gt 0 ]; then
    echo 'error, or end of file reached'
    break
  fi

  # I don't know how to correctly determine if the byte is non-zero
  # how do I determine if the read byte is non-zero?
  if [ $byte ???? ]; then
    echo "non-zero byte found at $offset"
    break
  fi

  ((++offset))
done

As you can see, I'm stuck with two issues that I don't know how to solve:

a. How do I make the while loop break when dd has reached the end of the file/device? dd gives an exit code of 0, where I expected a non-zero exit code instead.
b. How do I evaluate whether the byte that dd read and returns on stdout is non-zero? I think I've read somewhere that special care should be taken in bash with \0 bytes as well, but I'm not even sure this pertains to this situation.

Can you give me some hints on how to proceed, or perhaps suggest an alternative way to achieve my goal?
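One alternative that sidesteps both issues, sketched with a hypothetical helper `first_nonzero`: dump the data as decimal byte values with `od` and let `awk` find the first value that is not 0. NUL bytes never pass through a shell variable this way, and end of input simply ends the pipeline. It still inspects every byte, so it may be slow on large devices, though far less so than one `dd` call per byte.

```shell
# Print the offset (0-based, in bytes) of the first non-zero byte.
# $1 = file or device, $2 = optional start offset (default 0).
# Exit status is 1 if only zero bytes were seen up to end of input.
first_nonzero() {
    tail -c +"$(( ${2:-0} + 1 ))" "$1" |
        od -An -v -tu1 |
        awk -v base="${2:-0}" '
            {
                for (i = 1; i <= NF; i++) {
                    if ($i != 0) { print base + n; found = 1; exit }
                    n++
                }
            }
            END { exit !found }'
}
```

Usage would look like `first_nonzero test.txt` or `first_nonzero test.txt 4096`; the `-v` flag matters because `od` otherwise collapses repeated identical lines into a `*`.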

ExploringQuest (113 rep)

Jun 1, 2021, 01:01 PM • Last activity: Jun 2, 2021, 10:53 PM

0 votes

2 answers

304 views

Is there a way to strip the high bit of each byte in a file?

sed conversion tr byte

                                  I've been trying to figure out if this can be done in sed or tr, but I can't find it.

I have a bunch of files from an old Apple II which have the high bit set on each byte. On a Mac, this results in a bunch of gibberish. Of course, I could write a program to xor $80 each byte, but I'm thinking that there MUST be a way in UNIX to do this!

Any ideas?
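A sketch of how `tr` could do this, assuming a `tr` that accepts octal ranges (GNU and BSD versions do): map the byte range 0x80-0xFF onto 0x00-0x7F, which is the same as clearing the high bit. `strip_high_bit` is a made-up name.

```shell
# Clear bit 7 of every byte by mapping 0x80-0xFF onto 0x00-0x7F.
# LC_ALL=C makes tr treat the input as single bytes.
strip_high_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

printf '\310\345\354\354\357\n' | strip_high_bit   # Hello
```

Applied to a file it would be `strip_high_bit < oldfile > newfile`; bytes that already have the high bit clear pass through unchanged.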

bjb (113 rep)

Apr 1, 2021, 10:56 PM • Last activity: Apr 1, 2021, 11:50 PM

5 votes

3 answers

9541 views

What is the difference between a byte and a character (at least *nixwise)?

escape-characters character-encoding special-characters terminology byte

                                  I understand that any character is comprised of one or more byte/s.

If I am not mistaken, at least in *nix operating systems, a character will generally (or totally?) be comprised of only one byte.

What is the difference between a byte and a character (at least *nixwise)?

variableexpander (125 rep)

Feb 23, 2021, 06:00 PM • Last activity: Feb 24, 2021, 06:00 PM

20 votes

4 answers

40315 views

Is there a oneliner that converts a binary file from little endian to big endian?

byte

                                  and vice versa.

I am running Red Hat, if relevant.
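For data made of 16-bit words, `dd conv=swab` is one candidate, since it swaps every pair of adjacent bytes. This is only a sketch under that assumption; 32-bit or 64-bit words need a different tool or a small program. `swap_bytes16` is a hypothetical wrapper name.

```shell
# Swap every pair of adjacent bytes (16-bit words) from stdin to stdout.
swap_bytes16() {
    dd conv=swab 2>/dev/null
}

printf 'abcd' | swap_bytes16   # badc
```

On a regular file this becomes `swap_bytes16 < in.bin > out.bin`, and running it twice restores the original, which covers the "vice versa" direction.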

Fermat's Little Student (585 rep)

Oct 29, 2015, 04:22 PM • Last activity: Feb 24, 2021, 06:20 AM

5 votes

5 answers

2990 views

How to count the number of bytes in a file, grouping the same bytes?

linux command-line files binary byte

                                  Example: I have the file "mybinaryfile", and the contents in hex are:

    A0 01 00 FF 77 01 77 01 A0

I need to know how many A0 bytes there are in this file, how many 01, and so on. The result could be:

    A0: 2
    01: 3
    00: 1
    FF: 1
    77: 2

Is there some way to make this count directly in shell or do I need to write a program in whatever language to do this specific task?
                                

Lawrence (329 rep)

Jun 28, 2019, 04:50 PM • Last activity: Mar 31, 2020, 11:56 AM

1 votes

4 answers

1569 views

How to count the number of bytes in a very large file, grouping the same bytes?

linux command-line files binary byte

I am searching for a way to get statistics on a very large file (multiple times larger than the available RAM) that show which byte values are present in the file and how often:

    A0 01 00 FF 77 01 77 01 A0

I need to know how many A0 bytes there are in this file, how many 01, and so on. The result could be:

    A0: 2
    01: 3
    00: 1
    FF: 1
    77: 2

Therefore this question is very close to the question [How to count the number of bytes in a file, grouping the same bytes?](https://unix.stackexchange.com/questions/527521/how-to-count-the-number-of-bytes-in-a-file-grouping-the-same-bytes) but none of the existing answers work for larger files. From my understanding, all answers require RAM at least equal to the size of the file being tested (or even several times that).

Hence the answers don't work on systems with small RAM, e.g. a Raspberry for processing a multi-GB file.

Is there a simple solution that works on any file size even if we have for example only 512MB RAM available?
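A streaming approach might look like the following sketch: `od` turns the data into hex byte values and `awk` keeps one counter per distinct value, so memory use does not grow with the file size. The function name `byte_histogram` is made up, and throughput on a small board has not been measured here.

```shell
# Count how often each byte value occurs; reads stdin or the given file.
# Memory use stays constant: awk keeps at most 256 counters.
byte_histogram() {
    od -An -v -tx1 "$@" |
        awk '{ for (i = 1; i <= NF; i++) count[toupper($i)]++ }
             # %.0f avoids integer overflow in awks with a 32-bit %d
             END { for (b in count) printf "%s: %.0f\n", b, count[b] }' |
        LC_ALL=C sort
}

printf '\240\001\000\377\167\001\167\001\240' | byte_histogram
```

For the nine example bytes this prints the five lines `00: 1`, `01: 3`, `77: 2`, `A0: 2` and `FF: 1`, sorted by byte value.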
                                

Robert (163 rep)

Mar 30, 2020, 12:31 PM • Last activity: Mar 31, 2020, 12:51 AM

0 votes

1 answers

287 views

Entropy: what's the difference between bits and bytes?

encryption openssl byte

                                  If I use openssl to generate some random data (for a keyfile, for example):

	openssl rand -hex 2048 >/tmp/file

Is this 4097 bits (or bytes?) of entropy?

	-rw-rw-r-- 1 username username 4097 Oct 30 20:01 /tmp/file
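The arithmetic behind the 4097 can be sketched as follows: `openssl rand -hex N` produces N random bytes and writes each one as two hex digits, followed by a newline. The entropy figure assumes the generator delivers a full 8 bits per random byte.

```shell
n=2048                                   # argument given to: openssl rand -hex
echo "random bytes:   $n"
echo "entropy (bits): $(( n * 8 ))"      # 16384, assuming a sound generator
echo "file size:      $(( n * 2 + 1 ))"  # 4096 hex digits + newline = 4097
```

So 4097 is the size of the text encoding in bytes, not the amount of entropy: each hex digit occupies one byte on disk but carries only 4 bits.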

user318576 (3 rep)

Oct 31, 2018, 03:04 AM • Last activity: Oct 31, 2018, 03:44 AM

1 votes

2 answers

28619 views

What options does wget --report-speed take?

wget download byte

                                  When I do this command:

    wget --report-speed=type

the only *type* it accepts is bits. It won't take numbers, kilobits / kilobytes or bytes.

The help page (wget --help) says:

    --report-speed=TYPE   Output bandwidth as TYPE.  TYPE can be bits.

suggesting that the TYPE **can** be something else?

What options does it take that I haven't found, and (if this option doesn't do this) how can I force the speed to be displayed as bytes or kilobytes?

Tim (123 rep)

Dec 7, 2014, 01:13 PM • Last activity: Jul 8, 2017, 08:37 PM

22 votes

3 answers

26815 views

How do I trim bytes from the beginning and end of a file?

files trim byte

I have a file that has trash (a binary header and footer) at the beginning and end. I would like to know how to nuke these bytes. As an example, let's assume 25 bytes from the beginning and 2 bytes from the end.

I know I can use truncate and dd, but truncate doesn't work with a stream and it seems kind of kludgy to run two commands on the hard file. It would be nicer if truncate, knowing how big the file was, could cat the file to dd. Or is there a nicer way to do this?
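One possible single pipeline, sketched for a regular file only (the size has to be known up front, so it does not cover the streaming case): skip the header with `tail -c +N` and cut the footer with `head -c`. `trim_bytes` is a hypothetical helper, and `head -c` is assumed to be available as in GNU and BSD userlands.

```shell
# Print a file without its first $2 and last $3 bytes.
# Needs a regular file, because the size is read up front.
trim_bytes() {
    size=$(wc -c < "$1")
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$(( size - $2 - $3 ))"
}
```

For the example in the question the call would be `trim_bytes file 25 2 > trimmed`, which leaves the original file untouched.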

Evan Carroll (34663 rep)

May 24, 2017, 08:38 PM • Last activity: May 24, 2017, 09:57 PM

26 votes

3 answers

1536 views

Byte count of "ls -l <random file>" versus that of "wc -c <random file>"

ls wc byte

Is there any situation in which

    ls -l file.txt

does not show the same number of bytes as

    wc -c file.txt

In one script I found comparison of those two values. What could be the reason of that? Is it even possible to have different byte counts of the same file?

Rokas.ma (573 rep)

Jan 16, 2017, 02:35 PM • Last activity: Jan 16, 2017, 03:49 PM

3 votes

2 answers

740 views

How to bury an invisible mark into lines of text?

grep text byte

                                  How can I bury an **invisible** mark into random lines of text? Such a mark has to be there, though it will be invisible to someone reading that text printed out on the console.

I want to identify those lines by means of an invisible mark in order to, for instance, grep them in or out later.

I tried 0x00 without success. I expected grep to print lines matching 0x00 somewhere. But this didn't work:

    $ echo -e "a\0b" | hexdump -C
    00000000  61 00 62 0a                                       |a.b.|
    00000004
    $ echo -e "a\0b" | grep "a\0b"
                                

n.r. (2263 rep)

Dec 29, 2013, 09:53 PM • Last activity: Dec 29, 2013, 11:17 PM

-2 votes

1 answers

2045 views

How large is the Linux kernel compared to Unix?

linux byte

                                  Not in just LOC (lines of code), but in storage size, as in bytes, megabytes, gigabytes, etc.

Also, any sources where I can download the original-based Unix OS? Thanks!

thomas brain (15 rep)

Dec 6, 2013, 09:42 PM • Last activity: Dec 6, 2013, 10:37 PM

0 votes

1 answers

316 views

Why is od calculating decimal values wrong?

od hex byte

                                  This question is related to the answer from enzotib to the question: https://unix.stackexchange.com/questions/88848/how-could-i-use-bash-to-find-2-bytes-in-a-binary-file-increase-their-values-an 

This converts the two bytes into their hex value:

    $ echo -n $'\x1b\x1f' | od -tx2
    0000000 1f1b
    0000002

But now, this should give me the decimal value:

    echo -n $'\x1b\x1f' | od -tu2
    0000000  7963
    0000002

But if I convert the hex value into decimal it should be

    $ printf "%d" 0x1b1f
    6943

Why is that? Am I using od wrong for decimal output?
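The two numbers differ only in byte order, which a small sketch can make visible: reading the bytes one at a time with `od -tu1` and combining them by hand gives either value, depending on which byte is treated as most significant. The helper names `u16_be` and `u16_le` are made up; `\033\037` is the octal spelling of the bytes 0x1b 0x1f.

```shell
# Interpret the first two bytes from stdin as an unsigned 16-bit integer,
# independent of the byte order of the machine running od.
u16_be() { od -An -v -tu1 | awk '{ print $1 * 256 + $2; exit }'; }   # first byte most significant
u16_le() { od -An -v -tu1 | awk '{ print $1 + $2 * 256; exit }'; }   # first byte least significant

printf '\033\037' | u16_be   # 6943 (0x1b1f)
printf '\033\037' | u16_le   # 7963 (0x1f1b), what od -tu2 shows on a little-endian host
```

So `od -tx2` and `od -tu2` agree with each other (0x1f1b is 7963); the mismatch comes from comparing against 0x1b1f, the bytes in the order they were written.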
                                

erik (17679 rep)

Aug 30, 2013, 09:48 PM • Last activity: Aug 30, 2013, 10:18 PM

Showing page 1 of 19 total questions