
Matching Japanese regex (simple ranges) in bash doesn't work as intended

2 votes
1 answer
197 views
I am fairly sure my regexes are correct, but they don't work in bash. I built them myself from https://unicode.org/charts/. As you will see, they work properly with awk. Here are the ranges, to spare you the need to check them yourself, especially if you don't know Japanese:

- hiragana [ぁ-ゟ]
  - ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ>゙>゚__ゝゞゟ
- katakana [゠-ヿㇰ-ㇿ!-○]
  - ゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ
  - ㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇺㇻㇼㇽㇾㇿ
  - !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○

I also have a regex to find kanji, [一-龥], and that one works as intended in bash. The ">>> wrong!" markers are comments I added to pinpoint where the problems are.

[[ "する" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
is hiragana

echo 'する' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
is hiragana

[[ "スル" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
is hiragana >>> wrong!

echo 'スル' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'

[[ "僕" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
is not hiragana

echo '僕' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'

[[ "する" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
is katakana >>> wrong!

echo 'する' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'

[[ "スル" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
is katakana

echo 'スル' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
is katakana

[[ "僕" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
is not katakana

echo '僕' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
It's as if bash considers hiragana and katakana to be equivalent, as though it converts one to the other before matching. Is that what is happening?
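One way to sidestep the question while investigating is to compare code points numerically instead of relying on a bracket range. The sketch below assumes (this is not confirmed anywhere in the post) that bash hands the range to the C library's regex engine, which may order characters by the locale's collation rules rather than by code point. The function names codepoints, has_char_in_range, is_hiragana and is_katakana are made up for illustration, and the script assumes iconv and od are installed. It covers only the [ぁ-ゟ], [゠-ヿ] and [ㇰ-ㇿ] ranges, not the halfwidth and fullwidth forms.

```shell
#!/usr/bin/env bash
# Sketch: classify text by Unicode code point rather than by a regex range,
# so the locale's collation order cannot influence the result.
# All function names here are hypothetical helpers, not bash built-ins.

# Print the code points of a UTF-8 string as space-separated decimal numbers.
codepoints() {
  local hex i out=()
  # Convert to UTF-32BE (4 bytes per character) and dump the bytes as hex.
  hex=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -v -tx1 | tr -d ' \n')
  for ((i = 0; i < ${#hex}; i += 8)); do
    out+=( "$((16#${hex:i:8}))" )
  done
  echo "${out[*]}"
}

# Succeed if any character of $1 lies in the inclusive range [$2, $3].
# This mirrors an unanchored match against a single bracket range.
has_char_in_range() {
  local cp
  for cp in $(codepoints "$1"); do
    if (( cp >= $2 && cp <= $3 )); then
      return 0
    fi
  done
  return 1
}

# U+3041..U+309F corresponds to [ぁ-ゟ].
is_hiragana() { has_char_in_range "$1" 0x3041 0x309F; }

# U+30A0..U+30FF corresponds to [゠-ヿ], U+31F0..U+31FF to [ㇰ-ㇿ].
is_katakana() {
  has_char_in_range "$1" 0x30A0 0x30FF || has_char_in_range "$1" 0x31F0 0x31FF
}

for word in する スル 僕; do
  if is_hiragana "$word"; then echo "$word: is hiragana"; else echo "$word: is not hiragana"; fi
  if is_katakana "$word"; then echo "$word: is katakana"; else echo "$word: is not katakana"; fi
done
```

If this reports スル as katakana only while [[ "スル" =~ [ぁ-ゟ] ]] still succeeds, that would point toward the locale's ordering of the range rather than the regex itself, though it would not prove it.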
Asked by Some_user (63 rep)
Apr 15, 2023, 02:25 PM
Last activity: Dec 31, 2023, 12:34 PM