
Generating unique IDs for JSON content indexing

3 votes
4 answers
2242 views
I am looking for a simple and efficient way to generate IDs for the following content using a bash script:

    {"name": "John", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"name": "John1", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"name": "John2", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"name": "John3", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}

The result should have an id field added to each record:

    {"id": "XXX", "name": "John", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"id": "XXX", "name": "John1", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"id": "XXX", "name": "John2", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
    {"id": "XXX", "name": "John3", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}

I will have approximately 5,000,000 similar records and I want to generate repeatable, predictable IDs. I am constrained by time: the file has to be processed and loaded into an SQLite database within a 20-minute window on a Linux machine.

MD5 and SHA1 are too expensive to use, unless something like GNU Parallel on 16 threads on an AMD Ryzen 1900X CPU could do it in a few minutes? With MD5, calculating 28,000 IDs took 1 min 45 seconds. With SHA1 it took 2 min 3 seconds.

I was thinking about creating a very simple ID by concatenating the values:

    JohnGatesGermany20180
    John1GatesGermany20180
    John2GatesGermany20180
    John3GatesGermany20180

What could you recommend, given that the following requirements have to be met:

- bash
- Linux
- 5,000,000 records to process
- under 20 minutes
- the ID has to be the same for the same JSON lines

Performed tests:

    #!/usr/local/bin/bash
    # Read the input line by line and generate one name-based (SHA1) UUID per line.
    while IFS= read -r line
    do
        uuid=$(uuidgen -s --namespace @dns --name "www.example.com")
    done < testfile1.txt > test3.txt

    $ time bash script.sh
    real 12m49.396s
    user 12m23.219s
    sys 4m1.417s
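For illustration, the concatenated-ID idea above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested solution for the full data set: `add_ids` is a hypothetical function name, every record is assumed to sit on its own line with the keys always in the order name, surname, country, age, height, and the values are assumed to contain no escaped double quotes.

```bash
#!/usr/bin/env bash
# add_ids is a hypothetical helper name chosen for this sketch.
# Assumptions: one JSON object per line, fixed key order
# (name, surname, country, age, height), no escaped quotes in values.
add_ids() {
    awk -F'"' '{
        # With a double quote as field separator, the five values
        # land in fields 4, 8, 12, 16 and 20.
        id = $4 $8 $12 $16 $20
        # Re-emit the record with the id field placed in front of
        # the original content (everything after the opening brace).
        printf "{\"id\": \"%s\", %s\n", id, substr($0, 2)
    }' "$@"
}

# Example with a single record on standard input.
printf '%s\n' '{"name": "John", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}' | add_ids
```

The whole file could then be processed with something like `add_ids testfile1.txt > test3.txt`. The sketch runs one awk process for the entire input instead of starting an external command for every line, which is likely where most of the time goes in a per-line loop. Plain concatenation can be ambiguous when values run into each other, so a separator between the parts may be worth adding.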
Asked by Anna (53 rep)
Aug 2, 2018, 08:43 PM
Last activity: Nov 10, 2023, 04:42 PM