Generating unique IDs for JSON content indexing
3
votes
4
answers
2242
views
I am looking for an effective and simple way to generate IDs for the following content using a bash script:
{"name": "John", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"name": "John1", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"name": "John2", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"name": "John3", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
The expected output, with XXX standing for the generated ID:
{"id": "XXX", "name": "John", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"id": "XXX", "name": "John1", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"id": "XXX", "name": "John2", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
{"id": "XXX", "name": "John3", "surname": "Gates", "country": "Germany", "age": "20", "height": "180"}
I will have approximately 5,000,000 similar records, and I want to generate a repeatable, predictable ID for each of them. I am constrained by time: the file has to be processed and loaded into an SQLite database within a 20-minute window on a Linux machine.
MD5 and SHA1 are too expensive to use, unless something like GNU Parallel on 16 threads of an AMD Ryzen 1900X CPU could manage it in a few minutes?
I have tried MD5: calculating 28,000 IDs took 1 min 45 seconds.
With SHA1 it took 2 min 3 seconds.
I was thinking about creating a very simple ID by concatenating the field values:
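The exact commands behind those timings are not shown here. As an assumption, a per-line loop of roughly the following shape would show similar behaviour, because it starts a new `md5sum` and `cut` process for every record. The function name `md5_ids` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of a per-line MD5 test (not the original script).
# Reads lines from stdin and prints one MD5 hex digest per line.
md5_ids() {
    local line
    while IFS= read -r line; do
        # Each iteration forks md5sum and cut, which is the likely cost driver.
        printf '%s' "$line" | md5sum | cut -d' ' -f1
    done
}
```

Usage would be something like `md5_ids < testfile1.txt > ids.txt`. The digest depends only on the line content, so identical lines always get the same ID.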
JohnGatesGermany20180
John1GatesGermany20180
John2GatesGermany20180
John3GatesGermany20180
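A minimal sketch of how such concatenated IDs could be produced in one pass with a single `awk` process instead of one process per line. It assumes every record has exactly the five fields shown, in that order, and that no value contains a double quote. The function name `add_concat_id` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: prepend an "id" field built from the five values of each record.
# Assumption: fixed field order and no '"' inside any value, so splitting on
# the double quote puts the values in fields 4, 8, 12, 16 and 20.
add_concat_id() {
    awk -F'"' '{
        id = $4 $8 $12 $16 $20
        # Re-emit the record with the id inserted right after the opening brace.
        print "{\"id\": \"" id "\", " substr($0, 2)
    }'
}
```

Usage would be something like `add_concat_id < testfile1.txt > test3.txt`. One caveat with plain concatenation: different records can collapse to the same ID (for example name `John1` with age `20` versus name `John` with age `120`), so a separator between the values may be safer.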
What would you recommend, given that the following requirements have to be met:
- bash
- Linux
- 5,000,000 records to process
- under 20 minutes
- the ID has to be the same for identical JSON lines
Performed tests:
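#!/usr/local/bin/bash
# Timing test: generate one name-based (SHA1) UUID per input line.
# Note: the name is a constant here, so every line gets the same UUID.
while IFS= read -r line
do
    uuid=$(uuidgen -s --namespace @dns --name "www.example.com")
    echo "$uuid"
done < testfile1.txt > test3.txt
$ time bash script.sh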
real 12m49.396s
user 12m23.219s
sys 4m1.417s
Asked by Anna
(53 rep)
Aug 2, 2018, 08:43 PM
Last activity: Nov 10, 2023, 04:42 PM