
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
948 views
Transition from MongoDB Time Series Collections to InfluxDB
With version 5.0, MongoDB's specialized Time Series Collections were introduced to deal with such data. As I had already stored some sensor metadata (configuration, specification ...) in MongoDB, I decided to use these special collections to store sensor readings next to the sensor metadata. Following the docs, I used a single document for each sensor reading, like this (pseudo code):

```
{
    "timestamp": timestamp,
    "value": value,
    "metadata": {
        "sensorId": sensor_uid,
        "unit": sensor_unit,
        "type": sensor_type,
        "fromFile": reading_imported_from_file,
    },
}
```

Around 50 different sensors are read at the same time, which results in 50 documents with an equal timestamp but varying value and metadata.

I am currently working on migrating our time series data storage from MongoDB to InfluxDB, as it seems to provide a sleeker API and already includes some basic data visualization. As described above, in MongoDB I used a single document per sensor, which might be considered bad practice when using InfluxDB:

> A measurement per sensor seems unnecessary and can add a significant amount of system overhead depending on the number of sensors you have. I'd suggest storing all sensor data in a single measurement with multiple fields, [...]

Based on this, I came up with the following data structure to be passed to InfluxDB (Python dictionary pseudo code for influxdb-client):

```
{
    "time": 1,
    "measurement": measurement_name,
    "tags": {
        "location": location,
        "from_file": reading_imported_from_file,
    },
    "fields": {
        "sensor_1": reading_from_sensor_1,
        "sensor_2": reading_from_sensor_2,
        "sensor_3": reading_from_sensor_3,
    },
}
```

However, I have not figured out how to store the other metadata like sensorId, unit, or type. On the one hand, I could easily solve this by violating the aforementioned suggestion and using a single measurement per sensor. On the other hand, from a relational perspective this meta information should be tied to the sensorId and therefore be accessible from a sensor configuration/specification database using the sensorId as a key. Unfortunately, these values can change throughout a single measurement or experiment due to changing device configurations on-site, which are not reflected in the configuration database. How could I solve this issue? Am I missing something, or do I simply have to deal with this design/performance vs. ease-of-use tradeoff?
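For illustration, the dictionary layout described above maps onto InfluxDB line protocol roughly as follows — a minimal self-contained sketch with a hypothetical helper, not tied to any particular client library:

```python
def to_line_protocol(point):
    """Render a point dict (measurement/tags/fields/time) as InfluxDB line protocol."""
    tags = ",".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
    fields = ",".join(f"{k}={v}" for k, v in sorted(point["fields"].items()))
    return f'{point["measurement"]},{tags} {fields} {point["time"]}'

# Hypothetical sample point in the same shape as the question's dictionary
point = {
    "time": 1,
    "measurement": "readings",
    "tags": {"location": "lab", "from_file": False},
    "fields": {"sensor_1": 1.5, "sensor_2": 2.5},
}
line = to_line_protocol(point)
# -> 'readings,from_file=False,location=lab sensor_1=1.5,sensor_2=2.5 1'
```

The sketch makes the design question concrete: anything placed under "tags" becomes part of the (indexed) series key, so per-reading mutable values like unit or type would multiply series if stored there.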
albert (113 rep)
Feb 18, 2022, 10:24 PM • Last activity: Jan 6, 2025, 10:04 AM
3 votes
2 answers
2960 views
Choosing the right database for stock price history
The model is [(stock_id, period, ts), open, high, low, close, last, volume].

We write new prices for all stocks (120,000) each minute and delete old ones when they go out of retention time. It doesn't matter whether retention cleanup happens automatically or we run a daily cleanup process. Periods are 1 minute, 10 minutes, 1 day, 7 days, with about 1,000 to 10,000 last points of data retention for each period.

Currently there are about 200,000,000 rows (40 GB) in the Postgres table, and the performance is sometimes bad, especially if autovacuum is triggered. When we query, we usually pull the whole period to show a chart, but sometimes we access specific dates.

We are trying to understand whether there are optimizations that can be done in PostgreSQL itself, or whether some other database would work better. NoSQL databases that I am considering for testing are MongoDB and Aerospike, with the following document model. The document key would be stock_id in this case.
```
{
 1111: {
   "symbol": "aapl",
   "hist_1m": {
      12334: {
        "last": 123.1,
        "close": 123.2,
        ...
      }, ...
   },
   "hist_10m": {...},
   "hist_1d": {...},
   "hist_7d": {...},
 },
 2222: {...}
}
```
But I'm really not sure about the performance of such a model, where each sub-hash will be 1,000 to 10,000 hashes, or maybe even more in the future. In Aerospike there is a per-map maximum of write-block-size (1 MB by default); in MongoDB the limit per document is 16 MB, but that doesn't explain the performance. How fast or efficient are individual additions to large collections in MongoDB or Aerospike? Do they happen in-place, or do they require loading the whole collection into memory and rewriting it back to disk, like it would be with a PostgreSQL jsonb column? In Postgres we just do thousands of inserts and it's very fast. The performance issue happens because of the nature of infinite insert/delete: gaps in the tablespace and autovacuum. Global backfills also take quite a long time. I even thought about time series DBs like Prometheus or InfluxDB, but I really don't think they're designed for realtime high-load querying. Please suggest which database/model you think is ideal for this purpose. I searched for existing questions with the same requirement (as I think thousands of systems store similar historical data in some way). Thanks.
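As a rough sanity check on the document model above, one can estimate whether a per-stock document stays under MongoDB's 16 MB limit. The per-entry byte size below is an assumption for illustration, not a measured value:

```python
# Assumed sizes, for illustration only: each history entry holds
# open/high/low/close/last/volume as 8-byte doubles plus key/BSON overhead.
BYTES_PER_ENTRY = 6 * 8 + 32     # ~80 bytes per point (rough guess)
ENTRIES_PER_PERIOD = 10_000      # upper end of the stated retention
PERIODS = 4                      # hist_1m, hist_10m, hist_1d, hist_7d

doc_bytes = BYTES_PER_ENTRY * ENTRIES_PER_PERIOD * PERIODS
doc_mb = doc_bytes / 1024 / 1024
# ~3 MB: under the 16 MB cap, but close enough that growth needs watching
```

Under these assumptions a per-stock document fits, but a few more periods or fatter entries would bring the document uncomfortably close to the hard limit.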
Kaplan Ilya (173 rep)
Jun 16, 2022, 01:22 PM • Last activity: Dec 21, 2024, 09:03 AM
1 vote
1 answer
2250 views
InfluxDB - Tag vs Field for simple, time series data
We are new to InfluxDB, and are struggling to understand the ***query performance*** difference between a "tag" and a "field" for storing simple, time-stamped measurement values (literally, time series data). I have to imagine this is one of the most common applications for Influx, yet I'm still not clear on what the smartest method is.

We have previously logged data to MySQL and similar databases, using a structure that had columns of name/timestamp/value/unit. Each measurement became one row in that table. Obviously, this has some performance drawbacks, so we are looking for a better way.

A new InfluxDB will be installed on a **project**. We have some number of **sensors** on the **project**, each of which has a unique **identifier** (i.e. "TT-001" might be Temperature Sensor #1) and produces a single **measured value** (i.e. 104.6 degF) with a **timestamp**. These measurements are taken at regular intervals (i.e. every second). There are perhaps 500 individual measurements, all of which will be stored for periods of perhaps 10 years.

Summarizing the question: described as above, will logging data to InfluxDB as follows result in the fastest-loading ***queries***?

- Measurement = "Project_Name"
- Field Key => Measurement Name (i.e. "TT-001")
- Field Value => Measured Value (i.e. 104.6)

The queries are almost exclusively "select (tags) from [start date to end date]". Literally, 99% of these queries will be graphs of data over time using Grafana. For this application, where does the "tag" fit in, or is it not useful?

Before anybody asks: I have read the documentation, and continued looking into examples, forums, blogs, etc. I have yet to find a concise, clear answer as to the highest-performing method for storing measurements as described above, where I do not believe that "metadata" is particularly useful for our application. Really appreciate some input on this...the StackExchange community has always been able to help when I hit these kinds of road blocks!
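For concreteness, the two candidate layouts can be written out in line protocol. This is an illustrative sketch (sensor names are the question's own examples, timestamps are made up), not a benchmark:

```python
# Layout A (the question's proposal): one measurement, one field per sensor.
# All fields share a single series, so series cardinality stays at 1.
field_layout = 'Project_Name TT-001=104.6,TT-002=88.2 1609459200000000000'

# Layout B: sensor id as a tag plus a generic "value" field.
# Each distinct tag value creates its own series (one per sensor).
tag_layout = [
    'Project_Name,sensor=TT-001 value=104.6 1609459200000000000',
    'Project_Name,sensor=TT-002 value=88.2 1609459200000000000',
]

series_a = 1                 # one (measurement, tagset) pair
series_b = len(tag_layout)   # one series per sensor tag value
```

The practical difference is series cardinality: layout A keeps a single series with many fields, while layout B creates one series per sensor (500 here, which is still modest); tags buy you indexed filtering and grouping per sensor at the cost of that cardinality.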
lifeofaboat (11 rep)
Oct 5, 2020, 05:41 PM • Last activity: Sep 17, 2024, 01:03 AM
2 votes
0 answers
26 views
Kubernetes: Influxdb 1.8.10 container can’t create users
I deployed **InfluxDB v1.8.10** on Docker with this command:

```
docker run --name influxdb -t \
  -e INFLUXDB_HTTP_AUTH_ENABLED="true" \
  -e INFLUXDB_DB="mydatabase" \
  -e INFLUXDB_USER="user" \
  -e INFLUXDB_USER_PASSWORD="user" \
  -e INFLUXDB_ADMIN_USER="admin" \
  -e INFLUXDB_ADMIN_PASSWORD="admin" \
  --restart unless-stopped -d influxdb:1.8.10
```

When I connect, I see that the new **Admin** user is created. Now I would like to deploy it on Kubernetes; this is my code:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: influxdb
  namespace: mon-grafana
spec:
  serviceName: influxdb
  replicas: 3
  selector:
    matchLabels:
      app: influxdb
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      containers:
        - name: influxdb
          image: influxdb:1.8.10
          ports:
            - containerPort: 8086
              name: influxdb
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 500Mi
            limits:
              cpu: 2
              memory: 500Mi
          env:
            - name: INFLUXDB_HTTP_AUTH_ENABLED
              value: "true"
            - name: INFLUXDB_DB
              value: "mydatabase"
            - name: INFLUXDB_USER
              value: "user"
            - name: INFLUXDB_USER_PASSWORD
              value: "user"
            - name: INFLUXDB_ADMIN_USER
              value: "admin"
            - name: INFLUXDB_ADMIN_PASSWORD
              value: "admin"
          volumeMounts:
            - name: pvc-influxdb
              mountPath: /var/lib/influxdb
            - name: influxdb-config
              mountPath: "/etc/influxdb/influxdb.conf"
              subPath: influxdb.conf
          securityContext:
            allowPrivilegeEscalation: false
      volumes:
        - name: influxdb-config
          configMap:
            name: configmap
            items:
              - key: influxdb.conf
                path: influxdb.conf
  volumeClaimTemplates:
    - metadata:
        name: pvc-influxdb
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 2Gi
```

This is my **influxdb.conf** file:

```
root@influxdb-0:/# cat /etc/influxdb/influxdb.conf
reporting-enabled = false

[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  max-series-per-database = 1000000
  max-values-per-tag = 100000
  index-version = "tsi1"
  max-index-log-file-size = "100k"

[http]
  log-enabled = false
  enabled = true
  bind-address = "0.0.0.0:8086"
  https-enabled = false
  flux-enabled = true
```

After deploying it in Kubernetes, the **Admin** user is not created during the deploy, and when I try to connect I receive this error:

```
root@influxdb-0:/# influx -username admin -password admin
Connected to http://localhost:8086 version 1.8.10
InfluxDB shell version: 1.8.10
> show databases;
ERR: error authorizing query: create admin user first or disable authentication
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
```

How can I create the Admin user during the initial deploy (like I did in Docker)?
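One hedged workaround (a sketch, not a verified fix): InfluxDB 1.x with auth enabled and no users yet still accepts an unauthenticated `CREATE USER ... WITH ALL PRIVILEGES`, so the admin user can be created explicitly once the server is up, e.g. via a `postStart` hook on the container:

```yaml
# Sketch: add under the influxdb container spec. Best-effort and idempotent-ish:
# CREATE USER for an already-existing user errors, which "|| true" swallows.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - >
          sleep 5;
          influx -execute
          "CREATE USER admin WITH PASSWORD 'admin' WITH ALL PRIVILEGES" || true
```

The `sleep` is a crude wait for the server to start listening; a production setup would poll the health endpoint instead.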
Marco Ferrara (121 rep)
May 2, 2024, 08:14 AM • Last activity: May 2, 2024, 09:13 AM
1 vote
0 answers
145 views
InfluxDB not returning results when group by time offset at a summer/winter time change
Let's start with an example. I've got an application that logs a point every minute. When I aggregate this data into a query that counts the logs per day, I get the following result. Keep in mind that the query and the data are in UTC.

```sql
SELECT count(int_3)
FROM data
WHERE time >= '2023-10-26T00:00:00Z'
  AND time < ...
```

[results truncated]

```sql
SELECT count(int_3)
FROM data
WHERE time >= '2023-10-26T00:00:00+02:00'
  AND time < ...
```

[results truncated]

> The offset_interval is a duration literal. It shifts forward or back the InfluxDB database's preset time boundaries. The offset_interval can be positive or negative.

For example, let's use the first query (UTC) again and shift the start time and end time one hour back.

```sql
SELECT count(int_3)
FROM data
WHERE time >= '2023-10-25T23:00:00Z'
  AND time < ...
```

[results truncated]

```sql
SELECT count(int_3)
FROM data
WHERE time >= '2023-10-25T23:00:00+02:00'
  AND time < '2023-10-31T22:59:59+01:00'
GROUP BY time(1d,23h) fill(none) TZ('Europe/Amsterdam')
```

```
2023-10-25T23:00:00+02:00  1260
2023-10-26T23:00:00+02:00  1440
2023-10-27T23:00:00+02:00  1440
2023-10-28T23:00:00+02:00  1440
2023-10-29T23:00:00+01:00  1320
2023-10-30T23:00:00+01:00  1440
```

And there is the problem. 2023-10-29 should have one hour of points (60 points) more than the other days, but actually got 2 hours (120 points) less, which I find difficult to explain. Does somebody have an explanation or a solution for this? I'm using InfluxDB v1.8.10 and v1.11.1.
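As a cross-check of what the counts *should* look like, here is a small sketch (plain Python, independent of InfluxDB) that buckets one-per-minute points by local Amsterdam calendar day across the 2023 fall DST change; the fallback day has 25 local hours:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def points_per_local_day(start_utc, end_utc, tz):
    """Count one-per-minute points per local calendar day in time zone tz."""
    counts = Counter()
    t = start_utc
    while t < end_utc:
        counts[t.astimezone(tz).date().isoformat()] += 1
        t += timedelta(minutes=1)
    return counts

counts = points_per_local_day(
    datetime(2023, 10, 26, tzinfo=timezone.utc),
    datetime(2023, 10, 31, tzinfo=timezone.utc),
    ZoneInfo("Europe/Amsterdam"),
)
# The fallback day 2023-10-29 has 25 local hours -> 1500 one-per-minute points
```

So a correct local-time grouping would put 1500 points on 2023-10-29 and 1440 on the surrounding full days; the 1320 returned by the query above is 180 points short of that.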
Ron Nabuurs (121 rep)
Jan 19, 2024, 12:34 PM • Last activity: Jan 24, 2024, 07:48 AM
1 vote
1 answer
5345 views
Influxdb and grafana: howto create series that is the sum of two other series
I have these three queries. They show the average upload speed of three different docker containers over time:

[screenshot]

The difference() call is needed because the data contains cumulative "bytes_sent" values, and it never decreases. Here is an example of the output:

[screenshot]

I would like to create a third series that shows the sum of the above three graphs. If I simply remove the WHERE condition, then it won't work:

[screenshot]

The reason for this is that mean() always precedes difference(), and the mean value of these independent cumulative sums is meaningless:

[screenshot]

The correct solution would be to calculate the three difference values and add them together. But I don't know how to do this in Grafana. Obviously the very best solution would be to store the differences in the database in the first place. Unfortunately, I'm not the one who is sending the data, so I can't do anything about it.
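The "calculate the three difference values and add them together" step can be sketched in plain Python (hypothetical sample data, just to show the order of operations):

```python
def differences(series):
    """Per-interval deltas of a cumulative counter (what InfluxDB's difference() does)."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical cumulative bytes_sent samples from three containers
c1 = [0, 10, 30]
c2 = [5, 6, 9]
c3 = [100, 100, 160]

# Difference first, then sum across containers - not the other way around
total_upload = [sum(deltas) for deltas in zip(*map(differences, (c1, c2, c3)))]
# -> [11, 83]
```

In InfluxQL 1.x the same ordering can often be expressed with a subquery that computes the per-series differences before the outer aggregation, but the exact form depends on the InfluxDB version in use.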
nagylzs (421 rep)
Apr 13, 2018, 09:33 AM • Last activity: Mar 18, 2023, 12:02 AM
0 votes
1 answer
153 views
Database to save measurements
I am creating an infrastructure to save measurements coming from a fleet of around 2,000 cars. Each car contains about 60 sensors (depending on the car), with a total of about 800 values per second coming from all the sensors. Each sensor reads from 2 to 50 values of different types (boolean, integer and comma-separated). I would like to save all these values in a database (in the cloud) to allow us to read them in case of error and for future reports.

After a study of the possible databases, we have to choose between:

* Postgres with autopartitions
* TimescaleDB
* InfluxDB

Knowing the scenario, my engineering side thinks about InfluxDB, since the use case better fits a schemaless option. However, my conservative side says to use a database with a 25-year history. In this latter case, from your experience, is it better to adopt approach 1 or 2?

Approach 1 is where each row consists of a reading of one value from one sensor -> [timestamp, sensor_id, measure_title, measure_value] (so 800 * 2000 rows every second).

Approach 2 is where a row consists of a reading of a whole sensor: [timestamp, sensor_id, measure_value_1, ..., measure_value_50] (so 60 * 2000 rows every second), where potentially 49 columns can be null, and we have another table that contains the descriptive record for each title of measure_value_n.

Otherwise, do you know other approaches?

Edit 1. Data must be maintained indefinitely. No deleting/cancelling.
Approach 1 will store around 138 billion rows per day.
Approach 2 will store around 10 billion rows per day.
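The row-count figures in Edit 1 follow directly from the stated rates; a quick arithmetic check using the question's own numbers:

```python
CARS = 2000
VALUES_PER_CAR_PER_S = 800   # approach 1: one row per value
SENSORS_PER_CAR = 60         # approach 2: one row per sensor reading
SECONDS_PER_DAY = 86_400

rows_per_day_approach_1 = CARS * VALUES_PER_CAR_PER_S * SECONDS_PER_DAY
rows_per_day_approach_2 = CARS * SENSORS_PER_CAR * SECONDS_PER_DAY
# ~138 billion and ~10 billion rows per day, matching the stated estimates
```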
Jam. G. (1 rep)
Nov 13, 2022, 01:35 PM • Last activity: Nov 13, 2022, 11:12 PM
0 votes
0 answers
916 views
InfluxDB on Raspberry Pi 3 and load issues
First of all, I'm not a database expert at all, but I managed to set up a system fitting my monitoring needs for more than one year now. Unfortunately, now a problem arises. Short summary of my system:

- Raspberry Pi 3 (Raspbian buster) is running InfluxDB 1.8.5
- A Python script is running as a service, writing ~20 datapoints every 5 sec, plus an additional several hundred per month on demand.

I would say that is not too much in total. My database is now >1 GB:

```
/var/lib/influxdb $ sudo du -hd1
12K     ./meta
39M     ./wal
1.1G    ./data
1.2G    .
```

I started to notice several days/weeks ago that my system got really laggy. uptime reported load averages >4. I suspected the pretty old SD card and switched to a proper new one, first only by putting the old image on the new card; later I really reinstalled everything and restored the Influx data. It got better, but not really good. uptime now reported load averages in the range of 2. I added a monitor for uptime in my Python script; it looks like this: [plot of uptime over 3 h] (at ~22.30 I restarted influxdb).

I did some more analysis and found that I can read the Influx log with the command `sudo journalctl -u influxdb.service`. In the result I find lots of lines with content similar to:

```
Apr 26 21:04:22 xxx influxd: ts=2021-04-26T19:04:22.133912Z lvl=info msg="Error replacing new TSM files" log_id=0TksnBt0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0TktwGCG000 op_name=tsm1_compact_group db_shard_id=534 error="cannot allocate memory"
```

And here my knowledge really stops. Does anyone know what the issue is and what I can do to "repair" my database without losing more data?

EDIT: After some more research, my conclusion is that the root cause is my 32-bit Raspberry Pi operating system with its limited RAM. Influx performs some kind of periodic housekeeping where the database is compressed, and this fails on my system with slightly >1 GB of database. After all, it is just a hobby for me and I never really used proper databases.

Influx was quite appealing for me; at least what I read made sense and I was able to set everything up quite easily. For my purposes, I am thinking of a workaround where I monitor my data with high granularity every year in a new database. With that, I should not hit the size limit of the individual databases. I'll have an additional database for the daily key measures, which should only consume some MB after several years. Do you think that could work out, or could there be some other pitfall?
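For reference, InfluxDB 1.8 exposes a few `[data]` settings that are commonly lowered on memory-constrained hosts to reduce peak memory during compaction; treat the values below as a hedged starting-point sketch, not something tested on a Pi:

```
[data]
  # Smaller in-memory cache before a snapshot is forced (default "1g")
  cache-max-memory-size = "200m"
  cache-snapshot-memory-size = "25m"
  # Fewer concurrent compactions lowers peak memory use
  max-concurrent-compactions = 1
  # Disk-backed index instead of the in-memory default
  index-version = "tsi1"
```

These knobs trade throughput for memory headroom; on a 1 GB 32-bit system they may only delay, not eliminate, the "cannot allocate memory" failures.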
bene (101 rep)
Apr 26, 2021, 09:49 PM • Last activity: Apr 27, 2021, 08:08 PM
23 votes
3 answers
39611 views
Understanding how to choose between fields and tags in InfluxDB
What are some good rules and examples of how to choose between storing data in fields vs. tags when designing [InfluxDB schemas](https://docs.influxdata.com/influxdb/v1.2/concepts/schema_and_data_layout/)? What I've [found so far](https://docs.influxdata.com/influxdb/v1.7/concepts/schema_and_data_layout/#encouraged-schema-design) is:

> a measurement that changes over time should be a field, and metadata about the measurement should be in tags
>
> tags and fields are effectively columns in the table. tags are indexed, fields are not
>
> the values that are highly variant and not usually part of a WHERE clause are put into fields
>
> Store data in fields if you plan to use them with an InfluxQL function
>
> Tags containing highly variable information like UUIDs, hashes, and random strings will lead to a large number of series in the database, known colloquially as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.

But let's say you store filled orders in an e-commerce application: order id, sale price, currency.

* Should the order id be a tag, or a field?
* Should the currency be a tag, or a field?
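Applying the quoted rules to the example (my reading, not an authoritative answer): currency is low-cardinality and likely filtered on, so it fits a tag; order id has unbounded cardinality, so it fits a field. A sketch of the resulting line-protocol point, with made-up names and values:

```python
def order_point(order_id, price, currency, ts):
    # currency as a (low-cardinality, indexed) tag;
    # order id (string field) and sale price as fields
    return (
        f'orders,currency={currency} '
        f'order_id="{order_id}",sale_price={price} {ts}'
    )

line = order_point("A-1001", 19.99, "EUR", 1)
# -> 'orders,currency=EUR order_id="A-1001",sale_price=19.99 1'
```

Putting the order id in a tag instead would create one series per order, which is exactly the "high series cardinality" case the docs warn about.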
Dan Dascalescu (331 rep)
Feb 6, 2017, 01:49 AM • Last activity: Apr 5, 2021, 03:26 PM
1 vote
0 answers
228 views
Optimize InfluxQL query for multi core use?
I have an InfluxQL query (InfluxDB 1.8) and it takes a lot of time to finish, even on a c5.4xlarge with Intel Xeon Platinum 8000 (Turbo CPU clock speed of up to 3.6 GHz) plus EBS io2 with 16k IOPS available. But when I look at htop, I see only 1 core loaded at 100% and the rest are idle. AWS monitoring shows that volume IOPS while running the query are ~8k, which is far from the max of 16k, and RAM is at 20%. Is there a way to optimize the query to spread the load across all cores? I have other queries and they load all cores fine. Here's the problematic query:

```sql
SELECT count(*) FROM
  (SELECT "pitch" AS "AAAA" FROM "AAAA"."autogen"."imu_messages"),
  (SELECT "pitch" AS "BBBB" FROM "BBBB"."autogen"."imu_messages"),
  (SELECT "pitch" AS "CCCC" FROM "CCCC"."autogen"."imu_messages"),
  (SELECT "pitch" AS "DDDD" FROM "DDDD"."autogen"."imu_messages"),
  (SELECT "pitch" AS "EEEE" FROM "EEEE"."autogen"."imu_messages"),
  (SELECT "pitch" AS "FFFF" FROM "FFFF"."autogen"."imu_messages"),
  (SELECT "pitch" AS "GGGG" FROM "GGGG"."autogen"."imu_messages"),
  (SELECT "pitch" AS "HHHH" FROM "HHHH"."autogen"."imu_messages")
WHERE time > now() - 60d GROUP BY time(1m) FILL(-1)
```

If you need any additional info, let me know and I'll update the question with it. Thanks.
GTXBxaKgCANmT9D9 (131 rep)
Mar 20, 2021, 01:12 AM
1 vote
1 answer
3612 views
InfluxDB: selecting last() values from different tags
I have a bucket with the raw data of several sensors, things like temperature, air pressure, etc. The sensors don't send their data at the same time, which means that a given timestamp might have several data points from various sensors or just one. Each reading is tagged with the ID of the sensor it comes from. I need to get the last() reading from a given number of IDs. When I try this code:
```sql
select last(Temperature) from raw_measure where ID =~ /4372502|4399699|4406512|4407840/
```
instead of returning the last Temperature observation of each ID, it returns the most recent temperature timestamp of the whole group. How can I get the last reading of each ID in a single query?
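In InfluxQL, `last()` collapses the whole result set to one row unless the query groups by the tag; a sketch of the usual fix (assuming `ID` is a tag key, as the question describes):

```sql
SELECT last("Temperature")
FROM "raw_measure"
WHERE "ID" =~ /4372502|4399699|4406512|4407840/
GROUP BY "ID"
```

With `GROUP BY "ID"`, the selector function runs once per tag value, so each sensor contributes its own most recent reading.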
John D. (21 rep)
Feb 16, 2021, 04:00 PM • Last activity: Feb 19, 2021, 01:22 AM
0 votes
1 answer
2630 views
InfluxDB with Zabbix or Telefraf?
I'm considering the use of InfluxDB to store thousands of measurements per second from routers in a network. Each measurement is lightweight (between 10-20 bytes). Now, I'm wondering what the best strategy is: InfluxDB + Zabbix? Or Telegraf? Or another alternative?

1) InfluxDB + Zabbix: according to https://www.zabbix.com/integrations/influxdb there is a possible integration. However, it seems the integration involves taking data that was collected in a Zabbix database and moving it to InfluxDB to take advantage of InfluxDB using much less storage. So it is meant for archiving large amounts of data. However, we believe Zabbix itself is too slow, no? The integration above would not work for online data collection, right? We need fast storage, and we do not want Zabbix to be a bottleneck. Are there known benchmarks and recommendations about Zabbix + InfluxDB integration for online collection + storage?

2) InfluxDB + Telegraf: according to https://www.influxdata.com/blog/monitoring-openwrt-with-telegraf/ there is also the possibility of integrating InfluxDB + Telegraf for data collection. Are there best practices and/or benchmarks for data collection with Telegraf + storage with InfluxDB?

3) Are there other alternatives and/or suggestions for large-scale data collection and integration with InfluxDB?
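For context on option 2, a Telegraf pipeline is configured declaratively. A minimal hedged sketch (the plugin names and keys are real Telegraf configuration; the addresses, OID and database name are placeholders) collecting SNMP metrics from routers and writing to InfluxDB 1.x:

```toml
# telegraf.conf (sketch)
[[inputs.snmp]]
  agents = ["udp://192.0.2.1:161"]   # placeholder router address
  version = 2
  community = "public"
  [[inputs.snmp.field]]
    name = "uptime"
    oid = "RFC1213-MIB::sysUpTime.0"

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"
```

Telegraf batches writes to InfluxDB, which is what makes it suitable for high-rate collection compared with per-item pushes.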
Daniel S. (53 rep)
Oct 25, 2020, 01:44 AM • Last activity: Nov 3, 2020, 01:13 AM
2 votes
1 answer
3445 views
Is InfluxDB faster than PostgreSQL
According to [this article](https://portavita.github.io/2018-07-31-blog_influxdb_vs_postgresql), the only real benefit Influx has over Postgres (using time-based indices) is the space used. PostgreSQL is more performant with time indices. Why would one use InfluxDB over PostgreSQL, then? I heard from someone that aggregate functions like avg/count are faster in Influx, but couldn't find any results online backing this up.
Tobi Akinyemi (193 rep)
Sep 17, 2020, 01:46 PM • Last activity: Sep 17, 2020, 02:18 PM
1 vote
0 answers
357 views
Influx:Show field keys based on tag values (or the other way)
I have an Influx database with one tag key (called machine) and multiple field keys (called cpu and memory). I want to be able to find all unique tag values for each field key, or the populated field keys for a given tag value. Here is an example of the database:
```
SELECT * FROM "myMeasurement"

name: myMeasurement
-------------------
time                   machine     cpu     memory
2020-01-01T00:00:00Z   m1          51
2020-01-01T01:00:00Z   m2                  2048
2020-01-01T00:00:00Z   m1          52
2020-01-01T01:00:00Z   m2                  2054
```
I would like to retrieve **m1** when requesting machines containing **cpu** values, and/or **cpu** when requesting non-empty field keys for machine **m1**. Thanks
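InfluxQL has no direct "field keys per tag value" metadata query; one hedged workaround is to count a field grouped by the tag, since a tag value only appears in the result where that field actually has points:

```sql
-- machines that have cpu values: groups with a non-zero count have them
SELECT count("cpu") FROM "myMeasurement" GROUP BY "machine"

-- field keys populated for machine m1: count(*) returns one count column
-- per field, and the populated ones are non-zero
SELECT count(*) FROM "myMeasurement" WHERE "machine" = 'm1'
```

This scans data rather than the index, so on large measurements it helps to add a time range to the WHERE clause.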
Pierre S. (111 rep)
Aug 5, 2020, 10:52 AM
5 votes
1 answer
557 views
Store regular MySQL query results in a timeseries DB
I'm looking into installing a time series database like InfluxDB or Prometheus to handle data for our Grafana monitoring system. One of the things I'd like to do is run a couple of MySQL queries every few minutes to gather business metrics (like number of subscribers or application usage metrics). Documentation for these time series databases features plenty of examples of how to get MySQL performance metrics into their stores, but I can't find any that demonstrate how to store data from queries. Google search results seem to be completely drowned out by the more typical use case of gathering MySQL performance metrics. Is this at all possible with either InfluxDB or Prometheus, and where can I find information on how to set this up?
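One common pattern for the InfluxDB side (a hedged sketch; the plugin names and keys are real Telegraf configuration, the script path and query are placeholders) is Telegraf's `exec` input running a SQL query and emitting InfluxDB line protocol:

```toml
[[inputs.exec]]
  # Placeholder script: runs something like
  #   mysql -N -e "SELECT COUNT(*) FROM subscribers"
  # and prints a line-protocol point such as:
  #   business_metrics subscribers=12345
  commands = ["/usr/local/bin/subscriber_count.sh"]
  interval = "5m"
  data_format = "influx"
```

The same idea works for Prometheus via an exporter that exposes the query result as a gauge, but the Telegraf route requires no extra HTTP endpoint.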
Martijn - DMARC Analyzer (51 rep)
Nov 29, 2018, 09:14 AM • Last activity: Jul 16, 2020, 11:57 PM
1 vote
1 answer
181 views
Influx - Help to choose in between Tag and Field
I have the following columns:

- '**datetime**' - the time when the sensor values are recorded.
- '**machineID**' - the ID of the machine from which the sensor values are recorded.
- '**volt**', '**rotate**', '**pressure**', '**vibration**' - 4 different sensor values measured for a particular machine at a time.

Please let me know whether I should put the datetime and machineID columns under tags or fields. Thank you in advance.
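Under the usual InfluxDB mapping (a suggestion, not the only valid design): datetime becomes the point's timestamp, machineID a tag, and the four sensor readings fields. In line protocol (the measurement name and values below are made up) that would look like:

```
machine_metrics,machineID=M042 volt=228.1,rotate=418.5,pressure=98.6,vibration=40.2 1567174500000000000
```

machineID works as a tag because the fleet of machines is a bounded set and you will almost certainly filter or group by it; the sensor values are what you aggregate, which is what fields are for.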
JustAnotherDataEnthusiast (113 rep)
Aug 30, 2019, 02:15 PM • Last activity: Apr 30, 2020, 02:18 PM
3 votes
1 answer
974 views
Should I use prometheus or influxdb
I know this has been asked a lot of times and I've checked almost all of the links out there, but I just want a third-party opinion on our use case. I'm working for an IoT company and we already have Prometheus installed on our servers. Primarily we use Prometheus to monitor the servers' and apps' health and some sensor-related data. Now we have a new feature where we need to save data whenever an IoT device changes location. I did a bit of reading on Prometheus and read this on their comparison page: https://prometheus.io/docs/introduction/comparison/#prometheus-vs-influxdb

> Where InfluxDB is better:
>
> - If you're doing event logging.
> - Commercial option offers clustering for InfluxDB, which is also better for long term data storage.
> - Eventually consistent view of data between replicas.

So I did a bit of reading about InfluxDB, and it seems the right tool for our use case. I was wondering if we can just continue using Prometheus as our time series database rather than adding another tool which we would need to maintain in the future. Or does it make sense to use a proper time series database, not just for monitoring (as I believe monitoring was the original intention behind Prometheus)?
mateeyow (179 rep)
Mar 19, 2020, 04:39 AM • Last activity: Apr 28, 2020, 07:49 PM
0 votes
2 answers
2409 views
What is the equivalent of SELECT <certain columns> in Flux Query Language?
What would be the equivalent flux query for SELECT address, name FROM addresses? (I am referring to FluxQL, the new query language developed by InfluxData.) I didn't find a clear answer to this in the limited Flux documentation available. The Flux documentation says that the filter() function is the equivalent of both SELECT and WHERE clauses, but all the examples given are equivalents of WHERE clauses; there is nothing on SELECT. Edit: An answer posted below gives the equivalent query in InfluxQL, which is not what I am asking for. Yeah I know, FluxQL, InfluxQL, it can get confusing. These are the docs for Flux for better reference: https://docs.influxdata.com/flux/v0.50/introduction/getting-started and https://v2.docs.influxdata.com/v2.0/query-data/get-started/
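For what it's worth, in Flux the column-selection part of a SELECT is typically done with `keep()` (or its inverse, `drop()`) after filtering; a hedged sketch, with the bucket name and time range made up:

```
from(bucket: "mydb/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "addresses")
  |> keep(columns: ["_time", "address", "name"])
```

So `filter()` plays the role of WHERE (and of choosing the measurement), while `keep()` narrows the output columns the way a SELECT list does.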
Sushovan Mandal (101 rep)
Mar 9, 2020, 11:11 AM • Last activity: Mar 16, 2020, 08:17 AM
1 vote
1 answer
297 views
InfluxDB Downsampling
I have an InfluxDB that is limited to 40 GB. It's being filled by the AppMetrics .NET integration with performance metrics on each event in the code, usually hundreds of events per second. My problem is that this setup has been running for a month and has now reached a state where it is at the limit and no new data/snapshots can be created. My question: is there a way to process the DB so that only 15-minute aggregations are left for data older than 1 day?
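The standard InfluxDB 1.x tooling for this is a retention policy plus a continuous query; a hedged sketch where the database, measurement and field names are placeholders:

```sql
-- keep raw data for 1 day only
CREATE RETENTION POLICY "raw_1d" ON "metricsdb" DURATION 1d REPLICATION 1 DEFAULT

-- keep 15-minute aggregates long-term
CREATE RETENTION POLICY "agg_forever" ON "metricsdb" DURATION INF REPLICATION 1

CREATE CONTINUOUS QUERY "cq_15m" ON "metricsdb" BEGIN
  SELECT mean("value") INTO "agg_forever"."events_15m"
  FROM "events" GROUP BY time(15m), *
END
```

Note that a continuous query only downsamples newly arriving data; the existing month of raw points would need a one-off `SELECT ... INTO` backfill before the short retention policy expires them.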
Daniel Johns (23 rep)
Mar 5, 2019, 09:27 AM • Last activity: Mar 5, 2020, 07:01 PM
2 votes
1 answer
769 views
InfluxDB batch insert only inserts the last line
I'm trying to insert some lines into an InfluxDB instance (running on Windows 7 x64) from both a .txt and a .csv file, but it only inserts the last line of the file. I'm currently using the Insomnia interface to send the file through Influx's HTTP API. Example: I have the following content inside a data.txt file:

```
any,mytag=ab value=59
any,mytag=ab value=78
any,mytag=ab value=102
```

All lines end with LF only (verified through N++). Then I send this file via an HTTP request to my running InfluxDB instance:

```
POST http://localhost:8086/write?db=mydb&precision=ns
```

Yet, when I run a select * from "any" query, it shows that only the last line (value=102) was inserted. I've also tried inserting with text/plain content, but with no success either. Any idea on this? Thanks.
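One thing worth checking (a sketch under the assumption that the HTTP client is mangling the newlines): the /write body must contain all points separated by `\n`. Building and posting the payload directly sidesteps the REST client entirely:

```python
# Hypothetical payload; actually posting it needs a running InfluxDB, e.g.
#   requests.post("http://localhost:8086/write?db=mydb&precision=ns", data=payload)
lines = [
    "any,mytag=ab value=59",
    "any,mytag=ab value=78",
    "any,mytag=ab value=102",
]
payload = "\n".join(lines)  # LF-separated, no trailing CR characters
# three points in one request body
```

If the direct post writes all three points, the problem is in how Insomnia serializes the file body rather than in the data itself.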
Segmentation Fault (41 rep)
Apr 10, 2019, 09:45 PM • Last activity: Dec 23, 2019, 01:02 PM