
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
45 views
Database / Search-Index recommendation: Match time ranges from different categories
* 300k+ videos
* 10+ million markers, pointing to time ranges in videos

```json
{
  "markerCategory": "something",
  "markerDesc": "something-more-specific",
  "frameIn": 120001,
  "frameOut": 140002
},
{
  "markerCategory": "something-else",
  "markerDesc": "something-else-more-specific",
  "frameIn": 130001,
  "frameOut": 135002
}
```
Any suggestions on which database / search index would perform best when searching for something along these lines:

> Videos having events of category A __AND__ category B in overlapping time ranges, sorted by amount of covered time

Videos are currently exported from a proprietary relational database and stored in an Apache Solr instance for searching.

* Is there a specific name for this kind of query ("inverted range queries" or something like that)?
* Any suggestions on which technology would perform best for these types of queries? I was thinking maybe Elasticsearch?
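For what it's worth, the core of such a search is the standard interval-overlap test: ranges [aIn, aOut] and [bIn, bOut] overlap when aIn < bOut and bIn < aOut. Below is a minimal relational sketch of the requested query, assuming a hypothetical markers(video_id, marker_category, frame_in, frame_out) table (the real data lives in Solr, so this only illustrates the logic; LEAST/GREATEST as in MySQL/PostgreSQL):

```sql
-- Videos that have a category-A marker overlapping a category-B marker,
-- ordered by the (approximate) number of overlapping frames.
SELECT a.video_id,
       SUM(LEAST(a.frame_out, b.frame_out) - GREATEST(a.frame_in, b.frame_in)) AS covered_frames
FROM markers AS a
JOIN markers AS b
  ON  b.video_id = a.video_id
  AND a.marker_category = 'A'
  AND b.marker_category = 'B'
  AND a.frame_in < b.frame_out   -- interval-overlap test
  AND b.frame_in < a.frame_out
GROUP BY a.video_id
ORDER BY covered_frames DESC;
```

The sum can double-count frames when several A/B pairs overlap each other, but it shows the shape of the query any candidate engine would need to support.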
gherkins (103 rep)
Mar 6, 2025, 06:45 AM • Last activity: Mar 6, 2025, 01:08 PM
-1 votes
1 answer
29 views
Indexing system vs raw DBMS connection
I have created a new dataset with 11,000,000+ rows and 4 pivot tables in MySQL. The tables are not that deep, only 6-12 columns each. I've set up Apache Solr (Lucene) to index the data, and it works great for searching, BUT I haven't noticed a dramatic improvement in search load time. If I run a raw SQL command, the difference between the two is negligible. At what point is it more beneficial to use an indexing engine vs scripting a raw query yourself? Granted, I have the ability to set "weight" and all that via Solr, but my impression was that I would have dramatically reduced overhead on the server. Is it just that my dataset isn't large enough / complicated enough to elicit these findings? Other than the obvious advantages of weighting, ranking, and sort/filter, what is the overhead advantage of using an indexing system such as Solr?
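As a rough illustration of where an inverted index starts to pay off over a raw query, compare a leading-wildcard scan with MySQL's built-in full-text index (hypothetical table and column names; Solr layers relevance weighting, faceting, and distributed search on top of the same inverted-index idea):

```sql
-- Hypothetical MySQL table: articles(id, body).
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

-- Leading wildcard: cannot use a B-tree index, so every row is scanned.
SELECT id FROM articles WHERE body LIKE '%solr%';

-- Uses the inverted (full-text) index instead of scanning the table.
SELECT id FROM articles
WHERE MATCH(body) AGAINST ('solr' IN NATURAL LANGUAGE MODE);
```

If the raw queries are simple indexed lookups rather than text searches, the gap between MySQL and an external engine will indeed be small.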
Zak (113 rep)
Sep 8, 2023, 08:31 PM • Last activity: Sep 8, 2023, 10:17 PM
2 votes
0 answers
52 views
List Solr replica properties
I'm looking for a way to list all properties on a replica. I checked this page: https://lucene.apache.org/solr/guide/7_6/collections-api.html but no such thing is mentioned there; only adding, editing, or deleting existing properties, with no way of listing all of them. How can I find out which properties have already been set on a replica?

EDIT: Found them in ZooKeeper. Use the command: zkCli.sh get /solr/collections/<collection_name>/state.json
KdgDev (211 rep)
Nov 1, 2019, 03:42 PM • Last activity: Nov 4, 2019, 09:23 AM
0 votes
1 answer
2430 views
WCS8 Solr Preprocess failed: Error SQLCODE=-601, SQLSTATE=42710, SQLERRMC=WCSSTGAP.XI_CATENTRY_PRICE_0
Our Solr pre-process is failing with the error below:

> [2018/08/28 14:31:36:104 CDT] 00000001 W com.ibm.commerce.foundation.dataimport.preprocess.AbstractDataPreProcessor:createDBTable(Connection, String, String) create XI_CATENTRY_PRICE_0 with error: DB2 SQL Error: SQLCODE=-601, SQLSTATE=42710, SQLERRMC=WCSSTGAP.XI_CATENTRY_PRICE_0;TABLE, DRIVER=4.19.49

I suspect that DROP TABLE XI_CATENTRY_PRICE_0 is failing, so the subsequent CREATE fails because the table already exists. How could I resolve this?
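SQLSTATE 42710 / SQLCODE -601 in DB2 means the object being created already exists, which fits the theory that the preceding DROP never succeeded. One hedged workaround (verify against a non-production environment first) is to remove the leftover work table by hand before re-running the pre-process:

```sql
-- DB2: drop the leftover pre-process work table named in SQLERRMC
-- so the utility can recreate it on the next run.
DROP TABLE WCSSTGAP.XI_CATENTRY_PRICE_0;
```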
Prardhan (1 rep)
Aug 29, 2018, 11:36 PM • Last activity: Jun 13, 2019, 01:02 PM
4 votes
2 answers
2988 views
How can we do full-text search and faceting on BigQuery?
In the future, we will have millions of records in our web product, so we have used BigQuery for data storage and analysis. We have to build filters on our search page like the product filters Flipkart and Amazon provide: different filters on the left side of the search page that narrow the results. Every filter has a count, i.e., the number of records in that category/term, so we have to build queries in such a way that they return both the results and the count for every category (term). Here, "category" means different types of filters on different columns, as seen on e-commerce sites such as Flipkart and Amazon.

> Faceted search (also called faceted navigation, guided navigation, or parametric search) breaks up search results into multiple categories, typically showing counts for each, and allows the user to "drill down" or further restrict their search results based on those facets.

Is there any framework/plugin available, like Solr, which can be used with BigQuery to provide the desired functionality above? Is BigQuery not suitable for this purpose? Or do we need to stick with an RDBMS (e.g., PostgreSQL, MySQL) plus a search engine (e.g., Solr, Elasticsearch) for this?
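Facet counts themselves can be emulated in BigQuery with plain aggregation; here is a hedged sketch of one facet panel's counts, assuming a hypothetical `my_project.shop.products` table with `brand` and `category` columns (each facet shown to the user would be one such aggregation, re-run with the filters already applied):

```sql
-- Hypothetical BigQuery (standard SQL): counts for the "brand" facet,
-- restricted by the category filter the user has already selected.
SELECT brand,
       COUNT(*) AS doc_count
FROM `my_project.shop.products`
WHERE category = 'laptops'
GROUP BY brand
ORDER BY doc_count DESC;
```

What BigQuery does not give you is low-latency, per-keystroke responses or text relevance ranking, which is where a dedicated engine like Solr or Elasticsearch usually comes in.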
Lalit Kumar Maurya (143 rep)
Jun 13, 2017, 04:22 AM • Last activity: Apr 8, 2019, 11:36 PM
2 votes
2 answers
297 views
SQL performance of selecting recently modified records for Solr
I have a SQL Server 2016 database and Solr. The Solr indexer often runs queries like this:

    select book_collection from books group by book_collection having max(updated_on) > '2017-04-04 09:50:05'

The column updated_on is updated by a trigger on insert/update; at every run of the Solr incremental indexer (every 10 minutes), many queries like the one above fetch the latest modified records and reindex them. This table, for instance, has about one million rows, and at any given time the query would return 10-20 rows maximum. These queries end up in the list of the most expensive queries run on the database, so I would like to optimize them. My questions:

1) Would a timestamp column perform better than a datetime column?
2) If I changed the query like this, would it be more efficient?

    select distinct book_collection from books where updated_on > '2017-04-04 09:50:05'

The first query plan is from the original query, the second from my modified query. The fact that the second plan asks for an index, and the first one doesn't, really suggests that the modified query can use an index, if present, and is therefore better.

[Query plan screenshot: Original]
[Query plan screenshot: Modified by me]
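If the rewritten query is used, a narrow index on updated_on that also carries book_collection would let SQL Server answer it with a small range seek instead of touching the whole table; a hedged sketch (index name and schema are illustrative):

```sql
-- SQL Server: supports "WHERE updated_on > @since" with a range seek;
-- INCLUDE avoids key lookups for the selected column.
CREATE NONCLUSTERED INDEX IX_books_updated_on
    ON dbo.books (updated_on)
    INCLUDE (book_collection);
```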
carlo.borreo (1477 rep)
Apr 4, 2017, 08:53 AM • Last activity: Apr 4, 2017, 04:10 PM
1 votes
0 answers
70 views
Where do custom Solr schemas go in Riak?
I have checked these two pages in the Riak docs
http://docs.basho.com/riak/kv/2.2.0/developing/usage/search/
http://docs.basho.com/riak/kv/2.2.0/developing/usage/search-schemas/
and they tell how to make a custom schema and how to attach it to an index, but they seem to expect the schema file to be in a certain folder for that to work. Whenever I try adding the schema through curl like this:

    curl -XPUT http://localhost:8098/search/schema/users \
      -H 'Content-Type:application/xml' \
      --data-binary @user_schema.xml

it throws an error saying the file isn't found. I've tried specifying the full path too. I've found solutions online saying that it goes somewhere in the data folder, but those were from back in 2015 and I'm using version 2.2 of Riak now. Has anyone else recently had success adding a custom Solr schema to Riak? PS: I'm using ubuntu/trusty64 in a VM.
AndrewK (11 rep)
Dec 29, 2016, 03:07 AM
2 votes
1 answer
231 views
DSE Cassandra decimal type
I generated a Solr schema with dsetool on a table named seriesdata (its key is composed of seriesmetadata_id and initialtime, two decimal fields). These fields are defined as DecimalStrField, but they are numeric valued (I have to execute range selections). I tried to change the schema, defining these fields as TrieLongField instead, but I receive this error:

> [cassandra@bigdatalin-03 ~]$ dsetool reload_core timeseriesks.seriesdata schema=schema_data.xml solrconfig=solr_config_data.xml reindex=true -l cassandra -p cassandra
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: com.datastax.bdp.search.solr.CassandraIndexSchema$ValidationException: Mismatch between Solr key field [seriesmetadata_id] with type {TrieLongField{class=org.apache.solr.schema.TrieLongField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={class=org.apache.solr.schema.TrieLongField}}] and Cassandra key alias [seriesmetadata_id] with type [decimal]
> at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:665)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:303)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:294)
> at com.datastax.bdp.tools.SearchDseToolCommands.createOrReloadCore(SearchDseToolCommands.java:559)
> at com.datastax.bdp.tools.SearchDseToolCommands.access$200(SearchDseToolCommands.java:59)
> at com.datastax.bdp.tools.SearchDseToolCommands$ReloadCore.execute(SearchDseToolCommands.java:209)
> at com.datastax.bdp.tools.DseTool.run(DseTool.java:126)
> at com.datastax.bdp.tools.DseTool.run(DseTool.java:51)
> at com.datastax.bdp.tools.DseTool.main(DseTool.java:186)

Why? Thanks a lot!
afmulone (65 rep)
May 6, 2016, 02:41 PM • Last activity: May 10, 2016, 02:43 PM
2 votes
0 answers
190 views
Solr with SQL Server or NoSQL
I am working on a POC of using Solr with SQL Server. I have a very complex data model in SQL Server which requires a bunch of joins and scalar functions to strip some markup and other stuff. This is turning out to be a performance bottleneck. To address this issue, we have considered NoSQL (MongoDB) or Solr with SQL Server as our options.

Using MongoDB, we attach replication events for all CRUD operations, which carry the data over to MongoDB after a successful insert, update, or delete on SQL Server. When we have to perform a search, we do it directly on the Mongo collections. **This sounds very cool, as we have 32 tables joining in this search, which can be converted to 2 collections in MongoDB.**

On the other hand, we are also exploring our options using Solr with SQL Server with DataImport. My concern is that, based on this article http://www.codewrecks.com/blog/index.php/2013/04/29/loading-data-from-sql-server-to-solr-with-a-data-import-handler/ I have to do an import for each entity.

- How does the joining search work with Solr? Should I import each table from SQL to Solr and then write join logic against the Solr APIs?
- Can I import multiple entities at once? Should I create a view for the expected result set (denormalized) and import that view to Solr (see the sketch below)?
- Will these imports have to be done at regular intervals? After an import, if there is new data, does the Solr API reflect that change, or do I have to do an import first and then search against the Solr API?

Finally, can I compare Solr with MongoDB? If anyone has done this kind of evaluation, please share your thoughts.
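On the denormalized-view option mentioned above: a common pattern is to flatten the joins into one view and point a single DataImportHandler entity at it, keying delta imports off a last-modified column so only changed rows are re-sent to Solr between full imports. A minimal sketch with hypothetical table and column names:

```sql
-- Hypothetical denormalized view feeding a single DIH entity.
CREATE VIEW dbo.vw_solr_product_export AS
SELECT p.product_id,
       p.name,
       c.category_name,
       s.supplier_name,
       p.last_modified          -- delta imports can key off this column
FROM dbo.products   AS p
JOIN dbo.categories AS c ON c.category_id = p.category_id
JOIN dbo.suppliers  AS s ON s.supplier_id = p.supplier_id;
```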
HaBo (191 rep)
Apr 25, 2016, 05:47 AM • Last activity: Apr 25, 2016, 06:53 AM
1 votes
2 answers
853 views
SQL Server vs SOLR (Or any document db)
I have my "customer" data in a normalized sql server database. Getting out the customer data in my app is taking too long. This is because I have to go to 10+ tables to get all the data I need. My company has an installation of SOLR that I thought about storing a Json object that contains all the da...
I have my "customer" data in a normalized sql server database. Getting out the customer data in my app is taking too long. This is because I have to go to 10+ tables to get all the data I need. My company has an installation of SOLR that I thought about storing a Json object that contains all the data I need for a single "customer" already put together. I think that this would give me some significant speed improvements. However, it got me to wondering what the difference would be to me putting this data in a single table with a varchar(max) column that has the Json in it. I could index my 10ish searchable columns on the same table as the json. I know that document databases are very popular. **So I imagine there has to be benefits over just rolling my own using denormalized data in sql server. Can someone tell me what they are?**
Vaccano (2550 rep)
Apr 7, 2015, 10:10 PM • Last activity: Feb 8, 2016, 04:08 PM