Home » RDBMS Server » Performance Tuning » Horizontal vs Vertical data table design (Oracle 10.2G)
icon5.gif  Horizontal vs Vertical data table design [message #433205] Mon, 30 November 2009 11:56 Go to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
Hello,

I am at the design level of some huge data table and I am not sure which table design would be better :

  • multiple columns (horizontal) and DDL
  • multiple rows (vertical) without DDL
  • other suggestions?

Please, assist.

Expected inserts: ~ 100 000 000 values (numbers) inserted per hour
Expected data: as in example
Expected selects: ~ 30 000 selects like in the example
Database release: Oracle 10.2G

Example source data:
key_column, val_1, val_20, ... , val_500
Note that not all the columns are present. We cannot predict which columns are inserted before we got the data. Data needs to be inserted in many separate bulk loads: ~1000 values per transaction (couple of rows for "horizontal" table).

Table designs on my mind:
CREATE TABLE horizontal
( key_column NUMBER
, val_1      NUMBER
, val_2      NUMBER
, ...
, val_n      NUMBER
);

ALTER TABLE horizontal ADD PRIMARY KEY (key_column);


CREATE TYPE my_parent_type AS OBJECT
( dummy CHAR(1) ) NOT FINAL;

CREATE TYPE my_type UNDER my_parent_type
( val_1      NUMBER
, val_20     NUMBER
, val_500    NUMBER
);

CREATE TABLE ora_objects
( key_column NUMBER
, vals       my_parent_type
);

ALTER TABLE ora_objects ADD PRIMARY KEY (key_column);


CREATE TABLE vertical
( key_column  NUMBER
, value_type  NUMBER
, val         NUMBER
);

ALTER TABLE vertical ADD PRIMARY KEY (key_column, value_type);
...or nested tables (correct me if I am wrong but type should be created for them as well - I have no experience with nested tables).

Example queries:
SELECT
   val_1
 + val_30
FROM horizontal
WHERE key_column IN (:1);

SELECT
   TREAT(val AS my_type).val_1
 + TREAT(val AS my_type).val_30
FROM ora_objects
WHERE key_column IN (:1);

SELECT
   MAX(DECODE(value_type,:1,val,NULL))
 + MAX(DECODE(value_type,:2,val,NULL))
FROM vertical
WHERE key_column IN (:3);
Different queries (like in example above but with formulas different than val_1+val_30) are executed every hour. Every formula would be executed exactly once against given set of values of key_column.
Cardinality of queries: hundreds of different formulas.
Total SQLs per hour: ~ hundreds * 30 000.
SQL output will be stored in the DB table.

My concerns:

  • Queries performance - rows access and data pivoting require additional processing
  • Database size - val_type would be repeated for "vertical" design
  • Multiple NULLs in "horizontal" design - because we fill only some of the columns
  • CREATE TABLE / DROP TABLE which would be executed for every hour - we don't need the data after formula calculation... but we need to remove them at some point


Thanks in advance
Re: Horizontal vs Vertical data table design [message #433811 is a reply to message #433205] Fri, 04 December 2009 06:19 Go to previous messageGo to next message
DBA_SangramKeshari
Messages: 44
Registered: October 2009
Location: Mumbai
Member
Hi Wakula,

I can help you in logically.

Horigental approach leads to big row size and may cause ROW CHAINING. Which means more than one database blocks will be read.
If you will face this kind of issue they you have to go bith bigger size of data blocks. go with this option if the average row size is less than 4kb and more than 1 kb


Vertical approach may cause disk i/o issue. but as per my thinking this will be the best if you fails in rowsize in horigental approach
Re: Horizontal vs Vertical data table design [message #433856 is a reply to message #433811] Fri, 04 December 2009 10:07 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
You only get row chaining when you insert/update a row that is longer than the database block size.

As we're looking at 500 numeric columns, that's 500*(21+1) = 11k, plus a little bit - so a 16k block size will ensure that this never happens.

You might get migration if the columns are updated and become larger than will fit in the block, but the OP doesn't mention whether the data gets updated at all or not.

If your queries are going to select a lot of values from the same row and operate on them, then I'd go with the horizontal model as it' will make the queries simpler.
Re: Horizontal vs Vertical data table design [message #434031 is a reply to message #433205] Sun, 06 December 2009 20:04 Go to previous messageGo to next message
DBA_SangramKeshari
Messages: 44
Registered: October 2009
Location: Mumbai
Member
FYI The default block size in oracle 10g is 8k and as per my experience for most OLPT enviroment 8k is used. We only go for 16k or 32k in case of OLAP a dataware housing enviroment.

Appreciate your reply if I am wrong.
Re: Horizontal vs Vertical data table design [message #434166 is a reply to message #434031] Mon, 07 December 2009 09:36 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
You may well only go for a 16K or 32K block size in those circumstances - I have neither the ability nor the desire to check.

I prefer to look at the individual situation and make a decision based on the requirements of the system in question. In this situation you'd need to look at the data distribution in some detail to make a definite decision about blocksize, but if a high percentage of the columns are populated in general, I'd definitely look at a 16k or 32k blocksize to avoid frequent row-chaining.

Even if the rest of the database were using 8k blocks, there's no reason why you can't create a tablespace for this table that uses 16k blocks.
Re: Horizontal vs Vertical data table design [message #434220 is a reply to message #433205] Mon, 07 December 2009 16:35 Go to previous messageGo to next message
wakula
Messages: 150
Registered: February 2008
Location: Poland
Senior Member
Some additional details about the data to be stored:
  1. We have 3 types of data:
    • 15 types of values - the number should not increase in the future. Usually only a subset of the values will be present.
    • 200 types of values - the number will increase in the future. Usually all of the values are present.
    • 600 types of values - the number will increase in the future. Usually all of the values are present.
  2. Values are usually numeric - I believe that the precision might be decreased from Oracle's default.
  3. Data are going to be stored on NFS inside multiple datafiles and possibly different dataspaces - correct me if I am wrong but the block size might vary.
  4. Queries would often use multiple columns - thus horizontal approach would simplify them. Pivoting the data require additional resource - again horizontal design is the winner. However we can hit the limit of ~1000 columns - then horizontal would be a great looser (re-design of the DB and code).
  5. I would expect that row chaining should not happen for "15 values" type and would happen for sure for the "600 values" type.
Today I have talked with some people who are far more experienced than me and the points were that:
  • Oracle's limit of 1000 columns is a real threat
  • Fragmentation might occur but I should not wory about the performance that much
  • Data pivoting is a real problem - but the vertical design seems to be optimal for "200 values" and "600 values" types

I will keep the thread updated if we make any final decision internally - to share the knowledge with the community. At the same time I am open to any other ideas/suggestions/comments.
Re: Horizontal vs Vertical data table design [message #434249 is a reply to message #434220] Tue, 08 December 2009 02:41 Go to previous message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Row Chaining isn't something to dismiss - it effectively doubles the read cost for affected rows.

I'd still be tempted to go for the horizontal approach - you can work round the 1000 column limit by creating a second table who's primary key is an Id held on the primary table.

Previous Topic: Buffer pool Hit% more than 100%
Next Topic: Unindexing foreign key
Goto Forum:
  


Current Time: Sat May 18 04:21:48 CDT 2024