During last week's Sapphire conference, Hasso Plattner took the opportunity to address a number of myths about HANA - but he also clarified for the audience how HANA loads data.
How does data get into HANA
Well, first data gets transferred from the source systems into a column store - into the HANA format (I will call this, lacking a name from SAP right now, the HANA store) - which gets roughly 5-10x compression. The storage medium is whatever the storage vendors give us (Plattner) - disk or SSD.
Hasso Plattner during the Sapphire 2013 keynote
Next the most frequently used tables get loaded into memory - like the other ones [he meant vendors], Plattner said. It's not clear what determines the frequency at the first load into HANA - but that's no rocket science to get done.
Slide from keynote
Once work in HANA starts, the system dynamically loads (and that was news to me) the needed columns into memory (again from disk or SSD permanent storage). So far I had thought, and misunderstood, that everything the application running on top of HANA needs had to be loaded into RAM. This raises a whole set of performance questions.
So only the columns that are needed are in RAM, as Plattner emphatically stated.
Slide from keynote
A column can only be completely in memory - or not at all - there is no caching or intermediate state for columns. As Plattner said, the algorithm is pretty primitive: if you are not used, you are not in memory.
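To make the mechanics concrete, here is a minimal Python sketch of the behavior as I understood it from Plattner's description - all names and structure here are mine, not SAP's code or API:

```python
class DiskStore:
    """Stand-in for the compressed HANA-format store on disk / SSD."""
    def __init__(self, tables):
        self.tables = tables  # {table: {column: [values]}}

    def read_column(self, table, column):
        return list(self.tables[table][column])


class ColumnStore:
    """Hypothetical all-or-nothing, on-demand column loading."""
    def __init__(self, persistent_store):
        self.persistent_store = persistent_store
        self.ram = {}  # only completely loaded columns live here

    def get_column(self, table, column):
        key = (table, column)
        # Binary state: a column is either fully in RAM or not at all -
        # no partial caching, no intermediate state.
        if key not in self.ram:
            # First touch loads the *whole* column from the HANA store.
            self.ram[key] = self.persistent_store.read_column(table, column)
        return self.ram[key]

# "If you are not used, you are not in memory": columns never touched
# by any query are never loaded, so they consume no RAM at all.
```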
In the slide below, the red ones [columns] never made it, said Plattner:
Slide from keynote
Plattner made the point that this is the difference from a row store - and with SAP's order line having over 500 fields, he claimed that you cannot achieve similar compression with a row store. And fields that are not used are not even in the HANA-format store on disk / HDD.
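A toy example of why per-column dictionary encoding compresses so well (my own illustration, not SAP's actual algorithm): each column holds values from a single domain, often with very few distinct values, so the whole column shrinks to a small dictionary plus tiny integer codes:

```python
def dictionary_encode(column_values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary = sorted(set(column_values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in column_values]

# A "currency" column across millions of order lines has only a
# handful of distinct values:
currencies = ["EUR", "USD", "EUR", "EUR", "USD", "EUR"]
dictionary, codes = dictionary_encode(currencies)
print(dictionary)  # ['EUR', 'USD']
print(codes)       # [0, 1, 0, 0, 1, 0]

# In a row store the same values sit interleaved with 500+ other
# fields per record, so no single run of stored data is this uniform -
# which is why comparable compression is hard to achieve there.
```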
Questions remaining open
The slides above state that columns stay in memory till the system is restarted or the columns are purged. Plattner did not elaborate on the purging mechanism, which would free up memory that would then be available for loading further data, as needed, from the HANA store.

Plattner was also very clear that the state of a HANA column is binary - either it's in memory or it's in the HANA store. But the slide above states that during the 2nd request [of the same data - my assumption and addition] the data is near 100% in memory. These two statements contradict each other - unless the 2nd query is chasing the very first one, which is still loading data from the HANA store to memory - but that would be a border case.
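For illustration only: since Plattner did not describe the purge mechanism, here is a purely hypothetical sketch of what an LRU-style, whole-column purge could look like - nothing in it is confirmed by SAP, and the budget and names are invented:

```python
from collections import OrderedDict

class PurgingColumnCache:
    """Hypothetical purge: evict least recently used whole columns."""
    def __init__(self, ram_budget):
        self.ram_budget = ram_budget   # budget in number of cell values
        self.columns = OrderedDict()   # key -> column data, in LRU order

    def touch(self, key, loader):
        if key in self.columns:
            self.columns.move_to_end(key)    # mark as recently used
        else:
            self.columns[key] = loader(key)  # load the whole column
            self._purge()
        return self.columns[key]

    def _used(self):
        return sum(len(col) for col in self.columns.values())

    def _purge(self):
        # Evict whole columns (preserving the binary in-memory / not-in-
        # memory state) until under budget; the freed RAM is exactly what
        # would make HANA's memory usage elastic.
        while self._used() > self.ram_budget and len(self.columns) > 1:
            self.columns.popitem(last=False)
```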
Both questions are important - because the answers would point to some sort of elasticity in HANA's RAM usage.
Why elasticity matters
Well, for starters it's one of the key defining characteristics of cloud, per the NIST definition. It really matters because it allows the scalable provisioning of computing power - both for ramp-up and ramp-down. And with that it determines the TCO of a cloud infrastructure. An inelastic cloud system - and as such we have to regard HANA at this point - will be very expensive to operate. Add to that that HANA by design needs to run in the most expensive storage medium out there, RAM, and it raises TCO even further.

Normally the opposite should be the case - the more expensive the resource, the more important its efficient utilization is, which in a cloud infrastructure is driven by elasticity.
The AWS argument
Plattner at an early point in the keynote said that those who believe HANA cannot be elastic should look at HANA One, which runs on AWS and is elastic there. True, but only due to the AWS infrastructure that manages the HANA One AMI. It's nothing in the SAP code that makes HANA One elastic. And hence the concerns remain - mentioning HANA One does not defuse the concern about how elastic HANA itself is.

MyPOV
With Plattner slowly lifting the kimono a little more, it's clear that HANA is (benevolently) somewhat elastic - since it will only load into RAM what really matters: the used and needed columns. But the purge mechanism isn't clear. And even if its inner workings get clarified, it's really more a gigantic cache we are talking about - not an elastic cloud product.

But that's what I know so far, and SAP is a company with many smart engineers. I am sure they know this and can address it, making HANA much more elastic than what it is and / or looks like today.
P.S. Since I picked up on @JBecher's official guideline that HANA is all caps, it's all caps from this post going forward - forgive the earlier typing like Hana.