64 bit tomorrow – What if you’ll have more than 4GB “today”?

Guys, I’ll post more in the following days – stay tuned. Your feedback so far has been great, thanks a lot! Till then, a quick question:

What if, in a short amount of time, you got an update for Delphi 2010 (or similar) allowing you to use more than 2/4 GB of RAM?

I mean a unit and/or a memory manager customization which would allow this by using Address Windowing Extensions (AWE)?

For those who don’t know, AWE is a paging technique similar to the 32-bit memory extenders from the DOS days (EMM386, anyone?).

More technical info is here. I don’t know the compiler / memory manager internals (perhaps it isn’t possible so easily), but it’s just an idea to discuss.

Also, there are other benefits – this would allow breaking the memory barrier on 32-bit OSes too. (OK, on these OSes you must use the /PAE switch.)
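
To make the idea concrete, here is a minimal sketch of the AWE call sequence in Delphi. It is illustrative only: error handling and the SeLockMemoryPrivilege setup are omitted, and the API imports are declared by hand in case your Windows unit lacks them.

    uses
      Windows;

    const
      MEM_PHYSICAL = $00400000;  // may be missing from older Windows units
      PageSize     = 4096;       // use GetSystemInfo in real code

    type
      PULONG_PTR = ^ULONG_PTR;

    // AWE entry points in kernel32
    function AllocateUserPhysicalPages(hProcess: THandle;
      var NumberOfPages: ULONG_PTR; PageArray: PULONG_PTR): BOOL; stdcall;
      external kernel32;
    function MapUserPhysicalPages(VirtualAddress: Pointer;
      NumberOfPages: ULONG_PTR; PageArray: PULONG_PTR): BOOL; stdcall;
      external kernel32;

    procedure AweSketch;
    var
      PageCount: ULONG_PTR;
      Pfns: array of ULONG_PTR;
      Window: Pointer;
    begin
      PageCount := 256;  // 256 * 4K = 1MB of physical pages
      SetLength(Pfns, PageCount);
      // 1. Grab physical pages; they live OUTSIDE the process address space
      AllocateUserPhysicalPages(GetCurrentProcess, PageCount, @Pfns[0]);
      // 2. Reserve a virtual "window" the pages can be mapped into
      Window := VirtualAlloc(nil, PageCount * PageSize,
        MEM_RESERVE or MEM_PHYSICAL, PAGE_READWRITE);
      // 3. Map the pages into the window; Window^ is now usable memory
      MapUserPhysicalPages(Window, PageCount, @Pfns[0]);
      // ... read / write through Window ...
      // 4. Unmap; the pages keep their contents and can be remapped later
      MapUserPhysicalPages(Window, PageCount, nil);
    end;

The part a memory manager would have to hide is steps 3/4: only what is currently mapped into the window is addressable, so “pointers” into AWE memory can’t be ordinary pointers.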

So, Vote and Comment!

49 thoughts on “64 bit tomorrow – What if you’ll have more than 4GB “today”?”

  1. Not that simple to realize. You have to switch pages constantly. AFAIK you cannot have “normal” pointers: if you switch the page, all pointers into that page become invalid, since the underlying memory changes. Only an application that actually knows about this stuff and implements its memory management accordingly can use this feature. No generic solution can be devised for that.

      • A TAWEStream should be easy enough to code. A TAWEList would be more difficult, because a TList does not store data – it stores pointers to data – so unless you change the way the data are allocated, they are still allocated using the standard memory manager. You’d need a TList able to manage the stored items’ memory, not merely reference them. It should be able to suballocate pages, keep track of which pages still hold “alive” data, and move data among pages when modifications won’t fit in the currently available space. Data with copy-on-write semantics can be hard to manage in mapped pages; they are better suited to fixed-size allocations.

        • Agreed, it’s not straightforward, which is the reason we haven’t tackled it ourselves (yet).

          Following on from the points you make, you will also have issues with sort and find operations, both of which would need to bring the AWE “cached” pages into scope / “alive” in order to operate on them. I haven’t even thought about the kind of optimized algorithm you’d need to determine which of X GB of paged data you are best keeping “alive” at any one time when sorting. You couldn’t bring it all back, of course – only some portion of the 2GB space you have available. There would be a time penalty, but we are prepared for that, and the upside of a large-address-aware structure more than outweighs the speed penalties, I hope.

          Fixed-size allocation is also an issue, as you say. Some kind of suballocation would presumably need to take place in order to make the most of the 4K pages.

          As I’ve made clear over many months and years now, this is a real “drop dead” need for us and I’m very actively looking for ways to achieve some form of “in-process” large memory space data access. That activity has picked up once again after the roadmap and the continued far horizon for Delphi 64bit.

          Anyone taking this on and producing a component would, I believe, readily find a niche market, one that would keep growing until Delphi’s 64-bit release. As a company we would pay well above normal component prices for this – the cost of a full Delphi license would easily be worth it, to us at least. Alternatively, if Embarcadero released such a set of classes as an upgrade to D2010, we’d be out buying licenses tomorrow. As it is, we won’t be upgrading until Delphi 64-bit comes along.

          • “will also have issues with sort and find operations”. Unless sorting is done without moving data inside the AWE portion (or minimizing those moves). Again, find operations are faster if you have some sort of index kept in main memory that points to where the data are stored. Scanning AWE memory can be a bit more complex.
            “There would be a time penalty”. The time penalty may be low, because AWE, unlike the old EMS/XMS, does not move data back and forth. It uses the processor’s paging engine to map physical memory into the AWE window – which is a virtual address – so it just has to update the page table entries and little more.
            IMHO developing a custom solution tailored to actual needs is easier than having to code a general-purpose library, but of course if Embarcadero or someone else does the work for you, it’s far better…

            • I’m sure you’re right… I was thinking of the sort / find being expensive if the AWE portion had to be brought back to be examined. Maybe you are thinking that there is something stored alongside the pointers into the AWE memory indicating the sequence of the data stored there. Re-reading your post, I think that is what you are saying. So, yes, I’d agree. In my original thinking, though, if the data stored in those 4K pages contain multiple keys (lots of small records, for instance), then you either end up with a lot of “extra” key data stored with your AWE pointers in main memory (taking up more of the 2GB space), or you need to bring back the AWE page(s) to examine them – if you see what I mean… Given all that, the finer the granularity of the data stored in AWE pages, the greater the pain you are going to have?

              I think we are on the same track here, but you have somewhat better clarity than I do – which is why we’d be happy to pay someone to develop a fairly well-defined bespoke solution!

                • With AWE you need a data structure outside AWE memory that tells you where data are stored inside AWE. When you call AllocateUserPhysicalPages() you get back an array of page frame numbers, which is a sort of “handle” to the memory; later it is passed to MapUserPhysicalPages() to map that memory within the process address space. Therefore, to recall that memory later, you need to store that “handle”, and which data are stored in those pages, somewhere in “main” memory – or you won’t be able to access those data again. Ordering may be performed on this structure – accessing the AWE memory when needed to read data, but without moving data there, if possible: an “immutable” approach. Searching on smaller “indexes” – be they in main memory or in smaller data structures stored in AWE – helps to avoid many mappings/unmappings of larger data. AFAIK, yes: the finer the data granularity, the more difficult the data are to handle. For example, you can’t “free” a page until none of the data in it are still used, unless you reuse free space or “compact” pages from time to time. The smaller the granularity, the more “fragmentation” can occur – many pages allocated, each holding little data.
                That’s why, IMHO, databases take the best advantage of AWE: they usually already store their data in “blocks” of some KB granularity (usually a multiple of the disk cluster size), already “suballocate” the needed space within those blocks, and have good ways to reuse free space within them.
                Caching those blocks in AWE memory instead of re-reading them from disk is faster anyway – they just need to keep track of where the blocks are and recall them as needed.
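
                To put that bookkeeping into code, here is a minimal, illustrative sketch (same caveats and API imports as in the snippet in the post above; the record layout and names are made up):

                    uses
                      Windows;

                    type
                      PULONG_PTR = ^ULONG_PTR;

                    // AWE entry point, as declared in the sketch in the post above
                    function MapUserPhysicalPages(VirtualAddress: Pointer;
                      NumberOfPages: ULONG_PTR; PageArray: PULONG_PTR): BOOL; stdcall;
                      external kernel32;

                    type
                      TAweBlock = record
                        Pfns: array of ULONG_PTR; // page-frame "handle" filled in by
                                                  // AllocateUserPhysicalPages; kept in main memory
                        KeyLo, KeyHi: Integer;    // e.g. key range held in this block, so
                                                  // searches can skip blocks without mapping them
                      end;

                    // Swap a block's physical pages into the shared window. Any pointer
                    // into a previously mapped block is invalid from here on.
                    function MapBlock(const Block: TAweBlock; Window: Pointer): Pointer;
                    begin
                      MapUserPhysicalPages(Window, Length(Block.Pfns), @Block.Pfns[0]);
                      Result := Window;
                    end;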

                • Yes, totally agree. That’s several more reasons why it is somewhat more than tricky to use as a heap… This conversation has really helped to clarify my thinking, thanks. At the very least it has put me off trying to achieve this myself, but I still wouldn’t like to rule it out for a developer / company with a deeper understanding and more skill than I have.

                  On the wider point, I have also pursued the “database” option. Whilst a standard database server would most definitely NOT do the job (we do know and use MS SQL pretty heavily), I have now had two separate conversations, both of which are pointing towards an embedded, memory-resident, AWE-cached database – that is, one that works inside the host Delphi application but is capable of extending its cache via AWE memory pages. I have not had a chance to actively try this as yet, but given that both parties, who specialize in this area, have arrived at this independently, I have to give the idea quite heavy credence. They also have products close to this requirement… You too have pointed towards this, so it has to be my primary avenue of investigation. I am still dubious about the speed achievable, but ever hopeful…

                  Thanks again for all your discussion on this point, it really has been useful.

                  • I’ve followed your discussion till now – and I share Luigi’s opinions. However, there are some glitches. Depending on the nature of your application, there are several solutions here:
                    – write your own DB engine (not really recommended, even if I can help you here – good only if you have fixed-length data and/or your speed requirements are very high)
                    – use an AWE DB solution like Nexus’s.
                    – use an embedded(able) true 64-bit professional / standard DB engine like Firebird or SQLite. The points below apply to Firebird, whose internals I happen to know to a certain degree:
                    With the last solution, IMHO, you’ll get greater speed overall (again, I don’t know the nature of your application) because:
                    – being 64-bit, it uses the hardware capabilities to the maximum (64-bit registers, no paging etc.)
                    – being a separate process and a professional-grade DB, it can cache the whole DB in memory (more on request – if you’re interested, send me an eMail), and it can be set up to work only with memory (no writes to disk). It also cooperates with the OS’s system cache to give the best throughput – but it needs configuring according to your app. Another interesting point: what happens if the data doesn’t fit in memory? In the AWE case: it crashes. Here: it overflows to disk. Slower (while the cache flushes the old, unused data), but it works. I don’t know which behaviour you want.
                    – also, it can be configured to flush the data to disk at the end of the session / when the program closes / when you want / when the CPU is idle, in order to provide a persistence layer between sessions (dunno if applicable in your case). And this can be (re)loaded into memory at once when the application starts, or incrementally, on demand (like any other DB works) – as you wish. Btw, how quickly do you ‘manage’ to fill up 4GB?
                    – it has something called ‘Local Protocol’, which is a very, very fast way to communicate with the server, exploiting the fact that we’re local (no TCP/IP stack etc.) – see the example below.
                    – it has transparent compression for rows, in order to minimize memory / disk consumption.
                    – no installation required. (OK, you need to copy files 🙂 )
                    – single-file database
                    – and of course all the other bells & whistles which such a DB can offer (very fast searching / indexing, SQL standards compliance, ACID etc.)
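
                    To make the ‘Local Protocol’ point concrete: with Firebird on Windows it is selected simply by the shape of the connection string – no host part means no TCP/IP. The paths here are illustrative:

                        // TCP/IP, even when the server is on the same machine:
                        //   localhost:C:\Data\cache.fdb
                        // Local protocol (XNET in Firebird 2.x), no network stack involved:
                        //   C:\Data\cache.fdb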

                    If you’re interested, drop me an eMail and I’ll help you further.

                    • There’s another option to consider. We need the extra memory, but don’t use a general purpose DB (and refactoring our DB into a form useful with AWE would be icky, I think).

                      Another useful option would be to spawn child processes that act as additional memory stores, with some interface that allows access (or copying, or whatever) between the child process and the parent. Perhaps a child could also contain logic for processing the portions of data it holds as appropriate, and return only the result across the process boundary.

                      This allows a potential architectural solution that embraces and takes advantage of the segmentation (even to the point of permitting better utilisation of multiple cores).

                      Just thinking out loud, but I would be interested to know if anyone here has tried something similar? A rough sketch of the plumbing is below.
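
                      Continuing the thinking-out-loud, here is a minimal sketch of the plumbing such a child “memory store” could use. The bulk data would stay in the child’s own 2GB address space; parent and child share only a small pagefile-backed transfer buffer. The section name and size are made up for illustration:

                          uses
                            Windows, SysUtils;

                          const
                            XferName = 'Local\MemStoreXfer';  // hypothetical section name
                            XferSize = 64 * 1024;             // small window, not the data itself

                          // Called by both parent and child: the first caller creates the
                          // pagefile-backed section; later callers open the existing one by name.
                          function OpenTransferBuffer: Pointer;
                          var
                            hMap: THandle;
                          begin
                            hMap := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
                              0, XferSize, XferName);
                            if hMap = 0 then
                              RaiseLastOSError;
                            Result := MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, XferSize);
                          end;

                      Request/response coordination (events, pipes or window messages) and the data itself would live on the child’s side; each child adds roughly another 2GB of private address space, plus independently schedulable work for the extra cores.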

  2. AWE/PAE are technologies that address some classes of applications, but not others.
    1) They are useful only on 32-bit systems that support more than 4GB of RAM. XP and the lower server versions do not support more than 4GB of RAM, making AWE useless there (except for development).
    2) The application must reserve memory for the “window” area, which in turn is not available for other tasks. It also has to keep track of where data are, and that’s an overhead – the more small blocks you allocate, the more memory is used to keep track of where the small blocks are.
    3) AWE allocates memory by pages – not bytes. IIRC the granularity is 4K. Good for applications that already allocate memory in given chunks and in a predictable way (e.g. a database cache that maps datafile blocks to memory blocks), less useful for applications allocating memory more randomly. If the application starts to map pages in and out continuously, performance is very bad.
    4) Using /3GB limits usable RAM to 16GB, so if you have more you have to drop /3GB and just use AWE and PAE… but then the AWE window may take precious space in your 2GB address space.

    The bottom line: AWE and PAE were workarounds, for applications like databases, from when there were no 64-bit Windows operating systems. They were never designed as a general way to access memory beyond 32-bit addressing. Now that 64-bit Windows is available, all we need is a 64-bit compiler, not another library for writing yesterday’s applications.
    I used Oracle on Windows systems using AWE and PAE. It was horrible. The solution was Oracle 64 bit on Windows 64 bit.

    • “I used Oracle on Windows systems using AWE and PAE. It was horrible.” – What do you mean by horrible? Performance issues? How did it compare with a plain-vanilla 32-bit system?
      “The solution was Oracle 64 bit on Windows 64 bit.” – sure. The question isn’t (I HOPE) whether we’ll have an environment for building 64-bit executables, but when.

      • “What do you mean by horrible? Performance issues?” 1) Configuration: it was like being a juggler, trying to tune everything – /3GB, /PAE, etc. 2) Performance: it could move the SGA (the data cache) to the “extended memory”, but other important data structures (PGA, etc.) still had to fit in the non-“extended” memory, and when /3GB couldn’t be set because the machine had more than 16GB of RAM, everything had to fit in the 2GB process space minus the AWE window. It was not really worth the hassle, except for installations that required only a very large SGA and nothing more.

        • Aha! Something specific to Oracle, then. Not very relevant to our theme. We all agree that this isn’t meant to replace full-blown 64-bit.

          • It is specific to Oracle, but it shows how AWE can be used to store some kinds of data while being useless for others. AWE *is not* a general-purpose way to access data beyond the 32-bit address space. Like EMS/XMS, it is useful for storing some kinds of data for applications requiring large buffers/caches or the like, but it can’t be used as a general-purpose heap.

            • I guess that is what we are trying to do, though – make AWE a general-purpose heap. As you say, it’s not the best use to put it to, but what alternatives do we have right now, short of rewriting applications or moving to a different compiler?

              I’d be open to Lazarus if I felt it was up to the job and we could still have fast MS SQL Server access – but Devart, supplier of SDAC, understandably aren’t producing 64-bit drivers just to support a very small market segment. I think someone elsewhere mentioned to me that there are 64-bit OLE DB drivers available for SQL Server; whether these are usable through FPC/Lazarus I don’t know. Even then we’d need very fast access, as we ultimately pump several million records back out to SQL via a bulk copy interface – all of this happening in a matter of minutes. So forget straight “insert” statements; we’d be there for hours…

      • No. AFAIK XP *can’t* support more than 4GB. It is limited to 4GB of *physical memory*, because MS caps it in order to sell the more expensive server editions. You can add the /PAE switch, but it still can’t use more than 4GB. MS says:

        “Although support for PAE memory is typically associated with support for more than 4 GB of RAM, PAE can be enabled on Windows XP SP2, Windows Server 2003, and later 32-bit versions of Windows to support hardware-enforced Data Execution Prevention (DEP).”
        But AFAIK not to address more memory.

        See here:
        http://msdn.microsoft.com/en-us/library/aa366778(VS.85).aspx
        http://www.microsoft.com/whdc/system/platform/server/PAE/pae_os.mspx

          • Did you ever see Windows 2003/2008 Enterprise licensing? If you just need a larger address space for intensive calculations or the like, with AWE you end up spending a lot just for the OS. Multiply that by the number of machines you use and you get the picture.

            • Sorry, but I don’t get it here. While all of us want true 64-bit, a true 64-bit system supports only a subset of the OSes supported by AWE / PAE (obviously). So why do you imply that the price for AWE is higher than the price for true 64-bit? Also, why should one buy licenses for this? AFAICS, the main pressure comes from the fact that the systems are already in place.

              • You’re right. If you run 64-bit XP/Vista you get more memory available via AWE – PAE is not supported by 64-bit systems (see http://msdn.microsoft.com/en-us/library/aa366796(VS.85).aspx). It was your mention of PAE that led me to think you were talking about 32-bit systems, not 64-bit ones.
                AWE ≠ PAE: AWE is just a way to map memory within a process; PAE is a way to use more memory than is addressable given the register size. They can work together or separately.

  3. For me, 64-bit is more important for things like DataSnap 2010 ISAPIs. At the moment you have to switch the application pool in IIS back to 32-bit, which is not ideal.

  4. BTW: EMS and XMS worked in the 16-bit world too… In the beginning EMS worked using additional memory boards on 8086-class machines; XMS took advantage of 286 protected mode (although getting the 286 out of protected mode required a soft reset!). Both predate 32-bit memory managers (and extenders).
    IIRC EMM386 took advantage of the virtual-8086 mode introduced with the 386 to simulate EMS without hardware support.
    Like AWE/PAE, they were workarounds to get more memory in some way while waiting for the move to full protected mode and 32 bits… and they are now peacefully forgotten.

  5. It’s not a matter of memory for me, it’s a matter of being able to build things like Shell Extensions that interact in a 64-bit process space.

  6. I agree with Robert. I don’t need a lot of memory for anything I do, but being cut off from seamless shell access is a huge pain that’s only going to get worse. I’m all for cross-platform compilation of servers, but I want 64-bit to be the priority.

    • If this is a possibility then it is certainly interesting.

      I fear, however, that the devil is in the details (i.e.: What is the performance hit? How transparent is it to program code? What OS restrictions does it imply? etc.).

      • First, let me state (again) that I see this neither as a definitive solution nor as an excuse for delaying the 64-bit compiler.

        What is the performance hit?
        – It is said to be quite fast.
        How transparent is it to program code?
        – That’s why I said that Embarcadero must do it: in order to give you ready-made abstractions.
        What OS restrictions does it imply?
        Of course this works only on 64-bit CPUs.
        – On 32-bit OSes you must add the /PAE switch to the OS’s boot.ini – google for details (an example is below).
        – On 64-bit OSes: nothing – it works out of the box.
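
        For reference, the boot.ini entry typically ends up looking something like this – the ARC path and description vary per machine, so treat it as illustrative:

            [operating systems]
            multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /PAE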

        • “Of course this works only on 64-bit CPUs.” Again, no. PAE was introduced with the Pentium Pro – it is available on many 32-bit processors as well. It adds more memory address lines, extending them from 32 to 36 bits and allowing up to 64GB of RAM. The paging mechanism maps the 36-bit physical address space into the 32-bit address space usable by processes.

  7. EMS was a bad solution everyone avoided. XMS was a slightly less bad solution everyone continued to avoid.

    It was not until flat 32 bit addressing became available that most people and programmers went past the megabyte boundary.

    The same will be true for 64-bit memory. How do I know? Because this page-switching technology for 32-bit Windows has been around for years now (PAE – look it up) and no one wants to use it, for exactly the same reason – it is too much work to use.

    Easier to just wait for native 64 bit solutions. Not only will they be easier to write and maintain, but any 32 bit kludge will need to be rewritten once true 64bit is mainstream in just a few years more.

    If it were 20 years out, I could see it being worth the effort, but with 64-bit hardware and OSes already here and prevalent, and 64-bit compilers available (and probably no more than a year or two until the compiler of your choice supports it) – it hardly seems like a good time to write yourself into a dead end.

    Learn from the lessons of the past. Either wait for it to show up, or move to a compiler that already supports it NOW.

    • Me? Sure. But it seems that there are others here who desperately want more memory… And if this is feasible at a reasonable cost (time etc.), then why not? Of course, nobody said that the 64-bit compiler should be canceled. Not even delayed.

    • IIRC EMS was born because Lotus needed more memory to store 1-2-3 data 🙂 XMS worked very much alike, but required only a 286 and a driver, with no additional hardware support (although EMS support was added to many motherboards). Both were good for storing blocks of data, not for random access to small structures.
      The interesting thing is that while Microsoft was very late to exploit processor capabilities in the 80s and part of the 90s (286 protected mode, able to address 16MB of memory, was never really used except by early versions of OS/2 and the forgotten Windows/286), and _real_ 32-bit operating systems came only when Intel released the Pentium, the software industry was able to fill the gaps somewhat, with memory extenders first and full DOS extenders (Phar Lap) later. Borland was somewhat late then too, because it delivered a DOS extender only in Borland Pascal 7, while everything was already moving to Windows.
      MS was much faster to exploit 64-bit hardware, and again Delphi is very late to deliver support for newer but obviously needed technologies.

  8. See my Stack Overflow posting (tagged AWE) from a couple of days ago. I have been coming to this conclusion for quite some time now. It has crystallized since it became clear that a straight 64-bit Delphi compiler is not even coming over the horizon at this point.

    I take on board all the negatives about using the AWE approach – using valuable space in the 2GB range, speed, complexity, boxing yourself into a corner when 64-bit comes along. On the flip side, we are already right in a corner with nowhere to go. We NEED that extra memory space. Short of totally reworking the logic of our server application (we’d be moving to C# for that one), we need a solution to break the /3GB barrier (even the 4GB for Windows-on-Windows isn’t going to cut it).

    I contacted Michael Rozlog separately a couple of days ago on this very point – taking up his offer from a response to a previous posting on this blog. I’ve yet to hear anything, but if Embarcadero would offer some basic support in this area, it would be a massive help to those of us that are desperate for that extra memory space – despite the pitfalls.

    Separately, I’m also pursuing the non-Embarcadero route and seeing what is either already available or can be commercially created. The only tool I’ve found that comes close at this point is the NexusDB AWE edition. That is still an “out of process” solution, but they are willing to discuss licensing for supplying libraries to make it in-process, as per their embedded solution. This would provide a very generic in-memory data storage solution, but I also fear that, being a full-on database solution, it will be comparatively slow when viewed against structures such as dedicated TCollections etc…

    So for those of us that need it, AWE may be the only way to hang on to the Delphi solution until 64 bit arrives. We deal with the fallout at that point.

    A drowning man will grab at anything to stay afloat – even a shark….

    • Yep, this is exactly the scenario for this feature now. I really think that they should break the rules and issue an update, in order to avoid all that “totally reworking the logic of our server”, “boxing ourselves into a corner” and such. Btw, which structures do you want to have AWE support? Are you OK with a thin abstraction supporting generics (like TList)? Does your data structure (list, tree etc.) hold objects inside, or just basic data types (strings, integers etc.)?

      • We’d take what we can get and make it fit our process if at all possible. We do have objects within objects, i.e. several levels of data nesting, but those “objects” are simple data containers, i.e. integers / date-times / extendeds etc… We switched away from largely TCollection derivatives a year or so ago, in favour of another component / record-based structure – thus reducing our memory footprint by up to 30%. We are again constantly smashing our head on the 3GB limit though, so we need to expand into AWE (64-bit of course preferable). So our data structure requirements are quite simple – lots of time-phased data (date / value pairs) attached to data-type entities (sales / capacity / forecast / demand etc.), which are in turn attached to “nodes” (items / products etc.). So that gives 3 levels of depth. In other areas we may have more depth, but the concept remains the same. It is just about data storage (sorting, finding [keyed and sequential], inserting and deleting). So in a nutshell, something you could hold in a TList-derived entity – roughly the shape sketched below.
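
        Reading that description back as Delphi types – purely illustrative, every name and type here is an assumption:

            type
              TSeriesKind = (skSales, skCapacity, skForecast, skDemand);

              TDateValue = packed record    // leaf level: one time-phased pair
                Date: TDateTime;
                Value: Extended;
              end;

              TSeries = record              // middle level: one data-type entity
                Kind: TSeriesKind;
                Points: array of TDateValue;  // kept sorted by Date for keyed finds
              end;

              TNode = record                // top level: item / product
                Id: Integer;
                Series: array of TSeries;
              end;

        Fixed-size, pointer-free leaf records like TDateValue are exactly the kind of payload that suballocates cleanly into 4K AWE pages, which bodes well for the thread above.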

    • Why?

      The two things are quite independent of one another. Or perhaps you want to write 64-bit UDFs for InterBase?
      FTR, with our Delphi applications (hence 32-bit) we use the 64-bit versions of Firebird (for example) where the server OS permits.

  9. What I understand of AWE is that you can allocate physical memory above the 4GB mark, but you have to map it into the 4GB virtual space of the process to use it. So it’s very hard to use in a user-mode memory manager without hardware help (CPU page faults). It is more of an option for the OS memory manager; but it could be used in a user process to store large memory buffers.

    Look here:
    http://blogs.mssqltips.com/blogs/chadboyd/archive/2007/10/15/pae-and-3gb-and-awe-oh-my.aspx

  10. I need to create 64-bit DLLs using Delphi so they can be used by 64-bit programs written in Visual C++ or Visual Basic.

    I don’t need more than 2GB RAM, but I need 64-bit binaries mainly for compatibility with 64-bit applications.

    Simply put, a 32-bit DLL doesn’t load into the same process as a 64-bit program. This breaks things for me.

    Today, I can run Delphi 7 or Delphi 2009 programs in Linux using WINE 1.1.29. Will native Delphi 2011 run any faster or more reliably than WINE 2.x next year? I certainly hope so if 64-bit is delayed.

  11. You may look at http://store.steampowered.com/hwsurvey/ to check how many gamers (!) use 64-bit operating systems.
    Really, 64-bit support is needed only by scientific software, some server software, and compatibility software/DLLs like user Rich reported.

    So, the percentage of developers who will use 64-bit is small among Delphi programmers. For 95% of the software they write, no 64-bit support is needed; often even the 2nd core of a dual-core processor can’t be used, or doesn’t make sense.

    This is my opinion for now, and yes, things are changing. But, for example, I predict that the 64-bit OS usage percentage will keep growing as it has, and current Vista 64-bit installations will be replaced by Windows 7 64-bit.
    Gamers (see the Steam stats above) are not enterprise clients; it is very hard to upgrade the OS in the enterprise, so in this sector 64-bit OSes will grow very slowly.

    • You’re wrong. Gamers will move very fast to 64-bit because of video card memory sizes. The larger the card’s memory, the larger the address space the OS must set aside to map the video memory. That means if you have a 1GB video card and 4GB of RAM on a 32-bit desktop system, you end up with 1GB of wasted memory. Add a second card, and you lose another 1GB…

      As memory gets cheaper and cheaper, more and more systems will have 64-bit OSes to break the 4GB barrier for desktop systems. Then even utilities may need to become 64-bit.

      “but even 2nd core of 2x processors can’t be used or doesn’t make sense.” There are more operations that could be run on separate threads than most Delphi developers implement, because it requires proper skills – and some good libraries. Why aren’t most database queries run on a separate thread, for example?

      “some server software”: this is usually an area that pays well – why should Delphi developers be unable to target it?

  12. Pingback: The Memory Barrier: Poll results, Comments, Solutions. « Wings of Wind Software
