Organizational Research By

Surprising Reserch Topic

erlang and its consumption of heap memory


erlang and its consumption of heap memory  using -'mysql,linux,oracle,erlang,solaris'

I have been running a highly concurrent application on my HP Proliant Servers. The application is a file system indexer i coded in erlang. It spawns a process per Folder it finds on the file system and records all file paths in a fragmented Mnesia Database. (Database consists of disc_only_copies type of tables and a screen shot of its file system can be viewed here.)

The Snippet of code that does the high intensive job of going through the file system is shown below:



%%% -------- COPYRIGHT NOTICE --------------------------------------------------------------------
%% @author Muzaaya Joshua, <joshmuza@gmail.com> [http://joshanderlang.blogspot.com]
%% @version 1.0 free software, but modification prohibited
%% @copyright Muzaaya Joshua (file_scavenger-1.0) 2011 - 2012 . All rights reserved
%% @reference <a href="http://www.erlang.org">OpenSource Erlang WebSite</a>
%%
%%% ---------------- EDOC INTRODUCTION TO THE MODULE ----------------------------------------------
%% @doc This module provides the low level APIs for reading, writing,
%% searching, joining and moving within directories.The module implementation
%% took place on @date at @time.
%% @end

-module(file_scavenger_utilities).

%%% ------- EXPORTS -------------------------------------------------------------------------------
-compile(export_all).

%%% ------- INCLUDES -----------------------------------------------------------------------------

%%% -------- MACROS ------------------------------------------------------------------------------
-define(IS_FOLDER(X),filelib:is_dir(X)).
-define(IS_FILE(X),filelib:is_file(X)).
-define(FAILED_TO_LIST_DIR(X),error_logger:error_report(["*** File Scavenger Utilities Error ***** ",{error,"Failed to List Directory"},{directory,X}])).
-define(NOT_DIR(X),error_logger:error_report(["*** File Scavenger Utilities Error ***** ",{error,"Not a Directory"},{alleged,X}])).
-define(NOT_FILE(X),error_logger:error_report(["*** File Scavenger Utilities Error ***** ",{error,"Not a File"},{alleged,X}])).
%%%--------- TYPES -------------------------------------------------------------------------------

%% @type dir() = string().
%%  Must be containing forward slashes, not back slashes. Must not end with a slash
%%  after the exact directory.e.g this is wrong: "C:/Program Files/SomeDirectory/"
%%  but this is right: "C:/Program Files/SomeDirectory"
%% @type file_path() = string().
%%  Must be containing forward slashes, not back slashes.
%%  Should include the file extension as well e.g "C:/Program Files/SomeFile.pdf"

%% -----------------------------------------------------------------------------------------------
%% @doc Enters a directory and executes the fun ForEachFileFound/2 for each file it finds
%% If it finds a directory, it executes the fun %% ForEachDirFound/2.
%% Both funs above take the parent Dir as the first Argument. Then, it will spawn an
%% erlang process that will spread the found Directory too in the same way as the parent directory
%% was spread. The process of spreading goes on and on until every File (wether its in a nested
%% Directory) is registered by its full path.
%% @end
%%
%% @spec spread_directory(dir(),dir(),funtion(),function())-> ok.

spread_directory(Dir,Top_Directory,ForEachFileFound,ForEachDirFound) when is_function(ForEachFileFound),is_function(ForEachDirFound) ->
    case ?IS_FOLDER(Dir) of
        false -> ?NOT_DIR(Dir);
        true ->
            F = fun(X)->
                    FileOrDir = filename:absname_join(Dir,X),
                    case ?IS_FOLDER(FileOrDir) of
                        true ->
                            (catch ForEachDirFound(Top_Directory,FileOrDir)),
                            spawn(fun() -> ?MODULE:spread_directory(FileOrDir,Top_Directory,ForEachFileFound,ForEachDirFound) end);
                        false ->
                            case ?IS_FILE(FileOrDir) of
                                false -> {error,not_a_file,FileOrDir};
                                true -> (catch ForEachFileFound(Top_Directory,FileOrDir))
                            end
                    end
                end,
            case file:list_dir(Dir) of      
                {error,_} -> ?FAILED_TO_LIST_DIR(Dir);
                {ok,List} -> lists:foreach(F,List)
            end
    end.    



The function spread_directory/4 is generic in a way that it takes two funs. One fun: ForEachFileFound/2 takes along with the Top Most Directory, the found file and does anything with it and the other fun: ForEachDirFound/2 takes along with the Top Most Directory, the folder it finds and uses it in any way it wants.

The start script i use for this application makes sure that erlang will be able to spawn as many processes as possible. Once a process finishes indexing a folder it exits.


#!/usr/bin/env sh
echo "Starting File Scavenger System. Layer 1 on the P2P File Sharing System....."
erl \
    -name file_scavenger@127.0.0.1 \
    +P 13421779 \
    -pa ./ebin ./lib/*/ebin ./include \
    -mnesia dir '"./database"' \
    -mnesia dump_log_write_threshold 10000 \
    -eval "application:load(file_scavenger)" \
    -eval "application:start(file_scavenger)"


There is a gen_server which interfaces the intensive module with the database in which i record all paths. A snippet of where it starts the spread_directory work is shown here below:


handle_cast(index_dirs,#scavenger{directory_paths = Dirs} = State)->
    {File,Folder} = case {State#scavenger.verbose,State#scavenger.verbose_to} of
                        {true,tty} ->
                            {
                            fun(TopDir,Fl)->
                                io:format(" File: ~p~n",[Fl]),
                                file_scavenger_database:insert_file(filename:basename(Fl),file,Fl,TopDir,filename:extension(Fl))
                            end,
                            fun(TopDir,Fd) ->
                                io:format(" Folder: ~p~n",[Fd]),
                                file_scavenger_database:insert_file(Fd,folder,Fd,TopDir,undefined)
                            end
                            };
                        {true,SomeFile}->
                            {
                            fun(TopDir,Fl)->
                                os:cmd("echo File: " ++ Fl ++ " >> " ++ SomeFile),
                                file_scavenger_database:insert_file(filename:basename(Fl),file,Fl,TopDir,filename:extension(Fl))
                            end,
                            fun(TopDir,Fd)->
                                os:cmd("echo Folder: " ++ Fd ++ " >> " ++ SomeFile),
                                file_scavenger_database:insert_file(Fd,folder,Fd,TopDir,undefined)
                            end
                            }                       
                    end,
    Main = fun(Dir) ->
                error_logger:info_msg("*** File scavenger Server indexing directory: ~p~n",[Dir]),
                spawn(fun() -> file_scavenger_utilities:spread_directory(Dir,Dir,File,Folder) end)
            end,
    lists:foreach(Main,Dirs),
    {noreply,State};    
handle_cast(stop, State) -> {stop, normal, State}.


More Source details can be found in the whole application.
The application entire Source and build can be found here: File_scavenger-1.0.zip.

Now, i start the application on the Server (HP Proliant G6, containing Intel processors (2 processors, each 4 cores, 2.4 GHz speed each core, 8 MB Cache size), 20 GB RAM size, 1.5 Terabytes disk space. Now, 2 of these high power machines are in our disposal. System Database should be replicated across the two. Each server runs Solaris 10, 64 bit), whose terminal now looks like this below:


bash-3.00# sh file_scavenger.sh
Starting File Scavenger System. Layer 1 on the P2P File Sharing System.....
Erlang R14B03 (erts-5.8.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.4  (abort with ^G)
(file_scavenger@127.0.0.1)1>
=INFO REPORT==== 18-Aug-2011::09:36:04 ===
Starting File Scavenger Database......
=INFO REPORT==== 18-Aug-2011::09:36:04 ===
Database Successfully Started....

=INFO REPORT==== 18-Aug-2011::09:36:04 ===
Starting File Scavenger Database......
=INFO REPORT==== 18-Aug-2011::09:36:04 ===
Database Successfully Started....

=INFO REPORT==== 18-Aug-2011::09:36:04 ===
File Scavenger Server starting with default verbose settings....

(file_scavenger@127.0.0.1)1> file_scavenger_server:index_dirs().


The server starts to run and verboses to the terminal all files and folders it finds. The server is equipped with too much RAM (20 GB), and Swap space (Swap is 16 GB). However, it ran for about 18 hours and finally, the erlang Virtual machine reported this:


File: "/proc/4324/root/opt/csw/gcc4/share/locale/ja/LC_MESSAGES/gcc.mo"
 Folder: "/proc/4324/root/opt/csw/gcc4/share/locale/da"
 Folder: "/proc/4324/root/opt/csw/gcc4/share/locale/es/LC_MESSAGES"
 File: "/proc/4324/root/proc/4984/root/.thumbnails/normal/dc259e3897e8af4b379c6d956b6c1393.png"
 File: "/proc/4324/root/proc/4984/root/.thumbnails/fail/gnome-thumbnail-factory/223c19786421b7101d14075bdec46f61.png"
 File: "/proc/4324/root/opt/csw/gcc4/libexec/gcc/i386-pc-solaris2.10/4.5.1/install-tools/mkheaders"
 File: "/proc/4324/root/opt/csw/gcc4/libexec/gcc/i386-pc-solaris2.10/4.5.1/cc1plus"
 File: "/proc/4324/root/opt/csw/gcc4/lib/libsupc++.la"

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 153052320 bytes of memory (of type "heap").
Abort - core dumped
bash-3.00#



Question 1. With such a powerful server, why would the operating system fail to provide such memory to the application (it was the only application running)?

Question 2. The Erlang Emulator i start is instructed to be able to spawn as many processes as it may need. the value +P 13421779. Is Erlang VM failing to access this memory or failing to allocate it to its processes ?

Question 3. To Solaris, it sees one process: epmd, perhaps containing and starting thousands of micro threads. What configurations can i make to Solaris to be able to never stop my application however much "memory hungry" it may be? Swap space available is 16 GB, RAM 20 GB, honestly, there must be something wrong.

Question 4. Which configurations can i make to the Erlang Emulator, to avoid these heap memory crash dumps especially when all the memory it may need is available on the server? How will i run more memory consuming apps on this server if Erlang still fails to allocate such memory to a simple file system indexer (well its heavily concurrent)?

finally, all other tweaks i could do to avoid heap memory problems on such capable hardware are welcome. Thanks in advance
    
asked Sep 21, 2015 by abhimca2006
0 votes
10 views



Related Hot Questions



Government Jobs Opening


...